Tag: GPU utilization

Learn how to choose optimal batch sizes for LLM serving to cut cost per token by up to 87%. Discover real-world results, batching types, hardware trade-offs, and proven techniques to reduce AI infrastructure costs.

Recent-posts

Visualization Techniques for Large Language Model Evaluation Results

Visualization Techniques for Large Language Model Evaluation Results

Dec, 24 2025

Compression-Aware Prompting: Getting the Best from Small LLMs

Compression-Aware Prompting: Getting the Best from Small LLMs

Jun, 7 2026

Token Probability Calibration in Large Language Models: How to Fix Overconfidence in AI Responses

Token Probability Calibration in Large Language Models: How to Fix Overconfidence in AI Responses

Jan, 16 2026

Containerizing Large Language Models: CUDA, Drivers, and Image Optimization

Containerizing Large Language Models: CUDA, Drivers, and Image Optimization

Jan, 25 2026

Fine-Tuned Models for Niche Stacks: When Specialization Beats General LLMs

Fine-Tuned Models for Niche Stacks: When Specialization Beats General LLMs

Jul, 5 2025