Tag: batch size

How to Choose Batch Sizes to Minimize Cost per Token in LLM Serving

by Phillip Ramos

Learn how to choose optimal batch sizes for LLM serving to cut cost per token by up to 87%. Discover real-world results, batching types, hardware trade-offs, and proven techniques to reduce AI infrastructure costs.

Recent posts

Fine-Tuned Models for Niche Stacks: When Specialization Beats General LLMs

Jul 5, 2025

Calibration and Outlier Handling in Quantized LLMs: How to Keep Accuracy When Compressing Models

Jul 6, 2025

Value Alignment in Generative AI: How Human Feedback Shapes AI Behavior

Aug 9, 2025

Benchmarking Transformer Variants: Choosing the Right LLM Architecture for Your Workload

Apr 4, 2026

How to Evaluate and Monitor Drift After Fine-Tuning Your LLM

Apr 10, 2026