Tag: cost per token

Learn how to choose optimal batch sizes for LLM serving to cut cost per token by up to 87%. Discover real-world results, batching types, hardware trade-offs, and proven techniques to reduce AI infrastructure costs.

Recent Posts

Testing and Monitoring RAG Pipelines: Synthetic Queries and Real Traffic

Aug 12, 2025

Key Components of Large Language Models: Embeddings, Attention, and Feedforward Networks Explained

Sep 1, 2025

Hardware-Friendly LLM Compression: How to Fit Large Models on Consumer GPUs and CPUs

Jan 22, 2026

How Generative AI Is Transforming Prior Authorization Letters and Clinical Summaries in Healthcare Admin

Dec 15, 2025

Why Tokenization Still Matters in the Age of Large Language Models

Sep 21, 2025