Tag: SLM deployment

Explore when to run edge inference with Small Language Models (SLMs) instead of relying on the cloud. Learn about model compression, latency, and on-device AI trade-offs.

Recent posts

Transformer Efficiency Tricks: KV Caching and Continuous Batching in LLM Serving

Sep 5, 2025

Edge Inference for Small Language Models: When On-Device Makes Sense

Apr 4, 2026

Combining Pruning and Quantization for Maximum LLM Speedups

Mar 3, 2026

Visualization Techniques for Large Language Model Evaluation Results

Dec 24, 2025

Few-Shot Fine-Tuning of Large Language Models: When Data Is Scarce

Feb 9, 2026