Tag: subword segmentation

Explore how tokenizer design choices, vocabulary size, and algorithms like BPE and Unigram impact LLM accuracy, memory usage, and numerical reasoning.

Recent-posts

Interoperability Patterns to Abstract Large Language Model Providers

Interoperability Patterns to Abstract Large Language Model Providers

Jul, 22 2025

How Large Language Models Capture Semantics and Syntax through Self-Supervision

How Large Language Models Capture Semantics and Syntax through Self-Supervision

May, 12 2026

Calibration and Outlier Handling in Quantized LLMs: How to Keep Accuracy When Compressing Models

Calibration and Outlier Handling in Quantized LLMs: How to Keep Accuracy When Compressing Models

Jul, 6 2025

How to Measure Generative AI ROI: Productivity, Quality, and Transformation Metrics

How to Measure Generative AI ROI: Productivity, Quality, and Transformation Metrics

May, 9 2026

Understanding Per-Token Pricing for Large Language Model APIs: A Cost Guide

Understanding Per-Token Pricing for Large Language Model APIs: A Cost Guide

May, 2 2026