Tag: subword segmentation

Tokenizer Design Choices and Their Impacts on LLM Quality

by Phillip Ramos

Explore how tokenizer design choices, such as vocabulary size and the choice between algorithms like BPE and Unigram, affect LLM accuracy, memory usage, and numerical reasoning.
