Tag: subword segmentation

Tokenizer Design Choices and Their Impacts on LLM Quality

by Phillip Ramos

Explore how tokenizer design choices, vocabulary size, and algorithms like BPE and Unigram impact LLM accuracy, memory usage, and numerical reasoning.
