← Glossary

Tokenizer

Tool

Definition

The component that converts raw text into tokens (integer IDs) that the model processes. Most modern LLMs use Byte-Pair Encoding (BPE) or similar subword algorithms. Token count determines cost and fits within the context window limit.