Context Window
Concept
Definition
The maximum number of tokens a model can process in a single call, counting both the input (prompt) and the output (completion). Larger windows allow a model to process entire codebases, books, or long conversations in one call. The limit is measured in tokens, not characters.
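Because input and output share one limit, a request fits only if the prompt tokens plus the tokens reserved for the completion stay within the window. A minimal sketch of that arithmetic follows; the function names and the 8,192-token window are illustrative assumptions, not values from any particular model.

```python
def fits_in_context(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """Return True if the prompt plus the reserved output fits in the window."""
    return prompt_tokens + max_output_tokens <= context_window


def max_prompt_tokens(context_window: int, max_output_tokens: int) -> int:
    """Tokens left for the prompt after reserving room for the completion."""
    return max(context_window - max_output_tokens, 0)


# Hypothetical window size, chosen only for illustration.
WINDOW = 8192

print(fits_in_context(7000, 1000, WINDOW))   # 7000 + 1000 = 8000, within 8192
print(fits_in_context(7500, 1000, WINDOW))   # 7500 + 1000 = 8500, exceeds 8192
print(max_prompt_tokens(WINDOW, 1000))       # 8192 - 1000
```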
Related Terms
Tokenizer
The component that converts raw text into tokens (integer IDs) that the model processes. Most modern LLMs use Byte-Pair Encoding (BPE) or similar subword algorithms. The token count of a request determines its cost and must fit within the context window limit.
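The core idea of BPE can be sketched in a few lines: start from single characters and repeatedly merge the most frequent adjacent pair into a new symbol. The code below is a toy illustration under simplifying assumptions (it counts pairs across the whole string, including spaces, and has no handling for unseen characters); production tokenizers work on bytes, pre-split the text, and ship a fixed, pre-trained merge table. All function names are hypothetical.

```python
from collections import Counter


def apply_merge(symbols: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def train_bpe(text: str, num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge rules, most frequent pair first."""
    symbols = list(text)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        merges.append(best)
        symbols = apply_merge(symbols, best)
    return merges


def encode(text: str, merges: list[tuple[str, str]]) -> list[str]:
    """Split text into characters, then replay the learned merges in order."""
    symbols = list(text)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


def build_vocab(text: str, merges: list[tuple[str, str]]) -> dict[str, int]:
    """Assign integer IDs: base characters first, then merged symbols."""
    base = sorted(set(text))
    return {sym: idx for idx, sym in enumerate(base + [a + b for a, b in merges])}


corpus = "low lower lowest"
merges = train_bpe(corpus, num_merges=2)
vocab = build_vocab(corpus, merges)

tokens = encode("lowest", merges)
print(tokens)                       # -> ['low', 'e', 's', 't']
print([vocab[t] for t in tokens])   # integer IDs the model would consume
```

Six characters become four tokens here, which is why token counts, not character counts, are what cost and context limits are measured against.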
Embedding
A dense numerical vector that represents text, images, or other data in a high-dimensional space where semantic similarity maps to geometric closeness. Embeddings are the foundation of semantic search, RAG systems, and recommendation engines.
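"Geometric closeness" is commonly measured with cosine similarity, the cosine of the angle between two vectors. The sketch below uses hand-written three-dimensional vectors purely for illustration; they are assumptions, not the output of any real embedding model, which would typically produce hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors: related concepts point in similar directions.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
car = [0.1, 0.0, 0.9]

print(cosine_similarity(cat, kitten))  # close to 1: similar direction
print(cosine_similarity(cat, car))     # close to 0: nearly orthogonal
```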
RAG (Retrieval-Augmented Generation)
An architecture that enhances LLM outputs by first retrieving relevant documents from a knowledge base (typically via vector search) and injecting them into the prompt. Grounds the model in external, up-to-date facts without requiring retraining.
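The retrieve-then-inject flow can be sketched without any external service. In the example below, `embed` is a deliberately crude stand-in (lowercase word counts) for a real embedding model, the knowledge base is an in-memory list, and the prompt template is invented for illustration; the final call to an LLM is left out. Every name here is a hypothetical placeholder.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Inject the retrieved documents into the prompt ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


knowledge_base = [
    "The context window limits how many tokens a model can read",
    "Bananas are rich in potassium",
    "A tokenizer splits text into tokens",
]

print(build_prompt("how many tokens fit in the context window", knowledge_base, k=1))
```

The retrieved text counts against the context window like any other prompt content, so `k` and the document sizes have to be chosen with the token budget in mind.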