RAG (Retrieval-Augmented Generation)
Definition
An architecture that enhances LLM outputs by first retrieving relevant documents from a knowledge base (via vector search) and injecting them into the prompt. Grounds the model in external, up-to-date facts without requiring retraining.
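The retrieve-then-inject flow can be sketched in a few lines. This is a minimal illustration, not a production pipeline: a toy word-overlap retriever stands in for real vector search, and the function and document names are invented for the example.

```python
# Minimal RAG sketch: retrieve the most relevant document, then build a
# grounded prompt. Word overlap stands in for embedding-based retrieval.
def retrieve(query, documents, k=1):
    """Rank documents by word overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    return sorted(documents,
                  key=lambda d: len(q_words & set(d.lower().split())),
                  reverse=True)[:k]

def build_prompt(query, documents):
    """Inject the retrieved context into the prompt ahead of the question."""
    context = "\n".join(retrieve(query, documents))
    return (f"Context:\n{context}\n\n"
            f"Question: {query}\n"
            f"Answer using only the context above.")

docs = ["Pinecone is a managed vector database.",
        "RAG retrieves documents and injects them into the prompt."]
prompt = build_prompt("What does RAG inject into the prompt?", docs)
```

In a real system the retriever would embed the query, run an ANN search against a vector database, and pass the top-k chunks to the LLM; the prompt structure, however, looks much like the one built here.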
Related Terms
Vector Database
A database optimized for storing and querying high-dimensional embedding vectors via approximate nearest neighbor (ANN) search. Core infrastructure for RAG systems. Examples: Pinecone, Weaviate, Qdrant, pgvector (PostgreSQL extension).
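The core query a vector database answers can be shown with exact brute-force search. This is a sketch only: real systems use ANN indexes (e.g. HNSW) to avoid scanning every vector, and the ids and vectors below are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec, index, k=2):
    """Return the k stored ids closest to query_vec (exact, brute force)."""
    return sorted(index, key=lambda i: cosine(query_vec, index[i]),
                  reverse=True)[:k]

index = {"doc_a": [1.0, 0.0, 0.0],
         "doc_b": [0.9, 0.1, 0.0],
         "doc_c": [0.0, 0.0, 1.0]}
result = nearest([1.0, 0.05, 0.0], index, k=2)
```

Brute force is O(n) per query; ANN indexes trade a small amount of recall for sublinear query time, which is what makes vector databases practical at scale.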
Embedding
A dense numerical vector that represents text, images, or other data in a high-dimensional space where semantic similarity maps to geometric closeness. Foundation of semantic search, RAG systems, and recommendation engines.
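The key property, similar inputs map to nearby vectors, can be illustrated with a toy embedding. Real embeddings come from trained models; the character-frequency "embedding" below is purely a stand-in to show that distance in vector space tracks similarity of the inputs.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def embed(text):
    """Toy embedding: normalized character frequencies (26 dimensions)."""
    counts = Counter(text.lower())
    total = max(sum(counts[c] for c in ALPHABET), 1)
    return [counts[c] / total for c in ALPHABET]

def sq_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

# Similar texts land closer together than unrelated ones.
d_similar = sq_distance(embed("vector search"), embed("search vectors"))
d_unrelated = sq_distance(embed("vector search"), embed("banana pudding"))
```

Model-produced embeddings behave the same way in hundreds or thousands of dimensions, except that closeness reflects meaning rather than spelling.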
Hallucination
When a language model generates confident, fluent text that is factually incorrect, fabricated, or unsupported by its sources. A fundamental challenge caused by models optimizing for plausible token sequences rather than factual accuracy.
Context Window
The maximum number of tokens a model can process in a single call — including both the input (prompt) and output (completion). Larger windows allow processing entire codebases, books, or long conversations. Measured in tokens, not characters.
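Because the window covers prompt and completion together, applications typically budget tokens before each call. A minimal sketch of that check, assuming a rough heuristic of ~4 characters per token (hypothetical; accurate counts require the model's actual tokenizer):

```python
def rough_token_count(text):
    """Crude token estimate: ~4 characters per token (heuristic only)."""
    return max(1, len(text) // 4)

def fits(prompt, max_completion_tokens, window=8192):
    """True if the prompt plus the reserved completion budget fit the window."""
    return rough_token_count(prompt) + max_completion_tokens <= window

ok = fits("Summarize this paragraph. " * 10, max_completion_tokens=512)
too_big = fits("x" * 100_000, max_completion_tokens=512)
```

When the check fails, common remedies are truncating or summarizing the input, chunking it across multiple calls, or switching to a model with a larger window.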