RAG (Retrieval-Augmented Generation)
Definition
An architecture that enhances LLM outputs by first retrieving relevant documents from a knowledge base (via vector search) and injecting them into the prompt. Grounds the model in external, up-to-date facts without requiring retraining.
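The retrieve-then-inject flow can be sketched in a few lines. This is a minimal illustration, not a production pipeline: a toy word-overlap retriever stands in for real vector search, and the function and document names are invented for the example.

```python
# Minimal RAG sketch: retrieve the most relevant document, then build a
# grounded prompt. Word overlap stands in for embedding-based retrieval.
def retrieve(query, documents, k=1):
    """Rank documents by word overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    return sorted(documents,
                  key=lambda d: len(q_words & set(d.lower().split())),
                  reverse=True)[:k]

def build_prompt(query, documents):
    """Inject the retrieved context into the prompt ahead of the question."""
    context = "\n".join(retrieve(query, documents))
    return (f"Context:\n{context}\n\n"
            f"Question: {query}\n"
            f"Answer using only the context above.")

docs = ["Pinecone is a managed vector database.",
        "RAG retrieves documents and injects them into the prompt."]
prompt = build_prompt("What does RAG inject into the prompt?", docs)
```

In a real system the retriever would embed the query, run an ANN search against a vector database, and pass the top-k chunks to the LLM; the prompt structure, however, looks much like the one built here.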
Related Terms
Vector Database
A database optimized for storing and querying high-dimensional embedding vectors via approximate nearest neighbor (ANN) search. Core infrastructure for RAG systems. Examples: Pinecone, Weaviate, Qdrant, pgvector (PostgreSQL extension).
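The core query a vector database answers can be shown with exact brute-force search. This is a sketch only: real systems use ANN indexes (e.g. HNSW) to avoid scanning every vector, and the ids and vectors below are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec, index, k=2):
    """Return the k stored ids closest to query_vec (exact, brute force)."""
    return sorted(index, key=lambda i: cosine(query_vec, index[i]),
                  reverse=True)[:k]

index = {"doc_a": [1.0, 0.0, 0.0],
         "doc_b": [0.9, 0.1, 0.0],
         "doc_c": [0.0, 0.0, 1.0]}
result = nearest([1.0, 0.05, 0.0], index, k=2)
```

Brute force is O(n) per query; ANN indexes trade a small amount of recall for sublinear query time, which is what makes vector databases practical at scale.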
Embedding
A dense numerical vector that represents text, images, or other data in a high-dimensional space where semantic similarity maps to geometric closeness. Foundation of semantic search, RAG systems, and recommendation engines.
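The key property, similar inputs map to nearby vectors, can be illustrated with a toy embedding. Real embeddings come from trained models; the character-frequency "embedding" below is purely a stand-in to show that distance in vector space tracks similarity of the inputs.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def embed(text):
    """Toy embedding: normalized character frequencies (26 dimensions)."""
    counts = Counter(text.lower())
    total = max(sum(counts[c] for c in ALPHABET), 1)
    return [counts[c] / total for c in ALPHABET]

def sq_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

# Similar texts land closer together than unrelated ones.
d_similar = sq_distance(embed("vector search"), embed("search vectors"))
d_unrelated = sq_distance(embed("vector search"), embed("banana pudding"))
```

Model-produced embeddings behave the same way in hundreds or thousands of dimensions, except that closeness reflects meaning rather than spelling.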
Hallucination
When a language model generates confident, fluent text that is factually incorrect, fabricated, or unsupported by its sources. A fundamental challenge caused by models optimizing for plausible token sequences rather than factual accuracy.
Context Window
The maximum number of tokens a model can process in a single call — including both the input (prompt) and output (completion). Larger windows allow processing entire codebases, books, or long conversations. Measured in tokens, not characters.
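Because the window covers prompt and completion together, applications typically budget tokens before each call. A minimal sketch of that check, assuming a rough heuristic of ~4 characters per token (hypothetical; accurate counts require the model's actual tokenizer):

```python
def rough_token_count(text):
    """Crude token estimate: ~4 characters per token (heuristic only)."""
    return max(1, len(text) // 4)

def fits(prompt, max_completion_tokens, window=8192):
    """True if the prompt plus the reserved completion budget fit the window."""
    return rough_token_count(prompt) + max_completion_tokens <= window

ok = fits("Summarize this paragraph. " * 10, max_completion_tokens=512)
too_big = fits("x" * 100_000, max_completion_tokens=512)
```

When the check fails, common remedies are truncating or summarizing the input, chunking it across multiple calls, or switching to a model with a larger window.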