AI Glossary
Plain English definitions for AI & ML terms — searchable and linkable.
Agent
Concepts: An AI system that can autonomously plan and execute multi-step tasks by calling tools, browsing the web, writing code, and interacting with external services — all without continuous human intervention.
Attention Mechanism
Techniques: The core innovation in Transformers that allows a model to weigh the importance of different tokens in a sequence when generating each output token. Self-attention lets every token "attend" to every other token, capturing long-range dependencies.
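A minimal sketch of scaled dot-product self-attention in NumPy (toy dimensions and random weights, purely illustrative — real models use many heads and learned projections):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project tokens to queries/keys/values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # every token scores every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax -> attention weights
    return weights @ V                        # each output is a weighted mix of values

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                   # 5 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                              # same shape as the input sequence
```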
Chain-of-Thought
Techniques: A prompting technique where the model is instructed (or naturally encouraged) to output intermediate reasoning steps before a final answer. Dramatically improves performance on multi-step math, logic, and coding problems.
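In the simplest ("zero-shot") form, chain-of-thought can be elicited with a single trailing cue. A hypothetical prompt, with the question invented for illustration:

```python
question = ("A cafe sells coffee for $3 and muffins for $2. "
            "If I buy 2 coffees and 3 muffins, what do I pay?")

# The trailing cue nudges the model to show intermediate steps before the answer.
prompt = f"Q: {question}\nA: Let's think step by step."
print(prompt)
```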
Context Window
Concepts: The maximum number of tokens a model can process in a single call — including both the input (prompt) and output (completion). Larger windows allow processing entire codebases, books, or long conversations. Measured in tokens, not characters.
Diffusion Model
Models: A generative model that learns to reverse a gradual noising process. Starting from pure noise, the model iteratively denoises to produce images, audio, or video. Powers Stable Diffusion, DALL-E 3, Midjourney, and Sora.
Distillation
Techniques: Training a small "student" model to mimic the behavior of a larger "teacher" model. Produces compact models that retain much of the teacher's capability at a fraction of the compute cost. Used for DeepSeek-R1's distilled variants and many production models.
DPO (Direct Preference Optimization)
Techniques: An alignment training method that optimizes the model directly on human preference pairs (preferred vs. rejected responses) without needing a separate reward model. Simpler and more stable than RLHF, increasingly preferred for instruction tuning.
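The DPO loss for a single preference pair can be sketched directly from sequence log-probabilities under the policy and a frozen reference model (the numeric log-probs below are made up for illustration):

```python
import math

def dpo_loss(pi_logp_chosen, pi_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid of the scaled margin
    between how much the policy (vs. the reference) favors the chosen response."""
    margin = beta * ((pi_logp_chosen - ref_logp_chosen)
                     - (pi_logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy favors the chosen response more than the reference did -> low loss.
low = dpo_loss(-10.0, -14.0, -12.0, -12.0)
# Policy favors the rejected response -> high loss.
high = dpo_loss(-14.0, -10.0, -12.0, -12.0)
```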
Few-shot Prompting
Techniques: Providing the model with a small number of input-output examples (shots) in the prompt before asking it to complete a new example. Helps the model understand the desired format, style, or task without fine-tuning.
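A few-shot prompt is just string assembly; a hypothetical two-shot translation task:

```python
# Two "shots" demonstrating an English-to-French task, followed by the new query.
examples = [("cheese", "fromage"), ("dog", "chien")]
query = "book"

shots = "\n".join(f"English: {en}\nFrench: {fr}" for en, fr in examples)
prompt = f"{shots}\nEnglish: {query}\nFrench:"   # model completes the translation
print(prompt)
```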
Fine-tuning
Techniques: Continuing to train a pre-trained model on a domain-specific or task-specific dataset to specialize its behavior. Ranges from full fine-tuning (updating all weights) to parameter-efficient methods like LoRA and QLoRA.
Function Calling
Tools: A structured capability where the model outputs a JSON object describing which function to call and with what arguments, rather than plain text. The calling application executes the function and feeds the result back. Standard in GPT-4, Claude, and Gemini.
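A sketch of the application side of the loop, with a hypothetical `get_weather` tool and a hard-coded stand-in for the model's structured reply (the schema style loosely follows what most chat APIs use, but is not any vendor's exact format):

```python
import json

# Hypothetical tool schema advertised to the model.
tools = [{
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

# Pretend the model replied with this structured call instead of plain text.
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'

def get_weather(city):
    return f"18°C and cloudy in {city}"   # stub; a real app would call an API

# The application parses the call, dispatches it, and would feed the
# result back to the model for the final natural-language answer.
call = json.loads(model_output)
result = {"get_weather": get_weather}[call["name"]](**call["arguments"])
print(result)
```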
GAN (Generative Adversarial Network)
Models: An architecture with two networks — a generator that creates synthetic data and a discriminator that tries to distinguish real from fake. Training as an adversarial game pushes the generator toward photorealistic output. Largely superseded by diffusion models for images.
GRPO (Group Relative Policy Optimization)
Techniques: A reinforcement learning algorithm from DeepSeek that improves upon PPO by comparing multiple sampled responses within a group rather than relying on a separate critic. Used to train DeepSeek-R1's reasoning capabilities.
LLM (Large Language Model)
Models: A neural network trained on massive text corpora to predict the next token, resulting in emergent abilities like reasoning, coding, and language understanding. Examples include GPT-4, Claude, Gemini, and Llama. Parameter counts range from billions to trillions.
LoRA (Low-Rank Adaptation)
Techniques: A parameter-efficient fine-tuning technique that adds small trainable rank-decomposition matrices to frozen model layers. Achieves near full fine-tuning performance while training less than 1% of parameters. Industry standard for adapting LLMs.
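The core idea fits in a few lines of NumPy: the frozen weight `W` is augmented with a low-rank update `B @ A`, and only `A` and `B` are trained (toy dimensions here; at real model scale the extra parameters are a tiny fraction of `W`):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 16, 2, 4                  # hidden size, LoRA rank, scaling factor

W = rng.normal(size=(d, d))             # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01      # trainable, rank r
B = np.zeros((d, r))                    # trainable, zero-init so training starts
                                        # exactly at the pretrained behavior

def lora_forward(x):
    # Base path plus the scaled low-rank update.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(3, d))
# With B = 0 the adapted model matches the frozen model exactly.
print(np.allclose(lora_forward(x), x @ W.T))
```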
MCP (Model Context Protocol)
Tools: An open standard by Anthropic for connecting AI assistants to external data sources and tools. Defines a common interface so any MCP-compatible client (Claude, Cursor, etc.) can plug into any MCP-compatible server (databases, APIs, filesystems).
Multimodal
Concepts: A model capable of processing and generating multiple types of data — text, images, audio, and video — within a single unified architecture. Examples include GPT-4o, Gemini, Claude (vision), and Sora.
QLoRA (Quantized LoRA)
Techniques: Combines 4-bit quantization with LoRA fine-tuning, enabling fine-tuning of 65B-parameter models on a single 48 GB GPU. Published by Tim Dettmers et al. (2023). Made fine-tuning of large models far more accessible.
Quantization
Techniques: Reducing the numerical precision of model weights (e.g., from 32-bit float to 4-bit integer) to shrink model size and speed up inference with minimal accuracy loss. Enables running large models on consumer hardware. Key for local deployments.
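A minimal sketch of symmetric int8 quantization (real schemes quantize per-channel or per-block and go down to 4 bits, but the round-trip idea is the same):

```python
import numpy as np

def quantize_int8(w):
    """Map float weights onto int8 with a single shared scale."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)
q, scale = quantize_int8(w)

error = np.abs(dequantize(q, scale) - w).max()
print(q.nbytes, w.nbytes)               # int8 storage is 4x smaller than float32
```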
RAG (Retrieval-Augmented Generation)
Techniques: An architecture that enhances LLM outputs by first retrieving relevant documents from a knowledge base (via vector search) and injecting them into the prompt. Grounds the model in external, up-to-date facts without requiring retraining.
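The retrieve-then-prompt pattern in miniature. The toy corpus and word-overlap scoring below stand in for a real vector database and embedding model; only the prompt-assembly shape is the point:

```python
import re

# Toy corpus standing in for an indexed knowledge base.
docs = [
    "The Eiffel Tower is in Paris.",
    "Python was created by Guido van Rossum.",
    "The Great Wall is in China.",
]

def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query, k=1):
    # Word overlap as a stand-in for cosine similarity over embeddings.
    return sorted(docs, key=lambda d: -len(tokens(query) & tokens(d)))[:k]

question = "Who created Python?"
context = "\n".join(retrieve(question))

# Retrieved passages are injected ahead of the question.
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
print(prompt)
```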
RLHF (Reinforcement Learning from Human Feedback)
Techniques: Training LLMs using human preference signals: human raters compare model outputs, a reward model is trained on these preferences, then the LLM is fine-tuned via RL to maximize the reward. Used to align ChatGPT, Claude, and similar assistants.
Temperature
Concepts: A sampling hyperparameter controlling output randomness. At temperature 0, the model always picks the most probable next token (deterministic). Higher values increase diversity and creativity. Values above 1.0 introduce significant noise.
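Temperature divides the logits before the softmax, so higher values flatten the distribution; a sketch with made-up logits:

```python
import numpy as np

def sample_probs(logits, temperature):
    """Next-token probabilities after temperature scaling."""
    logits = np.asarray(logits, dtype=float)
    if temperature == 0:
        p = np.zeros_like(logits)
        p[np.argmax(logits)] = 1.0      # greedy: all mass on the top token
        return p
    z = logits / temperature            # divide logits, then softmax
    z -= z.max()
    p = np.exp(z)
    return p / p.sum()

logits = [2.0, 1.0, 0.5]
p0 = sample_probs(logits, 0)            # deterministic
p1 = sample_probs(logits, 1.0)          # moderately peaked
p2 = sample_probs(logits, 2.0)          # flatter: more diversity
```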
Tokenizer
Tools: The component that converts raw text into tokens (integer IDs) that the model processes. Most modern LLMs use Byte-Pair Encoding (BPE) or similar subword algorithms. Token count determines API cost and must fit within the context window limit.
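The merge step at the heart of BPE can be sketched in a few lines. The merge rules below are hypothetical; a real tokenizer learns thousands of them from corpus statistics and then maps the resulting subwords to integer IDs:

```python
def bpe_tokenize(word, merges):
    """Apply an ordered list of BPE merge rules to a word split into characters."""
    tokens = list(word)
    for a, b in merges:                 # rules learned during tokenizer training
        i = 0
        while i < len(tokens) - 1:
            if tokens[i] == a and tokens[i + 1] == b:
                tokens[i:i + 2] = [a + b]   # fuse the adjacent pair in place
            else:
                i += 1
    return tokens

merges = [("l", "o"), ("lo", "w"), ("e", "r")]   # hypothetical learned merges
print(bpe_tokenize("lower", merges))             # -> ['low', 'er']
```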
Tool Use
Concepts: The ability of an LLM to invoke external tools — web search, code execution, calculators, APIs — during inference. The model decides when and how to call tools, receives the result, and incorporates it into its response.
Top-p (Nucleus Sampling)
Concepts: A sampling strategy where the model selects the next token from the smallest set of candidates whose cumulative probability exceeds p. Balances diversity and coherence more adaptively than fixed top-k sampling. Often used alongside temperature.
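A minimal sketch of the nucleus filter with a made-up four-token distribution: sort by probability, keep the smallest prefix whose cumulative mass exceeds p, zero out the rest, and renormalize:

```python
import numpy as np

def top_p_filter(probs, p=0.9):
    """Zero out tokens outside the nucleus, then renormalize."""
    order = np.argsort(probs)[::-1]             # tokens sorted by probability
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1        # smallest prefix with cum mass > p
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

probs = np.array([0.5, 0.3, 0.15, 0.05])
nucleus = top_p_filter(probs, p=0.7)            # keeps the 0.5 and 0.3 tokens
print(nucleus)
```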
Transformer
Models: The neural network architecture introduced in "Attention Is All You Need" (2017) that replaced recurrent networks for sequence modeling. Based entirely on self-attention and feed-forward layers. Foundation of virtually all modern LLMs.