Attention Mechanism
Definition
The core innovation in Transformers that allows a model to weigh the importance of different tokens in a sequence when generating each output token. Self-attention lets every token "attend" to every other token, capturing long-range dependencies.
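A minimal NumPy sketch of scaled dot-product self-attention, the computation at the heart of the mechanism. Single head, no masking; the names and dimensions here are illustrative, not taken from any particular library.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors.

    X: (seq_len, d_model) token embeddings; Wq/Wk/Wv: learned projections.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project tokens to queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # how strongly each token matches every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ V                         # each output is a weighted mix of all values

# Toy example: 4 tokens, model width 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8): one context-aware vector per token
```

The attention weights are what let the last token draw on the first one directly, regardless of distance; a recurrent network would have to carry that information forward step by step.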
Related Terms
Transformer
The neural network architecture introduced in "Attention Is All You Need" (2017) that replaced recurrent networks for sequence modeling. Based entirely on self-attention and feed-forward layers. Foundation of virtually all modern LLMs.
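To make the architecture concrete, here is a single pre-norm Transformer block as a hedged NumPy sketch: one self-attention sublayer and one feed-forward sublayer, each wrapped in a residual connection. Real implementations add multi-head attention, causal masking, dropout, and positional encodings; every name and dimension below is illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def transformer_block(X, Wq, Wk, Wv, Wo, W1, b1, W2, b2):
    """One pre-norm Transformer block: attention + feed-forward, each with a residual."""
    # Self-attention sublayer
    h = layer_norm(X)
    Q, K, V = h @ Wq, h @ Wk, h @ Wv
    X = X + softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V @ Wo
    # Position-wise feed-forward sublayer (ReLU between two linear maps)
    h = layer_norm(X)
    return X + np.maximum(0, h @ W1 + b1) @ W2 + b2

# Toy dimensions: 4 tokens, model width 8, feed-forward width 32
rng = np.random.default_rng(0)
d, f = 8, 32
shapes = [(d, d), (d, d), (d, d), (d, d), (d, f), (f,), (f, d), (d,)]
params = [rng.normal(size=s) * 0.1 for s in shapes]
X = rng.normal(size=(4, d))
print(transformer_block(X, *params).shape)  # (4, 8)
```

Stacking many such blocks, plus an embedding layer and an output projection, gives the decoder-only architecture behind most modern LLMs.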
Context Window
The maximum number of tokens a model can process in a single call — including both the input (prompt) and output (completion). Larger windows allow processing entire codebases, books, or long conversations. Measured in tokens, not characters.
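A sketch of budgeting against a context window, assuming the tiktoken library for token counting; the 8,192-token limit and the completion reserve are illustrative placeholders, not any specific model's numbers.

```python
import tiktoken  # tokenizer library; other model families ship their own tokenizers

CONTEXT_WINDOW = 8192   # illustrative limit; check your model's documentation
MAX_COMPLETION = 1024   # tokens reserved for the model's output

enc = tiktoken.get_encoding("cl100k_base")
prompt = "Summarize the following document: ..."
prompt_tokens = len(enc.encode(prompt))

# Prompt and completion share one window, so budget for both
if prompt_tokens + MAX_COMPLETION > CONTEXT_WINDOW:
    print(f"Prompt too long: {prompt_tokens} tokens leaves "
          f"under {MAX_COMPLETION} for the completion")
else:
    print(f"{prompt_tokens} prompt tokens; "
          f"{CONTEXT_WINDOW - prompt_tokens - MAX_COMPLETION} tokens to spare")
```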