Attention Mechanism
Definition
The core innovation in Transformers that allows a model to weigh the importance of different tokens in a sequence when generating each output token. Self-attention lets every token "attend" to every other token, capturing long-range dependencies.
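A minimal NumPy sketch of scaled dot-product self-attention, the computation at the heart of the mechanism. Single head, no masking; the names and dimensions here are illustrative, not taken from any particular library.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors.

    X: (seq_len, d_model) token embeddings; Wq/Wk/Wv: learned projections.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project tokens to queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # how strongly each token matches every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ V                         # each output is a weighted mix of all values

# Toy example: 4 tokens, model width 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8): one context-aware vector per token
```

The attention weights are what let the last token draw on the first one directly, regardless of distance; a recurrent network would have to carry that information forward step by step.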
Related Terms
Transformer
The neural network architecture introduced in "Attention Is All You Need" (2017) that replaced recurrent networks for sequence modeling. Based entirely on self-attention and feed-forward layers. Foundation of virtually all modern LLMs.
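To make the architecture concrete, here is a single pre-norm Transformer block as a hedged NumPy sketch: one self-attention sublayer and one feed-forward sublayer, each wrapped in a residual connection. Real implementations add multi-head attention, causal masking, dropout, and positional encodings; every name and dimension below is illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def transformer_block(X, Wq, Wk, Wv, Wo, W1, b1, W2, b2):
    """One pre-norm Transformer block: attention + feed-forward, each with a residual."""
    # Self-attention sublayer
    h = layer_norm(X)
    Q, K, V = h @ Wq, h @ Wk, h @ Wv
    X = X + softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V @ Wo
    # Position-wise feed-forward sublayer (ReLU between two linear maps)
    h = layer_norm(X)
    return X + np.maximum(0, h @ W1 + b1) @ W2 + b2

# Toy dimensions: 4 tokens, model width 8, feed-forward width 32
rng = np.random.default_rng(0)
d, f = 8, 32
shapes = [(d, d), (d, d), (d, d), (d, d), (d, f), (f,), (f, d), (d,)]
params = [rng.normal(size=s) * 0.1 for s in shapes]
X = rng.normal(size=(4, d))
print(transformer_block(X, *params).shape)  # (4, 8)
```

Stacking many such blocks, plus an embedding layer and an output projection, gives the decoder-only architecture behind most modern LLMs.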
Context Window
The maximum number of tokens a model can process in a single call — including both the input (prompt) and output (completion). Larger windows allow processing entire codebases, books, or long conversations. Measured in tokens, not characters.
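A sketch of budgeting against a context window, assuming the tiktoken library for token counting; the 8,192-token limit and the completion reserve are illustrative placeholders, not any specific model's numbers.

```python
import tiktoken  # tokenizer library; other model families ship their own tokenizers

CONTEXT_WINDOW = 8192   # illustrative limit; check your model's documentation
MAX_COMPLETION = 1024   # tokens reserved for the model's output

enc = tiktoken.get_encoding("cl100k_base")
prompt = "Summarize the following document: ..."
prompt_tokens = len(enc.encode(prompt))

# Prompt and completion share one window, so budget for both
if prompt_tokens + MAX_COMPLETION > CONTEXT_WINDOW:
    print(f"Prompt too long: {prompt_tokens} tokens leaves "
          f"under {MAX_COMPLETION} for the completion")
else:
    print(f"{prompt_tokens} prompt tokens; "
          f"{CONTEXT_WINDOW - prompt_tokens - MAX_COMPLETION} tokens to spare")
```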