Transformer
Definition
The neural network architecture introduced in "Attention Is All You Need" (2017) that largely replaced recurrent networks for sequence modeling. Built entirely from self-attention and feed-forward layers, it is the foundation of virtually all modern LLMs.
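The sketch below (NumPy, used here purely for illustration; it is not from the paper's codebase) shows one post-norm encoder block in the paper's arrangement: a self-attention sublayer and a feed-forward sublayer, each wrapped in a residual connection and layer normalization. Multi-head attention, positional encodings, and dropout are omitted for brevity.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean and unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def transformer_block(x, wq, wk, wv, wo, w1, w2):
    # Self-attention sublayer: residual connection + layer norm (post-norm).
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v
    x = layer_norm(x + attn @ wo)
    # Position-wise feed-forward sublayer (ReLU), also residual + norm.
    return layer_norm(x + np.maximum(0.0, x @ w1) @ w2)

# Toy usage: 4 tokens, model dimension 8, feed-forward dimension 16.
rng = np.random.default_rng(0)
seq_len, d_model, d_ff = 4, 8, 16
x = rng.normal(size=(seq_len, d_model))
shapes = [(d_model, d_model)] * 4 + [(d_model, d_ff), (d_ff, d_model)]
params = [rng.normal(size=s) * 0.1 for s in shapes]
print(transformer_block(x, *params).shape)  # (4, 8)
```

Stacking several such blocks yields the full encoder; the decoder side adds masked self-attention and cross-attention over the encoder output.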
Related Terms
Attention Mechanism
The core innovation of the Transformer: it lets the model weigh the relevance of every token in a sequence when computing the representation of each token. Self-attention lets every token "attend" to every other, capturing long-range dependencies without recurrence.
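A minimal single-head sketch in NumPy (the function name and shapes are illustrative): each row of the softmaxed score matrix is one token's attention distribution over every token in the sequence, and the output is the corresponding weighted mix of value vectors.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    q, k, v = x @ wq, x @ wk, x @ wv                # queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])         # scaled dot products
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ v, weights                     # weighted mix of values

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                         # 5 tokens, dimension 8
wq, wk, wv = (rng.normal(size=(8, 8)) * 0.1 for _ in range(3))
out, weights = self_attention(x, wq, wk, wv)
print(weights[0].round(2), weights[0].sum())        # token 0's attention over all 5 tokens
```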
LLM (Large Language Model)
A neural network trained on massive text corpora to predict the next token, a simple objective from which abilities like reasoning, coding, and language understanding emerge. Examples include GPT-4, Claude, Gemini, and Llama; parameter counts range from billions to trillions.
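A toy sketch of the next-token loop, assuming a stand-in `logits_fn` (here a hypothetical bigram table; a real LLM is a deep Transformer conditioned on the entire context):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat"]
bigram_logits = rng.normal(size=(len(vocab), len(vocab)))  # toy stand-in "model"

def logits_fn(context):
    # A real LLM scores the next token given the whole context;
    # this toy looks only at the last token (a bigram assumption).
    return bigram_logits[context[-1]]

def generate(context, n_tokens):
    for _ in range(n_tokens):
        logits = logits_fn(context)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        context.append(int(probs.argmax()))  # greedy: pick the most likely token
    return [vocab[i] for i in context]

print(generate([0], 4))  # 5 tokens, starting with 'the'
```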
Embedding
A dense numerical vector that represents text, images, or other data in a high-dimensional space where semantic similarity maps to geometric closeness. Embeddings are the foundation of semantic search, retrieval-augmented generation (RAG), and recommendation engines.
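A minimal sketch of "closeness" measured with cosine similarity; the three-dimensional vectors below are invented for illustration, whereas real embeddings come from a trained encoder and typically have hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a, b):
    # 1.0 means identical direction; values near 0 mean unrelated.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

emb = {  # made-up vectors for illustration only
    "dog":   np.array([0.90, 0.80, 0.10]),
    "puppy": np.array([0.85, 0.75, 0.20]),
    "car":   np.array([0.10, 0.20, 0.90]),
}
print(cosine_similarity(emb["dog"], emb["puppy"]))  # high (~0.99): similar meaning
print(cosine_similarity(emb["dog"], emb["car"]))    # much lower (~0.30)
```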