Transformer
Definition
The neural network architecture introduced in "Attention Is All You Need" (2017) that largely replaced recurrent networks for sequence modeling. Built entirely from self-attention and feed-forward layers, it is the foundation of virtually all modern LLMs.
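The sketch below (NumPy, used here purely for illustration; it is not from the paper's codebase) shows one post-norm encoder block in the paper's arrangement: a self-attention sublayer and a feed-forward sublayer, each wrapped in a residual connection and layer normalization. Multi-head attention, positional encodings, and dropout are omitted for brevity.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean and unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def transformer_block(x, wq, wk, wv, wo, w1, w2):
    # Self-attention sublayer: residual connection + layer norm (post-norm).
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v
    x = layer_norm(x + attn @ wo)
    # Position-wise feed-forward sublayer (ReLU), also residual + norm.
    return layer_norm(x + np.maximum(0.0, x @ w1) @ w2)

# Toy usage: 4 tokens, model dimension 8, feed-forward dimension 16.
rng = np.random.default_rng(0)
seq_len, d_model, d_ff = 4, 8, 16
x = rng.normal(size=(seq_len, d_model))
shapes = [(d_model, d_model)] * 4 + [(d_model, d_ff), (d_ff, d_model)]
params = [rng.normal(size=s) * 0.1 for s in shapes]
print(transformer_block(x, *params).shape)  # (4, 8)
```

Stacking several such blocks yields the full encoder; the decoder side adds masked self-attention and cross-attention over the encoder output.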
Related Terms
Attention Mechanism
The core innovation of the Transformer: it lets the model weigh the relevance of every token in a sequence when computing the representation of each token. Self-attention lets every token "attend" to every other, capturing long-range dependencies without recurrence.
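A minimal single-head sketch in NumPy (the function name and shapes are illustrative): each row of the softmaxed score matrix is one token's attention distribution over every token in the sequence, and the output is the corresponding weighted mix of value vectors.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    q, k, v = x @ wq, x @ wk, x @ wv                # queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])         # scaled dot products
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ v, weights                     # weighted mix of values

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                         # 5 tokens, dimension 8
wq, wk, wv = (rng.normal(size=(8, 8)) * 0.1 for _ in range(3))
out, weights = self_attention(x, wq, wk, wv)
print(weights[0].round(2), weights[0].sum())        # token 0's attention over all 5 tokens
```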
LLM (Large Language Model)
A neural network trained on massive text corpora to predict the next token, a simple objective from which abilities like reasoning, coding, and language understanding emerge. Examples include GPT-4, Claude, Gemini, and Llama; parameter counts range from billions to trillions.
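A toy sketch of the next-token loop, assuming a stand-in `logits_fn` (here a hypothetical bigram table; a real LLM is a deep Transformer conditioned on the entire context):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat"]
bigram_logits = rng.normal(size=(len(vocab), len(vocab)))  # toy stand-in "model"

def logits_fn(context):
    # A real LLM scores the next token given the whole context;
    # this toy looks only at the last token (a bigram assumption).
    return bigram_logits[context[-1]]

def generate(context, n_tokens):
    for _ in range(n_tokens):
        logits = logits_fn(context)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        context.append(int(probs.argmax()))  # greedy: pick the most likely token
    return [vocab[i] for i in context]

print(generate([0], 4))  # 5 tokens, starting with 'the'
```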
Embedding
A dense numerical vector that represents text, images, or other data in a high-dimensional space where semantic similarity maps to geometric closeness. Embeddings are the foundation of semantic search, retrieval-augmented generation (RAG), and recommendation engines.
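A minimal sketch of "closeness" measured with cosine similarity; the three-dimensional vectors below are invented for illustration, whereas real embeddings come from a trained encoder and typically have hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a, b):
    # 1.0 means identical direction; values near 0 mean unrelated.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

emb = {  # made-up vectors for illustration only
    "dog":   np.array([0.90, 0.80, 0.10]),
    "puppy": np.array([0.85, 0.75, 0.20]),
    "car":   np.array([0.10, 0.20, 0.90]),
}
print(cosine_similarity(emb["dog"], emb["puppy"]))  # high (~0.99): similar meaning
print(cosine_similarity(emb["dog"], emb["car"]))    # much lower (~0.30)
```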