LLM (Large Language Model)
Definition
A neural network trained on massive text corpora to predict the next token, resulting in emergent abilities like reasoning, coding, and language understanding. Examples include GPT-4, Claude, Gemini, and Llama. Parameter counts range from billions to trillions.
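The "predict the next token" loop can be sketched with a toy stand-in for the model. Everything here is illustrative: `toy_model` is a hypothetical function with hard-coded probabilities standing in for billions of learned parameters, and the four-word vocabulary is invented for the example.

```python
# Toy autoregressive generation loop. `toy_model` is a hypothetical
# stand-in for a real LLM: given the context so far, it returns a
# probability distribution over a tiny invented vocabulary.

def toy_model(context):
    last = context[-1] if context else None
    if last == "the":
        return {"cat": 0.90, "sat": 0.05, "the": 0.03, "<eos>": 0.02}
    if last == "cat":
        return {"sat": 0.90, "the": 0.05, "cat": 0.03, "<eos>": 0.02}
    # After "sat" (or anything else), prefer ending the sequence.
    return {"<eos>": 0.90, "the": 0.05, "cat": 0.03, "sat": 0.02}

def generate(prompt, max_new_tokens=5):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = toy_model(tokens)
        next_tok = max(probs, key=probs.get)  # greedy decoding: pick the argmax
        if next_tok == "<eos>":
            break
        tokens.append(next_tok)
    return tokens
```

Calling `generate(["the"])` repeatedly applies the model to its own output, which is exactly how real LLMs produce text one token at a time (real systems usually sample rather than always taking the argmax).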
Related Terms
Transformer
The neural network architecture introduced in "Attention Is All You Need" (2017) that replaced recurrent networks for sequence modeling. Based entirely on self-attention and feed-forward layers. Foundation of virtually all modern LLMs.
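The self-attention operation at the heart of the Transformer is scaled dot-product attention, softmax(QKᵀ/√d)·V. A minimal sketch in pure Python, with queries, keys, and values given as small lists of vectors (real implementations use batched tensor libraries):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(Q[0])  # key/query dimension
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

With `Q = [[1, 0]]`, `K = V = [[1, 0], [0, 1]]`, the query attends more strongly to the first key, so the output leans toward the first value vector.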
Tokenizer
The component that converts raw text into tokens (integer IDs) that the model processes. Most modern LLMs use Byte-Pair Encoding (BPE) or similar subword algorithms. Token count determines API cost and must fit within the context window limit.
Context Window
The maximum number of tokens a model can process in a single call — including both the input (prompt) and output (completion). Larger windows allow processing entire codebases, books, or long conversations. Measured in tokens, not characters.
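Because the window covers prompt plus completion, the tokens available for output are whatever the prompt leaves over. A small budgeting sketch (the 8192-token window is just an illustrative figure, not a property of any particular model):

```python
def max_completion_tokens(context_window, prompt_tokens, reserved=0):
    """Tokens left for the completion after the prompt fills part of the
    window. `reserved` optionally holds back room for system messages etc."""
    remaining = context_window - prompt_tokens - reserved
    if remaining <= 0:
        raise ValueError("prompt does not fit in the context window")
    return remaining
```

So with an illustrative 8192-token window and a 6000-token prompt, at most 2192 tokens remain for the model's reply.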
RLHF (Reinforcement Learning from Human Feedback)
Training LLMs using human preference signals: human raters compare model outputs, a reward model is trained on these preferences, then the LLM is fine-tuned via RL to maximize the reward. Used to align ChatGPT, Claude, and similar assistants.
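The reward model in this pipeline is commonly trained with a Bradley-Terry-style pairwise objective: given the scores assigned to the preferred and rejected outputs, minimize −log σ(r_chosen − r_rejected). A minimal sketch of that loss (scores are illustrative scalars, not outputs of a real reward model):

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).
    Small when the chosen output scores well above the rejected one."""
    diff = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-diff)))
```

When the two rewards are equal the loss is ln 2 (the model is indifferent); widening the margin in the preferred direction drives the loss toward zero, which is what pushes the reward model to rank human-preferred outputs higher.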