Fine-tuning
Technique
Definition
Continuing to train a pre-trained model on a domain-specific or task-specific dataset to specialize its behavior. Ranges from full fine-tuning (updating all weights) to parameter-efficient methods like LoRA and QLoRA.
Related Terms
LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning technique that adds small trainable rank-decomposition matrices alongside frozen model layers. Achieves performance close to full fine-tuning while training less than 1% of the parameters. An industry standard for adapting LLMs.
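The idea can be sketched in a few lines: the frozen weight `W` is left untouched, and a low-rank update `B @ A` is added to its output. This is a minimal numpy illustration, not a real training setup; all names (`W`, `A`, `B`, `lora_forward`) and the dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                          # hidden dimension, low rank (r << d)

W = rng.normal(size=(d, d))          # frozen pre-trained weight (not trained)
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-initialized

def lora_forward(x, alpha=16.0):
    # y = x W^T + x (B A)^T * (alpha / r)
    # B is zero at initialization, so the adapted model starts out
    # producing exactly the same outputs as the frozen base model.
    return x @ W.T + (x @ A.T @ B.T) * (alpha / r)

x = rng.normal(size=(1, d))
base = x @ W.T
adapted = lora_forward(x)

# Only A and B are trained: 2*r*d parameters instead of d*d.
trainable = A.size + B.size
frozen = W.size
```

Because only `A` and `B` receive gradients, the optimizer state and the checkpoint of the adapter stay tiny, which is what makes the method parameter-efficient.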
QLoRA (Quantized LoRA)
Combines 4-bit quantization of the frozen base model with LoRA fine-tuning, enabling fine-tuning of 65B-parameter models on a single 48 GB GPU. Published by Tim Dettmers et al. (2023), it made fine-tuning of very large models broadly accessible.
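The core trick can be sketched as: store the frozen weights in 4 bits, dequantize them on the fly for the forward pass, and apply a full-precision LoRA correction on top. This numpy sketch uses simple absmax quantization to 16 signed levels for illustration (the actual QLoRA paper uses the NF4 data type, which is more involved); all names and shapes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)).astype(np.float32)   # frozen base weight

def quantize_4bit(w):
    # Absmax quantization to signed 4-bit integers in [-8, 7]
    # (illustrative; QLoRA itself uses the NF4 format).
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

q, scale = quantize_4bit(W)
W_hat = dequantize(q, scale)        # lossy reconstruction of W

# Full-precision LoRA adapter trained on top of the quantized base.
r = 2
A = rng.normal(size=(r, 8)).astype(np.float32) * 0.01
B = np.zeros((8, r), dtype=np.float32)

x = rng.normal(size=(1, 8)).astype(np.float32)
y = x @ W_hat.T + x @ A.T @ B.T     # quantized base + LoRA delta
```

Memory drops because the base weights occupy 4 bits each instead of 16 or 32, while only the small `A` and `B` matrices are kept in full precision and trained.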
RLHF (Reinforcement Learning from Human Feedback)
Training LLMs using human preference signals: human raters compare model outputs, a reward model is trained on these preferences, then the LLM is fine-tuned via RL to maximize the reward. Used to align ChatGPT, Claude, and similar assistants.
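The reward-model step above is typically trained with a Bradley-Terry-style preference loss: the loss is small when the model scores the human-preferred output above the rejected one. A minimal numpy sketch, with scalar rewards standing in for reward-model outputs; the function name is an assumption.

```python
import numpy as np

def reward_model_loss(r_chosen, r_rejected):
    # Bradley-Terry preference loss: -log sigmoid(r_chosen - r_rejected).
    # Minimized when the reward model assigns a higher score to the
    # output that human raters preferred.
    margin = r_chosen - r_rejected
    return -np.log(1.0 / (1.0 + np.exp(-margin)))

correct_ranking = reward_model_loss(2.0, 0.0)    # preferred output scored higher
wrong_ranking = reward_model_loss(0.0, 2.0)      # preferred output scored lower
```

Once trained, the reward model's score is used as the RL objective (usually with a KL penalty against the original model) in the final fine-tuning stage.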
DPO (Direct Preference Optimization)
An alignment training method that optimizes the model directly on human preference pairs (preferred vs. rejected responses) without training a separate reward model. Simpler and more stable than RLHF, and increasingly preferred for preference tuning.
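The DPO objective can be written directly in terms of log-probabilities: an implicit reward is defined as the log-probability ratio between the policy and a frozen reference model, and a sigmoid loss pushes the margin between chosen and rejected responses apart. A minimal numpy sketch with scalar sequence log-probabilities; the function name and values are illustrative assumptions.

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Implicit reward: beta * (log pi(y|x) - log pi_ref(y|x)).
    # The loss is -log sigmoid of the reward margin between the
    # chosen and rejected responses, so no reward model is needed.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))

# Policy assigns more probability to the chosen response than the
# reference does, and less to the rejected one: low loss.
good = dpo_loss(-1.0, -2.0, -1.5, -1.5)
# The reverse preference: high loss.
bad = dpo_loss(-2.0, -1.0, -1.5, -1.5)
```

`beta` controls how far the policy may drift from the reference model, playing a role analogous to the KL penalty coefficient in RLHF.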