RLHF (Reinforcement Learning from Human Feedback)

Technique

Definition

A technique for training LLMs on human preference signals: human raters compare pairs of model outputs, a reward model is trained to predict these preferences, and the LLM is then fine-tuned with reinforcement learning (typically PPO, with a KL penalty keeping the policy close to the pre-RL model) to maximize the learned reward. Used to align ChatGPT, Claude, and similar assistants.
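
As a rough illustration of the two learned components, here is a toy PyTorch sketch. Everything in it is a hypothetical stand-in: responses are random fixed-size embeddings rather than text, the "policy" picks among 8 fixed candidate responses so the objective E[r(y)] − β·KL(policy ‖ reference) can be computed exactly, and no PPO-style sampling over token sequences is performed as it would be in a production system.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# --- Step 1: reward model trained on human preference pairs -----------
# Maps a response embedding to a scalar reward; the pairwise logistic
# (Bradley-Terry style) loss pushes scores of human-preferred responses
# above the rejected ones.
reward_model = nn.Linear(16, 1)
opt_rm = torch.optim.Adam(reward_model.parameters(), lr=1e-2)

# Toy "response embeddings"; real systems embed full text responses.
chosen = torch.randn(64, 16) + 0.5    # responses raters preferred
rejected = torch.randn(64, 16) - 0.5  # responses raters rejected

for _ in range(200):
    margin = reward_model(chosen) - reward_model(rejected)
    loss = -F.logsigmoid(margin).mean()
    opt_rm.zero_grad()
    loss.backward()
    opt_rm.step()

for p in reward_model.parameters():   # freeze the reward model for RL
    p.requires_grad_(False)

# --- Step 2: RL fine-tuning against the learned reward ----------------
# The policy is a distribution over 8 fixed candidate responses, so the
# RLHF objective  E[r(y)] - beta * KL(policy || reference)  is exact here.
candidates = torch.randn(8, 16)       # fixed embeddings of 8 responses
policy = nn.Linear(16, 8)             # prompt features -> response logits
reference = nn.Linear(16, 8)
reference.load_state_dict(policy.state_dict())  # frozen pre-RL snapshot
for p in reference.parameters():
    p.requires_grad_(False)

opt_pi = torch.optim.Adam(policy.parameters(), lr=1e-2)
beta = 0.1                            # KL coefficient
prompts = torch.randn(64, 16)
rewards = reward_model(candidates).squeeze(-1)  # (8,) reward per response

for _ in range(200):
    log_p = F.log_softmax(policy(prompts), dim=-1)      # (64, 8)
    ref_log_p = F.log_softmax(reference(prompts), dim=-1)
    probs = log_p.exp()
    expected_reward = (probs * rewards).sum(-1).mean()
    kl = (probs * (log_p - ref_log_p)).sum(-1).mean()   # exact KL
    loss = -(expected_reward - beta * kl)               # maximize objective
    opt_pi.zero_grad()
    loss.backward()
    opt_pi.step()

print(f"expected reward {expected_reward.item():.2f}, KL {kl.item():.2f}")
```

The KL term is the part that distinguishes RLHF fine-tuning from naive reward maximization: without it, the policy would collapse onto whichever response the reward model happens to score highest, drifting arbitrarily far from the pre-trained model's behavior.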