GRPO (Group Relative Policy Optimization)

Technique

Definition

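A reinforcement learning algorithm from DeepSeek that modifies PPO by dropping the separate critic (value model). For each prompt, it samples a group of responses and scores each one relative to the others in the group, typically by normalizing its reward against the group's mean and standard deviation, and uses that relative score as the advantage. Used to train DeepSeek-R1's reasoning capabilities.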
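
The group comparison can be illustrated with a minimal sketch of the advantage step alone. It is not DeepSeek's implementation. The function name `group_relative_advantages`, the `eps` constant, the use of the population standard deviation, and the example rewards are all assumptions made for illustration; the policy update, clipping, and KL penalty are left out.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each reward relative to its own group (hypothetical helper).

    Assumes `rewards` holds one scalar reward per response, all sampled
    for the same prompt. No critic is involved: the group mean plays the
    role of the baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    # eps avoids division by zero when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four responses to one prompt (1.0 = correct answer)
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Responses that score above the group mean receive a positive advantage and those below receive a negative one, so the policy is pushed toward the better responses in each group without a learned value estimate.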