Tag
DPO
2 articles

Research/May 5
How to Build a Vintage LLM Testbed in 5 Steps
Build an LLM testbed with a 1930 knowledge cutoff to study historical reasoning and contamination-free generalization.

Research/Apr 15
Rubric-Based DPO for Visual Preference Tuning
rDPO uses instance-specific rubrics to make visual preference optimization more fine-grained, improving filtering and benchmark results.