Tag
1 articles
rDPO uses instance-specific rubrics to make visual preference optimization more fine-grained, improving filtering and benchmark results.