Tag
1 articles
A new training scheme, HDPO, aims to cut blind tool use in multimodal agents by separating accuracy from tool efficiency.