Tag
1 article
A new benchmark expands paralinguistic speech evaluation beyond coarse labels, using 1,000+ queries and pairwise judging to expose model gaps.