Tag
1 article
BAS evaluates whether LLM confidence helps decide when to answer or abstain, exposing overconfident errors that standard metrics can miss.