Tag
1 articles
ASMR-Bench probes whether auditors can spot subtle sabotage in ML research codebases, and the answer so far is: not reliably.