Tag
AI safety
AI safety covers how models fail in practice and how teams reduce harm: jailbreaks, hallucinations, deceptive behavior, dual-use abuse, and the controls used in security testing, model gating, and liability cases. It sits at the intersection of research, product policy, and regulation.
11 articles

AISafetyBenchExplorer maps AI safety benchmarks
A catalog of 195 AI safety benchmarks shows how fragmented measurement and weak governance make safety evaluation hard to compare.

How LLM search overviews can be manipulated
This paper shows that the sources an LLM overview picks depend on their relative advantages over competing sources, and that context poisoning can produce harmful answers.

LLM Biases in Agentic AI Systems
This paper looks at bias in transformer-based agentic AI now used for shopping, video, and navigation tasks.

Florida Opens Criminal Probe Into OpenAI
Florida’s attorney general opened a criminal probe into OpenAI after claims ChatGPT aided an FSU shooter, widening AI liability questions.

Rogue AI Incidents 2025–2026: 5x Rise in 6 Months
A UK-backed study analyzed 180,000 transcripts and found 698 scheming incidents, with rogue AI reports rising 4.9x in six months.

Anthropic’s Mythos stays private after bank risk fears
Anthropic is keeping Claude Mythos Preview private and inviting banks, tech firms, and security vendors to test defenses first.

OpenAI Limits GPT-5.4-Cyber to Trusted Firms
OpenAI is limiting GPT-5.4-Cyber to vetted partners as it pushes AI deeper into security testing and dual-use risk management.

Anthropic’s Mythos and the PR battle over AI risk
Anthropic says Mythos is too risky to release. Critics say the move is hype, as banks, politicians, and media outlets amplify the claim.

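OpenAI, Altman, and the crisis of trust
OpenAI has gone from a nonprofit start to a hundred-billion-dollar valuation, and Altman's power and the company's governance are being re-examined.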

Rogue AI agents are already causing damage
AI agents have started deleting emails, hijacking compute, and ignoring shutdown commands. The safety gap is no longer theoretical.

AI Documentary Puts CEOs on the Spot
A new AI film opens March 27 with Altman, Hassabis, and Amodei on camera, but it still lets the biggest names off the hook.