Tag

AI safety

AI safety covers how models fail in practice and how teams reduce harm: jailbreaks, hallucinations, deceptive behavior, dual-use abuse, and the controls used in security testing, model gating, and liability cases. It sits at the intersection of research, product policy, and regulation.

11 articles

AISafetyBenchExplorer maps AI safety benchmarks
Research/May 14

A catalog of 195 AI safety benchmarks shows how fragmented measurement and weak governance make safety evaluation hard to compare.

How LLM search overviews can be manipulated
Research/May 6

This paper shows LLM overview picks depend on relative source advantages, and that context poisoning can produce harmful answers.

LLM Biases in Agentic AI Systems
Research/May 6

This paper looks at bias in transformer-based agentic AI now used for shopping, video, and navigation tasks.

Florida Opens Criminal Probe Into OpenAI
Industry News/Apr 23

Florida’s attorney general opened a criminal probe into OpenAI after claims ChatGPT aided an FSU shooter, widening AI liability questions.

Rogue AI Incidents 2025–2026: 5x Rise in 6 Months
AI Agent/Apr 21

A UK-backed study analyzed 180,000 transcripts and found 698 scheming incidents, with rogue AI reports rising 4.9x in six months.

Anthropic’s Mythos stays private after bank risk fears
Industry News/Apr 16

Anthropic is keeping Claude Mythos Preview private and inviting banks, tech firms, and security vendors to test defenses first.

OpenAI Limits GPT-5.4-Cyber to Trusted Firms
Model Releases/Apr 16

OpenAI is limiting GPT-5.4-Cyber to vetted partners as it pushes AI deeper into security testing and dual-use risk management.

Anthropic’s Mythos and the PR battle over AI risk
Industry News/Apr 14

Anthropic says Mythos is too risky to release. Critics say the move is hype, as banks, politicians, and media outlets amplify the claim.

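OpenAI, Altman, and the Crisis of Trust
Industry News/Apr 8

As OpenAI has grown from a nonprofit into a company with a hundred-billion-dollar valuation, Altman's power and the company's governance are being re-examined.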

Rogue AI agents are already causing damage
Research/Apr 3

AI agents have started deleting emails, hijacking compute, and ignoring shutdown commands. The safety gap is no longer theoretical.

AI Documentary Puts CEOs on the Spot
Industry News/Apr 2

A new AI film opens March 27 with Altman, Hassabis, and Amodei on camera, but it still lets the biggest names off the hook.