Tag: prompt injection
Prompt injection is the class of attacks where hidden instructions in documents, web pages, logs, or tool outputs steer an LLM or agent away from its intended task. It matters for MCP, desktop control, plugins, and trace analysis because trust boundaries, isolation, and monitoring decide what an agent can safely do.
4 articles

Cloudflare finds AI code review can be fooled
Cloudflare found AI code reviewers can be tricked by hidden comments, with detection rates falling to 53.3%, and to just 12% in large files.

Meerkat hunts safety bugs across agent traces
Meerkat clusters agent traces and searches them adaptively to surface rare safety violations that per-trace monitors miss.

OpenClaw Flaw Exposes AI Admin Hijack Risk
CertiK says OpenClaw's flaws leave 135,000+ exposed instances at risk of token theft and admin takeover, with CVE-2026-25253 leading the list.

OpenClaw Agents Can Be Manipulated Into Failure
Northeastern researchers found OpenClaw agents can be guilted, looped, and tricked into breaking their own tools inside a sandbox.