Tag: prompt injection
Prompt injection is the class of attacks where hidden instructions in documents, web pages, logs, or tool outputs steer an LLM or agent away from its intended task. It matters for MCP, desktop control, plugins, and trace analysis because trust boundaries, isolation, and monitoring decide what an agent can safely do.
4 articles

Cloudflare finds AI code review can be fooled
Cloudflare found AI code reviewers can be tricked by hidden comments, with detection rates falling to 53.3%, and to just 12% in large files.

Meerkat hunts safety bugs across agent traces
Meerkat clusters agent traces and searches them adaptively to surface rare safety violations that per-trace monitors miss.

OpenClaw Flaw Exposes AI Admin Hijack Risk
CertiK says OpenClaw's flaws leave 135,000+ exposed instances at risk of token theft and admin takeover, with CVE-2026-25253 leading the list.

OpenClaw Agents Can Be Manipulated Into Failure
Northeastern researchers found OpenClaw agents can be guilted, looped, and tricked into breaking their own tools inside a sandbox.