OWASP Agentic Top 10 (2026): A Risk-by-Risk Walkthrough
The 2026 standard reorders agent risk. Here is what changed, what each ASI category actually looks like in production, and how to score it.
Analysis of the threat landscape facing autonomous AI agents: frameworks, real incidents, and the controls enterprises are putting in place to deploy agents responsibly in 2026.
Industry analysis on the agent security landscape — threat modelling, regulatory frameworks, real-world incidents, and the controls enterprises are putting in place. Written for the engineers, security leads, and board members who will live with the answers.
We took ATLAS v5.4.0 and turned every technique into an executable probe against ReAct, LangGraph, and MCP agents. The translations were not always obvious.
CVSS was built for software. Agents have autonomy, tools, and memory. Here is the scoring math we use, with one finding scored end to end.
One coordinator, ten ASI specialists, four OWASP-LLM specialists, a shared vector memory. A look at the architecture we landed on after a year of iteration.
Tools plus memory plus PII is not the same risk as a stateless agent. A four-tier model that changes how we prioritise red-team work.
Five mutators, a seeded RNG, and a 96-probe corpus turn into coverage-guided fuzzing that you can replay byte-for-byte three months later.
An MCP server changes its tool descriptions after the user approves it. The agent never notices. Here is what the attack looks like on the wire.
Agent Card signing in A2A v0.3 is opt-in. We replayed a signed envelope twelve hours later, and the receiving agent accepted it. A hardening recipe.
ReAct loops do not self-correct. One foot-in-the-door turn compounds across iterations until the agent is doing something its operator never asked for.
Eight probes against published plugins and MCP servers. Registry spoofing, plugin hijack, poisoned checkpoints, and what the audit trail looks like.
Agents that remember are agents you can poison. Persistent triggers, RAG corpus inject, cross-tenant vector bleed, and a probe pattern that catches them.
Long-horizon agents quietly optimise for the wrong objective. The ASI10 probes that surface drift, mode shift, and capability masking before production does.
When the evaluator becomes the target, every metric downstream lies. A walkthrough of judge injection, detection signals, and defence patterns that hold up.
Hidden text in scanned forms, adversarial pixels in claims photos, ASCII art in uploads. Text-only defences miss all of it. Here is what slips through.
pip install agent-guardian, point it at a LangGraph or REST agent, read the report. The fastest possible path from zero to an AIVSS posture number.
Domain-specific agents need domain-specific probes. We walk through writing one, registering it under the right ASI category, and emitting a scored finding.
fail-under exit codes, SARIF 2.1.0 upload, and PR comments with AIVSS deltas. The copy-pasteable wire-up for GitHub Actions, GitLab, and Jenkins.
The open-source AI red-team landscape evolved from single-target orchestrators to adversarial swarms. A probe-coverage map across the tools we tested.
A practical crosswalk from technical findings to NIST AI RMF MANAGE/MEASURE, ISO/IEC 42001 controls, and the 2026 OWASP categories. With table.
Finance leaders are about to be asked to sign off on a new scoring system. Here is what an AIVSS number tells the board, in language the CFO can use.
Five questions that separate organisations governing agents from organisations hoping. A board-ready framework, drawn from NIS2, SR 11-7, and OWASP.
A practical decision tree for the agent red-team capability question: in-house, paid vendor, or the OSS+SaaS dual-track model that is winning in 2026.
Ten questions to drop into your RFP, drawn from OWASP ASI 2026, MITRE ATLAS, AIVSS, and what changed in the landscape after Promptfoo was acquired.
Run AgentGuardian locally with one pip install, or talk to us about the hosted dashboard.