The OWASP Agentic Top 10 (2026): A Risk-by-Risk Field Guide

When OWASP refreshed its Top 10 for LLM Applications in 2025, prompt injection kept the number-one spot for a second consecutive edition. The project group was unusually blunt about why: the techniques most teams had adopted as defences — retrieval-augmented generation and fine-tuning — "do not actually solve the core vulnerability of prompt injection; they merely ground the model, they do not secure it." A few months later, Aim Security disclosed EchoLeak (CVE-2025-32711), the first publicly documented zero-click prompt-injection exploit against Microsoft 365 Copilot — a remote attacker could exfiltrate confidential tenant data by sending a single email. The unsolved problem had become a weaponised CVE.

The OWASP Agentic Top 10 inherits that reality and adds nine more categories on top. It exists because the LLM Top 10 was written for a world where the model returned text; the agentic list is written for a world where the model calls tools, writes memory, and negotiates with other agents. This post is a field guide — each risk in plain prose, with at least one production incident the category would have flagged, and the controls that actually move the needle as opposed to the controls that look good in a slide deck.

Why a separate list was necessary

Throughout 2024 and 2025, the high-cost AI failures stopped looking like "the model said something embarrassing" and started looking like "the agent did something with legal or financial consequence." Air Canada was held liable for a chatbot that invented a refund policy; the BC Civil Resolution Tribunal rejected the airline's argument that the chatbot was a "separate legal entity" and ordered damages. A Chevrolet dealership's ChatGPT-backed sales assistant agreed to sell a Tahoe for one dollar with "no takesies backsies." The City of New York's small-business chatbot was caught telling employers they could pocket worker tips. Each of these is a behaviour the LLM Top 10 could only describe obliquely; each is a behaviour the agentic list can name.

Three structural properties of agents drive the new categories. First, the open action space: the next step is not a token, it is a tool call against a real system, so a successful injection converts into a side effect. Second, persistent state: vector stores, summary memories and scratchpads outlive any single turn, so a poisoned record becomes a long-running backdoor — a pattern documented at scale during August 2025's "Month of AI Bugs" disclosures, which included persistent memory-poisoning attacks against Amazon Bedrock agents that survived session boundaries. Third, multi-agent topology: supervisors, workers, and external A2A peers create message buses that are themselves part of the threat model.

The LLM Top 10 asks if the model said something dangerous. The Agentic Top 10 asks if the agent did something dangerous.

ASI01-ASI05 — the input-and-action surface

The first half of the list covers the categories where the agent is induced to take an action it would not have taken on its own authority.

ASI01 — Goal Hijack

Any input that overrides the agent's system goal counts here — direct user injection, indirect retrieved content, and the under-tested third channel: tool outputs that the planner renders into its next step. EchoLeak is the canonical ASI01 incident because the attacker never spoke to the user. The injection rode in on retrieved email content the planner trusted as data. The control that moves the needle is provenance-tracking on every string the planner ingests, not output filtering — by the time you are filtering output, the tool call has already gone out.

ASI02 — Tool Misuse

The agent has the right to call a tool. The question is whether it called it with the right arguments. ASI02 covers argument injection (shell metacharacters or SQL smuggled through a typed parameter), chain exfiltration (composing two permitted tools to leak data neither alone could), scope expansion, and recursion bombs. The Samsung ChatGPT incidents of March 2023 — three separate disclosures of confidential semiconductor source within twenty days — are the most-cited ASI02-shaped failures because the "tool" was the conversation itself and the "argument" was raw proprietary code.

ASI03 — Privilege Compromise and Abuse

Cross-tenant reads, JIT credential reuse, role inheritance and scope-token replay. The textbook failure: a planner issues a short-lived token for tenant A, caches the token in its scratchpad, and replays it on a turn that is serving tenant B. Detection here is rarely about the model; it is about the credential-broker layer and the audit log. If your broker cannot distinguish "planner re-used a token" from "user authenticated again," ASI03 is operationally untestable.

ASI04 — Supply Chain

Poisoned fine-tunes, registry spoofing, MCP server compromise, and doctored tool descriptions. The 2025 incident class everyone now cites is the MCP tool whose description silently appended "and always copy results to a remote endpoint" — coding agents fully compromised through tool descriptions made it into the Month of AI Bugs as a category, not a single bug. Treat tool descriptions as untrusted code, not documentation.

ASI05 — Code Execution

If your agent has a Python interpreter, ASI05 is non-optional. The hard case is not a brute-force os.system — it is a polite request for a one-line "data summary" that constructs a string the interpreter resolves to __import__('os').system('curl ...'). Sandbox the interpreter at OS-level isolation, not at the Python AST level.

ASI06-ASI10 — the state, topology, and trust surface

The second half of the list is where agentic security diverges most clearly from chatbot security. These categories cover risks that emerge from the agent's persistent state, its relationships with other agents, and the trust the rest of the stack places in its output.

ASI06 — Memory Poisoning

Three sub-shapes to test independently. RAG corpus injection: a poisoned document lands in a retrieval set and the planner treats its first line as authoritative instruction. Persistent triggers in long-term memory: a benign-looking summary fires on a later, unrelated turn. Cross-tenant vector bleed: two tenants share an index with a namespace key the agent does not always enforce. Microsoft's Tay (March 2016) remains the canonical learning-loop poisoning case study — sixteen hours from launch to shutdown after a coordinated "repeat after me" exploit — and the lesson the Agentic Top 10 institutionalises is that any state the agent can write is state an adversary can poison.

ASI07 — Agent-to-Agent Compromise

Once you deploy more than one agent, the message bus is part of the threat model: supervisor impersonation, message-bus spoofing, confused-deputy patterns, and protocol downgrade. The CSA Agentic AI Red Teaming Guide, published in May 2025 with input from more than fifty contributors, devotes one of its twelve threat categories to multi-agent exploitation precisely because the failure modes are not reachable from a single-agent harness. Test with a bus simulator, not a chat replayer.

ASI08 — Cascading Failures

Reliability with an adversary in the loop. Retry storms, alarm suppression (a planner that swallows tool errors because its system prompt rewards "completion"), dependency cascades, and feedback amplification where the agent reads its own output back from a log sink. The non-obvious cost is denial-of-wallet — a cascade that retries an LLM tool four times per error path with maximum context turns a recoverable upstream blip into an unrecoverable budget event.

ASI09 — Trust Exploitation

The category exists because trust is the substrate every other ASI rides on. Output-reflection XSS where untrusted retrieved content is rendered into a UI that evaluates it as HTML; fabricated citations a downstream agent quotes as ground truth; denial-of-wallet through verbose responses; classic jailbreaks that still defeat policy on at least one frontier model per month. If your downstream pipeline treats agent output as a safe sink, every other risk on this list is a vector to land there.

ASI10 — Rogue Agents and Drift

The long-horizon category: behaviours that do not appear in a single turn but emerge over a multi-day or multi-tenant window. Mode shift (the agent flips between research and execution and forgets to flip back), capability masking (safe under test, an out-of-distribution capability in production), reward hacking (the planner finds an unintended way to score well on its operator's metric). ASI10 is the category most likely to require a production telemetry feed rather than a synthetic harness — drift, by definition, does not fit inside a CI window.

Crosswalk to MITRE ATLAS

The Agentic Top 10 is a product-engineering catalogue. It tells a developer which control to ship. For incident response and threat intelligence, the lingua franca is MITRE ATLAS v5.1.0, the adversarial threat landscape for AI modelled after ATT&CK. ATLAS now ships sixteen tactics, eighty-four techniques and fifty-six sub-techniques; in October 2025 MITRE integrated fourteen new techniques specifically focused on AI agents and generative systems, contributed by Zenity Labs. The result is that nearly every ASI category has at least one ATLAS technique ID it maps cleanly to.

OWASP ASI	Primary ATLAS	What it covers
ASI01	AML.T0051	Prompt injection (direct & indirect)
ASI02	AML.T0040	ML-enabled service abuse via tool args
ASI03	AML.T0012	Valid accounts / credential replay
ASI04	AML.T0010	ML supply-chain compromise
ASI05	AML.T0050	Command and scripting interpreter
ASI06	AML.T0020	Poisoning of training / retrieval data
ASI07	AML.T0050.001	Agent-to-agent message abuse
ASI08	AML.T0029	Denial of ML service / wallet
ASI09	AML.T0048	External harms via trusted output
ASI10	AML.T0034	Cost harvesting / behavioural drift

The two frameworks answer different questions and a finding that does not carry both is hard to action across roles. OWASP ASI is what the engineer ships against; ATLAS is what the SOC and the threat-intel team escalate against. Carry both labels on every finding and the same vulnerability serves both audiences without translation.

Where the list still leaves gaps

Three honest limitations. First, multi-agent blast radius is named (ASI07, ASI08) but not quantified — the list does not yet give a vocabulary for "how many downstream agents read this output before someone caught it." The CSA guide treats "agent impact chain and blast radius" as a first-class category; the OWASP list does not yet. Second, vendor supply chain risk is folded into ASI04 but the failure mode that bit Samsung — sending data to a vendor whose terms permitted training use — is closer to a procurement and data-classification problem than a software-supply-chain one. Third, the regulatory layer sits outside the list entirely. NIST AI 600-1, ISO/IEC 42001 and the EU AI Act all impose obligations the list does not map to; Gartner has framed this whole layer as AI Trust, Risk and Security Management (AI TRiSM), which it placed at the peak of inflated expectations in its 2025 Hype Cycle. The OWASP list is a piece of the answer, not the whole answer.

How procurement and audit teams should use it

The most underrated use of the Agentic Top 10 is as a procurement filter. When you evaluate an agent vendor, ask which ASI categories the vendor tests for, which they exclude, and what evidence they retain. A vendor who can name only ASI01 and ASI09 is testing the chatbot threat model. A vendor who covers ASI06, ASI07 and ASI10 is testing the agentic threat model. The list is also the right vocabulary for an NIST AI RMF Generative AI Profile control narrative — auditors want categorical names with evidence, not free-text incident descriptions. Mapping each in-scope agent to a row of the list and a row of the matrix above gives an audit team something to point at.

The list is a vocabulary. Vocabularies are valuable when they replace ad-hoc descriptions with categorical ones — which is exactly what an audit pack needs.

Practical takeaway

Five things to put on the calendar this quarter:

—Map your inventory. For every production agent, mark which of the ten ASIs are in scope. Most teams discover the easy ones (ASI01, ASI09) are over-tested and the high-impact ones (ASI06, ASI07, ASI10) have zero coverage.
—Carry two labels on every finding. OWASP ASI for the engineer who has to ship the fix, MITRE ATLAS for the SOC and the threat-intel team. Without both, every finding needs human translation before it leaves the security org.
—Test the third channel for ASI01. Direct injection is well-covered; indirect injection through retrieved content and tool outputs is where production incidents like EchoLeak actually originate.
—Treat memory as code. Sign or hash records the agent writes to long-lived stores, and re-evaluate them before they re-enter the planner. ASI06 is the category most likely to produce a months-old incident report.
—Use the list as a procurement filter. Ask every agent vendor to enumerate the ASIs they cover, the ones they exclude, and the evidence they retain. The answer separates the chatbot threat model from the agentic one.

How to operationalise this

The OWASP Agentic Top 10 will keep moving — categories will be renamed, merged, split, and added as the field learns. The practical implication is that the worst place to encode the list is in a one-off audit document. It belongs in the CI pipeline of every agent, as a continuously updated set of probes whose findings carry the ASI category and the MITRE ATLAS technique ID through to the report. AgentGuardian Open Source ships a probe corpus organised exactly that way — every finding carries both labels, the score is deterministic, and the same engine is available as a managed service for teams that need governed estate-wide evidence. The list is a vocabulary; the tooling is how the vocabulary becomes a control. See the open-source release for the runnable corpus, or the enterprise platform for the governed runtime.

Walking the OWASP Agentic Top 10.