What Board Members Should Ask About Agent Risk

The shape of board oversight changed sometime between the end of 2024 and the middle of 2026. Two years ago an AI agenda item at the audit committee meant a slide on generative-AI policy, a line on usage controls, and a reassuring nod from the CISO that nothing was in production. Today that slide is a liability. Agents are running. They are calling tools, reading mailboxes, executing trades, opening tickets, drafting client communications, moving small amounts of money. They are doing this in environments most boards have never inventoried, under frameworks that did not exist when the current charter was written.

Agent risk has crossed the threshold from CISO line item to board-and-CEO matter for one reason: the regulatory regime now attaches personal accountability. NIS2 Article 20 makes management bodies of essential and important entities directly responsible for cybersecurity risk-management measures, including approval and oversight. The U.S. Federal Reserve's SR 11-7 model-risk guidance, originally written for credit and market models, has been read across to agentic systems by every U.S. bank supervisor in the last twelve months. APRA's letter of 30 April 2026, MAS AIRG, RBI FREE-AI, and the EU AI Act's high-risk obligations in Articles 9 to 15 all ask the same essential question: did the board understand the risk it approved?

The board does not need to debug the agent. It needs to know whether the people who can are doing so, on a schedule, with evidence the regulator will accept.

The five questions

These are the questions a competent board should be asking the CISO at the next quarterly meeting. They are ordered. If question one cannot be answered, the answers to questions two through five are theatre.

1. Do we know which agents are running, and at what tier?

Most organisations cannot answer this. The CMDB lists the systems someone remembered to register. It does not list the LangGraph workflow a product team spun up last quarter, the CrewAI script wired into a Slack channel, the MCP server a vendor quietly enabled inside a SaaS product, or the OpenAI Agents SDK harness running in a developer's sandbox against production data.

The first credible answer is a tiered inventory. AgentGuardian classifies every discovered agent against a four-tier model: T1 (tools + memory + PII), T2 (tools + memory), T3 (tools only), T4 (prompt-only). The board does not need to remember the taxonomy. The board needs to see the count by tier, the trend month-over-month, and the owner per agent.

WHAT TO LOOK FOR

A numeric agent inventory broken down by tier, with named owners, and a separate line item for shadow agents discovered outside the sanctioned platform. If the CISO answers with a percentage rather than a count, ask for the denominator.
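The tier model and the count-based inventory described above can be sketched in a few lines of Python. This is a minimal illustration, not AgentGuardian's implementation: the `Agent` record, its field names, and the sample agents are hypothetical, and the handling of capability combinations the four-tier description does not cover (for example, memory without tools) is an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Agent:
    """Hypothetical inventory record; field names are illustrative."""
    name: str
    has_tools: bool
    has_memory: bool
    touches_pii: bool
    owner: Optional[str] = None  # named accountable owner, if any
    sanctioned: bool = True      # False = shadow agent outside the platform


def classify_tier(agent: Agent) -> str:
    """Map capabilities to the four tiers named in the text.

    ASSUMPTION: combinations the text does not list (e.g. memory but no
    tools) fall through to the nearest lower-risk tier.
    """
    if not agent.has_tools:
        return "T4"  # prompt-only
    if not agent.has_memory:
        return "T3"  # tools only
    return "T1" if agent.touches_pii else "T2"


def summarise(agents: list[Agent]) -> dict:
    """Report counts rather than percentages so the denominator is visible."""
    return {
        "total": len(agents),
        "by_tier": dict(Counter(classify_tier(a) for a in agents)),
        "shadow": sum(1 for a in agents if not a.sanctioned),
        "unowned": [a.name for a in agents if a.owner is None],
    }


# Made-up sample data for illustration only.
inventory = [
    Agent("claims-triage", True, True, True, owner="j.doe"),
    Agent("ticket-opener", True, False, False, owner="a.lee"),
    Agent("slack-helper", True, True, False, sanctioned=False),
    Agent("faq-bot", False, False, False, owner="m.khan"),
]
summary = summarise(inventory)
print(summary)
```

The `unowned` list is returned by name rather than as a percentage, which keeps the follow-up question (who owns this agent?) attached to a specific agent.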

2. Have we tested those agents against OWASP ASI 2026 and MITRE ATLAS?

The OWASP Top 10 for Agentic Applications 2026 is the canonical taxonomy. It defines ten categories: ASI01 Goal Hijack, ASI02 Tool Misuse, ASI03 Privilege Compromise, ASI04 Supply Chain, ASI05 Code Execution, ASI06 Memory Poisoning, ASI07 Agent-to-Agent Compromise, ASI08 Cascading Failures, ASI09 Trust Exploitation, and ASI10 Rogue Agents and Drift. Every credible agent assessment maps findings to these categories and to MITRE ATLAS v5.4.0 technique IDs. OWASP's LLM Top 10 covers the model layer; ATLAS covers adversarial machine-learning techniques observed in the wild; ASI 2026 covers the failure modes specific to agents that act.

AgentGuardian Open Source ships a 96-probe corpus sharded across the ten ASI categories and runs a swarm of fourteen concurrent specialist attackers (ten ASI specialists plus four OWASP-LLM specialists for fuzzing, detection evasion, secret extraction, and denial-of-wallet). The relevant question for the board is not the probe count. It is whether the test methodology is inspectable and whether the test cadence keeps up with the rate of change in the underlying agent.

—Test methodology: published, Apache-2.0, auditable on GitHub — not vendor-proprietary.
—Coverage: all ten OWASP ASI 2026 categories, mapped to MITRE ATLAS technique IDs.
—Cadence: continuous for T1, weekly for T2, monthly for T3, pre-release for T4.
—Evidence trail: every finding carries the probe ID, the category, the ATLAS technique, and the AIVSS score.
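The cadence line above lends itself to a simple overdue check. The sketch below is one hedged way to express it, not part of the AgentGuardian toolkit. The function name `is_overdue` is hypothetical, reading "continuous" as "no older than 24 hours" and "monthly" as 31 days are assumptions, and T4 is excluded because a pre-release gate is tied to releases rather than to elapsed time.

```python
from datetime import datetime, timedelta

# Maximum acceptable age of the most recent test run, per tier.
MAX_TEST_AGE = {
    "T1": timedelta(days=1),   # ASSUMPTION: "continuous" read as at most 24h old
    "T2": timedelta(days=7),   # weekly
    "T3": timedelta(days=31),  # monthly (31 days is an assumption)
}


def is_overdue(tier: str, last_tested: datetime, now: datetime) -> bool:
    """Return True when the latest test run is older than the tier allows."""
    if tier == "T4":
        # Pre-release testing is enforced at release time, not by a clock,
        # so this time-based check never flags T4.
        return False
    return now - last_tested > MAX_TEST_AGE[tier]


now = datetime(2026, 6, 30, 12, 0)
print(is_overdue("T2", datetime(2026, 6, 20, 12, 0), now))  # ten days old
print(is_overdue("T3", datetime(2026, 6, 20, 12, 0), now))  # ten days old
```

A report built on a check like this gives the board a count of overdue agents per tier instead of a general assurance that testing is "regular".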

3. Do our findings carry a numeric score we can put in a board pack?

The board is not equipped to interpret a list of vulnerability narratives. It is equipped to interpret a number, a trend, and a threshold. That is what AIVSS — the AI Vulnerability Scoring System — exists to provide. AIVSS is a deterministic 0–100 risk score computed from severity weight multiplied by tier weight, with severities of critical (1.0), high (0.7), medium (0.4), and low (0.2). The formula is published. It does not require an LLM key or network access to compute. Two independent teams running the same probes against the same target on different days produce identical scores.

That property, determinism, is what makes AIVSS board-ready: it yields a 0–100 number that can be reproduced, signed, and included in the quarterly risk pack alongside CVSS, FAIR, and operational-risk indicators. NIST AI RMF describes the function; AIVSS supplies the metric.
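A worked sketch makes the arithmetic concrete. The severity weights below are the ones given in the text. The tier weights are placeholders, since the text does not list them, and multiplying the product by 100 to reach the 0–100 range is likewise an assumption about how the published formula scales. The function name `aivss` is illustrative.

```python
from decimal import Decimal

# Severity weights as stated in the text.
SEVERITY_WEIGHT = {
    "critical": Decimal("1.0"),
    "high": Decimal("0.7"),
    "medium": Decimal("0.4"),
    "low": Decimal("0.2"),
}

# ASSUMPTION: the text does not give tier weights; these are placeholders.
TIER_WEIGHT = {
    "T1": Decimal("1.0"),
    "T2": Decimal("0.8"),
    "T3": Decimal("0.6"),
    "T4": Decimal("0.4"),
}


def aivss(severity: str, tier: str) -> Decimal:
    """Severity weight times tier weight, scaled to 0-100 (scaling assumed)."""
    return SEVERITY_WEIGHT[severity] * TIER_WEIGHT[tier] * 100


# Same inputs always give the same score: no model call, no network access.
print(aivss("critical", "T1"))
print(aivss("high", "T2"))
```

Using `Decimal` rather than floating point keeps the product exact, so two teams computing the same finding cannot diverge on rounding.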

WHAT TO LOOK FOR

A single AIVSS posture score per agent, an aggregate score for the agent estate, and a trend line over the last four quarters. If the score has not moved despite known model upgrades, retrieval changes, or tool additions, the score is stale — or the methodology is not picking up real changes.
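The staleness test in the paragraph above can be written as a small check. This is a sketch under stated assumptions: the function name `is_stale`, the `tolerance` parameter, and the idea of counting known changes as a single integer are all hypothetical simplifications.

```python
def is_stale(quarterly_scores: list[float], known_changes: int,
             tolerance: float = 0.0) -> bool:
    """Flag a trend that stayed flat even though the agent changed.

    quarterly_scores: AIVSS posture scores, oldest first.
    known_changes: count of model upgrades, retrieval changes, or tool
        additions over the same period (hypothetical input).
    """
    if len(quarterly_scores) < 2 or known_changes == 0:
        return False  # nothing to compare, or no change expected
    spread = max(quarterly_scores) - min(quarterly_scores)
    return spread <= tolerance


print(is_stale([42, 42, 42, 42], known_changes=3))
print(is_stale([42, 47, 45, 40], known_changes=3))
```

A flagged agent is not necessarily at higher risk. The flag only says the score should be re-derived or the methodology questioned.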

4. Who is accountable when an agent takes an unintended action?

This is the question NIS2 Article 20 asks under the surface. When an agent emails a customer the wrong contract, executes a duplicate refund, or surfaces another tenant's data in a chat response, who answers? Not in a public-relations sense — in a regulatory sense. The board needs to be able to point at a named role, a documented escalation path, and an evidence trail showing the controls that should have prevented the action.

A useful accountability model has four layers. The agent owner (a business product manager) owns the agent's purpose. The platform team owns the runtime, the policy decision point, and the rollback path. The security team owns the red-team programme and the residual-risk register. The risk and compliance function owns the regulator-facing artefact. The CISO sits at the join of these layers and reports to the board. If any of these layers is unstaffed or undefined, that is the finding the board should record.

SR 11-7 has been read across to agentic AI by every major U.S. bank in the last year. The model-risk terminology maps cleanly: the agent is the model, the agent's tool calls are the model output, and the three-lines-of-defence structure applies. ISO/IEC 42001 codifies the same expectation in an AI management system standard. The board's job is not to invent the accountability model; it is to confirm that the model is in operation.

5. Can we produce the evidence pack a regulator will ask for?

The MAS AIRG examiner, the APRA CPS 230 reviewer, the RBI FREE-AI inspector, the EU AI Act notified body — each is going to ask for documentary evidence that the agent was tested, that the findings were triaged, that residual risk was accepted at the right level, and that controls operate at runtime. Screenshots will not survive the request. A 200-page PDF nobody can verify is worse than nothing.

A regulator-grade evidence pack is a single signed artefact: PDF/A-3 with a JSON sidecar, signed with ECDSA P-384 under a per-tenant KMS key, RFC 3161 timestamped, hash-chain anchored, and archived in immutable storage. The AgentGuardian probe run produces a SARIF 2.1.0 file, a PDF report, an HTML view, a JSON manifest, and an evidence bundle. Every finding carries three standards mappings (OWASP ASI 2026 + MITRE ATLAS v5.4.0 + CSA Agentic-RT) and an AIVSS sub-score. The bundle is reproducible: the deterministic stub mode means an external auditor can re-run the corpus and verify the score.

# what the evidence bundle contains
report.html      # human-readable findings view
report.pdf       # regulator-ready summary
report.sarif     # SARIF 2.1.0 — machine-verifiable findings
report.json      # full manifest, probe IDs, AIVSS sub-scores
evidence/        # raw transcripts, hashes, signing manifest
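For readers who want to see what a finding with three standards mappings might look like inside report.sarif, here is a minimal sketch. The top-level layout (`version`, `runs`, `tool.driver`, `results`, `ruleId`, `level`, `message`, `properties`) follows the SARIF 2.1.0 specification. Everything else is an assumption made for illustration: the property-bag key names, the probe ID, the CSA reference, the tool name, and the rule that maps an AIVSS score of 70 or more to level "error". AgentGuardian's actual output may be laid out differently.

```python
import json


def sarif_finding(probe_id, asi_category, atlas_technique, csa_ref,
                  aivss_score, message):
    """Build one SARIF result; mappings go in the free-form property bag."""
    return {
        "ruleId": probe_id,
        # ASSUMPTION: score-to-level threshold chosen for illustration.
        "level": "error" if aivss_score >= 70 else "warning",
        "message": {"text": message},
        "properties": {  # key names are hypothetical
            "owaspAsi": asi_category,
            "mitreAtlas": atlas_technique,
            "csaAgenticRt": csa_ref,
            "aivss": aivss_score,
        },
    }


def sarif_log(results):
    """Wrap results in the minimal SARIF 2.1.0 envelope."""
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "agent-guardian"}},  # name assumed
            "results": results,
        }],
    }


finding = sarif_finding(
    probe_id="ASI01-EXAMPLE-001",    # hypothetical probe ID
    asi_category="ASI01 Goal Hijack",
    atlas_technique="AML.T0051",     # illustrative ATLAS-style technique ID
    csa_ref="CSA-RT-EXAMPLE",        # placeholder reference
    aivss_score=56,
    message="Agent followed an instruction embedded in a retrieved document.",
)
log = sarif_log([finding])
print(json.dumps(log, indent=2))
```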

The board does not need to open the SARIF file. The board needs to know it exists, that it is signed, that the retention policy meets the longest applicable regulatory window, and that the methodology behind it is inspectable on GitHub under Apache-2.0.
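The hash-chain anchoring named in the evidence-pack description can be illustrated with the standard library alone. This sketch shows the general technique, not AgentGuardian's signing manifest: SHA-256 is an assumed choice of hash, the entry layout and function names are hypothetical, and ECDSA signing and RFC 3161 timestamping are deliberately left out.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def chain_entry(prev_hash: str, name: str, content: bytes) -> dict:
    """Link one artefact to the entry before it."""
    body = {
        "prev": prev_hash,
        "name": name,
        "sha256": hashlib.sha256(content).hexdigest(),
    }
    # Hash a canonical (sorted-key) JSON form so the result is reproducible.
    canonical = json.dumps(body, sort_keys=True).encode("utf-8")
    return {**body, "entry_hash": hashlib.sha256(canonical).hexdigest()}


def build_chain(artefacts: list[tuple[str, bytes]]) -> list[dict]:
    """Chain artefacts in order; each entry commits to all earlier ones."""
    chain, prev = [], GENESIS
    for name, content in artefacts:
        entry = chain_entry(prev, name, content)
        chain.append(entry)
        prev = entry["entry_hash"]
    return chain


def verify_chain(chain: list[dict], artefacts: list[tuple[str, bytes]]) -> bool:
    """Recompute every link; any edited byte changes a hash and fails."""
    return chain == build_chain(artefacts)


# Made-up file contents for illustration.
files = [("report.sarif", b'{"version": "2.1.0"}'), ("report.json", b"{}")]
chain = build_chain(files)
print(verify_chain(chain, files))
print(verify_chain(chain, [("report.sarif", b"tampered"), ("report.json", b"{}")]))
```

Because each entry hash covers the previous one, altering or reordering any artefact invalidates every later entry. That property is what lets an auditor trust the bundle as a whole rather than file by file.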

How to read an AgentGuardian posture report at the board level

A board pack does not need the SARIF. It needs three things on a single page, every quarter, for every regulated agent class. They are:

—Inventory: count of agents by tier (T1/T2/T3/T4), shadow-agent line item, owner coverage percentage.
—Posture: aggregate AIVSS score for the estate, distribution by severity (critical/high/medium/low), trend over four quarters.
—Evidence: count of signed evidence packs produced this quarter, retention status, regulator framework coverage (MAS AIRG, APRA CPS 230, RBI FREE-AI, EU AI Act, NIST AI RMF, ISO/IEC 42001).

If the inventory is incomplete, the posture is unreliable. If the posture is unreliable, the evidence pack is decorative. The three lines are read in order. A board that asks for them in that order is reading the discipline of the programme, not the comfort of its narrative.
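The ordered reading can be expressed as a gate that stops at the first line that does not hold up. The sketch below is illustrative only. The `BoardPage` fields and the specific pass criteria (every agent owned, no unclassified shadow agents, four quarters of trend data, at least one signed pack) are assumptions, not thresholds taken from any framework.

```python
from dataclasses import dataclass, field


@dataclass
class BoardPage:
    """Hypothetical one-page summary; field names are illustrative."""
    agents_total: int
    agents_with_owner: int
    unclassified_shadow_agents: int
    aivss_trend: list = field(default_factory=list)  # oldest quarter first
    signed_packs_this_quarter: int = 0


def first_weak_line(page: BoardPage) -> str:
    """Read inventory, then posture, then evidence; stop at the first gap."""
    if (page.agents_with_owner < page.agents_total
            or page.unclassified_shadow_agents > 0):
        return "inventory"
    if len(page.aivss_trend) < 4:  # ASSUMPTION: four quarters required
        return "posture"
    if page.signed_packs_this_quarter == 0:
        return "evidence"
    return "none"


page = BoardPage(agents_total=12, agents_with_owner=11,
                 unclassified_shadow_agents=0,
                 aivss_trend=[61, 58, 55, 52],
                 signed_packs_this_quarter=3)
print(first_weak_line(page))
```

In this made-up example the posture trend and evidence count look healthy, yet the gate stops at the inventory because one agent has no owner.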

What good looks like

The organisations that have moved fastest on agent risk in the last six months share three characteristics. They have a named owner for every production agent. They run the OWASP ASI 2026 probe set continuously against T1 agents and produce a SARIF artefact on every change. They publish the AIVSS trend to the board pack in the same format every quarter, alongside CVSS, FAIR, and operational-risk indicators.

None of this requires a SaaS vendor in the loop. The open-source toolkit installs with pip install agent-guardian, the scan command is agent-guardian scan ./agent.py, and the output is the same SARIF the enterprise edition signs. What the enterprise edition adds is the discovery layer, the runtime policy decision point, the signed evidence packs, and the regulator-framework mapping. What both editions share is the methodology, and that is the only thing the board has to be convinced of.

The board does not approve the agent. The board approves the discipline. The five questions are the discipline.

A CISO who can answer all five does not need the board's help. A CISO who cannot answer the first one needs a different conversation than the one currently on the agenda.
