Why runtime policy matters for AI agents

Suppose you red-team an agent thoroughly enough to find every category of issue that testing can surface. Then you ship it. Three weeks later, someone changes the system prompt, a vendor swaps the underlying model, or retrieval starts pulling from a new corpus, and the assumptions your red-team programme tested under no longer hold.

Testing tells you what the agent can do. Runtime policy tells you what the agent is allowed to do — every single call, every single time.

What runtime policy actually does

A runtime policy decision point (PDP) sits between the agent and its side effects. For every tool call, retrieval call, model egress, agent-to-agent (A2A) message, and approval gate, the PDP authorises the action, denies it, or marks it for human approval. Every decision is logged.

The decisions a useful PDP makes:

—Tool allowed at all? Which agents may invoke send_email?
—Tool allowed with these arguments? send_email is fine; send_email(to=*@external.com) needs gating.
—Data access allowed? The agent can query orders; it cannot query orders for tenant_id != session.tenant_id.
—Model egress allowed? Anthropic Claude is fine; gemini-pro-via-OpenRouter requires approval.
—A2A interaction trusted? Agent A may delegate to agent B; agent B may not loop back into agent A.
—High-impact action — escalate? Refunds above $500 always require human approval.
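The decisions above can be sketched as a single function from a request to one of three outcomes. This is a minimal illustration in Python, not AgentGuardian's implementation: the `Request` fields, the `ALLOWED_TOOLS` table, the `example.com` internal domain, and every tool name other than `send_email` are assumptions chosen to mirror the list.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"


@dataclass(frozen=True)
class Request:
    agent: str
    tool: str
    args: dict = field(default_factory=dict)
    session_tenant: str = ""


# Hypothetical allowlist: which agents may invoke which tools.
ALLOWED_TOOLS = {
    "support-agent": {"send_email", "query_orders", "refund"},
}
INTERNAL_DOMAIN = "example.com"  # assumed internal mail domain
REFUND_APPROVAL_LIMIT = 500      # dollars, taken from the list above


def decide(req: Request) -> Decision:
    # Tool allowed at all? Anything not listed for this agent is denied.
    if req.tool not in ALLOWED_TOOLS.get(req.agent, set()):
        return Decision.DENY
    # Tool allowed with these arguments? External recipients are gated.
    if req.tool == "send_email":
        domain = req.args.get("to", "").rpartition("@")[2]
        if domain != INTERNAL_DOMAIN:
            return Decision.REQUIRE_APPROVAL
    # Data access allowed? Queries must stay inside the session's tenant.
    if req.tool == "query_orders":
        if req.args.get("tenant_id") != req.session_tenant:
            return Decision.DENY
    # High-impact action? Refunds above the limit always go to a human.
    if req.tool == "refund" and req.args.get("amount", 0) > REFUND_APPROVAL_LIMIT:
        return Decision.REQUIRE_APPROVAL
    return Decision.ALLOW
```

Note that the function takes the requested action as input and nothing about the model that produced it, which is what makes the same rules hold across model swaps.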

Why testing alone isn't enough

Three real-world reasons:

Models change underneath you

You red-team against claude-3-5-sonnet. Two months later your vendor auto-upgrades to claude-3-7-sonnet. The behaviour you measured is now stale. A runtime PDP is invariant to model swaps — the rules are about actions, not about which model produced them.

The attack surface drifts

New tools added. New retrieval sources. New A2A integrations. Each one is a new surface that wasn't tested. The PDP gives you a default-deny floor: even if a new tool ships without a red-team pass, the PDP refuses it until someone explicitly allows it.
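As a rough sketch of that default-deny floor, assuming a hypothetical `ToolPolicy` registry and invented tool names, a newly shipped tool stays refused until someone takes an explicit step to allow it:

```python
class ToolPolicy:
    """Default-deny registry: a tool is refused until someone allows it."""

    def __init__(self) -> None:
        self._allowed: set[str] = set()

    def allow(self, tool: str) -> None:
        # An explicit, reviewable step, for example after a red-team pass.
        self._allowed.add(tool)

    def is_allowed(self, tool: str) -> bool:
        # Unknown tools fall through to False; nothing is allowed by default.
        return tool in self._allowed


policy = ToolPolicy()
policy.allow("query_orders")

before = policy.is_allowed("export_csv")  # newly shipped, not yet reviewed
policy.allow("export_csv")
after = policy.is_allowed("export_csv")
```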

Some actions need a human in the loop, always

No amount of red-teaming makes "approve a $50,000 wire transfer" a fully-automatable action. Some decisions require an approver. The PDP is where that gate lives.
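One way to picture that gate, as a sketch rather than a description of any real product: the action is parked when submitted and only runs once a named approver releases it. `ApprovalGate`, `PendingAction`, and `wire_transfer` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PendingAction:
    description: str
    run: Callable[[], str]
    approved_by: Optional[str] = None


class ApprovalGate:
    def __init__(self) -> None:
        self._pending: dict[int, PendingAction] = {}
        self._next_id = 1

    def submit(self, description: str, run: Callable[[], str]) -> int:
        # The action is parked here, not executed.
        ticket = self._next_id
        self._next_id += 1
        self._pending[ticket] = PendingAction(description, run)
        return ticket

    def approve(self, ticket: int, approver: str) -> str:
        # Only an approver releases the action; an unknown or already
        # used ticket raises KeyError, so nothing runs twice.
        action = self._pending.pop(ticket)
        action.approved_by = approver
        return action.run()


executed: list[str] = []


def wire_transfer() -> str:
    executed.append("wire:50000")
    return "sent"


gate = ApprovalGate()
ticket = gate.submit("wire transfer of $50,000", wire_transfer)
# At this point nothing has run and `executed` is still empty.
result = gate.approve(ticket, approver="alice@example.com")
```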

What a good PDP looks like

The bar we hold AgentGuardian Enterprise to:

—Formally verifiable. We use Cedar 4.5, the open-source policy language behind Amazon Verified Permissions. Cedar policies are formally analysable: you can prove non-trivial properties (e.g. 'no policy allows agent-X to call send_email with to=external'). Policy logic scattered through hand-written application code can't be checked that way.
—Low-latency. ≤10ms p99 added latency. If the PDP is slow, teams will route around it.
—Default-deny, fail-closed. Unknown action → denied. Network partition between agent and PDP → denied. The wrong failure mode is 'agent acts because the PDP is unavailable.'
—Shadow → canary → enforce promotion. New policies go live in shadow first (decision recorded, not enforced), then canary (a percentage of traffic enforced), then enforce. This is how you safely deploy policy without taking the platform down.
—Per-tenant. Each tenant gets its own policy set, its own KMS key for signing decisions, its own audit ledger. Multi-tenant systems where one tenant's policy can affect another's decisions are not safe to ship.
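Two of the properties above, fail-closed behaviour and staged promotion, can be sketched together. The following is an illustrative Python sketch under stated assumptions, not AgentGuardian's implementation: `authorise`, `Mode`, `in_canary`, and the `evaluate` callback standing in for the PDP call are hypothetical names, and the mode is applied to the whole decision rather than tracked per policy.

```python
import hashlib
from enum import Enum
from typing import Callable


class Mode(Enum):
    SHADOW = "shadow"    # record the decision, never block
    CANARY = "canary"    # block for a fixed share of traffic
    ENFORCE = "enforce"  # block everywhere


def in_canary(request_id: str, percent: int) -> bool:
    # Stable bucketing: the same request id always lands in the same bucket.
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return bucket < percent


def authorise(
    request_id: str,
    evaluate: Callable[[], bool],  # hypothetical PDP call; True means allow
    mode: Mode,
    canary_percent: int = 10,
    log: Callable[[str], None] = print,
) -> bool:
    try:
        allowed = evaluate()
    except Exception as exc:
        # Fail closed: if the PDP cannot answer, the action does not run.
        log(f"{request_id} deny pdp_unavailable {type(exc).__name__}")
        return False
    enforced = mode is Mode.ENFORCE or (
        mode is Mode.CANARY and in_canary(request_id, canary_percent)
    )
    verdict = "allow" if allowed else "deny"
    log(f"{request_id} {verdict} mode={mode.value} enforced={enforced}")
    # In shadow, or outside the canary bucket, a deny is recorded but not applied.
    return allowed or not enforced
```

Hashing the request id, rather than drawing a random number per call, keeps a given request on the same side of the canary split if it is retried.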

The closed loop

The most underrated feature of a good runtime PDP is the loop between red-team and policy. When a red-team finding lands — say, "the agent can be coerced into emailing another customer's order" — the finding becomes a Cedar rule:

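// Assumes customer-support agents are members of a hypothetical
// AgentGroup::"customer-support"; Cedar entity IDs do not support wildcards.
forbid (
  principal is Agent in AgentGroup::"customer-support",
  action == Action::"send_email",
  resource is Email
) when {
  resource.to.domain != principal.tenant.domain
};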

The rule deploys in shadow first to measure noise; if it stays quiet, it is promoted through canary to enforce. The same finding that previously produced a PDF report now produces an enforced rule. Every block thereafter goes to the SIEM with a trail back to the red-team finding that motivated it.
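The trail back to the finding can be as simple as carrying a finding reference on each rule and copying it into every block event. A minimal sketch, assuming a hypothetical `Rule` record, an invented finding ID format, and JSON as the SIEM ingest format:

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    finding_id: str  # hypothetical red-team finding reference
    tool: str


def block_event(rule: Rule, agent: str, request_id: str) -> str:
    """Build the JSON record a SIEM would ingest for one blocked call."""
    return json.dumps(
        {
            "event": "policy_block",
            "request_id": request_id,
            "agent": agent,
            "tool": rule.tool,
            "rule_id": rule.rule_id,
            "finding_id": rule.finding_id,  # the trail back to the finding
        },
        sort_keys=True,
    )


rule = Rule(rule_id="no-cross-tenant-email", finding_id="RT-0042", tool="send_email")
event = block_event(rule, agent="customer-support-1", request_id="req-123")
```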

Why this matters for compliance

Regulators are increasingly explicit about operational controls, not just policy documents. APRA's 30-April-2026 letter — the most prescriptive AI letter in APRA's history — asks specifically whether the firm has "automated controls that constrain agent actions in production." MAS AIRG asks for evidence of "monitoring and constraint mechanisms applied at the point of decision." NIST AI RMF's MANAGE function asks for documented controls applied during deployment.

A runtime PDP is the answer to those questions. Without one, the answer is "we tested it, then we trusted it" — which is not an answer regulators are comfortable with anymore.

Test what the agent can do. Then enforce what it's allowed to do. The two are separate problems and they need separate layers.
