Use Case · Critical Infrastructure

Critical infrastructure: agent security for OT-adjacent IT.

Operations assistants and tier-1 SOC agents are now the supply chain into OT. Test them like one.

AgentGuardian runs a 14-specialist adversarial swarm against the operations assistants that bridge IT helpdesk to OT documentation, and against the tier-1 SOC triage agents standing in front of ICS operators. Probes are deterministic, the scan runs offline, and the evidence pack maps to OWASP ASI 2026, MITRE ATLAS v5.4.0, CSA Agentic-RT, and the NIS2 supervisory record.

Book a Scoping Call Try Open Source

NIS2 ART. 20 · TSA SD PIPELINE · CISA AGENTIC AI JAN 2026 · OWASP ASI01–10 · MITRE ATLAS V5.4 · CSA AGENTIC RT

Industry Context

The agent is now the supply chain into OT.

Electricity, water, gas, rail, and pipeline operators are not deploying agentic AI inside the OT network. They are deploying it on the corporate side — operations assistants that summarise vendor manuals, tier-1 SOC agents that triage ICS alerts, change-management copilots that read engineering tickets. Every one of those agents reaches across the IT/OT boundary by reading tool descriptions, retrieving SCADA documentation, or correlating historian extracts. When the agent is coerced, the OT network is the blast radius.

MCP exposure

~200,000

Internet-reachable MCP server instances with no authenticated transport (OX Security disclosure, May 2026).

OT incident drift

Dec 2025 – Feb 2026

Monterrey water-utility intrusion: LLM-assisted analysis of SCADA vendor documentation generated device-specific brute-force lists.

Joint guidance

Jan 2026

CISA, NSA, FBI, and international partners published guidance on agentic AI in operational technology environments.

Regulator weight

NIS2 Art. 20

Management bodies of essential entities carry personal liability for the adequacy of cyber-risk measures, including agent testing.

Top probes that matter

Three families of attack that actually compromise OT-adjacent agents.

The 96-probe corpus shards across all ten OWASP ASI 2026 categories, but for critical-infrastructure operations agents three families produce most of the real findings. Each is a concrete probe shipped in the OSS corpus today, not a roadmap item.

ASI02 · TOOL MISUSE

ReAct hijack for OT asset enumeration

What the probe does

An operations assistant with a ServiceNow CMDB tool is steered, via an embedded indirect-prompt-injection chain, to enumerate every HMI, historian, and engineering workstation reachable from corporate. The probe varies argument shape, recursion depth, and scope-token replay across tier T1 and T2 targets.

Why it matters

CISA, NSA, and partners flagged in January 2026 joint guidance that the ReAct loop is the most reliable foothold into OT-adjacent IT — the agent's tool call is treated as authenticated, internal, and audited as the agent identity rather than the attacker.

ASI04 · SUPPLY CHAIN

MCP server tool-description rug-pull

What the probe does

A signed MCP server the agent already trusts swaps its tool descriptions mid-session: an innocuous get_pump_pressure surface gains a hidden side-effect appendix. AgentGuardian replays the recon fingerprint, signs the rotation, and checks whether the operator's policy filter detects the description hash change before the next tool call.

Why it matters

OX Security disclosed approximately 200,000 MCP instances on the public internet in May 2026 with no authenticated transport. Vendor MCPs delivered with SCADA gateways are now an in-scope supply-chain channel under NIS2 Article 21(2)(d).

ASI09 · TRUST EXPLOITATION

Multi-turn social engineering of tier-1 SOC triage

What the probe does

A 14-turn conversational plan masquerading as an upstream incident-response handoff coerces a tier-1 SOC agent to (1) silence an alarm correlation rule, (2) issue a maintenance-window token to an unverified caller, and (3) email a sanitized version of an ICS asset map. Tested with the OWASP-LLM specialists enabled via --include-m2-agents.

Why it matters

The Dec 2025–Feb 2026 Monterrey water-utility intrusion used an LLM-assisted vendor-impersonation chain to extract SCADA documentation and generate device-specific brute-force lists. Multi-turn coercion is now the dominant initial-access vector in CISA's quarterly OT incident summary.

How AgentGuardian fits

Three steps from agent fingerprint to signed NIS2 evidence.

The full scan runs locally with no LLM key, no telemetry, and no phone-home — the only configuration an air-gapped corporate segment will tolerate. Findings are tagged across three standards and rolled into a deterministic 0–100 AIVSS posture score the management body can sign against.

STEP 01

Recon and fingerprint the operations agent.

Point AgentGuardian at the operations assistant or tier-1 SOC agent (HTTP, LangGraph, CrewAI, OpenAI Agents SDK, AutoGen, Google ADK, AWS Strands, or any MCP server). The coordinator runs a passive fingerprint to extract framework, tool inventory, memory class, and tier. Stub mode runs fully offline — no LLM key, no network — so the scan executes on the air-gapped corporate segment that touches the OT DMZ.

—Agent framework and adapter detected
—Tool inventory mapped to ASI02 surface
—Memory class (vector, episodic, long-horizon) classified
—Tier assigned (T1 tools+memory+PII through T4 prompt-only)

STEP 02

Dispatch the 14-specialist swarm against the 96-probe corpus.

Ten OWASP ASI 2026 specialists plus four OWASP-LLM specialists run concurrently under one asyncio TaskGroup against the targeted operations agent. For critical infrastructure the relevant probe shards are ASI02 (tool misuse, 8 probes), ASI04 (supply-chain MCP and plugin abuse, 8 probes), ASI06 (memory poisoning including 5 HITL-bypass at T1/T2, 13 probes), ASI07 (A2A confused deputy across SOC tiers, 8 probes), and ASI09 (multi-turn coercion, output-reflection, fabricated citations, 17 probes). Five deterministic mutators — oversize, control-chars, truncate, type-confusion, encoding — vary the payload shape so the same probe family exercises libFuzzer-style coverage.

—Each finding tagged with OWASP ASI 2026 category
—Each finding tagged with MITRE ATLAS v5.4.0 technique ID
—Each finding tagged with CSA Agentic-RT category
—Deterministic AIVSS score computed locally (0–100)

STEP 03

Sign the NIS2-ready evidence bundle.

AgentGuardian writes report.html, report.pdf, report.sarif (SARIF 2.1.0), report.json, and an evidence/ directory containing attack transcripts, probe seed hashes, and corpus version 2026.05. The AIVSS formula is published and deterministic, so the signed bundle reproduces byte-for-byte off the same agent state — the property NIS2 supervisory bodies and TSA Pipeline Security inspectors require to treat a private red-team result as testimony rather than marketing.

—SARIF 2.1.0 importable into the SOC SIEM
—PDF + HTML reports for management-personal-liability sign-off
—Attack transcripts for incident-response tabletop reuse
—Hash-stable findings for external counter-signature

Regulatory landscape

The frameworks AgentGuardian maps every finding into.

Each finding in the signed evidence pack carries an OWASP ASI 2026 category, a MITRE ATLAS v5.4.0 technique ID, and a CSA Agentic-RT category. NIST AI RMF and EU AI Act are not mapped today — they are on the roadmap, and pretending otherwise in an audit conversation would be worse than honest positioning.

EU NIS2 · Art. 20

Management-body training and personal liability for cyber-risk decisions.

EU NIS2 · Art. 21(2)(d)

Supply-chain security including direct supplier and service-provider relationships.

TSA SD Pipeline-2021-02D

Pipeline owner/operator cyber-security implementation plan and tabletop exercises.

CISA / NSA / FBI · Jan 2026

Joint guidance on agentic AI in operational technology environments.

ENISA · 2026 threat landscape

AI-enabled intrusion sets targeting energy, water, and transport.

MITRE ATLAS v5.4.0

Adversarial tactics and techniques mapped to every AgentGuardian finding.

The point of mapping is not to print a compliance sticker — it is so that the same finding shows up, unchanged, in the SARIF the SOC imports, the PDF the board signs, and the evidence pack the NIS2 supervisor inspects.

Operator voice

What the test looks like from inside the operations team.

The notes below are how operators in essential-entity scope describe the shift to testing agents the same way they would test a human tier-1 SOC analyst. They are not customer quotes — they are the pattern AgentGuardian was built to fit.

Our operations assistant calls four MCP servers shipped by SCADA vendors. Two rotated tool descriptions inside a single quarterly patch window. We needed a way to test that rotation as an attacker would, not as a release-notes diff reviewer.

Tier-1 SOC analysts already get social-engineering training. When the tier-1 analyst is now an agent, the training is a 14-turn corpus run by a swarm against the agent, not a slide deck. That is the shift NIS2 Article 20 requires our board to attest to.

Stub mode mattered more than any other feature. We ran the first AgentGuardian scan on the air-gapped corporate segment that bridges to the OT DMZ. No LLM key, no telemetry, no phone-home. The signed bundle was the testimony, not the network capture.

Sample management attestation

Language for the NIS2 Article 20 record.

Article 20 places personal liability on the management body for cyber-risk measures. The text below is a template board chairs and chief executives in essential-entity scope can adapt against an AgentGuardian evidence pack. The hash, corpus version, and posture score are the inputs the supervisory authority will look for.

Corpus version 2026.05 cited
96 probes, 10 ASI categories
14 specialist attacker agents
Deterministic AIVSS posture (0–100)
SARIF 2.1.0 importable
Attack transcripts retained
SHA-256 evidence reference
Remediation timeline anchored

I, [Chief Executive], having received the AgentGuardian
evidence pack dated 2026-06-01 (corpus 2026.05, 96 probes,
14 specialists, AIVSS posture 72/100, 0 critical findings
unremediated) for the operations assistant and tier-1 SOC
agent in scope, attest under EU NIS2 Article 20 that the
management body has reviewed the cyber-risk testing
methodology, the residual risk register, and the
remediation timeline for the agentic systems supporting
[Essential Entity Name] operations.

Evidence reference: SHA-256 0x9f4b…c0d2  ·  SARIF 2.1.0
Pack location: s3://[bucket]/agentguardian/2026-Q2/

Risk reduction

Honest answers for the supervisory conversation.

Does AgentGuardian test the SCADA, DCS, or PLC layer directly?

No. AgentGuardian tests the agents that read documentation about that layer, triage alerts from that layer, and call MCP servers shipped alongside that layer. OT-network testing remains the domain of purpose-built ICS testing tools. AgentGuardian closes the IT-side agent surface that the Dec 2025 – Feb 2026 Monterrey intrusion exploited.

Can the scan run on an air-gapped corporate segment?

Yes. Stub mode requires no LLM key, no environment variables, and no network. Severity and tier weights are hash-stable, so the same agent state produces the same signed bundle on every run. Telemetry is disabled by default and there is no install-tracker phone-home.

How does this interact with TSA Pipeline Security Directives?

TSA SD Pipeline-2021-02D requires tabletop exercises and a documented cyber-security implementation plan. The AgentGuardian attack-transcript directory is reusable tabletop material because the probes, seeds, and AIVSS weights are deterministic — the same transcript reproduces under inspection, which is the property both NIS2 supervisory bodies and TSA inspectors care about.

Why is NIST AI RMF not mapped today?

The comparison matrix in the AgentGuardian documentation marks NIST AI RMF and EU AI Act as roadmap. Findings ship today with OWASP ASI 2026, MITRE ATLAS v5.4.0, and CSA Agentic-RT categories. Misrepresenting the standards mapping in a supervisory conversation is a worse outcome than narrow honest scope.

From operations assistant to
signed NIS2 evidence pack.

Book a scoping call with the AgentGuardian team to map your operations assistants, tier-1 SOC agents, and vendor MCP servers into a critical-infrastructure scan plan. Open-source download available immediately for security teams that want to start with a reproducible offline scan.

Book a Scoping Call Try Open Source

Critical infrastructure: agent security for OT-adjacent IT.

The agent is now the supply chain into OT.

Three families of attack that actually compromise OT-adjacent agents.

ReAct hijack for OT asset enumeration

MCP server tool-description rug-pull

Multi-turn social engineering of tier-1 SOC triage

Three steps from agent fingerprint to signed NIS2 evidence.

Recon and fingerprint the operations agent.

Dispatch the 14-specialist swarm against the 96-probe corpus.

Sign the NIS2-ready evidence bundle.

The frameworks AgentGuardian maps every finding into.

What the test looks like from inside the operations team.

Language for the NIS2 Article 20 record.

Honest answers for the supervisory conversation.

From operations assistant to signed NIS2 evidence pack.

From operations assistant to
signed NIS2 evidence pack.