Critical infrastructure: agent security for OT-adjacent IT.
Operations assistants and tier-1 SOC agents are now the supply chain into OT. Test them like one.
AgentGuardian runs a 14-specialist adversarial swarm against the operations assistants that bridge IT helpdesk to OT documentation, and against the tier-1 SOC triage agents standing in front of ICS operators. Probes are deterministic, the scan runs offline, and the evidence pack maps to OWASP ASI 2026, MITRE ATLAS v5.4.0, CSA Agentic-RT, and the NIS2 supervisory record.
NIS2 ART. 20 · TSA SD PIPELINE · CISA AGENTIC AI JAN 2026 · OWASP ASI01–10 · MITRE ATLAS V5.4 · CSA AGENTIC RT
The agent is now the supply chain into OT.
Electricity, water, gas, rail, and pipeline operators are not deploying agentic AI inside the OT network. They are deploying it on the corporate side — operations assistants that summarise vendor manuals, tier-1 SOC agents that triage ICS alerts, change-management copilots that read engineering tickets. Every one of those agents reaches across the IT/OT boundary by reading tool descriptions, retrieving SCADA documentation, or correlating historian extracts. When the agent is coerced, the OT network is the blast radius.
Internet-reachable MCP server instances with no authenticated transport (OX Security disclosure, May 2026).
Monterrey water-utility intrusion: LLM-assisted analysis of SCADA vendor documentation generated device-specific brute-force lists.
CISA, NSA, FBI, and international partners published guidance on agentic AI in operational technology environments.
Management bodies of essential entities carry personal liability for the adequacy of cyber-risk measures, including agent testing.
Three families of attack that actually compromise OT-adjacent agents.
The 96-probe corpus shards across all ten OWASP ASI 2026 categories, but for critical-infrastructure operations agents three families produce most of the real findings. Each is a concrete probe shipped in the OSS corpus today, not a roadmap item.
ReAct hijack for OT asset enumeration
An operations assistant with a ServiceNow CMDB tool is steered, via an embedded indirect-prompt-injection chain, to enumerate every HMI, historian, and engineering workstation reachable from corporate. The probe varies argument shape, recursion depth, and scope-token replay across tier T1 and T2 targets.
CISA, NSA, and partners flagged in January 2026 joint guidance that the ReAct loop is the most reliable foothold into OT-adjacent IT — the agent's tool call is treated as authenticated, internal, and audited as the agent identity rather than the attacker.
MCP server tool-description rug-pull
A signed MCP server the agent already trusts swaps its tool descriptions mid-session: an innocuous get_pump_pressure surface gains a hidden side-effect appendix. AgentGuardian replays the recon fingerprint, signs the rotation, and checks whether the operator's policy filter detects the description hash change before the next tool call.
OX Security disclosed approximately 200,000 MCP instances on the public internet in May 2026 with no authenticated transport. Vendor MCPs delivered with SCADA gateways are now an in-scope supply-chain channel under NIS2 Article 21(2)(d).
Multi-turn social engineering of tier-1 SOC triage
A 14-turn conversational plan masquerading as an upstream incident-response handoff coerces a tier-1 SOC agent to (1) silence an alarm correlation rule, (2) issue a maintenance-window token to an unverified caller, and (3) email a sanitized version of an ICS asset map. Tested with the OWASP-LLM specialists enabled via --include-m2-agents.
The Dec 2025–Feb 2026 Monterrey water-utility intrusion used an LLM-assisted vendor-impersonation chain to extract SCADA documentation and generate device-specific brute-force lists. Multi-turn coercion is now the dominant initial-access vector in CISA's quarterly OT incident summary.
Three steps from agent fingerprint to signed NIS2 evidence.
The full scan runs locally with no LLM key, no telemetry, and no phone-home — the only configuration an air-gapped corporate segment will tolerate. Findings are tagged across three standards and rolled into a deterministic 0–100 AIVSS posture score the management body can sign against.
Recon and fingerprint the operations agent.
Point AgentGuardian at the operations assistant or tier-1 SOC agent (HTTP, LangGraph, CrewAI, OpenAI Agents SDK, AutoGen, Google ADK, AWS Strands, or any MCP server). The coordinator runs a passive fingerprint to extract framework, tool inventory, memory class, and tier. Stub mode runs fully offline — no LLM key, no network — so the scan executes on the air-gapped corporate segment that touches the OT DMZ.
- —Agent framework and adapter detected
- —Tool inventory mapped to ASI02 surface
- —Memory class (vector, episodic, long-horizon) classified
- —Tier assigned (T1 tools+memory+PII through T4 prompt-only)
Dispatch the 14-specialist swarm against the 96-probe corpus.
Ten OWASP ASI 2026 specialists plus four OWASP-LLM specialists run concurrently under one asyncio TaskGroup against the targeted operations agent. For critical infrastructure the relevant probe shards are ASI02 (tool misuse, 8 probes), ASI04 (supply-chain MCP and plugin abuse, 8 probes), ASI06 (memory poisoning including 5 HITL-bypass at T1/T2, 13 probes), ASI07 (A2A confused deputy across SOC tiers, 8 probes), and ASI09 (multi-turn coercion, output-reflection, fabricated citations, 17 probes). Five deterministic mutators — oversize, control-chars, truncate, type-confusion, encoding — vary the payload shape so the same probe family exercises libFuzzer-style coverage.
- —Each finding tagged with OWASP ASI 2026 category
- —Each finding tagged with MITRE ATLAS v5.4.0 technique ID
- —Each finding tagged with CSA Agentic-RT category
- —Deterministic AIVSS score computed locally (0–100)
Sign the NIS2-ready evidence bundle.
AgentGuardian writes report.html, report.pdf, report.sarif (SARIF 2.1.0), report.json, and an evidence/ directory containing attack transcripts, probe seed hashes, and corpus version 2026.05. The AIVSS formula is published and deterministic, so the signed bundle reproduces byte-for-byte off the same agent state — the property NIS2 supervisory bodies and TSA Pipeline Security inspectors require to treat a private red-team result as testimony rather than marketing.
- —SARIF 2.1.0 importable into the SOC SIEM
- —PDF + HTML reports for management-personal-liability sign-off
- —Attack transcripts for incident-response tabletop reuse
- —Hash-stable findings for external counter-signature
The frameworks AgentGuardian maps every finding into.
Each finding in the signed evidence pack carries an OWASP ASI 2026 category, a MITRE ATLAS v5.4.0 technique ID, and a CSA Agentic-RT category. NIST AI RMF and EU AI Act are not mapped today — they are on the roadmap, and pretending otherwise in an audit conversation would be worse than honest positioning.
Management-body training and personal liability for cyber-risk decisions.
Supply-chain security including direct supplier and service-provider relationships.
Pipeline owner/operator cyber-security implementation plan and tabletop exercises.
Joint guidance on agentic AI in operational technology environments.
AI-enabled intrusion sets targeting energy, water, and transport.
Adversarial tactics and techniques mapped to every AgentGuardian finding.
The point of mapping is not to print a compliance sticker — it is so that the same finding shows up, unchanged, in the SARIF the SOC imports, the PDF the board signs, and the evidence pack the NIS2 supervisor inspects.
What the test looks like from inside the operations team.
The notes below are how operators in essential-entity scope describe the shift to testing agents the same way they would test a human tier-1 SOC analyst. They are not customer quotes — they are the pattern AgentGuardian was built to fit.
Our operations assistant calls four MCP servers shipped by SCADA vendors. Two rotated tool descriptions inside a single quarterly patch window. We needed a way to test that rotation as an attacker would, not as a release-notes diff reviewer.
Tier-1 SOC analysts already get social-engineering training. When the tier-1 analyst is now an agent, the training is a 14-turn corpus run by a swarm against the agent, not a slide deck. That is the shift NIS2 Article 20 requires our board to attest to.
Stub mode mattered more than any other feature. We ran the first AgentGuardian scan on the air-gapped corporate segment that bridges to the OT DMZ. No LLM key, no telemetry, no phone-home. The signed bundle was the testimony, not the network capture.
Language for the NIS2 Article 20 record.
Article 20 places personal liability on the management body for cyber-risk measures. The text below is a template board chairs and chief executives in essential-entity scope can adapt against an AgentGuardian evidence pack. The hash, corpus version, and posture score are the inputs the supervisory authority will look for.
- Corpus version 2026.05 cited
- 96 probes, 10 ASI categories
- 14 specialist attacker agents
- Deterministic AIVSS posture (0–100)
- SARIF 2.1.0 importable
- Attack transcripts retained
- SHA-256 evidence reference
- Remediation timeline anchored
I, [Chief Executive], having received the AgentGuardian evidence pack dated 2026-06-01 (corpus 2026.05, 96 probes, 14 specialists, AIVSS posture 72/100, 0 critical findings unremediated) for the operations assistant and tier-1 SOC agent in scope, attest under EU NIS2 Article 20 that the management body has reviewed the cyber-risk testing methodology, the residual risk register, and the remediation timeline for the agentic systems supporting [Essential Entity Name] operations. Evidence reference: SHA-256 0x9f4b…c0d2 · SARIF 2.1.0 Pack location: s3://[bucket]/agentguardian/2026-Q2/
Honest answers for the supervisory conversation.
Does AgentGuardian test the SCADA, DCS, or PLC layer directly?
No. AgentGuardian tests the agents that read documentation about that layer, triage alerts from that layer, and call MCP servers shipped alongside that layer. OT-network testing remains the domain of purpose-built ICS testing tools. AgentGuardian closes the IT-side agent surface that the Dec 2025 – Feb 2026 Monterrey intrusion exploited.
Can the scan run on an air-gapped corporate segment?
Yes. Stub mode requires no LLM key, no environment variables, and no network. Severity and tier weights are hash-stable, so the same agent state produces the same signed bundle on every run. Telemetry is disabled by default and there is no install-tracker phone-home.
How does this interact with TSA Pipeline Security Directives?
TSA SD Pipeline-2021-02D requires tabletop exercises and a documented cyber-security implementation plan. The AgentGuardian attack-transcript directory is reusable tabletop material because the probes, seeds, and AIVSS weights are deterministic — the same transcript reproduces under inspection, which is the property both NIS2 supervisory bodies and TSA inspectors care about.
Why is NIST AI RMF not mapped today?
The comparison matrix in the AgentGuardian documentation marks NIST AI RMF and EU AI Act as roadmap. Findings ship today with OWASP ASI 2026, MITRE ATLAS v5.4.0, and CSA Agentic-RT categories. Misrepresenting the standards mapping in a supervisory conversation is a worse outcome than narrow honest scope.
From operations assistant to
signed NIS2 evidence pack.
Book a scoping call with the AgentGuardian team to map your operations assistants, tier-1 SOC agents, and vendor MCP servers into a critical-infrastructure scan plan. Open-source download available immediately for security teams that want to start with a reproducible offline scan.