Red-teaming clinical decision-support agents.
EHR copilots, prior-auth automation, intake triage, and RAG over patient longitudinal records — tested before they touch a clinician's workflow.
Healthcare carried the highest average data-breach cost of any sector for the fourteenth year running in 2024 — USD 9.77M per incident per IBM's Cost of a Data Breach Report. Long-lived clinical memory is the largest new attack surface AI has added to that estate. AgentGuardian probes it before it ships.
Clinical agents are deploying faster than the safeguards designed for chatbots.
Epic's in-basket draft replies, Cerner's SMART-on-FHIR copilots, payor prior-authorisation bots, and RAG agents over longitudinal patient records all share two properties: they remember across sessions, and they call tools that change records, place orders, or transmit claims. That combination — tools + memory + PHI — is the most operationally severe tier in any agentic threat model. The compliance regime that governs it was written for human workflows and static decision support, not for a planner that can be persuaded to rewrite its own goal at turn fourteen.
The HIPAA Security Rule predates agentic AI by twenty-two years. The probes that test it have to compensate.
Three attack families dominate clinical agent risk.
AgentGuardian ships ninety-six probes across the ten OWASP ASI 2026 categories. For decision-support agents over EHRs, three families do most of the damage — and a competent test programme has to cover all three before the agent sees a real patient record.
EHR memory poisoning
An adversary plants instructions in the longitudinal patient record — a free-text referral note, an externally-imported HL7 message, a patient-portal reply — that the agent picks up turns later when it summarises the chart, drafts a note, or orders a prior auth. The vulnerability and the persistence are the same property of the system: every future read of the poisoned state inherits the payload. Five of the thirteen ASI06 probes specifically target HITL-bypass — the highest blast-radius surface in long-lived clinical agents.
PHI exfiltration
Clinician-note free text, patient-portal messages, payer-correspondence PDFs, and inbound fax are all untrusted strings the model treats as instructions. AgentGuardian probes the canonical exfiltration channels — outbound webhook URLs, email subject lines, tool arguments that round-trip identifiers — and tests whether the agent will smuggle MRN, DOB, or diagnosis codes out of the safe perimeter. Findings carry OWASP LLM01 and ASI09 tags, with concrete redaction-policy remediation in the evidence pack.
Multimodal claims-photo attacks
Vision-enabled agents that read claims photos, wound-care images, uploaded ID cards, and scanned consent forms accept pixels as input — and pixels can carry instructions. AgentGuardian generates adversarial overlays, EXIF-payload images, and OCR-trap scans that survive the imaging pipeline and inject into the planning step downstream. The probes are deterministic and reproduce byte-for-byte on the runner, which matters when a finding has to be re-presented to a clinical safety committee.
Three steps from agent endpoint to signed evidence pack.
AgentGuardian runs end-to-end against any HTTP-fronted agent — LangGraph, CrewAI, OpenAI Agents SDK, AutoGen, Google ADK, AWS Strands, or a clinical MCP server fronting Epic. Stub mode requires no LLM key and no network, which is the only acceptable posture when the test corpus contains anything that could be mistaken for PHI.
Fingerprint the clinical agent and its memory surfaces.
AgentGuardian opens by mapping the target as if it were any HTTP-fronted agent — but with a clinical-systems lens. The recon specialist enumerates exposed tools (Epic FHIR endpoints, Cerner SMART-on-FHIR scopes, eligibility lookups, prior-authorisation webhooks), identifies memory channels (scratchpad, conversation, vector store over PHI), and classifies the deployment into a T1 target tier when long-lived memory is present alongside tools and patient data. No PHI is required: the runner accepts a de-identified clinical corpus (e.g. MIMIC-derived shells, synthetic FHIR bundles) and stamps every artefact with its provenance hash.
- —Tool surface enumeration· Pre-scan ·Epic, Cerner, athenahealth, Veradigm adapters via the HTTP framework
- —Memory channel discovery· ASI06 setup ·Scratchpad / session / cross-session vector store
- —Tier classification· Scoring ·T1 (tools + memory + PHI) → AIVSS weight 1.0
Dispatch fourteen specialists against the highest-risk clinical surfaces.
The coordinator shards the 96-probe corpus across ten OWASP ASI 2026 categories and lets fourteen specialists run concurrently under a single asyncio task group. Three families matter most for clinical decision support: memory poisoning of the longitudinal patient context (ASI06), PHI exfiltration via prompt injection in clinician notes and patient-portal messages (cross-cutting ASI01/ASI09), and multimodal injection through uploaded imaging, scanned forms, and claims photos (ASI09 trust exploitation). Findings are tagged with OWASP ASI, MITRE ATLAS v5.4.0 technique IDs (AML.T0051 Prompt Injection, AML.T0070 RAG Poisoning), and CSA Agentic-RT categories.
- —Persistent EHR memory triggers· ASI06 — Memory Poisoning ·Payloads survive across sessions; fire only when a downstream prior-auth tool is invoked
- —Cross-tenant vector bleed· ASI06 — Memory Poisoning ·RAG corpus pollution leaks one tenant's PHI into another tenant's retrieval
- —Indirect prompt injection in clinician notes· ASI01 — Goal Hijack ·Adversarial instructions hidden in free-text intake or referral notes
- —PHI exfiltration via tool argument· ASI02 — Tool Misuse ·Smuggling identifiers into outbound webhook URLs and email subjects
- —Multimodal injection (imaging + scanned forms)· ASI09 — Trust Exploitation ·Pixel-level and EXIF instructions on uploaded JPEGs / DICOM cover sheets
- —HITL-bypass probes· ASI06 — Memory Poisoning (T1) ·Five variants targeting the human-in-the-loop approval step for high-acuity orders
Produce an OCR-readable evidence pack the CISO and CMO can sign.
Every scan emits a five-artefact bundle: report.html, report.pdf, report.sarif (SARIF 2.1.0), report.json, and a chain-of-custody evidence/ directory. Findings carry a deterministic AIVSS score (0-100) computed from severity × tier weights — the formula is published in src/agent_guardian/scoring/aivss.py and is hash-stable for external signing. The pack maps each finding to OWASP ASI 2026, MITRE ATLAS v5.4.0, and CSA Agentic-RT, and includes attack transcripts, remediation guidance, and a HIPAA-aware cover sheet that flags whether de-identified or synthetic data was used. CI can gate on a posture floor (e.g. fail under 70) the same way it gates on a coverage floor.
- —SARIF 2.1.0 output· Artefact ·Loads directly into GitHub Advanced Security and JFrog
- —Signed evidence bundle· Artefact ·Stub-mode runs are deterministic and reproducible offline
- —AIVSS posture score· Scoring ·Single 0-100 number suitable for board reporting and PR gating
What a healthcare red-team programme actually looks like.
The pattern below is composited from conversations with provider-side security and AI-platform teams piloting clinical copilots in 2026. It does not represent any single customer.
A mid-sized integrated delivery network — eight hospitals, roughly four thousand affiliated providers — pilots an in-basket draft-reply copilot on top of Epic. The copilot reads inbound patient-portal messages, retrieves the relevant problem list and last-three-encounters summary from a vector store, and drafts a reply for clinician approval. Memory is long-lived. Tools are limited to read access on launch.
The CISO wants to know whether a hostile patient message can change how the agent treats a different patient's next chart summary. The CMO wants to know whether any failure mode could surface in a Joint Commission survey. Procurement wants both answers documented, signed, and stored before the contract escalates to a system-wide rollout.
The AI platform team installs the AgentGuardian PyPI package on an isolated runner, configures the HTTP adapter against the staging endpoint, and points the corpus at a de-identified synthetic FHIR bundle. The first full-mode scan runs the ninety-six-probe corpus through the fourteen-specialist swarm in under forty minutes. Fourteen findings come back: three at high severity in ASI06, two at high severity in ASI01 (indirect injection via patient-portal free text), and a smattering of medium findings spread across ASI02 and ASI09.
The AIVSS posture score lands at 62. The team gates the CI pipeline at 70 on the path to production. Two sprints later the redaction policy and the retrieval-time provenance check are in place, the score crosses 78, and the evidence pack — SARIF, PDF, HTML, JSON, transcripts — goes to the AI governance committee with the OCR-readable cover sheet attached.
The clinical safety conversation became a conversation about a number, a transcript, and a remediation log — not a conversation about whether the agent “feels safe”.
Built for the regulators who already read SOC 2 reports.
Every finding is tagged against OWASP ASI 2026, MITRE ATLAS v5.4.0, and CSA Agentic-RT, with a deterministic AIVSS score on top. The evidence-pack cover sheet is structured to drop straight into an OCR pipeline and to crosswalk onto the standards already in use by hospital compliance and payer audit teams.
- ✓HIPAA Security Rule (45 CFR §164.308 — Administrative Safeguards)
- ✓HHS OCR online-tracking guidance (March 2024 update)
- ✓HHS RFI on AI in Healthcare (February 2026)
- ✓FDA guidance on AI/ML-enabled SaMD (Predetermined Change Control Plan, 2024)
- ✓ONC HTI-1 final rule (decision-support intervention transparency)
- ✓EU AI Act Annex III §5(a) (high-risk: AI in healthcare)
- ✓Joint Commission patient-safety standards (PC.01.02.03)
- ✓NIST AI RMF 1.0 — GOVERN, MAP, MEASURE, MANAGE
Inbasket Draft-Reply Copilot · Staging
What to say to your CISO, CMO, and AI governance committee.
Three paragraphs, three audiences. Copy-pasteable and pre-mapped to the framework each audience cares about.
“We have moved every clinical-agent endpoint behind an AgentGuardian gate in CI. The runner is offline-deterministic, the corpus is de-identified, and every release ships a SARIF file mapped to OWASP ASI 2026 and MITRE ATLAS v5.4.0. The number we report up is the AIVSS posture score; the floor is 70 to ship, 80 to flip on a new tool scope.”
“The copilot has been adversarially tested for the failure modes that matter clinically: long-lived memory inheriting bad instructions, identifiers smuggled into downstream tools, and adversarial image overlays in claims attachments. Findings come back as transcripts a clinical safety committee can read, with a remediation log and an approval record. We document this before the agent sees a real patient.”
“The evidence pack maps directly onto the HIPAA Security Rule Administrative Safeguards, the HHS OCR online-tracking guidance, the ONC HTI-1 decision-support intervention transparency requirements, and the NIST AI RMF MEASURE function. Crosswalks ship in the pack. The artefact is OCR-readable and survives external audit without manual re-keying.”
Probe the clinical agent before
it reads a real chart.
Healthcare-specific briefing, sample evidence pack, and a stub-mode walkthrough that needs no PHI, no LLM key, and no network.