In its FutureScape 2026 Worldwide AI and Generative AI Predictions, IDC forecasts that by 2030, up to twenty percent of Global 1000 organisations will face lawsuits, fines, and CIO dismissals tied to high-profile disruptions caused by poor AI agent governance. The same report projects that by 2027, agent usage inside the Global 2000 will rise tenfold, while the token and API-call load on those agents climbs by a factor of one thousand (IDC, 2025). The build-vs-buy question for adversarial agent testing — once a curiosity for AI labs — is no longer academic. The CISO who has not picked a path by the end of this fiscal year is choosing the default, which is no path at all.
Three options exist. Stand up an in-house red team. Adopt an open-source toolchain — PyRIT, Garak, Promptfoo, AgentDojo, and the small constellation around them. License a commercial agent-security platform. Each option has a defensible use case. None of them, in isolation, holds up against the volume and velocity of attack research that has emerged in the last eighteen months. Most mature programmes end up running elements of all three. This post lays out where each path is the right call, where each one quietly fails, and how to sequence them across a ninety-day window if your organisation is starting from zero.
Why the question is suddenly urgent
Gartner placed AI Trust, Risk and Security Management (AI TRiSM) alongside AI agents and AI-ready data at the top of the August 2025 Hype Cycle for Artificial Intelligence, identifying it as a peak-of-inflated-expectations technology that addresses governance, trustworthiness, safety, reliability, security and privacy across enterprise AI. Gartner's argument is direct: “AI brings new trust, risk and security management challenges that conventional controls don't address.” (Gartner, August 2025). Generative AI itself has slid into the Trough of Disillusionment in the same cycle — not because the technology failed, but because enterprises have realised that value depends on readiness, trust and integration rather than the model itself.
That governance pressure now has case law behind it. In Moffatt v. Air Canada, the British Columbia Civil Resolution Tribunal held the airline liable for negligent misrepresentation after its chatbot promised a retroactive bereavement-fare refund the policy did not allow. The tribunal explicitly rejected the argument that the chatbot was a “separate legal entity,” ruling that Air Canada was responsible for any information appearing on its commercial website (ABA, February 2024). The award was small. The principle was not. Eighteen months later, Aim Security disclosed EchoLeak (CVE-2025-32711), the first publicly documented zero-click prompt-injection exploit against a production LLM system — Microsoft 365 Copilot — in which a single inbound email silently exfiltrated confidential data without user interaction. Microsoft shipped emergency patches. Indirect prompt injection had moved from academic concern to weaponised supply-chain attack.
Boards no longer ask whether the organisation has an AI policy. They ask whether someone has actually tried to break the agents, and where the evidence is.
What Microsoft's “100 generative AI products” paper actually says
The most influential single document in the agent red-team field over the last year is the Microsoft AI Red Team's January 2025 paper, Lessons From Red Teaming 100 Generative AI Products (arXiv:2501.07238). It distils what one of the most active AIRTs in the industry learned across more than a hundred adversarial assessments. Eight lessons are stated explicitly. Three of them reshape the build-vs-buy decision (Microsoft, 2025).
First: “AI red teaming is not safety benchmarking.” A leaderboard score on a public eval is not a risk assessment. The benchmarks measure the model's behaviour on a static distribution. Red teaming measures whether a determined adversary can make the deployed system act outside its policy envelope. The two activities use the same word and produce different artefacts.
Second: “You don't have to compute gradients to break an AI system.” The popular image of an AI attack involves white-box gradient methods like Zou et al.'s GCG attack, which generates universal adversarial suffixes that transfer to closed models (arXiv:2307.15043). In practice, the breaks that hurt production agents are plain English — a malicious document, a poisoned tool description, an instruction hidden in a calendar invite. The skill set required to run an effective red team looks much more like a senior threat-modeller's than a machine-learning researcher's.
Third — and this is the one procurement committees misread — “the human element is crucial.” Microsoft is explicit that automation extends the reach of a red team but does not replace it. PyRIT, the open framework Microsoft itself released, is positioned as augmentation, not substitution. The paper's final lesson is sober: “the work of securing AI systems will never be complete.” Any vendor that promises a one-time scan and a clean bill of health is selling something other than security.
The open-source option: PyRIT, Garak, Promptfoo, AgentDojo
The open-source tier of the agent red-team market matured fast. Four projects matter; each occupies a different niche, and each is rigorous about what it does and does not cover.
PyRIT (Python Risk Identification Toolkit), released by Microsoft in February 2024, is the framework Microsoft itself used across the hundred assessments referenced above. It is adaptive — the attack strategy shifts in response to the target's replies — supports thousands of malicious prompts per harm category, and ships a scoring engine. Microsoft explicitly frames PyRIT as “an automation framework to red team generative AI systems,” not as a managed service (Microsoft Security Blog).
Garak, originally written by Prof. Leon Derczynski at ITU Copenhagen in spring 2023 and transferred to NVIDIA in November 2024, is the closest thing the field has to nmap or Metasploit for LLMs. It runs a battery of probes, detectors flag hits, and the result is a JSONL report with per-probe pass and fail rates. A 2024 Fujitsu Research review independently ranked it the leading LLM vulnerability scanner of its cohort (garak.ai).
Promptfoo is the operational layer most engineering teams reach for. MIT licensed, used by more than three hundred thousand developers and a hundred and twenty-seven of the Fortune 500, it tests for prompt injection, jailbreaks and over fifty vulnerability categories aligned to the OWASP Top 10 for LLM Applications and the NIST AI RMF. Its declarative configs slot into CI pipelines without ceremony (promptfoo.dev).
AgentDojo (Debenedetti, Zhang et al., ETH Zurich SPY Lab, NeurIPS 2024) is the closest the research community has to a reproducible methodology for measuring LLM-agent security. Its dynamic environment ships seventy tools, ninety-seven realistic user tasks, and twenty-seven injection targets simulating email, banking and communication platforms. Its findings — current LLMs solve under sixty-six percent of benign tasks, more capable models are often easier to attack, simple tool isolation is the most effective single defence — have been used by the US and UK AI Safety Institutes to evaluate frontier models (arXiv:2406.13352).
What none of these tools does, on its own, is produce a regulator-grade evidence pack, run continuously against production endpoints, normalise findings into a single posture score, or map results consistently across OWASP LLM Top 10, MITRE ATLAS v5.1.0 (now sixteen tactics, eighty-four techniques, fifty-six sub-techniques, thirty-two mitigations) and the CSA Agentic AI Red Teaming Guide's twelve threat categories. That is not a criticism — it is a category boundary. An OSS scanner is a measurement instrument. An assurance programme is a process.
The commercial option: when continuous coverage and evidence beat flexibility
A commercial platform is the right choice when three conditions are true at the same time: continuous coverage matters more than the ability to fork the corpus; an audit-grade evidence pack must exist on demand; and the operational headcount required to run open-source tooling continuously is either unavailable or not the best use of senior security engineers. The Forrester Wave for Privacy Management Software, Q4 2025, captures the structural shift — privacy platforms must now bring together privacy, data governance and AI governance in a unified way (Forrester, Q4 2025). The same logic applies on the offensive side. The buyer is no longer purchasing a scanner; they are purchasing a continuous control with a contract behind it.
The trade-off is the one every closed product carries. The probe corpus is opaque. The scoring formula is opaque. The roadmap is the vendor's, not yours. When the corpus needs to be extended to cover a non-public internal tool — the dispatch grid, the benefits engine, the trading risk system — the extension is a feature request, not a pull request. The mitigation is structural: insist that the vendor publishes its framework mapping (OWASP, ATLAS, NIST AI RMF, ISO/IEC 42001), publishes a per-finding score formula that can be recomputed offline, and supports a documented offline export of the evidence pack.
The hybrid pattern most mature programmes converge on
The pattern that holds up across the largest deployed agent programmes — the ones running tens of thousands of agentic calls per day under regulated workloads — is layered. Open-source tools sit in the inner loop, on engineer laptops and in pull-request CI. A commercial or managed platform sits in the outer loop, scanning production agents on a continuous schedule and producing the artefacts auditors and regulators consume. An in-house team — often small, sometimes a single principal-level security engineer with rotating contributors — owns the custom probes for internal tools and the judgement calls the automation cannot make. That last point matters; the CSA Agentic AI Red Teaming Guide is explicit that automated probes alone do not cover threat categories like agent untraceability or multi-agent exploitation (CSA, May 2025).
This is the same shape the SAST/DAST market settled into over a decade ago. Developers run open scanners locally; a managed platform runs continuous DAST against the production estate; a small AppSec team writes custom rules for the application-specific cases. Agent security is following the curve a generation later and at compressed speed.
A decision matrix that fits on one page
Four variables determine which mix is right for a given organisation. None of them is the size of the security budget.
The most common error in the matrix is assuming the rows are independent. They are not. Regulatory pressure on a single low-tier agent rarely justifies a build path; a sprawling production estate without continuous coverage will not satisfy a supervisor regardless of how many open-source tools the engineering team runs in CI.
A ninety-day path for a CISO without an AI red team today
For a security leader inheriting agent risk in mid-2026 with no existing red-team capability, the following sequence has held up across multiple organisations. It assumes a small, motivated security team and at least one agent in production.
- —Days 1-15. Inventory agents and classify by tier. Distinguish prompt-only assistants from tool-using agents from memory-bearing multi-step agents. Most organisations discover two to three times as many agents as the CMDB suggests.
- —Days 15-30. Stand up an open-source baseline. Promptfoo in CI, Garak against the top three by tier, AgentDojo against any agent with tool access. Output: a first AIVSS-style posture score and an OWASP LLM Top 10 coverage map.
- —Days 30-60. Run a targeted external assessment against the highest-tier agent. Procure a short, scoped engagement from a credible AI red-team specialist or commercial platform. The goal is calibration, not coverage.
- —Days 60-90. Commit to a continuous-coverage decision. Either license a managed platform for the outer loop and keep OSS in the inner loop, or formally fund an in-house red-team function and budget for corpus maintenance, judge re-grounding and harness SRE.
- —Day 90 onward. Standing reporting line to the board. A single page: agent count by tier, AIVSS trend, top five open findings, framework mapping coverage, last evidence-pack signature date.
Three traps to avoid along the way, each one drawn from a real incident. First, do not deploy a learning agent into a public channel without adversarial controls; the Microsoft Tay incident of March 2016 and the Chevrolet of Watsonville chatbot of December 2023 are the canonical reminders of how fast that fails (AI Incident Database). Second, do not assume that retrieval-augmented generation solves prompt injection; OWASP's 2025 Top 10 for LLM Applications is explicit that “techniques marketed as safety features such as Retrieval Augmented Generation (RAG) and fine-tuning do not actually solve the core vulnerability of prompt injection.” (OWASP, 2025). Third, do not allow employees to paste source code or strategy material into consumer chatbots without an enterprise control plane; the Samsung Semiconductor disclosures of March 2023, in which three separate leaks occurred within twenty days of permitting access, set the cost of inaction for everyone who came after.
Practical takeaway
- —Build-vs-buy is the wrong binary; the question is which layer you own and which layer you rent.
- —Open source — PyRIT, Garak, Promptfoo, AgentDojo — is the right inner-loop default for any organisation shipping agents.
- —A commercial or managed platform is justified when continuous coverage, named attestation, or audit-grade evidence is required, not when the only goal is scanning.
- —In-house build is justified for frontier labs, sovereign workloads, and platforms with non-public tools, provided three full-time roles — corpus, judge, harness — are funded continuously.
- —Whatever the mix, the artefact a regulator will ask for in 2027 is a signed, reproducible evidence pack mapped to OWASP, ATLAS and NIST AI RMF. Plan backward from that.
To operationalise the inner loop, the open-source AgentGuardian package on PyPI runs a multi-specialist adversarial swarm against any LangGraph, CrewAI, OpenAI Agents SDK, AutoGen, ADK or HTTP/MCP endpoint, dispatching a probe corpus sharded across the OWASP and ATLAS categories and emitting a deterministic AIVSS posture score — useful as the local equivalent of running Garak plus Promptfoo plus AgentDojo under one harness. To operationalise the outer loop — continuous coverage against production agents, signed evidence packs, and the regulator-facing reports the matrix above ends on — see the open-source repository for the engineering inner loop and the managed platform for the audit-grade outer loop. Either is a more honest starting point than another quarter spent debating the build-vs-buy framing itself.