A SaaS vendor has returned the usual 200-question security questionnaire with all the right boxes checked: yes to encryption, yes to MFA, yes to an annual penetration test, SOC 2 report attached. On paper, the vendor looks fine. But the risk analyst running the review has learned the hard way, from SolarWinds, MOVEit, and a dozen breaches in between, that "yes to an annual pentest" is thin assurance. A test conducted once a year, scoped narrowly, and filed away tells you almost nothing about whether the vendor is actually defensible the other 364 days.
So a sharper question is starting to appear in vendor assessments from TPRM teams and cyber insurers: "Do you continuously run autonomous penetration testing across your production environment, and can you show us recent results?" It is a deceptively simple question, and the answer reveals far more about a supplier's real security posture than any checkbox. This article looks at why that question is gaining traction (especially now that attackers have cheap, capable AI on their side), the technology category that makes it answerable, and how to fold it into your own due diligence.
Why the Annual Pentest Is No Longer Enough
The traditional penetration test is a point-in-time exercise: a team of consultants spends a week or two probing an environment, writes a report, and leaves. It is valuable, but it shares the fundamental weakness of the annual questionnaire: it captures a single snapshot of a system that changes every day. New code ships, cloud resources spin up, credentials proliferate, a new known-exploited vulnerability is disclosed. By the time the PDF is delivered, the environment it describes no longer exists.
And the gap just got a lot more dangerous. The same AI tools showing up in everyone else's workflow are now in the hands of the people trying to break in. For a few dollars in API tokens, an attacker can turn an AI agent loose on a target to probe it, find a weak spot, and chain small weaknesses into a working attack path, tirelessly and at machine speed. This is not hypothetical: in late 2025 Anthropic disrupted what it called the first reported AI-orchestrated cyber-espionage campaign, in which the AI carried out an estimated 80–90% of the operation on its own. Researchers have shown the same thing on a shoestring: in one study, an AI agent exploited 87% of the real-world vulnerabilities it was tested against, at roughly $25 each. The math is lopsided, and it favors the attacker: a few dollars of compute on their side can turn into millions in losses on yours. When the offense is that cheap and never sleeps, testing your own systems once a year stops being caution and becomes a blind spot.
This is precisely the gap the industry has been trying to close with continuous monitoring. Gartner has formalized the broader idea as Continuous Threat Exposure Management (CTEM): a program of continuously discovering, validating, and prioritizing exposures rather than assessing them once a year. Autonomous penetration testing is one of the technologies that makes continuous validation practical, because it removes the human bottleneck that made frequent testing impossible.
Compliance Testing Tells You What's Wrong. Risk Testing Tells You What Matters.
Most security programs already run the traditional trio: SAST reads through source code for risky patterns, DAST pokes at a running web app from the outside, and vulnerability management keeps a running tally of known CVEs across the stack. These are useful, and plenty of frameworks require them, but let's be honest about what they produce: lists. A scanner hands you a spreadsheet of hundreds of "potential" issues, each stamped with a generic severity score that knows nothing about your environment. It's largely a theory-and-compliance exercise: here is everything that could conceivably be a problem; good luck sorting it out.
What that pile of findings almost never tells you is the part that actually matters: can someone string these together to reach something valuable? A "medium" flaw that lets an attacker hop from a forgotten web form to your customer database is a five-alarm fire. A "critical" CVE on a server that's walled off and connects to nothing may be a non-event. This is the difference between testing for vulnerabilities and testing for risk that leads to real impact, and it's where continuous autonomous pentesting earns its keep. It doesn't replace SAST, DAST, and vulnerability management; it puts them to work. The scanners surface candidates; the testing safely proves which ones an attacker could actually exploit, what they'd reach, and what it would cost you, while quietly setting aside the ones that lead nowhere.
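To make the chaining idea concrete, here is a toy sketch. The hosts, severity labels, and reachability edges are all hypothetical, and this is not how any particular product works: real autonomous pentesting tools discover and prove each step by safely executing techniques, not by reading a hand-written map. The point is only that reachability from the attacker's starting position, not the severity label, decides which findings matter.

```python
from collections import deque

# Hypothetical scanner output: a generic severity label per host, no context.
FINDINGS = {
    "legacy_web_form": "medium",
    "app_server": "medium",
    "customer_db": "low",
    "isolated_server": "critical",
}

# Hypothetical reachability map: an edge A -> B means an attacker who controls A
# can exploit the finding on B. Nothing leads to isolated_server.
REACHES = {
    "internet": ["legacy_web_form"],
    "legacy_web_form": ["app_server"],
    "app_server": ["customer_db"],
    "isolated_server": [],
}


def attack_path(graph, start, target):
    """Breadth-first search for the shortest chain of steps from start to target."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the start to rebuild the chain.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # no chain of exploitable steps reaches the target


for host, severity in FINDINGS.items():
    path = attack_path(REACHES, "internet", host)
    chain = " -> ".join(path) if path else "no path from the internet"
    print(f"{severity:>8}  {host}: {chain}")
```

Run against this map, the chain of "medium" findings reaches customer_db, while the "critical" finding on isolated_server reports no path at all: the same inversion the paragraph above describes.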
What "Autonomous Penetration Testing" Actually Means
Autonomous penetration testing uses software to safely execute real attack techniques against an environment: finding weaknesses, chaining them into attack paths, and proving what an actual adversary could reach, all without a human operator driving each step. Because it is software, it can run continuously and at scale rather than once a year.
This is a real and fast-growing market, not a single product. The most visible name associated with autonomous penetration testing is Horizon3.ai and its NodeZero platform, but adjacent approaches span automated security validation (offered by vendors such as Pentera), breach and attack simulation, or BAS (Cymulate and others), and the wider CTEM discipline described above. The common thread is replacing periodic, manual testing with continuous, automated validation.
NodeZero is a useful illustration of how the category works. According to the company, it is agentless, safe to run against live production systems, and covers internal and external networks, cloud, and identity infrastructure; an organization can launch a test in minutes and re-run it on demand to confirm that a fix actually closed the hole. In June 2025 the company raised a $100 million Series D, a sign of how much investment is flowing into continuous validation as a category.
From "Trust Me" to "Let Me Show You"
Here is where continuous, automated testing gets genuinely interesting for vendor risk. Historically, TPRM has been an exercise in trust: the vendor asserts that it is secure, and you decide whether to believe the assertion. Continuous testing turns that into something verifiable. A supplier can run a test and share concrete evidence of what was found, what was exploitable, what was remediated, and re-test results proving that the fix worked.
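The "what was remediated, and proof that the fix worked" part of that evidence boils down to comparing an original test run with a re-test. A minimal sketch, assuming a hypothetical `Finding` record (host plus weakness); real platforms export their own formats, and nothing here reflects any vendor's schema:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One proven-exploitable weakness from a test run (hypothetical schema)."""
    host: str
    weakness: str


def _sort_key(finding):
    return (finding.host, finding.weakness)


def compare_runs(before, after):
    """Classify findings by diffing an original test run against a re-test."""
    before, after = set(before), set(after)
    return {
        "verified_fixed": sorted(before - after, key=_sort_key),  # gone on re-test
        "still_open": sorted(before & after, key=_sort_key),      # still exploitable
        "new": sorted(after - before, key=_sort_key),             # appeared since
    }


# Hypothetical runs for illustration only.
initial = [
    Finding("legacy_web_form", "default credentials"),
    Finding("app_server", "reused service-account password"),
]
retest = [
    Finding("app_server", "reused service-account password"),
    Finding("new_bucket", "public read access"),
]

for status, findings in compare_runs(initial, retest).items():
    for f in findings:
        print(f"{status:>14}: {f.host} ({f.weakness})")
```

One design caveat: a finding that disappears on re-test only counts as fixed if the re-test covered the same scope as the original run, which is why coverage and re-test questions belong together in an assessment.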
Some vendors market this use case directly. Horizon3.ai, for instance, positions NodeZero for third-party risk management around moving suppliers "from trust to verification," and notes that results can double as audit evidence for frameworks like SOC 2 and ISO 27001, reducing, rather than adding to, the questionnaire burden suppliers already carry.
Perhaps the strongest independent validation of this model is governmental. Through the U.S. National Security Agency's Continuous Autonomous Penetration Testing (CAPT) program, defense industrial base suppliers have been encouraged to continuously and autonomously test their own environments, an initiative aimed squarely at hardening the supply chain that serves the defense sector. When a national security agency is pushing suppliers toward continuous self-testing, it is a strong signal of where third-party assurance is heading.
How to Add This to Your Vendor Assessments
You do not need to overhaul your program, or endorse any particular product, to start capturing this signal. A few well-placed, vendor-neutral questions, added to your existing due diligence, will tell you a great deal about a supplier's security maturity:
- Cadence: "How frequently do you conduct penetration testing: annually, quarterly, or continuously?" (Continuous or autonomous testing is a strong positive indicator.)
- Coverage: "Does your testing cover your full production environment, including cloud and identity infrastructure, or only a narrow scope?"
- Validation of fixes: "When you remediate a finding, do you re-test to verify the fix actually closed the attack path?"
- Evidence: "Can you share a recent summary of findings and remediation, or attest to results from a continuous testing program?"
- KEV responsiveness: "When a vulnerability is added to the CISA KEV catalog, how quickly can you confirm whether it is exploitable in your environment?"
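An analyst can check the KEV-responsiveness answer against evidence rather than taking it on faith. The sketch below measures the lag between a CVE's addition to the KEV catalog and the vendor's confirmation of whether it was exploitable. The record keys `cveID` and `dateAdded` mirror CISA's KEV JSON feed; the CVE IDs, dates, vendor confirmation log, and 7-day threshold are placeholders invented for illustration.

```python
from datetime import date

# Records shaped like entries in CISA's KEV JSON feed ("cveID", "dateAdded").
# The IDs and dates are placeholders, not real catalog entries.
KEV_ENTRIES = [
    {"cveID": "CVE-0000-0001", "dateAdded": "2025-03-03"},
    {"cveID": "CVE-0000-0002", "dateAdded": "2025-04-14"},
    {"cveID": "CVE-0000-0003", "dateAdded": "2025-05-20"},
]

# Hypothetical vendor evidence: the date the vendor confirmed whether each CVE
# was exploitable in its environment (None means it never confirmed).
VENDOR_CONFIRMATIONS = {
    "CVE-0000-0001": "2025-03-04",
    "CVE-0000-0002": "2025-04-30",
    "CVE-0000-0003": None,
}

TARGET_DAYS = 7  # hypothetical threshold chosen for illustration


def kev_response_days(kev_entries, confirmations):
    """Days between KEV listing and vendor confirmation, or None if unconfirmed."""
    lags = {}
    for entry in kev_entries:
        cve = entry["cveID"]
        confirmed = confirmations.get(cve)
        if confirmed is None:
            lags[cve] = None
            continue
        added = date.fromisoformat(entry["dateAdded"])
        lags[cve] = (date.fromisoformat(confirmed) - added).days
    return lags


for cve, days in kev_response_days(KEV_ENTRIES, VENDOR_CONFIRMATIONS).items():
    if days is None:
        print(f"{cve}: no confirmation on record")
    elif days > TARGET_DAYS:
        print(f"{cve}: {days} days (slower than {TARGET_DAYS}-day target)")
    else:
        print(f"{cve}: {days} days")
```

A vendor running continuous validation should be able to supply the confirmation dates this needs; a vendor that cannot is answering the question on its own.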
A vendor that can answer these crisply, ideally by pointing to a continuous, automated testing program, is demonstrating a kind of security maturity that no static questionnaire can capture. A vendor that cannot is telling you where to focus your attention. Notice that none of these questions name a product; they probe a capability and let the supplier describe how it meets it.
| Traditional Vendor Assurance | Continuous Autonomous Testing |
|---|---|
| Annual, point-in-time snapshot | Continuous, reflects the environment as it changes |
| Self-attested ("trust me") | Evidence-based ("here is proof") |
| Narrow, pre-agreed scope | Broad coverage across network, cloud, and identity |
| Lists vulnerabilities for compliance | Proves real attack paths and business impact |
| Fixes assumed effective | Fixes verified by re-test |
| Slow to react to new threats | Rapid validation against newly exploited vulnerabilities |
The Bottom Line
Third-party risk management has spent two decades trying to bridge the gap between what a vendor says about its security and what is actually true. Now that attackers can rent AI to hunt for weaknesses around the clock for pocket change, that gap is more expensive than ever to leave open. Continuous autonomous penetration testing is one of the most promising bridges to appear yet, because it replaces assertion with evidence, snapshots with continuous proof, and compliance checklists with real, risk-based testing. Asking your suppliers whether they have adopted it, and rewarding the ones that have, is a low-effort, high-signal upgrade to any vendor due diligence process, no matter which tools they or you happen to use.
Put This Question in Your Vendor Assessments
Fair TPRM is a free, open-source platform for vendor risk management, GRC compliance, and FAIR risk quantification, and a practical place to add continuous-validation questions to your vendor due diligence.
Sources & Further Reading
Fair TPRM is independent of and uncompensated by every company named below.
- Disrupting the First Reported AI-Orchestrated Cyber Espionage Campaign - Anthropic, November 2025
- LLM Agents can Autonomously Exploit One-day Vulnerabilities - Fang et al., arXiv, 2024
- How to Manage Cybersecurity Threats, Not Episodes (Continuous Threat Exposure Management) - Gartner
- Horizon3.ai Raises $100M to Cement Leadership in Autonomous Security - Business Wire, June 2025
- The NodeZero Platform - Horizon3.ai
- NodeZero for Third-Party Risk Management - Horizon3.ai
- NSA Continuous Autonomous Penetration Testing (CAPT) Program for DIB Suppliers - Horizon3.ai