Microsoft Unveils MDASH, a Multi-Model Agentic Security Harness That Tops the CyberGym Leaderboard and Finds 16 Windows Bugs
Microsoft's new Autonomous Code Security team disclosed MDASH alongside its May 2026 Patch Tuesday, crediting the multi-model agentic scanner with 16 Windows vulnerabilities — four of them critical RCEs — and an 88.45 percent score on CyberGym.
Overview
Microsoft used its May 2026 Patch Tuesday cycle to publicly unveil MDASH, a multi-model agentic scanning harness built by its new Autonomous Code Security team. According to the Microsoft Security Blog, MDASH discovered 16 previously unknown Windows vulnerabilities — including four critical remote code execution flaws — and posted the leading score on the public CyberGym benchmark. The disclosure makes Microsoft the second major vendor in roughly a month to publish concrete production results from an agentic vulnerability-discovery system, after Mozilla credited Anthropic’s Claude Mythos Preview with 271 Firefox fixes in April.
What Microsoft Announced
MDASH was developed by Microsoft’s Autonomous Code Security team alongside the Windows Attack Research and Protection group, according to CSO Online. The system “orchestrates more than 100 specialized AI agents across multiple frontier and distilled models, with each agent assigned to a different stage of the vulnerability discovery pipeline,” CSO Online wrote. The Microsoft Security Blog describes those pipeline stages as filled by auditors, debaters, deduplicators, and provers.
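The four stage names suggest a funnel in which many candidate findings go in and a small set of confirmed ones comes out. The sketch below is a toy illustration of that idea only, not Microsoft's implementation: the `Finding` type, the `run_pipeline` function, the majority-vote rule, and the stub agents are all hypothetical, and the published descriptions do not say how each stage actually works.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass(frozen=True)
class Finding:
    location: str   # hypothetical label for where the bug lives
    bug_class: str  # e.g. "use-after-free"

# Hypothetical stage signatures; none of these names come from Microsoft.
Auditor = Callable[[str], Iterable[Finding]]
Debater = Callable[[Finding], bool]  # True means "this looks plausible"
Prover = Callable[[Finding], bool]   # True means "this was reproduced"

def run_pipeline(source: str, auditors: List[Auditor],
                 debaters: List[Debater], prover: Prover) -> List[Finding]:
    # 1. Audit: every auditor proposes candidate findings.
    candidates = [f for audit in auditors for f in audit(source)]
    # 2. Debate: keep candidates that a strict majority finds plausible.
    debated = [f for f in candidates
               if sum(d(f) for d in debaters) * 2 > len(debaters)]
    # 3. Deduplicate: drop repeats while preserving first-seen order.
    unique = list(dict.fromkeys(debated))
    # 4. Prove: keep only findings the prover can confirm.
    return [f for f in unique if prover(f)]

# Stub agents standing in for model-backed ones.
UAF = Finding("parse_opts", "use-after-free")
OVERFLOW = Finding("copy_hdr", "buffer-overflow")

def audit_a(src: str) -> List[Finding]:
    return [UAF] if "free(" in src else []

def audit_b(src: str) -> List[Finding]:
    return [UAF, OVERFLOW] if "memcpy(" in src else []

debaters = [
    lambda f: True,                            # always convinced
    lambda f: f.bug_class == "use-after-free", # only trusts one bug class
    lambda f: False,                           # never convinced
]

confirmed = run_pipeline("free(p); memcpy(dst, p, n);",
                         [audit_a, audit_b], debaters, lambda f: True)
print(confirmed)
```

In this toy run both auditors report the same use-after-free, the overflow candidate loses the vote, and a single deduplicated finding reaches the prover.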
Taesoo Kim, Microsoft’s Vice President of Agentic Security, framed the design choice in the company’s announcement. “The model is one input. The system is the product,” Kim wrote. In a separate exchange covered by Infosecurity Magazine, Kim added: “The multi-model agentic scanning harness runs a configurable panel of models.”
Microsoft’s core claim is that MDASH found 16 new flaws in the Windows networking and authentication stack, all of which are fixed in this month’s security release. Four of those are rated critical remote code execution bugs. CSO Online details two of them: CVE-2026-33827, a remote unauthenticated use-after-free flaw in the Windows IPv4 stack reachable through specially crafted packets carrying the Strict Source and Record Route option, and CVE-2026-33824, a pre-authentication double-free issue in the IKEEXT service affecting RRAS VPN, DirectAccess, and Always-On VPN deployments.
Benchmark Results
Microsoft is also using the disclosure to claim the top spot on CyberGym, an open vulnerability-reproduction benchmark. The Microsoft Security Blog reports an 88.45 percent success rate, which it calls “the highest score on CyberGym’s published leaderboard at the time of writing and roughly five points above the next entry, 83.1%.” CSO Online independently corroborates the 88.45 percent figure.
Microsoft also published internal-recall numbers from retrospective tests against confirmed Microsoft Security Response Center cases in two of the most-scrutinized Windows kernel components. Microsoft reports 96 percent recall on 28 MSRC cases spanning five years for clfs.sys, and 100 percent recall on 7 MSRC cases spanning five years for tcpip.sys. These are vendor-reported figures, not independently audited.
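Recall here is simply the share of known cases the system rediscovered. As a quick sanity check on the reported percentages, the sketch below works out which whole-number hit counts are consistent with each figure. The hit counts are inferred from the rounded percentages, not stated by Microsoft, and the function names are illustrative.

```python
def recall(found: int, total: int) -> float:
    """Fraction of known cases that were rediscovered."""
    if total <= 0:
        raise ValueError("total must be positive")
    return found / total

def hits_consistent_with(reported_pct: int, total: int) -> list[int]:
    """Hit counts whose recall rounds to the reported whole-number percentage."""
    return [k for k in range(total + 1)
            if round(100 * recall(k, total)) == reported_pct]

# clfs.sys: 96 percent on 28 cases -> only 27 of 28 fits (27/28 is about 96.4%).
print(hits_consistent_with(96, 28))   # [27]
# tcpip.sys: 100 percent on 7 cases -> all 7 of 7.
print(hits_consistent_with(100, 7))   # [7]
```

With samples this small, a single missed case moves recall by several points: one miss out of 7 would drop the tcpip.sys figure to roughly 86 percent.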
The May 2026 Patch Tuesday in Context
The MDASH disclosure landed inside one of the larger Patch Tuesday releases of the year. Microsoft shipped fixes for 120 CVEs in its own products, BleepingComputer reported. Counting the non-Microsoft CVEs that Microsoft also tracks and republishes, The Hacker News put the wider total at 138 vulnerabilities.
Of the 120 Microsoft CVEs, BleepingComputer counted 17 rated Critical — 14 remote code execution, two elevation of privilege, and one information disclosure. The category breakdown was 61 elevation of privilege, 31 remote code execution, 14 information disclosure, 13 spoofing, 8 denial of service, and 6 security feature bypass flaws, per BleepingComputer.
No zero-days are known to be under active attack this month, BleepingComputer reported. The Hacker News reached the same conclusion across the wider 138-flaw set, writing that “none of them have been listed as publicly known or under active attack.”
The most severe non-MDASH bugs flagged by trade press include CVE-2026-41096, a heap-based buffer overflow in the Windows DNS Client with a CVSS score of 9.8, and CVE-2026-41089, a stack-based buffer overflow in Windows Netlogon also rated 9.8, according to The Hacker News. Cyber Security News adds CVE-2026-42898, a Microsoft Dynamics 365 on-premises remote code execution flaw.
Availability
MDASH is not generally available. The platform will enter private preview for enterprise customers next month, CSO Online reported. Microsoft has not published pricing, eligibility criteria, or a target date for broader availability.
The rollout mirrors the pattern other agentic security systems have followed in 2026: limited preview access through partner programs rather than a public product launch. As previously reported, Anthropic’s Claude Mythos Preview, which Mozilla credited with surfacing the 271 vulnerabilities patched in Firefox 150 last month, has been distributed on a partner-only basis through Anthropic’s Project Glasswing initiative.
What We Don’t Know
Several parts of MDASH’s story rest only on Microsoft’s own disclosure and have not yet been independently reproduced. The CyberGym 88.45 percent figure comes from Microsoft’s blog post; the next-entry score of 83.1 percent and the leaderboard ordering have not been confirmed by a third-party audit. The clfs.sys and tcpip.sys recall numbers (96 percent on 28 cases and 100 percent on 7 cases) are likewise vendor-reported retrospective metrics against curated MSRC samples rather than a forward-looking blind test.
Four of the 16 MDASH findings are rated critical RCEs, but detailed technical coverage so far addresses only two of them (CVE-2026-33827 and CVE-2026-33824). The full list of the 16 MDASH-discovered CVEs, including the remaining critical pair and the twelve non-critical issues, is referenced in the Microsoft Security Blog but has not been re-published in detail by every outlet covering the announcement.
It is also too early to tell whether MDASH’s agent-orchestration approach is competitive with — or genuinely ahead of — Anthropic’s Mythos on equivalent workloads. The Machine Herald has previously noted that no head-to-head benchmarks exist between the major agentic security systems, and the absence of a published Mythos CyberGym score makes any near-term comparison provisional. Microsoft’s own results, like Mozilla’s, will need to be matched against adversarial use of equivalent models before defenders can say whether agentic discovery durably tilts the field in their favor.