Google GTIG Confirms First Criminal AI-Built Zero-Day: A 2FA Bypass That Would Have Enabled Mass Exploitation
Google's Threat Intelligence Group says a cybercrime group built a zero-day exploit using AI, marking the first confirmed case of adversaries weaponizing an LLM to discover and exploit a previously unknown vulnerability.
Overview
Google’s Threat Intelligence Group (GTIG) has documented what it calls a historic threshold in adversarial tradecraft: a cybercrime group used an AI model to discover and weaponize a previously unknown vulnerability, producing a working zero-day exploit targeting a popular open-source web-based administration tool. The exploit, a Python script designed to bypass two-factor authentication, was intended for use in a mass exploitation campaign. According to the GTIG report published May 11, 2026, Google’s proactive discovery may have prevented the exploit from being used at scale.
What We Know
GTIG’s report, titled “Adversaries Leverage AI for Vulnerability Exploitation, Augmented Operations, and Initial Access,” documents the group’s finding in unambiguous terms: “For the first time, GTIG has identified a threat actor using a zero-day exploit that we believe was developed with AI. The criminal threat actor planned to use it in a mass exploitation operation but our proactive counter discovery may have prevented its use.”
The target was a popular open-source, web-based system administration tool that Google has declined to name publicly. The vulnerability itself was a semantic logic flaw: developers had hardcoded a trust exception into the authentication flow, creating a path to sidestep two-factor authentication checks entirely. An attacker still needed valid user credentials, but with them could bypass 2FA and gain full account access. The Hacker News reported that GTIG described it as “a zero-day vulnerability implemented in a Python script that enables the user to bypass two-factor authentication.”
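The vulnerable code itself has not been made public in the reporting cited here, so the following is only a minimal sketch of the general flaw class, a hardcoded trust exception in a login flow. Every name in it (LoginRequest, TRUSTED_CLIENTS, client_id, the sample user) is hypothetical and is not taken from the affected tool.

```python
import hmac
from dataclasses import dataclass
from typing import Optional

# Hypothetical in-memory user store, for illustration only.
USERS = {"alice": {"password": "s3cret", "otp": "123456"}}

# Hypothetical hardcoded allowlist of clients that skip the second factor.
TRUSTED_CLIENTS = {"internal-monitor"}


@dataclass
class LoginRequest:
    username: str
    password: str
    otp: Optional[str] = None
    client_id: Optional[str] = None  # supplied by the caller, so attacker-controlled


def check_password(req: LoginRequest) -> bool:
    user = USERS.get(req.username)
    return user is not None and hmac.compare_digest(user["password"], req.password)


def check_otp(req: LoginRequest) -> bool:
    user = USERS.get(req.username)
    if user is None or req.otp is None:
        return False
    return hmac.compare_digest(user["otp"], req.otp)


def login_flawed(req: LoginRequest) -> bool:
    """Password is always checked, but a hardcoded exception can skip 2FA."""
    if not check_password(req):
        return False
    # The logic flaw: a request-supplied value decides whether 2FA runs.
    if req.client_id in TRUSTED_CLIENTS:
        return True
    return check_otp(req)


def login_fixed(req: LoginRequest) -> bool:
    """Both factors are required on every path, with no hardcoded exception."""
    return check_password(req) and check_otp(req)
```

In this sketch, a request carrying a valid password and the allowlisted client_id passes login_flawed without any one-time code, while login_fixed rejects it. That matches the reported precondition that valid credentials were still required.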
Researchers attributed the exploit to AI with high confidence based on distinctive markers in the Python code itself. According to SecurityWeek, GTIG identified “an abundance of educational docstrings” and “structured, textbook Pythonic format highly characteristic of LLMs training data” — patterns that reflect how large language models are trained to produce readable, well-documented code rather than the stripped-down style typical of human-authored exploit scripts. The code also contained a hallucinated CVSS score in its comments, a marker consistent with LLM output that blends factual content with plausible-sounding fabrications. GTIG stated it does not believe Google’s own Gemini model was used.
The GTIG report explains why this class of vulnerability is particularly susceptible to AI-assisted discovery: “While fuzzers and static analysis tools are optimized to detect sinks and crashes, frontier LLMs excel at identifying these types of high-level flaws and hardcoded static anomalies… they have an increasing ability to perform contextual reasoning, effectively reading the developer’s intent to correlate the 2FA enforcement logic with the contradictions of its hardcoded exceptions.”
GTIG worked with the affected vendor to responsibly disclose and patch the flaw before the campaign launched. No CVE number has been publicly assigned.
Broader AI Threat Landscape
The AI-built zero-day is the headline finding of a broader GTIG report covering how state-sponsored and criminal actors are integrating AI across all phases of attack operations. According to CSO Online, GTIG observed “prominent cyber crime threat actors partnering to plan a mass vulnerability exploitation operation” — indicating that AI-assisted exploitation is not confined to sophisticated nation-state actors.
Among state actors, GTIG identified the PRC-linked group UNC2814 using expert persona prompting to target TP-Link firmware and OFTP implementations, and APT45, a DPRK-linked group, sending thousands of repetitive prompts recursively analyzing CVEs and validating proof-of-concept exploits. APT27 leveraged Gemini to develop fleet management tooling for its operational relay box (ORB) networks. All of these findings come from the GTIG May 11 report.
The report also documents PROMPTSPY, an Android backdoor that uses the Gemini API (gemini-2.5-flash-lite model) for autonomous device control, including capturing biometric data to replay authentication gestures. Google stated it disabled the assets associated with PROMPTSPY and found no related apps on Google Play.
On the defensive side, GTIG cited Google’s own Big Sleep AI agent, which has found real-world security vulnerabilities, and CodeMender, an experimental AI agent designed to automatically fix critical code vulnerabilities.
What We Don’t Know
GTIG has not publicly identified the name of the targeted administration tool, the specific cybercrime group responsible, or the AI model used to generate the exploit. The decision to withhold the tool’s name appears consistent with responsible disclosure practice, but it limits independent verification of the patch.
Whether this represents an isolated capability or the beginning of a repeatable workflow for AI-assisted zero-day development is also unresolved. The indicators GTIG used to identify the exploit as AI-generated — docstrings, hallucinated CVSS scores, textbook code structure — may become less reliable as adversaries learn to clean up LLM output before deployment.
Analysis
John Hultquist, Chief Analyst at Google Threat Intelligence Group, framed the report’s significance directly in comments reported by The Register: “There’s a misconception that the AI vulnerability race is imminent. The reality is that it’s already begun.”
Ryan Dewhurst of watchTowr echoed that assessment as reported by The Hacker News: “AI is already accelerating vulnerability discovery, reducing the effort needed to identify, validate, and weaponize flaws.”
The significance of GTIG’s finding lies less in the sophistication of this particular exploit, a logic flaw in authentication code that still required valid credentials, and more in what it signals about the direction of travel. Traditional security tooling such as fuzzers and static analyzers searches for crashes and sinks; it is poorly suited to detecting the kind of intentional but contradictory logic that produces authentication bypasses. LLMs, trained on vast amounts of code and documentation, can reason about developer intent in ways that complement rather than duplicate existing tooling. If that capability becomes routinely accessible to criminal operators, exploitation, which already follows public disclosure within an ever shorter window, may move further upstream to vulnerabilities no one has yet reported.