CAISI Signs Frontier AI Testing Agreements With Google DeepMind, Microsoft, and xAI, Extending Pre-Deployment Reviews Beyond Anthropic and OpenAI
The Commerce Department's AI standards center will evaluate the three labs' frontier models before public release, including in classified environments, expanding a program that began with Anthropic and OpenAI in 2024.
Overview
The Center for AI Standards and Innovation (CAISI), the AI evaluation arm housed inside the National Institute of Standards and Technology, announced on May 5, 2026, that it has signed new agreements with Google DeepMind, Microsoft, and xAI to evaluate their frontier AI models for national security risks before those models reach the public. The deal extends a pre-deployment testing framework that previously covered only Anthropic and OpenAI to three of the largest remaining frontier AI developers in the United States.
What We Know
Under the new arrangement, CAISI will “conduct pre-deployment evaluations and targeted research to better assess frontier AI capabilities and advance the state of AI security,” according to the NIST announcement. The agreements allow CAISI to examine models before they are publicly available and to continue assessing them after they ship.
NIST said the agreements “support testing in classified environments and were drafted with the flexibility required to rapidly respond to continued AI advancements.” According to Nextgov/FCW, CAISI will study models with reduced or removed safeguards to understand their unmitigated capabilities. SiliconANGLE reports that the risk categories under examination include cybersecurity, biosecurity, and chemical weapons.
CAISI Director Chris Fall said in the NIST statement: “Independent, rigorous measurement science is essential to understanding frontier AI and its national security implications. These expanded industry collaborations help us scale our work in the public interest at a critical moment.”
NIST said that, to date, CAISI “has completed more than 40 such evaluations, including on state-of-the-art models that remain unreleased.” Al Jazeera corroborates that figure and adds that Microsoft will work with U.S. government scientists to test AI systems “in ways that probe unexpected behaviors.”
According to NIST, the new agreements build on partnerships first signed with Anthropic and OpenAI in August 2024 and since renegotiated. Nextgov/FCW notes that the renegotiated terms align the program with the Trump administration’s AI Action Plan, and that the administration is reportedly considering an executive order that would create a formal pre-market review process for new AI models.
The announcement drew a positive industry response. Aaron Cooper, Senior Vice President of Global Policy at the Business Software Alliance, told Nextgov/FCW that “today’s announcement reinforces CAISI’s role as the right institutional home within government for advancing evaluation and measurement science.”
Context
CAISI sits inside the Department of Commerce as part of NIST. As Al Jazeera notes, the office was originally established under President Biden as the U.S. Artificial Intelligence Safety Institute, and it signed its first model-evaluation agreements, with OpenAI and Anthropic, in August 2024.
The expansion was announced days after a separate Pentagon procurement development. Al Jazeera reports that the Department of Defense recently struck AI deals with seven companies (Google, Microsoft, Amazon Web Services, Nvidia, OpenAI, Reflection, and SpaceX) to use AI across classified computer networks, with Anthropic notably absent. The CAISI agreements run on a separate track focused on safety evaluation rather than military procurement, and Anthropic’s existing CAISI agreement remains in place under its renegotiated terms.
What We Don’t Know
Neither NIST nor the participating companies disclosed which specific Google DeepMind, Microsoft, or xAI models will be submitted to CAISI first, nor what cadence of evaluation the agreements establish. The detailed scope of any reported executive order on pre-market AI review has not been finalized; Nextgov/FCW characterizes such an order as under consideration rather than issued.
CAISI has also not published the methodologies it uses for the evaluations it has already completed. SiliconANGLE describes the program as a shift from the Trump administration’s earlier hands-off posture toward AI, but the practical enforcement weight of voluntary CAISI evaluations — whether findings can block a launch, trigger remediation, or merely inform classified federal users — has not been spelled out publicly.
Why It Matters
With the addition of Google DeepMind, Microsoft, and xAI, CAISI now has formal pre-deployment access to models from five of the most prominent U.S. frontier AI developers. The arrangement consolidates a measurement-science role for NIST at a moment when the rest of the federal government is still debating whether and how to regulate frontier AI directly. For the labs, the agreements normalize a workflow in which a version of a frontier model with its safeguards reduced or removed is made available to government evaluators, potentially in classified environments, ahead of public release. That posture would have been unusual two years ago and is now becoming an industry baseline.