Google Splits Its Eighth-Generation TPU Into Training and Inference Chips as Anthropic Locks In Up to a Million Units

Overview

Google used its Cloud Next 2026 keynote on April 22 to unveil its eighth-generation Tensor Processing Unit line, breaking with a decade of unified design by splitting the TPU into two purpose-built chips: the TPU 8t for training and the TPU 8i for inference. Google’s official announcement, authored by SVP and Chief Technologist Amin Vahdat, frames the move as a response to the “agentic era,” where models are both trained at ever-larger scales and served concurrently to millions of users and autonomous agents.

The bifurcation follows previously reported momentum for Google’s custom silicon, which has drawn Meta and Anthropic into multibillion-dollar commitments as hyperscalers look to diversify away from Nvidia.

What We Know

The TPU 8t, positioned for training, scales to 9,600 chips in a single superpod and delivers 121 exaflops of compute with two petabytes of shared high-bandwidth memory, according to Google’s technical blog. The same post details near-linear scaling to more than one million chips in a single logical cluster, double the interchip interconnect (ICI) bandwidth of the prior Ironwood generation, and up to 2x performance per watt versus Ironwood. Google says the chip sustains over 97 percent “goodput,” meaning productive compute time, during training runs.
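For a sense of scale, the superpod totals above can be divided down to rough per-chip averages. The short Python sketch below does that arithmetic. It assumes decimal SI units and an even split across all 9,600 chips; neither assumption comes from Google, which has not published per-chip figures in the material cited here, and the numeric precision behind the 121-exaflop figure is not stated. The function and constant names are illustrative.

```python
# Back-of-envelope per-chip averages from the superpod totals quoted above.
# Assumptions (not from Google): decimal SI units and an even split per chip.

SUPERPOD_CHIPS = 9_600
SUPERPOD_EXAFLOPS = 121.0
SUPERPOD_HBM_PETABYTES = 2.0


def per_chip_petaflops(total_exaflops: float, chips: int) -> float:
    """Average petaflops per chip (1 exaflop = 1,000 petaflops)."""
    return total_exaflops * 1_000 / chips


def per_chip_hbm_gigabytes(total_petabytes: float, chips: int) -> float:
    """Average HBM per chip in gigabytes (1 PB = 1,000,000 GB)."""
    return total_petabytes * 1_000_000 / chips


if __name__ == "__main__":
    pflops = per_chip_petaflops(SUPERPOD_EXAFLOPS, SUPERPOD_CHIPS)
    hbm_gb = per_chip_hbm_gigabytes(SUPERPOD_HBM_PETABYTES, SUPERPOD_CHIPS)
    print(f"{pflops:.1f} petaflops per chip (average)")
    print(f"{hbm_gb:.0f} GB of HBM per chip (average)")
```

Under those assumptions the totals work out to roughly 12.6 petaflops and about 208 GB of HBM per chip, figures that should be read as averages implied by the announcement rather than as published specifications.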

The TPU 8i, aimed at inference and reinforcement learning, triples on-chip SRAM to 384 MB and increases HBM to 288 GB, enabling the chip to hold massive key-value caches entirely on silicon, per the Google Cloud infrastructure post. Interconnect bandwidth reaches 19.2 Tb/s, and a new on-chip Collectives Acceleration Engine reduces latency by up to 5x for Mixture-of-Experts models. Google claims the 8i delivers 80 percent better performance per dollar for inference than its predecessor.

The TPU 8t delivers 2.8 times the price-performance of the seventh-generation Ironwood TPU, as reported by CNBC, which also noted that Google is positioning the new chips as a complement to, rather than a replacement for, its Nvidia-based infrastructure. Google simultaneously confirmed it will offer Nvidia’s upcoming Vera Rubin GPUs through its A5X instances later in 2026.
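Price-performance multiples are easier to compare once they are converted into the cost of a fixed workload. The sketch below shows that conversion for the two claims reported so far: 2.8 times for the 8t and 80 percent better for the 8i. It assumes "80 percent better" means 1.8 times the work per dollar and that a customer's workload resembles whatever Google measured; both are assumptions, and Google has not published instance pricing. The function name is hypothetical.

```python
def relative_cost(price_performance_gain: float) -> float:
    """Cost of a fixed workload relative to the baseline chip.

    price_performance_gain is the multiple of work done per dollar,
    e.g. 2.8 for "2.8 times" or 1.8 for "80 percent better" (assumed reading).
    """
    if price_performance_gain <= 0:
        raise ValueError("gain must be positive")
    return 1.0 / price_performance_gain


if __name__ == "__main__":
    for label, gain in [("TPU 8t vs. Ironwood", 2.8), ("TPU 8i vs. predecessor", 1.8)]:
        cost = relative_cost(gain)
        print(f"{label}: {cost:.0%} of baseline cost, {1 - cost:.0%} lower")
```

Read this way, the 8t claim implies a given training job would cost a little over a third of its Ironwood price, and the 8i claim implies inference at a bit more than half the predecessor's cost, provided the vendor figures hold for the workload in question.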

TechCrunch’s coverage echoed the claim that the new TPUs can be orchestrated across more than one million units in a single cluster, and reported Google’s statement that its Gemini API now processes more than 16 billion tokens per minute.

The industry context was sharpened by The Next Web’s reporting, which identified the 8t as Broadcom-designed (codenamed Sunfish) and the 8i as MediaTek-designed (codenamed Zebrafish), both targeting TSMC’s 2nm process for late 2027 volume production. The outlet also reported that custom AI silicon is expanding at roughly 44.6 percent annually against 16.1 percent for GPUs, and cited analyst projections that custom chips could reach 45 percent of the AI accelerator market by 2028.
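The gap between those two growth rates compounds quickly. The sketch below applies each reported annual rate over a fixed horizon to show how the relative sizes diverge. The three-year horizon is a hypothetical chosen for illustration, the rates are assumed to stay constant, and the output is a growth multiple rather than a market-share forecast, since the starting sizes of the two segments are not given in the reporting.

```python
def growth_multiple(annual_rate_percent: float, years: int) -> float:
    """Size after `years` of constant compound growth, relative to a start of 1.0."""
    return (1 + annual_rate_percent / 100) ** years


def relative_gain(rate_a_percent: float, rate_b_percent: float, years: int) -> float:
    """How much faster segment A has grown than segment B over the horizon."""
    return growth_multiple(rate_a_percent, years) / growth_multiple(rate_b_percent, years)


if __name__ == "__main__":
    YEARS = 3  # hypothetical horizon, not from the reporting
    custom = growth_multiple(44.6, YEARS)
    gpu = growth_multiple(16.1, YEARS)
    print(f"Custom silicon after {YEARS} years: {custom:.2f}x")
    print(f"GPUs after {YEARS} years: {gpu:.2f}x")
    print(f"Custom silicon relative to GPUs: {relative_gain(44.6, 16.1, YEARS):.2f}x")
```

At the reported rates, custom silicon would roughly triple over three years while GPUs would grow by a little more than half, which is the arithmetic behind projections that custom chips could take a much larger slice of the accelerator market.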

Anthropic has emerged as the anchor customer for both TPU generations. The Next Web reported that Anthropic will have access to up to one million TPU chips and more than a gigawatt of capacity in 2026, with commitments expanding to 3.5 gigawatts in 2027 spanning both Ironwood and the new 8-series chips.

Broader Cloud Next 2026 Announcements

The chip launch arrived alongside a wider Cloud Next push. In a blog post summarizing the keynote, CEO Sundar Pichai said Gemini Enterprise had grown 40 percent quarter over quarter in paid monthly active users in the first quarter of 2026 and that Gemini models were processing “over 16 billion tokens per minute” through the direct API. Google also introduced a Gemini Enterprise Agent Platform for building and governing agents across organizations, and announced an expanded cybersecurity partnership with Wiz on an AI Application Protection Platform.

On the productivity side, Google unveiled Workspace Intelligence, an AI layer spanning Gmail, Docs and Chat that 9to5Google describes as a system that “understands complex semantic relationships” across a user’s data, surfaces priority actions, and powers new features like AI Inbox in Gmail and “Ask Gemini” in Google Chat.

What We Don’t Know

Google has not disclosed a precise general-availability date for either TPU 8t or TPU 8i, with the company stating only that the chips will reach Cloud customers “later in 2026.” The Next Web’s reported timeline of 2nm volume production in late 2027 suggests that early availability will rely on a different process node, but Google has not publicly confirmed the specifics. Pricing for cloud instances has not been published, and Google has not broken out how the Anthropic commitment maps across 8t and 8i units versus remaining Ironwood capacity.

It also remains unclear how aggressively other hyperscalers will respond. AWS continues to scale its Trainium line, and Microsoft has yet to detail a comparable split architecture for its Maia silicon. Whether custom ASICs can meaningfully erode Nvidia’s dominance in inference, where Nvidia still holds more than 90 percent share per The Next Web, will depend on software ecosystem maturity, availability outside a handful of anchor customers, and whether the performance-per-dollar claims hold up in independent benchmarks.