- Google is poised to announce new inference-focused TPU chips at Google Cloud Next in Las Vegas, directly challenging Nvidia in the exploding market for chips that run AI models in real time.
- Anthropic, Meta, and Citadel Securities have all signed major TPU deals: Meta has received its first significant supply and is testing the chips for inference, while Anthropic has locked in access to up to 1 million TPUs.
- Google Chief Scientist Jeff Dean said it now “makes sense to specialize chips more for training or inference workloads” — a significant strategic shift from the company’s previous one-chip-fits-all approach.
- Supply is already constrained: one startup executive complained that Google has effectively allocated all available TPUs to elite AI labs like Anthropic, leaving smaller players waiting.
What Happened?
Google is preparing to unveil its next generation of tensor processing units (TPUs) at the Google Cloud Next conference in Las Vegas this week, with a likely focus on inference, the process of running AI models after they’ve been trained. The move signals a meaningful strategic evolution for Google’s chip program, which has historically produced generalist TPUs: Chief Scientist Jeff Dean confirmed the company is exploring specializing its chips for training versus inference, a shift driven by surging demand for fast AI query processing. The momentum behind Google’s silicon has been remarkable. Anthropic signed a deal for up to 1 million TPUs, Meta committed to a multibillion-dollar cloud agreement and is now testing its first TPU supply, and Citadel Securities plans to present at the conference on how TPUs enabled it to train models faster than its previous GPU setup could.
Why Does It Matter?
Nvidia’s GPUs remain the gold standard for AI training, but the competitive landscape for inference, the workload growing fastest as AI moves from research to deployment, is wide open. Google brings a rare combination of assets to this fight: a decade of custom chip design experience, direct feedback loops between its AI model teams and hardware engineers, and firsthand experience running some of the world’s largest AI systems at scale. Gartner analyst Chirag Dekate noted that Google’s Gemini is already the fastest model at complex reasoning tasks, a direct reflection of TPU inference performance. As enterprises and AI labs rush to deploy AI agents that require rapid, continuous query processing, the chip that wins inference wins the next wave of the AI buildout.
What’s Next?
Google Cloud Next this week will be the venue for the official chip announcement, and all eyes will be on whether Google formally launches a dedicated inference TPU or merely previews one. Meanwhile, supply constraints are already biting: with Google prioritizing top-tier AI labs, smaller startups are getting squeezed out. Longer term, Google faces the same three-year design-cycle challenge that plagues all chipmakers, because AI models evolve far faster than silicon can be planned, built, and shipped. The company’s answer, according to Google VP Amin Vahdat, is sometimes to design two parallel chip variants and ship whichever one fits the market’s needs, a hedge against the fundamental unpredictability of where AI is headed.
Source: Bloomberg