- The AI compute crunch is reaching a breaking point: GPU rental prices (Nvidia Blackwell) are up 48% in two months to $4.08/hour, Anthropic’s Claude API uptime fell to 98.32% in March (vs. the 99.99% “four nines” standard for enterprise software), and OpenAI killed its Sora video app partly to free up compute for higher-priority products
- Demand for agentic AI — autonomous tools that independently perform multi-step tasks — has caused token consumption to explode; OpenAI’s API usage rose from 6 billion tokens per minute in October to 15 billion in late March, and the company’s CFO said she spends significant time “trying to find any last-minute compute available”
- Anthropic has been forced to rate-limit users during peak hours (5 a.m.–11 a.m. PT on weekdays), sparking complaints from power users hitting limits in 45 minutes; enterprise clients are switching to OpenAI models due to Anthropic’s reliability issues, even though many prefer Claude’s performance
- The bottleneck is structural: data center build times are long, power through 2026 is already spoken for, and GPU lead times are measured in months — CoreWeave has raised prices 20%+ and is locking customers into 3-year contracts, with Bank of America analysts projecting demand will outstrip supply through at least 2029
What Happened?
The AI industry is hitting a classic infrastructure bottleneck: demand is growing far faster than companies can build the compute capacity to serve it. The surge is being driven by the rise of “agentic” AI — autonomous tools that perform multi-step tasks independently, consuming vastly more computing resources per interaction than a simple chatbot. Anthropic, whose annualized Claude API revenue has jumped from $9 billion at end-2025 to $30 billion by April 2026, saw its API uptime fall to 98.32% in March — well below the 99.99% standard enterprise software buyers expect. OpenAI’s API token consumption rose 2.5x between October and late March. GPU spot prices have surged 48% in two months. CoreWeave has raised prices 20%+ and is asking customers to commit to three-year contracts. And OpenAI killed Sora, its viral video-generation product, in part to redirect compute to higher-priority services.
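A quick back-of-the-envelope check on the figures above (a sketch, not reporting: the 5.5-month window is an assumption, since the source only says “October to late March”):

```python
# Sanity-check the cited growth figures.

# OpenAI API token consumption: 6B tokens/min in October -> 15B in late March
growth_multiple = 15 / 6                             # 2.5x, matching the article
months = 5.5                                         # assumed elapsed time (October to late March)
monthly_rate = growth_multiple ** (1 / months) - 1   # implied compound monthly growth, ~18%/month

# GPU rental price: $4.08/hour after a 48% rise implies a price two months earlier of
prior_price = 4.08 / 1.48                            # ~ $2.76/hour

print(f"token growth: {growth_multiple:.1f}x overall, ~{monthly_rate:.0%} per month")
print(f"implied pre-surge GPU rental price: ${prior_price:.2f}/hour")
```

At the implied compound rate, demand roughly doubles every four months — which helps explain why multi-year build timelines cannot keep pace.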
Why It Matters?
Reliability is the foundation on which enterprise AI adoption is built — and the current compute shortage is undermining it at precisely the moment companies are most dependent on AI for productivity. The shift from novelty to infrastructure means that outages are no longer an inconvenience; they are a business continuity problem. Anthropic’s enterprise clients, including some who prefer Claude to OpenAI’s models on performance grounds, are already switching providers due to reliability concerns. More broadly, the compute crunch creates a structural tension: AI companies cannot raise prices aggressively without risking user defection to cheaper rivals, but they also cannot afford to serve rapidly growing demand at current margins. Price increases are historically the clearest market signal in a supply crunch — and the AI industry may be approaching the point where they become unavoidable.
What’s Next?
The supply-demand gap is not expected to close quickly. Data center construction timelines are measured in years, and available power through 2026 is already fully committed. Bank of America analysts project that demand for CoreWeave’s GPU cloud services will outstrip supply through at least 2029. Meanwhile, AI model capabilities are advancing rapidly — each new generation of agentic tools consumes more compute per task than the last. The companies best positioned to weather the crunch are those with long-term infrastructure contracts and dedicated compute reserves; those relying on spot markets face the most volatility. For users, the message from the current wave of outages and rate limits is clear: the era of unlimited, always-available AI is already over.
Source: The Wall Street Journal