- Nearly 100% of OpenAI employees use Codex weekly — originally a coding tool, it’s now used by non-technical teams in marketing and recruiting to handle billing disputes, create dashboards, draft legal disclosures, and build customer demos.
- Google’s AI invoice-validation agent reviews five times as many invoices as the human team that previously handled the work and is on track to save $200M annually by catching overpayments. It has also spawned a second agent to work through the backlog of flagged discrepancies the first one creates, illustrating the “10X problem,” in which AI acceleration in one area creates bottlenecks elsewhere.
- Anthropic uses Claude in a two-agent architecture — one “builder” agent performs tasks while a second “auditor” agent summarizes what the first one did — with humans serving as reviewers rather than executors.
- Gartner estimates the average Fortune 500 company will run more than 150,000 AI agents within two years, but only 13% of companies currently believe they have adequate AI-agent governance in place.
What Happened?
A WSJ investigation reveals how the three leading AI labs (OpenAI, Google, and Anthropic) are deploying AI agents internally, offering a preview of how agentic AI will reshape white-collar work broadly. At OpenAI, Codex has evolved from a developer tool into a general knowledge-work platform used across legal, sales, marketing, and recruiting. Google’s finance team deployed an invoice-validation agent that reviews five times the volume its human reviewers previously handled and is on track to save an estimated $200 million annually in overpayments. Anthropic uses Claude in a two-agent loop: one agent builds, a second audits, and humans review. In all three cases, the pattern is the same: humans become reviewers and validators rather than primary executors of multistep tasks.
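For readers who want to picture the builder/auditor/human-review pattern concretely, here is a minimal Python sketch. It is an illustration only, not Anthropic's implementation: the names `run_two_agent_loop`, `ReviewPacket`, `toy_builder`, and `toy_auditor` are hypothetical, and the two "agents" are plain stand-in functions where a real system would call a model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReviewPacket:
    """What the human reviewer sees: the task, the raw result, and the audit summary."""
    task: str
    result: str
    audit_summary: str


def run_two_agent_loop(
    task: str,
    builder: Callable[[str], str],
    auditor: Callable[[str, str], str],
) -> ReviewPacket:
    # Step 1: the builder agent performs the task.
    result = builder(task)
    # Step 2: the auditor agent summarizes what the builder did.
    summary = auditor(task, result)
    # Step 3: nothing is approved here; the packet goes to a human reviewer.
    return ReviewPacket(task=task, result=result, audit_summary=summary)


# Stand-in agents (hypothetical; a real system would call a model here).
def toy_builder(task: str) -> str:
    return f"DRAFT for: {task}"


def toy_auditor(task: str, result: str) -> str:
    return f"Builder produced {len(result)} characters for task '{task}'."


packet = run_two_agent_loop("summarize Q3 vendor contracts", toy_builder, toy_auditor)
print(packet.audit_summary)
```

The point of the sketch is the division of labor: the loop itself never approves anything, so the human's job shifts from doing the task to judging the packet.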
Why It Matters?
These aren’t hypothetical use cases — they’re live production workflows at the companies that are simultaneously selling these tools to the rest of the economy. The implications for white-collar employment are significant: junior roles in legal, finance, and operations are being compressed, not eliminated — with the humans who remain spending time training AI models and reviewing AI output rather than doing the underlying work. Google explicitly noted its finance team “remained around the same size despite producing more,” while OpenAI’s general counsel said she’s still hiring junior associates — but primarily to review Codex’s output. The productivity gains are real; so are the organizational challenges: Google’s invoice-validation agent created a new backlog problem that required building yet another agent to solve.
What’s Next?
The Gartner forecast — 150,000 agents per Fortune 500 company within two years — suggests the agentic shift is still in its very early stages. The governance gap (only 13% of companies feel adequately prepared) is likely to drive a wave of enterprise spending on AI oversight tools, audit frameworks, and agent-management platforms. The “10X problem” described by Google — where accelerating one workflow creates downstream bottlenecks — will be a recurring theme as agents cascade through organizations. The companies that figure out how to orchestrate multi-agent workflows, manage the review burden on human workers, and govern agent behavior at scale will have a significant structural advantage.
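The arithmetic behind the “10X problem” can be sketched in a few lines of Python. The figures below are invented for illustration, and the sketch assumes the number of flagged discrepancies scales with the number of invoices reviewed, which the article does not state. Only the fivefold speedup comes from the reporting.

```python
def backlog_after(periods: int, flagged_per_period: int, resolved_per_period: int) -> int:
    """Items left unresolved after `periods`, given a fixed downstream capacity."""
    backlog = 0
    for _ in range(periods):
        backlog += flagged_per_period                 # new discrepancies arrive
        backlog -= min(backlog, resolved_per_period)  # downstream team clears what it can
    return backlog


# Hypothetical numbers: the downstream team could just keep up before the agent.
BASELINE_FLAGS = 100        # discrepancies flagged per period before the agent (assumed)
DOWNSTREAM_CAPACITY = 100   # discrepancies resolved per period (assumed, unchanged)
SPEEDUP = 5                 # the agent reviews five times the volume (from the article)

before = backlog_after(4, BASELINE_FLAGS, DOWNSTREAM_CAPACITY)
after = backlog_after(4, BASELINE_FLAGS * SPEEDUP, DOWNSTREAM_CAPACITY)
print(before, after)  # prints "0 1600"
```

Under these assumptions the upstream speedup leaves the downstream queue growing by 400 items every period, which is the kind of bottleneck that would motivate a second agent.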
Source: The Wall Street Journal













