The token bill comes due: Inside the industry scramble to manage AI’s runaway costs

Across the industry, companies are starting to balk at the price of AI. Uber blew through its entire 2026 AI coding budget by April. Microsoft revoked its developers’ Claude Code licenses months after enabling them. A Priceline employee told TechCrunch that a routine Cursor contract renewal came back 4-5x more expensive.

Even though per-token prices have fallen, the push for more AI adoption and increasingly autonomous agents has driven token consumption higher and higher. Companies that gorged themselves in early 2025 on all-you-can-eat subscriptions are now scrambling to understand where their money is going, pull back spending, and figure out whether they can salvage any ROI from the wreckage of their budgets.

Meanwhile, a market is forming to meet them there. Startups, established vendors, and a new standards body are all racing to give companies the tools and language to track what they spend.

“Six months ago, I would have a conversation with a customer and it would be all about ‘What can it do? Is it good enough?’” Alexander Embricos, OpenAI’s head of enterprise, told TechCrunch at an event in New York City this week. “Our conversations are never about that now. Now the conversations are about, ‘hey, we’re spending so much. What visibility do you have? What auditability do you have? What token controls do you have? What is the efficiency of your models?’”

It’s against this backdrop that the Linux Foundation this week unveiled plans for the Tokenomics Foundation, a new standards body that aims to bring the same cost discipline to AI tokens that FinOps brought to cloud spend.

“In April and May, I started hearing from companies: ‘Oh my god, we are 3x over our entire 2026 token budget and it’s only April,’” J.R. Storment, executive director of the FinOps Foundation, a project under the Linux Foundation, told TechCrunch. “We started hearing existential crises, and the whole conversation shifted from tokenmaxxing and ‘go fast’ to ‘we need guardrails, how do we control this?’”

The cries heard round the tech world followed fervent demands from CEOs pushing their teams to use the best models and move fast, costs be damned. New models released in November like Anthropic’s Claude Opus 4.5, OpenAI’s GPT-5.1, and Google’s Gemini 3 Pro brought significant improvements to agentic tools, which have multiplied consumption. It’s how one company reportedly found itself with a $500 million Claude bill after forgetting to set usage limits for employees.

“It’s like the crack-cocaine epidemic,” says Chris Reed, senior director of IT business at Priceline, when asked about the pricing issue in using AI. “They let you try it to get you hooked on it, and now you’re kind of beholden to it.”

Vitaly Gordon, CEO of engineering operations platform Faros AI, said he recently spoke to a CTO who told him: “One of my engineers spent $40,000 on tokens last month, and I genuinely don’t know whether I should stop him or should I go and tell everyone else to be like him.”

A March study by Faros found that among 20,000 developers, output was rising, but so were bugs and rewrites. Jellyfish, an engineering management platform, likewise found that engineers who used the most tokens were about twice as productive as those who used AI less, but they spent 10x the number of tokens to get there.

Nicholas Arcolano, head of research at Jellyfish, told TechCrunch via email that spending on AI is exploding in large part due to agentic features, with per-developer consumption rising about 18.6x in nine months. All in all, these stats make the productivity case murkier than the spending suggests.

“Whether extreme spend pays off comes down to the ultimate business value of shipped code (e.g. revenue), which most companies still can’t measure,” Arcolano said.

At least part of that measurement problem is the sheer scale at which AI is being used today.

“Tracking cloud costs is a hundreds-of-millions-of-rows-a-month data problem,” Storment said. “Tracking token costs is a trillions-of-rows-a-month data problem. You can’t just stick that into some spreadsheet or even basic tool. You’ve got to fundamentally rethink your tooling, your specs and your accounting systems to do that.”

At Priceline, Reed is already seeing discrepancies. He noted issues between a vendor’s reported usage and Priceline’s internal data.

“I started my career in telecom expense management, and I’m seeing all the same parallels, from telecom to cloud to AI,” he said. “Anytime you introduce something new, it’s ripe for billing errors and audit and optimization opportunities.”

A market is beginning to form around this problem. There are the pure-play companies, like Pay-i, which tracks, measures and optimizes the costs and performance of GenAI investments. Paid, meanwhile, lets developers track costs, measure usage and bill users based on actual value rather than subscription fees.

Then there are companies like Jellyfish, Waydev and Faros AI, which all provide AI agent monitoring to prove the ROI of developer tools. Storment says most of the 180 vendors within the FinOps Foundation are leaning towards this space.

Companies with existing distribution are also adding new features to capitalize on this new market. Ramp has recently moved into AI spend management; Datadog and New Relic have tacked on services like cloud cost management, token-level observability, and GPU monitoring. At the FinOps X conference next week, AWS is expected to introduce new financial management features geared toward enterprise AI spending.

Tiffany Luck, a partner at NEA, thinks token efficiency and observability will likely be added in at the “harness or app layer.” She pointed to Factory, a startup that makes AI agents for enterprises, which this week launched a model router that automatically picks the right model for each task.

Gordon expects frontier labs and other model providers to adopt OpenRouter-style optimization to drive queries to the cheapest models, a trend already showing up on enterprise Claude bills.

“The financial report for how much you spend on Anthropic, even if you call the Opus model, some of the spend will be on Sonnet or Haiku, because they are smart enough to do it,” Gordon said. “I think this will become more and more of a thing.”

But all these tools are being built without a common language or shared definitions for how much a token costs, what it produces, and how to compare spend across vendors. That’s where the Tokenomics Foundation hopes to be useful.

The Foundation is building a canonical definition and framework for “tokenomics”; open standards, specifications and metrics for AI token usage and billing; as well as new metrics for AI economics, like cost-per-intelligence or tokens-per-watt. It also plans to define metrics across token factory effectiveness and consumption efficiency. The group is planning a formal launch in July, and is about to announce more members at the FinOps X conference next week.

“Token economics is fundamentally more abstract and opaque than anything we’ve managed at this scale before,” Nishant Gupta, chief availability officer at Salesforce, said in a statement. “It requires a different operational muscle than the one the industry built for cloud.”

That said, Goldman Sachs projects global token usage to multiply 24-fold by 2030. The companies already over budget need solutions now, and the foundation’s first deliverable is still months away.

“Maybe we created a steam engine, but we still haven’t figured out the assembly line,” said Gordon.

According to Arcolano, the smart move is broad, moderate adoption.

“The best ROI comes from moving the broad middle from low to moderate usage, not pushing heavy users higher,” he said.

Russell Brandom and Tim Fernholz contributed to this reporting.
