Innovation

AI Keeps Getting Cheaper. So Why Is Everyone's AI Bill Exploding?

The price of AI fell roughly 4x in a year, and GPT-4-class capability that cost $20 per million tokens in late 2022 now costs about 40 cents. Yet companies are blowing through annual AI budgets in months. Both things are true, and the reason matters.

June 12, 2026 · 6 min read

Here is a riddle from the strange economy we now live in. The price of running AI models has been falling faster than almost any technology input in history — the average cost per million tokens dropped from roughly $10 to $2.50 in a single year, and capability that cost $20 per million tokens in late 2022 now goes for around 40 cents. At the same time, one of the world's largest tech companies reportedly burned through its entire annual AI budget in four months.

Both facts are true. Understanding why they coexist is, I would argue, the single most useful piece of economic literacy for anyone working anywhere near technology right now.

The Collapse, by the Numbers

A token is the basic unit AI models read and write, roughly three-quarters of a word in English. When you hear "cost per million tokens," think of it as the price of having a machine read or write about 750,000 words, several novels' worth of text.
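To make those units concrete, here is a rough calculator. It relies on the three-quarters-of-a-word rule of thumb above and assumes a 100,000-word novel, a round number chosen for illustration rather than a figure from any pricing analysis. The prices are the per-million-token figures quoted earlier.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb from the text: a token is ~3/4 of a word
NOVEL_WORDS = 100_000   # assumed length of a typical novel (illustrative)


def tokens_for(words: float) -> float:
    """Approximate token count for a given number of English words."""
    return words / WORDS_PER_TOKEN


def cost_usd(words: float, usd_per_million_tokens: float) -> float:
    """Approximate cost to process `words` at a given per-million-token price."""
    return tokens_for(words) / 1_000_000 * usd_per_million_tokens


novels_per_million = 1_000_000 * WORDS_PER_TOKEN / NOVEL_WORDS
print(f"1M tokens is about {novels_per_million:.1f} novels")
for price in (20.00, 10.00, 2.50, 0.40):  # USD per million tokens, as quoted above
    print(f"${price:.2f}/M tokens: ${cost_usd(NOVEL_WORDS, price):.3f} per novel")
```

Under these assumptions, a million tokens is about seven and a half novels, and at 40 cents per million tokens a full novel's worth of text costs about a nickel to process.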

That price has been in free fall. Analyses of inference pricing find per-token costs for a fixed level of capability falling anywhere from 9x to several hundred-fold per year, depending on the capability tier you track. The blended industry average fell about 4x in the past year alone. The drivers are unglamorous and compounding: each new GPU generation delivers two to three times more throughput per dollar, software optimization pushed typical GPU utilization from around 35% to closer to 75%, models got architecturally leaner, and competition — particularly from price-aggressive entrants — squeezed margins across the board.
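The word "compounding" is doing real work here. Cost per token is roughly a GPU's hourly cost divided by the tokens it actually produces, so a hardware generation with 2 to 3 times more throughput per dollar and a jump in utilization from 35% to 75% multiply rather than add. This sketch treats the two factors as independent, which is a simplifying assumption, and leaves out leaner models and competition because the text gives no number for them.

```python
# Figures from the text; treating them as independent, multiplicative effects
# is a simplifying assumption made for illustration.
HW_GAIN_PER_GENERATION = (2.0, 3.0)  # throughput per dollar, one new GPU generation
UTIL_BEFORE, UTIL_AFTER = 0.35, 0.75  # typical GPU utilization, before and after


def cost_reduction(hw_gain: float, util_before: float, util_after: float) -> float:
    """Fold drop in cost per token.

    Cost per token ~ (GPU cost per hour) / (peak tokens per hour * utilization),
    so better hardware and higher utilization each divide the cost.
    """
    return hw_gain * (util_after / util_before)


low, high = (cost_reduction(g, UTIL_BEFORE, UTIL_AFTER) for g in HW_GAIN_PER_GENERATION)
print(f"{low:.1f}x to {high:.1f}x cheaper per token, before leaner models or competition")
```

Two drivers alone land at roughly 4.3x to 6.4x, which is why the combined decline can outrun any single improvement.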

There is a useful historical rhyme here. Electricity, computing cycles, bandwidth, storage: each followed the same arc, where the input became so cheap that the interesting question stopped being "what does it cost?" and became "what do we do now that it's nearly free?"

The Paradox: Cheaper Units, Bigger Bills

So why did Uber — per industry reporting — exhaust its 2026 AI budget by spring? Because adoption of AI coding tools among its roughly 5,000 engineers jumped from 32% to 84%, at a cost of $500 to $2,000 per engineer per month. Enterprise surveys tell the same story economy-wide: per-token prices down dramatically, total AI bills up roughly 320%, with inference now consuming about 85% of typical enterprise AI budgets.
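The Uber figures are easy to turn into a range. This sketch assumes the same $500 to $2,000 monthly cost per engineer applies both before and after the adoption jump, which the reporting does not state. Under that assumption, adoption alone multiplies spend by 0.84 / 0.32, about 2.6x, before any change in how heavily each engineer uses the tools.

```python
ENGINEERS = 5_000
ADOPTION = {"before": 0.32, "after": 0.84}  # share of engineers using AI coding tools
SEAT_COST_USD_PER_MONTH = (500, 2_000)      # reported range per engineer


def annual_spend(adoption: float, monthly_cost: float) -> float:
    """Yearly tool spend, assuming every adopter costs the same each month."""
    return ENGINEERS * adoption * monthly_cost * 12


for label, rate in ADOPTION.items():
    low, high = (annual_spend(rate, c) for c in SEAT_COST_USD_PER_MONTH)
    users = ENGINEERS * rate
    print(f"{label}: {users:,.0f} users, ${low / 1e6:.1f}M to ${high / 1e6:.1f}M per year")
```

That works out to roughly $25M to $101M a year at 84% adoption, against $9.6M to $38.4M at 32%: the kind of swing that can empty a budget planned around the earlier numbers.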

Economists have a name for this: the Jevons paradox, after William Stanley Jevons, who observed in 1865 that more-efficient steam engines led Britain to burn more coal, not less. When something useful gets cheaper, we do not pocket the savings; we find vastly more uses for it. Cheaper tokens did not shrink AI spending. They made it rational to point AI at problems that were uneconomical a year ago, and then at a thousand more behind those.

The arithmetic is straightforward. A 4x price drop paired with a 20x usage increase is a 5x bigger bill. And usage is growing that fast because the new applications genuinely work: agents that read entire codebases, assistants that draft and re-draft until something is right, pipelines that process every document instead of a sample. None of that was affordable at 2023 prices.
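The same arithmetic can be pushed one step further to state the Jevons condition precisely. If demand follows a constant price elasticity e (an illustrative modeling assumption, not something the surveys measure), usage scales as P^(-e) and spending as P^(1-e), so a price cut raises total spending exactly when e > 1. A 20x usage jump on a 4x price cut implies an elasticity of about 2.16.

```python
import math


def bill_multiplier(price_drop: float, usage_growth: float) -> float:
    """New bill / old bill when usage rises usage_growth-fold and unit price falls price_drop-fold."""
    return usage_growth / price_drop


def implied_elasticity(price_drop: float, usage_growth: float) -> float:
    """Constant elasticity e such that usage_growth == price_drop ** e (illustrative model)."""
    return math.log(usage_growth) / math.log(price_drop)


m = bill_multiplier(4, 20)
e = implied_elasticity(4, 20)
# Spending scales as P ** (1 - e), so it rises as prices fall whenever e > 1.
print(f"bill x{m:.1f}, implied elasticity {e:.2f}, spending rises: {e > 1}")
```

With e below 1, for instance usage merely doubling on the same 4x cut, the bill would halve instead; that is the ordinary, non-Jevons case.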

What Nearly-Free Intelligence Actually Enables

It is worth pausing on what changes when a unit of machine reasoning costs cents instead of dollars, because the second-order effects are bigger than the first.

  • Iteration becomes the default. When a draft costs half a cent, you generate twenty and keep the best. Quality stops being limited by the cost of attempts.
  • Whole categories of "too small to automate" work flip. Summarizing every customer call, triaging every support ticket, checking every contract clause — tasks that never justified human time or expensive compute suddenly clear the bar.
  • Small, local models inherit yesterday's frontier. Capability that required a data center two years ago now runs on a phone. More than two billion phones already run small language models locally, which moves AI into places with no connectivity, no cloud budget, and strict privacy needs.
  • The bottleneck moves. When intelligence is cheap, the scarce resources become everything around it: clean data, good judgment about what to build, energy to power the data centers, and people who can verify the output. Notably, the industry's own constraint has shifted from chips to electricity — power, not silicon, is the gating factor on the build-out now.
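The first bullet describes what is often called best-of-n sampling: draw several candidates and keep the one a checker rates highest. In this sketch, `fake_generate` and `fake_score` are hypothetical stand-ins for a model call and a quality check such as a test suite or rubric; only the half-cent-per-draft price comes from the text.

```python
import random
from typing import Callable


def best_of_n(generate: Callable[[], str], score: Callable[[str], float], n: int) -> str:
    """Draw n candidates and keep the one the scorer rates highest."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)


COST_PER_DRAFT_USD = 0.005  # "half a cent" per draft, per the text
N_DRAFTS = 20

rng = random.Random(42)  # seeded so the sketch is reproducible


def fake_generate() -> str:
    """Hypothetical stand-in for a model call: drafts of random 'quality'."""
    return "draft" + "!" * rng.randint(1, 10)


def fake_score(draft: str) -> float:
    """Hypothetical stand-in for a quality check (tests, rubric, reviewer model)."""
    return float(draft.count("!"))


best = best_of_n(fake_generate, fake_score, N_DRAFTS)
print(f"kept {best!r} out of {N_DRAFTS} drafts for ${N_DRAFTS * COST_PER_DRAFT_USD:.2f}")
```

Twenty drafts at half a cent each come to ten cents. At that price, the quality of the checker matters more than the number of attempts, which echoes the point about verification in the last bullet.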

Token Economics Is Now a Real Skill

For builders, the practical consequence is that cost-per-quality has become a core product discipline, the way page-load time became one for the web. A few habits separate teams that ride the cost curve from teams that get flattened by it:

  • Measure before optimizing. Most teams discover their spend is dominated by a handful of call sites — often retries, oversized prompts, or context windows stuffed with material the model never needed.
  • Route by difficulty. The largest savings come from sending easy tasks to small cheap models and reserving frontier models for the hard ones. Enterprises report cost reductions of up to roughly 75% from this kind of routing alone.
  • Cache aggressively. Re-answering the same question with fresh compute is the AI era's version of leaving the lights on.
  • Design model-agnostic. The price-performance leader changes every few months. Teams hard-wired to one provider pay yesterday's prices indefinitely.
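Here is a minimal sketch of how the four habits above might fit into one thin wrapper. Everything in it is hypothetical: `Model`, `CostAwareClient`, the prices, the length-based difficulty check, and the four-characters-per-token estimate are illustrative choices, not any provider's API. Keeping each provider behind a plain `complete` function is what makes the wrapper model-agnostic.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Model:
    """Provider-neutral handle (hypothetical; not any vendor's SDK)."""
    name: str
    usd_per_million_tokens: float
    complete: Callable[[str], str]  # wrap whichever provider's client you use


class CostAwareClient:
    """Measure spend per call site, route by difficulty, and cache repeated requests."""

    def __init__(self, small: Model, large: Model, is_hard: Callable[[str], bool]) -> None:
        self.small, self.large, self.is_hard = small, large, is_hard
        self.cache: Dict[Tuple[str, str], str] = {}
        self.spend_by_site: Dict[str, float] = defaultdict(float)

    def ask(self, call_site: str, prompt: str) -> str:
        # Route: only prompts flagged as hard go to the expensive model.
        model = self.large if self.is_hard(prompt) else self.small
        key = (model.name, prompt)
        if key in self.cache:
            return self.cache[key]  # Cache hit: no new compute, no new spend.
        answer = model.complete(prompt)
        # Measure: rough token estimate of ~4 characters per token (an assumption).
        tokens = (len(prompt) + len(answer)) / 4
        self.spend_by_site[call_site] += tokens / 1_000_000 * model.usd_per_million_tokens
        self.cache[key] = answer
        return answer

    def top_sites(self, n: int = 3) -> List[Tuple[str, float]]:
        """Call sites ranked by estimated spend, largest first."""
        return sorted(self.spend_by_site.items(), key=lambda kv: kv[1], reverse=True)[:n]


def echo(prompt: str) -> str:
    """Stand-in for a real model call."""
    return "ok: " + prompt[:20]


small = Model("small-model", 0.10, echo)  # hypothetical prices, USD per million tokens
large = Model("frontier-model", 10.00, echo)
client = CostAwareClient(small, large, is_hard=lambda p: len(p) > 200)

client.ask("ticket-triage", "Classify this ticket: printer on fire")
client.ask("ticket-triage", "Classify this ticket: printer on fire")  # served from cache
client.ask("code-review", "Review this diff: " + "x" * 500)
print(client.top_sites())
```

In a real system the difficulty check would more likely be a classifier or a cheap model than prompt length, and spend would come from the token counts a provider reports rather than a character estimate.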

For households and careers, the implication is gentler but real: the marginal cost of trying AI on your own work is approaching zero. The $20-per-month assistant now delivers what cost enterprises millions to assemble three years ago. The barrier to finding out whether AI helps your particular work is no longer money. It is the hour of attention required to try it seriously.

The Uncomfortable Reality for the AI Industry

One more wrinkle worth knowing, because it shapes everything downstream. Token-price deflation means AI companies must grow usage relentlessly just to keep revenue flat — a treadmill that keeps accelerating. That is why providers are racing up the value chain, selling outcomes and agents rather than raw tokens, and why the capital requirements keep climbing even as unit prices fall. The deflation that is wonderful for users is brutal for vendors, and the shakeout it forces will determine which of today's AI giants are still standing in five years.

Cheap intelligence, expensive electricity, exploding usage, compressing margins — that is the actual shape of the AI economy in 2026, and it looks less like a bubble popping or a rocket launching than like every previous foundational technology finding its price. The steam engine did not matter when it was novel. It mattered when it was cheap.

FAQ

If AI is getting so cheap, why do my ChatGPT and Claude subscriptions still cost the same?

Subscription prices are anchored to what the service is worth to you, not what it costs to run. Falling inference costs mostly show up as better models at the same price — today's $20 plan is dramatically more capable than 2024's — plus generous free tiers. The raw cost decline is most visible in API pricing, which is what businesses pay.

What is the Jevons paradox in one sentence?

When something useful becomes more efficient or cheaper, total consumption of it usually rises rather than falls, because new uses become economical — true of coal in 1865 and of AI tokens now.

Does the cost collapse mean AI companies are in trouble?

It pressures them. Falling unit prices force volume growth just to stand still, which is why providers are pushing toward higher-value products like agents and enterprise outcomes. Strong players with scale, efficient infrastructure, and sticky products can thrive; undifferentiated resellers of raw model access are the most exposed.

Will prices keep falling at this rate?

The forces driving the decline — hardware generations, software optimization, competition, smaller specialized models — all remain in motion, so further declines are likely. The countervailing pressure is physical: electricity and data-center capacity are the new constraints, and power costs in some regions are rising 30-50%. Cheap tokens still need affordable energy behind them.

What should a non-technical person actually do with this information?

Two things. First, re-test AI on your work every six months or so — capability per dollar improves fast enough that last year's "not good enough" verdicts expire. Second, treat AI fluency as a compounding skill: the tools will keep getting cheaper and better, but the judgment to direct and verify them is what employers consistently pay a premium for.

