Analysis · June 29, 2026 · 2 min read

AI's true cost emerges as vendors end subsidies, banks hunt cheaper options

Tech companies are switching from flat-rate AI pricing to per-token charges, forcing banks to confront real costs that far exceed what they budgeted. Some have burned through a full year's AI budget in months.

Our Take

The subsidy is over. Vendors can no longer hide compute costs behind bulk licensing, and some customers now facing the real bill are already building their own models rather than pay it.

Why it matters

Banks and enterprises are discovering that AI isn't cheap, and neither is the infrastructure buildout behind it, which is on track to consume roughly $2 trillion in capital (per analyst Azeem Azhar's estimate). If customers defect to in-house solutions or open-source models, the ROI math that justified vendor spending collapses.

Do this week

Finance: before month-end, compare each month's token consumption against your annual AI budget baseline so you can flag overage risk to procurement.

The subsidy era ends, real prices arrive

Tech vendors including Anthropic, OpenAI, and Microsoft are abandoning flat per-seat licensing and switching to per-token pricing. This shift exposes what was always true but hidden: running AI at scale is expensive, and vendors were absorbing losses to drive adoption.

Under the old model, companies paid a fixed rate regardless of usage, which encouraged what insiders call "tokenmaxxing": using as many tokens as possible because extra usage cost nothing more. Under the new model, cost scales with consumption, and the result is immediate: some enterprise customers have exhausted a full year's AI budget in months. PNC, for instance, is now building its own AI infrastructure in-house to reduce per-token exposure.
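The gap between the two pricing models is plain arithmetic. The sketch below prices one hypothetical workload both ways and shows how long a budget sized to the old flat fee lasts once usage is metered. The seat count, seat price, token volume, and per-million-token rate are invented for illustration and are not any vendor's actual prices.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical usage profile for one organization."""
    seats: int                  # licensed users
    tokens_per_seat_month: int  # average tokens each user consumes per month


def flat_monthly_cost(w: Workload, price_per_seat: float) -> float:
    """Flat per-seat licensing: the bill ignores token volume."""
    return w.seats * price_per_seat


def per_token_monthly_cost(w: Workload, price_per_million: float) -> float:
    """Per-token billing: the bill scales linearly with tokens consumed."""
    tokens = w.seats * w.tokens_per_seat_month
    return tokens / 1_000_000 * price_per_million


def months_until_exhausted(annual_budget: float, monthly_cost: float) -> float:
    """Months an annual budget lasts at a constant monthly spend."""
    return annual_budget / monthly_cost


if __name__ == "__main__":
    # Illustrative assumptions only, not real vendor prices.
    heavy = Workload(seats=1_000, tokens_per_seat_month=5_000_000)
    flat = flat_monthly_cost(heavy, price_per_seat=30.0)
    metered = per_token_monthly_cost(heavy, price_per_million=10.0)
    budget = 12 * flat  # annual budget sized to the old flat fee
    print(f"flat fee: ${flat:,.0f}/month")
    print(f"metered:  ${metered:,.0f}/month")
    print(f"budget lasts {months_until_exhausted(budget, metered):.1f} months")
```

With these made-up inputs, metering costs $50,000 a month against a $30,000 flat fee, so the year's budget runs out after about 7.2 months. Doubling per-user consumption would cost nothing extra under the flat fee but would double the metered bill.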

The pricing opacity is beginning to clear. A startup called Ornn is publishing token price indices derived from executed GPU transactions. A second firm, IFX, is building derivatives products on top of that data, allowing traders to hedge compute costs like any commodity.
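Once a published index exists, compute can be hedged the way other commodities are: two parties agree to exchange a floating index price for a fixed one. The sketch below is a generic fixed-for-floating swap settled against a compute price index. It is a textbook commodity hedge shown for illustration only, not a description of IFX's products or Ornn's index methodology, and the fixed price, volume, and index values are invented.

```python
def swap_settlement(index_price: float, fixed_price: float, units: float) -> float:
    """Cash the hedger receives (positive) or pays (negative) at settlement.

    The hedger pays fixed and receives floating: if the index rises above the
    fixed price, the swap pays out the difference on the hedged volume.
    """
    return (index_price - fixed_price) * units


def effective_cost(index_price: float, fixed_price: float, units: float) -> float:
    """Spot purchase cost at the index price, net of the swap settlement."""
    spot_cost = index_price * units
    return spot_cost - swap_settlement(index_price, fixed_price, units)


if __name__ == "__main__":
    fixed = 2.00      # hypothetical agreed price per unit of compute
    volume = 100_000  # hypothetical units of compute bought per month
    for index in (1.50, 2.00, 3.00):  # hypothetical index outcomes
        print(
            f"index ${index:.2f}: settlement "
            f"${swap_settlement(index, fixed, volume):+,.0f}, "
            f"net cost ${effective_cost(index, fixed, volume):,.0f}"
        )
```

Whatever the index does, the hedger's net cost stays at the fixed price times the volume ($200,000 here); the trade-off is giving up the savings when the index falls.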

The math is tightening on vendors and creditors alike

The AI buildout is on track to consume roughly $2 trillion in capital (per analyst Azeem Azhar). Early spending came from hyperscalers themselves, but increasingly, the money is borrowed from banks and private credit. That debt carries a demand: profitability.

Azhar estimates that to earn a 25% return on $8 billion in annual operating costs for 1 gigawatt of AI capacity, vendors need to charge between $1.05 and $2.10 per GPU-hour. Recent data from Ornn puts H100 GPU pricing at $2.45 per GPU-hour, only $0.35 above the top of that range, which leaves vendors at thin or even negative margins once overhead is included.
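The comparison in Azhar's estimate is simple arithmetic, and the sketch below makes the steps explicit under two labeled assumptions: that a "25% return on $8 billion in operating costs" means revenue must reach 1.25 times those costs, and that the quoted prices are per GPU-hour, the unit in which H100 capacity is normally rented. It does not reproduce Azhar's underlying model; it only checks how much headroom the reported market price leaves above his range.

```python
# Figures quoted in the article. Treating prices as per GPU-hour is an assumption.
ANNUAL_OPEX_USD = 8e9                # annual operating cost for 1 GW (Azhar)
TARGET_RETURN = 0.25                 # desired return on that cost (Azhar)
REQUIRED_PRICE_RANGE = (1.05, 2.10)  # required price per GPU-hour, low/high (Azhar)
MARKET_PRICE = 2.45                  # reported H100 price per GPU-hour (Ornn)


def revenue_target(opex: float, target_return: float) -> float:
    """Revenue needed for a simple markup of `target_return` over `opex` (assumption)."""
    return opex * (1 + target_return)


def implied_gpu_hours(revenue: float, price: float) -> float:
    """GPU-hours that must be sold at `price` to reach `revenue`."""
    return revenue / price


def headroom(market: float, required: float) -> float:
    """Market price's margin over the required price, as a fraction of the required price."""
    return (market - required) / required


if __name__ == "__main__":
    target = revenue_target(ANNUAL_OPEX_USD, TARGET_RETURN)
    print(f"revenue target: ${target / 1e9:.1f}B per year")
    for required in REQUIRED_PRICE_RANGE:
        hours = implied_gpu_hours(target, required)
        print(
            f"required ${required:.2f}/GPU-hr -> {hours / 1e9:.2f}B GPU-hours sold, "
            f"market headroom {headroom(MARKET_PRICE, required):.0%}"
        )
```

At the top of the range, the market price clears the required price by only about 17% ($0.35 per GPU-hour), a cushion that any overhead outside the $8 billion figure would erode. At the bottom of the range the headroom is far larger, so the squeeze depends on where a given vendor's costs actually sit.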

This creates a two-way squeeze. Vendors cannot subsidize indefinitely without disappointing lenders. Customers paying real prices now have a rational incentive to defect to cheaper alternatives: open-source models, older but functional models, or internal infrastructure. Each defection shrinks the customer base vendors need to justify the capital they have raised.

Audit usage, plan alternatives, lock costs now

Banks and enterprises should treat token pricing as a variable cost that demands monthly governance, not a fixed IT line item. Nathan Place's reporting (per American Banker) documents that companies are already pursuing three cost-reduction strategies: migrating to open-source models, reverting to older models that still deliver value, and building proprietary inference layers.
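Governing token spend as a variable cost can start with a simple monthly check. The sketch below compares year-to-date spend with a straight-line share of the annual budget and projects year-end spend at the year-to-date run rate. The straight-line baseline, the 10% tolerance, and all figures are illustrative assumptions; in practice the monthly amounts would come from your vendor's billing export.

```python
from dataclasses import dataclass


@dataclass
class AuditResult:
    month: int               # 1-12
    cumulative_spend: float  # year-to-date token spend
    pro_rata_budget: float   # straight-line share of the annual budget to date
    projected_annual: float  # year-end spend at the year-to-date average run rate
    flag: bool               # True if over pro-rata budget by more than the tolerance


def audit(monthly_spend: list[float], annual_budget: float,
          tolerance: float = 0.10) -> list[AuditResult]:
    """Compare year-to-date token spend with a straight-line budget, month by month."""
    results = []
    cumulative = 0.0
    for month, spend in enumerate(monthly_spend, start=1):
        cumulative += spend
        pro_rata = annual_budget * month / 12
        projected = cumulative / month * 12
        results.append(AuditResult(
            month=month,
            cumulative_spend=cumulative,
            pro_rata_budget=pro_rata,
            projected_annual=projected,
            flag=cumulative > pro_rata * (1 + tolerance),
        ))
    return results


if __name__ == "__main__":
    # Hypothetical monthly token bills (USD) against a $600,000 annual budget.
    spend = [40_000, 55_000, 80_000, 110_000]
    for r in audit(spend, annual_budget=600_000):
        status = "FLAG" if r.flag else "ok"
        print(f"month {r.month}: spent ${r.cumulative_spend:,.0f} vs "
              f"${r.pro_rata_budget:,.0f} pro rata, "
              f"projected ${r.projected_annual:,.0f}/yr [{status}]")
```

With these hypothetical bills, month 3 is the first flag ($175,000 spent against $150,000 pro rata), and by month 4 the run rate projects $855,000 for the year against a $600,000 budget.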

If your organization has not yet migrated to per-token billing, the transition is coming. If it has, you need a monthly consumption audit against budget and a written decision on whether to invest in in-house inference, negotiate multi-year fixed rates, or shift workloads to cheaper model architectures. The subsidy window is closed. The only variable left is how quickly you adapt.
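The cost side of that decision can be framed as a comparison of linear cost curves, where each option has a fixed monthly cost and a marginal cost per million tokens. The sketch below finds the volume at which in-house inference beats a per-token API and picks the cheapest option at a given volume. Every figure is a hypothetical placeholder, the linear model is a simplification, and cost alone ignores model quality, staffing, and contract terms, which a real decision must weigh.

```python
from typing import Optional


def break_even_volume(fixed_monthly: float, inhouse_per_million: float,
                      api_per_million: float) -> Optional[float]:
    """Monthly volume (millions of tokens) above which in-house beats the API.

    Returns None if the in-house marginal cost is not below the API price,
    in which case in-house never breaks even on cost alone.
    """
    spread = api_per_million - inhouse_per_million
    if spread <= 0:
        return None
    return fixed_monthly / spread


def cheapest_option(volume_millions: float,
                    options: dict[str, tuple[float, float]]) -> str:
    """Name of the option with the lowest monthly cost at this volume.

    Each option is (fixed monthly cost, cost per million tokens).
    """
    return min(options,
               key=lambda name: options[name][0] + options[name][1] * volume_millions)


if __name__ == "__main__":
    # (fixed monthly cost USD, cost per million tokens USD): all hypothetical
    options = {
        "per-token API": (0.0, 10.0),
        "discounted committed rate": (5_000.0, 7.0),
        "in-house inference": (60_000.0, 2.0),
    }
    print("in-house beats API above",
          break_even_volume(60_000.0, 2.0, 10.0), "million tokens/month")
    for volume in (500, 5_000, 20_000):
        print(volume, "->", cheapest_option(volume, options))
```

With these placeholder numbers, in-house inference beats the per-token API above 7,500 million tokens a month, but it only becomes the cheapest of the three options above 11,000 million, because the discounted committed rate undercuts it in between. Checking every option against every other, not just against the status quo, is the point of writing the decision down.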
