Banks face 500% token bill spike as AI pricing shifts

Token-based pricing has caught banks off guard

Over the past several months, major AI providers, including Anthropic, OpenAI, and Microsoft, have shifted their billing from flat subscriptions to per-token charges. Tokens are the small chunks of text, often word fragments, that language models read and generate, and every token processed adds to the bill. The shift has caused immediate sticker shock in banking.

Royal Bank of Canada reported a 500% jump in its token usage from 2025 to 2026. Bill Demchak, CEO of PNC Financial Services Group, said bluntly that "these tokens are really expensive." Zachery Anderson, chief data and analytics officer at JPMorgan Chase, told Semafor that some employees are "spending more on tokens than their salary."

The financial impact matters because banks adopted AI specifically to reduce costs. If token spending rises faster than productivity gains, that premise collapses. "Any impact that AI can have on the productivity of a bank, that productivity can be taken away by the cost of tokens," Demchak said at the Morgan Stanley U.S. Financials Conference.
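
To make the arithmetic concrete, here is a minimal Python sketch of per-token billing and the productivity trade-off Demchak describes. The prices and workload sizes are hypothetical placeholders, not any provider's actual rates or any bank's actual usage. Note that a 500% increase means usage six times the baseline, so at fixed prices the bill grows sixfold.

```python
# Hypothetical prices in USD per million tokens; real rates vary by provider and model.
PRICE_PER_M_INPUT = 3.00
PRICE_PER_M_OUTPUT = 15.00


def token_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a workload under per-token billing."""
    return (input_tokens * PRICE_PER_M_INPUT
            + output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000


def net_benefit(productivity_value: float, input_tokens: int, output_tokens: int) -> float:
    """Value of the work saved minus the token bill; negative means AI is a net loss."""
    return productivity_value - token_cost(input_tokens, output_tokens)


# Hypothetical workload, then the same workload after a 500% increase (6x usage).
baseline = token_cost(10_000_000, 2_000_000)
after_jump = token_cost(60_000_000, 12_000_000)
print(baseline, after_jump)  # 60.0 360.0
```

With a hypothetical $100 of productivity value, the baseline workload nets $40, while the sixfold workload loses $260: the same gains, wiped out by the token bill.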

The problem is model selection, not technology itself

Banks have been deploying agentic models—AI systems that can make decisions and complete tasks with minimal human oversight—for tasks that do not require that level of sophistication. Rob May, CEO of Neurometric, a consulting firm specializing in AI spend, explained: "If you're going to do simple tasks, those models are overkill. Why spend all the money on a giant model that can do that, but can also write you a great lasagna recipe and give you workout instructions?"

The fix is to step down deliberately: use smaller, older, or custom-built models for routine work. PNC has already adopted this approach. Ned Carroll, PNC's head of data and automation, noted: "I don't need a model to answer advanced calculus when I want to understand a policy or procedure around a check return."
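
A minimal sketch of that step-down as a routing rule follows. It assumes a hypothetical pair of models (`small-model` and `frontier-model`), made-up prices, and a hand-written set of task labels treated as routine. It illustrates the idea only and is not PNC's actual setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str                  # hypothetical model identifier
    price_per_m_tokens: float  # hypothetical blended USD price per million tokens


SMALL = Model("small-model", 0.50)
FRONTIER = Model("frontier-model", 15.00)

# Hypothetical labels for routine work; a real router might use a cheap classifier instead.
ROUTINE_TASKS = {"classification", "summarization", "policy_lookup"}


def route(task_type: str) -> Model:
    """Send routine tasks to the small model; reserve the frontier model for everything else."""
    return SMALL if task_type in ROUTINE_TASKS else FRONTIER


def cost(task_type: str, tokens: int) -> float:
    """USD cost of a task given the model it is routed to."""
    return route(task_type).price_per_m_tokens * tokens / 1_000_000


# A policy lookup like Carroll's check-return question costs 1.0 for 2M tokens here,
# versus 30.0 if it had gone to the frontier model.
print(cost("policy_lookup", 2_000_000), cost("multi_step_agent", 2_000_000))
```

The design choice is to make the expensive model the exception rather than the default, so any task not explicitly recognized as routine still gets full capability.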

Banks are also exploring open-source models, which carry no per-token fee when run on the bank's own infrastructure, and response caching: storing answers to frequently asked queries so that repeat requests hit a database instead of the model and incur no charge.
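
Response caching can be sketched as a small wrapper around whatever function calls the model. In this minimal Python sketch, `call_model` is a hypothetical stand-in for a billed API call, an in-memory dict stands in for the database, and the one-day expiry is an arbitrary assumption for answers that change occasionally, such as policy text.

```python
import time
from typing import Callable


class ResponseCache:
    """Answers repeated queries from storage so only cache misses reach the model."""

    def __init__(self, call_model: Callable[[str], str], ttl_seconds: float = 86_400):
        self.call_model = call_model    # hypothetical function that incurs token charges
        self.ttl_seconds = ttl_seconds  # expire entries so stale answers get refreshed
        self.store: dict[str, tuple[float, str]] = {}
        self.model_calls = 0            # counts billed calls, for illustration

    @staticmethod
    def _key(query: str) -> str:
        # Normalize case and whitespace so trivially different phrasings share an entry.
        return " ".join(query.lower().split())

    def ask(self, query: str) -> str:
        key = self._key(query)
        now = time.monotonic()
        hit = self.store.get(key)
        if hit is not None and now - hit[0] < self.ttl_seconds:
            return hit[1]               # cache hit: no model call, no token charge
        self.model_calls += 1           # cache miss: pay for one model call
        answer = self.call_model(query)
        self.store[key] = (now, answer)
        return answer


def fake_model(query: str) -> str:
    """Hypothetical stand-in for a billed model call."""
    return f"answer to: {query}"


cache = ResponseCache(fake_model)
cache.ask("What is the check return policy?")
cache.ask("what is the  check return policy?")
print(cache.model_calls)  # 1
```

Exact-match keys only catch repeats that normalize to the same text; a production system would likely use a shared store rather than a per-process dict.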

Three tactics to control token spend immediately

Right-size the model for the task. Older, smaller, and custom-built language models handle classification, summarization, and routine lookup work without the cost overhead of frontier models.
Cache answers to repeated queries. If the same question is asked frequently and the answer is stable, store it and query the database first. No model call means no token charge.
Build or buy GPU capacity in-house. PNC has committed to owning its own graphics processing units and reducing reliance on third-party token purchases. This trades upfront capital for long-term control.
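
The GPU trade-off reduces to a break-even calculation: upfront hardware spending divided by the monthly savings from no longer buying tokens. A minimal Python sketch with hypothetical inputs follows; it ignores depreciation and financing, and assumes power, hosting, and staff costs are folded into `monthly_opex`.

```python
import math


def breakeven_months(upfront_capex: float, monthly_api_spend: float, monthly_opex: float) -> float:
    """Months until owned GPUs pay for themselves versus buying tokens.

    Returns math.inf if running the hardware costs as much as or more than the API bill.
    """
    monthly_savings = monthly_api_spend - monthly_opex
    if monthly_savings <= 0:
        return math.inf
    return upfront_capex / monthly_savings


# Hypothetical: $2.4M of hardware replacing a $300k/month token bill at $100k/month to run.
print(breakeven_months(2_400_000, 300_000, 100_000))  # 12.0
```

If token spend is small relative to running costs, the hardware never pays back, which is why ownership makes most sense for heavy, sustained usage.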

One more option exists: for high-risk tasks where AI failure is costly, revert to human labor. May noted: "There are still use cases for that." Sometimes the cheapest AI is no AI.