Analysis · June 29, 2026 · 3 min read

Banks face 500% token bill spike as AI pricing shifts

Royal Bank of Canada's token usage jumped 500% in one year as AI providers switched from subscriptions to per-token billing. Here's how banks are cutting costs without ditching AI.

Our Take

Banks expected AI to cut costs; instead, per-token pricing has turned it into a line-item threat, and the fix is unsexy: use older, smaller models for routine work.

Why it matters

Finance leaders are discovering that agentic AI models, the latest frontier, can burn through tokens so quickly that some employees' token bills exceed their salaries. This is reshuffling which AI tools banks actually deploy and forcing a hard look at build-versus-buy decisions.

Do this week

Finance chief: Audit your last 90 days of token spend by use case and model size. Cache repeated queries, and flag any agentic model handling routine tasks that an older or smaller model could handle instead.
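A minimal sketch of that audit, assuming the bank can export one usage record per request with a use-case label, model name, token counts, and cost. The record fields, the model-to-tier mapping, and the set of "routine" use cases are hypothetical placeholders, not any provider's billing schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    use_case: str       # business label attached to each request (hypothetical)
    model: str          # model name as it appears on the invoice (hypothetical)
    input_tokens: int
    output_tokens: int
    cost_usd: float


# Hypothetical mapping from model name to size tier; each bank defines its own.
MODEL_TIER = {"frontier-agent": "large", "small-instruct": "small"}
# Hypothetical set of use cases the bank treats as routine.
ROUTINE_USE_CASES = {"policy_lookup", "summarization", "classification"}


def audit(records):
    """Total spend and tokens per (use case, model tier); flag routine work on large models."""
    spend = defaultdict(float)
    tokens = defaultdict(int)
    for r in records:
        key = (r.use_case, MODEL_TIER.get(r.model, "unknown"))
        spend[key] += r.cost_usd
        tokens[key] += r.input_tokens + r.output_tokens
    # Candidates for a smaller model, most expensive first.
    flagged = sorted(
        (key for key in spend if key[0] in ROUTINE_USE_CASES and key[1] == "large"),
        key=lambda key: spend[key],
        reverse=True,
    )
    return dict(spend), dict(tokens), flagged


records = [
    UsageRecord("policy_lookup", "frontier-agent", 1_200, 400, 0.05),
    UsageRecord("policy_lookup", "frontier-agent", 900, 300, 0.04),
    UsageRecord("fraud_investigation", "frontier-agent", 5_000, 2_000, 0.30),
    UsageRecord("summarization", "small-instruct", 3_000, 500, 0.002),
]
spend, tokens, flagged = audit(records)
print(flagged)  # [('policy_lookup', 'large')]
```

Here only the routine policy lookups running on the large model are flagged. The fraud investigation stays on the large model because it is not on the routine list.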

Token-based pricing has caught banks off guard

Over the past several months, major AI providers, including Anthropic, OpenAI, and Microsoft, have shifted their billing from flat subscriptions to per-token charges. Tokens are the chunks of text (whole words or word fragments) that language models read and generate, and every one processed is metered. The shift has produced immediate sticker shock in banking.

Royal Bank of Canada reported that its token usage rose 500% from 2025 to 2026. Bill Demchak, CEO of PNC Financial Services Group, said bluntly that "these tokens are really expensive." Zachery Anderson, chief data and analytics officer at JPMorgan Chase, told Semafor that some employees are "spending more on tokens than their salary."
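Under per-token pricing, the bill is token volume multiplied by a rate, summed over input and output tokens. At fixed rates, spend therefore grows in step with usage. The sketch below assumes rates quoted per million tokens, with output priced above input, as major providers commonly do. The specific rates and volumes are hypothetical placeholders, not any provider's published prices or any bank's actual usage.

```python
def token_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost for a billing period under per-token pricing.

    Prices are expressed per million tokens; input and output are priced separately.
    """
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Hypothetical rates: $3 per million input tokens, $15 per million output tokens.
IN_RATE, OUT_RATE = 3.0, 15.0

baseline = token_cost(200_000_000, 50_000_000, IN_RATE, OUT_RATE)
# A 500% increase means six times the original volume; at unchanged rates
# the bill grows by the same factor.
grown = token_cost(6 * 200_000_000, 6 * 50_000_000, IN_RATE, OUT_RATE)
print(baseline, grown)  # 1350.0 8100.0
```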

The financial impact matters because banks adopted AI specifically to reduce costs. If token consumption rises faster than productivity gains, the premise collapses. "Any impact that AI can have on the productivity of a bank, that productivity can be taken away by the cost of tokens," Demchak said at the Morgan Stanley U.S. Financials Conference.

The problem is model selection, not technology itself

Banks have been deploying agentic models—AI systems that can make decisions and complete tasks with minimal human oversight—for tasks that do not require that level of sophistication. Rob May, CEO of Neurometric, a consulting firm specializing in AI spend, explained: "If you're going to do simple tasks, those models are overkill. Why spend all the money on a giant model that can do that, but can also write you a great lasagna recipe and give you workout instructions?"

The solution is a deliberate step-down: use smaller, older, or custom-built models for routine work. PNC has already adopted this approach. Ned Carroll, PNC's head of data and automation, noted: "I don't need a model to answer advanced calculus when I want to understand a policy or procedure around a check return."
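One way to picture a step-down is as a router that checks the task type before choosing a model. The following is a minimal sketch, assuming each request arrives already labeled with a task type. The tier names and the list of routine tasks are hypothetical, not PNC's setup.

```python
from enum import Enum


class Tier(Enum):
    SMALL = "small-model"         # hypothetical: older, smaller, or custom model
    FRONTIER = "frontier-model"   # hypothetical: large agentic model


# Hypothetical catalogue of task types a bank might treat as routine.
ROUTINE_TASKS = {"policy_question", "classification", "summarization", "lookup"}


def choose_model(task_type: str) -> Tier:
    """Route routine work to the cheap tier; everything else goes to the frontier tier."""
    return Tier.SMALL if task_type in ROUTINE_TASKS else Tier.FRONTIER


print(choose_model("policy_question").value)      # small-model
print(choose_model("multi_step_research").value)  # frontier-model
```

Unknown task types default to the frontier tier here, which favors answer quality over cost. A bank optimizing purely for spend might invert that default.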

Banks are also exploring open-source models, which carry no per-token fee when run on the bank's own hardware. They are also testing response caching, which stores answers to frequently asked queries so repeat requests hit a database instead of the model and incur no charge.
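A minimal in-memory sketch of response caching, assuming the answers are stable and that exact matching after light normalization is acceptable. `call_model` is a hypothetical stand-in for whatever paid model API the bank uses. A dictionary stands in for the database, and a production system would add a shared store and expiry rules.

```python
from typing import Callable, Dict


class CachedAssistant:
    """Serve repeated questions from a cache so only new ones reach the paid model."""

    def __init__(self, call_model: Callable[[str], str]):
        self._call_model = call_model        # hypothetical paid model call
        self._cache: Dict[str, str] = {}     # stands in for a database table
        self.model_calls = 0

    @staticmethod
    def _key(question: str) -> str:
        # Normalize case and whitespace so trivially different phrasings share an entry.
        return " ".join(question.lower().split())

    def ask(self, question: str) -> str:
        key = self._key(question)
        if key in self._cache:
            return self._cache[key]          # cache hit: no model call, no token charge
        answer = self._call_model(question)  # cache miss: tokens are billed here
        self.model_calls += 1
        self._cache[key] = answer
        return answer


def fake_model(question: str) -> str:
    # Placeholder for a real API call.
    return "Answer: " + question


assistant = CachedAssistant(fake_model)
first = assistant.ask("How do I process a returned check?")
second = assistant.ask("how do I process a  returned  check?")
print(assistant.model_calls)  # 1
```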

Three tactics to control token spend immediately

  • Right-size the model for the task. Older, smaller, and custom-built language models handle classification, summarization, and routine lookup work without the cost overhead of frontier models.
  • Cache answers to repeated queries. If the same question is asked frequently and the answer is stable, store it and query the database first. No model call means no token charge.
  • Build or buy GPU capacity in-house. PNC has committed to owning its own graphics processing units and reducing reliance on third-party token purchases. This trades upfront capital for long-term control.
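Whether owning GPUs pays off depends on each bank's numbers. A simple break-even sketch divides the hardware outlay by the monthly token spend it avoids, net of the cost of running that hardware. Every figure below is a hypothetical placeholder, not PNC data.

```python
import math


def breakeven_months(capex, token_spend_avoided_per_month, operating_cost_per_month):
    """Months until avoided token purchases repay an up-front hardware outlay.

    Returns math.inf when running the hardware costs at least as much as the
    tokens it replaces, i.e. the investment never pays back.
    """
    monthly_saving = token_spend_avoided_per_month - operating_cost_per_month
    if monthly_saving <= 0:
        return math.inf
    return capex / monthly_saving


# Hypothetical figures: $2.4M of GPUs, $250k/month of token purchases avoided,
# $50k/month for power, hosting, and staff.
print(breakeven_months(2_400_000, 250_000, 50_000))  # 12.0
```

This ignores depreciation, hardware refresh cycles, and whether in-house models match the quality of the API models they replace. Each of these factors can move the break-even point substantially.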

One more option exists: for high-risk tasks where AI failure is costly, revert to human labor. May noted: "There are still use cases for that." Sometimes the cheapest AI is no AI.
