News · May 20, 2026 · 3 min read

Google processes 3.2 quadrillion tokens monthly as Gemini 3.5 Flash ships

Google said its token consumption grew roughly sevenfold in a year, to 3.2 quadrillion tokens per month. The company claims its new Gemini 3.5 Flash model runs 4x faster than competing models at less than half the price. It is available now across products and APIs.

Our Take

Google is shipping real agentic products (Ask Maps, Ask YouTube, voice Docs) alongside measurable growth in Gemini app and Search usage. But the headline metric, token volume, obscures whether these features solve customer problems or just process more requests.

Why it matters

Practitioners budgeting AI spend need to know if Flash's price-to-speed ratio actually reduces their token costs or just enables more consumption. Google's infrastructure claims matter only if they translate to lower end-user latency or cheaper API pricing at scale.

Do this week

Finance: audit your token consumption across all Gemini models and estimate savings if you shift 80% of workloads to 3.5 Flash before Q3 budget planning.
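
A minimal sketch of that estimate, assuming you can export monthly token counts per model from your billing data. The model names, token counts, and per-million-token prices below are hypothetical placeholders, not Google list prices, and the sketch uses one blended price per model instead of separating input and output tokens.

```python
def estimate_shift_savings(monthly_tokens, price_per_million,
                           flash_price_per_million, shift_fraction=0.8):
    """Return (current_cost, projected_cost, savings) in dollars per month.

    monthly_tokens:    dict of model name -> tokens consumed per month
    price_per_million: dict of model name -> dollars per one million tokens
    shift_fraction:    share of each model's tokens moved to the new model
    """
    current = 0.0
    projected = 0.0
    for model, tokens in monthly_tokens.items():
        old_price = price_per_million[model]
        moved = tokens * shift_fraction
        kept = tokens - moved
        current += tokens / 1e6 * old_price
        projected += kept / 1e6 * old_price + moved / 1e6 * flash_price_per_million
    return current, projected, current - projected


# Hypothetical usage and prices; replace with your own contract figures.
usage = {"pro-tier": 2_000_000_000, "older-flash": 8_000_000_000}
prices = {"pro-tier": 10.0, "older-flash": 1.0}

current, projected, savings = estimate_shift_savings(
    usage, prices, flash_price_per_million=2.0
)
print(f"current ${current:,.0f}  projected ${projected:,.0f}  savings ${savings:,.0f}")
```

With these placeholder numbers the shift saves money on the expensive tier but costs more on the cheaper one, so a blanket 80% shift is worth checking model by model.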

Google ships Gemini 3.5 Flash with 4x faster output than rivals

Google released Gemini 3.5 Flash today, positioning it as a frontier-capable model that runs four times faster than competing models while costing less than half the price of other frontier options (per Google). The company claims the model outperforms Gemini 3.1 Pro across most benchmarks, with substantial gains in coding tasks and on GDPVal, a metric capturing real-world economically valuable workflows.

Token processing inside Google has accelerated sharply. In March, Google processed 500 billion tokens daily across internal AI developer tools. That figure has since grown about sixfold: the company now processes over 3 trillion tokens per day internally. Across all surfaces (products, APIs, enterprise customers), Google reports processing 3.2 quadrillion tokens monthly, a roughly sevenfold increase (about 6.7x) from the prior year's 480 trillion.
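
The growth multiples can be checked directly from the reported figures. This is a quick sketch using only the numbers quoted above; "over 3 trillion" is treated as exactly 3 trillion, so the internal multiple is a lower bound.

```python
import math

monthly_now = 3.2e15     # 3.2 quadrillion tokens per month (reported)
monthly_prior = 480e12   # 480 trillion tokens per month a year earlier (reported)
daily_march = 500e9      # internal tools, tokens per day in March (reported)
daily_now = 3e12         # internal tools now; "over 3 trillion", so a lower bound

yearly_multiple = monthly_now / monthly_prior   # about 6.67x
internal_multiple = daily_now / daily_march     # 6.0x
doublings = math.log2(internal_multiple)        # about 2.58 doublings

print(f"all surfaces, year over year: {yearly_multiple:.2f}x")
print(f"internal tools since March:   {internal_multiple:.1f}x "
      f"({doublings:.2f} doublings)")
```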

The Gemini app has grown to 900 million monthly active users, more than double the 400 million reported a year ago. Daily requests within the app have grown more than sevenfold over the same period. Search-based features, including AI Overviews (2.5 billion monthly active users) and AI Mode (1 billion monthly active users), continue to expand user engagement.

Google also announced new conversational features rolling out across products: Ask YouTube (in testing in the U.S. this summer), voice-powered Docs Live for Google Docs subscribers (this summer), and expanded voice capabilities coming to Gmail and Keep. The company detailed infrastructure investments, including its eighth-generation TPU chips (TPU 8t for training, 8i for inference) and distributed training across more than 1 million TPUs globally. Google's annual capex is expected to reach $180 to $190 billion this year, up from $31 billion in 2022.

On AI safety, Google announced that OpenAI, Kakao, and Eleven Labs have adopted SynthID, Google's invisible watermark for AI-generated media. SynthID has watermarked over 100 billion images and videos to date. Content Credentials verification, showing whether content originated from a camera or AI, will roll out to Search and Chrome.

Token volume growth does not prove product value

The 3.2-quadrillion-token headline is a supply-side metric: it measures what Google processes, not what customers extract from it. High token volume can signal strong adoption or inefficient consumption. Without customer-reported cost savings or latency improvements, the figure remains a throughput vanity metric.

Gemini 3.5 Flash's claimed speed advantage, four times faster output than frontier competitors, is measurable and matters for latency-sensitive applications, though the figure is Google's own. The price-to-capability ratio, if confirmed in independent benchmarks, could reduce total cost of ownership for high-volume users. Yet Google's own internal token consumption spike (from 500 billion to more than 3 trillion daily within months) suggests that cheaper, faster inference may simply increase token burn rather than shrink it. Enterprises that have already exhausted their annual AI budgets by May need to know whether Flash eases budget constraints or enables overspending.
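
The trade-off between a lower price and higher volume reduces to simple arithmetic: spend is price times volume, so a price cut only lowers spend if volume grows by less than the inverse of the price ratio. The sketch below assumes a price ratio of 0.5 (half price) purely for illustration and borrows the sixfold internal growth figure as an example volume multiple. It does not describe Google's actual spend.

```python
def spend_multiple(price_ratio, volume_multiple):
    """New spend as a multiple of old spend (spend = price x volume)."""
    return price_ratio * volume_multiple


def break_even_volume_multiple(price_ratio):
    """Volume growth at which the price cut is fully cancelled out."""
    return 1.0 / price_ratio


price_ratio = 0.5  # assumed: new price is half the old price

break_even = break_even_volume_multiple(price_ratio)  # 2.0x
modest_growth = spend_multiple(price_ratio, 1.5)      # 0.75x: spend falls
rapid_growth = spend_multiple(price_ratio, 6.0)       # 3.0x: spend triples

print(f"break-even volume growth: {break_even:.1f}x")
print(f"spend at 1.5x volume: {modest_growth:.2f}x, at 6x volume: {rapid_growth:.1f}x")
```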

The new conversational products (Ask YouTube, voice Docs) address real friction: video navigation and hands-free document creation are defensible use cases. But no adoption metrics for these early-stage features are yet public. User growth in the Gemini app and Search features shows engagement, not necessarily the economic value of agentic workflows.

Verify Flash costs before committing infrastructure

Test Gemini 3.5 Flash on your highest-volume inference workloads in a staging environment and measure tokens-per-task and end-to-end latency before shifting production traffic. Compare Google's claimed price advantage (less than half the price of other frontier options) against your actual token consumption under realistic load; a lower per-token price does not lower spend if the model uses more tokens per task or if volume grows. If you are currently processing 1 trillion tokens daily, model the $1 billion annual savings claim with your own pricing contract (list rates vary by tier) and confirm with a cost-per-output-token calculation before budget approval.
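
As a rough sketch of both checks: the first function works out what per-token saving a $1 billion annual figure would imply at 1 trillion tokens per day, and the second compares cost per task using staging measurements. All token counts and prices in the example are hypothetical placeholders; substitute your measured values and contract rates.

```python
def implied_saving_per_million(annual_savings_usd, tokens_per_day, days=365):
    """Dollars saved per one million tokens implied by an annual savings figure."""
    annual_tokens = tokens_per_day * days
    return annual_savings_usd / (annual_tokens / 1e6)


def cost_per_task(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Dollar cost of one task given measured token counts and per-million prices."""
    return (input_tokens / 1e6 * input_price_per_million
            + output_tokens / 1e6 * output_price_per_million)


# $1B per year at 1 trillion tokens per day works out to about $2.74 per million tokens.
implied = implied_saving_per_million(1e9, 1e12)

# Hypothetical staging measurements: the candidate is half the price per token
# but emits more output tokens per task, so the per-task saving nearly vanishes.
baseline = cost_per_task(1000, 500, input_price_per_million=2.0,
                         output_price_per_million=8.0)
candidate = cost_per_task(1000, 1200, input_price_per_million=1.0,
                          output_price_per_million=4.0)

print(f"implied saving: ${implied:.2f} per million tokens")
print(f"baseline ${baseline:.4f}/task vs candidate ${candidate:.4f}/task")
```

If the implied saving per million tokens exceeds what you currently pay per million, the savings figure cannot apply to your workload as stated.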
