Gemini 3.5 Flash outperforms Pro models on coding at 4x speed

Google ships Gemini 3.5 Flash with coding and agentic claims

Google announced Gemini 3.5 Flash on May 19, 2026, positioning it as the company's strongest agentic and coding model yet. The model is available immediately via the Gemini API, Google Antigravity (the company's agent orchestration platform), and consumer products including the Gemini app and AI Mode in Search.

Google reports that 3.5 Flash outperforms its predecessor Gemini 3.1 Pro on three published benchmarks: Terminal-Bench 2.1 (76.2%), GDPval-AA (1656 Elo), and MCP Atlas (83.6%). The company also reports a score of 84.2% on CharXiv Reasoning, a multimodal benchmark. On output token throughput, Google states that 3.5 Flash runs four times faster than other frontier models, a figure that is also company-reported.

Three named customers are running 3.5 Flash in production pilots. Shopify uses subagents to forecast merchant growth. Macquarie Bank is testing it on document-heavy customer onboarding (100+ page PDFs). Salesforce integrated it into Agentforce for multi-turn task automation. Ramp, Xero, and Databricks are also listed as early users automating document processing, tax workflows, and data diagnostics respectively.

Google separately introduced Gemini Spark, a personal agent running 3.5 Flash that operates continuously in the background. The company is rolling it out to trusted testers immediately and planning a Beta for Google AI Ultra subscribers in the US next week.

Benchmark claims require independent testing

The core claim is speed without sacrifice: frontier-level intelligence at low latency. If true, this removes a primary friction point in deploying agents at scale. Shorter inference time means faster iteration loops and a better user experience in real-time workflows, and it can mean lower per-query cost.

The catch is provenance. Every benchmark cited is company-reported. Terminal-Bench 2.1, GDPval-AA, MCP Atlas, and CharXiv Reasoning do not yet appear in independent benchmarking efforts like HELM, BigCodeBench, or academic reproducers. The 4x throughput claim is also unvalidated outside Google's infrastructure.

The customer testimonials (Shopify, Macquarie, Salesforce) describe pilots, not full deployments. Descriptions like "developing forecasts" and "piloting" signal early-stage use, not scaled revenue impact. Without third-party SLA data or external latency measurements, the practical speed advantage remains a marketing assertion.

Validate latency and cost before migrating

If your team is running production agents on 3.1 Pro or other frontier models, test 3.5 Flash on a representative subset of your actual queries and workflows. Measure end-to-end latency in your own infrastructure, not on Google's demo setup. Compare true cost-per-successful-task against your current model, accounting for your failure rate and retry loops.
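The cost comparison above can be sketched in a few lines of Python. This is a minimal illustration, not a pricing tool: the `ModelRun` fields and the `cost_per_successful_task` helper are hypothetical names, and every number you feed in should come from your own billing and evaluation logs rather than from published price sheets.

```python
from dataclasses import dataclass


@dataclass
class ModelRun:
    """Aggregate results from replaying a query sample against one model.

    All fields are assumptions for illustration; fill them from your own logs.
    """
    name: str
    attempts: int            # total API calls made, including retries
    successes: int           # tasks that passed your own acceptance check
    cost_per_attempt: float  # average cost per call, in your billing currency


def cost_per_successful_task(run: ModelRun) -> float:
    """Total spend divided by the number of tasks that actually succeeded."""
    if run.successes == 0:
        # Nothing succeeded, so no finite cost buys a successful task.
        return float("inf")
    return run.attempts * run.cost_per_attempt / run.successes


def cheaper_model(a: ModelRun, b: ModelRun) -> str:
    """Return the name of the run with the lower cost per successful task."""
    if cost_per_successful_task(a) <= cost_per_successful_task(b):
        return a.name
    return b.name
```

The point of dividing by successes rather than attempts is that a model with a lower price per call can still lose if it fails or retries more often on your workload.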

The speed claim is worth investigating because latency is real friction in agent deployment. But don't assume Google's published numbers transfer to your codebase. Run the benchmark.
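One way to run that benchmark is a small timing harness like the sketch below. It assumes you supply `call_model`, a hypothetical function that sends one prompt to whichever model you are testing and returns its response; no vendor SDK is used or implied here. It reports the median and a nearest-rank 95th percentile of wall-clock latency, measured from your own infrastructure.

```python
import math
import statistics
import time
from typing import Callable, Dict, Sequence


def measure_latency(
    call_model: Callable[[str], str],
    prompts: Sequence[str],
) -> Dict[str, float]:
    """Time each call end to end and summarize the latency distribution.

    `call_model` is a placeholder for your own client wrapper (an assumption,
    not a real library function).
    """
    if not prompts:
        raise ValueError("need at least one prompt to measure latency")

    samples = []
    for prompt in prompts:
        start = time.perf_counter()
        call_model(prompt)  # response is discarded; only timing matters here
        samples.append(time.perf_counter() - start)

    samples.sort()
    # Nearest-rank percentile: smallest sample with at least 95% at or below it.
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "n": float(len(samples)),
        "median_s": statistics.median(samples),
        "p95_s": samples[p95_index],
    }
```

Run it with the same prompt sample against your current model and against 3.5 Flash, and compare the tail (p95) as well as the median, since agent loops tend to be bounded by their slowest calls.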
