Back to news
AnalysisMay 20, 2026· 2 min read

Gemini 3.5 Flash outperforms Pro models on coding at 4x speed

Google released Gemini 3.5 Flash today, claiming frontier-level performance on agentic and coding tasks while running 4 times faster than rival models. Shopify, Macquarie, and Salesforce are already using it to automate multi-week workflows.

Our Take

Google is shipping real agentic performance, but the benchmarks are company-reported and the speed claim needs independent reproduction to matter.

Why it matters

If 3.5 Flash truly delivers flagship-model coding at a fraction of the latency, it reshapes the unit economics of agent-based automation. Practitioners need to know whether the speedup is real or marketing artifact before rebuilding workflows around it.

Do this week

Engineering teams: run your internal coding and multi-step reasoning benchmarks against 3.5 Flash on your own hardware before committing infrastructure spend.

Google ships Gemini 3.5 Flash with coding and agentic claims

Google announced Gemini 3.5 Flash on May 19, 2026, positioning it as the company's strongest agentic and coding model yet. The model is available immediately via the Gemini API, Google Antigravity (the company's agent orchestration platform), and consumer products including the Gemini app and AI Mode in Search.

Google reports that 3.5 Flash outperforms its predecessor Gemini 3.1 Pro on three published benchmarks: Terminal-Bench 2.1 (76.2%), GDPval-AA (1656 Elo), and MCP Atlas (83.6%). The company also claims 84.2% performance on CharXiv Reasoning, a multimodal task. On output token throughput, Google states 3.5 Flash runs 4 times faster than other frontier models (company-reported).

Three named customers are running 3.5 Flash in production pilots. Shopify uses subagents to forecast merchant growth. Macquarie Bank is testing it on document-heavy customer onboarding (100+ page PDFs). Salesforce integrated it into Agentforce for multi-turn task automation. Ramp, Xero, and Databricks are also listed as early users automating document processing, tax workflows, and data diagnostics respectively.

Google separately introduced Gemini Spark, a personal agent running 3.5 Flash that operates continuously in the background. The company is rolling it to trusted testers immediately and planning a Beta for Google AI Ultra subscribers in the US next week.

Benchmark claims require independent testing

The core claim is speed-without-sacrifice: frontier intelligence at exceptional latency. If true, this eliminates the primary friction point in deploying agents at scale. Shorter inference time means lower per-query cost, faster iteration loops, and better user experience in real-time workflows.

The catch is cardinality. Every benchmark cited is company-reported. Terminal-Bench 2.1, GDPval-AA, MCP Atlas, and CharXiv Reasoning do not yet appear in independent benchmarking efforts like HELM, BigCodeBench, or academic reproducers. The 4x throughput claim is also unvalidated outside Google's infrastructure.

Customer testimonials (Shopify, Macquarie, Salesforce) are real pilots, not full deployments. Descriptions like "developing forecasts" and "piloting" signal early-stage use, not scaled revenue impact. Without third-party SLA data or external latency measurement, the practical speed advantage remains a marketing assertion.

Validate latency and cost before migrating

If your team is running production agents on 3.1 Pro or other frontier models, test 3.5 Flash on a representative subset of your actual queries and workflows. Measure end-to-end latency in your own infrastructure, not on Google's demo setup. Compare true cost-per-successful-task against your current model, accounting for your failure rate and retry loops.

The speed claim is worth investigating because latency is real friction in agent deployment. But don't assume Google's published numbers transfer to your codebase. Run the benchmark.

#Gemini#Agents#Developer Tools#Enterprise AI
Share:
Keep reading

Related stories