Tool brief · June 30, 2026

Gemini 3.5 Flash hits GA: a Flash-tier model you can actually point an agent loop at

DeveloperFor Developer

The tool

gemini-3.5-flash

What it is

A stable, GA model ID — gemini-3.5-flash — exposed through the Gemini API in AI Studio and Vertex. Gemini 3.5 Flash provides sustained frontier-level intelligence optimized for real-world tasks at a higher speed and lower cost. Designed for the agentic era, it excels at sub-agent deployment, multi-step workflows, and long-horizon tasks at scale. Google's pitch is "near-Pro on coding and tool use, at Flash latency." Treat that as the vendor's claim until your evals say otherwise.

The next-work-session test

You have an agent loop in production today calling gemini-3-flash-preview (or gpt-5.x-mini, or Claude Haiku). The endpoint is the question: does swapping in gemini-3.5-flash change the loop's win rate enough to justify the cost delta?

Concrete scenario: a ReAct-style coding agent with ~12 tool calls per task, MCP for filesystem + git, a 30-task internal eval set. Next session — swap the model ID, re-run the evals, diff the trace logs. The thing that's actually new for you is the `thinking_level` knob. Improved low thinking: low is now significantly improved for code and agentic tasks that require fewer steps, offering strong quality at lower latency and cost. That means your eval matrix is now 2D: model × thinking level. Budget for it.

Pricing

$1.50 per million input tokens, $9 per million output tokens. 1,048,576 token context window, maximum output of 65,536 tokens. Cached input is roughly $0.15/M according to third-party trackers — verify in your own console before you trust it.

Important context, because the Flash brand used to mean "cheap": Gemini 3 Flash bumped that to $0.50 and $3. Now 3.5 Flash sits at $1.50 and $9, which works out to five times the input price of the 2.5 model from less than a year earlier: the line has only been going up. Simon Willison's read is the same — at $1.50/million input and $9/million output it's getting close in price to Google's Gemini 3.1 Pro, which is $2 and $12. So "Flash" is no longer a price tier you reach for reflexively. Do the math against your token mix.

What we'd actually use it for

Honest, narrower use case: the sub-agent and tool-call orchestration layer of an existing agent, not the planner. It hits 76.2% on Terminal-Bench 2.1 and 83.6% on MCP Atlas, beating Gemini 3.1 Pro on both — vendor-reported, but MCP Atlas is the relevant benchmark if you're shipping MCP servers. That's where it earns its keep: tight, multi-step tool loops where you were already paying Pro prices for reliability and want to step down without the win-rate falling off a cliff.

Also: a strong default for eval harness graders and synthetic data generation, where you want fast, structured outputs and the input side is the cost driver.

Limits

Not free, and not cheap by 2025 standards. Your bill goes up if you're migrating from 2.5 Flash or 3.1 Flash-Lite. Run the cost diff on a real week of traffic before committing.
Reasoning ceiling. It trails on academic reasoning (Humanity's Last Exam, ARC-AGI-2). Don't swap it in for the planner if your planner is doing hard, novel reasoning.
Thinking-level cost surprises. Medium thinking is the default per OpenRouter's docs; if you don't pin thinking_level explicitly, output token counts (and bills) will jump versus what low would have produced. This is a real eval-rigor trap — pin the level in every test run.
Vendor benchmarks are vendor benchmarks. MCP Atlas and Terminal-Bench numbers come from Google's own table. Your repo, your tools, your win rate.
Still manual: writing the eval set, picking the thinking level per task type, tuning retries on tool-call failures. The model ID change is one line; the work around it isn't.

Try it if

You run an agent loop with heavy tool calls and want to A/B against a current Flash- or Haiku-tier model.
You're already on Gemini and want a drop-in upgrade with the same API surface — see the What's new in Gemini 3.5 Flash doc for the migration notes.
You can exploit the thinking_level=low setting on short-horizon tasks to claw back the price increase.
Your bottleneck is MCP tool-call reliability, not abstract reasoning.

Skip it if

You're price-sensitive and your current model is 3.1 Flash-Lite or 2.5 Flash — the per-token jump is real.
Your workload is single-shot Q&A or summarization with no tool use. You're paying for agentic capability you won't use.
You need the strongest possible reasoning on novel problems — go Pro, or wait for the next Pro tick.
You haven't built an eval harness yet. Without one, you can't tell if this swap helped, and the marketing won't tell you either. Start there, then come back.

Source: ai.google.dev