Tool brief · June 30, 2026

Gemini 3.5 Flash hits GA: a Flash-tier model you can actually point an agent loop at

For Developer

The tool

gemini-3.5-flash

What it is

A stable, GA model ID, gemini-3.5-flash, exposed through the Gemini API in AI Studio and Vertex. Google describes it as sustained frontier-level intelligence optimized for real-world tasks at higher speed and lower cost, designed for sub-agent deployment, multi-step workflows, and long-horizon tasks at scale. Google's pitch is "near-Pro on coding and tool use, at Flash latency." Treat that as the vendor's claim until your evals say otherwise.

The next-work-session test

You have an agent loop in production today calling gemini-3-flash-preview (or gpt-5.x-mini, or Claude Haiku). The question is: does swapping in gemini-3.5-flash change the loop's win rate enough to justify the cost delta?

Concrete scenario: a ReAct-style coding agent with ~12 tool calls per task, MCP for filesystem + git, a 30-task internal eval set. Next session: swap the model ID, re-run the evals, diff the trace logs. The thing that's actually new for you is the `thinking_level` knob. Per Google's notes, the low setting is now significantly improved for code and agentic tasks that require fewer steps, offering strong quality at lower latency and cost. That means your eval matrix is now 2D: model × thinking level. Budget for it.
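A minimal sketch of that 2D matrix, assuming your harness can expose a single function that runs one task and reports pass or fail. The `run_task` callable and `fake_run_task` stub are hypothetical placeholders, and the list of thinking levels is an assumption (this brief only names low and medium).

```python
from itertools import product
from typing import Callable

# Model IDs come from the brief. The thinking levels are an assumption:
# the brief only mentions "low" and "medium".
MODELS = ["gemini-3-flash-preview", "gemini-3.5-flash"]
THINKING_LEVELS = ["low", "medium"]


def run_matrix(
    tasks: list[str],
    run_task: Callable[[str, str, str], bool],
) -> dict[tuple[str, str], float]:
    """Return the win rate for every (model, thinking_level) cell."""
    if not tasks:
        raise ValueError("need at least one task")
    results: dict[tuple[str, str], float] = {}
    for model, level in product(MODELS, THINKING_LEVELS):
        wins = sum(run_task(model, level, task) for task in tasks)
        results[(model, level)] = wins / len(tasks)
    return results


def fake_run_task(model: str, level: str, task: str) -> bool:
    """Hypothetical stand-in for a real agent run; swap in your own harness."""
    # Deterministic placeholder so the sketch runs without network access.
    return (len(task) + len(model) + len(level)) % 2 == 0


if __name__ == "__main__":
    tasks = [f"task-{i:02d}" for i in range(30)]
    for cell, win_rate in run_matrix(tasks, fake_run_task).items():
        print(cell, f"{win_rate:.2f}")
```

Two models and two levels already mean four full passes over the eval set, so a 30-task set becomes 120 agent runs. That is the budget line to plan for.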

Pricing

$1.50 per million input tokens, $9 per million output tokens. 1,048,576 token context window, maximum output of 65,536 tokens. Cached input is roughly $0.15/M according to third-party trackers — verify in your own console before you trust it.

Important context, because the Flash brand used to mean "cheap": Gemini 2.5 Flash charged about $0.30 per million input tokens, and Gemini 3 Flash bumped the line to $0.50 input and $3 output. Now 3.5 Flash sits at $1.50 and $9, which works out to five times the input price of the 2.5 model from less than a year earlier. The line has only been going up. Simon Willison's read is the same: at $1.50/million input and $9/million output it's getting close in price to Google's Gemini 3.1 Pro, which is $2 and $12. So "Flash" is no longer a price tier you reach for reflexively. Do the math against your token mix.
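One way to do that math, as a rough sketch. The prices are the list prices quoted in this brief; the dictionary keys are plain labels rather than confirmed API model IDs, and the 200,000-input / 8,000-output token mix is a made-up example to replace with your own numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens


# List prices as quoted in this brief; verify in your own console.
# Keys are labels for this sketch, not confirmed API model IDs.
PRICES = {
    "gemini-3-flash": Price(0.50, 3.00),
    "gemini-3.5-flash": Price(1.50, 9.00),
    "gemini-3.1-pro": Price(2.00, 12.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one task at list price, ignoring caching discounts."""
    price = PRICES[model]
    total = input_tokens * price.input_per_m + output_tokens * price.output_per_m
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical token mix for one agent task.
    for model in PRICES:
        print(model, f"${request_cost(model, 200_000, 8_000):.3f}")
```

With that example mix, a task costs about $0.37 on 3.5 Flash, against $0.12 on Gemini 3 Flash and $0.50 on 3.1 Pro. Because 3.5 Flash is exactly three times Gemini 3 Flash and exactly 75% of 3.1 Pro on both input and output, those ratios hold for any token mix as long as the token counts stay the same across models.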

What we'd actually use it for

Honest, narrower use case: the sub-agent and tool-call orchestration layer of an existing agent, not the planner. It hits 76.2% on Terminal-Bench 2.1 and 83.6% on MCP Atlas, beating Gemini 3.1 Pro on both — vendor-reported, but MCP Atlas is the relevant benchmark if you're shipping MCP servers. That's where it earns its keep: tight, multi-step tool loops where you were already paying Pro prices for reliability and want to step down without the win-rate falling off a cliff.
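To put a number on "falling off a cliff", one option is to compare cost per successful task rather than cost per attempt. The sketch below assumes both models consume the same tokens per attempt, which different thinking levels may break. The $0.496 and $0.372 per-attempt costs and the 80% win rate are hypothetical inputs, chosen to match the 75% price ratio between 3.5 Flash and 3.1 Pro ($1.50 vs $2 input, $9 vs $12 output).

```python
def cost_per_success(cost_per_attempt: float, win_rate: float) -> float:
    """Expected spend per successful task, counting failed attempts too."""
    if not 0 < win_rate <= 1:
        raise ValueError("win_rate must be in (0, 1]")
    return cost_per_attempt / win_rate


def break_even_win_rate(
    current_cost: float,
    current_win_rate: float,
    candidate_cost: float,
) -> float:
    """Win rate the candidate needs to match the current cost per success."""
    return current_win_rate * candidate_cost / current_cost


if __name__ == "__main__":
    # Hypothetical: Pro at $0.496 per attempt with an 80% win rate,
    # 3.5 Flash at $0.372 per attempt (75% of the Pro cost).
    needed = break_even_win_rate(0.496, 0.80, 0.372)
    print(f"break-even win rate: {needed:.2f}")
```

Under those assumptions the step-down pays for itself as long as the 3.5 Flash win rate stays above roughly 60%. Failed tasks usually cost more than their tokens (retries, human cleanup), so treat that figure as a floor, not a target.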

Also: a strong default for eval harness graders and synthetic data generation, where you want fast, structured outputs and the input side is the cost driver.

Limits

  • Not free, and not cheap by 2025 standards. Your bill goes up if you're migrating from 2.5 Flash or 3.1 Flash-Lite. Run the cost diff on a real week of traffic before committing.
  • Reasoning ceiling. It trails on academic reasoning (Humanity's Last Exam, ARC-AGI-2). Don't swap it in for the planner if your planner is doing hard, novel reasoning.
  • Thinking-level cost surprises. Medium thinking is the default per OpenRouter's docs; if you don't pin thinking_level explicitly, output token counts (and bills) will jump versus what low would have produced. This is a real eval-rigor trap — pin the level in every test run.
  • Vendor benchmarks are vendor benchmarks. MCP Atlas and Terminal-Bench numbers come from Google's own table. Your repo, your tools, your win rate.
  • Still manual: writing the eval set, picking the thinking level per task type, tuning retries on tool-call failures. The model ID change is one line; the work around it isn't.
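The thinking-level trap above can be guarded against with a pre-flight check. This is a minimal sketch that assumes your eval harness describes each run as a plain dict; the `name` and `thinking_level` keys are a hypothetical harness format, not the Gemini API request shape.

```python
def assert_pinned(run_configs: list[dict]) -> None:
    """Raise if any eval run leaves thinking_level to the provider default."""
    missing = [
        cfg.get("name", "<unnamed>")
        for cfg in run_configs
        if not cfg.get("thinking_level")
    ]
    if missing:
        raise ValueError(f"thinking_level not pinned for: {', '.join(missing)}")


if __name__ == "__main__":
    runs = [
        {"name": "short-horizon", "model": "gemini-3.5-flash", "thinking_level": "low"},
        {"name": "long-horizon", "model": "gemini-3.5-flash"},  # not pinned
    ]
    try:
        assert_pinned(runs)
    except ValueError as exc:
        print(exc)
```

Calling the check before any run starts means an unpinned config fails fast instead of quietly producing results at whatever the default level happens to be.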

Try it if

  • You run an agent loop with heavy tool calls and want to A/B against a current Flash- or Haiku-tier model.
  • You're already on Gemini and want a drop-in upgrade with the same API surface — see the What's new in Gemini 3.5 Flash doc for the migration notes.
  • You can exploit the thinking_level=low setting on short-horizon tasks to claw back the price increase.
  • Your bottleneck is MCP tool-call reliability, not abstract reasoning.

Skip it if

  • You're price-sensitive and your current model is 3.1 Flash-Lite or 2.5 Flash — the per-token jump is real.
  • Your workload is single-shot Q&A or summarization with no tool use. You're paying for agentic capability you won't use.
  • You need the strongest possible reasoning on novel problems — go Pro, or wait for the next Pro tick.
  • You haven't built an eval harness yet. Without one, you can't tell if this swap helped, and the marketing won't tell you either. Start there, then come back.

Source: ai.google.dev

No sponsored verdicts · We have no paid relationship with featured vendors