Tool brief · June 29, 2026

Gemini 3.5 Flash's computer use, judged by a founder's calendar

For Founder & Operator

The tool

Computer Use in Gemini 3.5 Flash

What it is

A built-in tool inside Gemini 3.5 Flash that lets the model see a screen, click, type and scroll its way through software that wasn't designed for an API. Computer use was previously available only as a standalone Gemini 2.5 computer use model; it is now integrated natively in the main Gemini Flash model. It is in public preview, not general availability.

The next-work-session test

You're a founder. Tomorrow morning you have to pull last week's pipeline from a CRM that doesn't expose the views you need by API, paste it into a sheet, and tag every deal by source. Today that's 40 minutes of clicking. With Computer Use in 3.5 Flash, you script the agent once against the live UI, run it Monday mornings, and get back the time. The change isn't "AI does my job" — it's that one recurring 40-minute task moves off your calendar without you hiring an integration engineer or paying for Zapier seats for a tool nobody else uses.

That's the bet worth running this week.

Pricing

The Computer Use tool itself doesn't have a separate price — you pay standard Gemini 3.5 Flash token rates. According to third-party listings, Gemini 3.5 Flash costs $1.50 per million input tokens and $9 per million output tokens, with a 1,048,576 token context window and maximum output of 65,536 tokens. Cache hits cost $0.15 per 1M tokens vs $1.50 for standard input — a 90% reduction.

Worth flagging: computer-use agents send screenshots as input, and vision tokens add up fast. The headline $1.50/M number is the floor, not the realistic per-task cost. Treat the listed rates as unverified, and instrument your own runs before assuming the math.
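
For a feel of the arithmetic, here is a minimal sketch that turns per-step token counts into a per-run dollar figure using the rates quoted above. The step count and token numbers in the example are made up for illustration; real counts should come from the usage figures your own runs report, and input tokens will likely grow as the session history gets longer.

```python
from dataclasses import dataclass

# Rates quoted in this brief (USD per 1M tokens). Third-party listings, unverified.
INPUT_PER_M = 1.50
CACHED_INPUT_PER_M = 0.15
OUTPUT_PER_M = 9.00


@dataclass
class StepUsage:
    """Token counts for one screenshot -> reason -> click step."""
    input_tokens: int             # prompt plus screenshot tokens
    output_tokens: int            # reasoning plus action tokens
    cached_input_tokens: int = 0  # part of input_tokens billed at the cache rate


def step_cost(u: StepUsage) -> float:
    """Dollar cost of a single step at the rates above."""
    fresh = u.input_tokens - u.cached_input_tokens
    return (
        fresh * INPUT_PER_M
        + u.cached_input_tokens * CACHED_INPUT_PER_M
        + u.output_tokens * OUTPUT_PER_M
    ) / 1_000_000


def task_cost(steps: list[StepUsage]) -> float:
    """Dollar cost of a whole run: the sum over its steps."""
    return sum(step_cost(s) for s in steps)


# Hypothetical run: 40 steps, each with 3,000 input and 200 output tokens.
example = [StepUsage(input_tokens=3_000, output_tokens=200) for _ in range(40)]
print(f"${task_cost(example):.4f} per run")  # prints: $0.2520 per run
```

Swap in your measured counts and multiply by runs per month; that number, not the headline rate, is what you compare against the 40 minutes.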

What we'd actually use it for

Narrower than the demos suggest. The honest list:

  • Pulling reports out of vendor dashboards that don't have CSV export.
  • Reconciling a list across two SaaS tools you log into but don't own (your accountant's portal, a partner's admin panel).
  • Filling out the same five-field form 200 times for a launch — applying to directories, claiming listings, submitting to review sites.
  • QA-ing your own product's signup flow nightly across browsers.

Not: replacing your ops hire. Not: running unsupervised on anything that costs money or sends email.

Limits

It's preview, not GA, so expect breakage. The headline benchmark (78.4 on OSWorld-Verified) is self-reported: take it as Google's claim about its own model on a benchmark Google chose.

Security is the real ceiling. To mitigate some of the prompt injection risks for agents operating in live environments, Google uses targeted adversarial training for computer use in Gemini 3.5 Flash. It also ships two optional enterprise safeguard systems: one lets enterprises require explicit user confirmation for sensitive or irreversible actions, and the other automatically stops a task if an indirect prompt injection is identified. Translation: if your agent reads a webpage with hostile instructions buried in it, it may follow them, and the mitigations reduce that risk rather than remove it. Don't point this at your bank, your Stripe dashboard, or anything one click away from sending money or leaking data.
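
You can also put a crude confirmation gate in your own loop. The sketch below is not Google's safeguard system and does not use the Gemini API; it assumes a hypothetical action format (a dict with `type`, `target` and `text` keys) and a hypothetical keyword list, and simply refuses to run anything that looks risky unless a human approves it.

```python
from typing import Callable

# Hypothetical keyword list; tune it to the tools your agent touches.
SENSITIVE_KEYWORDS = ("pay", "send", "delete", "transfer", "submit", "wire")


def is_sensitive(action: dict) -> bool:
    """Flag an action whose target label or typed text mentions a risky verb."""
    text = f"{action.get('target', '')} {action.get('text', '')}".lower()
    return any(word in text for word in SENSITIVE_KEYWORDS)


def gate(action: dict, confirm: Callable[[dict], bool]) -> bool:
    """Return True if the action may run; sensitive ones need explicit approval."""
    if not is_sensitive(action):
        return True
    return confirm(action)


# Example with a reviewer callback that denies everything it is asked about.
deny_all = lambda action: False
print(gate({"type": "scroll", "target": "pipeline table"}, deny_all))  # True
print(gate({"type": "click", "target": "Send invoice"}, deny_all))     # False
```

Substring matching is deliberately blunt: it will over-flag, which is the cheaper failure for a first month of supervised runs. In practice `confirm` would prompt a person rather than return a fixed answer.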

Other gaps:

  • Latency. Screenshot → reason → click loops are slow. Not for anything time-sensitive.
  • Brittle to UI changes. A vendor rev'ing their dashboard breaks your agent.
  • Audit trail. You need to log every action yourself.
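
The audit-trail gap is the cheapest one to close yourself. A minimal sketch, assuming you can hook into the loop wherever your code executes an action: append one JSON line per action to a local file. The field names and the action dict shape are illustrative, not anything the Gemini API defines.

```python
import json
import time
from pathlib import Path


def log_action(log_path: Path, step: int, action: dict, outcome: str) -> dict:
    """Append one agent action to a JSON Lines file and return the record."""
    record = {
        "ts": time.time(),   # Unix timestamp of when the action was logged
        "step": step,        # position of the action within the run
        "action": action,    # whatever the agent asked to do
        "outcome": outcome,  # e.g. "ok", "blocked", "error"
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

A call such as `log_action(Path("run.jsonl"), 1, {"type": "click", "target": "Export"}, "ok")` after every step gives you a file you can grep when a Monday run produces a strange sheet.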

Try it if

  • You have 3+ recurring weekly tasks that live in SaaS tools without good APIs.
  • You're already comfortable in the Gemini API or Google AI Studio.
  • You can run it in a sandboxed browser profile, not your main session.
  • You want to test before a competitor's ops team does.

Skip it if

  • The task touches money movement, customer comms, or anything regulated.
  • You don't have someone who can babysit a flaky agent for the first month.
  • Your workflows are already covered by Zapier, Make, or a native API — boring tools that don't hallucinate are still better for boring jobs.
  • You need GA-grade reliability now, not in two quarters.

The bet here isn't that Computer Use replaces your stack. It's that the cost of trying agentic automation on real GTM workflows just dropped to a Flash-tier API bill. For a founder, that's worth one afternoon.

Source: deepmind.google

No sponsored verdicts · We have no paid relationship with featured vendors