A month after dropping Gemini 3 Pro and sweeping every benchmark, Google released Flash. And it broke the rules.

Top 3 intelligence. Top 5 price. Top speed. You don’t usually get all three.

Flash Beats Pro on SWE-bench

The headline that shouldn’t be real: Gemini 3 Flash scores 78% on SWE-bench Verified. Gemini 3 Pro scores 76.2%.

A distilled model outperforming its teacher on the benchmark that matters most for agentic coding.

Why this inversion matters

SWE-bench Verified tests real software engineering tasks: reading codebases, understanding issues, writing fixes. It’s the closest proxy we have for “can this model actually code autonomously?” Flash beating Pro here suggests the distillation process didn’t just compress - it sharpened specific reasoning paths for coding.

This isn’t a rounding error. Flash also stays within a few points of Pro across the other benchmarks:

  • GPQA Diamond: 90.4% (vs Pro’s 93.8%)
  • MMMU-Pro: 81.2% (matches Pro)
  • Video-MMMU: 87.6% (on par with Pro’s multimodal scores)

The Numbers That Matter for Agentic Work

Three metrics define whether a model works for autonomous agents:

Speed: 218 tokens/second. 3x faster than Gemini 2.5 Pro. Fast enough to keep you in flow state during iteration loops.

Price: $0.50/1M input, $3/1M output. 4x cheaper than Pro. Cheap enough to run 10 agents in parallel without watching your bill.

Quality: 78% SWE-bench. Competitive with Claude Sonnet 4.5 (77.2%), ahead of GPT-5.2. Good enough to trust with multi-step tasks.

This model should not exist. You don’t get intelligence and price. On top of that, top speed. It is a lightning fast model.

— IndyDevDan

The math changes when you combine these. At 4x cheaper per token, you can run three Flash instances in parallel for less than the cost of one Pro instance, finish faster, and get comparable quality. For agentic workflows where you validate outputs anyway, this is the right tradeoff.
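A back-of-the-envelope check makes this concrete. The per-million prices are the ones quoted above; the per-task token counts (100k in, 10k out) are an assumed workload, not a measurement:

```python
# Prices in $/1M tokens, as quoted in this article.
FLASH = {"in": 0.50, "out": 3.00}
PRO = {"in": 2.00, "out": 12.00}  # 4x Flash

def task_cost(prices, input_tokens, output_tokens):
    """Dollar cost of one model call at the given token counts."""
    return (prices["in"] * input_tokens + prices["out"] * output_tokens) / 1_000_000

# Assumed workload: 100k input tokens, 10k output tokens per task.
flash = task_cost(FLASH, 100_000, 10_000)  # $0.08
pro = task_cost(PRO, 100_000, 10_000)      # $0.32

# Three parallel Flash runs still undercut a single Pro run.
print(f"3x Flash: ${3 * flash:.2f}  vs  1x Pro: ${pro:.2f}")
```

Swap in your own token counts; the 4x price ratio holds regardless of workload shape, since both input and output are 4x cheaper.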

What Agentic Tools Are Saying

The teams building on these models see it too.

Cognition (Devin): “Gemini 3 Flash is a major step above other models in its speed class when it comes to instruction following and intelligence. It’s immediately become our go-to for latency-sensitive experiences.”

JetBrains: “In our Junie agentic-coding evaluation, Gemini 3 Flash delivered quality close to Gemini 3 Pro, while offering significantly lower inference latency and cost.”

Cline: “Gemini 3 has been a game-changer. We’re using it to handle complex, long-horizon coding tasks that require deep context understanding across entire codebases.”

Cursor: “Our engineers have found Gemini 3 Flash to work well together with Debug Mode. Flash is fast and accurate at investigating issues and finding the root cause of bugs.”

The pattern: fast iteration, multi-step tasks, cost-effective scaling. Flash fits the agentic use case better than Pro for most workflows.

Multimodal Without the Wait

Gemini’s multimodal lead extends to Flash. Native processing of images, video, audio, and PDFs at inference time. No mode switching, no separate pipelines.

Real feedback from production:

Resemble AI: 4x faster multimodal analysis compared to 2.5 Pro for deepfake detection.

Box: 15% accuracy improvement on their hardest extraction tasks: handwriting, long-form contracts, complex financial data.

Rakuten: 50%+ outperformance on multilingual meeting transcription with speaker identification.

For agentic workflows that need to see (screenshots, diagrams, documentation), Flash processes visual inputs at the same speed it handles text. That matters when your agent needs to iterate on UI fixes or analyze architecture diagrams.

Media resolution control

Gemini 3 introduces a media_resolution parameter: low, medium, high, ultra_high. Higher resolution improves fine text reading and small-detail detection, but increases token usage. For most agentic tasks, medium is the sweet spot.
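A minimal sketch of setting this in the google-genai Python SDK. Treat the enum value, the model id, and the image path as assumptions to verify against the current docs:

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Hypothetical input: an architecture diagram the agent needs to read.
with open("architecture-diagram.png", "rb") as f:
    image = types.Part.from_bytes(data=f.read(), mime_type="image/png")

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[image, "List the services in this diagram and how they connect."],
    config=types.GenerateContentConfig(
        # Medium keeps token usage down; step up only when the agent
        # must read fine print or small labels.
        media_resolution=types.MediaResolution.MEDIA_RESOLUTION_MEDIUM,
    ),
)
print(response.text)
```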

What Flash Doesn’t Solve

The caveats are real:

  • Hallucination rate: 91% per Artificial Analysis - higher than Pro (88%). When Flash is wrong, it’s confidently wrong. Agentic workflows need output validation regardless, but this makes it non-negotiable.

  • Token consumption: Uses 2x the tokens of 2.5 Flash on reasoning tasks. The price per token is lower, but total cost per task may be similar for complex problems.

  • Speed tradeoff: 22% slower than 2.5 Flash (non-reasoning). You’re trading raw speed for intelligence.

  • Long context reliability: 22.1% on Artificial Analysis tests vs Pro’s 26.3% at 1M tokens. For full context window usage, Pro is more reliable.

If you need absolute correctness without validation, use Pro. If you need maximum context reliability, use Pro. For everything else - iterative development, parallel agents, cost-conscious scaling - Flash is the better fit.

The Trust Equation

IndyDevDan frames 2026 as “the year of trust” in agentic systems. His argument: the model isn’t the limitation anymore. Trust is.

Flash changes the trust calculation:

  • Speed enables iteration: Fast enough to try multiple approaches in the time one slow model takes
  • Price enables scale: Cheap enough to run best-of-N patterns without budget anxiety
  • Quality enables autonomy: Good enough to let agents run longer before needing human review
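The best-of-N pattern above is simple to sketch. Here the model call is a stub (a real agent would call the Gemini API and score candidates with a validator or test suite, both assumptions here), but the fan-out/select structure is the point:

```python
import concurrent.futures

def call_model(prompt: str, seed: int) -> tuple[str, int]:
    # Stub standing in for a cheap Flash call. Returns (answer, score);
    # in practice the score would come from a validator or test run.
    answer = f"candidate-{seed}"
    score = (seed * 37) % 10  # placeholder scoring
    return answer, score

def best_of_n(prompt: str, n: int = 5) -> str:
    # Fan out n parallel Flash calls, keep the highest-scoring answer.
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(lambda s: call_model(prompt, s), range(n)))
    return max(results, key=lambda r: r[1])[0]

print(best_of_n("fix the failing test", n=5))
```

At Flash prices, running all five candidates costs close to what a single Pro attempt would, which is what makes the pattern practical rather than theoretical.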

The question isn’t “is Flash as good as Pro?” It’s “does Flash let me ship faster?” For agentic development, the answer is yes.

Try It

Flash is available now in Google AI Studio, Gemini CLI, and via API. If you’re already using the hybrid workflow with Gemini 3 Pro, switch gemini-3-pro-preview to gemini-3-flash-preview for faster iteration.
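If you route between the two models per task, the swap can live in one helper. The routing rule below just encodes the caveats above (Pro for full-context reliability, Flash otherwise); the function name is illustrative:

```python
def pick_model(needs_full_context: bool) -> str:
    """Route to Pro only when long-context reliability is critical;
    default to Flash for iterative, parallel, cost-sensitive work."""
    return "gemini-3-pro-preview" if needs_full_context else "gemini-3-flash-preview"

print(pick_model(False))  # gemini-3-flash-preview
```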

For agentic coding tools: Flash is already in Cursor, Cline, JetBrains AI, and Gemini Code Assist. Check your model settings.