HomeFeaturesPricingBlogFAQContact
← All articles

Gemini 3.5 Flash Is Now Generally Available at $1.50 per Million Tokens

Key takeaways
  • Google's Gemini 3.5 Flash became generally available in June 2026 at $1.50 per million input tokens and $9 per million output tokens.
  • Gemini 3.5 Flash outperforms Gemini 3.1 Pro on coding and agentic benchmarks (76.2% Terminal-Bench, 83.6% MCP Atlas) at roughly 25% lower cost than the older Pro model.
  • A lighter option, Gemini 3.1 Flash-Lite, is also available at $0.25 per million input tokens for cost-sensitive or latency-sensitive workloads.
  • The release continues the mid-2026 trend of frontier-grade AI dropping in price even as benchmark scores climb.

Google's Gemini 3.5 Flash pricing is now public and generally available: $1.50 per million input tokens, $9 per million output tokens. For developers watching AI API costs closely, those numbers carry a story about where the market is heading in mid-2026.

What Gemini 3.5 Flash Delivers

Gemini 3.5 Flash launched at Google I/O on May 20, 2026 and has been rolling out to full general availability since then. The model carries a 1,048,576-token context window - effectively 1 million tokens - and Google describes its speed as "4x faster than comparable models" at the same capability tier.

On benchmarks, Gemini 3.5 Flash scores 76.2% on Terminal-Bench and 83.6% on MCP Atlas, according to published data from pricepertoken.com and OpenRouter. Notably, it outperforms Gemini 3.1 Pro on both coding and agentic tasks while costing roughly 25% less per token than that older Pro model.

The headline is clear: more capable than the previous generation flagship, at a lower price per token.

Why the Pricing Looks Higher Than Expected

There is a catch worth understanding. Gemini 3.5 Flash costs more per token than its predecessor, Gemini 3.0 Flash. Simon Willison, who tracks these releases at simonwillison.net, noted the positioning plainly: "more expensive, but Google plan to use it for everything."

This is a pattern repeating across the AI industry. Each new generation of "Flash" or budget-tier models is effectively the prior generation's frontier model in a faster, leaner wrapper. The per-token price rises slightly compared to the previous Flash, but capability jumps further - so you get more done per dollar even as the nominal rate increases.

Google has also released Gemini 3.1 Flash-Lite alongside the 3.5 series. Priced at just $0.25 per million input tokens with 2.5x faster output generation than earlier Gemini versions, Flash-Lite is aimed at latency-sensitive or high-volume workloads where top-tier accuracy is not the priority.

How Gemini 3.5 Flash Compares to the Market

Putting the numbers in context helps clarify who should consider switching.

Claude Opus 4.8, Anthropic's current flagship released May 28, costs $5 per million input tokens and $25 per million output tokens. Gemini 3.5 Flash at $1.50 input and $9 output slots meaningfully below that - roughly 3x cheaper on input. OpenAI's top models sit in similar territory to Opus.

For workloads that do not require the deepest reasoning - document parsing, summarization, structured output extraction, fast multi-turn chat - the savings compound quickly. At 100 million input tokens per month, shifting from a $5/M model to Gemini 3.5 Flash saves roughly $350 per month on input alone.

The important qualifier: benchmark scores on coding and agentic tasks are strong, but real-world quality for complex reasoning or creative work may still favor pricier models. The right model depends on the task, and the math only works if the output quality is sufficient for your specific use case.

The Broader Shift: More Tiers, More Decisions

June 2026 looks different from a developer's perspective than six months ago. There is no longer a simple cheap-vs-expensive binary. Google alone now offers Flash-Lite at $0.25/M input, Gemini 3.5 Flash at $1.50/M input, and Gemini 3.5 Pro at a higher tier. Anthropic spans from Claude Haiku to Opus 4.8. Microsoft's MAI-Code-1-Flash, released into GitHub Copilot earlier this month, prices out at $0.75/M input for coding-specific tasks.

The practical consequence: picking the right model for each task is now the most important cost lever a developer has. Overkill wastes money; underpowering hurts quality.

This is the same premise behind bring-your-own-key tools like ByteChat - the idea that switching models on a per-session basis beats being locked into a single provider's subscription at a flat monthly rate.

Frequently asked questions

Is Gemini 3.5 Flash more expensive than Gemini 3.0 Flash?

Yes, the per-token price for Gemini 3.5 Flash is higher than its predecessor. However, it outperforms Gemini 3.1 Pro on coding and agentic benchmarks at roughly 25% lower cost, meaning you get more capability per dollar compared to the prior generation's top model.

How does Gemini 3.5 Flash pricing compare to Claude Opus 4.8?

Gemini 3.5 Flash costs $1.50 per million input tokens versus $5 for Claude Opus 4.8 - about 3x cheaper on input. Output is $9/M for Gemini 3.5 Flash versus $25/M for Opus 4.8. For tasks where Gemini's benchmark performance is sufficient, the cost difference is substantial.

What is Gemini 3.1 Flash-Lite and when should I use it instead?

Gemini 3.1 Flash-Lite is priced at $0.25 per million input tokens - well below Gemini 3.5 Flash - and delivers 2.5x faster output generation than earlier Gemini versions. It is suited to high-volume, latency-sensitive tasks where maximum accuracy is not required, such as classification, routing, or short-form summarization.

The midrange tier in AI is getting crowded - which is genuinely good news for anyone paying per token.

Use Gemini 3.5 Flash on your own Google API key at zero markup

ByteChat lets you bring your own Google API key and chat with Gemini 3.5 Flash directly. No subscription, no token markup - just the model cost shown on Google's pricing page.

Try ByteChat free →