June 06, 2026 · ByteChatTrend

DeepSeek V4-Pro Price Cut Goes Permanent: Inside the AI Pricing War

Key takeaways

DeepSeek permanently cut V4-Pro output token prices by 75% in late May 2026, to $0.87 per million tokens — down from $3.48.
OpenAI GPT-5 costs $10 per million output tokens and Anthropic Claude Opus 4.7 costs $25 — making V4-Pro 10 to 28 times cheaper for output-heavy tasks.
DeepSeek attributes the sustainable pricing to running V4 on Huawei Ascend AI accelerators, reducing reliance on Nvidia hardware.
According to llm-stats.com, GPT-4-level performance has fallen from $30 per million tokens in 2023 to under $1 today, as competition and infrastructure efficiency compound.

In late May 2026, DeepSeek confirmed that the 75% price cut on its flagship V4-Pro model — originally pitched as a promotion — was now permanent. DeepSeek V4-Pro pricing now stands at $0.87 per million output tokens, putting it among the cheapest frontier-grade models available via API today, and reigniting the ongoing AI pricing war.

What DeepSeek V4-Pro Now Costs

The new permanent pricing structure breaks down as:

Cache-hit input: $0.003625 per million tokens
Cache-miss input: $0.435 per million tokens
Output: $0.87 per million tokens

Those numbers carry weight in context. According to InfoWorld, DeepSeek's promotional pricing was already causing ripples when it launched; making it permanent signals the company is confident this is sustainable, not a temporary loss-leader play. The previous output rate was $3.48 per million tokens.

How V4-Pro Compares to GPT-5, Claude, and Gemini

Put V4-Pro next to the other flagship models in active use today, and the gap is hard to ignore:

| Model | Input (per 1M tokens) | Output (per 1M tokens) | |---|---|---| | DeepSeek V4-Pro | $0.435 | $0.87 | | Google Gemini 3.5 Flash | $0.15 | $0.60 | | OpenAI GPT-5 | $2.50 | $10.00 | | Anthropic Claude Opus 4.7 | $5.00 | $25.00 |

Gemini 3.5 Flash still edges out V4-Pro on raw per-token rate, but it sits in a different tier — optimized for cost, not flagship capability. V4-Pro competes with GPT-5 and Claude Opus 4.7 on quality, where it is 11 to 28 times cheaper for output-heavy tasks like coding, summarization, and agentic workflows.

For a developer running real workloads, the math is stark. A job that costs $10 in GPT-5 output tokens costs under $1 on V4-Pro — if quality holds for the use case.

Why DeepSeek Can Sustain These Prices

The obvious question: how is this sustainable at scale? DeepSeek's V4 series was engineered to run on Huawei's Ascend AI accelerators rather than Nvidia hardware, a shift driven partly by US export controls limiting access to top-tier Nvidia chips. As The Next Web and Technology.org both reported, the rising availability of Huawei's Ascend 950 and 950PR AI supernode systems gave DeepSeek enough infrastructure cost reduction to lock in lower rates permanently.

There is a broader irony here. The export restrictions intended to slow China's AI progress reportedly pushed DeepSeek to build more efficient inference on alternative silicon — and those efficiency gains are now flowing straight through to API pricing.

The Bigger Picture: AI Prices Keep Falling

V4-Pro's cut is the latest chapter in a multi-year trend. According to llm-stats.com, GPT-4-level performance cost roughly $30 per million tokens in 2023. Today the same capability is available for under $1 — a roughly 30x reduction in three years, driven by competition, model efficiency improvements, and infrastructure scale.

The pressure is spreading. Anthropic recently shifted enterprise customers from flat-seat pricing to usage-based billing. Google has been aggressive with Gemini Flash pricing. OpenAI is expanding GPT-5 across more consumer tiers. Every major lab is now competing on cost as actively as on capability, and V4-Pro's permanent cut accelerates that dynamic further.

The catch worth noting: "cheaper models" only translate to savings if you are actually paying API rates. If you are on a flat monthly subscription, price cuts to the underlying models do not reach your wallet at all — the platform absorbs or pockets the margin.

This is the same premise behind bring-your-own-key tools like ByteChat: when a model's price drops, the savings land immediately for users paying per token directly.

What to Watch Next

The interesting variable going forward is whether V4-Pro's value-per-dollar pushes quality-per-cost expectations high enough that mid-tier models lose their market position. If the capability gap between a $25/M output model and a $0.87/M output model keeps narrowing, the justification for paying a premium becomes increasingly task-specific rather than general.

It also reframes what "premium AI" means in 2026. Paying more may still be worth it for specific reasoning-heavy or safety-critical tasks — but the default assumption that more expensive equals better is getting harder to sustain.

The AI pricing curve hasn't flattened yet — and that's generally good news for anyone paying by the token.

Frequently asked questions

Is DeepSeek V4-Pro really 75% cheaper now?

Yes. DeepSeek originally launched V4-Pro with a promotional 75% discount, then confirmed in late May 2026 that the cut is permanent. Output tokens now cost $0.87 per million, down from the original $3.48 per million.

How does DeepSeek V4-Pro compare to GPT-5 and Claude Opus 4.7 in price?

GPT-5 charges $10 per million output tokens and Claude Opus 4.7 charges $25 per million output tokens. DeepSeek V4-Pro at $0.87 per million output tokens is roughly 11 times cheaper than GPT-5 and 28 times cheaper than Claude Opus 4.7 for output-heavy workloads.

Does a cheaper DeepSeek save me money if I use a flat-rate AI subscription?

No — flat monthly subscriptions do not pass API price changes through to users. You benefit directly from model price cuts only when paying actual API rates, either as a developer or through a bring-your-own-key chat tool.