DeepSeek V4, which kept the whole world waiting anxiously until April, has finally arrived!
Today, DeepSeek, the lab that almost single-handedly broke the dominance of closed-source models and whose rapid release cadence has repeatedly shifted industry dynamics, has officially announced the preview version of the DeepSeek-V4 series to developers worldwide:
Million-token context (1M context) for everyone, and a new open-source peak in Agent capabilities, world knowledge, and reasoning performance.
With V4, DeepSeek has once again taken the lead both in China and across the open-source field, echoing the rapid rise we have seen as Qwen 3.6 pushed its own boundaries.
The V4 technical report has been released alongside it.

Paper address: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf
DeepSeek V4 Models Overview: V4-Pro vs V4-Flash
DeepSeek V4-Pro: Flagship Model Specifications
The DeepSeek-V4 series comes in two versions. The first is the performance monster DeepSeek-V4-Pro, with 1.6T total parameters and 49B activated parameters.
DeepSeek V4-Flash: Efficient and Cost-Optimized Version
DeepSeek-V4-Flash is designed specifically for efficiency and economy, with 284B total parameters and 13B activated parameters.

It is fair to say that DeepSeek-V4-Pro has reached a new peak for open-source models, benchmarked against the world's top closed-source offerings.
DeepSeek V4-Pro Performance: Agent, Knowledge, and Reasoning
DeepSeek V4 Agent Capability Breakthrough
First, V4-Pro has achieved a leap-forward breakthrough in Agent capabilities, and its Agentic Coding level firmly ranks first in the open-source world.
Actual test feedback shows that its coding experience has already surpassed Sonnet 4.5, and its delivery quality is catching up with Opus 4.6 (non-thinking mode). It has now become the preferred model for internal Agent programming in the company.
DeepSeek V4 World Knowledge Benchmark Performance
Second, it has deep world knowledge reserves.
In the knowledge evaluation dimension, V4-Pro is significantly ahead of similar open-source products, and the gap with the closed-source benchmark Gemini-Pro-3.1 has been reduced to a very small range.
DeepSeek V4 Reasoning and STEM Capabilities

Also, it has top-tier logical reasoning performance.
In hard-core areas such as mathematics, STEM, and high-difficulty competitive programming, V4-Pro not only tops the open-source community, but also has the real-world competitiveness to challenge the strongest closed-source models.
DeepSeek V4 Architecture Innovations: CSA, HCA, and mHC
What allows these two models to look down on the field are three "signature moves" in the underlying technology:
Hybrid Attention in DeepSeek V4 (CSA + HCA)
DeepSeek-V4 did not blindly increase hardware investment, but creatively designed a hybrid attention architecture.
Compressed Sparse Attention (CSA) compresses the KV cache along the token dimension and pairs it with DSA sparse attention; Heavy Compressed Attention (HCA) applies far more aggressive compression so that dense computation stays affordable.
This combination of long- and short-range strategies greatly reduces the computation and memory the model needs when handling million-token contexts.
Manifold-Constrained Hyper-Connection (mHC) in DeepSeek V4
To improve the stability of signal propagation and strengthen model expressiveness, V4 introduces the mHC structure, upgrading the traditional residual connection.
Muon Optimizer in DeepSeek V4 Training
V4 also introduces the new Muon optimizer, which makes training converge faster and run more stably.
It is exactly these structural innovations that allow DeepSeek-V4 to achieve a qualitative leap in inference efficiency.
In the extreme scenario of a 1-million-token context, DeepSeek-V4-Pro's per-token inference computation is only 27% of the previous generation's, and KV cache usage drops to an astonishing 10%.
DeepSeek V4-Flash Performance: Speed, Cost, and Use Cases
Compared with the Pro version, the Flash version is a faster and more efficient economical choice.
DeepSeek V4-Flash Reasoning vs Pro Comparison
Although slightly weaker than the Pro version in the depth of its world knowledge, DeepSeek-V4-Flash maintains a logical reasoning level close to Pro's.
DeepSeek V4 API Cost and Efficiency Advantages
Benefiting from a more streamlined parameter scale and activation mechanism, it can provide users with API access that responds faster and costs less.
DeepSeek V4 Agent Task Performance Differences
When handling basic Agent tasks, V4-Flash performs almost on par with the Pro version; on extremely complex tasks, it still leaves room for improvement.
DeepSeek V4 Long Context Breakthrough: 1M Tokens Becomes Standard
DeepSeek-V4 introduces a revolutionary attention mechanism.
Through efficient compression along the token dimension, combined with DSA sparse attention (DeepSeek Sparse Attention) technology, it achieves world-leading long-text processing capability.
DeepSeek V4 1M Context as Default Configuration
Starting today, 1M (1 million tokens) ultra-long context will become the standard configuration of DeepSeek official services.
DeepSeek V4 vs Previous Models in Memory and Compute
This innovation greatly cuts dependence on computing resources and memory.
Figure: how the computation and KV-cache memory of DeepSeek-V4 vs. DeepSeek-V3.2 change with context length
DeepSeek V4 Agent Optimization and Ecosystem Integration
DeepSeek V4 Integration with Agent Frameworks
DeepSeek-V4 has carried out deep adaptation for mainstream Agent ecosystems such as Claude Code, OpenClaw, OpenCode, and CodeBuddy.
DeepSeek V4 Use Cases: Code and Document Generation
In scenarios such as code writing and automated document generation, its output efficiency has been significantly improved.
Figure: example PPT pages automatically generated by V4-Pro under a specific Agent framework
DeepSeek V4 API Update and Model Migration Guide
How to Use DeepSeek V4 API (model_name Setup)
For developers, the good news is: the API has already gone online at the same time!
You only need to change model_name to point at one of the two new flagships (a minimal call sketch follows this list):
- Performance: deepseek-v4-pro
- Efficiency: deepseek-v4-flash
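For reference, here is a minimal call sketch. It assumes the V4 endpoint remains OpenAI-compatible at https://api.deepseek.com, as with earlier DeepSeek releases; only the two model names come from this announcement, and the key placeholder is yours to fill in.

```python
# Minimal sketch: calling V4 through the OpenAI-compatible Python client.
# The base_url matches earlier DeepSeek releases; V4 specifics are assumptions.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder, set your own key
    base_url="https://api.deepseek.com",  # endpoint used by prior versions
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",  # or "deepseek-v4-flash" for the economical tier
    messages=[{"role": "user", "content": "Summarize the design of this repo."}],
)
print(response.choices[0].message.content)
```

Switching between the two flagships is just a change of the model string, which is also how the transitional aliases below behave.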
DeepSeek V4 Model Deprecation Timeline
Special reminder: the original deepseek-chat and deepseek-reasoner model names will keep working as transitional aliases for V4, but both old names will be officially retired on July 24, 2026.
DeepSeek V4 Technical Paper Insights
DeepSeek V4 CSA Compression Mechanism Explained
In V4-Pro, CSA's compression ratio is 4: the KV cache of every 4 tokens is merged into a single entry.
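As a concrete picture of a 4:1 token-dimension merge, here is a toy sketch. Only the ratio comes from the report; mean pooling stands in for whatever learned compressor CSA actually uses.

```python
import torch
import torch.nn.functional as F

def compress_kv(kv: torch.Tensor, ratio: int = 4) -> torch.Tensor:
    """Merge every `ratio` consecutive tokens' cache entries into one.
    Toy stand-in: the real CSA presumably learns this compression."""
    b, t, d = kv.shape
    pad = (-t) % ratio                 # right-pad so t divides evenly
    if pad:
        kv = F.pad(kv, (0, 0, 0, pad))
    return kv.view(b, -1, ratio, d).mean(dim=2)  # (b, ceil(t/ratio), d)

print(compress_kv(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 3, 64])
```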
DeepSeek V4 HCA Global Compression Strategy
HCA takes a different path: its compression ratio is pushed to 128, far more aggressive than CSA's.
DeepSeek V4 Hybrid Attention Collaboration Design
The two mechanisms are stacked alternately: CSA handles fine-grained retrieval, while HCA handles global perception.
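This division of labor can be read as a layer schedule. The sketch below borrows only the two ratios from the report; the even interleaving and the layer count are illustrative guesses.

```python
# Hypothetical 60-layer stack alternating CSA-style (4x) and HCA-style
# (128x) compression; the interleaving pattern is an assumption.
LAYER_RATIOS = [4, 128] * 30

def kv_cache_fraction(ratios) -> float:
    """Average per-layer KV-cache size relative to an uncompressed cache."""
    return sum(1.0 / r for r in ratios) / len(ratios)

print(f"{kv_cache_fraction(LAYER_RATIOS):.3f}")  # ~0.129
```

As a back-of-envelope check, an even 4x/128x split keeps roughly (1/4 + 1/128) / 2 ≈ 12.9% of the original KV cache, the same order as the 10% figure reported above, with DSA sparsity presumably closing the remaining gap.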
DeepSeek V4 Training Innovations: Muon, MoE, and Distillation
DeepSeek V4 mHC Stability Optimization
mHC constrains the residual mapping matrix to avoid divergence in deep networks.
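The paper's exact manifold constraint is not spelled out here, but the stability idea can be shown with a deliberately simplified toy: pin the residual mixing weights to the unit circle so the layer-to-layer signal scale stays bounded.

```python
import torch
import torch.nn as nn

class ManifoldResidual(nn.Module):
    """Toy mHC-flavored residual: mixing weights constrained to satisfy
    alpha^2 + beta^2 = 1, so stacking layers cannot blow up the signal
    norm. The real mHC constrains a full residual mapping matrix."""
    def __init__(self, block: nn.Module):
        super().__init__()
        self.block = block
        self.theta = nn.Parameter(torch.tensor(0.1))  # near-identity init

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        alpha, beta = torch.cos(self.theta), torch.sin(self.theta)
        return alpha * x + beta * self.block(x)

layer = ManifoldResidual(nn.Linear(16, 16))
print(layer(torch.randn(4, 16)).shape)  # torch.Size([4, 16])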
DeepSeek V4 Muon Optimizer and Training Tricks
The core of Muon is orthogonalization of gradient momentum.
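This matches the publicly documented Muon recipe, where the momentum matrix is pushed toward its nearest orthogonal matrix with a Newton-Schulz iteration and then used in place of the raw gradient step; whether V4's variant uses the same coefficients is an assumption.

```python
import torch

def newton_schulz_orth(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize momentum matrix G via a quintic
    Newton-Schulz iteration (coefficients from the open-source Muon
    reference implementation)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)          # bound the spectral scale first
    transposed = G.size(0) > G.size(1)
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

update = newton_schulz_orth(torch.randn(32, 64))  # replaces raw momentum
```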
DeepSeek V4 MegaMoE and System Acceleration
V4 open-sources MegaMoE, merging communication and computation into a single pipeline kernel.
DeepSeek V4 OPD Distillation and GRM Reward Model
V4 uses On-Policy Distillation and introduces a Generative Reward Model (GRM) for joint optimization.
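As a rough illustration of the on-policy half, the loss below evaluates a per-token reverse KL between student and teacher on a trajectory the student sampled itself; how V4 combines this with GRM scores is not detailed here, so the objective's exact shape is an assumption.

```python
import torch
import torch.nn.functional as F

def on_policy_distill_loss(student_logits: torch.Tensor,
                           teacher_logits: torch.Tensor) -> torch.Tensor:
    """Reverse KL(student || teacher) per token, averaged over a sequence
    the student generated itself. Shapes: (seq_len, vocab)."""
    s = F.log_softmax(student_logits, dim=-1)
    t = F.log_softmax(teacher_logits, dim=-1)
    return (s.exp() * (s - t)).sum(dim=-1).mean()

loss = on_policy_distill_loss(torch.randn(8, 100), torch.randn(8, 100))
```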
Why DeepSeek V4 Matters for Open-Source AI
From the sudden rise of V3 to the efficiency revolution of V4, DeepSeek has always insisted on sharing the most top-level technologies with the community through open source.
The launch of DeepSeek-V4 is not only a jump in technical parameters, but also a strong statement in an evolving AI ecosystem, standing firm amid industry turbulence such as the Anthropic Mythos leak. It delivers:
- Million-token long context
- High-performance Agent systems
Together, these prove that architectural innovation can greatly lower the barrier to large models without sacrificing performance.
Now, you can immediately start the new 1M context experience in the official App or at chat.deepseek.com.
This is not just a chat box. This is a “second brain” that can hold an entire encyclopedia and understand the logic of tens of thousands of lines of code.
References
https://huggingface.co/collections/deepseek-ai/deepseek-v4
https://modelscope.cn/collections/deepseek-ai/DeepSeek-V4
https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf