
DeepSeek releases V4 with 1.6T parameters and 1M-token context, keeps weights open under MIT license
On April 24, the Chinese lab DeepSeek released its V4 model series and kept its flagship, V4-Pro, under an MIT-licensed open-weights distribution. V4-Pro has 1.6 trillion total parameters, 49 billion of them active per token, and a 1-million-token context window in both the API and the released checkpoints, according to DeepSeek's API release notes, the Hugging Face model card and Reuters reporting. A second variant, V4-Flash, has 284 billion total parameters with 13 billion active. V4-Pro is the most capable Chinese open-weights model to date and, on most published benchmarks, moves the open-weights frontier past Qwen 3.6 and Llama 4.
DeepSeek was founded by Liang Wenfeng, who also co-founded the quantitative hedge fund High-Flyer, which backs the lab. DeepSeek has kept a roughly six-month release cadence since V3 and the R1 reasoning model at the turn of 2025, and it has consistently shipped open weights, R1 included; V4 suggests that strategy has survived the broader U.S. push to constrain Chinese frontier capabilities. Architecturally, V4 pairs a Mixture-of-Experts core with a hybrid attention scheme built on Compressed Sparse Attention, alongside what DeepSeek calls Manifold-Constrained Hyper-Connections, a modification of the model's residual connections. According to the Hugging Face card, the lab credits these changes with cutting V4-Pro's per-token inference FLOPs to 27 percent of V3.2's at the same context length.
DeepSeek did not hold a public press conference for V4. The most direct on-record framing came through the lab's API release notes, which describe V4 as a step toward agentic systems that use a 1-million-token context as working memory rather than as a retrieval store. According to Forbes' coverage, a senior DeepSeek researcher marked the release with a one-line post on X, "Go wild and have fun," alongside the open-weights link. In past statements, Liang has framed the lab's mission around long-term AGI research rather than near-term commercialisation, but as of this writing he has made no on-the-record statement specifically about V4.
On the benchmarks DeepSeek published with its Hugging Face model card, V4-Pro scores 87.5 on MMLU-Pro, 80.6 on SWE-bench Verified, 90.1 on GPQA Diamond, 93.5 on LiveCodeBench and 37.7 on Humanity's Last Exam (HLE). Discussion on Hacker News and a synthesis by Asia Times put V4-Pro ahead of Qwen 3.6 and Llama 4 on most reasoning and coding benchmarks, while it still trails closed frontier systems such as Gemini 3.1 Pro and Claude Opus-class models on HLE and the harder slices of SWE-bench. Reuters also reported that the release set off a rush among Chinese hyperscalers to secure Huawei Ascend chips.

Reaction split along a now-familiar geopolitical fault line. An analysis from the Council on Foreign Relations called the release "a new phase in the U.S.-China AI rivalry." It noted that U.S. officials have publicly accused DeepSeek of training V4 on smuggled Blackwell silicon and of running industrial-scale distillation against U.S. frontier models, and reported that the State Department issued a global diplomatic directive to allies in April. Markman at Forbes argued that V4, together with the Qwen line, "reshapes the open-source AI race," while Reuters, covering the muted equity-market reaction, described V4 as a model that "does not wow markets" despite being the year's strongest open release.
For us at Enpo Sekai, V4 matters more for what it does to the open-weights price floor than for any single benchmark. Our character engine and persona stack prioritise inference cost, latency and on-device feasibility over raw capability. An open 1.6T MoE with 49B active parameters is, in practice, still too large for the local-first desktop builds we ship to consumers: every expert has to stay resident in memory, even though only about 3 percent of the weights are used for any given token. It does, however, raise the ceiling for what we can deploy on character-game backends and on the inference clusters we license to B2B customers. The release also reinforces our policy of decoupling our character layer from any single foundation model, open or closed, so that whichever model wins a given use case, we build on top of it rather than depend on it.
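As a rough check on the local-first point, the back-of-envelope sketch below estimates weight memory from the published parameter counts. It assumes that all experts must be held in memory (no offloading), counts weights only (no KV cache or activations), and uses bf16, fp8 and int4 purely as illustrative precisions; they are not a statement about the format DeepSeek actually ships. The names `weight_gb` and `active_fraction` are hypothetical helpers for this illustration.

```python
# Back-of-envelope weight-memory estimate for MoE models (illustrative only).
# Assumption: every expert stays resident in memory, so the footprint scales
# with TOTAL parameters, while per-token compute scales with ACTIVE parameters.
# KV cache and activations are ignored; the precisions are hypothetical choices.

BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

# Parameter counts as reported for the V4 release.
MODELS = {
    "V4-Pro": {"total": 1.6e12, "active": 49e9},
    "V4-Flash": {"total": 284e9, "active": 13e9},
}


def weight_gb(params: float, precision: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes) at a given precision."""
    return params * BYTES_PER_PARAM[precision] / 1e9


def active_fraction(name: str) -> float:
    """Share of a model's parameters that participate in a single token's forward pass."""
    m = MODELS[name]
    return m["active"] / m["total"]


if __name__ == "__main__":
    for name, m in MODELS.items():
        print(f"{name}: {active_fraction(name):.1%} of weights active per token")
        for precision in BYTES_PER_PARAM:
            print(f"  {precision}: ~{weight_gb(m['total'], precision):,.0f} GB of weights")
```

Under these assumptions, V4-Pro needs roughly 800 GB of weights even at 4-bit precision, and V4-Flash roughly 142 GB. Both are far beyond the tens of gigabytes of memory on typical consumer GPUs, which is why sparse activation alone does not make a model local-first.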
Over the next twelve months we will be watching three things: (1) whether DeepSeek can sustain its release cadence under tightening U.S. export-control enforcement, since training a 1.6T MoE requires large, tightly interconnected clusters that smuggled Blackwell hardware, if the U.S. allegations hold, is unlikely to supply reliably; (2) whether the 284B V4-Flash becomes the practical workhorse for mid-latency products like our character engine, given the gap between flagship benchmarks and what teams our size actually deploy; (3) how the MIT license travels downstream, including whether Chinese cloud forks honour its terms and whether Western enterprise procurement teams treat MIT-licensed Chinese weights as deployable.

