{"id":2564,"date":"2026-04-23T18:22:10","date_gmt":"2026-04-23T18:22:10","guid":{"rendered":"https:\/\/deepinsightai.io\/?p=2564"},"modified":"2026-04-23T18:22:12","modified_gmt":"2026-04-23T18:22:12","slug":"deepseek-starts-updating-frequently","status":"publish","type":"post","link":"https:\/\/deepinsightai.io\/es\/deepseek-starts-updating-frequently\/","title":{"rendered":"DeepSeek Starts Updating Frequently: Tile Kernels and DeepEP V2"},"content":{"rendered":"<p>Just now, DeepSeek\u2019s <a href=\"https:\/\/deepinsightai.io\/es\/the-fake-star-economy-on-github\/\">GitHub<\/a> started updating frequently. It launched and open-sourced a new repository, <strong>Tile Kernels<\/strong>, and at the same time updated the <strong>DeepEP<\/strong> repository, bringing <strong>DeepEP V2<\/strong> online. It has been less than a week since DeepSeek quietly updated <strong>Mega MoE<\/strong> and <strong>FP4 Indexer<\/strong> last time.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">DeepSeek Tile Kernels<\/h2>\n\n\n\n<figure data-spectra-id=\"spectra-mobt4mso-77si3j\" class=\"wp-block-image aligncenter size-full\"><img fetchpriority=\"high\" decoding=\"async\" width=\"889\" height=\"471\" src=\"https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/04\/image-59.png\" alt=\"DeepSeek Tile Kernels\" class=\"wp-image-2568\" title=\"DeepSeek Starts Updating Frequently: Tile Kernels and DeepEP V2\" srcset=\"https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/04\/image-59.png 889w, https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/04\/image-59-300x159.png 300w, https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/04\/image-59-768x407.png 768w, https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/04\/image-59-18x10.png 18w\" sizes=\"(max-width: 889px) 100vw, 889px\" \/><\/figure>\n\n\n\n<p>Link: <code>https:\/\/github.com\/deepseek-ai\/TileKernels<\/code><\/p>\n\n\n\n<p>According to the introduction, <strong>Tile Kernels<\/strong> are GPU kernels optimized for LLM 
operations, built with <strong>TileLang<\/strong>. TileLang is a domain-specific language for expressing high-performance GPU kernels in Python, designed for easy portability, agile development, and automatic optimization.<\/p>\n\n\n\n<p>The performance of Tile Kernels is remarkably strong. As DeepSeek itself wrote: \u201cMost kernels in this project are already close to the hardware performance limit in terms of compute intensity and memory bandwidth. Some of them have already been used internally in training and inference scenarios. However, they do not yet represent best practices, and we are continuing to improve the code quality and documentation.\u201d<\/p>\n\n\n\n<p>The repository offers little introductory documentation, yet between the lines it hints at the architectural direction of DeepSeek\u2019s next-generation models, signaling a leap comparable to the recent <a target=\"_blank\" rel=\"noreferrer noopener\" href=\"https:\/\/deepinsightai.io\/es\/hy3-preview-launch\/\">Hy3 preview launch<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">DeepSeek Tile Kernels Features<\/h3>\n\n\n\n<p>Here are some of the specific features of Tile Kernels:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Gating mechanism:<\/strong> Top-k expert selection and scoring for MoE routing<\/li><li><strong>MoE routing:<\/strong> Mapping tokens to experts, fused expand\/reduce, and weight normalization<\/li><li><strong>Quantization:<\/strong> FP8\/FP4\/E5M6 conversion in per-token, per-block, and per-channel modes, plus fused SwiGLU + quantization operations<\/li><li><strong>Transpose:<\/strong> Batched transpose operations<\/li><li><strong>Engram:<\/strong> Engram gating kernels, fusing RMSNorm, forward\/backward propagation, and weight-gradient reduction<\/li><li><strong>Manifold HyperConnection:<\/strong> Hyper-connection kernels, including Sinkhorn normalization and split\/apply for mix<\/li><li><strong>Modeling:<\/strong> High-level <code>torch.autograd.Function<\/code> wrappers that combine the underlying kernels into trainable layers (engram gate, mHC pipeline)<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">DeepSeek EPv2: Faster EP With Engram, PP, and CP Support<\/h2>\n\n\n\n<figure data-spectra-id=\"spectra-mobt54cl-th2xp4\" class=\"wp-block-image aligncenter size-large\"><img decoding=\"async\" width=\"1024\" height=\"656\" src=\"https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/04\/image-60-1024x656.png\" alt=\"DeepSeek EPv2: Faster EP With Engram, PP, and CP Support\" class=\"wp-image-2569\" title=\"DeepSeek Starts Updating Frequently: Tile Kernels and DeepEP V2\" srcset=\"https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/04\/image-60-1024x656.png 1024w, https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/04\/image-60-300x192.png 300w, https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/04\/image-60-768x492.png 768w, https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/04\/image-60-18x12.png 18w, https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/04\/image-60.png 1069w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>EPv2 link: <code>https:\/\/github.com\/deepseek-ai\/DeepEP\/pull\/605<\/code><\/p>\n\n\n\n<p>Earlier today, DeepSeek also released the latest version of <strong>EPv2<\/strong>, delivering faster <strong>expert parallelism (EP)<\/strong> and support for <strong>Engram \/ pipeline parallelism (PP) \/ context parallelism (CP)<\/strong>.<\/p>\n\n\n\n<p>As hardware, networks, and model architectures have evolved alongside rapid industry releases like <a href=\"https:\/\/deepinsightai.io\/es\/qwen-3-6\/\" target=\"_blank\" rel=\"noreferrer noopener\">Qwen 3.6<\/a>, DeepSeek\u2019s earlier DeepEP V1 had accumulated considerable historical baggage and a growing list of performance issues.<\/p>\n\n\n\n<p>This update completely restructures <strong>Expert Parallelism<\/strong>. 
Compared with V1, it needs only a fraction of the SM resources to reach peak performance, while also supporting larger-scale <strong>scale-up<\/strong> (within a single machine) and <strong>scale-out<\/strong> (across machines).<\/p>\n\n\n\n<p>In addition, DeepSeek introduced an experimental <strong>0 SM<\/strong> series in this update, including <strong>0 SM Engram<\/strong>, <strong>0 SM pipeline parallelism (PP)<\/strong>, and <strong>0 SM context parallelism (CP)<\/strong> All-gather operators. At the same time, the backend has been switched from <strong>NVSHMEM<\/strong> to the lighter <strong>NCCL Gin<\/strong> backend.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">New Features in DeepSeek DeepEP V2<\/h3>\n\n\n\n<p>Here are some of the new features in DeepEP V2:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Fully JIT<\/strong><\/li><li><strong>NCCL Gin backend:<\/strong><ul><li>Header-only and extremely lightweight<\/li><li>Able to reuse existing NCCL communicators<\/li><\/ul><\/li><li><strong>EPv2:<\/strong><ul><li>Unifies the high-throughput and low-latency APIs into a single interface, and adopts a brand-new GEMM layout<\/li><li>Supports larger scaling domains, up to <strong>EP2048<\/strong><\/li><li>Introduces analytical SM and QP count calculation, so auto-tuning is no longer needed<\/li><li>Continues to support both <strong>Hybrid<\/strong> mode and <strong>Direct<\/strong> mode<\/li><li>For older V3-like training tasks, SM usage drops from <strong>24<\/strong> to <strong>4\u20136<\/strong> while maintaining the same or even better performance<\/li><\/ul><\/li><li><strong>0 SM Engram<\/strong> (with RDMA)<\/li><li><strong>0 SM PP<\/strong> (with RDMA)<\/li><li><strong>0 SM CP<\/strong> (with Copy Engine)<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">DeepSeek DeepEP V2 Performance<\/h2>\n\n\n\n<p>Following the configuration of <strong>DeepSeek-V3<\/strong>, tests were run on the new version with settings of <strong>8K tokens per 
batch<\/strong>, <strong>7168 hidden dimension<\/strong>, <strong>Top-8 experts<\/strong>, <strong>FP8 dispatch<\/strong>, and <strong>BF16 combine<\/strong>. The results are as follows:<\/p>\n\n\n\n<figure data-spectra-id=\"spectra-mobt5q25-blwgn9\" class=\"wp-block-image aligncenter size-full\"><img decoding=\"async\" width=\"650\" height=\"289\" src=\"https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/04\/image-61.png\" alt=\"DeepSeek DeepEP V2 Performance\" class=\"wp-image-2570\" title=\"DeepSeek Starts Updating Frequently: Tile Kernels and DeepEP V2\" srcset=\"https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/04\/image-61.png 650w, https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/04\/image-61-300x133.png 300w, https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/04\/image-61-18x8.png 18w\" sizes=\"(max-width: 650px) 100vw, 650px\" \/><\/figure>\n\n\n\n<p>Note: The results shown are logical bandwidth. For example, in the case of <strong>EP 8 x 2<\/strong>, the <strong>90 GB\/s<\/strong> figure also includes traffic between local GPUs (local ranks).<\/p>\n\n\n\n<p>Compared with V1, V2 achieves up to <strong>1.3\u00d7 the peak performance<\/strong> while using up to <strong>4\u00d7 fewer SM resources<\/strong>, a crucial optimization for staying competitive in a landscape dominated by heavyweights like <a href=\"https:\/\/deepinsightai.io\/es\/claude-opus-4-7\/\" target=\"_blank\" rel=\"noreferrer noopener\">Claude Opus 4.7<\/a>.<\/p>\n\n\n\n<p>Finally, just a bit of advice for DeepSeek: hurry up and release <strong>V4<\/strong> already. Everyone is getting impatient.<\/p>","protected":false},"excerpt":{"rendered":"<p>Just now, DeepSeek\u2019s GitHub started updating frequently. It launched and open-sourced a new repository, Tile Kernels, and at the same time updated the DeepEP repository, bringing DeepEP V2 online. It has been less than a week since DeepSeek quietly updated Mega MoE and FP4 Indexer last time. 
DeepSeek Tile Kernels Link: https:\/\/github.com\/deepseek-ai\/TileKernels According to the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2567,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_seopress_robots_primary_cat":"none","_seopress_titles_title":"%%post_title%%","_seopress_titles_desc":"DeepSeek releases Tile Kernels and DeepEP V2 with faster expert parallelism, 1.3\u00d7 performance boost, and major GPU efficiency gains.","_seopress_robots_index":"","_uag_custom_page_level_css":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"set","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[2,10],"tags":[],"class_list":["post-2564","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-news","category-llm"],"uagb_featured_image_src":{"full":["https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/04\/deepseek.png",786,520,false],"thumbnail":["https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/04\/deepseek-150x150.png",150,150,true],"medium":["https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/04\/deepseek-300x198.png",300,198,true],"medium_large":["https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/04\/deepseek-768x508.png",768,508,true],"large":["https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/04\/deepseek.png",786,520,false],"1536x1536":["https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/04\/deepseek.png",786,520,false],"2048x2048":["https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/04\/deepseek.png",786,520,false],"trp-custom-language-flag":["https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/04\/deepseek-18x12.png",18,12,true]},"uagb_author_info":{"display_name":"Claude Carter","author_link":"https:\/\/deepinsightai.io\/es\/author\/cloud-han03gmail-com\/"},"uagb_comment_info":0,"uagb_excerpt":"Just now, DeepSeek\u2019s GitHub started updating frequently. It launched and open-sourced a new repository, Tile Kernels, and at the same time updated the DeepEP repository, bringing DeepEP V2 online. It has been less than a week since DeepSeek quietly updated Mega MoE and FP4 Indexer last time. 
DeepSeek Tile Kernels Link: https:\/\/github.com\/deepseek-ai\/TileKernels According to the&hellip;","_links":{"self":[{"href":"https:\/\/deepinsightai.io\/es\/wp-json\/wp\/v2\/posts\/2564","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/deepinsightai.io\/es\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/deepinsightai.io\/es\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/deepinsightai.io\/es\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/deepinsightai.io\/es\/wp-json\/wp\/v2\/comments?post=2564"}],"version-history":[{"count":1,"href":"https:\/\/deepinsightai.io\/es\/wp-json\/wp\/v2\/posts\/2564\/revisions"}],"predecessor-version":[{"id":2571,"href":"https:\/\/deepinsightai.io\/es\/wp-json\/wp\/v2\/posts\/2564\/revisions\/2571"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/deepinsightai.io\/es\/wp-json\/wp\/v2\/media\/2567"}],"wp:attachment":[{"href":"https:\/\/deepinsightai.io\/es\/wp-json\/wp\/v2\/media?parent=2564"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/deepinsightai.io\/es\/wp-json\/wp\/v2\/categories?post=2564"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/deepinsightai.io\/es\/wp-json\/wp\/v2\/tags?post=2564"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}