{"id":2705,"date":"2026-05-01T02:56:44","date_gmt":"2026-05-01T02:56:44","guid":{"rendered":"https:\/\/deepinsightai.io\/?p=2705"},"modified":"2026-05-01T02:56:45","modified_gmt":"2026-05-01T02:56:45","slug":"lda-1b","status":"publish","type":"post","link":"https:\/\/deepinsightai.io\/de\/lda-1b\/","title":{"rendered":"LDA-1B Explained: How \u201cGarbage Data\u201d Is Powering the Next Robot AI Breakthrough"},"content":{"rendered":"<p>The arms race around robot foundation models has just welcomed a new player. A joint team from Peking University, Tsinghua University, Galaxy General, and Zhiyuan Institute has introduced <strong>LDA-1B<\/strong>, pushing parameter size directly to the billion scale.<\/p>\n\n\n\n<p>Behind this number sits a more aggressive idea: stop focusing only on expert demonstration data. Those \u201cgarbage data\u201d pieces that used to be thrown into the recycle bin might actually be the nutrients robots really need.<\/p>\n\n\n\n<p>The traditional training path for robots is straightforward\u2014find a skilled operator, record their actions, and let the robot learn by imitation. This behavior cloning approach has been widely used by OpenAI and Google DeepMind. But the problem is obvious: data utilization is painfully low.<\/p>\n\n\n\n<p>A failed robot attempt? Discarded.<br>A casually recorded human operation video? Not high quality, discarded.<br>Data from a different robot platform? Incompatible format, still discarded.<\/p>\n\n\n\n<p>The <strong>LDA-1B<\/strong> team asked a simple question: what happens if all that wasted data is actually used?<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">How LDA-1B Uses All Available Data<\/h2>\n\n\n\n<p>They assembled a dataset called EI-30k\u201430,000 hours of embodied interaction data, covering both human operations and robot trajectories.<\/p>\n\n\n\n<p>This scale is already massive in robotics. 
For comparison, the largest prior dataset, Open X-Embodiment, contained just over 1,000 hours.<\/p>\n\n\n\n<p>But scale isn\u2019t the key. The key is diversity.<\/p>\n\n\n\n<p>The dataset includes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Successful demonstrations and failed attempts<\/li>\n\n\n\n<li>High-precision robot data and casually recorded human videos<\/li>\n\n\n\n<li>Dual-arm manipulations and dexterous hand operations<\/li>\n<\/ul>\n\n\n\n<p>By traditional standards, much of this data is inconsistent and would never make it into a training set.<\/p>\n\n\n\n<p><strong>LDA-1B<\/strong> takes a different approach: assign different roles to data of different quality.<\/p>\n\n\n\n<p>High-quality expert demonstrations are used to learn the policy.<br>Lower-quality or \u201cunqualified\u201d data is used to learn the dynamics of the physical world.<\/p>\n\n\n\n<p>A failed grasping video cannot be directly imitated\u2014but it tells the model, \u201cthis way of grasping will fail.\u201d That is dynamics knowledge.<\/p>\n\n\n\n<p>The idea sounds simple, but it raises a technical challenge:<br>how can <strong>LDA-1B<\/strong> learn both \u201cwhat the next frame looks like\u201d and \u201cwhat action to take next\u201d at the same time?<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">LDA-1B and Prediction in DINO Latent Space<\/h2>\n\n\n\n<p>The team\u2019s solution is to move prediction tasks into the latent space of DINO.<\/p>\n\n\n\n<p>DINO, a visual self-supervised model developed by Meta, compresses images into highly abstract feature representations. 
In this space, <strong>LDA-1B<\/strong> doesn\u2019t need to care about surface details like \u201cis the table wooden or white,\u201d but instead focuses on core physical information like \u201cwhere objects are\u201d and \u201chow they move.\u201d<\/p>\n\n\n\n<p>This design brings two advantages:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Much higher computational efficiency, avoiding pixel-level redundancy<\/li>\n\n\n\n<li>Stronger generalization across environments, since <strong>LDA-1B<\/strong> learns abstract physical rules instead of scene-specific visual features<\/li>\n<\/ul>\n\n\n\n<p>Within a unified multi-modal diffusion Transformer framework, the model jointly denoises action chunks and future DINO feature sequences.<\/p>\n\n\n\n<p>Heterogeneous data plays unique and complementary roles across:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Visual prediction<\/li>\n\n\n\n<li>Dynamics learning<\/li>\n\n\n\n<li>Policy learning<\/li>\n<\/ul>\n\n\n\n<p>This multi-modal architecture also allows <strong>LDA-1B<\/strong> to handle asynchronous visual and action streams. 
In the real world, camera frame rates and robot control frequencies are often misaligned, which traditional models struggle to process.<\/p>\n\n\n\n<p>The introduction of diffusion modeling also enables <strong>LDA-1B<\/strong> to train stably at the billion-parameter scale\u2014something previously difficult for robot models.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">LDA-1B Performance Across Three Task Categories<\/h2>\n\n\n\n<p>For evaluation, the team selected three representative scenarios:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Contact-Rich Tasks in LDA-1B<\/h3>\n\n\n\n<p>These test a robot\u2019s perception and control of force\u2014tasks like inserting a USB cable or tightening screws, where precise force feedback is essential.<\/p>\n\n\n\n<p><strong>LDA-1B<\/strong> outperforms the \u03c00.5 baseline by 21%.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Dexterous Manipulation with LDA-1B<\/h3>\n\n\n\n<p>Even more challenging, these tasks require coordinated multi-finger control, such as rotating a Rubik\u2019s Cube or using tools.<\/p>\n\n\n\n<p>Here, <strong>LDA-1B<\/strong> shows an even larger advantage, improving performance by 48%.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Long-Horizon Planning in LDA-1B<\/h3>\n\n\n\n<p>These evaluate planning ability. 
The robot must complete a sequence of sub-tasks to achieve a final goal.<\/p>\n\n\n\n<p><strong>LDA-1B<\/strong> achieves a 23% improvement in this category.<\/p>\n\n\n\n<p>More interesting are the fine-tuning experiments.<\/p>\n\n\n\n<p>The team deliberately used \u201clow-quality\u201d data\u2014failed cases and incomplete trajectories that would normally be discarded.<\/p>\n\n\n\n<p>The result: using just 30% of this data improved performance by 10%.<\/p>\n\n\n\n<p>This finding overturns a common industry belief:<br>so-called \u201cgarbage data\u201d is not a burden\u2014it can be a hidden asset for <strong>LDA-1B<\/strong>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">LDA-1B and a New Path to World Models<\/h2>\n\n\n\n<p>The technical approach behind <strong>LDA-1B<\/strong> responds to a bigger question: how should robot foundation models learn?<\/p>\n\n\n\n<p>There are currently two dominant paradigms:<\/p>\n\n\n\n<p><strong>Behavior Cloning<\/strong><br>Represented by OpenAI\u2019s robotics work and Physical Intelligence\u2019s \u03c00 series. The idea is simple: watch experts and imitate them.<\/p>\n\n\n\n<p><strong>World Models<\/strong><br>Represented by works like Genie and DIAMOND. The idea is to first understand how the physical world works, then decide how to act.<\/p>\n\n\n\n<p>Behavior cloning suffers from low data efficiency\u2014it only learns from successful cases.<br>World models have struggled with crude implementations\u2014either focusing only on video prediction without actions, or relying on datasets too small for large-scale training.<\/p>\n\n\n\n<p><strong>LDA-1B<\/strong> takes a third path.<\/p>\n\n\n\n<p>It unifies dynamics learning, policy learning, and visual prediction into a single framework, allowing data of different qualities to play different roles.<\/p>\n\n\n\n<p>This idea\u2014Unified World Model\u2014has been proposed before in theory. 
But <strong>LDA-1B<\/strong> is the first to implement it at the billion-parameter scale with stable training.<\/p>\n\n\n\n<p>From an engineering perspective, the biggest contribution of <strong>LDA-1B<\/strong> isn\u2019t a single breakthrough. It\u2019s proving one thing:<\/p>\n\n\n\n<p>Robot foundation models can scale like language models\u2014by \u201cconsuming\u201d massive amounts of heterogeneous data.<\/p>\n\n\n\n<p>Data that used to be wasted can be turned into knowledge inside <strong>LDA-1B<\/strong>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Is LDA-1B the Cure for Data Hunger?<\/h2>\n\n\n\n<p>Robotics has long faced an awkward reality: data is expensive.<\/p>\n\n\n\n<p>Recording one hour of high-quality robot demonstration data requires skilled operators, standardized environments, and precise sensors. The cost can reach thousands of dollars.<\/p>\n\n\n\n<p>Even well-funded labs struggle to train models with data at the scale of language models.<\/p>\n\n\n\n<p><strong>LDA-1B<\/strong> offers a new direction.<\/p>\n\n\n\n<p>Instead of spending heavily to collect perfect data, make use of imperfect data.<\/p>\n\n\n\n<p>Human operation videos uploaded to YouTube, failed robot attempts during debugging, datasets collected across different labs and platforms\u2014these previously ignored resources can now become training material for <strong>LDA-1B<\/strong>.<\/p>\n\n\n\n<p>That said, some uncertainties remain.<\/p>\n\n\n\n<p>The paper does not fully disclose the composition and sourcing of the EI-30k dataset, which creates a barrier for other teams attempting to reproduce <strong>LDA-1B<\/strong> at this scale.<\/p>\n\n\n\n<p>There\u2019s also the issue of deployment: a billion-parameter model comes with significant computational cost. 
Robots, unlike servers, cannot simply scale up compute.<\/p>\n\n\n\n<p>Still, at this moment, <strong>LDA-1B<\/strong> sets a new reference point for robot foundation models:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Larger scale<\/li>\n\n\n\n<li>More diverse data<\/li>\n\n\n\n<li>More unified methods<\/li>\n<\/ul>\n\n\n\n<p>Now the question is how the rest of the field will respond to <strong>LDA-1B<\/strong>.<\/p>","protected":false},"excerpt":{"rendered":"<p>The arms race around robot foundation models has just welcomed a new player. A joint team from Peking University, Tsinghua University, Galaxy General, and Zhiyuan Institute has introduced LDA-1B, pushing parameter size directly to the billion scale. Behind this number sits a more aggressive idea: stop focusing only on expert demonstration data. Those \u201cgarbage data\u201d [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2708,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_seopress_robots_primary_cat":"none","_seopress_titles_title":"%%post_title%%","_seopress_titles_desc":"LDA-1B redefines robot training by turning failed attempts and low-quality data into valuable learning signals. 
Discover how this billion-parameter model boosts performance across dexterous, contact-rich, and long-horizon tasks.","_seopress_robots_index":"","_uag_custom_page_level_css":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"set","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[2,6],"tags":[],"class_list":["post-2705","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-news","category-robots"],"uagb_featured_image_src":{"full":["https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/05\/LDA-1B-Explained-How-Garbage-Data-Is-Powering-the-Next-Robot-AI-Breakthrough.webp",1536,1024,false],"thumbnail":["https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/05\/LDA-1B-Explained-How-Garbage-Data-Is-Powering-the-Next-Robot-AI-Breakthrough-150x150.webp",150,150,true],"medium":["https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/05\/LDA-1B-Explained-How-Garbage-Data-Is-Powering-the-Next-Robot-AI-Breakthrough-300x200.webp",300,200,true],"medium_large":["https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/05\/LDA-1B-Explained-How-Garbage-Data-Is-Powering-the-Next-Robot-AI-Breakthrough-768x512.webp",768,512,true],"large":["https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/05\/LDA-1B-Explained-How-Garbage-Data-Is-Powering-the-Next-Robot-AI-Breakthrough-1024x683.webp",1024,683,true],"1536x1536":["https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/05\/LDA-1B-Explained-How-Garbage-Data-Is-Powering-the-Next-Robot-AI-Breakthrough.webp",1536,1024,false],"2048x2048":["https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/05\/LDA-1B-Explained-How-Garbage-Data-Is-Powering-the-Next-Robot-AI-Breakthrough.webp",1536,1024,false],"trp-custom-language-flag":["https:\/\/deepinsightai.io\/wp-content\/uploads\/2026\/05\/LDA-1B-Explained-How-Garbage-Data-Is-Powering-the-Next-Robot-AI-Breakthrough-18x12.webp",18,12,true]},"uagb_author_info":{"display_name":"Claude Carter","author_link":"https:\/\/deepinsightai.io\/de\/author\/cloud-han03gmail-com\/"},"uagb_comment_info":0,"uagb_excerpt":"The 
arms race around robot foundation models has just welcomed a new player. A joint team from Peking University, Tsinghua University, Galaxy General, and Zhiyuan Institute has introduced LDA-1B, pushing parameter size directly to the billion scale. Behind this number sits a more aggressive idea: stop focusing only on expert demonstration data. Those \u201cgarbage data\u201d&hellip;","_links":{"self":[{"href":"https:\/\/deepinsightai.io\/de\/wp-json\/wp\/v2\/posts\/2705","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/deepinsightai.io\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/deepinsightai.io\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/deepinsightai.io\/de\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/deepinsightai.io\/de\/wp-json\/wp\/v2\/comments?post=2705"}],"version-history":[{"count":1,"href":"https:\/\/deepinsightai.io\/de\/wp-json\/wp\/v2\/posts\/2705\/revisions"}],"predecessor-version":[{"id":2709,"href":"https:\/\/deepinsightai.io\/de\/wp-json\/wp\/v2\/posts\/2705\/revisions\/2709"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/deepinsightai.io\/de\/wp-json\/wp\/v2\/media\/2708"}],"wp:attachment":[{"href":"https:\/\/deepinsightai.io\/de\/wp-json\/wp\/v2\/media?parent=2705"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/deepinsightai.io\/de\/wp-json\/wp\/v2\/categories?post=2705"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/deepinsightai.io\/de\/wp-json\/wp\/v2\/tags?post=2705"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}