Diffusion Models for Robotics Performance Optimization
Summary
Diffusion models for robotics performance optimization use advanced AI techniques inspired by how particles spread in nature to help robots better predict, plan, and control their actions in complex environments. These models allow robots to adapt in real time, improve motion reasoning, and handle new or changing scenarios without needing exhaustive retraining.
Embrace simulation data: Rely on scalable synthetic data generation to train diffusion models, making it easier to prepare robots for a variety of tasks and environments.
Adapt on the fly: Use inference-time steering and alignment methods to let robots dynamically adjust their actions when faced with unexpected changes or new objects.
Combine world knowledge: Integrate physics-aware planning and motion prediction into robot control systems for more reliable and robust manipulation in real-world settings.
Summarized by AI from LinkedIn member posts
Honglu Zhou
multimodal AI, computer vision, video understanding, machine reasoning
2,790 followers
4 months ago
VLAs can't just mimic expert trajectories; they need predictive motion reasoning. Our new work shows that jointly learning motion prediction via image diffusion gives robotic VLAs a superior ability to reason about what actions to take. The result: stronger, more reliable real-world manipulation. Code and model will be released.
📄 https://lnkd.in/g9vfn_SE
🔗 https://lnkd.in/g_9sBcVe
#Robotics #EmbodiedAI #VLA #DiffusionModels
🤿 Deep dive:
Our method extends the VLA architecture with a dual-head design: while the action head predicts action chunks as in vanilla VLAs, an additional motion head, implemented as a Diffusion Transformer (DiT), predicts optical-flow-based motion images that capture future dynamics.
The two heads are trained jointly, enabling the shared VLM backbone to learn representations that couple robot control with motion knowledge.
This joint learning builds temporally coherent and physically grounded representations without modifying the inference pathway of standard VLAs, thereby maintaining test-time latency.
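As a rough illustration of the joint objective described above, here is a minimal sketch. The post does not give the loss formulation, so the weighted sum, the noise-prediction target for the motion head, and every name here (`joint_loss`, `motion_weight`, `add_noise`) are assumptions made for illustration, with plain lists of floats standing in for tensors.

```python
from typing import List, Tuple


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def add_noise(clean: List[float], noise: List[float], alpha_bar: float) -> List[float]:
    """Standard forward diffusion: x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    return [alpha_bar ** 0.5 * x + (1.0 - alpha_bar) ** 0.5 * e
            for x, e in zip(clean, noise)]


def joint_loss(
    action_pred: List[float],
    action_target: List[float],
    noise_pred: List[float],
    noise_target: List[float],
    motion_weight: float = 1.0,  # hypothetical weighting between the two heads
) -> Tuple[float, float, float]:
    """Combine the action-head loss with the motion-head denoising loss.

    Assumption: both heads read the same backbone features, so minimizing
    the sum sends gradients from both objectives into the shared backbone.
    """
    action_loss = mse(action_pred, action_target)
    motion_loss = mse(noise_pred, noise_target)
    return action_loss + motion_weight * motion_loss, action_loss, motion_loss


# Toy numbers only; real inputs would be action chunks and motion images.
total, a_loss, m_loss = joint_loss([1.0, 2.0], [1.0, 4.0], [0.0], [1.0], motion_weight=0.5)
print(total, a_loss, m_loss)  # 2.5 2.0 1.0
```

In this sketch only the action head would run at test time, which matches the post's claim that the inference pathway and latency stay unchanged.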
Experiments in both simulation and real-world environments demonstrate that joint learning with motion image diffusion improves the success rate of pi-series VLAs to 97.5% on the LIBERO benchmark and 58.0% on the RoboTwin benchmark, yielding a 23% improvement in real-world performance and validating its effectiveness in enhancing the motion reasoning capability of large-scale VLAs.
Great work by our intern Yu Fang during his time at Salesforce AI Research!
Adithya Murali
Staff Research Scientist at NVIDIA | MIT TR35, Prev CMU PhD, Berkeley AI Research
3,219 followers
10 months ago
I’m super excited to release a multi-year project we have been cooking at NVIDIA Robotics.
Grasping is a foundational challenge in robotics 🤖 — whether for industrial picking or general-purpose humanoids. VLA + real data collection is all the rage now but is expensive and scales poorly for this task. For every new embodiment and/or scene, we'll have to recollect the dataset in this paradigm for the best perf.
Key Idea: Since grasping is a well-defined task in physics simulation - why can’t we just scale synthetic data generation and train a GenAI model for grasping? By embracing modularity and standardized grasp formats, we can make this a turnkey technology that works zero-shot for multiple settings.
Introducing…
🚀 GraspGen: A Diffusion-Based Framework for 6-DOF Grasping
GraspGen is a modular framework for diffusion-based 6-DOF grasp generation that scales across embodiment types, observability conditions, clutter, and task complexity.
Key Features:
✅ Multi-embodiment support: suction, antipodal pinch, and underactuated pinch grippers
✅ Generalization to both partial and complete 3D point clouds
✅ Generalization to both single objects and cluttered scenes
✅ Modular design relies on other robotics packages and foundation models (SAM2, cuRobo, FoundationStereo, FoundationPose). This allows GraspGen to focus on only one thing - grasp generation
✅ Training recipe: grasp discriminator is trained with On-Generator data from the diffusion model - so that it learns to correct any mistakes of the diffusion generator
✅ Real-time performance (~20 Hz) before any GPU acceleration; low memory footprint
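The generator-plus-discriminator recipe in the list above can be sketched as follows. This is not GraspGen's API: `sample_grasp`, `simulate_success`, `score_grasp`, and the tuple pose encoding are hypothetical stand-ins, and the sketch only assumes that the discriminator's training examples are drawn from the generator itself and labeled in simulation.

```python
from typing import Callable, List, Tuple

# Hypothetical pose encoding, e.g. (x, y, z, roll, pitch, yaw).
Grasp = Tuple[float, ...]


def build_on_generator_dataset(
    sample_grasp: Callable[[], Grasp],
    simulate_success: Callable[[Grasp], bool],
    num_samples: int,
) -> List[Tuple[Grasp, bool]]:
    """Label grasps drawn from the generator itself, so the discriminator
    sees (and can learn to reject) the generator's own typical mistakes."""
    data = []
    for _ in range(num_samples):
        grasp = sample_grasp()
        data.append((grasp, simulate_success(grasp)))
    return data


def sample_and_rank(
    sample_grasp: Callable[[], Grasp],
    score_grasp: Callable[[Grasp], float],
    num_samples: int,
    top_k: int,
    threshold: float = 0.5,  # assumed cutoff on the discriminator score
) -> List[Tuple[float, Grasp]]:
    """At inference: sample candidates, score them, keep the best few."""
    scored = [(score_grasp(g), g) for g in (sample_grasp() for _ in range(num_samples))]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]
```

In practice `sample_grasp` would wrap the diffusion sampler conditioned on a point cloud, and `score_grasp` would wrap the trained discriminator.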
📊 Results:
• SOTA on the FetchBench [Han et al., CoRL 2024] benchmark
• Zero-shot sim-to-real transfer on unknown objects and cluttered scenes
• Dataset of 53M simulated grasps across 8K objects from Objaverse
We're also releasing:
🔹 Simulation-based grasp data generation workflows
🔹 Standardized formats and gripper definitions
🔹 Full training infrastructure
📄 arXiv: https://lnkd.in/gaYmcfz4
🌐 Website: https://lnkd.in/gGiKRCMX
💻 Code: https://lnkd.in/gYR77bEh
A huge thank you to everyone involved in this journey — excited to hear the feedback from the community!
Joint work with Clemens Eppner, Balakumar Sundaralingam, Yu-Wei Chao, Mark T. Carlson, Jun Yamada and other collaborators. Many thanks to Yichao Pan, Shri Sundaram, Spencer Huang, Buck Babich, Amit Goel for product management and feedback.
#robotics #grasping #physicalAI #simtoreal
John Lambert
7,181 followers
9 months ago
Can a single autonomous driving simulation world model jointly insert, delete, and control the behavior of all agents and traffic lights in a bird's-eye-view scene?
For the first time, we show this is possible in SceneDiffuser++, our CVPR '25 paper, w/ 60+ second simulations.
Led by our amazing intern at Waymo Research, Shuhan T., SceneDiffuser++ is a diffusion model that is solely trained on the diffusion denoising objective, yet supports all insertion, deletion, and behavior control capabilities via simple autoregressive rollout.
Only learned simulators can emulate the realism of crowded city scenes. Without the ability to insert or delete objects, these simulators can only simulate a few seconds before the scene becomes empty as initial logged agents and traffic lights leave the periphery of the AV.
Like SceneDiffuser, we learn an agent "scene tensor," but generalize this to multi-tensor diffusion. Agent spawning, removal and occlusion can be jointly modeled simply via predicting an additional validity channel along with other agent features such as x, y, size, type, etc.
For agents and traffic lights scene tensors, with a varying number of elements and feature dimensions, we can project scene tensors to the same latent dimension, and concatenate into a multi-tensor. We then pass this to a transformer denoiser backbone.
Though conceptually simple, this requires diffusion to learn to generate sparse tensors without prespecified sparse structure. During inference, we develop new clipping techniques to account for invalid entries in the denoising process.
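To make the validity-channel idea concrete, here is a toy sketch. It is not the paper's clipping technique, whose details the post does not give; the row layout (features followed by one validity value), the 0.5 threshold, and the function names are all assumptions.

```python
from typing import List, Tuple

# Each row is one agent slot: [x, y, ..., validity]; validity is the last entry.
Scene = List[List[float]]


def clip_invalid(scene: Scene, threshold: float = 0.5) -> Scene:
    """Binarize the validity channel and blank out features of invalid slots."""
    clipped = []
    for row in scene:
        *features, validity = row
        if validity >= threshold:
            clipped.append(features + [1.0])
        else:
            clipped.append([0.0] * len(features) + [0.0])
    return clipped


def spawn_and_removal(prev: Scene, curr: Scene) -> Tuple[List[int], List[int]]:
    """Compare two clipped scenes slot by slot: a slot that turns valid is an
    inserted agent, and a slot that turns invalid is a removed one."""
    spawned = [i for i, (p, c) in enumerate(zip(prev, curr)) if p[-1] == 0.0 and c[-1] == 1.0]
    removed = [i for i, (p, c) in enumerate(zip(prev, curr)) if p[-1] == 1.0 and c[-1] == 0.0]
    return spawned, removed


prev = clip_invalid([[1.0, 2.0, 0.9], [3.0, 4.0, 0.1]])
curr = clip_invalid([[1.5, 2.0, 0.2], [3.0, 4.0, 0.8]])
print(spawn_and_removal(prev, curr))  # ([1], [0])
```

The point of the sketch is that insertion and deletion need no separate mechanism: they fall out of how one extra predicted channel changes between steps.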
We propose a new task, CitySim, where given a city map and an AV software stack, the simulator can simulate the trip from point A -> B by populating the city around the AV and controlling all aspects of the scene (e.g., vehicles, pedestrians, traffic light states).
Thanks to brilliant collaborators: Shuhan T., Hong Jeon, Sakshum Kulshrestha, Yijing Bai, Jing Luo, Dragomir Anguelov, Mingxing Tan, "Max" Chiyu Jiang.
Full details available here:
- SceneDiffuser++ Paper: https://lnkd.in/efanc7UM
- Watch our video: https://lnkd.in/ehYbADcU
- SceneDiffuser Paper: https://lnkd.in/edr2REsS
Jiafei Duan
Incoming Presidential Young Professor at NUS Computing | Robotics & AI PhD student at University of Washington, Seattle
8,355 followers
3 months ago
Why do powerful pretrained generalist robot models fail when you move an object a few inches, swap a target, or change the scene layout?
It’s usually not a lack of motor skill — it’s an alignment problem at test time.
In our new paper, we introduce Vision–Language Steering (VLS): a training-free, inference-time framework that adapts frozen diffusion and flow-matching robot policies to out-of-distribution (OOD) scenarios.
Key idea:
Treat adaptation as an inference-time control problem.
Instead of retraining policies, we steer the denoising process using:
- Vision–Language Models to interpret test-time constraints
- Differentiable, programmatic rewards grounded in 3D geometry
- Gradient-based guidance + particle resampling for stable long-horizon execution
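The combination of gradient guidance and particle resampling can be illustrated on a one-dimensional toy problem. This is only a sketch under stated assumptions, not the VLS implementation: the "frozen policy" is a pull toward a fixed prior mean, the reward is a hand-written quadratic rather than one produced by a VLM, and all parameter names are hypothetical.

```python
import math
import random
from typing import List, Optional


def reward(action: float, target: float) -> float:
    """Differentiable toy reward: highest when the action hits the target."""
    return -(action - target) ** 2


def reward_grad(action: float, target: float) -> float:
    """Analytic gradient of the reward with respect to the action."""
    return -2.0 * (action - target)


def steer(
    particles: List[float],
    prior_mean: float,
    target: float,
    steps: int = 20,
    denoise_rate: float = 0.2,
    guidance_scale: float = 0.1,
    noise: float = 0.3,
    temperature: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Run guided denoising over a set of candidate actions (particles)."""
    if rng is None:
        rng = random.Random(0)
    for step in range(steps):
        noise_scale = noise * (1.0 - (step + 1) / steps)  # anneals to zero
        moved = []
        for a in particles:
            a += denoise_rate * (prior_mean - a)           # frozen policy's denoising pull
            a += guidance_scale * reward_grad(a, target)   # gradient-based guidance
            a += noise_scale * rng.gauss(0.0, 1.0)         # remaining sampling noise
            moved.append(a)
        # Resample particles in proportion to exp(reward / temperature).
        weights = [math.exp(reward(a, target) / temperature) for a in moved]
        particles = rng.choices(moved, weights=weights, k=len(moved))
    return particles


guided = steer([0.0] * 8, prior_mean=0.0, target=1.0)
print(sum(guided) / len(guided))
```

The policy weights are never updated here; only the sampling trajectory is bent toward the test-time constraint, which is the sense in which the method is training-free.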
📊 Results
CALVIN: +31% absolute success over prior steering methods
LIBERO-PRO: +13% improvement on strong VLAs (π0.5, OpenVLA)
Real world (Franka): Robust execution under appearance shifts, position swaps, and novel object substitutions
This work suggests a broader takeaway for robotics foundation models:
Scaling policies alone isn’t enough — inference-time alignment matters.
📄Paper: https://lnkd.in/g67pf5Tm
🌐 Project page: https://lnkd.in/gkPxZjXw
Dr. Kal Mos
Executive VP, Head of Research & Predevelopment @ Siemens, ex-Google, ex-Amazon AGI, Startup Founder, Board Member
13,490 followers
6 months ago
This new paper proposes dual-stream diffusion (DUST), a world-model augmented VLA framework. It shows that combining world models with physics-aware VLA delivers major gains in generalization and real-world task success. DUST outperforms standard VLA architectures that map perception to action without internal physical simulation. DUST keeps vision + action streams separated but cross-modal, enabling a physically consistent internal state that boosts manipulation success by 6% in simulation and 13% on real robots. This hybrid approach is the direction next-gen Robotics Foundation Models will go: physics-aware, temporally grounded, scalable, general-purpose embodied intelligence. https://lnkd.in/gCQn3-Ta
#Robotics #RFM #RFM1 #RoboticsFoundationModel #WorldModel #LeCunWorldModel #EmbodiedAI #VLA #VisionLanguageAction #PhysicsAugmentedAI #DiffusionModels #ModelBasedRL #RobotManipulation #AutonomousSystems #PhysicalAI #EmbodiedFoundationModels #RobotLearning #Sim2Real #AIResearch #GeneralistRobots #IndustrialAI #DeepLearning #AIInfrastructure #FoundationModels #MachineLearning #Transformers #DiffusionTransformers #EmbodiedIntelligence #FutureOfAutomation #NextGenAI #Siemens
Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model
arxiv.org
Heng Yang
Assistant Professor at Harvard SEAS
9,109 followers
2 months ago
Glad that our work “Inference-Time Enhancement of Generative Robot Policies via Predictive World Modeling”, led by Han Qi, has been accepted to IEEE Robotics and Automation Letters! 🎉
We propose Generative Predictive Control (GPC):
sample action proposals from a pretrained diffusion policy (“look back”), roll them out with a diffusion-based action-conditioned video world model (“look forward”), then rank or optimize the actions using either a learned reward model or VLM preferences.
Conceptually, this is trajectory optimization / MPC with hybrid sampling + gradient optimization, interpreted through modern diffusion priors and video world models.
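The sample, roll out, and rank loop can be sketched as below. It covers only the ranking variant, not the gradient-based optimization, and it is not the paper's code: `propose_actions`, `world_model`, and `reward_fn` are hypothetical callables, and a scalar state stands in for the video frames a real world model would predict.

```python
from typing import Callable, List, Optional, Tuple


def generative_predictive_control(
    state: float,
    propose_actions: Callable[[float, int], List[float]],
    world_model: Callable[[float, float], float],
    reward_fn: Callable[[float], float],
    num_proposals: int = 8,
    horizon: int = 5,
) -> Tuple[Optional[List[float]], float]:
    """Pick the action sequence whose imagined rollout scores highest."""
    best_plan: Optional[List[float]] = None
    best_return = float("-inf")
    for _ in range(num_proposals):
        plan = propose_actions(state, horizon)  # sample from the pretrained policy
        imagined, total = state, 0.0
        for action in plan:
            imagined = world_model(imagined, action)  # predict the next state
            total += reward_fn(imagined)              # score the predicted state
        if total > best_return:
            best_plan, best_return = plan, total
    return best_plan, best_return


# Toy setup: the state is a position, actions are displacements, the goal is 2.0.
proposals = iter([[1.0, 1.0], [-1.0, -1.0], [2.0, 0.0]])
plan, ret = generative_predictive_control(
    state=0.0,
    propose_actions=lambda s, h: next(proposals),
    world_model=lambda s, a: s + a,
    reward_fn=lambda s: -abs(s - 2.0),
    num_proposals=3,
    horizon=2,
)
print(plan)  # [2.0, 0.0]
```

Because the policy only proposes and the world model only predicts, either component could be swapped out without retraining the other in this arrangement.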
Interestingly, we first posted the paper on arXiv in Feb 2025, when action-conditioned video world models for planning were still rare—now this direction is rapidly gaining traction.
Still many open questions, e.g.,
• how to avoid local minima in planning
• what representations work best for world models
• how to balance physics priors vs. data-driven learning
Paper: https://lnkd.in/g9YdKmtn