Diffusion-Based Learning Papers
A selection of notable research papers in machine learning and robotics.
Imitating Human Behaviour with Diffusion Models
Authors: Tim Pearce, Tabish Rashid, Anssi Kanervisto, Dave Bignell, Mingfei Sun, et al.
Summary:
This paper explores diffusion models for imitation learning, specifically for modeling human behavior in sequential environments. The authors argue that traditional behavior cloning approaches are limited in expressiveness and introduce bias, for example when regressing to a single mean action in settings where humans act multimodally. Diffusion models, best known from text-to-image generation, are instead used to capture the full distribution of human actions. With tailored architectural designs and sampling strategies, the approach models human demonstrations in complex settings such as robotic control tasks and 3D video games, and experiments show it replicates human actions with higher fidelity than other behavior cloning methods. The denoising loop at the heart of such a policy is sketched after the link below.
Link: Imitating Human Behaviour with Diffusion Models (arXiv:2301.10677v2)
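To make the mechanism concrete, here is a minimal sketch of DDPM-style action sampling for a diffusion policy. The network `eps_model`, the step count, the noise schedule, and the action dimension are illustrative assumptions, not the paper's exact implementation.

```python
import torch

T = 50                                    # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)     # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample_action(eps_model, obs, act_dim=7):
    """Denoise Gaussian noise into an action, conditioned on the observation."""
    a = torch.randn(obs.shape[0], act_dim)            # start from pure noise
    for t in reversed(range(T)):
        t_batch = torch.full((obs.shape[0],), t)
        eps = eps_model(a, obs, t_batch)              # predict the injected noise
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (a - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(a) if t > 0 else torch.zeros_like(a)
        a = mean + torch.sqrt(betas[t]) * noise       # ancestral sampling step
    return a
```

Because each call returns a fresh sample from the learned action distribution rather than a point estimate, repeated calls on the same observation can yield different but equally plausible actions, which is exactly the multimodality that mean-regression behavior cloning collapses.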
Goal-Conditioned Imitation Learning using Score-based Diffusion Policies (BESO)
Authors: Moritz Reuss, Maximilian Li, Xiaogang Jia, and Rudolf Lioutikov
Summary:
This paper introduces BESO (BEhavior generation with ScOre-based Diffusion Policies), a novel approach to goal-conditioned imitation learning (GCIL) built on score-based diffusion models (SDMs). BESO decouples learning the score model from the inference procedure, enabling fast behavior generation in as few as three denoising steps, compared with the 30+ steps other diffusion-based policies require; a few-step sampler of this kind is sketched after the link below. BESO is expressive enough to capture multimodal solution spaces and can learn goal-dependent and goal-independent behavior simultaneously from uncurated play data, and it significantly outperforms state-of-the-art GCIL methods on benchmarks such as Relay Kitchen and Block-Push.
Link: Goal-Conditioned Imitation Learning using Score-based Diffusion Policies (arXiv:2304.02532v2)
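A minimal sketch of the few-step, goal-conditioned sampling idea, assuming a denoiser that directly predicts the clean action from a noised one; the `denoiser` signature, the three-step sigma schedule, and the dimensions are illustrative, not BESO's actual code.

```python
import torch

# Three-step noise schedule ending at zero (values are assumptions).
sigmas = torch.tensor([1.0, 0.1, 0.01, 0.0])

@torch.no_grad()
def sample_action(denoiser, obs, goal, act_dim=7):
    a = torch.randn(obs.shape[0], act_dim) * sigmas[0]   # start at max noise
    for i in range(len(sigmas) - 1):
        sigma = sigmas[i]
        denoised = denoiser(a, obs, goal, sigma)         # predicted clean action
        d = (a - denoised) / sigma                       # derivative from the score
        a = a + (sigmas[i + 1] - sigma) * d              # one Euler step toward data
    return a
```

The key point the paper exploits is that the sampler is separate from the trained score model: trading steps for speed is just a change to `sigmas`, with no retraining.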
Generative Adversarial Imitation Learning
Authors: Jonathan Ho, Stefano Ermon
Summary:
This paper introduces Generative Adversarial Imitation Learning (GAIL), a model-free imitation learning algorithm inspired by generative adversarial networks (GANs). As an alternative to inverse reinforcement learning (IRL), GAIL learns a policy directly from expert demonstrations without the intermediate step of recovering a cost function: a discriminator is trained to distinguish expert state-action pairs from the policy's, and its output supplies the policy's reward signal, as sketched below. This adversarial training lets GAIL mimic expert behavior in high-dimensional, continuous environments, and the paper shows it outperforms existing model-free imitation learning methods across various control tasks, especially complex physics-based environments.
Link: Generative Adversarial Imitation Learning
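The following sketch shows the adversarial loop in outline: the discriminator learns to separate expert (state, action) pairs from the policy's, and its confidence becomes the policy's surrogate reward. The network and optimizer objects are placeholders, and the policy itself would be updated by a separate RL step (TRPO in the paper) on the returned rewards.

```python
import torch
import torch.nn.functional as F

def discriminator_step(disc, disc_opt, policy_sa, expert_sa):
    """One discriminator update; returns surrogate rewards for the policy."""
    # Convention here: label policy pairs 1 and expert pairs 0, so the
    # discriminator output D(s, a) is the probability a pair is policy-generated.
    logits_pi = disc(policy_sa)
    logits_exp = disc(expert_sa)
    loss = (F.binary_cross_entropy_with_logits(logits_pi, torch.ones_like(logits_pi))
            + F.binary_cross_entropy_with_logits(logits_exp, torch.zeros_like(logits_exp)))
    disc_opt.zero_grad()
    loss.backward()
    disc_opt.step()

    # Surrogate reward -log D(s, a): large when a pair looks expert-like.
    with torch.no_grad():
        rewards = -F.logsigmoid(disc(policy_sa))
    return rewards
```

As the policy improves, the discriminator must sharpen its decision boundary, and the two pressures drive the policy's state-action distribution toward the expert's.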
Octo: An Open-Source Generalist Robot Policy
Authors: Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, et al.
Summary:
The paper introduces Octo, a large transformer-based generalist robot policy trained on 800k trajectories from the Open X-Embodiment dataset. Octo controls multiple robotic systems across tasks such as manipulation and navigation, accepts both language and goal-based commands, and can be fine-tuned to new sensory inputs and action spaces, which is what lets it generalize across robot platforms. A simplified view of this token-based design is sketched below.
Link: Octo: An Open-Source Generalist Robot Policy
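A highly simplified sketch of the token-based design: observations and the task (language or goal image) become token sequences, a transformer fuses them, and an action head decodes from a dedicated readout token. All modules, feature sizes, and the single-step linear action head here are assumptions for illustration; Octo's released implementation is considerably richer (it uses a diffusion action head, for instance).

```python
import torch
import torch.nn as nn

class GeneralistPolicy(nn.Module):
    def __init__(self, d_model=256, act_dim=7):
        super().__init__()
        self.img_proj = nn.Linear(512, d_model)    # e.g. features from a CNN/ViT
        self.task_proj = nn.Linear(512, d_model)   # language or goal-image embedding
        self.readout = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        # Swapping this head is how a new action space could be attached.
        self.action_head = nn.Linear(d_model, act_dim)

    def forward(self, img_feats, task_feats):
        tokens = torch.cat([
            self.task_proj(task_feats),            # task tokens
            self.img_proj(img_feats),              # observation tokens
            self.readout.expand(img_feats.shape[0], -1, -1),
        ], dim=1)
        fused = self.encoder(tokens)
        return self.action_head(fused[:, -1])      # decode from the readout token
```

In this framing, fine-tuning to a new robot amounts to attaching a fresh input projection or action head while keeping the pretrained trunk, which mirrors how the paper adapts Octo to new sensory inputs and action spaces.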
Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion, and Aviation
Authors: Ria Doshi, Homer Walke, Oier Mees, Sudeep Dasari, Sergey Levine
Summary:
This paper introduces CrossFormer, a transformer-based policy that scales across markedly different robotic systems, including manipulation robots, wheeled robots, and quadcopters. Trained on over 900,000 trajectories from 20 different types of robots, the single policy performs complex tasks such as manipulation, navigation, and flight control at performance levels comparable to specialist policies designed for specific robot types. This advances cross-embodied learning by letting one policy seamlessly control a wide variety of robot platforms, paving the way for more flexible and scalable robotic systems; the per-embodiment decoding idea is sketched below.
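To illustrate how one set of weights can serve very different robots, here is a minimal sketch of a shared transformer trunk with per-embodiment action heads. The head names, action dimensions, and module sizes are illustrative assumptions, not CrossFormer's actual architecture.

```python
import torch
import torch.nn as nn

class CrossEmbodimentPolicy(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=4)   # shared weights
        self.heads = nn.ModuleDict({
            "manipulation": nn.Linear(d_model, 7),   # arm joints + gripper
            "navigation":   nn.Linear(d_model, 2),   # linear/angular velocity
            "aviation":     nn.Linear(d_model, 4),   # thrust + body rates
        })

    def forward(self, obs_tokens, embodiment):
        fused = self.trunk(obs_tokens)               # (B, N, d_model)
        return self.heads[embodiment](fused[:, -1])  # embodiment-specific action
```

Routing each batch through the head matching its embodiment lets manipulation, navigation, and flight data share the trunk during training, which is the sense in which a single policy controls all the platforms.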