LiReN: Lifelong Autonomous Fine-Tuning for Navigation Foundation Models

UC Berkeley

LiReN is a navigation foundation model trained on diverse data with offline reinforcement learning and capable of autonomous fine-tuning in open-world settings.

Abstract

Recent works have proposed a number of general-purpose robotic foundation models that can control a variety of robotic platforms to perform a range of different tasks, including in the domains of navigation and manipulation. However, such models are typically trained via imitation learning, which precludes the ability to adapt autonomously through experience that the robot gathers on the job.

In this work, we train general-purpose robotic foundation models for navigation with the specific aim of enabling autonomous self-improvement. We show that combining offline reinforcement learning pretraining with a complete system for continual autonomous operation yields a robotic learning framework that not only starts off with broad and diverse capabilities, but can further specialize and adapt those capabilities while carrying out navigation tasks in a given deployment location. To our knowledge, LiReN is the first navigation foundation model capable of fine-tuning on autonomously collected online data in open-world settings.

Offline Generalist Policy

Pre-training on a large pre-existing robot dataset yields a strong generalist navigation policy that matches previous state-of-the-art models while being amenable to fine-tuning with online RL.
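
To make this concrete, here is a minimal sketch of what goal-conditioned offline RL pretraining can look like. It is hypothetical: this page does not specify LiReN's architecture or objective, so the sketch uses an IQL-style expectile value loss with advantage-weighted policy extraction as a stand-in, and all network shapes and hyperparameters below are assumptions.

```python
# Hypothetical sketch of goal-conditioned offline RL pretraining
# (IQL-style stand-in; LiReN's actual objective is not given here).
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim))

obs_dim, goal_dim, act_dim = 512, 512, 2          # assumed: embeddings + (v, w) command
q_net = mlp(obs_dim + goal_dim + act_dim, 1)      # Q(s, g, a)
v_net = mlp(obs_dim + goal_dim, 1)                # V(s, g)
policy = mlp(obs_dim + goal_dim, act_dim)         # deterministic head for brevity

def expectile_loss(diff, tau=0.7):
    # Asymmetric L2: regresses V toward an upper expectile of Q, which
    # avoids querying Q at out-of-distribution actions on offline data.
    return (torch.where(diff > 0, tau, 1.0 - tau) * diff ** 2).mean()

def offline_update(s, g, a, r, s2, done, gamma=0.99):
    sg, sga = torch.cat([s, g], -1), torch.cat([s, g, a], -1)
    # 1) Fit V(s, g) to an expectile of Q(s, g, a) over dataset actions.
    with torch.no_grad():
        q = q_net(sga)
    v_loss = expectile_loss(q - v_net(sg))
    # 2) Fit Q with a TD target bootstrapped through V at the next state.
    with torch.no_grad():
        target = r + gamma * (1 - done) * v_net(torch.cat([s2, g], -1)).squeeze(-1)
    q_loss = ((q_net(sga).squeeze(-1) - target) ** 2).mean()
    # 3) Extract the policy by advantage-weighted regression onto dataset
    #    actions, keeping it close to demonstrated behavior.
    with torch.no_grad():
        w = torch.exp((q - v_net(sg)).squeeze(-1).clamp(max=5.0))
    pi_loss = (w * ((policy(sg) - a) ** 2).sum(-1)).mean()
    return v_loss + q_loss + pi_loss
```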

Online Fine-Tuning

Our model can autonomously fine-tune its navigation policy to adapt to a new environment or embodiment using online reinforcement learning.
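
A minimal sketch of what this online fine-tuning loop can look like appears below. It is hypothetical: the update rule and data mixture are assumptions, not reported details; a common offline-to-online recipe keeps sampling the pretraining dataset alongside freshly collected experience so fine-tuning does not erase broad pretraining skills.

```python
# Hypothetical sketch of online fine-tuning with a mixed replay:
# fresh on-robot transitions are blended with pretraining data.
import random
from collections import deque

online_buffer = deque(maxlen=100_000)  # transitions gathered during deployment
offline_dataset = []                   # the large pretraining dataset, loaded elsewhere

def sample_mixed_batch(batch_size=256, online_fraction=0.5):
    """Mix on-robot experience with offline data; the 50/50 split is an
    assumption, not a reported hyperparameter."""
    n_online = min(int(batch_size * online_fraction), len(online_buffer))
    batch = random.sample(list(online_buffer), n_online)
    batch += random.sample(offline_dataset, batch_size - n_online)
    return batch

def fine_tune_step(policy_update):
    # Reuse the same losses as pretraining (e.g. offline_update above),
    # now applied to the mixed batch.
    return policy_update(sample_mixed_batch())
```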

Before autonomous fine-tuning, the robot occasionally collides with low-contrast obstacles that are hard to see in the target environment. After a period of autonomous fine-tuning, it navigates without collisions.

Enabling Autonomous Learning

Enabling deployable autonomous learning requires a complete system that supports robust operation in the real world. Deploying a model in the real world already requires goal/task selection and safety mechanisms to avoid problematic actions; online learning additionally requires that the robot collect diverse experiences for the model to learn from. Our system combines the components listed below.

Goal Selection
Battery Management
Asynchronous Online Training
Recovery Behavior
Keepout Zones
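
As one example of these components, below is a minimal sketch of asynchronous online training: a control loop keeps acting at the robot's control rate while a separate learner thread performs gradient updates. The threading layout, queue sizes, and environment interface are assumptions; the page only names the component.

```python
# Hypothetical sketch of asynchronous online training: the actor thread
# never blocks on gradient steps, so control stays real-time.
import threading
import time
from queue import Queue

transitions = Queue(maxsize=10_000)    # actor -> learner handoff
replay_buffer = []                     # owned by the learner thread

def actor_loop(policy, env):
    obs = env.reset()
    while True:
        act = policy(obs)                          # inference only
        obs2, rew, done, info = env.step(act)
        transitions.put((obs, act, rew, obs2, done))
        obs = env.reset() if done else obs2

def learner_loop(update_fn, min_buffer=1_000):
    while True:
        while not transitions.empty():             # drain new experience
            replay_buffer.append(transitions.get())
        if len(replay_buffer) >= min_buffer:
            update_fn(replay_buffer)               # gradient steps off the control path
        else:
            time.sleep(0.1)

# threading.Thread(target=actor_loop, args=(policy, env), daemon=True).start()
# threading.Thread(target=learner_loop, args=(fine_tune_step,), daemon=True).start()
```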

System Overview

LiReN continually fine-tunes a pre-trained offline RL policy with new online data. To maintain a robust deployment, we wrap the policy with a finite state machine acting as an autonomy supervisor.
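
A minimal sketch of such a supervisor is shown below. The states and transitions are assumptions inferred from the components named above (goal selection, battery management, recovery behaviors, keepout zones); the page does not give the actual state machine, and the `status` fields here are hypothetical.

```python
# Hypothetical sketch of a finite-state-machine autonomy supervisor
# wrapping the learned policy; states and transitions are assumptions.
from enum import Enum, auto

class Mode(Enum):
    NAVIGATE = auto()   # learned policy drives toward the current goal
    RECOVER = auto()    # scripted recovery after a collision or stall
    CHARGE = auto()     # return to the dock when the battery runs low

class Supervisor:
    def __init__(self, policy, low_battery=0.2):
        self.policy = policy
        self.low_battery = low_battery
        self.mode = Mode.NAVIGATE

    def step(self, obs, goal, status):
        # Operational checks take priority over the learned policy.
        if status.battery < self.low_battery:
            self.mode = Mode.CHARGE
        elif status.collided or status.stalled:
            self.mode = Mode.RECOVER
        elif self.mode is Mode.RECOVER and status.recovered:
            self.mode = Mode.NAVIGATE

        if self.mode is Mode.CHARGE:
            return status.docking_action          # hand-written docking routine
        if self.mode is Mode.RECOVER:
            return status.recovery_action         # e.g. back up and re-orient
        if status.in_keepout:                     # veto actions inside keepout zones
            return status.safe_action
        return self.policy(obs, goal)             # normal autonomous navigation
```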


System Flowchart

BibTeX

@article{stachowicz2024lifelong,
  author    = {Stachowicz, Kyle and Ignatova, Lydia and Levine, Sergey},
  title     = {Lifelong Autonomous Fine-Tuning for Navigation Foundation Models},
  journal   = {arXiv preprint arXiv:TODO},
  year      = {2024},
}