Robust Online Residual Refinement
via Koopman-Guided Dynamics Modeling

Westlake University, Nanjing University, Zhejiang University

Abstract

Imitation learning (IL) enables efficient skill acquisition from demonstrations but often struggles with long-horizon tasks and high-precision control due to compounding errors. Residual policy learning offers a promising, model-agnostic solution by refining a base policy through closed-loop corrections. However, existing approaches primarily focus on local corrections to the base policy, lacking a global understanding of state evolution, which limits robustness and generalization to unseen scenarios. To address this, we propose incorporating global dynamics modeling to guide residual policy updates. Specifically, we leverage Koopman operator theory to impose linear time-invariant structure in a learned latent space, enabling reliable state transitions and improved extrapolation for long-horizon prediction and unseen environments. We introduce KORR (Koopman-guided Online Residual Refinement), a simple yet effective framework that conditions residual corrections on Koopman-predicted latent states, enabling globally informed and stable action refinement. We evaluate KORR on long-horizon, fine-grained robotic furniture assembly tasks under various perturbations. Results demonstrate consistent gains in performance, robustness, and generalization over strong baselines. Our findings further highlight the potential of Koopman-based modeling to bridge modern learning methods with classical control theory.

Introduction

🤔 Conventional residual policies sample corrections only near base actions, resulting in limited global awareness and poor extrapolation to novel or unseen situations. Consequently, when base actions deviate substantially due to model uncertainty, residual policies often fail to recover, regardless of the base policy's quality.
🤔 Koopman operator theory provides a compelling framework by lifting complex nonlinear dynamics into a linear latent space. In this lifted space, inherently nonlinear and coupled dynamics are represented as finite-dimensional, decoupled, and time-invariant linear transitions, which enables more reliable and globally consistent modeling of motion dynamics. It also alleviates the exponential instabilities often encountered in nonlinear systems, facilitating more stable online training.

Method

Koopman-guided Online Residual Refinement

[Figure: KORR framework overview]

Overview of KORR. The base policy predicts a chunk of base actions at a lower frequency, while KORR refines these actions step-by-step at a higher control rate. For each base action \( \boldsymbol{a}_{\text{base}_t} \), KORR uses the Koopman dynamics to extrapolate the imagined next state \( \mathbf{z}^{\text{base}}_{t+1} \) that executing this action would produce, and then conditions the residual policy on this imagined state to generate a corrective residual action \( \boldsymbol{a}_{\text{res}_t} \). The executed action, obtained by summing the base and residual actions, closes the loop and improves robustness and generalization.
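The two-rate structure can be sketched as a short loop. The Python snippet below is an illustrative sketch only, not the released implementation; names such as base_policy, refine_fn, and env, as well as the chunk length, are hypothetical placeholders.

def run_chunk(env, base_policy, refine_fn, obs):
    # Base policy runs at the lower frequency: predict a chunk of base actions.
    a_base_chunk = base_policy(obs)        # assumed shape: (H, action_dim)
    # KORR runs at the higher control rate: refine and execute each action.
    for a_base in a_base_chunk:
        a_exe = refine_fn(obs, a_base)     # Koopman-guided residual correction
        obs = env.step(a_exe)              # closed-loop execution
    return obs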

Koopman-Guided Dynamics Modeling: The Koopman operator, represented by a matrix \( \mathbf{K} \), captures the linear evolution of the system dynamics in a lifted latent space, where a learned lift function \( g_{\theta} \) maps the raw state \( \mathbf{x}_t \) to the latent state \( \mathbf{z}_t = g_{\theta}(\mathbf{x}_t) \). We decompose \( \mathbf{K} \) into two matrices, \( \mathbf{A} \) and \( \mathbf{B} \), capturing the contributions of the current latent state \( \mathbf{z}_t \) and the control input \( \boldsymbol{a}_t \), respectively: \begin{equation} \mathbf{A} \cdot \mathbf{z}_t + \mathbf{B} \cdot \boldsymbol{a}_t = \mathbf{z}_{t+1} \label{equation:method_koopman_function} \end{equation} We learn the Koopman matrices \( \mathbf{A} \) and \( \mathbf{B} \), together with the lift function parameters \( \theta \), by gradient backpropagation, minimizing a one-step prediction loss. Specifically, given a dataset \( \mathcal{D} = \{(\mathbf{x}_0, \boldsymbol{a}_0), \dots, (\mathbf{x}_M, \boldsymbol{a}_M)\} \) of state-action trajectories, where \( \boldsymbol{a} \) (equivalently \( \boldsymbol{a}_{\text{exe}} \)) denotes the action executed in the environment, the objective is the MSE loss: \begin{equation} \mathcal{L}_{\text{kpm}} = \mathbb{E}_{t\sim\mathcal{D}}\left\|\mathbf{z}_{t+1}-(\mathbf{A} \cdot \mathbf{z}_t+\mathbf{B} \cdot \boldsymbol{a}_t)\right\|^2 \ ; \ \mathbf{z}_{t+1} = g_{\theta}(\mathbf{x}_{t+1}) \label{equation:method_koopman_loss} \end{equation}
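As a concrete illustration, a minimal PyTorch sketch of this component is given below: a learned lift \( g_{\theta} \), linear matrices \( \mathbf{A} \) and \( \mathbf{B} \), and the one-step loss \( \mathcal{L}_{\text{kpm}} \). The class name, network width, and latent dimension are our own assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class KoopmanDynamics(nn.Module):
    def __init__(self, state_dim, action_dim, latent_dim):
        super().__init__()
        # Lift function g_theta: raw state x_t -> latent state z_t
        self.lift = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Linear, time-invariant Koopman transition: z_{t+1} = A z_t + B a_t
        self.A = nn.Linear(latent_dim, latent_dim, bias=False)
        self.B = nn.Linear(action_dim, latent_dim, bias=False)

    def forward(self, x_t, a_t):
        z_t = self.lift(x_t)
        z_next_pred = self.A(z_t) + self.B(a_t)
        return z_t, z_next_pred

def koopman_loss(model, x_t, a_t, x_next):
    """One-step prediction loss L_kpm = ||z_{t+1} - (A z_t + B a_t)||^2."""
    _, z_next_pred = model(x_t, a_t)
    z_next = model.lift(x_next)            # z_{t+1} = g_theta(x_{t+1})
    return ((z_next - z_next_pred) ** 2).sum(dim=-1).mean()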

Residual Policy through Koopman Imagination: Using the learned Koopman dynamics, we first predict the imagined next state \( \mathbf{z}^{\text{base}}_{t+1} \) that would result from executing the base action \( \boldsymbol{a}_{\text{base}_t} \). The residual policy \( \pi_{\text{res}} \) then conditions on this imagined state to generate the residual action \( \boldsymbol{a}_{\text{res}_t} \). Finally, the executed action \( \boldsymbol{a}_{\text{exe}_t} \) is obtained by summing the base and residual actions: \begin{equation} \mathbf{A} \cdot g_{\theta}(\mathbf{x}_t) + \mathbf{B} \cdot \boldsymbol{a}_{\text{base}_t} = \mathbf{z}^{\text{base}}_{t+1} \ ;\ \boldsymbol{a}_{\text{res}_t} = \pi_{\text{res}}(\mathbf{z}^{\text{base}}_{t+1}) \ ;\ \boldsymbol{a}_{\text{exe}_t} = \boldsymbol{a}_{\text{base}_t} + \boldsymbol{a}_{\text{res}_t} \label{equation:method_koopman_imaginary} \end{equation}
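A minimal sketch of this refinement step follows, assuming the hypothetical KoopmanDynamics module above and an arbitrary residual policy network pi_res; it is not the authors' code, only an instance of the equation above.

import torch

@torch.no_grad()
def korr_refine(koopman, pi_res, x_t, a_base):
    # Imagine the next latent state if the base action were executed:
    # z_base_{t+1} = A g_theta(x_t) + B a_base_t
    z_t = koopman.lift(x_t)
    z_base_next = koopman.A(z_t) + koopman.B(a_base)
    # Condition the residual policy on the imagined state
    a_res = pi_res(z_base_next)
    # Executed action combines base and residual corrections
    return a_base + a_res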

Experiment

Our experiments are structured to address the following key research questions:
RQ1: Does KORR improve robustness and performance over traditional residual policies?
RQ2: Does Koopman modeling offer advantages over nonlinear dynamics models for residual learning?
RQ3: Which design choices in KORR are critical for stable performance?

General Robustness and Performance Study

[Figures: general robustness and performance results]

Linear Benefit Study

[Figures: linear benefit study results]

Additional Ablation Study

[Figure: ablation study results]

BibTeX

@misc{gong2025robustonlineresidualrefinement,
      title={Robust Online Residual Refinement via Koopman-Guided Dynamics Modeling}, 
      author={Zhefei Gong and Shangke Lyu and Pengxiang Ding and Wei Xiao and Donglin Wang},
      year={2025},
      eprint={2509.12562},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2509.12562}, 
}