Imitation learning (IL) enables efficient skill acquisition from demonstrations but often struggles with long-horizon tasks and high-precision control due to compounding errors. Residual policy learning offers a promising, model-agnostic solution by refining a base policy through closed-loop corrections. However, existing approaches primarily focus on local corrections to the base policy, lacking a global understanding of state evolution, which limits robustness and generalization to unseen scenarios. To address this, we propose incorporating global dynamics modeling to guide residual policy updates. Specifically, we leverage Koopman operator theory to impose linear time-invariant structure in a learned latent space, enabling reliable state transitions and improved extrapolation over long horizons and to unseen environments. We introduce KORR (Koopman-guided Online Residual Refinement), a simple yet effective framework that conditions residual corrections on Koopman-predicted latent states, enabling globally informed and stable action refinement. We evaluate KORR on long-horizon, fine-grained robotic furniture assembly tasks under various perturbations. Results demonstrate consistent gains in performance, robustness, and generalization over strong baselines. Our findings further highlight the potential of Koopman-based modeling to bridge modern learning methods with classical control theory.
🤔 Conventional residual policies sample corrections only near base actions, resulting in limited global awareness and poor extrapolation to novel or unseen situations.
Consequently, when base actions deviate substantially due to model uncertainty, residual policies often fail to recover, regardless of the base policy's quality.
🤔 Koopman operator theory provides a compelling framework by lifting complex nonlinear dynamics into a linear latent space.
In this lifted space, inherently nonlinear and coupled dynamics are represented as finite-dimensional, decoupled, time-invariant linear transitions, enabling more reliable and globally consistent modeling of motion dynamics.
It also alleviates the exponential instabilities often encountered in nonlinear systems, facilitating more stable online training.
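To make the lifting concrete, below is a minimal PyTorch sketch of learned Koopman latent dynamics: an encoder \( \phi \) lifts states to latents \( \mathbf{z}_t = \phi(\mathbf{s}_t) \), and a single time-invariant matrix propagates them linearly, \( \mathbf{z}_{t+1} = K\mathbf{z}_t \). The module names, network sizes, and training loss here are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class KoopmanLatentDynamics(nn.Module):
    """Lift observations into a latent space where dynamics are linear.

    Illustrative sketch: the encoder phi and the Koopman matrix K are
    trained jointly so that phi(s_{t+1}) ~= K @ phi(s_t).
    """

    def __init__(self, obs_dim: int, latent_dim: int = 64):
        super().__init__()
        # Nonlinear lifting function phi: R^obs_dim -> R^latent_dim.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Finite-dimensional approximation of the Koopman operator:
        # one time-invariant linear map applied in latent space.
        self.K = nn.Linear(latent_dim, latent_dim, bias=False)

    def encode(self, obs: torch.Tensor) -> torch.Tensor:
        return self.encoder(obs)

    def rollout(self, z0: torch.Tensor, horizon: int) -> torch.Tensor:
        """Multi-step prediction by repeatedly applying z_{t+1} = K z_t."""
        zs, z = [], z0
        for _ in range(horizon):
            z = self.K(z)
            zs.append(z)
        return torch.stack(zs, dim=1)  # (batch, horizon, latent_dim)


def koopman_prediction_loss(model: KoopmanLatentDynamics,
                            obs_seq: torch.Tensor) -> torch.Tensor:
    """Match linearly propagated latents against encoded future observations.

    obs_seq: (batch, horizon + 1, obs_dim) sequence of observed states.
    """
    z0 = model.encode(obs_seq[:, 0])
    horizon = obs_seq.shape[1] - 1
    z_pred = model.rollout(z0, horizon)
    z_true = model.encode(obs_seq[:, 1:])
    return torch.mean((z_pred - z_true) ** 2)
```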
Overview of KORR. The base policy predicts a chunk of base actions at a lower frequency, while KORR refines these actions step-by-step at a higher control rate. For each base action \( \boldsymbol{a}_{\text{base}_t} \), KORR uses Koopman dynamics to extrapolate the next latent state \( \mathbf{z}_{t+1} \) as an imagined future state, and then conditions the residual policy on this state to generate a corrective residual action \( \boldsymbol{a}_{\text{res}_t} \). The final executed action, obtained by combining the base and residual actions, forms a closed-loop refinement that enhances robustness and generalization.
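A minimal sketch of this closed-loop refinement, assuming an action-conditioned Koopman step of the form \( \mathbf{z}_{t+1} = A\mathbf{z}_t + B\boldsymbol{a}_{\text{base}_t} \) and a residual policy conditioned on the observation, the base action, and the imagined future latent. Function names, signatures, the chunk size, the residual scale `alpha`, and the environment interface are all illustrative assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class ControlledKoopmanDynamics(nn.Module):
    """Assumed action-conditioned Koopman step: z_{t+1} = A z_t + B a_t."""

    def __init__(self, latent_dim: int, action_dim: int):
        super().__init__()
        self.A = nn.Linear(latent_dim, latent_dim, bias=False)
        self.B = nn.Linear(action_dim, latent_dim, bias=False)

    def step(self, z_t: torch.Tensor, a_t: torch.Tensor) -> torch.Tensor:
        return self.A(z_t) + self.B(a_t)


def korr_control_loop(env, encoder, koopman, base_policy, residual_policy,
                      chunk_size: int = 8, alpha: float = 0.1):
    """One episode of Koopman-guided online residual refinement (sketch).

    base_policy(obs) -> (chunk_size, action_dim) chunk of base actions
        predicted at a lower frequency.
    residual_policy(obs, a_base, z_next) -> corrective residual action,
        conditioned on the Koopman-predicted future latent.
    env follows a classic gym-style reset/step interface (an assumption).
    """
    obs, done = env.reset(), False
    while not done:
        a_chunk = base_policy(obs)                  # low-frequency base chunk
        for t in range(chunk_size):
            a_base = a_chunk[t]
            z_t = encoder(obs)                      # lift current observation
            z_next = koopman.step(z_t, a_base)      # imagined future latent state
            a_res = residual_policy(obs, a_base, z_next)
            action = a_base + alpha * a_res         # refined executed action
            obs, reward, done, info = env.step(action)
            if done:
                break
    return obs
```

Here `alpha` bounds how far the executed action can deviate from the base action, so the residual remains a correction rather than an override; its value is a design choice not specified by the excerpt above.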
@misc{gong2025robustonlineresidualrefinement,
title={Robust Online Residual Refinement via Koopman-Guided Dynamics Modeling},
author={Zhefei Gong and Shangke Lyu and Pengxiang Ding and Wei Xiao and Donglin Wang},
year={2025},
eprint={2509.12562},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2509.12562},
}