Memory-Augmented Representation for Efficient Event-based Visuomotor Policy Learning with Adaptive Perception and Control
Abstract
Event-based cameras are well-suited for fast and agile autonomous navigation due to their ultra-fast, microsecond-level temporal resolution. However, fully leveraging this potential requires highly efficient processing algorithms capable of asynchronous, event-by-event representation and policy updates. Current methods either employ synchronous dense representations or process events in fixed-rate time windows, which leads to redundant computation and inefficiency. We address this by proposing an end-to-end framework for event-to-control policy learning designed for reactive navigation tasks. Our method consists of a memory-augmented perception module that updates the representation asynchronously and adaptively selects the number of events to process. A lightweight policy module, jointly optimized with the perception module, uses this memory representation to predict control commands at rates that dynamically adjust to scene complexity in an event-based reinforcement learning setting. Evaluations on simulated drone navigation tasks demonstrate higher sample efficiency and robustness than dense frame-based methods. Moreover, our approach significantly reduces computational cost by minimizing the number of processing steps and processed events while maintaining competitive performance against state-of-the-art event-based methods.
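To make the described architecture concrete, the following is a minimal, hypothetical sketch of a memory-augmented perception module paired with a lightweight policy head, written in PyTorch. All module names, dimensions, and the event-budget mechanism here are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class MemoryAugmentedPerception(nn.Module):
    """Illustrative sketch (not the authors' code): a recurrent memory state
    updated asynchronously from variable-sized batches of raw events, plus a
    head that suggests how many events to consume before the next update."""

    def __init__(self, event_dim=4, hidden_dim=128):
        super().__init__()
        self.event_encoder = nn.Sequential(
            nn.Linear(event_dim, hidden_dim), nn.ReLU()
        )
        self.memory_cell = nn.GRUCell(hidden_dim, hidden_dim)
        # Hypothetical adaptive-selection head: scalar "budget" for the next event batch.
        self.event_budget_head = nn.Linear(hidden_dim, 1)

    def forward(self, events, memory):
        # events: (N, event_dim) batch of raw events (x, y, t, polarity); N varies per step.
        encoded = self.event_encoder(events).mean(dim=0, keepdim=True)  # aggregate the batch
        memory = self.memory_cell(encoded, memory)                       # asynchronous memory update
        budget = torch.sigmoid(self.event_budget_head(memory))           # soft budget in [0, 1]
        return memory, budget


class LightweightPolicy(nn.Module):
    """Small MLP head mapping the memory state to continuous control commands."""

    def __init__(self, hidden_dim=128, action_dim=4):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, 64), nn.ReLU(), nn.Linear(64, action_dim)
        )

    def forward(self, memory):
        return torch.tanh(self.head(memory))


# Usage: an asynchronous loop driven by incoming event batches.
perception = MemoryAugmentedPerception()
policy = LightweightPolicy()
memory = torch.zeros(1, 128)

events = torch.randn(256, 4)                       # placeholder event batch
memory, budget = perception(events, memory)
action = policy(memory)                            # control command for the agent
next_event_count = int(32 + budget.item() * 480)   # e.g. map budget to [32, 512] events
```

In this reading, the perception and policy modules share the memory state and could be optimized jointly, with the event budget deciding how many events to accumulate before the next perception-and-control step; the exact selection and update rule used in the paper may differ.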