Multiscale Sensor Fusion and Continuous Control with Neural CDEs

Paper Title

Multiscale Sensor Fusion and Continuous Control with Neural CDEs

Authors

Sumeet Singh, Francis McCann Ramirez, Jacob Varley, Andy Zeng, Vikas Sindhwani

Abstract

Though robot learning is often formulated in terms of discrete-time Markov decision processes (MDPs), physical robots require near-continuous multiscale feedback control. Machines operate on multiple asynchronous sensing modalities, each with a different frequency, e.g., video frames at 30 Hz, proprioceptive state at 100 Hz, and force-torque data at 500 Hz. While the classic approach is to batch observations into fixed-time windows and then pass them through feed-forward encoders (e.g., deep networks), we show that there exists a more elegant approach: one that treats policy learning as modeling latent state dynamics in continuous time. Specifically, we present 'InFuser', a unified architecture that trains continuous-time policies with Neural Controlled Differential Equations (CDEs). InFuser evolves a single latent state representation over time by (In)tegrating and (Fus)ing multi-sensory observations (arriving at different frequencies), and inferring actions in continuous time. This enables policies that can react to multi-frequency, multi-sensory feedback for truly end-to-end visuomotor control, without discrete-time assumptions. Behavior cloning experiments demonstrate that InFuser learns robust policies for dynamic tasks (e.g., swinging a ball into a cup), notably outperforming several baselines in settings where observations from one sensing modality can arrive at much sparser intervals than others.
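To make the idea of integrating asynchronous observations into one latent state more concrete, the following is a minimal sketch of a generic neural CDE rollout, dz = f(z) dX, discretized with explicit Euler steps. It is not the InFuser implementation: the function names `merge_streams` and `cde_rollout`, the linear interpolation onto a shared time grid, the single tanh layer used as the vector field, and the linear action readout are all assumptions made for illustration, and the weights are untrained placeholders.

```python
import numpy as np


def merge_streams(streams):
    """Build one control path X(t) from asynchronous sensor streams.

    streams: list of (times, values) pairs, times of shape (n_i,) and
    values of shape (n_i, d_i). Each stream may have its own sampling rate.
    Returns the union time grid and a path of shape (T, 1 + sum(d_i)),
    where column 0 is time and the rest are linearly interpolated channels.
    (Linear interpolation is an assumption of this sketch.)
    """
    grid = np.unique(np.concatenate([t for t, _ in streams]))
    channels = [grid[:, None]]
    for t, v in streams:
        cols = [np.interp(grid, t, v[:, j]) for j in range(v.shape[1])]
        channels.append(np.stack(cols, axis=1))
    return grid, np.concatenate(channels, axis=1)


def cde_rollout(path, z0, W, b, W_out):
    """Explicit Euler discretization of dz = f(z) dX.

    f(z) = tanh(W z + b) reshaped to (hidden, channels) is a hypothetical
    vector field; W_out is a hypothetical linear action readout.
    Returns one action per step of the path, shape (T - 1, action_dim).
    """
    hidden, n_channels = z0.shape[0], path.shape[1]
    z = z0.copy()
    actions = []
    for k in range(1, path.shape[0]):
        dX = path[k] - path[k - 1]           # increment of the control path
        F = np.tanh(W @ z + b).reshape(hidden, n_channels)
        z = z + F @ dX                       # latent state driven by new data
        actions.append(W_out @ z)            # action read out at this time
    return np.stack(actions)


# Illustrative one-second streams at the rates mentioned in the abstract.
rng = np.random.default_rng(0)
video = (np.arange(30) / 30.0, rng.normal(size=(30, 2)))      # 30 Hz features
proprio = (np.arange(100) / 100.0, rng.normal(size=(100, 3)))  # 100 Hz state
force = (np.arange(500) / 500.0, rng.normal(size=(500, 1)))    # 500 Hz F/T

grid, path = merge_streams([video, proprio, force])
hidden, n_channels, action_dim = 8, path.shape[1], 2
W = 0.1 * rng.normal(size=(hidden * n_channels, hidden))
b = np.zeros(hidden * n_channels)
W_out = rng.normal(size=(action_dim, hidden))
actions = cde_rollout(path, np.zeros(hidden), W, b, W_out)
print(grid.shape, path.shape, actions.shape)
```

In this sketch the latent state is updated whenever any sensor reports, so the step size follows the fastest stream rather than a fixed batching window. A practical system would more likely use an adaptive ODE solver and a learned encoder per modality in place of the Euler loop and raw feature channels.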
