Paper Title

Multiscale Sensor Fusion and Continuous Control with Neural CDEs

Authors

Sumeet Singh, Francis McCann Ramirez, Jacob Varley, Andy Zeng, Vikas Sindhwani

Abstract

Though robot learning is often formulated in terms of discrete-time Markov decision processes (MDPs), physical robots require near-continuous multiscale feedback control. Machines operate on multiple asynchronous sensing modalities, each with a different frequency, e.g., video frames at 30 Hz, proprioceptive state at 100 Hz, and force-torque data at 500 Hz. While the classic approach is to batch observations into fixed-time windows and then pass them through feed-forward encoders (e.g., deep networks), we show that there exists a more elegant approach: one that treats policy learning as modeling latent state dynamics in continuous time. Specifically, we present 'InFuser', a unified architecture that trains continuous-time policies with Neural Controlled Differential Equations (CDEs). InFuser evolves a single latent state representation over time by (In)tegrating and (Fus)ing multi-sensory observations (arriving at different frequencies), and inferring actions in continuous time. This enables policies that can react to multi-frequency, multi-sensory feedback for truly end-to-end visuomotor control, without discrete-time assumptions. Behavior cloning experiments demonstrate that InFuser learns robust policies for dynamic tasks (e.g., swinging a ball into a cup), notably outperforming several baselines in settings where observations from one sensing modality can arrive at much sparser intervals than others.
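
To make the idea of a single latent state driven by multi-rate observations concrete, here is a minimal NumPy sketch of a controlled differential equation dz = f(z) dX(t), discretized with explicit Euler steps. It is not the authors' InFuser implementation: the linear interpolation of each stream onto a common grid, the small random-weight vector field, the linear action readout, and all function names and sensor dimensions are assumptions chosen purely for illustration.

```python
import numpy as np

def control_path(t_grid, streams):
    """Linearly interpolate each (times, values) stream onto t_grid.

    Time itself is prepended as the first channel.
    Returns an array of shape (len(t_grid), 1 + total sensor dims).
    """
    channels = [t_grid]
    for times, values in streams:
        values = np.asarray(values, dtype=float)
        for d in range(values.shape[1]):
            channels.append(np.interp(t_grid, times, values[:, d]))
    return np.stack(channels, axis=1)

def make_vector_field(latent_dim, path_dim, hidden=16, seed=0):
    """Hypothetical vector field f(z) with fixed random (untrained) weights."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.3, size=(hidden, latent_dim))
    W2 = rng.normal(scale=0.3, size=(latent_dim * path_dim, hidden))

    def f(z):
        h = np.tanh(W1 @ z)
        # Matrix-valued output: maps a path increment to a latent increment.
        return np.tanh(W2 @ h).reshape(latent_dim, path_dim)

    return f

def integrate_cde(f, z0, X):
    """Explicit Euler scheme: z[k+1] = z[k] + f(z[k]) @ (X[k+1] - X[k])."""
    zs = [np.asarray(z0, dtype=float)]
    for k in range(len(X) - 1):
        zs.append(zs[-1] + f(zs[-1]) @ (X[k + 1] - X[k]))
    return np.stack(zs)

# Synthetic one-second streams at the rates mentioned in the abstract.
rng = np.random.default_rng(1)
t_cam = np.linspace(0.0, 1.0, 31)    # 30 Hz, 2-D stand-in for image features
t_prop = np.linspace(0.0, 1.0, 101)  # 100 Hz, 3-D proprioceptive state
t_ft = np.linspace(0.0, 1.0, 501)    # 500 Hz, 1-D force-torque signal
cam = np.stack([np.sin(2 * np.pi * t_cam), np.cos(2 * np.pi * t_cam)], axis=1)
prop = rng.normal(scale=0.1, size=(101, 3))
ft = rng.normal(scale=0.1, size=(501, 1))

# Fuse all streams into one control path on the fastest sensor's grid.
X = control_path(t_ft, [(t_cam, cam), (t_prop, prop), (t_ft, ft)])
f = make_vector_field(latent_dim=8, path_dim=X.shape[1])
latents = integrate_cde(f, np.zeros(8), X)

# Hypothetical linear readout from latent state to a 2-D action.
W_out = rng.normal(scale=0.1, size=(2, 8))
actions = latents @ W_out.T
```

In this sketch the latent state only changes when the control path changes, so a sparsely sampled modality such as the 30 Hz camera stream still contributes through its interpolated increments between frames. A trained system would learn the vector field and readout from demonstrations and would likely use a proper ODE solver and a causal interpolation scheme rather than these stand-ins.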
