Learning Long-term Visual Dynamics with Region Proposal Interaction Networks

Haozhi Qi; Xiaolong Wang; Deepak Pathak; Yi Ma; Jitendra Malik

doi:10.48550/arxiv.2008.02265

Abstract

Learning long-term dynamics models is the key to understanding physical common sense. Most existing approaches to learning dynamics from visual input sidestep long-term predictions by resorting to rapid re-planning with short-term models. This not only requires such models to be extremely accurate but also limits them to tasks where an agent can continuously obtain feedback and take action at each step until completion. In this paper, we aim to leverage ideas from success stories in visual recognition tasks to build object representations that can capture inter-object and object-environment interactions over a long range. To this end, we propose Region Proposal Interaction Networks (RPIN), which reason about each object's trajectory in a latent region-proposal feature space. Thanks to this simple yet effective object representation, our approach outperforms prior methods by a significant margin, both in prediction quality and in the ability to plan for downstream tasks, and it also generalizes well to novel environments. Code, pre-trained models, and more visualization results are available at https://haozhi.io/RPIN.
