Motion is a strong cue for unsupervised object-level grouping. In this paper, we demonstrate that motion will be exploited most effectively, if it is regarded over larger time windows. Opposed to classical two-frame optical flow, point trajectories that span hundreds of frames are less susceptible to short-term variations that hinder separating different objects. As a positive side effect, the resulting groupings are temporally consistent over a whole video shot, a property that requires tedious post-processing in the vast majority of existing approaches. We suggest working with a paradigm that starts with semi-dense motion cues first and that fills up textureless areas afterwards based on color. This paper also contributes the Freiburg-Berkeley motion segmentation (FBMS) dataset, a large, heterogeneous benchmark with 59 sequences and pixel-accurate ground truth annotation of moving objects.
Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik
Discussion(0)
No comments yet. Be the first to comment.