Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation

Minghan Li; Shuai Li; Lida Li; Lei Zhang

doi:10.1109/cvpr46437.2021.01106

Abstract

Modern one-stage video instance segmentation networks suffer from two limitations. First, convolutional features are aligned neither with anchor boxes nor with ground-truth bounding boxes, which reduces the sensitivity of the predicted masks to spatial location. Second, a video is directly divided into individual frames for frame-level instance segmentation, ignoring the temporal correlation between adjacent frames. To address these issues, we propose a simple yet effective one-stage video instance segmentation framework based on spatial calibration and temporal fusion, named STMask. To calibrate spatial features with respect to ground-truth bounding boxes, we first predict regressed bounding boxes around the ground-truth bounding boxes and extract features from them for frame-level instance segmentation. To further exploit the temporal correlation among video frames, we incorporate a temporal fusion module that infers instance masks from each frame to its adjacent frames, which helps our framework handle challenging videos with motion blur, partial occlusion, and unusual object-to-camera poses. Experiments on the YouTube-VIS validation set show that STMask with a ResNet-50/ResNet-101 backbone obtains 33.5%/36.8% mask AP while running at 28.6/23.4 FPS. The code is available at https://github.com/MinghanLi/STMask.
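
The abstract says that features are extracted from the regressed boxes but does not say how. Purely as an illustration of box-aligned feature extraction, the sketch below uses RoIAlign-style bilinear sampling. This is an assumption and may differ from the released STMask code. It samples an out_size by out_size grid at bin centres inside a box given in feature-map pixel coordinates (pixel centres at integer positions, also an assumption), so the extracted features move with the box. The function names `bilinear_sample` and `box_aligned_features` are hypothetical.

```python
import numpy as np


def bilinear_sample(feat: np.ndarray, y: float, x: float) -> np.ndarray:
    """Bilinearly interpolate a (C, H, W) feature map at continuous (y, x).

    Coordinates are clamped to the map so border samples stay valid.
    """
    _, h, w = feat.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - wx) * feat[:, y0, x0] + wx * feat[:, y0, x1]
    bottom = (1 - wx) * feat[:, y1, x0] + wx * feat[:, y1, x1]
    return (1 - wy) * top + wy * bottom


def box_aligned_features(feat, box, out_size: int = 7) -> np.ndarray:
    """Sample an out_size x out_size feature grid inside box = (x1, y1, x2, y2).

    Hypothetical helper: one sample at the centre of each bin, RoIAlign-style.
    """
    feat = np.asarray(feat, dtype=np.float64)
    bx1, by1, bx2, by2 = box
    out = np.zeros((feat.shape[0], out_size, out_size), dtype=np.float64)
    bin_h = (by2 - by1) / out_size
    bin_w = (bx2 - bx1) / out_size
    for i in range(out_size):
        for j in range(out_size):
            cy = by1 + (i + 0.5) * bin_h
            cx = bx1 + (j + 0.5) * bin_w
            out[:, i, j] = bilinear_sample(feat, cy, cx)
    return out


if __name__ == "__main__":
    # Ramp map: feat[0, y, x] == x, so samples reveal where the grid landed.
    ramp_x = np.tile(np.arange(8, dtype=float), (1, 8, 1))
    print(box_aligned_features(ramp_x, (2, 2, 6, 6), out_size=2)[0])
    # Shifting the box right by one pixel shifts every sample by one.
    print(box_aligned_features(ramp_x, (3, 2, 7, 6), out_size=2)[0])
```

On the ramp map, the box (2, 2, 6, 6) with out_size=2 yields the bin-centre x coordinates 3.0 and 5.0 in each row, because bilinear interpolation reproduces a linear signal exactly. Shifting the box one pixel to the right yields 4.0 and 6.0, which is the sense in which the sampled features stay aligned with the (predicted) box rather than with a fixed anchor grid.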

Publication Info

Year: 2021
Language: en
Link to the paper: https://doi.org/10.1109/cvpr46437.2021.01106