Active Learning for Video Description With Cluster-Regularized Ensemble Ranking

David M. Chan; Sudheendra Vijayanarasimhan; David A. Ross; John F Canny

doi:10.48550/arxiv.2007.13913

Preprint 2020

Abstract

Automatic video captioning aims to train models to generate text descriptions for all segments in a video. However, the most effective approaches require large amounts of manual annotation, which is slow and expensive. Active learning is a promising way to efficiently build a training set for video captioning tasks while reducing the need to manually label uninformative examples. In this work, we explore various active learning approaches for automatic video captioning and show that a cluster-regularized ensemble strategy provides the best active learning approach to efficiently gather training sets for video captioning. We evaluate our approaches on the MSR-VTT and LSMDC datasets using both transformer and LSTM-based captioning models and show that our novel strategy can achieve high performance while using up to 60% less training data than strong state-of-the-art baselines.
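
To make the idea of cluster-regularized ensemble ranking more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each ensemble member assigns a scalar score to every unlabeled video, that disagreement can be approximated by the variance of those scores, and that cluster assignments have already been computed elsewhere. The function names `ensemble_disagreement` and `cluster_regularized_select` are hypothetical and chosen only for illustration.

```python
from statistics import pvariance


def ensemble_disagreement(scores_per_model):
    """Return one disagreement value per sample.

    scores_per_model: list of per-model score lists, all the same length.
    Assumption: variance across models stands in for ensemble disagreement.
    """
    # zip(*...) groups the scores that different models gave to the same sample.
    return [pvariance(sample_scores) for sample_scores in zip(*scores_per_model)]


def cluster_regularized_select(disagreement, cluster_ids, budget):
    """Pick up to `budget` sample indices, spreading picks across clusters.

    Assumption: cluster_ids[i] is a precomputed cluster label for sample i.
    Clusters are visited round-robin, and each visit takes the remaining
    sample with the highest disagreement from that cluster.
    """
    # Group sample indices by cluster.
    by_cluster = {}
    for idx, cid in enumerate(cluster_ids):
        by_cluster.setdefault(cid, []).append(idx)

    # Within each cluster, order samples from most to least uncertain.
    queues = []
    for members in by_cluster.values():
        members.sort(key=lambda i: -disagreement[i])
        queues.append(members)

    # Visit the cluster holding the most uncertain sample first.
    queues.sort(key=lambda q: -disagreement[q[0]])

    selected = []
    while len(selected) < budget and any(queues):
        for queue in queues:
            if queue and len(selected) < budget:
                selected.append(queue.pop(0))
    return selected


if __name__ == "__main__":
    disagreement = [0.9, 0.8, 0.1, 0.3, 0.2, 0.05]
    cluster_ids = [0, 0, 1, 1, 2, 2]
    print(cluster_regularized_select(disagreement, cluster_ids, budget=3))
```

In this toy example, a plain top-3 ranking by disagreement would take two samples from cluster 0, whereas the round-robin pass takes one sample from each cluster. This spreading of the labeling budget over clusters is the sense in which the selection is "regularized" in this sketch.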
