Active Learning for Video Description With Cluster-Regularized Ensemble Ranking
Preprint 2020
Authors
David M. Chan
Sudheendra Vijayanarasimhan
David A. Ross
Abstract
Automatic video captioning aims to train models to generate text descriptions for all segments in a video. However, the most effective approaches require large amounts of manual annotation, which is slow and expensive. Active learning is a promising way to build a training set for video captioning efficiently while reducing the need to manually label uninformative examples. In this work, we explore various active learning approaches for automatic video captioning and show that a cluster-regularized ensemble strategy is the best of these approaches for efficiently gathering training sets. We evaluate our approaches on the MSR-VTT and LSMDC datasets using both transformer- and LSTM-based captioning models, and show that our novel strategy can achieve high performance while using up to 60% less training data than strong state-of-the-art baselines.
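As a rough illustration of what ranking with an ensemble under a clustering constraint can look like, the sketch below scores each unlabeled video by how much a set of models disagree about it, then picks candidates round-robin across precomputed clusters so that no single cluster dominates the batch. This is a sketch under stated assumptions, not the procedure from the paper: the function name `select_batch`, the variance-based disagreement measure, the round-robin rule, and both input arrays are hypothetical.

```python
import numpy as np


def select_batch(ensemble_scores, cluster_ids, budget):
    """Pick up to `budget` unlabeled samples to send for annotation.

    ensemble_scores: shape (n_models, n_samples); hypothetical per-model
        scores for each unlabeled video (e.g. caption log-likelihoods).
    cluster_ids: shape (n_samples,); a precomputed cluster label per
        sample (assumed to come from clustering video features).
    """
    scores = np.asarray(ensemble_scores, dtype=float)
    clusters = np.asarray(cluster_ids)

    # Assumed uncertainty measure: variance of the scores across models.
    disagreement = scores.var(axis=0)

    # One queue per cluster, sorted by decreasing disagreement.
    queues = []
    for c in np.unique(clusters):
        members = np.flatnonzero(clusters == c)
        order = members[np.argsort(-disagreement[members], kind="stable")]
        queues.append(list(order))

    # Visit clusters with the most uncertain top candidate first.
    queues.sort(key=lambda q: -disagreement[q[0]])

    # Round-robin over clusters: the regularization step in this sketch.
    selected = []
    while len(selected) < budget and any(queues):
        for q in queues:
            if q and len(selected) < budget:
                selected.append(int(q.pop(0)))
    return selected
```

With two clusters whose most uncertain samples all sit in the first cluster, a purely uncertainty-based ranking would fill the batch from that cluster alone, whereas this sketch takes one sample from each cluster before returning to the first.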