IVAC-$\mathbf {P^{2}L}$: Leveraging Irregular Repetition Priors for Improving Video Action Counting
Article 2025 en
Authors
HW
Hang Wang
ZC
Zhi-Qi Cheng
YD
Youtian Du
Abstract
1 min read
The quantification of repetitive actions in videos, a task commonly referred to as Video Action Counting (VAC), is a critical challenge in understanding and analyzing content in sports, fitness, and daily activities. Traditional approaches to VAC have largely overlooked the nuanced irregularities inherent in action repetitions, such as interruptions and variable lengths between cycles. Addressing this gap, our study introduces a novel perspective on VAC, focusing on Irregular Video Action Counting (IVAC), which emphasizes the importance of modeling the irregular repetition priors present in video content. We conceptualize these priors through two key aspects: <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Inter-cycle Consistency</i> and <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Cycle-interval Inconsistency</i>. Inter-cycle Consistency ensures that the <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">spatiotemporal</b> representations across all cycle segments in a video <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">remain</b> homogeneous, thereby reflecting the uniformity of actions between <bold xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">different cycle segments</b>. In contrast, Cycle-interval Inconsistency mandates a clear semantic distinction between the representations of cycle segments and intervals, acknowledging the inherent dissimilarities in content. To effectively encapsulate these priors, we introduce a novel methodology consisting of consistency and inconsistency modules, underpinned by a tailored pull-push loss (<inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$\mathrm {P^{2}~L}$</tex-math></inline-formula>) mechanism. This approach employs a pull loss to enhance the cohesion among cycle segment features and a push loss to distinctly differentiate between cycle and interval segment features. Empirical evaluations on the RepCount dataset illustrate that our IVAC-<inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$\mathrm {P^{2}~L}$</tex-math></inline-formula> model sets a new benchmark in state-of-the-art performance for the VAC task. Moreover, our model demonstrates adaptability and generalization across diverse video content, achieving superior performance on two additional datasets, UCFRep and Countix, without necessitating dataset-specific fine-tuning. These findings not only validate the effectiveness of our approach in addressing the complexities of irregular repetitions in videos but also open new avenues for future research in video understanding and analysis.
Discussion(0)
No comments yet. Be the first to comment.