Rep-MMB: Bridging Mobile CNN and Transformer for Sensor-Based Human Activity Recognition
Article 2026
Authors
Jinsheng Liu
L Zhang
Xin Liu
Abstract
Lightweight CNNs and Transformers have both shown promise in sensor-based human activity recognition (HAR), yet their structural synergies remain underexplored. This paper addresses that gap by integrating the MetaFormer paradigm into efficient CNN design. MetaFormer is a general architecture abstracted from Transformers that structurally separates token mixing (e.g., self-attention) from channel mixing (e.g., feed-forward networks). While MetaFormer offers a powerful inductive bias, the self-attention token mixer of its standard Transformer instantiation is often too computationally expensive for resource-constrained HAR. To address this, we redesign the classic MobileNetV3 architecture from a MetaFormer perspective and introduce Rep-MMB, a new family of pure lightweight CNNs. By leveraging structural reparameterization, Rep-MMB decouples its multi-branch training-time structure from an efficient single-branch inference-time structure, enabling high accuracy with low latency. Evaluations on four public HAR benchmarks show that Rep-MMB outperforms state-of-the-art lightweight models in both accuracy and efficiency, and we further validate it on embedded devices. We hope Rep-MMB can serve as a strong baseline for future edge-deployed HAR research.
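To make the idea of structural reparameterization concrete, the following is a minimal NumPy sketch of the general technique, not the Rep-MMB block itself. The branch layout (a kernel-size-3 convolution, a kernel-size-1 convolution, and an identity shortcut over a 1D sensor signal) and the names `conv1d_same`, `multi_branch`, and `merge_branches` are assumptions made for illustration; the abstract does not specify the actual branches. Because convolution is linear, the three training-time branches can be folded into one kernel that produces the same output at inference time.

```python
import numpy as np


def conv1d_same(x, w):
    """Cross-correlate x (C_in, T) with w (C_out, C_in, K) using zero padding.

    K is assumed odd so that the output keeps length T.
    """
    c_out, _, k = w.shape
    pad = k // 2
    t = x.shape[1]
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros((c_out, t))
    for i in range(k):
        # Each kernel tap is a channel-mixing matrix applied to a shifted signal.
        out += w[:, :, i] @ xp[:, i:i + t]
    return out


def multi_branch(x, w3, w1):
    """Training-time form (illustrative): k=3 conv + k=1 conv + identity."""
    return conv1d_same(x, w3) + conv1d_same(x, w1) + x


def merge_branches(w3, w1):
    """Fold the three branches into a single k=3 kernel.

    Assumes equal input and output channel counts so the identity is valid.
    """
    channels = w3.shape[0]
    merged = w3.copy()
    merged[:, :, 1] += w1[:, :, 0]      # k=1 kernel sits at the centre tap
    merged[:, :, 1] += np.eye(channels)  # identity is a centre-tap unit kernel
    return merged


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))      # 4 sensor channels, 16 time steps
w3 = rng.normal(size=(4, 4, 3))
w1 = rng.normal(size=(4, 4, 1))

y_train_form = multi_branch(x, w3, w1)
y_infer_form = conv1d_same(x, merge_branches(w3, w1))
print(np.allclose(y_train_form, y_infer_form))
```

In reparameterized networks, each branch commonly also carries a batch-normalization layer that is folded into the kernel and a bias before merging; that step is omitted here to keep the sketch short.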