Rep-MMB: Bridging Mobile CNN and Transformer for Sensor-Based Human Activity Recognition

Jinsheng Liu; L Zhang; Xin Liu; Guangjie Chen; Zenan Fu; W R Huang; Hao Wu; Aiguo Song

doi:10.1109/jiot.2026.3685081

Abstract

Lightweight CNNs and Transformers have both shown great promise in sensor-based human activity recognition (HAR), yet their structural synergies remain underexplored. This paper bridges the gap by bringing the MetaFormer paradigm into efficient CNN design. MetaFormer is a general architecture abstracted from Transformers that structurally separates token mixing (e.g., self-attention) from channel mixing (e.g., feed-forward networks). While MetaFormer offers a powerful inductive bias, its standard self-attention token mixer is often too computationally intensive for resource-constrained HAR. To address this, we redesign the classic MobileNetV3 architecture from a MetaFormer perspective and introduce Rep-MMB, a new family of pure lightweight CNNs. By leveraging structural reparameterization, Rep-MMB trains with a multi-branch structure and then collapses it into an efficient single-branch network for inference, enabling high accuracy with low latency. Evaluations on four public HAR benchmarks show that Rep-MMB outperforms state-of-the-art lightweight models in both accuracy and efficiency, with practical validation on embedded devices. We hope Rep-MMB can serve as a strong baseline that inspires future edge-deployed HAR research.
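The abstract does not spell out which branches Rep-MMB uses, so the following is only a minimal sketch of the general idea behind structural reparameterization, not the authors' implementation. It assumes a hypothetical single-channel block with a 3-tap convolution, a 1-tap convolution, and an identity shortcut, and it omits the batch-normalization folding that practical implementations usually perform. All function names and values are illustrative.

```python
def conv1d_same(x, kernel):
    """Single-channel 1D cross-correlation with zero padding (odd kernel length)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def multi_branch(x, k3, k1):
    """Training-time form: 3-tap branch + 1-tap branch + identity shortcut."""
    y3 = conv1d_same(x, k3)
    y1 = conv1d_same(x, k1)
    return [a + b + c for a, b, c in zip(y3, y1, x)]


def merge_branches(k3, k1):
    """Fold the 1-tap kernel and the identity into the centre tap of the 3-tap kernel."""
    merged = list(k3)
    merged[1] += k1[0] + 1.0  # the identity acts as a 1-tap kernel with weight 1
    return merged


signal = [0.5, -1.0, 2.0, 0.0, 1.5]  # hypothetical sensor window
k3 = [0.25, 0.5, -0.25]              # hypothetical 3-tap branch weights
k1 = [2.0]                           # hypothetical 1-tap branch weight

train_out = multi_branch(signal, k3, k1)
infer_out = conv1d_same(signal, merge_branches(k3, k1))
max_gap = max(abs(a - b) for a, b in zip(train_out, infer_out))
```

Because convolution is linear, the three branches sum to a single kernel, so `train_out` and `infer_out` agree while inference needs only one convolution. The same argument extends to multi-channel and depthwise convolutions, where each output channel is merged independently.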

Related publications

TSA-Former: Linear Transformer with Taylor Series Attention for Sensor-Based Human Activity Recognition (Article, 2026)

Dual-Branch Interactive Networks on Multichannel Time Series for Human Activity Recognition (Article, 2022)

Deep Neural Networks for Sensor-Based Human Activity Recognition Using Selective Kernel Convolution (Article, 2021)

MaskCAE: Masked Convolutional AutoEncoder via Sensor Data Reconstruction for Self-Supervised Human Activity Recognition (Article, 2024)

Revisiting Large-Kernel CNN Design via Structural Re-Parameterization for Sensor-Based Human Activity Recognition (Article, 2024)