Sensor-Prompt Tuning: Aligning Time Series Foundational Models With Motion Sensors for Few-Shot Activity Recognition

Xin Liu; Dongzhou Cheng; Zenan Fu; Lei Zhang; Hao Wu; Aiguo Song

doi:10.1109/tmc.2026.3664424

Abstract

Inspired by the recent success of foundation models in the vision and language domains, time series foundation models (TSFMs) have garnered increasing attention for general time series analysis in areas such as finance, weather, healthcare, and power. However, given the high heterogeneity and severe annotation scarcity of time series sensor data, how to unlock the potential of large-scale general-purpose TSFMs for downstream activity recognition tasks remains unexplored. This paper makes the first attempt to address this timely challenge by adapting a self-supervised pre-trained TSFM (i.e., MOMENT) to few-shot activity recognition. We introduce a simple and efficient Sensor-Prompt Tuning (SPT) strategy that employs multiple convolution-based, sensor-friendly filters with a gating mechanism as learnable soft prompts. These prompts dynamically adapt the sensor input space to the frozen TSFM backbone, effectively bridging the domain gap between the general time series data used for pre-training and wearable sensor streams. Extensive experiments across three public activity recognition benchmarks demonstrate that SPT achieves performance gains of up to 15.5% over existing state-of-the-art baselines under few-shot scenarios, while considerably outperforming other mainstream fine-tuning strategies with fewer than 1% of the backbone parameters. Practical cloud-edge inference latencies are also measured. This work offers a new prompt-tuning perspective on how to adapt pre-trained TSFMs for wearable activity recognition tasks. Code will be released.
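The abstract does not give the filter shapes, the form of the gate, or how the prompts are merged with the sensor input, so the following is only an illustrative sketch of the general idea: a small bank of 1D convolution filters whose outputs are mixed by an input-dependent gate and added back to the raw signal before it would reach a frozen backbone. The class name `SensorPromptSketch`, the kernel sizes, the energy-based softmax gate, and the residual combination are all assumptions made for illustration, not the authors' design, and the backbone (MOMENT) is not included.

```python
import math
import random


def conv1d_same(x, kernel):
    """Zero-padded 1D cross-correlation that keeps the input length (odd kernels)."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(x))]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SensorPromptSketch:
    """Hypothetical gated filter bank; NOT the paper's implementation."""

    def __init__(self, kernel_sizes=(3, 5, 7), seed=0):
        rng = random.Random(seed)
        # Parameters that would be learned; here they are only randomly initialised.
        self.kernels = [[rng.gauss(0.0, 0.1) for _ in range(k)] for k in kernel_sizes]
        self.gate_w = [rng.gauss(0.0, 0.1) for _ in kernel_sizes]
        self.gate_b = [0.0 for _ in kernel_sizes]

    def gate(self, x):
        # Assumed gate: one weight per filter, driven by the mean absolute amplitude.
        energy = sum(abs(v) for v in x) / len(x)
        return softmax([w * energy + b for w, b in zip(self.gate_w, self.gate_b)])

    def __call__(self, x):
        weights = self.gate(x)
        filtered = [conv1d_same(x, k) for k in self.kernels]
        # Assumed residual form: raw signal plus the gated mixture of filter outputs.
        return [
            x[t] + sum(w * f[t] for w, f in zip(weights, filtered))
            for t in range(len(x))
        ]

    def num_parameters(self):
        return sum(len(k) for k in self.kernels) + len(self.gate_w) + len(self.gate_b)


# Stand-in for one accelerometer channel; a real pipeline would use sensor windows.
signal = [math.sin(0.2 * t) for t in range(64)]
spt = SensorPromptSketch()
prompted = spt(signal)  # this adapted signal would be passed to the frozen backbone
print(len(prompted), spt.num_parameters())
```

In this sketch only the filter and gate parameters would be trained while the backbone stays frozen, which is what keeps the number of tuned parameters small relative to the backbone.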
