A Mutation-Based Method for Backdoored Sample Detection Without Clean Data
Article 2024 en
Authors
Luoyu Chen
Feng Wu
Tao Zhang
Abstract
Backdoor attacks pose a significant threat to machine learning-based vision systems. Existing detection methods typically require clean data drawn from a distribution similar to that of the dataset under inspection, which limits practical deployment. This work proposes a Mutation-Based Method (MBM) for detecting and filtering backdoored samples in an image training dataset without referencing any external clean data. MBM distinguishes backdoored from benign samples by their differing stability in feature space under certain data augmentations. First, MBM applies multiple data augmentation techniques to generate mutated versions of each sample, aiming to ‘deactivate’ potential triggers without heavily distorting the natural semantics. Second, MBM measures how far a sample’s features diverge from those of the original after mutation and uses this measure, which we call ‘Feature Stability’, as a poison score. Third, by analyzing the extreme scores within each class, MBM identifies the backdoored class and isolates the samples from all other classes as clean data. Finally, a benign distribution is fit and used as a benchmark against which samples from the backdoored class are compared to identify the backdoored ones. We validated MBM on the CIFAR-10 dataset, achieving a true positive rate above 95% and a false positive rate below 0.2% in all defense settings. Our results confirm MBM’s efficacy without reliance on external clean data.
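The scoring and filtering steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the divergence measure, the per-class statistic, or the form of the benign distribution, so the sketch assumes cosine distance between original and mutated features, a high quantile as the "extreme score" of a class, and a Gaussian fit with a z-score cutoff. All function names, parameters (`q`, `z_thresh`), and the synthetic features are hypothetical, and feature extraction and image augmentation are replaced by random vectors.

```python
import numpy as np

def feature_stability_scores(orig, mutated):
    """orig: (N, D) features; mutated: (N, M, D) features of M mutations.
    Returns (N,) scores: mean cosine distance to the original (assumed metric)."""
    o = orig / np.linalg.norm(orig, axis=1, keepdims=True)
    m = mutated / np.linalg.norm(mutated, axis=2, keepdims=True)
    cos = np.einsum("nd,nmd->nm", o, m)  # cosine similarity per mutation
    return (1.0 - cos).mean(axis=1)

def find_suspect_class(scores, labels, q=0.95):
    """Pick the class whose high-quantile score is largest (assumed rule)."""
    classes = np.unique(labels)
    extremes = np.array([np.quantile(scores[labels == c], q) for c in classes])
    return int(classes[np.argmax(extremes)])

def flag_backdoored(scores, labels, suspect, z_thresh=3.0):
    """Fit a Gaussian to scores outside the suspect class (treated as clean),
    then flag suspect-class samples whose z-score exceeds z_thresh."""
    clean = scores[labels != suspect]
    mu, sigma = clean.mean(), clean.std()
    z = (scores - mu) / sigma
    return np.flatnonzero((labels == suspect) & (z > z_thresh))

# Synthetic stand-in for real features: 3 classes, 100 samples each.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 100)
orig = rng.normal(size=(300, 16))
# Benign samples: mutations barely move the features.
mutated = orig[:, None, :] + 0.05 * rng.normal(size=(300, 4, 16))
# Hypothetical poisoned samples in class 1: features flip once the trigger is broken.
poisoned = np.arange(100, 120)
mutated[poisoned] = -orig[poisoned][:, None, :] + 0.05 * rng.normal(size=(20, 4, 16))

scores = feature_stability_scores(orig, mutated)
suspect = find_suspect_class(scores, labels)
flagged = flag_backdoored(scores, labels, suspect)
```

In this toy setup the poisoned samples diverge far more than benign ones, so the suspect class and the flagged indices follow directly from the scores. With real data, the choice of augmentations, feature layer, and threshold would need to be tuned.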