Optimal Adversarial Policies in the Multiplicative Learning System With a Malicious Expert

S. Rasoul Etesami; Negar Kiyavash; Vincent Léon; H. Vincent Poor

doi:10.1109/tifs.2021.3052360

Abstract

We consider a learning system based on the conventional multiplicative weight (MW) rule that combines experts' advice to predict a sequence of true outcomes. It is assumed that one of the experts is malicious and aims to impose the maximum loss on the system. The system's loss is naturally defined to be the aggregate absolute difference between the sequence of predicted outcomes and the true outcomes. We consider this problem under both offline and online settings. In the offline setting, where the malicious expert must choose its entire sequence of decisions a priori, we show, somewhat surprisingly, that a simple greedy policy of always reporting a false prediction is asymptotically optimal with an approximation ratio of 1 + O(√(ln N / N)), where N is the total number of prediction stages. In particular, we describe a policy that closely resembles the structure of the optimal offline policy. For the online setting, where the malicious expert can adaptively make its decisions, we show that the optimal online policy can be efficiently computed by solving a dynamic program in O(N^3) time. We also discuss a generalization of our model to multi-expert settings. Our results provide a new direction for vulnerability assessment of commonly used learning algorithms to internal adversarial attacks.
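The setting described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under assumptions that the abstract does not state: exactly two experts (one honest, one malicious), binary outcomes, a weighted-average prediction, and a weight update of the form w ← w · (1 − eps)^loss. The function name `simulate_mw` and the parameter `eps` are hypothetical and are not taken from the paper.

```python
def simulate_mw(outcomes, honest_preds, malicious_preds, eps=0.1):
    """Total absolute loss of a two-expert multiplicative-weight system.

    Assumed model (not from the paper): the system predicts the
    weight-averaged advice, and each expert's weight is multiplied by
    (1 - eps) ** |advice - outcome| after every stage.
    """
    w_honest, w_malicious = 1.0, 1.0
    total_loss = 0.0
    for y, p_h, p_m in zip(outcomes, honest_preds, malicious_preds):
        # System prediction: weighted average of the two experts' advice.
        prediction = (w_honest * p_h + w_malicious * p_m) / (w_honest + w_malicious)
        total_loss += abs(prediction - y)
        # Multiplicative update: an expert that was wrong loses weight.
        w_honest *= (1 - eps) ** abs(p_h - y)
        w_malicious *= (1 - eps) ** abs(p_m - y)
    return total_loss


if __name__ == "__main__":
    outcomes = [1, 0, 1, 1, 0, 0, 1, 0]
    honest = list(outcomes)              # assumption: the honest expert is always right
    greedy = [1 - y for y in outcomes]   # greedy policy: always report the false outcome
    print("loss under greedy policy:", simulate_mw(outcomes, honest, greedy))
    print("loss if both are truthful:", simulate_mw(outcomes, honest, honest))
```

In this toy model, each false report raises the system's loss at that stage but also shrinks the malicious expert's weight, which is the trade-off any adversarial policy has to balance.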
