A Sensing Policy Based on Confidence Bounds and a Restless Multi-Armed Bandit Model

Jan Oksanen; Visa Koivunen; H. Vincent Poor

doi:10.48550/arxiv.1211.4384

Abstract


A sensing policy for the restless multi-armed bandit problem with stationary but unknown reward distributions is proposed. The work is presented in the context of cognitive radios, in which the bandit problem arises when deciding which parts of the spectrum to sense and exploit. It is shown that the proposed policy attains an asymptotically logarithmic weak regret rate when the rewards are bounded and either independent and identically distributed (i.i.d.) or finite-state Markovian. Simulation results verifying uniformly logarithmic weak regret are also presented. The proposed policy is a centrally coordinated index policy, in which the index of a frequency band consists of a sample mean term and a confidence term. The sample mean term promotes spectrum exploitation, whereas the confidence term encourages exploration. The confidence term is designed such that the time interval between consecutive sensing instances of any suboptimal band grows exponentially. This exponential growth between suboptimal sensing time instances leads to logarithmically growing weak regret. Simulation results demonstrate that the proposed policy performs better than other similar methods in the literature.
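To make the index structure concrete, the following is a minimal sketch of a generic index policy in which each band's index is its sample mean plus a confidence term. It is not the policy proposed in the paper: the confidence term here is the standard UCB1 form, sqrt(2 ln t / n), used purely as a stand-in, and the i.i.d. Bernoulli availability model, the function names, and the parameters are all illustrative assumptions.

```python
import math
import random


def ucb_index(sample_mean, n_sensed, t):
    """Index = sample mean (exploitation) + confidence term (exploration).

    Assumption: the confidence term is the standard UCB1 form, chosen
    only for illustration; the paper designs its own confidence term.
    """
    return sample_mean + math.sqrt(2.0 * math.log(t) / n_sensed)


def run_index_policy(free_probs, horizon, seed=0):
    """Sense one band per time step, always choosing the highest index.

    free_probs[i] is the (hypothetical) probability that band i is free;
    rewards are i.i.d. Bernoulli, i.e. 1.0 when the sensed band is free.
    """
    rng = random.Random(seed)
    k = len(free_probs)
    counts = [0] * k    # how many times each band has been sensed
    means = [0.0] * k   # running sample mean of each band's reward
    for t in range(1, horizon + 1):
        if t <= k:
            band = t - 1  # initialization: sense every band once
        else:
            band = max(range(k),
                       key=lambda i: ucb_index(means[i], counts[i], t))
        reward = 1.0 if rng.random() < free_probs[band] else 0.0
        counts[band] += 1
        means[band] += (reward - means[band]) / counts[band]
    return counts, means


def weak_regret(counts, free_probs):
    """Expected reward lost relative to always sensing the best band."""
    best = max(free_probs)
    return sum((best - p) * n for p, n in zip(free_probs, counts))
```

Calling `run_index_policy([0.2, 0.5, 0.9], 5000)` should concentrate most sensing instances on the last band, while the two suboptimal bands are revisited less and less often as time goes on; `weak_regret` then summarizes how much expected reward was given up to exploration.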


