${{\cal Q} {\cal D}}$-Learning: A Collaborative Distributed Strategy for Multi-Agent Reinforcement Learning Through ${\rm Consensus} + {\rm Innovations}$

Soummya Kar; José M. F. Moura; H. Vincent Poor

doi:10.1109/tsp.2013.2241057

Abstract


The paper develops ${\cal Q}{\cal D}$-learning, a distributed version of reinforcement $Q$-learning, for multi-agent Markov decision processes (MDPs); the agents have no prior information on the global state transition and on the local agent cost statistics. The network agents minimize a network-averaged infinite horizon discounted cost, by local processing and by collaborating through mutual information exchange over a sparse (possibly stochastic) communication network. The agents respond differently (depending on their instantaneous one-stage random costs) to a global controlled state and the control actions of a remote controller. When each agent is aware only of its local online cost data and the inter-agent communication network is weakly connected, we prove that ${\cal Q}{\cal D}$-learning, a consensus + innovations algorithm with mixed time-scale stochastic dynamics, converges asymptotically almost surely to the desired value function and to the optimal stationary control policy at each network agent.
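To make the consensus + innovations structure concrete, the following is a minimal sketch of one synchronous update across all agents. It is an illustration assembled from the description in the abstract, not the paper's stated recursion: the function name `qd_learning_step`, the array layout, the fixed neighbor lists, and the scalar weights `alpha` (innovation) and `beta` (consensus) are all assumptions. The weights are passed in as plain numbers, so the decay schedules behind the mixed time-scale dynamics are left to the caller.

```python
import numpy as np


def qd_learning_step(Q, neighbors, state, action, next_state, costs,
                     alpha, beta, gamma):
    """One consensus + innovations update of every agent's local Q estimate.

    Q          : array of shape (n_agents, n_states, n_actions)
    neighbors  : neighbors[n] lists the agents whose estimates agent n receives
    state, action, next_state : the globally observed transition
    costs      : costs[n] is the one-stage cost observed only by agent n
    alpha, beta: innovation and consensus weights (hypothetical scalars)
    gamma      : discount factor
    """
    Q_new = Q.copy()
    for n in range(Q.shape[0]):
        q = Q[n, state, action]
        # Consensus term: disagreement with the neighbors' estimates
        # of the same (state, action) entry.
        consensus = sum(q - Q[l, state, action] for l in neighbors[n])
        # Innovation term: local temporal-difference error built from the
        # agent's own cost; the min reflects cost minimization.
        innovation = costs[n] + gamma * Q[n, next_state].min() - q
        Q_new[n, state, action] = q - beta * consensus + alpha * innovation
    return Q_new
```

Only the visited (state, action) entry changes at each step. The consensus term pulls neighboring estimates toward each other, while the innovation term injects each agent's private cost observation, which is how agreement on a network-averaged cost can emerge even though no agent sees the others' costs.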

