MDDP: Making Decisions From Different Perspectives in Multiagent Reinforcement Learning

Wei Li; Ziming Qiu; Shitong Shao; Aiguo Song

doi:10.1109/tg.2023.3329376

Abstract

Multi-Agent Reinforcement Learning (MARL) has made remarkable progress in recent years. However, in most MARL methods, agents share a policy or value network, which tends to produce similar behaviors across agents and thus limits the method's flexibility in handling complex tasks. To enhance the diversity of agent behaviors, we propose a novel method, Making Decisions from Different Perspectives (MDDP). This method enables agents to switch flexibly between different policy roles and make decisions from different perspectives, which can improve the adaptability of policy learning in complex scenarios. Specifically, in MDDP, we design a new Self-attention and Gated Recurrent Unit (GRU) based Dueling Architecture Network (SG-DAN) to estimate the individual Q-values. SG-DAN contains two components: a Self-Attention based Role-switching network (SAR) and a GRU-based State value Estimation network (GSE). SAR is responsible for action advantage estimation, and GSE is responsible for state value estimation. Experimental results on the challenging StarCraft II micromanagement benchmark verify that the modeling of MDDP is reasonable and demonstrate its superior performance over related advanced approaches.
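
The abstract does not spell out how the state value from GSE and the action advantages from SAR are combined into individual Q-values. As a rough illustration only, the sketch below assumes the mean-subtracted aggregation used in the standard dueling architecture, which may differ from what SG-DAN actually does. The function name `dueling_q_values` and the example numbers are hypothetical, and plain floats stand in for the outputs of the two networks.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine a state value and per-action advantages into Q-values.

    Assumption: standard dueling aggregation, Q(a) = V + A(a) - mean(A).
    The abstract does not confirm that SG-DAN uses this exact form.
    """
    if not advantages:
        raise ValueError("advantages must contain at least one action")
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def greedy_action(q_values: List[float]) -> int:
    """Return the index of the action with the largest Q-value."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


# Hypothetical stand-ins: a value-stream output (the role GSE plays) and
# an advantage-stream output (the role SAR plays) for one agent.
example_value = 1.0
example_advantages = [0.0, 1.0, 2.0]
example_q = dueling_q_values(example_value, example_advantages)
example_choice = greedy_action(example_q)
```

Under this assumed aggregation, subtracting the mean advantage makes the Q-values average to the state value, so the value stream sets the overall level of the estimates while the advantage stream alone determines the ranking of actions.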
