TPGCA: Transferable Policy Generation and Credit Assignment Network for Cooperative Multiagent Reinforcement Learning

W. J. Li; Jiali Lv; B. Hu; Kaizhu Huang; Aiguo Song

doi:10.1109/tcss.2025.3628971

Abstract

Multiagent reinforcement learning (MARL) methods perform well in cooperative tasks and show strong application prospects. To improve how agents learn policies in new scenarios, some methods transfer learned policy knowledge to those scenarios. However, most of these methods focus only on transferring individual agent policies and neglect credit assignment among agents in cooperative tasks, which introduces a transfer bias into the cooperative policies. In this paper, we propose a novel method for cooperative MARL: the transferable policy generation and credit assignment (TPGCA) network. TPGCA can transfer the entire MARL model through its transferable Q-value network and mixing network. Specifically, to improve the effectiveness and transferability of agent policies, we design a correspondence network between observations and actions (COA) based on a transformer and a gated recurrent unit (GRU). To achieve reliable credit assignment and reduce transfer bias, we devise a role-based joint Q-value decomposition network (RVD) that evaluates the contributions of agents from different observation perspectives. Experimental results on various micromanagement scenarios in the StarCraft multiagent challenge (SMAC) and the multiagent particle environment (MPE) demonstrate the effectiveness and transferability of TPGCA.
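The abstract describes the architecture only at a high level. What follows is therefore a minimal sketch under stated assumptions, not the authors' COA or RVD networks. It illustrates two generic ideas the abstract relies on.

First, it shows an attention-based per-agent Q-network whose weights act on one entity's features at a time. Because of this, the same parameters can be reused when the number of agents or entities changes between scenarios, which is the usual reason transformer-based MARL models transfer.

Second, it shows a simplified, linear version of QMIX-style monotonic mixing. This mixer combines per-agent Q-values into a joint Q-value and stands in for credit assignment.

All of the following are hypothetical and introduced only for illustration:
- the class `EntityAttentionQNet` and the function `monotonic_mix`;
- every dimension and all random weights.

The sketch also omits several parts of a full model:
- the GRU recurrence mentioned in the abstract;
- any role modeling;
- the state-conditioned network that would produce the mixing weights.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class EntityAttentionQNet:
    """Hypothetical per-agent Q-network with one scaled dot-product attention head.

    The agent's own features form the query; the other observed entities form
    keys and values. Every matrix acts on a single feature vector, so the
    parameter count does not depend on how many entities are observed.
    """

    def __init__(self, feat_dim: int, hid_dim: int, n_actions: int, seed: int = 0):
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Matrix:
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_q = init(hid_dim, feat_dim)
        self.w_k = init(hid_dim, feat_dim)
        self.w_v = init(hid_dim, feat_dim)
        self.w_out = init(n_actions, hid_dim)
        self.scale = 1.0 / math.sqrt(hid_dim)

    def q_values(self, own: Vector, entities: List[Vector]) -> Vector:
        """Return one Q-value per action; `entities` must be non-empty."""
        query = matvec(self.w_q, own)
        keys = [matvec(self.w_k, e) for e in entities]
        values = [matvec(self.w_v, e) for e in entities]
        scores = [self.scale * sum(q * k for q, k in zip(query, key)) for key in keys]
        weights = softmax(scores)
        # Attention-weighted average of the value vectors: a fixed-size summary
        # regardless of how many entities were observed.
        context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(query))]
        return matvec(self.w_out, context)


def monotonic_mix(agent_qs: Vector, mix_weights: Vector, bias: float) -> float:
    """Simplified linear QMIX-style mixer: Q_tot = sum_i |w_i| * Q_i + b.

    Taking |w_i| makes Q_tot non-decreasing in every Q_i, so each agent
    maximizing its own Q_i also maximizes Q_tot. In a full model, w_i and b
    would be produced from the global state.
    """
    return sum(abs(w) * q for w, q in zip(mix_weights, agent_qs)) + bias


if __name__ == "__main__":
    net = EntityAttentionQNet(feat_dim=4, hid_dim=8, n_actions=3)
    rng = random.Random(1)

    def random_obs(n_entities: int) -> Tuple[Vector, List[Vector]]:
        feats = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(n_entities + 1)]
        return feats[0], feats[1:]  # (own features, other entities)

    # The same weights serve a 3-agent and a 5-agent scenario.
    for n_agents in (3, 5):
        per_agent = [net.q_values(*random_obs(n_agents - 1)) for _ in range(n_agents)]
        greedy = [max(q) for q in per_agent]  # each agent's greedy Q-value
        # Random stand-in for state-conditioned mixing weights (hypothetical).
        placeholder_w = [rng.uniform(-1.0, 1.0) for _ in range(n_agents)]
        print(n_agents, [len(q) for q in per_agent], round(monotonic_mix(greedy, placeholder_w, 0.0), 4))
```

Because every weight matrix acts on a single entity's features and the attention summary has a fixed size, `q_values` returns one value per action whether an agent observes two or four other entities. That property is what lets a model trained on one scenario be applied to another with a different team size. The mixer is deliberately linear for brevity; QMIX itself uses a nonlinear mixing network whose non-negative weights are generated from the global state.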
