TPGCA: Transferable Policy Generation and Credit Assignment Network for Cooperative Multiagent Reinforcement Learning
Article 2026
Authors
W. J. Li
Jiali Lv
B. Hu
Abstract
Multiagent reinforcement learning (MARL) methods perform well and show promise in cooperative tasks. To improve policy learning in new scenarios, some methods transfer learned policy knowledge to those scenarios. However, most of them focus only on transferring the policies of individual agents and neglect credit assignment among agents in cooperative tasks, which results in a transfer bias in the cooperative policies. In this paper, we propose a novel method, the transferable policy generation and credit assignment (TPGCA) network for cooperative MARL. TPGCA can transfer the entire MARL model through its transferable $Q$-value network and mixing network. Specifically, to enhance the effectiveness and transferability of agent policies, we design a correspondence network between observations and actions (COA) based on a transformer and a gated recurrent unit (GRU). To achieve reliable credit assignment and reduce the transfer bias, we devise a role-based joint $Q$-value decomposition network (RVD) that evaluates the contributions of agents from different observation perspectives. Experimental results in various micromanagement scenarios of the StarCraft multiagent challenge (SMAC) and in the multiagent particle environment (MPE) demonstrate the effectiveness and transferability of TPGCA.
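To make the idea of a mixing step that transfers across team sizes more concrete, the following is a minimal sketch of generic value decomposition with attention-style credit weights. It is not the RVD or COA architecture from the paper, whose details the abstract does not give. The function names (`credit_weights`, `mix_q_values`), the per-agent feature vectors, and the shared `query` vector are all hypothetical and chosen only for illustration. The point it shows is that when the credit weights come from a shared scoring function rather than from a layer sized to a fixed number of agents, the same parameters can be reused when the number of agents changes.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax; returns non-negative weights summing to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def credit_weights(agent_features: Sequence[Sequence[float]],
                   query: Sequence[float]) -> List[float]:
    """Score each agent with a shared query vector (hypothetical parameter).

    The query has a fixed size that does not depend on the number of agents,
    so it can be carried over to a scenario with a different team size.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in agent_features]
    return softmax(scores)


def mix_q_values(agent_qs: Sequence[float],
                 agent_features: Sequence[Sequence[float]],
                 query: Sequence[float]) -> float:
    """Joint Q-value as a non-negative weighted sum of per-agent Q-values.

    Because the weights are non-negative and do not depend on agent_qs,
    raising any single agent's Q-value never lowers the joint value.
    """
    weights = credit_weights(agent_features, query)
    return sum(w * q for w, q in zip(weights, agent_qs))


# Assumed toy inputs: the same query is reused for a 3-agent and a 5-agent team.
query = [0.5, -0.25]
q_tot_3 = mix_q_values([1.0, 2.0, 3.0],
                       [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], query)
q_tot_5 = mix_q_values([1.0, 2.0, 3.0, 0.5, 1.5],
                       [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [2.0, 0.0]],
                       query)
```

In this sketch the softmax normalization keeps the joint value inside the range of the individual Q-values. A learned mixing network would typically replace the dot-product scoring with trained layers, but the size-independence argument is the same.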