Deep Reinforcement Learning-Based Multi-Agent System with Advanced Actor–Critic Framework for Complex Environment

The development of artificial intelligence (AI) game agents that use deep reinforcement learning (DRL) algorithms to process visual information for decision-making has emerged as a key research focus in both academia and industry. However, previous game agents have struggled to execute multiple commands simultaneously in a single decision, failing to accurately replicate the complex control patterns that characterize human gameplay. In this paper, we utilize the ViZDoom environment as the DRL research platform and transform the agent–environment interactions into a Partially Observable Markov Decision Process (POMDP). We introduce an advanced multi-agent deep reinforcement learning (DRL) framework, specifically a Multi-Agent Proximal Policy Optimization (MA-PPO), designed to optimize target acquisition while operating within defined ammunition and time constraints. In MA-PPO, each agent handles distinct parallel tasks with custom reward functions for performance evaluation. The agents make independent decisions while simultaneously executing multiple commands to mimic human-like gameplay behavior. Our evaluation compares MA-PPO against other DRL algorithms, showing a 30.67% performance improvement over the baseline algorithm.