This project applies the multi-agent deep deterministic policy gradient (MADDPG) algorithm to a novel use case: training the ghosts in the game of Pac-Man to capture Pac-Man.
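In MADDPG, each agent learns a decentralized actor that acts on its own observation, plus a centralized critic that conditions on the observations and actions of all agents during training. The snippet below is a minimal, illustrative sketch of that centralized critic update in PyTorch; it is not this repository's implementation, and every name and dimension in it (`Critic`, `obs_dim`, the fake replay sample, and so on) is a placeholder.

```python
# Illustrative sketch of MADDPG's centralized critic update; not the code in train.py.
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Centralized critic: scores the JOINT observations and actions of all agents."""
    def __init__(self, n_agents, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_agents * (obs_dim + act_dim), 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, all_obs, all_acts):
        # all_obs: (batch, n_agents * obs_dim); all_acts: (batch, n_agents * act_dim)
        return self.net(torch.cat([all_obs, all_acts], dim=-1))

# Placeholder sizes: 2 ghosts + Pac-Man, with made-up observation/action widths.
n_agents, obs_dim, act_dim, batch = 3, 10, 5, 1024
critic = Critic(n_agents, obs_dim, act_dim)
optimizer = torch.optim.Adam(critic.parameters(), lr=1e-2)  # matches the --lr default
gamma = 0.95                                                # matches the --gamma default

# Random tensors standing in for a replay-buffer sample, so the snippet runs as-is.
obs       = torch.randn(batch, n_agents * obs_dim)
acts      = torch.randn(batch, n_agents * act_dim)
rewards   = torch.randn(batch, 1)
next_obs  = torch.randn(batch, n_agents * obs_dim)
next_acts = torch.randn(batch, n_agents * act_dim)  # would come from target actors

# The TD target conditions on the joint next actions of ALL agents; this is the
# centralized part. A full MADDPG would use a separate target critic network here.
with torch.no_grad():
    target_q = rewards + gamma * critic(next_obs, next_acts)
loss = nn.functional.mse_loss(critic(obs, acts), target_q)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```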
- Python 3 or higher is required.
- To install the dependencies, run `pip install -r requirements.txt`.
- To begin training with the GUI, run `python train.py --display`.
The full set of command-line options is listed below; example invocations follow the table.

| Command-line option | Purpose |
|---|---|
| `--max-episode-len` | Maximum length of each episode (default: 100) |
| `--num-episodes` | Total number of training episodes (default: 200000) |
| `--num-adversaries` | Number of ghost agents in the environment (default: 2) |
| `--good-policy` | Algorithm used for the Pac-Man agent (default: ddpg; options: ddpg or maddpg) |
| `--adv-policy` | Algorithm used for the ghost agents (default: maddpg; options: ddpg or maddpg) |
| `--lr` | Learning rate for the Adam optimizer (default: 1e-2) |
| `--gamma` | Discount factor (default: 0.95) |
| `--batch-size` | Batch size (default: 1024) |
| `--save-dir` | Directory where the training state and model are saved (default: "./save_files/") |
| `--save-rate` | Save the model every N episodes (default: 1000) |
| `--restore` | Restore training from the last checkpoint (default: False) |
| `--display` | Display the GUI (default: False) |
| `--load-dir` | Directory from which the training state and model are loaded (default: "") |
| `--load` | Load a saved model only when set to True (default: False) |
| `--load-episode` | Load the model tagged to a particular episode (default: 0) |
| `--layout` | Select the game map (default: smallClassic) |
| `--pacman_obs_type` | Observation space for the Pac-Man agent (default: partial_obs; options: partial_obs or full_obs) |
| `--ghost_obs_type` | Observation space for the ghost agents (default: full_obs; options: partial_obs or full_obs) |
| `--partial_obs_range` | Side length of the partial observation window, if chosen, e.g. 3x3, 5x5, 7x7 (default: 3) |
| `--shared_obs` | Include the same features in the observation spaces of both Pac-Man and the ghost agents (default: False) |
| `--astarSearch` | Factor the step distance between Pac-Man and the ghosts into the agents' rewards and observations (default: False) |
| `--astartAlpha` | Multiplier that scales the penalty/reward for an increase/decrease in step distance (default: 1) |
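A few example invocations built from the table above (the flag combinations are illustrative assumptions; boolean flags are assumed to work as simple switches, in the same style as `--display`):

```
# Train 2 MADDPG ghosts against a DDPG Pac-Man on the default map
python train.py --num-adversaries 2 --adv-policy maddpg --good-policy ddpg --layout smallClassic

# Resume training from the most recent checkpoint
python train.py --restore

# Load the model saved at episode 10000 from a previous run
python train.py --load --load-dir ./save_files/ --load-episode 10000
```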
This project is licensed under the MIT License; see the `LICENSE` file for details.
