Multi-agent deep reinforcement learning thesis

Reinforcement learning libraries:

  1. TF-Agents: extremely cool, great design, also contains multi-agent RL algos, TensorFlow integration -> Stable Baselines is just better
  2. Stable Baselines: fork of OpenAI Baselines, well maintained, no MARL algos, super easy, gold-standard library for RL
  3. JAX: Google library (heavily used at DeepMind) for JIT-compiled numerical computing, can speed up training, possible integration with PyTorch and TensorFlow
  4. Unity ML-Agents -> Problem is C#, which can be limiting!
  5. Unreal Engine MindMaker (uses Stable Baselines, great!)
  6. RL Coach
  7. Ray RLlib
  8. MARLlib

Coding language:

  1. C++: high performance, verbose, risk of memory leaks, possible use of CUDA (problem is I’m poor so I don’t have an Nvidia GPU lol; Colab is a possible alternative with a K80, or premium with a T4 or P100, but cloud services suck)
  2. Python: simpler, slower, possible integration with Numba (I still lack a decent GPU)

Game engine?

Environment:

  1. OpenAI Gym (single-agent API) / Gymnasium (maintained fork of Gym, now under the Farama Foundation) / OpenAI Universe
  2. PettingZoo
  3. DeepMind OpenSpiel (aimed at classic board and card games) / DeepMind Lab (complete mess, no high-quality docs)
  4. Unity (C#, which is higher level and less flexible than C++) -> If I start with C# I have to stick with it
  5. Unreal Engine (MindMaker API to use Stable Baselines, visual programming)

IDEs / development environment:

  1. VS Code (/ VS) -> Has Copilot
  2. PyCharm (resource-heavy IDE) -> Has Copilot
  3. Colab / Jupyter Notebook / Deepnote (Simple)

Current decision: let’s stick with OpenAI Gym and Stable Baselines, since I can use hardware acceleration techniques and JAX. Also, I haven’t yet studied the Unreal plugin MindMaker to see how well it implements Stable Baselines.

Update V2: No more OpenAI Gym and Stable Baselines. Upgrade to PettingZoo and RL Coach. Using xnetwork to model the patrolling graph, then converting it into a PettingZoo env. Add JAX for JIT.
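
A minimal sketch of what "modelling the patrolling graph" could look like. Everything here is an assumption for illustration: the notes do not show the real graph, so the three-node `GRAPH`, the `tick` and `legal_moves` helpers, and the use of per-node idleness (time since last visit) as the patrolling state are all hypothetical. A plain dict stands in for the graph library so the sketch has no dependencies.

```python
# Hypothetical patrolling graph: node -> {neighbour: travel_time}.
# A plain dict is used here as a stand-in for a graph library.
GRAPH = {
    "A": {"B": 1, "C": 2},
    "B": {"A": 1, "C": 1},
    "C": {"A": 2, "B": 1},
}


def tick(idleness, visited, dt=1):
    """Age every node by dt, then reset the nodes visited during this step."""
    return {n: 0 if n in visited else t + dt for n, t in idleness.items()}


def legal_moves(graph, node):
    """Neighbours reachable from `node`, in a stable order (the action set)."""
    return sorted(graph[node])


# Assumed state: idleness = time since each node was last visited.
idleness = {n: 0 for n in GRAPH}
idleness = tick(idleness, visited={"A"})
idleness = tick(idleness, visited={"B"})
print(idleness)
print(legal_moves(GRAPH, "A"))
```

Keeping the action set as "index into the sorted neighbour list" is one possible design choice; it keeps the action space small, but its size then depends on the current node.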

Update V3: From OpenAI Gym to PettingZoo to support the multi-agent case. Also using MARLlib to get access to multi-agent RL algorithms; the problem is that it cannot create agents, it just distributes computation and provides algos. TF-Agents vs Stable Baselines to create agents. Added JAX for JIT training.

Update V4: Built a basic single-agent RL test case using Gym (not Gymnasium), and now the environment is complete and fully working! Also, the PPO algo finally works and learns a policy. The most sensitive design choices are the observation space and the reward function. Great great job! Now running multiple tests and analyses with TensorBoard to spot possible bugs and refine the environment.
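
To make the point about observation space and reward function concrete, here is a toy single-agent environment that follows the classic Gym call shape (`reset()` returns an observation, `step()` returns `(obs, reward, done, info)`). It is only a sketch under assumptions: the class name `PatrolEnv`, the graph, the observation layout (current node followed by every node's idleness) and the reward (idleness of the node just reached) are hypothetical, not the actual thesis environment. To stay dependency-free it does not subclass `gym.Env`; a real version would, and would also declare `observation_space` and `action_space`.

```python
class PatrolEnv:
    """Toy single-agent patrolling env following the classic Gym call shape."""

    def __init__(self, adjacency, max_steps=20):
        self.adjacency = adjacency  # node index -> list of neighbour indices
        self.max_steps = max_steps

    def _obs(self):
        # Assumed observation: current node followed by every node's idleness.
        return (self.pos, *self.idle)

    def reset(self):
        self.pos, self.t = 0, 0
        self.idle = [0] * len(self.adjacency)
        return self._obs()

    def step(self, action):
        # The action indexes into the current node's neighbour list.
        self.pos = self.adjacency[self.pos][action]
        self.idle = [i + 1 for i in self.idle]
        # Assumed reward: how stale the reached node was before this visit.
        reward = self.idle[self.pos]
        self.idle[self.pos] = 0
        self.t += 1
        done = self.t >= self.max_steps
        return self._obs(), reward, done, {}


env = PatrolEnv({0: [1, 2], 1: [0, 2], 2: [0, 1]}, max_steps=3)
first_obs = env.reset()
obs, reward, done, info = env.step(0)  # move from node 0 to node 1
print(first_obs, obs, reward, done)
```

Changing `_obs` or the `reward` line is where experiments on observation and reward design would happen; both are deliberately isolated to a few lines.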

Update V5: Basic version of PPO working using Ray! Now time to use PettingZoo to adapt the basic Gym environment to a PettingZoo one suitable for the multi-agent case.

Update V6: PettingZoo environment ready and working. Time to adjust Ray for PettingZoo env.
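
A rough sketch of how a single-agent patrolling env might be reshaped for the multi-agent case. It mirrors the dict-keyed-by-agent shape of PettingZoo's parallel API as I understand it (`reset()` returns `(observations, infos)`, `step(actions)` returns `(observations, rewards, terminations, truncations, infos)`), but it does not import PettingZoo, so check the shape against the installed version. The class name `MultiPatrolEnv`, the agent names, the shared team reward and the graph are all assumptions for illustration.

```python
class MultiPatrolEnv:
    """Toy multi-agent patrolling env shaped like a PettingZoo parallel env."""

    def __init__(self, adjacency, n_agents=2, max_steps=20):
        self.adjacency = adjacency  # node index -> list of neighbour indices
        self.possible_agents = [f"patroller_{i}" for i in range(n_agents)]
        self.max_steps = max_steps

    def _obs(self):
        # Each agent sees its own node plus the shared idleness vector.
        return {a: (self.pos[a], *self.idle) for a in self.agents}

    def reset(self, seed=None, options=None):
        self.agents = list(self.possible_agents)
        self.pos = {a: 0 for a in self.agents}
        self.idle = [0] * len(self.adjacency)
        self.t = 0
        return self._obs(), {a: {} for a in self.agents}

    def step(self, actions):
        self.idle = [i + 1 for i in self.idle]
        for agent, act in actions.items():
            self.pos[agent] = self.adjacency[self.pos[agent]][act]
        visited = set(self.pos.values())
        # Assumed shared reward: total staleness of the nodes visited this step.
        team_reward = sum(self.idle[n] for n in visited)
        for n in visited:
            self.idle[n] = 0
        self.t += 1
        truncated = self.t >= self.max_steps
        obs = self._obs()
        rewards = {a: team_reward for a in self.agents}
        terminations = {a: False for a in self.agents}
        truncations = {a: truncated for a in self.agents}
        infos = {a: {} for a in self.agents}
        if truncated:
            self.agents = []  # no live agents once the episode is cut off
        return obs, rewards, terminations, truncations, infos


menv = MultiPatrolEnv({0: [1, 2], 1: [0, 2], 2: [0, 1]}, n_agents=2, max_steps=2)
observations, infos = menv.reset()
observations, rewards, terminations, truncations, infos = menv.step(
    {"patroller_0": 0, "patroller_1": 1}
)
print(observations, rewards)
```

Counting a node only once per step (via the `visited` set) is one way to avoid rewarding two agents for piling onto the same node; a per-agent reward would be an equally valid alternative to test.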

Update V7: Moved everything to Ray RLlib and MARLlib. Now testing the PPO and ITRPO families.

Update V8: Now I’m finishing the RL experiments and need to publish the results.