# SAC-QMIX

Algorithm that applies SAC to QMIX for Multi-Agent Reinforcement Learning. Watch the [demo](https://youtu.be/T0t-d1e7IkE) here.

## Requirements

- SMAC
- PyTorch (GPU support recommended for training)
- TensorBoard
- StarCraft II

For the installation of SMAC and StarCraft II, refer to the [SMAC](https://github.com/oxwhirl/smac) repository.

## Train

Train a model with the following command:

```shell
python main.py
```

Configurations and parameters for training are specified in `config.json`. Models are saved to `./models`.

## Test

Test a trained model with the following command:

```shell
python test_model.py
```

Configurations and parameters for testing are specified in `test_config.json`. The `run_name` entries in `config.json` and `test_config.json` must match.

## Theory & Algorithm

### Architecture
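The architecture diagram is not reproduced here, but the monotonic mixing component it describes follows the standard QMIX pattern: hypernetworks map the global state to non-negative mixing weights over the per-agent values. Below is a minimal PyTorch sketch of such a mixer; the class and parameter names (`QMixer`, `embed_dim`, the hypernetwork layers) are illustrative assumptions, not the repository's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QMixer(nn.Module):
    """Monotonic mixing network in the style of QMIX (illustrative sketch,
    not the repository's implementation)."""

    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks map the global state to mixing weights; abs() on the
        # weights keeps Q_tot monotonically increasing in every agent value.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1)
        )

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        b = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = F.elu(agent_qs.view(b, 1, self.n_agents) @ w1 + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (hidden @ w2 + b2).view(b, 1)  # Q_tot with shape (batch, 1)
```

Because the weights pass through `abs()` and ELU is monotone increasing, raising any single agent's value can never decrease the mixed total, which is the monotonicity constraint QMIX relies on for decentralized argmax.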
### Computation Flow

Note that a_i is equivalent to \mu_i and s_i is equivalent to o_i in the architecture schema above.

Training objective: policies that maximize the Q-values computed by the networks. In standard SAC notation these quantities take roughly the following forms (a sketch; the equation images in the original schema are authoritative):

Individual state-value functions:

$$V_i(o_i) = \mathbb{E}_{a_i \sim \pi_i}\left[ Q_i(o_i, a_i) - \alpha \log \pi_i(a_i \mid o_i) \right]$$

Total state-values (alpha is the entropy temperature):

$$V_{tot}(s) = \mathrm{Mix}\left( V_1(o_1), \ldots, V_n(o_n); s \right)$$

Q-values expressed with the Bellman equation:

$$Q_{tot}(s, \mathbf{a}) = r + \gamma \, \mathbb{E}\left[ V_{tot}(s') \right]$$

Critic networks update (minimize):

$$J_Q = \mathbb{E}\left[ \left( Q_{tot}(s, \mathbf{a}) - \left( r + \gamma \, \bar V_{tot}(s') \right) \right)^2 \right]$$

Actor networks update (maximize):

$$J_\pi = \mathbb{E}\left[ V_{tot}(s) \right]$$

Entropy temperatures update (minimize):

$$J(\alpha) = \mathbb{E}_{a_i \sim \pi_i}\left[ -\alpha \left( \log \pi_i(a_i \mid o_i) + \bar{\mathcal{H}} \right) \right]$$

where $\bar V_{tot}$ denotes the target-network value and $\bar{\mathcal{H}}$ the target entropy.

## Result

Note that the data for the other algorithms come from the [SMAC run data](https://github.com/oxwhirl/smac/releases/download/v1/smac_run_data.json), so the evaluation methodology follows the [SMAC paper](https://arxiv.org/abs/1902.04043) (StarCraft II version: SC2.4.6.2.69232).

### Test Win Rate % of SAC-QMIX and other algorithms (Mean of 5 independent runs)
| Scenario | IQL | VDN | QMIX | SAC-QMIX |
| :----------: | :-: | :-: | :--: | :------: |
| 2s_vs_1sc | 100 | 100 | 100 | 100 |
| 2s3z | 75 | 97 | 99 | 100 |
| 3s5z | 10 | 84 | 97 | 97 |
| 1c3s5z | 21 | 91 | 97 | 100 |
| 10m_vs_11m | 34 | 97 | 97 | 100 |
| 2c_vs_64zg | 7 | 21 | 58 | 56 |
| bane_vs_bane | 99 | 94 | 85 | 100 |
| 5m_vs_6m | 49 | 70 | 70 | 90 |
| 3s_vs_5z | 45 | 91 | 87 | 100 |
| 3s5z_vs_3s6z | 0 | 2 | 2 | 85 |
| 6h_vs_8z | 0 | 0 | 3 | 82 |
| 27m_vs_30m | 0 | 0 | 49 | 100 |
| MMM2 | 0 | 1 | 69 | 95 |
| corridor | 0 | 0 | 1 | 0 |
### Learning curves of SAC-QMIX and other algorithms (Mean of 5 independent runs)