# ImOOD: Rethinking Out-of-Distribution Detection on Imbalanced Data Distribution

This is the official implementation of the paper [Rethinking Out-of-Distribution Detection on Imbalanced Data Distribution](https://www.arxiv.org/abs/2407.16430) (NeurIPS 2024).

## Installation

This code is built on top of the [PASCL](https://github.com/amazon-science/long-tailed-ood-detection) framework, with a unified interface for customizing model designs and OOD metrics. Install the requirements and you are free to play with ImOOD:

```bash
cd path/to/imood
pip install -r requirements.txt
```

## Data Preparation

We suggest putting all datasets under the same folder (say `$DATA`, defaulting to `./data`) to ease management, and following the instructions below to organize the datasets so that no source-code modification is needed.
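For example, a minimal shell setup matching the default layout (the `DATA` variable is only a convenience for the commands in this section; the scripts receive the path through `--drp`):

```bash
# Dataset root used throughout this README; ./data is the default
export DATA=./data
mkdir -p "$DATA"
```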
### CIFAR-LT benchmarks
**In-Distribution Data**: The default script `datasets/ImbalanceCIFAR.py` will automatically download the original `CIFAR10/100` datasets through `torchvision` and prepare the long-tailed `CIFAR10/100-LT` versions as the imbalanced ID data.

**Out-of-Distribution Test Data**: Download the [SCOOD](https://drive.google.com/file/d/1cbLXZ39xnJjxXnDM7g2KODHIjE0Qj4gu/view?usp=sharing) test data and put it into `$DATA/SCOOD/data/`.

**Out-of-Distribution Auxiliary Data**: Download [300K_random_images.npy](https://people.eecs.berkeley.edu/~hendrycks/300K_random_images.npy) and put it into `$DATA/tinyimages80m/300K_random_images.npy` (you should create the folder first).

The file structure looks like:

```
$DATA/
|–– cifar-10-batches-py/
|–– cifar-100-python/
|–– SCOOD/data/
|–– tinyimages80m/
|–– cifar-10-python.tar.gz
|–– cifar-100-python.tar.gz
```
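For instance, both OOD downloads can be scripted; this sketch assumes the `gdown` utility (`pip install gdown`) for the Google Drive file and that the SCOOD archive is a zip, so adjust the extraction step to whatever format you actually receive:

```bash
# 300K random images (auxiliary OOD data)
mkdir -p "$DATA/tinyimages80m"
wget -O "$DATA/tinyimages80m/300K_random_images.npy" \
    https://people.eecs.berkeley.edu/~hendrycks/300K_random_images.npy

# SCOOD test data from the Google Drive link above
# (the zip format is an assumption; adjust extraction if needed)
gdown 1cbLXZ39xnJjxXnDM7g2KODHIjE0Qj4gu -O scood.zip
mkdir -p "$DATA/SCOOD"
unzip scood.zip -d "$DATA/SCOOD/"
```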
### ImageNet-LT benchmark (Optional)
**In-Distribution Data**: Create a folder named `imagenet/` under `$DATA`, download the dataset from the [official website](https://image-net.org/index.php), and extract the training and validation sets to `$DATA/imagenet/`. The default script `datasets/ImbalanceImageNet.py` will automatically prepare the long-tailed `ImageNet-LT` version as the imbalanced ID data.

**Out-of-Distribution Auxiliary/Test Data**: Execute the `datasets/ImageNet_LT/get_imagenet10k.sh` script; it will automatically download the whole [ImageNet-10K](https://image-net.org/index.php) dataset and extract the auxiliary/test OOD data for training and testing on the ImageNet-LT benchmark. **Note**: this requires 700*2 = 1,400 GB of storage and takes some time to download.

The file structure looks like:

```
$DATA/
|–– imagenet/
    |–– train/
    |–– val/
    |–– extra_1k/
    |–– ood_test_1k/
```
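A sketch of that OOD data step, assuming the script is run from the repository root (check the free space first, given the size of the download):

```bash
# ~1.4 TB is needed for ImageNet-10K; verify space on the $DATA volume first
df -h "$DATA"
bash datasets/ImageNet_LT/get_imagenet10k.sh
```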
If some datasets are already installed elsewhere, you can create symbolic links at `$DATA/dataset_name` that point to the original data to avoid duplicate downloads.
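For instance (the source path below is hypothetical):

```bash
# Reuse an ImageNet copy that already lives elsewhere
ln -s /path/to/existing/imagenet "$DATA/imagenet"
```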
## ImOOD: Training-time Regularization

### Training

This repo provides a unified interface to train an OOD detector with the metric options mentioned above. Here are some examples of training an OOD detector on the CIFAR10/100-LT and ImageNet-LT benchmarks.

* CIFAR10/100-LT

```bash
python train.py --gpu 0 --ds cifar10 -e 180 \
    --drp data --srp runs \
    --imbalance_ratio .01 --logit_adjust 1.0 \
    --ood_metric ada_bin_disc \
    --Lambda 0.5 --Lambda2 0.05 --aux_ood_loss pascl
```

* ImageNet-LT

```bash
python train.py --gpu 0,1 --ds imagenet --md ResNet50 -e 100 --opt sgd --decay multisteps --lr 0.1 --wd 5e-5 -b 192 --tb 100 \
    --ddp --dist_url tcp://localhost:23457 \
    --drp data --srp srp \
    --imbalance_ratio .01 --logit_adjust 1.0 \
    --ood_metric ada_bin_disc \
    --Lambda 0.5 --Lambda2 0.05 --aux_ood_loss pascl
```

You can change `--ds` to switch between benchmarks (with different base models), and try various OOD detectors by setting `--ood_metric` to `ada_oe` ([OE](https://github.com/hendrycks/outlier-exposure)), `ada_energy` ([Energy](https://github.com/wetliu/energy_ood)), `ada_bin_disc` ([BinDisc](https://github.com/j-cb/Breaking_Down_OOD_Detection)), or `ada_maha` ([Maha](https://github.com/pokaxpoka/deep_Mahalanobis_detector)). The detailed parameter descriptions are as follows.
#### Parameter description
- `--gpu`: GPU device ID, where `0` indicates GPU 0 and `0,1` means GPU 0 and GPU 1 will be used together.
- `--ds` or `--dataset`: dataset name, which can be `cifar10`, `cifar100`, or `imagenet`.
- `--drp` or `--data_root_path` and `--srp` or `--save_root_path`: the data and save paths.
- `--imbalance_ratio`: imbalance ratio for the ID dataset, `0.01` by default.
- `--logit_adjust`: scale factor for [logit adjustment](https://github.com/Chumsy0725/logit-adj-pytorch), `1.0` by default; `0.0` means no adjustment is applied.
- `--ood_metric`: the metric to train the OOD detector, which can be `ada_oe`, `ada_energy`, `ada_bin_disc`, or `ada_maha`, as well as the vanilla versions `oe`, `energy`, `bin_disc`, and `maha` for comparison.
- `--Lambda` and `--Lambda2`: the scaling factors for the primary loss (e.g., `ada_bin_disc`) and the auxiliary loss (e.g., `pascl`) of the OOD detection branch (see the sketch after this list).
- `--aux_ood_loss`: the auxiliary loss function, where `pascl` ([PASCL](https://github.com/amazon-science/long-tailed-ood-detection)) and `simclr` ([SimCLR](https://github.com/google-research/simclr)) are supported.
- `--md`, `-e`, `--opt`, `--decay`, `--lr`, `--wd`, `-b`, `--tb`: hyper-parameters for training the base model (`ResNet18`/`ResNet50` for `cifar`/`imagenet` by default); the details are displayed in `train.py`.
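To make `--logit_adjust`, `--Lambda`, and `--Lambda2` concrete, here is a minimal PyTorch sketch of logit adjustment and of how such loss terms are typically combined; the function and variable names are illustrative and not taken from this repo's code:

```python
import torch
import torch.nn.functional as F

def logit_adjusted_ce(logits, targets, class_counts, tau=1.0):
    # Logit adjustment (Menon et al.): shifting the logits by
    # tau * log(class prior) compensates for the long-tailed label
    # distribution; tau plays the role of --logit_adjust.
    log_prior = torch.log(class_counts.float() / class_counts.sum())
    return F.cross_entropy(logits + tau * log_prior, targets)

# Toy usage with a long-tailed class histogram:
logits = torch.randn(8, 10)                    # batch of 8, 10 classes
targets = torch.randint(0, 10, (8,))
counts = torch.tensor([5000] * 3 + [50] * 7)   # head classes vs. tail classes
cls_loss = logit_adjusted_ce(logits, targets, counts, tau=1.0)

# Illustrative composition of the overall objective (names are hypothetical):
#   loss = cls_loss + Lambda * primary_ood_loss + Lambda2 * aux_ood_loss
```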
### Testing

After training the OOD detectors with the commands provided above, you will obtain a better detector that approaches the ideal balanced OOD detector on the imbalanced ID data distribution. In this case, the post-hoc normalization technique is not needed; just test the detectors as usual.

* CIFAR10/100-LT

```bash
python test.py --gpu 0 --ds cifar10 --model ResNet18 \
    --drp data --ood_metric bin_disc \
    --ckpt_path weights/cifar10/ada_bin_disc
```

* ImageNet-LT

```bash
python test.py --gpu 0 --ds imagenet --model ResNet50 --tb 200 \
    --drp data --ood_metric bin_disc \
    --ckpt_path weights/imagenet/ada_bin_disc
```

## ImOOD: Post-hoc Normalization

Assume you have a baseline model trained on imbalanced ID data with any of the OOD metrics above, such as OE, Energy, or BinDisc (e.g., by using our commands). Put the checkpoints into `weights/baseline` and run the following code to apply our post-hoc normalization technique for a cost-free improvement:

* CIFAR10/100-LT

```bash
python test.py --gpu 0 --ds cifar10 --model ResNet18 \
    --drp data --ood_metric ada_energy \
    --logit_adjust 1.0 \
    --ckpt_path weights/baseline/cifar10/energy
```

* ImageNet-LT

```bash
python test.py --gpu 0 --ds imagenet --model ResNet50 --tb 200 \
    --drp data --ood_metric ada_energy \
    --logit_adjust 1.0 \
    --ckpt_path weights/baseline/imagenet/energy
```

You can easily adjust the `--ood_metric` parameter according to the OOD metric the baseline model was trained with. Here the prefix `ada_` indicates using our post-hoc normalization, and the suffix (e.g., `energy`) refers to the specific OOD metric. This repo currently supports various adaptive inference metrics: `ada_msp` ([OE](https://github.com/hendrycks/outlier-exposure)/[MSP](https://github.com/hendrycks/error-detection)), `ada_energy` ([Energy](https://github.com/wetliu/energy_ood)), `ada_bin_disc` ([BinDisc](https://github.com/j-cb/Breaking_Down_OOD_Detection)), `ada_maha` ([Maha](https://github.com/pokaxpoka/deep_Mahalanobis_detector)), and `ada_gradnorm` ([GradNorm](https://github.com/brianlan/pytorch-grad-norm)). The implementation details can be found in `models/base.py`.

## Released models

- [x] Please find the pre-trained weights on [Google Drive](https://drive.google.com/drive/folders/1T0HNgTcZOkq2LvCqNxlhk0kRRR7JvQC9?usp=drive_link)

## Citation

If you find this repo or paper useful in your research, please kindly star this repo and cite the paper:

```
@article{liu2024imood,
  title={Rethinking Out-of-Distribution Detection on Imbalanced Data Distribution},
  author={Liu, Kai and Fu, Zhihang and Jin, Sheng and Chen, Chao and Chen, Ze and Jiang, Rongxin and Zhou, Fan and Chen, Yaowu and Ye, Jieping},
  journal={Advances in Neural Information Processing Systems},
  volume={38},
  year={2024}
}
```