# bgmatting_trt

This is a model acceleration project using TensorRT. The model we use is [BackgroundMattingV2](https://github.com/PeterL1n/BackgroundMattingV2). Licensed under Apache-2.0.

# Environment

- OS: Ubuntu 18.04 64-bit
- GPU: NVIDIA Tesla T4
- Docker: 19.03.6

# How to run

## 0. Get the TensorRT Docker image and start a container

```bash
docker pull nvcr.io/nvidia/tensorrt:21.03-py3
# the workspace is /root/workspace
docker run -d --gpus all -it --rm -v /root/workspace:/root/workspace nvcr.io/nvidia/tensorrt:21.03-py3
docker exec -it container_id /bin/sh
```

## 1. Get the BackgroundMattingV2 source code from GitHub and install dependencies

```bash
git clone https://github.com/PeterL1n/BackgroundMattingV2.git
cd BackgroundMattingV2
pip install -r requirements.txt   # optionally add: -i https://pypi.douban.com/simple/
```

## 2. Download the PyTorch model from Google Drive

Link: https://drive.google.com/drive/folders/1cbetlrKREitIgjnIikG1HdM4x72FtgBh

Use the PyTorch ResNet101 version. You can also download it from Baidu Pan: https://pan.baidu.com/s/1Bt_xMOeLXJCzgIKVHWE9Sw (extraction code: h422).

Then move the model to `BackgroundMattingV2/models`.

## 3. Export the ONNX model

```bash
python export_onnx.py --model-type=mattingbase --model-backbone=resnet101 --model-checkpoint=./models/pytorch_resnet101.pth --output=../models/resnet101_base.onnx
```

A sketch of what this export does internally is shown after section 6.1 below.

## 4. Convert the ONNX model to a TensorRT engine and save it

```bash
python gen_engine_from_onnx.py
# or use trtexec instead
trtexec --verbose --onnx=resnet101_base.onnx --explicitBatch --saveEngine=../models/resnet101_base_fp16_1.trt \
    --minShapes=src:1x3x224x224,bgr:1x3x224x224 \
    --optShapes=src:1x3x1080x1920,bgr:1x3x1080x1920 \
    --maxShapes=src:1x3x1080x1920,bgr:1x3x1080x1920 \
    --fp16
```

The default batch size is 3, the default mode is fp16, and the default input resolution is 1080p. You can change them in `gen_engine_from_onnx.py` or by modifying the `trtexec` parameters. A Python sketch of the engine-building logic also follows section 6.1.

## 5. Inference

```bash
# mode: fp16 or fp32, default is fp16
# batch size: default is 3
# the engine file is expected at ./models/resnet101_base_<mode>_<batch_size>.trt, e.g. ./models/resnet101_base_fp16_3.trt
python base_inference_images.py --mode fp16 --batch-size 3
```

The generated images are written to the `output` directory. (See the inference sketch after section 6.1.)

If you hit the error `ImportError: libGL.so.1: cannot open shared object file: No such file or directory`, run:

```bash
apt-get update
apt-get install ffmpeg libsm6 libxext6 -y
```

## 6. Benchmark

### 6.1 TensorRT

```bash
# mode: fp16 or fp32, default is fp16
# batch size: default is 3
python base_benchmark.py --mode fp16 --batch-size 3
```

The result on our machine:

```
float16: 100%|██████████████████████████████████████████| 1000/1000 [04:02<00:00, 4.12it/s]
```

This means 4.12 iterations per second; in other words, each execution takes about 0.243 seconds. If you want to use a different batch size, please regenerate the TensorRT engine.
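For reference, step 3's `export_onnx.py` lives in the BackgroundMattingV2 repository and is not reproduced here. The following is only a minimal sketch of what such an export looks like; the `MattingBase` class comes from that repo's `model.py`, and the output names (`pha`, `fgr`, `err`, `hid`) are assumptions based on its matting-base network, not verified against the actual script.

```python
# Hypothetical simplification of export_onnx.py (step 3); see the lead-in above
# for which names are assumed.
import torch
from model import MattingBase  # from the BackgroundMattingV2 source tree

model = MattingBase(backbone='resnet101')
# The published checkpoint also contains refiner weights, hence strict=False.
model.load_state_dict(
    torch.load('./models/pytorch_resnet101.pth', map_location='cpu'), strict=False)
model.eval()

# Two inputs, matching the trtexec shapes in step 4: src (the camera frame) and
# bgr (the clean background plate), both NCHW with dynamic batch/height/width.
src = torch.randn(1, 3, 1080, 1920)
bgr = torch.randn(1, 3, 1080, 1920)

torch.onnx.export(
    model, (src, bgr), '../models/resnet101_base.onnx',
    opset_version=11,
    input_names=['src', 'bgr'],
    output_names=['pha', 'fgr', 'err', 'hid'],   # assumed MattingBase outputs
    dynamic_axes={name: {0: 'batch', 2: 'height', 3: 'width'}
                  for name in ['src', 'bgr', 'pha', 'fgr', 'err', 'hid']})
```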
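Step 4's `gen_engine_from_onnx.py` is likewise not reproduced here. Below is a minimal sketch of building a dynamic-shape fp16 engine with the TensorRT Python API shipped in the 21.03 container (TensorRT 7.2); the workspace size is an arbitrary choice, and the profile shapes are assumptions picked to match the script's stated defaults (batch 3, 1080p, fp16).

```python
# Sketch of engine building with the TensorRT 7.2 Python API (step 4).
import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

builder = trt.Builder(LOGGER)
network = builder.create_network(EXPLICIT_BATCH)
parser = trt.OnnxParser(network, LOGGER)

with open('../models/resnet101_base.onnx', 'rb') as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit('ONNX parse failed')

config = builder.create_builder_config()
config.max_workspace_size = 4 << 30    # 4 GiB of builder scratch space (assumed)
config.set_flag(trt.BuilderFlag.FP16)  # fp16 mode, as in the README defaults

# One optimization profile shared by both inputs; opt/max is batch 3 at 1080p.
profile = builder.create_optimization_profile()
for name in ('src', 'bgr'):
    profile.set_shape(name, (1, 3, 224, 224), (3, 3, 1080, 1920), (3, 3, 1080, 1920))
config.add_optimization_profile(profile)

engine = builder.build_engine(network, config)
with open('./models/resnet101_base_fp16_3.trt', 'wb') as f:
    f.write(engine.serialize())
```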
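Finally, a minimal sketch of what running the saved engine involves, i.e. roughly what `base_inference_images.py` (step 5) wraps. It assumes `pycuda` is available (`pip install pycuda` if the container lacks it); the binding names `src`/`bgr` come from the trtexec shapes above, the output names from the export sketch, and random arrays stand in for real images.

```python
# Sketch of dynamic-shape inference with a deserialized engine (TensorRT 7.2).
import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.WARNING)
with open('./models/resnet101_base_fp16_3.trt', 'rb') as f:
    engine = trt.Runtime(LOGGER).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

shape = (3, 3, 1080, 1920)                        # batch 3 at 1080p
context.set_binding_shape(engine.get_binding_index('src'), shape)
context.set_binding_shape(engine.get_binding_index('bgr'), shape)

# Allocate one host/device buffer per binding now that all shapes are concrete.
host, device, bindings = {}, {}, []
for i in range(engine.num_bindings):
    name = engine.get_binding_name(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    size = trt.volume(context.get_binding_shape(i))
    host[name] = cuda.pagelocked_empty(size, dtype)
    device[name] = cuda.mem_alloc(host[name].nbytes)
    bindings.append(int(device[name]))

host['src'][:] = np.random.rand(*shape).ravel()   # stand-ins for real images
host['bgr'][:] = np.random.rand(*shape).ravel()
for name in ('src', 'bgr'):
    cuda.memcpy_htod(device[name], host[name])

context.execute_v2(bindings)                      # synchronous inference

for i in range(engine.num_bindings):
    if not engine.binding_is_input(i):
        name = engine.get_binding_name(i)
        cuda.memcpy_dtoh(host[name], device[name])
# host['pha'] etc. now hold the flattened matting outputs (names assumed).
```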
### 6.2 PyTorch

```bash
cd BackgroundMattingV2
python inference_speed_test.py --model-type mattingbase --model-backbone resnet101 --model-backbone-scale 0.25 \
    --model-checkpoint "./models/pytorch_resnet101.pth" --backend pytorch \
    --image-src "../images/src/0.jpg" --image-bgr "../images/bgr/0.jpg" \
    --precision float16 --batch-size 3
```

The results on our machine:

```
float16: 100%|██████████████████████████████████████████| 1000/1000 [11:53<00:00, 1.40it/s]
float32: 100%|██████████████████████████████████████████| 1000/1000 [18:25<00:00, 1.11s/it]
```

For float16 this means 1.40 iterations per second; in other words, each execution takes about 0.714 seconds. For float32 it means 1.11 seconds per iteration (about 0.90 iterations per second), i.e. each execution takes about 1.11 seconds.

## 7. Result

| Model    | Precision | Batch Size | Latency (ms) | Throughput (images/s) | Latency Speedup | Throughput Speedup |
|--|--|--|--|--|--|--|
| PyTorch  | fp16 | 3 | 714 | 4.20  | 1x    | 1x    |
| TensorRT | fp16 | 3 | 243 | 12.35 | 2.94x | 2.94x |

- Throughput = 1000 / latency(ms) × batch size
- Latency Speedup = original latency / TensorRT latency
- Throughput Speedup = TensorRT throughput / original throughput
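The derived columns in the table follow directly from the measured latencies and the formulas above; a quick check of the arithmetic:

```python
# Reproducing the Result table from the measured fp16, batch-3 latencies.
batch = 3
lat_pt, lat_trt = 0.714, 0.243           # seconds per execution

thr_pt = batch / lat_pt                  # ≈ 4.20 images/s
thr_trt = batch / lat_trt                # ≈ 12.35 images/s
print(f'latency speedup:    {lat_pt / lat_trt:.2f}x')    # ≈ 2.94x
print(f'throughput speedup: {thr_trt / thr_pt:.2f}x')    # ≈ 2.94x
```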