# bgmatting_trt

This is a model acceleration project using TensorRT. The model we use is [BackgroundMattingV2](https://github.com/PeterL1n/BackgroundMattingV2). Licensed under Apache-2.0.

# Environment

- OS: Ubuntu 18.04 64-bit
- GPU: NVIDIA Tesla T4
- Docker: 19.03.6

# How to run

## 0. Get the TensorRT Docker image and start a container

```bash
docker pull nvcr.io/nvidia/tensorrt:21.03-py3
# the workspace is /root/workspace
docker run -d --gpus all -it --rm -v /root/workspace:/root/workspace nvcr.io/nvidia/tensorrt:21.03-py3
docker exec -it container_id /bin/sh
```

## 1. Get the BackgroundMattingV2 source code from GitHub and install dependencies

```bash
git clone https://github.com/PeterL1n/BackgroundMattingV2.git
cd BackgroundMattingV2
pip install -r requirements.txt   # optionally add: -i https://pypi.douban.com/simple/
```

## 2. Download the PyTorch model from Google Drive

Link: https://drive.google.com/drive/folders/1cbetlrKREitIgjnIikG1HdM4x72FtgBh

Use the PyTorch ResNet101 version. You can also download it from Baidu Pan: https://pan.baidu.com/s/1Bt_xMOeLXJCzgIKVHWE9Sw (extraction code: h422).

Then move the model to `BackgroundMattingV2/models`.

## 3. Export the ONNX model

```bash
python export_onnx.py --model-type=mattingbase --model-backbone=resnet101 --model-checkpoint=./models/pytorch_resnet101.pth --output=../models/resnet101_base.onnx
```

A sketch of what this export does internally is shown after section 6.1 below.

## 4. Convert the ONNX model to a TensorRT engine and save it

```bash
python gen_engine_from_onnx.py
# or use trtexec instead
trtexec --verbose --onnx=resnet101_base.onnx --explicitBatch --saveEngine=../models/resnet101_base_fp16_1.trt \
    --minShapes=src:1x3x224x224,bgr:1x3x224x224 \
    --optShapes=src:1x3x1080x1920,bgr:1x3x1080x1920 \
    --maxShapes=src:1x3x1080x1920,bgr:1x3x1080x1920 \
    --fp16
```

The default batch size is 3, the default mode is fp16, and the default input resolution is 1080p. You can change them in `gen_engine_from_onnx.py` or by modifying the `trtexec` parameters. A Python sketch of the engine-building logic also follows section 6.1.

## 5. Inference

```bash
# mode: fp16 or fp32, default is fp16
# batch size: default is 3
# the engine file is expected at ./models/resnet101_base_<mode>_<batch_size>.trt, e.g. ./models/resnet101_base_fp16_3.trt
python base_inference_images.py --mode fp16 --batch-size 3
```

The generated images are written to the `output` directory. (See the inference sketch after section 6.1.)

If you hit the error `ImportError: libGL.so.1: cannot open shared object file: No such file or directory`, run:

```bash
apt-get update
apt-get install ffmpeg libsm6 libxext6 -y
```

## 6. Benchmark

### 6.1 TensorRT

```bash
# mode: fp16 or fp32, default is fp16
# batch size: default is 3
python base_benchmark.py --mode fp16 --batch-size 3
```

The result on our machine:

```
float16: 100%|██████████████████████████████████████████| 1000/1000 [04:02<00:00, 4.12it/s]
```

This means 4.12 iterations per second; in other words, each execution takes about 0.243 seconds. If you want to use a different batch size, please regenerate the TensorRT engine.
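For reference, step 3's `export_onnx.py` lives in the BackgroundMattingV2 repository and is not reproduced here. The following is only a minimal sketch of what such an export looks like; the `MattingBase` class comes from that repo's `model.py`, and the output names (`pha`, `fgr`, `err`, `hid`) are assumptions based on its matting-base network, not verified against the actual script.

```python
# Hypothetical simplification of export_onnx.py (step 3); see the lead-in above
# for which names are assumed.
import torch
from model import MattingBase  # from the BackgroundMattingV2 source tree

model = MattingBase(backbone='resnet101')
# The published checkpoint also contains refiner weights, hence strict=False.
model.load_state_dict(
    torch.load('./models/pytorch_resnet101.pth', map_location='cpu'), strict=False)
model.eval()

# Two inputs, matching the trtexec shapes in step 4: src (the camera frame) and
# bgr (the clean background plate), both NCHW with dynamic batch/height/width.
src = torch.randn(1, 3, 1080, 1920)
bgr = torch.randn(1, 3, 1080, 1920)

torch.onnx.export(
    model, (src, bgr), '../models/resnet101_base.onnx',
    opset_version=11,
    input_names=['src', 'bgr'],
    output_names=['pha', 'fgr', 'err', 'hid'],   # assumed MattingBase outputs
    dynamic_axes={name: {0: 'batch', 2: 'height', 3: 'width'}
                  for name in ['src', 'bgr', 'pha', 'fgr', 'err', 'hid']})
```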
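Step 4's `gen_engine_from_onnx.py` is likewise not reproduced here. Below is a minimal sketch of building a dynamic-shape fp16 engine with the TensorRT Python API shipped in the 21.03 container (TensorRT 7.2); the workspace size is an arbitrary choice, and the profile shapes are assumptions picked to match the script's stated defaults (batch 3, 1080p, fp16).

```python
# Sketch of engine building with the TensorRT 7.2 Python API (step 4).
import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

builder = trt.Builder(LOGGER)
network = builder.create_network(EXPLICIT_BATCH)
parser = trt.OnnxParser(network, LOGGER)

with open('../models/resnet101_base.onnx', 'rb') as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit('ONNX parse failed')

config = builder.create_builder_config()
config.max_workspace_size = 4 << 30    # 4 GiB of builder scratch space (assumed)
config.set_flag(trt.BuilderFlag.FP16)  # fp16 mode, as in the README defaults

# One optimization profile shared by both inputs; opt/max is batch 3 at 1080p.
profile = builder.create_optimization_profile()
for name in ('src', 'bgr'):
    profile.set_shape(name, (1, 3, 224, 224), (3, 3, 1080, 1920), (3, 3, 1080, 1920))
config.add_optimization_profile(profile)

engine = builder.build_engine(network, config)
with open('./models/resnet101_base_fp16_3.trt', 'wb') as f:
    f.write(engine.serialize())
```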
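Finally, a minimal sketch of what running the saved engine involves, i.e. roughly what `base_inference_images.py` (step 5) wraps. It assumes `pycuda` is available (`pip install pycuda` if the container lacks it); the binding names `src`/`bgr` come from the trtexec shapes above, the output names from the export sketch, and random arrays stand in for real images.

```python
# Sketch of dynamic-shape inference with a deserialized engine (TensorRT 7.2).
import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.WARNING)
with open('./models/resnet101_base_fp16_3.trt', 'rb') as f:
    engine = trt.Runtime(LOGGER).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

shape = (3, 3, 1080, 1920)                        # batch 3 at 1080p
context.set_binding_shape(engine.get_binding_index('src'), shape)
context.set_binding_shape(engine.get_binding_index('bgr'), shape)

# Allocate one host/device buffer per binding now that all shapes are concrete.
host, device, bindings = {}, {}, []
for i in range(engine.num_bindings):
    name = engine.get_binding_name(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    size = trt.volume(context.get_binding_shape(i))
    host[name] = cuda.pagelocked_empty(size, dtype)
    device[name] = cuda.mem_alloc(host[name].nbytes)
    bindings.append(int(device[name]))

host['src'][:] = np.random.rand(*shape).ravel()   # stand-ins for real images
host['bgr'][:] = np.random.rand(*shape).ravel()
for name in ('src', 'bgr'):
    cuda.memcpy_htod(device[name], host[name])

context.execute_v2(bindings)                      # synchronous inference

for i in range(engine.num_bindings):
    if not engine.binding_is_input(i):
        name = engine.get_binding_name(i)
        cuda.memcpy_dtoh(host[name], device[name])
# host['pha'] etc. now hold the flattened matting outputs (names assumed).
```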
### 6.2 PyTorch

```bash
cd BackgroundMattingV2
python inference_speed_test.py --model-type mattingbase --model-backbone resnet101 --model-backbone-scale 0.25 \
    --model-checkpoint "./models/pytorch_resnet101.pth" --backend pytorch \
    --image-src "../images/src/0.jpg" --image-bgr "../images/bgr/0.jpg" \
    --precision float16 --batch-size 3
```

The results on our machine:

```
float16: 100%|██████████████████████████████████████████| 1000/1000 [11:53<00:00, 1.40it/s]
float32: 100%|██████████████████████████████████████████| 1000/1000 [18:25<00:00, 1.11s/it]
```

For float16 this means 1.40 iterations per second; in other words, each execution takes about 0.714 seconds. For float32 it means 1.11 seconds per iteration (about 0.90 iterations per second), i.e. each execution takes about 1.11 seconds.

## 7. Result

| Model    | Precision | Batch Size | Latency (ms) | Throughput (images/s) | Latency Speedup | Throughput Speedup |
|--|--|--|--|--|--|--|
| PyTorch  | fp16 | 3 | 714 | 4.20  | 1x    | 1x    |
| TensorRT | fp16 | 3 | 243 | 12.35 | 2.94x | 2.94x |

- Throughput = 1000 / latency(ms) × batch size
- Latency Speedup = original latency / TensorRT latency
- Throughput Speedup = TensorRT throughput / original throughput
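The derived columns in the table follow directly from the measured latencies and the formulas above; a quick check of the arithmetic:

```python
# Reproducing the Result table from the measured fp16, batch-3 latencies.
batch = 3
lat_pt, lat_trt = 0.714, 0.243           # seconds per execution

thr_pt = batch / lat_pt                  # ≈ 4.20 images/s
thr_trt = batch / lat_trt                # ≈ 12.35 images/s
print(f'latency speedup:    {lat_pt / lat_trt:.2f}x')    # ≈ 2.94x
print(f'throughput speedup: {thr_trt / thr_pt:.2f}x')    # ≈ 2.94x
```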