diff --git a/.gitignore b/.gitignore
index 3c2169494ce2366c2ff2765191c30854ee520812..ad3454f03b1660a09fa1296695ed225653329114 100644
--- a/.gitignore
+++ b/.gitignore
@@ -26,7 +26,7 @@ __pycache__/
/lib64/
/output/
/inference_model/
-/dygraph/output_inference/
+/output_inference/
/parts/
/sdist/
/var/
diff --git a/.travis.yml b/.travis.yml
index 91789d5f68a82eb6747901ba965abf0c0ed80298..e9841259ff23b733747c81eba76336ff283dc000 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -18,9 +18,9 @@ addons:
- python2.7-dev
ssh_known_hosts: 13.229.163.131
before_install:
- - sudo pip install -U virtualenv pre-commit pip
+ - sudo pip install -U virtualenv pre-commit pip -i https://pypi.tuna.tsinghua.edu.cn/simple
- docker pull paddlepaddle/paddle:latest
- - git pull https://github.com/PaddlePaddle/PaddleDetection master
+ - git pull https://github.com/PaddlePaddle/PaddleDetection release/2.0
script:
- exit_code=0
diff --git a/README.md b/README.md
deleted file mode 100644
index e67afc245270380feffe98124d5bb9a6e8a057ff..0000000000000000000000000000000000000000
--- a/README.md
+++ /dev/null
@@ -1,163 +0,0 @@
-# PaddleDetection
-
-**Note:** The PaddleDetection dynamic graph version is a trial release; model coverage, model performance, documentation, usability, and compatibility are still being improved, and performance data will be published later.
-
-
-# Introduction
-
-PaddleDetection is PaddlePaddle's object detection development kit, which aims to help developers finish the whole workflow of building, training, optimizing, and deploying detection models faster and better.
-
-PaddleDetection implements a variety of mainstream object detection algorithms in a modular way, provides rich data augmentation strategies, network components (such as backbones), loss functions, and more, and integrates model compression and cross-platform high-performance deployment capabilities.
-
-Polished by long-term industrial practice, PaddleDetection offers a smooth and excellent user experience and is widely used by developers in more than ten industries, including industrial quality inspection, remote sensing image detection, unmanned patrol inspection, new retail, the Internet, and scientific research.
-
-### Overview of Kit Structure
-
-

@@ -38,22 +38,25 @@ PP-YOLO improved performance and speed of YOLOv3 with following methods:
| Model | GPU number | images/GPU | backbone | input shape | Box AP<sup>val</sup> | Box AP<sup>test</sup> | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | download | config |
|:------------------------:|:-------:|:-------------:|:----------:| :-------:| :------------------: | :-------------------: | :------------: | :---------------------: | :------: | :------: |
-| PP-YOLO | 8 | 24 | ResNet50vd | 608 | 44.8 | 45.2 | 72.9 | 155.6 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
-| PP-YOLO | 8 | 24 | ResNet50vd | 512 | 43.9 | 44.4 | 89.9 | 188.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
-| PP-YOLO | 8 | 24 | ResNet50vd | 416 | 42.1 | 42.5 | 109.1 | 215.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
-| PP-YOLO | 8 | 24 | ResNet50vd | 320 | 38.9 | 39.3 | 132.2 | 242.2 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
-| PP-YOLO_2x | 8 | 24 | ResNet50vd | 608 | 45.3 | 45.9 | 72.9 | 155.6 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
-| PP-YOLO_2x | 8 | 24 | ResNet50vd | 512 | 44.4 | 45.0 | 89.9 | 188.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
-| PP-YOLO_2x | 8 | 24 | ResNet50vd | 416 | 42.7 | 43.2 | 109.1 | 215.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
-| PP-YOLO_2x | 8 | 24 | ResNet50vd | 320 | 39.5 | 40.1 | 132.2 | 242.2 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
-| PP-YOLO_ResNet18vd | 4 | 32 | ResNet18vd | 512 | 29.2 | 29.5 | 357.1 | 657.9 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r18vd_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r18vd_coco.yml) |
-| PP-YOLO_ResNet18vd | 4 | 32 | ResNet18vd | 416 | 28.6 | 28.9 | 409.8 | 719.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r18vd_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r18vd_coco.yml) |
-| PP-YOLO_ResNet18vd | 4 | 32 | ResNet18vd | 320 | 26.2 | 26.4 | 480.7 | 763.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r18vd_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r18vd_coco.yml) |
+| PP-YOLO | 8 | 24 | ResNet50vd | 608 | 44.8 | 45.2 | 72.9 | 155.6 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
+| PP-YOLO | 8 | 24 | ResNet50vd | 512 | 43.9 | 44.4 | 89.9 | 188.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
+| PP-YOLO | 8 | 24 | ResNet50vd | 416 | 42.1 | 42.5 | 109.1 | 215.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
+| PP-YOLO | 8 | 24 | ResNet50vd | 320 | 38.9 | 39.3 | 132.2 | 242.2 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 608 | 45.3 | 45.9 | 72.9 | 155.6 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 512 | 44.4 | 45.0 | 89.9 | 188.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 416 | 42.7 | 43.2 | 109.1 | 215.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 320 | 39.5 | 40.1 | 132.2 | 242.2 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
+| PP-YOLO | 4 | 32 | ResNet18vd | 512 | 29.2 | 29.5 | 357.1 | 657.9 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r18vd_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r18vd_coco.yml) |
+| PP-YOLO | 4 | 32 | ResNet18vd | 416 | 28.6 | 28.9 | 409.8 | 719.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r18vd_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r18vd_coco.yml) |
+| PP-YOLO | 4 | 32 | ResNet18vd | 320 | 26.2 | 26.4 | 480.7 | 763.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r18vd_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r18vd_coco.yml) |
+| PP-YOLOv2 | 8 | 12 | ResNet50vd | 640 | 49.1 | 49.5 | 68.9 | 106.5 | [model](https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml) |
+| PP-YOLOv2 | 8 | 12 | ResNet101vd | 640 | 49.7 | 50.3 | 49.5 | 87.0 | [model](https://paddledet.bj.bcebos.com/models/ppyolov2_r101vd_dcn_365e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolov2_r101vd_dcn_365e_coco.yml) |
+
**Notes:**
- PP-YOLO is trained on COCO train2017 dataset and evaluated on val2017 & test-dev2017 dataset, Box AP<sup>test</sup> is the evaluation result of `mAP(IoU=0.5:0.95)`.
-- PP-YOLO used 8 GPUs for training and mini-batch size as 24 on each GPU, if GPU number and mini-batch size is changed, learning rate and iteration times should be adjusted according [FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/master/docs/FAQ.md).
+- PP-YOLO is trained with 8 GPUs and a mini-batch size of 24 per GPU; if the GPU number or mini-batch size changes, the learning rate and number of iterations should be adjusted according to the [FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.0/static/docs/FAQ.md) (a quick arithmetic sketch follows these notes).
- PP-YOLO inference speed is tested on a single Tesla V100 with batch size 1, CUDA 10.2, CUDNN 7.5.1, TensorRT 5.1.2.2 in TensorRT mode.
- PP-YOLO FP32 inference speed testing uses the inference model exported by `tools/export_model.py` and is benchmarked by running `deploy/python/infer.py` with `--run_benchmark`. All testing results do not contain the time cost of data reading and post-processing (NMS), which is the same testing method as [YOLOv4(AlexyAB)](https://github.com/AlexeyAB/darknet).
- TensorRT FP16 inference speed testing excludes the time cost of the bounding-box decoding (`yolo_box`) part compared with the FP32 testing above, which means that data reading, bounding-box decoding, and post-processing (NMS) are excluded (test method same as [YOLOv4(AlexyAB)](https://github.com/AlexeyAB/darknet) too).
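The learning-rate adjustment referenced in the FAQ amounts to the linear scaling rule. A minimal arithmetic sketch, assuming for illustration a base learning rate of 0.01 (the 8 GPUs x 24 images/GPU reference setting comes from the note above):

```bash
# Linear scaling rule (illustration only): scale base_lr by the ratio of your
# total batch size to the reference total of 8 GPUs x 24 images/GPU.
# E.g. training with 4 GPUs x 12 images/GPU instead:
python -c "print(0.01 * (4 * 12) / (8 * 24))"   # -> 0.0025
```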
@@ -62,24 +65,38 @@ PP-YOLO improved performance and speed of YOLOv3 with following methods:
| Model | GPU number | images/GPU | Model Size | input shape | Box AP<sup>val</sup> | Box AP50<sup>val</sup> | Kirin 990 1xCore(FPS) | download | config |
|:----------------------------:|:-------:|:-------------:|:----------:| :-------:| :------------------: | :--------------------: | :--------------------: | :------: | :------: |
-| PP-YOLO_MobileNetV3_large | 4 | 32 | 28MB | 320 | 23.2 | 42.6 | 14.1 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_mbv3_large_coco.yml) |
-| PP-YOLO_MobileNetV3_small | 4 | 32 | 16MB | 320 | 17.2 | 33.8 | 21.5 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_small_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_mbv3_small_coco.yml) |
+| PP-YOLO_MobileNetV3_large | 4 | 32 | 28MB | 320 | 23.2 | 42.6 | 14.1 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_mbv3_large_coco.yml) |
+| PP-YOLO_MobileNetV3_small | 4 | 32 | 16MB | 320 | 17.2 | 33.8 | 21.5 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_small_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_mbv3_small_coco.yml) |
**Notes:**
-- PP-YOLO_MobileNetV3 is trained on COCO train2017 datast and evaluated on val2017 dataset,Box AP<sup>val</sup> is evaluation results of `mAP(IoU=0.5:0.95)`, Box AP<sup>val</sup> is evaluation results of `mAP(IoU=0.5)`.
-- PP-YOLO_MobileNetV3 used 4 GPUs for training and mini-batch size as 32 on each GPU, if GPU number and mini-batch size is changed, learning rate and iteration times should be adjusted according [FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/master/docs/FAQ.md).
+- PP-YOLO_MobileNetV3 is trained on COCO train2017 dataset and evaluated on val2017 dataset, Box AP<sup>val</sup> is the evaluation result of `mAP(IoU=0.5:0.95)`, Box AP50<sup>val</sup> is the evaluation result of `mAP(IoU=0.5)`.
+- PP-YOLO_MobileNetV3 is trained with 4 GPUs and a mini-batch size of 32 per GPU; if the GPU number or mini-batch size changes, the learning rate and number of iterations should be adjusted according to the [FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.0/static/docs/FAQ.md).
- PP-YOLO_MobileNetV3 inference speed is tested on Kirin 990 with 1 thread.
+### PP-YOLO tiny
+
+| Model | GPU number | images/GPU | Model Size | Post Quant Model Size | input shape | Box AP<sup>val</sup> | Kirin 990 4xCore(FPS) | download | config | post quant model |
+|:----------------------------:|:-------:|:-------------:|:----------:| :-------------------: | :---------: | :------------------: | :-------------------: | :------: | :----: | :--------------: |
+| PP-YOLO tiny | 8 | 32 | 4.2MB | **1.3MB** | 320 | 20.6 | 92.3 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_650e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_tiny_650e_coco.yml) | [inference model](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_quant.tar) |
+| PP-YOLO tiny | 8 | 32 | 4.2MB | **1.3MB** | 416 | 22.7 | 65.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_650e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_tiny_650e_coco.yml) | [inference model](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_quant.tar) |
+
+**Notes:**
+
+- PP-YOLO-tiny is trained on COCO train2017 dataset and evaluated on val2017 dataset, Box AP<sup>val</sup> is the evaluation result of `mAP(IoU=0.5:0.95)`.
+- PP-YOLO-tiny is trained with 8 GPUs and a mini-batch size of 32 per GPU; if the GPU number or mini-batch size changes, the learning rate and number of iterations should be adjusted according to the [FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.0/static/docs/FAQ.md).
+- PP-YOLO-tiny inference speed is tested on Kirin 990 with 4 threads on ARMv8.
+- We also provide a post-quantization inference model for PP-YOLO-tiny, which compresses the model to **1.3MB** with nearly no loss in inference speed or accuracy; a hedged usage sketch follows these notes.
+
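A minimal sketch of running that post-quant model with the deployment script used later in this README; the unpacked directory name `ppyolo_tiny_quant` is an assumption, so adjust it to whatever the archive actually contains:

```bash
# Download the post-quant PP-YOLO tiny inference model and run it with the
# Paddle Inference demo script.
wget https://paddledet.bj.bcebos.com/models/ppyolo_tiny_quant.tar
tar -xf ppyolo_tiny_quant.tar
CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=ppyolo_tiny_quant --image_file=demo/000000014439_640x640.jpg --use_gpu=True
```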
### PP-YOLO on Pascal VOC
PP-YOLO models trained on the Pascal VOC dataset are as follows:
| Model | GPU number | images/GPU | backbone | input shape | Box AP50<sup>val</sup> | download | config |
|:------------------:|:----------:|:----------:|:----------:| :----------:| :--------------------: | :------: | :-----: |
-| PP-YOLO | 8 | 12 | ResNet50vd | 608 | 84.9 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml) |
-| PP-YOLO | 8 | 12 | ResNet50vd | 416 | 84.3 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml) |
-| PP-YOLO | 8 | 12 | ResNet50vd | 320 | 82.2 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml) |
+| PP-YOLO | 8 | 12 | ResNet50vd | 608 | 84.9 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml) |
+| PP-YOLO | 8 | 12 | ResNet50vd | 416 | 84.3 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml) |
+| PP-YOLO | 8 | 12 | ResNet50vd | 320 | 82.2 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml) |
## Getting Started
@@ -91,6 +108,12 @@ Training PP-YOLO on 8 GPUs with following command(all commands should be run und
python -m paddle.distributed.launch --log_dir=./ppyolo_dygraph/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml &>ppyolo_dygraph.log 2>&1 &
```
+Optional: run `tools/anchor_cluster.py` to get anchors suitable for your dataset, then modify the anchor settings in the model configuration file and reader configuration file, such as `configs/ppyolo/_base_/ppyolo_tiny.yml` and `configs/ppyolo/_base_/ppyolo_tiny_reader.yml`; the flag meanings are sketched in the comments below.
+
+``` bash
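# Assumed flag meanings (assumptions, verify with --help):
# -n: number of anchors to cluster, -s: input image size,
# -m: clustering method (v2/v5 style k-means), -i: number of iterations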
+python tools/anchor_cluster.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml -n 9 -s 320 -m v2 -i 1000
+```
+
### 2. Evaluation
Evaluate PP-YOLO on the COCO val2017 dataset on a single GPU with the following commands:
@@ -115,7 +138,27 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_test.yml -o
Evaluation results will be saved in `bbox.json`; compress it into a `zip` package and upload it to the [COCO dataset evaluation](https://competitions.codalab.org/competitions/20794#participate) server to evaluate.
-**NOTE:** `configs/ppyolo/ppyolo_test.yml` is only used for evaluation on COCO test-dev2017 dataset, could not be used for training or COCO val2017 dataset evaluating.
+**NOTE 1:** `configs/ppyolo/ppyolo_test.yml` is only used for evaluation on the COCO test-dev2017 dataset; it cannot be used for training or for COCO val2017 evaluation.
+
+**NOTE 2:** Due to the overall upgrade of the dynamic graph framework, the following weights published by PaddleDetection need the `--bias` option added during evaluation, for example:
+
+```bash
+# use weights released in PaddleDetection model zoo
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --bias
+```
+These models are:
+
+1. ppyolo_r50vd_dcn_1x_coco
+2. ppyolo_r50vd_dcn_voc
+3. ppyolo_r18vd_coco
+4. ppyolo_mbv3_large_coco
+5. ppyolo_mbv3_small_coco
+6. ppyolo_tiny_650e_coco
### 3. Inference
@@ -123,10 +166,10 @@ Inference images in single GPU with following commands, use `--infer_img` to inf
```bash
# inference single image
-CUDA_VISIBLE_DEVICES=0 python tools/infer.py configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=../demo/000000014439_640x640.jpg
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=demo/000000014439_640x640.jpg
# inference all images in the directory
-CUDA_VISIBLE_DEVICES=0 python tools/infer.py configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_dir=../demo
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_dir=demo
```
### 4. Inference deployment
@@ -138,7 +181,7 @@ For inference deployment or benchmark, model exported with `tools/export_model.p
python tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams
# inference with Paddle Inference library
-CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyolo_r50vd_dcn_1x_coco --image_file=../demo/000000014439_640x640.jpg --use_gpu=True
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyolo_r50vd_dcn_1x_coco --image_file=demo/000000014439_640x640.jpg --use_gpu=True
```
@@ -170,4 +213,24 @@ Optimizing method and ablation experiments of PP-YOLO compared with YOLOv3.
- Performance and inference speed are measured with input shape 608.
- All models are trained on the COCO train2017 dataset and evaluated on the val2017 & test-dev2017 datasets; `Box AP` is the evaluation result of `mAP(IoU=0.5:0.95)`.
- Inference speed is tested on a single Tesla V100 with batch size 1, following the test method and environment configuration in the benchmark above.
-- [YOLOv3-DarkNet53](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_darknet53_270e_coco.yml) with mAP as 39.0 is optimized YOLOv3 model in PaddleDetection,see [Model Zoo](https://github.com/PaddlePaddle/PaddleDetection/blob/master/docs/MODEL_ZOO.md) for details.
+- [YOLOv3-DarkNet53](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.0/configs/yolov3/yolov3_darknet53_270e_coco.yml) with mAP 39.0 is the optimized YOLOv3 model in PaddleDetection; see the [Model Zoo](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.0/docs/MODEL_ZOO_cn.md) for details.
+
+
+## Citation
+
+```
+@misc{long2020ppyolo,
+title={PP-YOLO: An Effective and Efficient Implementation of Object Detector},
+author={Xiang Long and Kaipeng Deng and Guanzhong Wang and Yang Zhang and Qingqing Dang and Yuan Gao and Hui Shen and Jianguo Ren and Shumin Han and Errui Ding and Shilei Wen},
+year={2020},
+eprint={2007.12099},
+archivePrefix={arXiv},
+primaryClass={cs.CV}
+}
+@misc{ppdet2019,
+title={PaddleDetection, Object detection and instance segmentation toolkit based on PaddlePaddle.},
+author={PaddlePaddle Authors},
+howpublished = {\url{https://github.com/PaddlePaddle/PaddleDetection}},
+year={2019}
+}
+```
diff --git a/configs/ppyolo/README_cn.md b/configs/ppyolo/README_cn.md
index 648d15fe40fe34144567e4a0da917c3790fb32cc..4e7c7bc73af45d412c50f4d9b19e71283c8d79d7 100644
--- a/configs/ppyolo/README_cn.md
+++ b/configs/ppyolo/README_cn.md
@@ -38,47 +38,62 @@ PP-YOLO improves the accuracy and speed of YOLOv3 in the following respects:
| Model | GPU number | images/GPU | backbone | input shape | Box AP<sup>val</sup> | Box AP<sup>test</sup> | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | download | config |
|:------------------------:|:-------:|:-------------:|:----------:| :-------:| :------------------: | :-------------------: | :------------: | :---------------------: | :------: | :------: |
-| PP-YOLO | 8 | 24 | ResNet50vd | 608 | 44.8 | 45.2 | 72.9 | 155.6 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
-| PP-YOLO | 8 | 24 | ResNet50vd | 512 | 43.9 | 44.4 | 89.9 | 188.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
-| PP-YOLO | 8 | 24 | ResNet50vd | 416 | 42.1 | 42.5 | 109.1 | 215.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
-| PP-YOLO | 8 | 24 | ResNet50vd | 320 | 38.9 | 39.3 | 132.2 | 242.2 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
-| PP-YOLO_2x | 8 | 24 | ResNet50vd | 608 | 45.3 | 45.9 | 72.9 | 155.6 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
-| PP-YOLO_2x | 8 | 24 | ResNet50vd | 512 | 44.4 | 45.0 | 89.9 | 188.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
-| PP-YOLO_2x | 8 | 24 | ResNet50vd | 416 | 42.7 | 43.2 | 109.1 | 215.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
-| PP-YOLO_2x | 8 | 24 | ResNet50vd | 320 | 39.5 | 40.1 | 132.2 | 242.2 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
-| PP-YOLO_ResNet18vd | 4 | 32 | ResNet18vd | 512 | 29.2 | 29.5 | 357.1 | 657.9 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r18vd_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r18vd_coco.yml) |
-| PP-YOLO_ResNet18vd | 4 | 32 | ResNet18vd | 416 | 28.6 | 28.9 | 409.8 | 719.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r18vd_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r18vd_coco.yml) |
-| PP-YOLO_ResNet18vd | 4 | 32 | ResNet18vd | 320 | 26.2 | 26.4 | 480.7 | 763.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r18vd_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r18vd_coco.yml) |
+| PP-YOLO | 8 | 24 | ResNet50vd | 608 | 44.8 | 45.2 | 72.9 | 155.6 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
+| PP-YOLO | 8 | 24 | ResNet50vd | 512 | 43.9 | 44.4 | 89.9 | 188.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
+| PP-YOLO | 8 | 24 | ResNet50vd | 416 | 42.1 | 42.5 | 109.1 | 215.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
+| PP-YOLO | 8 | 24 | ResNet50vd | 320 | 38.9 | 39.3 | 132.2 | 242.2 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 608 | 45.3 | 45.9 | 72.9 | 155.6 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 512 | 44.4 | 45.0 | 89.9 | 188.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 416 | 42.7 | 43.2 | 109.1 | 215.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 320 | 39.5 | 40.1 | 132.2 | 242.2 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r50vd_dcn_2x_coco.yml) |
+| PP-YOLO | 4 | 32 | ResNet18vd | 512 | 29.2 | 29.5 | 357.1 | 657.9 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r18vd_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r18vd_coco.yml) |
+| PP-YOLO | 4 | 32 | ResNet18vd | 416 | 28.6 | 28.9 | 409.8 | 719.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r18vd_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r18vd_coco.yml) |
+| PP-YOLO | 4 | 32 | ResNet18vd | 320 | 26.2 | 26.4 | 480.7 | 763.4 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r18vd_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r18vd_coco.yml) |
+| PP-YOLOv2 | 8 | 12 | ResNet50vd | 640 | 49.1 | 49.5 | 68.9 | 106.5 | [model](https://paddledet.bj.bcebos.com/models/ppyolov2_r50vd_dcn_365e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml) |
+| PP-YOLOv2 | 8 | 12 | ResNet101vd | 640 | 49.7 | 50.3 | 49.5 | 87.0 | [model](https://paddledet.bj.bcebos.com/models/ppyolov2_r101vd_dcn_365e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolov2_r101vd_dcn_365e_coco.yml) |
**Notes:**
- PP-YOLO models are trained on COCO train2017 and evaluated on val2017 and test-dev2017; Box AP<sup>test</sup> is the evaluation result of `mAP(IoU=0.5:0.95)`.
-- PP-YOLO is trained with 8 GPUs and a batch size of 24 per GPU; if you train with a different GPU number or batch size, adjust the learning rate and number of iterations according to the [FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/master/docs/FAQ.md).
+- PP-YOLO is trained with 8 GPUs and a batch size of 24 per GPU; if you train with a different GPU number or batch size, adjust the learning rate and number of iterations according to the [FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.0/static/docs/FAQ.md).
- PP-YOLO inference speed is tested on a single V100 with batch size=1, CUDA 10.2 and CUDNN 7.5.1; TensorRT speed tests use TensorRT 5.1.2.2.
- PP-YOLO FP32 inference speed is benchmarked with the Paddle Inference library, using models exported by `tools/export_model.py` and the `--run_benchmark` option of `deploy/python/infer.py`; all results exclude data preprocessing and post-processing (NMS) time (same test method as [YOLOv4(AlexyAB)](https://github.com/AlexeyAB/darknet)).
- Compared with the FP32 test above, the TensorRT FP16 speed test additionally excludes the `yolo_box` (bbox decoding) time, i.e. data preprocessing, bbox decoding and NMS are excluded (same test method as [YOLOv4(AlexyAB)](https://github.com/AlexeyAB/darknet)).
-- PP-YOLO inference speed is tested on a single V100 with batch size=1, CUDA 10.2 and CUDNN 7.5.1; TensorRT speed tests use TensorRT 5.1.2.2.
### PP-YOLO lightweight models
| Model | GPU number | images/GPU | Model Size | input shape | Box AP<sup>val</sup> | Box AP50<sup>val</sup> | Kirin 990 1xCore (FPS) | download | config |
|:----------------------------:|:-------:|:-------------:|:----------:| :-------:| :------------------: | :--------------------: | :--------------------: | :------: | :------: |
-| PP-YOLO_MobileNetV3_large | 4 | 32 | 28MB | 320 | 23.2 | 42.6 | 14.1 | [download](https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_mbv3_large_coco.yml) |
-| PP-YOLO_MobileNetV3_small | 4 | 32 | 16MB | 320 | 17.2 | 33.8 | 21.5 | [download](https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_small_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_mbv3_small_coco.yml) |
+| PP-YOLO_MobileNetV3_large | 4 | 32 | 28MB | 320 | 23.2 | 42.6 | 14.1 | [download](https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_large_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_mbv3_large_coco.yml) |
+| PP-YOLO_MobileNetV3_small | 4 | 32 | 16MB | 320 | 17.2 | 33.8 | 21.5 | [download](https://paddledet.bj.bcebos.com/models/ppyolo_mbv3_small_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_mbv3_small_coco.yml) |
- PP-YOLO_MobileNetV3 models are trained on COCO train2017 and evaluated on val2017; Box AP<sup>val</sup> is the evaluation result of `mAP(IoU=0.5:0.95)`, Box AP50<sup>val</sup> is the evaluation result of `mAP(IoU=0.5)`.
-- PP-YOLO_MobileNetV3 is trained with 4 GPUs and a batch size of 32 per GPU; if you train with a different GPU number or batch size, adjust the learning rate and number of iterations according to the [FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/master/docs/FAQ.md).
+- PP-YOLO_MobileNetV3 is trained with 4 GPUs and a batch size of 32 per GPU; if you train with a different GPU number or batch size, adjust the learning rate and number of iterations according to the [FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.0/static/docs/FAQ.md).
- PP-YOLO_MobileNetV3 inference speed is tested on a Kirin 990 chip with 1 thread.
+### PP-YOLO tiny
+
+| Model | GPU number | images/GPU | Model Size | Post Quant Model Size | input shape | Box AP<sup>val</sup> | Kirin 990 4xCore(FPS) | download | config | post quant model |
+|:---------:|:-------:|:---------:|:---------:| :-------------------: | :---------: | :------------------: | :-------------------: | :------: | :----: | :--------------: |
+| PP-YOLO tiny | 8 | 32 | 4.2MB | **1.3MB** | 320 | 20.6 | 92.3 | [download](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_650e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_tiny_650e_coco.yml) | [inference model](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_quant.tar) |
+| PP-YOLO tiny | 8 | 32 | 4.2MB | **1.3MB** | 416 | 22.7 | 65.4 | [download](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_650e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_tiny_650e_coco.yml) | [inference model](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_quant.tar) |
+
+**Notes:**
+
+- PP-YOLO-tiny is trained on COCO train2017 and evaluated on val2017; Box AP<sup>val</sup> is the evaluation result of `mAP(IoU=0.5:0.95)`.
+- PP-YOLO-tiny is trained with 8 GPUs and a batch size of 32 per GPU; if the GPU number or batch size changes, adjust the learning rate and number of iterations according to the [FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.0/static/docs/FAQ.md).
+- PP-YOLO-tiny inference speed is tested on Kirin 990 with 4 threads on ARMv8.
+- We also provide a post-quantization inference model for PP-YOLO-tiny, which compresses the model to **1.3MB** with nearly no loss in inference speed or accuracy.
+
### PP-YOLO on Pascal VOC
PP-YOLO models trained on the Pascal VOC dataset are as follows:
| Model | GPU number | images/GPU | backbone | input shape | Box AP50<sup>val</sup> | download | config |
|:------------------:|:-------:|:-------------:|:----------:| :----------:| :--------------------: | :------: | :-----: |
-| PP-YOLO | 8 | 12 | ResNet50vd | 608 | 84.9 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml) |
-| PP-YOLO | 8 | 12 | ResNet50vd | 416 | 84.3 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml) |
-| PP-YOLO | 8 | 12 | ResNet50vd | 320 | 82.2 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml) |
+| PP-YOLO | 8 | 12 | ResNet50vd | 608 | 84.9 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml) |
+| PP-YOLO | 8 | 12 | ResNet50vd | 416 | 84.3 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml) |
+| PP-YOLO | 8 | 12 | ResNet50vd | 320 | 82.2 | [model](https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml) |
## Usage
@@ -90,6 +105,11 @@ PP-YOLO models trained on the Pascal VOC dataset are as follows:
python -m paddle.distributed.launch --log_dir=./ppyolo_dygraph/ --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml &>ppyolo_dygraph.log 2>&1 &
```
+Optional: before training, run `tools/anchor_cluster.py` to get anchors suitable for your dataset, then update the anchor settings in the model configuration file and reader configuration file, such as `configs/ppyolo/_base_/ppyolo_tiny.yml` and `configs/ppyolo/_base_/ppyolo_tiny_reader.yml`.
+```bash
+python tools/anchor_cluster.py -c configs/ppyolo/ppyolo_tiny_650e_coco.yml -n 9 -s 320 -m v2 -i 1000
+```
+
### 2. Evaluation
Evaluate the model on the COCO val2017 dataset on a single GPU with the following command:
@@ -114,7 +134,27 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_test.yml -o
Evaluation results are saved in `bbox.json`; compress it into a zip package and submit it via the [COCO dataset evaluation page](https://competitions.codalab.org/competitions/20794#participate) for evaluation.
-**Note:** `configs/ppyolo/ppyolo_test.yml` is only used for evaluation on the COCO test-dev dataset; it is not used for training or for COCO val2017 evaluation.
+**Note 1:** `configs/ppyolo/ppyolo_test.yml` is only used for evaluation on the COCO test-dev dataset; it is not used for training or for COCO val2017 evaluation.
+
+**Note 2:** Due to the overall upgrade of the dynamic graph framework, the following weights published by PaddleDetection need the `--bias` option added during evaluation, for example:
+
+```bash
+# use weights released by PaddleDetection
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --bias
+```
+These are:
+
+1. ppyolo_r50vd_dcn_1x_coco
+2. ppyolo_r50vd_dcn_voc
+3. ppyolo_r18vd_coco
+4. ppyolo_mbv3_large_coco
+5. ppyolo_mbv3_small_coco
+6. ppyolo_tiny_650e_coco
### 3. Inference
@@ -122,10 +162,10 @@ CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ppyolo/ppyolo_test.yml -o
```bash
# inference on a single image
-CUDA_VISIBLE_DEVICES=0 python tools/infer.py configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=../demo/000000014439_640x640.jpg
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=demo/000000014439_640x640.jpg
# inference on all images in the directory
-CUDA_VISIBLE_DEVICES=0 python tools/infer.py configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_dir=../demo
+CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_dir=demo
```
### 4. Inference deployment
@@ -137,7 +177,7 @@ PP-YOLO deployment and inference benchmarks require models exported via `tools/export_model.py`
python tools/export_model.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyolo_r50vd_dcn_1x_coco.pdparams
# inference with the Paddle Inference library
-CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyolo_r50vd_dcn_1x_coco --image_file=../demo/000000014439_640x640.jpg --use_gpu=True
+CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyolo_r50vd_dcn_1x_coco --image_file=demo/000000014439_640x640.jpg --use_gpu=True
```
@@ -169,4 +209,23 @@ Ablation experiments of PP-YOLO optimizations over YOLOv3 are shown in the table below.
- Accuracy and inference speed are measured with input size 608.
- Box AP is the `mAP(IoU=0.5:0.95)` result of models trained on COCO train2017 and evaluated on val2017 and test-dev2017.
- Inference speed is tested on a single V100 with batch size=1 using the benchmark method above, with CUDA 10.2 and CUDNN 7.5.1.
-- [YOLOv3-DarkNet53](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_darknet53_270e_coco.yml) with accuracy 38.9 is the YOLOv3 model optimized in PaddleDetection; see the [Model Zoo](https://github.com/PaddlePaddle/PaddleDetection/blob/master/docs/MODEL_ZOO.md) for details.
+- [YOLOv3-DarkNet53](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.0/configs/yolov3/yolov3_darknet53_270e_coco.yml) with accuracy 38.9 is the YOLOv3 model optimized in PaddleDetection; see the [Model Zoo](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.0/docs/MODEL_ZOO_cn.md) for details.
+
+## Citation
+
+```
+@misc{long2020ppyolo,
+title={PP-YOLO: An Effective and Efficient Implementation of Object Detector},
+author={Xiang Long and Kaipeng Deng and Guanzhong Wang and Yang Zhang and Qingqing Dang and Yuan Gao and Hui Shen and Jianguo Ren and Shumin Han and Errui Ding and Shilei Wen},
+year={2020},
+eprint={2007.12099},
+archivePrefix={arXiv},
+primaryClass={cs.CV}
+}
+@misc{ppdet2019,
+title={PaddleDetection, Object detection and instance segmentation toolkit based on PaddlePaddle.},
+author={PaddlePaddle Authors},
+howpublished = {\url{https://github.com/PaddlePaddle/PaddleDetection}},
+year={2019}
+}
+```
diff --git a/configs/ppyolo/_base_/optimizer_365e.yml b/configs/ppyolo/_base_/optimizer_365e.yml
new file mode 100644
index 0000000000000000000000000000000000000000..d834a4ce0547a77a236964f7dc6ce52c217be2d5
--- /dev/null
+++ b/configs/ppyolo/_base_/optimizer_365e.yml
@@ -0,0 +1,21 @@
+epoch: 365
+
+LearningRate:
+ base_lr: 0.005
+ schedulers:
+ - !PiecewiseDecay
+ gamma: 0.1
+ milestones:
+ - 243
+ - !LinearWarmup
+ start_factor: 0.
+ steps: 4000
+
+OptimizerBuilder:
+ clip_grad_by_norm: 35.
+ optimizer:
+ momentum: 0.9
+ type: Momentum
+ regularizer:
+ factor: 0.0005
+ type: L2
diff --git a/configs/ppyolo/_base_/optimizer_650e.yml b/configs/ppyolo/_base_/optimizer_650e.yml
new file mode 100644
index 0000000000000000000000000000000000000000..79a1f98eacb86cf8ae8ac34ce0c1e601cce78322
--- /dev/null
+++ b/configs/ppyolo/_base_/optimizer_650e.yml
@@ -0,0 +1,22 @@
+epoch: 650
+
+LearningRate:
+ base_lr: 0.005
+ schedulers:
+ - !PiecewiseDecay
+ gamma: 0.1
+ milestones:
+ - 430
+ - 540
+ - 610
+ - !LinearWarmup
+ start_factor: 0.
+ steps: 4000
+
+OptimizerBuilder:
+ optimizer:
+ momentum: 0.9
+ type: Momentum
+ regularizer:
+ factor: 0.0005
+ type: L2
diff --git a/configs/ppyolo/_base_/ppyolo_mbv3_large.yml b/configs/ppyolo/_base_/ppyolo_mbv3_large.yml
index dc84ae75c5e3e2a94380e59c02b55934a1f51661..0faaa9a9a3bb1d94abe183ed385558852d0fbc20 100644
--- a/configs/ppyolo/_base_/ppyolo_mbv3_large.yml
+++ b/configs/ppyolo/_base_/ppyolo_mbv3_large.yml
@@ -18,7 +18,7 @@ MobileNetV3:
feature_maps: [13, 16]
PPYOLOFPN:
- feat_channels: [160, 368]
+ in_channels: [160, 368]
coord_conv: true
conv_block_num: 0
spp: true
diff --git a/configs/ppyolo/_base_/ppyolo_mbv3_small.yml b/configs/ppyolo/_base_/ppyolo_mbv3_small.yml
index 7e3a30c9a0dc6ebe29751e8dd0562e1f0fc78dec..dda938298f2c1b65652405b808c6df14ed049c77 100644
--- a/configs/ppyolo/_base_/ppyolo_mbv3_small.yml
+++ b/configs/ppyolo/_base_/ppyolo_mbv3_small.yml
@@ -18,7 +18,7 @@ MobileNetV3:
feature_maps: [9, 12]
PPYOLOFPN:
- feat_channels: [96, 304]
+ in_channels: [96, 304]
coord_conv: true
conv_block_num: 0
spp: true
diff --git a/configs/ppyolo/_base_/ppyolo_r18vd.yml b/configs/ppyolo/_base_/ppyolo_r18vd.yml
index 4b9e924be1d11a098e0b73a4592347bb1a8c2bbe..56a34838574f277b4b43dd536449ee39b7c4e0c1 100644
--- a/configs/ppyolo/_base_/ppyolo_r18vd.yml
+++ b/configs/ppyolo/_base_/ppyolo_r18vd.yml
@@ -19,7 +19,6 @@ ResNet:
norm_decay: 0.
PPYOLOFPN:
- feat_channels: [512, 512]
drop_block: true
block_size: 3
keep_prob: 0.9
diff --git a/configs/ppyolo/_base_/ppyolo_tiny.yml b/configs/ppyolo/_base_/ppyolo_tiny.yml
new file mode 100644
index 0000000000000000000000000000000000000000..d03e2bb86a494d07b785ede5bf93db7886fe40cc
--- /dev/null
+++ b/configs/ppyolo/_base_/ppyolo_tiny.yml
@@ -0,0 +1,55 @@
+architecture: YOLOv3
+pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV3_large_x0_5_pretrained.pdparams
+norm_type: sync_bn
+use_ema: true
+ema_decay: 0.9998
+
+YOLOv3:
+ backbone: MobileNetV3
+ neck: PPYOLOTinyFPN
+ yolo_head: YOLOv3Head
+ post_process: BBoxPostProcess
+
+MobileNetV3:
+ model_name: large
+ scale: .5
+ with_extra_blocks: false
+ extra_block_filters: []
+ feature_maps: [7, 13, 16]
+
+PPYOLOTinyFPN:
+ detection_block_channels: [160, 128, 96]
+ spp: true
+ drop_block: true
+
+YOLOv3Head:
+ anchors: [[10, 15], [24, 36], [72, 42],
+ [35, 87], [102, 96], [60, 170],
+ [220, 125], [128, 222], [264, 266]]
+ anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
+ loss: YOLOv3Loss
+
+YOLOv3Loss:
+ ignore_thresh: 0.5
+ downsample: [32, 16, 8]
+ label_smooth: false
+ scale_x_y: 1.05
+ iou_loss: IouLoss
+
+IouLoss:
+ loss_weight: 2.5
+ loss_square: true
+
+BBoxPostProcess:
+ decode:
+ name: YOLOBox
+ conf_thresh: 0.005
+ downsample_ratio: 32
+ clip_bbox: true
+ scale_x_y: 1.05
+ nms:
+ name: MultiClassNMS
+ keep_top_k: 100
+ nms_threshold: 0.45
+ nms_top_k: 1000
+ score_threshold: 0.005
diff --git a/configs/ppyolo/_base_/ppyolo_tiny_reader.yml b/configs/ppyolo/_base_/ppyolo_tiny_reader.yml
new file mode 100644
index 0000000000000000000000000000000000000000..4cbc090c9baeea55af16237867783d84ff63751f
--- /dev/null
+++ b/configs/ppyolo/_base_/ppyolo_tiny_reader.yml
@@ -0,0 +1,43 @@
+worker_num: 4
+TrainReader:
+ inputs_def:
+ num_max_boxes: 100
+ sample_transforms:
+ - Decode: {}
+ - Mixup: {alpha: 1.5, beta: 1.5}
+ - RandomDistort: {}
+ - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
+ - RandomCrop: {}
+ - RandomFlip: {}
+ batch_transforms:
+ - BatchRandomResize: {target_size: [192, 224, 256, 288, 320, 352, 384, 416, 448, 480, 512], random_size: True, random_interp: True, keep_ratio: False}
+ - NormalizeBox: {}
+ - PadBox: {num_max_boxes: 100}
+ - BboxXYXY2XYWH: {}
+ - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+ - Permute: {}
+ - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 15], [24, 36], [72, 42], [35, 87], [102, 96], [60, 170], [220, 125], [128, 222], [264, 266]], downsample_ratios: [32, 16, 8]}
+ batch_size: 32
+ shuffle: true
+ drop_last: true
+ mixup_epoch: 500
+ use_shared_memory: true
+
+EvalReader:
+ sample_transforms:
+ - Decode: {}
+ - Resize: {target_size: [320, 320], keep_ratio: False, interp: 2}
+ - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+ - Permute: {}
+ batch_size: 8
+ drop_empty: false
+
+TestReader:
+ inputs_def:
+ image_shape: [3, 320, 320]
+ sample_transforms:
+ - Decode: {}
+ - Resize: {target_size: [320, 320], keep_ratio: False, interp: 2}
+ - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+ - Permute: {}
+ batch_size: 1
diff --git a/configs/ppyolo/_base_/ppyolov2_r50vd_dcn.yml b/configs/ppyolo/_base_/ppyolov2_r50vd_dcn.yml
new file mode 100644
index 0000000000000000000000000000000000000000..6288adeed8a4b057261f98132456f71b724fc45d
--- /dev/null
+++ b/configs/ppyolo/_base_/ppyolov2_r50vd_dcn.yml
@@ -0,0 +1,65 @@
+architecture: YOLOv3
+pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams
+norm_type: sync_bn
+use_ema: true
+ema_decay: 0.9998
+
+YOLOv3:
+ backbone: ResNet
+ neck: PPYOLOPAN
+ yolo_head: YOLOv3Head
+ post_process: BBoxPostProcess
+
+ResNet:
+ depth: 50
+ variant: d
+ return_idx: [1, 2, 3]
+ dcn_v2_stages: [3]
+ freeze_at: -1
+ freeze_norm: false
+ norm_decay: 0.
+
+PPYOLOPAN:
+ drop_block: true
+ block_size: 3
+ keep_prob: 0.9
+ spp: true
+
+YOLOv3Head:
+ anchors: [[10, 13], [16, 30], [33, 23],
+ [30, 61], [62, 45], [59, 119],
+ [116, 90], [156, 198], [373, 326]]
+ anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
+ loss: YOLOv3Loss
+ iou_aware: true
+ iou_aware_factor: 0.5
+
+YOLOv3Loss:
+ ignore_thresh: 0.7
+ downsample: [32, 16, 8]
+ label_smooth: false
+ scale_x_y: 1.05
+ iou_loss: IouLoss
+ iou_aware_loss: IouAwareLoss
+
+IouLoss:
+ loss_weight: 2.5
+ loss_square: true
+
+IouAwareLoss:
+ loss_weight: 1.0
+
+BBoxPostProcess:
+ decode:
+ name: YOLOBox
+ conf_thresh: 0.01
+ downsample_ratio: 32
+ clip_bbox: true
+ scale_x_y: 1.05
+ nms:
+ name: MatrixNMS
+ keep_top_k: 100
+ score_threshold: 0.01
+ post_threshold: 0.01
+ nms_top_k: -1
+ background_label: -1
diff --git a/configs/ppyolo/_base_/ppyolov2_reader.yml b/configs/ppyolo/_base_/ppyolov2_reader.yml
new file mode 100644
index 0000000000000000000000000000000000000000..7472531315d13425d082ad571d3015b2a08faebb
--- /dev/null
+++ b/configs/ppyolo/_base_/ppyolov2_reader.yml
@@ -0,0 +1,43 @@
+worker_num: 8
+TrainReader:
+ inputs_def:
+ num_max_boxes: 100
+ sample_transforms:
+ - Decode: {}
+ - Mixup: {alpha: 1.5, beta: 1.5}
+ - RandomDistort: {}
+ - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
+ - RandomCrop: {}
+ - RandomFlip: {}
+ batch_transforms:
+ - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False}
+ - NormalizeBox: {}
+ - PadBox: {num_max_boxes: 100}
+ - BboxXYXY2XYWH: {}
+ - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+ - Permute: {}
+ - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]}
+ batch_size: 12
+ shuffle: true
+ drop_last: true
+ mixup_epoch: 25000
+ use_shared_memory: true
+
+EvalReader:
+ sample_transforms:
+ - Decode: {}
+ - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
+ - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+ - Permute: {}
+ batch_size: 8
+ drop_empty: false
+
+TestReader:
+ inputs_def:
+ image_shape: [3, 640, 640]
+ sample_transforms:
+ - Decode: {}
+ - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
+ - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+ - Permute: {}
+ batch_size: 1
diff --git a/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml b/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml
index 4b2bcc49248bf28ec79bc07d6535cbb45b869bc1..eac22ce85c026d835aafa26b9886bbcdbdb4a1e9 100644
--- a/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml
+++ b/configs/ppyolo/ppyolo_r50vd_dcn_voc.yml
@@ -10,17 +10,13 @@ snapshot_epoch: 83
weights: output/ppyolo_r50vd_dcn_voc/model_final
TrainReader:
- batch_transforms:
- - BatchRandomResizeOp: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False}
- - NormalizeBoxOp: {}
- - PadBoxOp: {num_max_boxes: 50}
- - BboxXYXY2XYWHOp: {}
- - NormalizeImageOp: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
- - PermuteOp: {}
- - Gt2YoloTargetOp: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8], num_classes: 20}
mixup_epoch: 350
batch_size: 12
+EvalReader:
+ batch_transforms:
+ - PadBatch: {pad_gt: True}
+
epoch: 583
LearningRate:
diff --git a/configs/ppyolo/ppyolo_tiny_650e_coco.yml b/configs/ppyolo/ppyolo_tiny_650e_coco.yml
new file mode 100644
index 0000000000000000000000000000000000000000..288a0eba8063864877762dfecf9b22373121fe2a
--- /dev/null
+++ b/configs/ppyolo/ppyolo_tiny_650e_coco.yml
@@ -0,0 +1,10 @@
+_BASE_: [
+ '../datasets/coco_detection.yml',
+ '../runtime.yml',
+ './_base_/ppyolo_tiny.yml',
+ './_base_/optimizer_650e.yml',
+ './_base_/ppyolo_tiny_reader.yml',
+]
+
+snapshot_epoch: 1
+weights: output/ppyolo_tiny_650e_coco/model_final
diff --git a/configs/ppyolo/ppyolov2_r101vd_dcn_365e_coco.yml b/configs/ppyolo/ppyolov2_r101vd_dcn_365e_coco.yml
new file mode 100644
index 0000000000000000000000000000000000000000..0f1aee746e4fd58ed060c83213c3306aea57e83e
--- /dev/null
+++ b/configs/ppyolo/ppyolov2_r101vd_dcn_365e_coco.yml
@@ -0,0 +1,20 @@
+_BASE_: [
+ '../datasets/coco_detection.yml',
+ '../runtime.yml',
+ './_base_/ppyolov2_r50vd_dcn.yml',
+ './_base_/optimizer_365e.yml',
+ './_base_/ppyolov2_reader.yml',
+]
+
+snapshot_epoch: 8
+weights: output/ppyolov2_r101vd_dcn_365e_coco/model_final
+pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet101_vd_ssld_pretrained.pdparams
+
+ResNet:
+ depth: 101
+ variant: d
+ return_idx: [1, 2, 3]
+ dcn_v2_stages: [3]
+ freeze_at: -1
+ freeze_norm: false
+ norm_decay: 0.
diff --git a/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml b/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml
new file mode 100644
index 0000000000000000000000000000000000000000..a5e1bc33560f882594156a6deb03798ea5553e7f
--- /dev/null
+++ b/configs/ppyolo/ppyolov2_r50vd_dcn_365e_coco.yml
@@ -0,0 +1,10 @@
+_BASE_: [
+ '../datasets/coco_detection.yml',
+ '../runtime.yml',
+ './_base_/ppyolov2_r50vd_dcn.yml',
+ './_base_/optimizer_365e.yml',
+ './_base_/ppyolov2_reader.yml',
+]
+
+snapshot_epoch: 8
+weights: output/ppyolov2_r50vd_dcn_365e_coco/model_final
diff --git a/configs/rcnn_enhance/README.md b/configs/rcnn_enhance/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..4a53da5b8606aed841398162cd27b9c60c641029
--- /dev/null
+++ b/configs/rcnn_enhance/README.md
@@ -0,0 +1,12 @@
+## Practical Server-Side Object Detection
+
+### Introduction
+
+* Object detection in images has drawn wide attention from both academia and industry in recent years. Building on the ResNet50_vd pretrained model trained with the SSLD distillation scheme in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) (82.39% Top-1 accuracy on the ImageNet-1k validation set) and the rich operators in PaddleDetection, PaddlePaddle provides PSS-DET (Practical Server Side Detection), a practical server-side object detection solution. On the COCO 2017 detection dataset, it reaches 41.2% COCO mAP at a single-V100 inference speed of 61 FPS.
+
+
+### Model Zoo
+
+| Backbone | Type | images/GPU | Lr schd | Inf time (fps) | Box AP | Mask AP | download | config |
+| :---------------------- | :-------------: | :-------: | :-----: | :------------: | :----: | :-----: | :-------------: | :-----: |
+| ResNet50-vd-FPN-Dcnv2 | Faster | 2 | 3x | 61.425 | 41.5 | - | [download](https://paddledet.bj.bcebos.com/models/faster_rcnn_enhance_3x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/rcnn_enhance/faster_rcnn_enhance_3x_coco.yml) |
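A minimal evaluation sketch for the released weights above, reusing the `tools/eval.py` pattern from the PP-YOLO README; the weights URL and config path are taken from the table, everything else follows that pattern:

```bash
# Evaluate the PSS-DET Faster R-CNN model on COCO val2017
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/rcnn_enhance/faster_rcnn_enhance_3x_coco.yml \
    -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_enhance_3x_coco.pdparams
```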
diff --git a/configs/rcnn_enhance/_base_/faster_rcnn_enhance.yml b/configs/rcnn_enhance/_base_/faster_rcnn_enhance.yml
new file mode 100644
index 0000000000000000000000000000000000000000..d47fd2c98ce28ab3e75f56e981a2be70326a8bbd
--- /dev/null
+++ b/configs/rcnn_enhance/_base_/faster_rcnn_enhance.yml
@@ -0,0 +1,81 @@
+architecture: FasterRCNN
+pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_v2_pretrained.pdparams
+
+FasterRCNN:
+ backbone: ResNet
+ neck: FPN
+ rpn_head: RPNHead
+ bbox_head: BBoxHead
+ # post process
+ bbox_post_process: BBoxPostProcess
+
+
+ResNet:
+ # index 0 stands for res2
+ depth: 50
+ norm_type: bn
+ variant: d
+ freeze_at: 0
+ return_idx: [0,1,2,3]
+ num_stages: 4
+ dcn_v2_stages: [1,2,3]
+ lr_mult_list: [0.05, 0.05, 0.1, 0.15]
+
+FPN:
+ in_channels: [256, 512, 1024, 2048]
+ out_channel: 64
+
+RPNHead:
+ anchor_generator:
+ aspect_ratios: [0.5, 1.0, 2.0]
+ anchor_sizes: [[32], [64], [128], [256], [512]]
+ strides: [4, 8, 16, 32, 64]
+ rpn_target_assign:
+ batch_size_per_im: 256
+ fg_fraction: 0.5
+ negative_overlap: 0.3
+ positive_overlap: 0.7
+ use_random: True
+ train_proposal:
+ min_size: 0.0
+ nms_thresh: 0.7
+ pre_nms_top_n: 2000
+ post_nms_top_n: 2000
+ topk_after_collect: True
+ test_proposal:
+ min_size: 0.0
+ nms_thresh: 0.7
+ pre_nms_top_n: 500
+ post_nms_top_n: 300
+
+
+BBoxHead:
+ head: TwoFCHead
+ roi_extractor:
+ resolution: 7
+ sampling_ratio: 0
+ aligned: True
+ bbox_assigner: BBoxLibraAssigner
+ bbox_loss: DIouLoss
+
+TwoFCHead:
+ out_channel: 1024
+
+BBoxLibraAssigner:
+ batch_size_per_im: 512
+ bg_thresh: 0.5
+ fg_thresh: 0.5
+ fg_fraction: 0.25
+ use_random: True
+
+DIouLoss:
+ loss_weight: 10.0
+ use_complete_iou_loss: true
+
+BBoxPostProcess:
+ decode: RCNNBox
+ nms:
+ name: MultiClassNMS
+ keep_top_k: 100
+ score_threshold: 0.05
+ nms_threshold: 0.5
diff --git a/configs/rcnn_enhance/_base_/faster_rcnn_enhance_reader.yml b/configs/rcnn_enhance/_base_/faster_rcnn_enhance_reader.yml
new file mode 100644
index 0000000000000000000000000000000000000000..da6ce65daa61b8c5ae8dc0af9f47b547dc5bc53e
--- /dev/null
+++ b/configs/rcnn_enhance/_base_/faster_rcnn_enhance_reader.yml
@@ -0,0 +1,41 @@
+worker_num: 2
+TrainReader:
+ sample_transforms:
+ - Decode: {}
+ - RandomResize: {target_size: [[384,1000], [416,1000], [448,1000], [480,1000], [512,1000], [544,1000], [576,1000], [608,1000], [640,1000], [672,1000]], interp: 2, keep_ratio: True}
+ - RandomFlip: {prob: 0.5}
+ - AutoAugment: {autoaug_type: v1}
+ - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+ - Permute: {}
+ batch_transforms:
+ - PadBatch: {pad_to_stride: 32, pad_gt: true}
+ batch_size: 2
+ shuffle: true
+ drop_last: true
+
+
+EvalReader:
+ sample_transforms:
+ - Decode: {}
+ - Resize: {interp: 2, target_size: [640, 640], keep_ratio: True}
+ - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+ - Permute: {}
+ batch_transforms:
+ - PadBatch: {pad_to_stride: 32, pad_gt: false}
+ batch_size: 1
+ shuffle: false
+ drop_last: false
+ drop_empty: false
+
+
+TestReader:
+ sample_transforms:
+ - Decode: {}
+ - Resize: {interp: 2, target_size: [640, 640], keep_ratio: True}
+ - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+ - Permute: {}
+ batch_transforms:
+ - PadBatch: {pad_to_stride: 32, pad_gt: false}
+ batch_size: 1
+ shuffle: false
+ drop_last: false
diff --git a/configs/rcnn_enhance/_base_/optimizer_3x.yml b/configs/rcnn_enhance/_base_/optimizer_3x.yml
new file mode 100644
index 0000000000000000000000000000000000000000..8bd85fae359c552952bdfc7cec4cbb5ff1198e85
--- /dev/null
+++ b/configs/rcnn_enhance/_base_/optimizer_3x.yml
@@ -0,0 +1,19 @@
+epoch: 36
+
+LearningRate:
+ base_lr: 0.02
+ schedulers:
+ - !PiecewiseDecay
+ gamma: 0.1
+ milestones: [24, 33]
+ - !LinearWarmup
+ start_factor: 0.
+ steps: 1000
+
+OptimizerBuilder:
+ optimizer:
+ momentum: 0.9
+ type: Momentum
+ regularizer:
+ factor: 0.0001
+ type: L2
diff --git a/configs/rcnn_enhance/faster_rcnn_enhance_3x_coco.yml b/configs/rcnn_enhance/faster_rcnn_enhance_3x_coco.yml
new file mode 100644
index 0000000000000000000000000000000000000000..a49f245f22dbcaf80cb9a8ca382c35f549858b18
--- /dev/null
+++ b/configs/rcnn_enhance/faster_rcnn_enhance_3x_coco.yml
@@ -0,0 +1,8 @@
+_BASE_: [
+ '../datasets/coco_detection.yml',
+ '../runtime.yml',
+ '_base_/optimizer_3x.yml',
+ '_base_/faster_rcnn_enhance.yml',
+ '_base_/faster_rcnn_enhance_reader.yml',
+]
+weights: output/faster_rcnn_enhance_r50_3x_coco/model_final
diff --git a/configs/slim/README.md b/configs/slim/README.md
index 728c2f3760ffb7355e0ac3627ff3cfd193bd2b82..8a07b08529aa8cdae64f53f1d25f1db25c71dd04 100755
--- a/configs/slim/README.md
+++ b/configs/slim/README.md
@@ -4,41 +4,33 @@
- [Pruning](prune)
- [Quantization](quant)
+- [Distillation](distill)
+- [Joint strategies](extensions)
We recommend compressing detection models with combined pruning and distillation training, or with pruning plus quantization. Below, YOLOv3 is used as the example for the pruning, distillation, and quantization experiments.
-## Benchmark
-
-### Pruning
-
-#### Benchmark on Pascal VOC
-
-| Model | Compression strategy | GFLOPs | Model size (MB) | Input size | Latency (SD855) | Box AP | Download | Model config | Slim config |
-| :----------------| :-------: | :------------: | :-------------: | :------: | :--------: | :------: | :-----------------------------------------------------: |:-------------: | :------: |
-| YOLOv3-MobileNetV1 | baseline | 24.13 | 93 | 608 | 289.9ms | 75.1 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml) | - |
-| YOLOv3-MobileNetV1 | Pruning-l1_norm (sensitivity) | 15.78(-34.49%) | 66(-29%) | 608 | - | 77.6(+2.5) | [download](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_voc_prune_l1_norm.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml) | [slim config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/slim/prune/yolov3_prune_l1_norm.yml) |
-
-### Quantization
-
-#### Benchmark on COCO
-
-| Model | Compression strategy | Input size | Box AP | Download | Model config | Slim config |
-| ------------------ | ------------ | -------- | :---------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
-| YOLOv3-MobileNetV1 | baseline | 608 | 28.8 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | - |
-| YOLOv3-MobileNetV1 | Online quantization | 608 | 27.5 (-1.3) | [download](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_coco_qat.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | [slim config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/slim/quant/yolov3_mobilenet_v1_qat.yml) |
-| YOLOv3-MobileNetV3 | baseline | 608 | 31.4 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_mobilenet_v3_large_270e_coco.yml) | - |
-| YOLOv3-MobileNetV3 | PACT online quantization | 608 | 29.0 (-2.4) | [download](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v3_coco_qat.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_mobilenet_v3_large_270e_coco.yml) | [slim config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/slim/quant/yolov3_mobilenet_v3_qat.yml) |
-
-- The SD855 latency is measured with PaddleLite deployment on an arm8 architecture using 4 threads.
-
## Experiment environment
- Python 3.7+
-- PaddlePaddle >= 2.0.0
+- PaddlePaddle >= 2.0.1
- PaddleSlim >= 2.0.0
- CUDA 9.0+
- cuDNN >= 7.5
+**Note:** Quantization-aware training requires the Paddle develop branch; download and install a suitable PaddlePaddle build from the [PaddlePaddle daily packages](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#whl-dev).
+
+#### Install PaddleSlim
+- Option 1: install directly:
+```
+pip install paddleslim -i https://pypi.tuna.tsinghua.edu.cn/simple
+```
+- Option 2: build and install from source:
+```
+git clone https://github.com/PaddlePaddle/PaddleSlim.git
+cd PaddleSlim
+python setup.py install
+```
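+
+To verify the installation, print both framework versions (a quick sanity check, assuming both packages expose `__version__`):
+```bash
+python -c "import paddle; print(paddle.__version__)"
+python -c "import paddleslim; print(paddleslim.__version__)"
+```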
+
## Quick start
### Training
@@ -84,3 +76,56 @@ python tools/export_model.py -c configs/{MODEL.yml} --slim_config configs/slim/{
- `-c`: specifies the model config file.
- `--slim_config`: specifies the compression-strategy config file.
- `-o weights`: specifies the path of the weights trained with the compression algorithm (see the example below).
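+
+For instance, exporting the quantized YOLOv3-MobileNetV1 model from the benchmark below might look like the following sketch (the `weights` path is an assumption; point it at whatever your compression run actually produced):
+```bash
+python tools/export_model.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml \
+    --slim_config configs/slim/quant/yolov3_mobilenet_v1_qat.yml \
+    -o weights=output/yolov3_mobilenet_v1_qat/model_final
+```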
+
+
+## Benchmark
+
+### Pruning
+
+#### Benchmark on Pascal VOC
+
+| Model | Compression strategy | GFLOPs | Model size (MB) | Input size | Latency (SD855) | Box AP | Download | Model config | Slim config |
+| :----------------| :-------: | :------------: | :-------------: | :------: | :--------: | :------: | :-----------------------------------------------------: |:-------------: | :------: |
+| YOLOv3-MobileNetV1 | baseline | 24.13 | 93 | 608 | 289.9ms | 75.1 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml) | - |
+| YOLOv3-MobileNetV1 | Pruning-l1_norm (sensitivity) | 15.78(-34.49%) | 66(-29%) | 608 | - | 78.4(+3.3) | [download](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_voc_prune_l1_norm.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml) | [slim config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/slim/prune/yolov3_prune_l1_norm.yml) |
+
+- Pruning currently supports the YOLO series, SSD, TTFNet, and BlazeFace; support for the remaining models is under development.
+- The SD855 latency is measured with PaddleLite deployment on an arm8 architecture using 4 threads.
+
+### Quantization
+
+#### Benchmark on COCO
+
+| Model | Compression strategy | Input size | Box AP | Download | Model config | Slim config |
+| ------------------ | ------------ | -------- | :---------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
+| YOLOv3-MobileNetV1 | baseline | 608 | 28.8 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | - |
+| YOLOv3-MobileNetV1 | Online quantization | 608 | 30.5 (+1.7) | [download](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_coco_qat.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | [slim config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/slim/quant/yolov3_mobilenet_v1_qat.yml) |
+| YOLOv3-MobileNetV3 | baseline | 608 | 31.4 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v3_large_270e_coco.yml) | - |
+| YOLOv3-MobileNetV3 | PACT online quantization | 608 | 29.1 (-2.3) | [download](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v3_coco_qat.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v3_large_270e_coco.yml) | [slim config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/slim/quant/yolov3_mobilenet_v3_qat.yml) |
+| YOLOv3-DarkNet53 | baseline | 608 | 39.0 | [download](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_darknet53_270e_coco.yml) | - |
+| YOLOv3-DarkNet53 | Online quantization | 608 | 38.8 (-0.2) | [download](https://paddledet.bj.bcebos.com/models/slim/yolov3_darknet_coco_qat.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_darknet53_270e_coco.yml) | [slim config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/slim/quant/yolov3_darknet_qat.yml) |
+| SSD-MobileNet_v1 | baseline | 300 | 73.8 | [download](https://paddledet.bj.bcebos.com/models/ssd_mobilenet_v1_300_120e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml) | - |
+| SSD-MobileNet_v1 | Online quantization | 300 | 72.9(-0.9) | [download](https://paddledet.bj.bcebos.com/models/slim/ssd_mobilenet_v1_300_voc_qat.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml) | [slim config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/slim/quant/ssd_mobilenet_v1_qat.yml) |
+| Mask-ResNet50-FPN | baseline | (800, 1333) | 39.2/35.6 | [download](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml) | - |
+| Mask-ResNet50-FPN | Online quantization | (800, 1333) | 39.7(+0.5)/35.9(+0.3) | [download](https://paddledet.bj.bcebos.com/models/slim/mask_rcnn_r50_fpn_1x_qat.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml) | [slim config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/slim/quant/mask_rcnn_r50_fpn_1x_qat.yml) |
+
+
+### Distillation
+
+#### Benchmark on COCO
+
+| Model | Compression strategy | Input size | Box AP | Download | Model config | Slim config |
+| ------------------ | ------------ | -------- | :---------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
+| YOLOv3-MobileNetV1 | baseline | 608 | 29.4 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | - |
+| YOLOv3-MobileNetV1 | Distillation | 608 | 31.0(+1.6) | [download](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_coco_distill.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | [slim config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/slim/distill/yolov3_mobilenet_v1_coco_distill.yml) |
+
+- For details of the distillation method, see the [distillation docs](distill/README.md)
+
+### Joint distillation and pruning
+
+#### Benchmark on COCO
+
+| Model | Compression strategy | Input size | GFLOPs | Model size (MB) | Box AP | Download | Model config | Slim config |
+| ------------------ | ------------ | -------- | :---------: |:---------: | :---------: |:----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
+| YOLOv3-MobileNetV1 | baseline | 608 | 24.65 | 94.6 | 29.4 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | - |
+| YOLOv3-MobileNetV1 | Distillation + pruning | 608 | 7.54(-69.4%) | 32.0(-66.0%) | 28.4(-1.0) | [download](https://paddledet.bj.bcebos.com/models/slim/yolov3_mobilenet_v1_coco_distill_prune.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) | [slim config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/slim/extensions/yolov3_mobilenet_v1_coco_distill_prune.yml) |
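+
+To reproduce a row, pass the listed model config via `-c` and the listed slim config via `--slim_config`; for example, for the l1_norm pruning entry in the Pascal VOC table above (a sketch; see the Quick start section for the full set of flags):
+```bash
+python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml \
+    --slim_config configs/slim/prune/yolov3_prune_l1_norm.yml
+```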
diff --git a/configs/slim/distill/README.md b/configs/slim/distill/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..da5795764cec02ea384f8e063f918b56b4f2b9bb
--- /dev/null
+++ b/configs/slim/distill/README.md
@@ -0,0 +1,18 @@
+# Distillation
+
+## YOLOv3 model distillation
+Taking YOLOv3-MobileNetV1 as an example, we use YOLOv3-ResNet34 as the teacher network and distill the student network with the YOLOv3-MobileNetV1 structure.
+As a detection training target, the COCO dataset is considerably harder, which means the teacher network predicts many more background bboxes; directly using the teacher's predictions as the student's `soft label` would therefore suffer from severe class imbalance. Solving this requires a dedicated method; for the detailed background, see the paper [Object detection at 200 Frames Per Second](https://arxiv.org/abs/1805.06361).
+To decide what to distill, we first locate the `x,y,w,h,cls,objness` tensors produced by the student and teacher networks, then use the teacher's results to guide the student's training. See the [code](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/ppdet/slim/distill.py) for the implementation.
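+
+A distillation run then pairs the student's model config with this slim config. The sketch below assumes tools/train.py accepts the same `-c`/`--slim_config` flags shown for training and export in the top-level slim README:
+```bash
+CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml \
+    --slim_config configs/slim/distill/yolov3_mobilenet_v1_coco_distill.yml
+```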
+
+## Citations
+```
+@article{mehta2018object,
+ title={Object detection at 200 Frames Per Second},
+ author={Rakesh Mehta and Cemalettin Ozturk},
+ year={2018},
+ eprint={1805.06361},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV}
+}
+```
diff --git a/configs/slim/distill/yolov3_mobilenet_v1_coco_distill.yml b/configs/slim/distill/yolov3_mobilenet_v1_coco_distill.yml
new file mode 100644
index 0000000000000000000000000000000000000000..9998dec5620adac38fd8a487f7ad1ec6aeb055dd
--- /dev/null
+++ b/configs/slim/distill/yolov3_mobilenet_v1_coco_distill.yml
@@ -0,0 +1,12 @@
+_BASE_: [
+ '../../yolov3/yolov3_r34_270e_coco.yml',
+]
+
+pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_r34_270e_coco.pdparams
+
+
+slim: Distill
+distill_loss: DistillYOLOv3Loss
+
+DistillYOLOv3Loss:
+ weight: 1000
diff --git a/configs/slim/extensions/yolov3_mobilenet_v1_coco_distill_prune.yml b/configs/slim/extensions/yolov3_mobilenet_v1_coco_distill_prune.yml
new file mode 100644
index 0000000000000000000000000000000000000000..f86fac5e9ed0f291c5b3f9b6266ac5755807422c
--- /dev/null
+++ b/configs/slim/extensions/yolov3_mobilenet_v1_coco_distill_prune.yml
@@ -0,0 +1,24 @@
+_BASE_: [
+ '../../yolov3/yolov3_r34_270e_coco.yml',
+]
+
+pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_r34_270e_coco.pdparams
+
+slim: DistillPrune
+
+distill_loss: DistillYOLOv3Loss
+
+DistillYOLOv3Loss:
+ weight: 1000
+
+pruner: Pruner
+
+Pruner:
+ criterion: l1_norm
+ pruned_params: ['conv2d_27.w_0', 'conv2d_28.w_0', 'conv2d_29.w_0',
+ 'conv2d_30.w_0', 'conv2d_31.w_0', 'conv2d_32.w_0',
+ 'conv2d_34.w_0', 'conv2d_35.w_0', 'conv2d_36.w_0',
+ 'conv2d_37.w_0', 'conv2d_38.w_0', 'conv2d_39.w_0',
+ 'conv2d_41.w_0', 'conv2d_42.w_0', 'conv2d_43.w_0',
+ 'conv2d_44.w_0', 'conv2d_45.w_0', 'conv2d_46.w_0']
+ pruned_ratios: [0.5,0.5,0.5,0.5,0.5,0.5,0.7,0.7,0.7,0.7,0.7,0.7,0.8,0.8,0.8,0.8,0.8,0.8]
diff --git a/configs/slim/prune/yolov3_prune_fpgm.yml b/configs/slim/prune/yolov3_prune_fpgm.yml
index ed9495a73e4fbacbe20bbeb3093f2a7a406ea9e6..f3745386823a45a970d077d3201baffa3665490b 100644
--- a/configs/slim/prune/yolov3_prune_fpgm.yml
+++ b/configs/slim/prune/yolov3_prune_fpgm.yml
@@ -1,15 +1,14 @@
# Weights of yolov3_mobilenet_v1_voc
pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_voc.pdparams
-weight_type: resume
slim: Pruner
Pruner:
criterion: fpgm
- pruned_params: ['yolo_block.0.0.0.conv.weights', 'yolo_block.0.0.1.conv.weights', 'yolo_block.0.1.0.conv.weights',
- 'yolo_block.0.1.1.conv.weights', 'yolo_block.0.2.conv.weights', 'yolo_block.0.tip.conv.weights',
- 'yolo_block.1.0.0.conv.weights', 'yolo_block.1.0.1.conv.weights', 'yolo_block.1.1.0.conv.weights',
- 'yolo_block.1.1.1.conv.weights', 'yolo_block.1.2.conv.weights', 'yolo_block.1.tip.conv.weights',
- 'yolo_block.2.0.0.conv.weights', 'yolo_block.2.0.1.conv.weights', 'yolo_block.2.1.0.conv.weights',
- 'yolo_block.2.1.1.conv.weights', 'yolo_block.2.2.conv.weights', 'yolo_block.2.tip.conv.weights']
+ pruned_params: ['conv2d_27.w_0', 'conv2d_28.w_0', 'conv2d_29.w_0',
+ 'conv2d_30.w_0', 'conv2d_31.w_0', 'conv2d_32.w_0',
+ 'conv2d_34.w_0', 'conv2d_35.w_0', 'conv2d_36.w_0',
+ 'conv2d_37.w_0', 'conv2d_38.w_0', 'conv2d_39.w_0',
+ 'conv2d_41.w_0', 'conv2d_42.w_0', 'conv2d_43.w_0',
+ 'conv2d_44.w_0', 'conv2d_45.w_0', 'conv2d_46.w_0']
pruned_ratios: [0.1,0.2,0.2,0.2,0.2,0.1,0.2,0.3,0.3,0.3,0.2,0.1,0.3,0.4,0.4,0.4,0.4,0.3]
print_params: False
diff --git a/configs/slim/prune/yolov3_prune_l1_norm.yml b/configs/slim/prune/yolov3_prune_l1_norm.yml
index db2a616daab7087e4b02c76c72df34c3a6a7937f..5b4f4667f2285cd73907df12aa1bd0f446a0f5c0 100644
--- a/configs/slim/prune/yolov3_prune_l1_norm.yml
+++ b/configs/slim/prune/yolov3_prune_l1_norm.yml
@@ -1,15 +1,14 @@
# Weights of yolov3_mobilenet_v1_voc
pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_voc.pdparams
-weight_type: resume
slim: Pruner
Pruner:
criterion: l1_norm
- pruned_params: ['yolo_block.0.0.0.conv.weights', 'yolo_block.0.0.1.conv.weights', 'yolo_block.0.1.0.conv.weights',
- 'yolo_block.0.1.1.conv.weights', 'yolo_block.0.2.conv.weights', 'yolo_block.0.tip.conv.weights',
- 'yolo_block.1.0.0.conv.weights', 'yolo_block.1.0.1.conv.weights', 'yolo_block.1.1.0.conv.weights',
- 'yolo_block.1.1.1.conv.weights', 'yolo_block.1.2.conv.weights', 'yolo_block.1.tip.conv.weights',
- 'yolo_block.2.0.0.conv.weights', 'yolo_block.2.0.1.conv.weights', 'yolo_block.2.1.0.conv.weights',
- 'yolo_block.2.1.1.conv.weights', 'yolo_block.2.2.conv.weights', 'yolo_block.2.tip.conv.weights']
+ pruned_params: ['conv2d_27.w_0', 'conv2d_28.w_0', 'conv2d_29.w_0',
+ 'conv2d_30.w_0', 'conv2d_31.w_0', 'conv2d_32.w_0',
+ 'conv2d_34.w_0', 'conv2d_35.w_0', 'conv2d_36.w_0',
+ 'conv2d_37.w_0', 'conv2d_38.w_0', 'conv2d_39.w_0',
+ 'conv2d_41.w_0', 'conv2d_42.w_0', 'conv2d_43.w_0',
+ 'conv2d_44.w_0', 'conv2d_45.w_0', 'conv2d_46.w_0']
pruned_ratios: [0.1,0.2,0.2,0.2,0.2,0.1,0.2,0.3,0.3,0.3,0.2,0.1,0.3,0.4,0.4,0.4,0.4,0.3]
print_params: False
diff --git a/configs/slim/quant/mask_rcnn_r50_fpn_1x_qat.yml b/configs/slim/quant/mask_rcnn_r50_fpn_1x_qat.yml
new file mode 100644
index 0000000000000000000000000000000000000000..7363b4e55245024d5534a805be66301ca8b720fb
--- /dev/null
+++ b/configs/slim/quant/mask_rcnn_r50_fpn_1x_qat.yml
@@ -0,0 +1,22 @@
+pretrain_weights: https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_fpn_1x_coco.pdparams
+slim: QAT
+
+QAT:
+ quant_config: {
+ 'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max',
+ 'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9,
+ 'quantizable_layer_type': ['Conv2D', 'Linear']}
+ print_model: True
+
+
+epoch: 5
+
+LearningRate:
+ base_lr: 0.001
+ schedulers:
+ - !PiecewiseDecay
+ gamma: 0.1
+ milestones: [3, 4]
+ - !LinearWarmup
+ start_factor: 0.001
+ steps: 100
diff --git a/configs/slim/quant/ssd_mobilenet_v1_qat.yml b/configs/slim/quant/ssd_mobilenet_v1_qat.yml
new file mode 100644
index 0000000000000000000000000000000000000000..05e068368fced56bdd3298323cf901dbbe29f925
--- /dev/null
+++ b/configs/slim/quant/ssd_mobilenet_v1_qat.yml
@@ -0,0 +1,9 @@
+pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/dygraph/ssd_mobilenet_v1_300_120e_voc.pdparams
+slim: QAT
+
+QAT:
+ quant_config: {
+ 'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max',
+ 'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9,
+ 'quantizable_layer_type': ['Conv2D', 'Linear']}
+ print_model: True
diff --git a/configs/slim/quant/yolov3_darknet_qat.yml b/configs/slim/quant/yolov3_darknet_qat.yml
new file mode 100644
index 0000000000000000000000000000000000000000..281b53418c215751470082794ef4c8d8b0d529e7
--- /dev/null
+++ b/configs/slim/quant/yolov3_darknet_qat.yml
@@ -0,0 +1,31 @@
+pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams
+slim: QAT
+
+QAT:
+ quant_config: {
+ 'weight_quantize_type': 'channel_wise_abs_max', 'activation_quantize_type': 'moving_average_abs_max',
+ 'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9,
+ 'quantizable_layer_type': ['Conv2D', 'Linear']}
+ print_model: True
+
+epoch: 50
+
+LearningRate:
+ base_lr: 0.0001
+ schedulers:
+ - !PiecewiseDecay
+ gamma: 0.1
+ milestones:
+ - 30
+ - 45
+ - !LinearWarmup
+ start_factor: 0.
+ steps: 1000
+
+OptimizerBuilder:
+ optimizer:
+ momentum: 0.9
+ type: Momentum
+ regularizer:
+ factor: 0.0005
+ type: L2
diff --git a/configs/slim/quant/yolov3_mobilenet_v1_qat.yml b/configs/slim/quant/yolov3_mobilenet_v1_qat.yml
index dfa365c10f581499192fc043031087971e8a44f8..d1452082983ced70d1709343cd42017d8a19d361 100644
--- a/configs/slim/quant/yolov3_mobilenet_v1_qat.yml
+++ b/configs/slim/quant/yolov3_mobilenet_v1_qat.yml
@@ -1,6 +1,5 @@
# Weights of yolov3_mobilenet_v1_coco
pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams
-weight_type: resume
slim: QAT
QAT:
diff --git a/configs/slim/quant/yolov3_mobilenet_v3_qat.yml b/configs/slim/quant/yolov3_mobilenet_v3_qat.yml
index 288e72a10ec05c9753c588eda26e474b2ffc8afa..81269090865a8cac0e98af8c55b723eaf84fbf94 100644
--- a/configs/slim/quant/yolov3_mobilenet_v3_qat.yml
+++ b/configs/slim/quant/yolov3_mobilenet_v3_qat.yml
@@ -1,6 +1,5 @@
# Weights of yolov3_mobilenet_v3_coco
pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_coco.pdparams
-weight_type: resume
slim: QAT
QAT:
@@ -10,3 +9,16 @@ QAT:
'weight_bits': 8, 'activation_bits': 8, 'dtype': 'int8', 'window_size': 10000, 'moving_rate': 0.9,
'quantizable_layer_type': ['Conv2D', 'Linear']}
print_model: True
+
+epoch: 30
+LearningRate:
+ base_lr: 0.0001
+ schedulers:
+ - !PiecewiseDecay
+ gamma: 0.1
+ milestones:
+ - 25
+ - 28
+ - !LinearWarmup
+ start_factor: 0.
+ steps: 2000
diff --git a/configs/solov2/README.md b/configs/solov2/README.md
index 68f0bc06d7747470a10a5a1fb8b9a26418bc154f..b3268df2e491d8ea6ad45caf8778154e99c43447 100644
--- a/configs/solov2/README.md
+++ b/configs/solov2/README.md
@@ -19,8 +19,8 @@ SOLOv2 (Segmenting Objects by Locations) is a fast instance segmentation framewo
| BlendMask | R50-FPN | True | 3x | 37.8 | 13.5 | V100 | - | - |
| SOLOv2 (Paper) | R50-FPN | False | 1x | 34.8 | 18.5 | V100 | - | - |
| SOLOv2 (Paper) | X101-DCN-FPN | True | 3x | 42.4 | 5.9 | V100 | - | - |
-| SOLOv2 | R50-FPN | False | 1x | 35.5 | 21.9 | V100 | [model](https://paddledet.bj.bcebos.com/models/solov2_r50_fpn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/solov2/solov2_r50_fpn_1x_coco.yml) |
-| SOLOv2 | R50-FPN | True | 3x | 38.0 | 21.9 | V100 | [model](https://paddledet.bj.bcebos.com/models/solov2_r50_fpn_3x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/solov2/solov2_r50_fpn_3x_coco.yml) |
+| SOLOv2 | R50-FPN | False | 1x | 35.5 | 21.9 | V100 | [model](https://paddledet.bj.bcebos.com/models/solov2_r50_fpn_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/solov2/solov2_r50_fpn_1x_coco.yml) |
+| SOLOv2 | R50-FPN | True | 3x | 38.0 | 21.9 | V100 | [model](https://paddledet.bj.bcebos.com/models/solov2_r50_fpn_3x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/solov2/solov2_r50_fpn_3x_coco.yml) |
**Notes:**
diff --git a/configs/ssd/README.md b/configs/ssd/README.md
index 9340e7e812986bbcc427f27a485b2ca9f376e50d..b2bcd67d1d38406d4da62b9bd7419a613d750ac0 100644
--- a/configs/ssd/README.md
+++ b/configs/ssd/README.md
@@ -6,8 +6,8 @@
| Backbone | Model | Images/GPU | Lr schd | Inference time (fps) | Box AP | Download | Config |
| :-------------- | :------------- | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: |
-| VGG | SSD | 8 | 240e | ---- | 77.8 | [download](https://paddledet.bj.bcebos.com/models/ssd_vgg16_300_240e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ssd/ssd_vgg16_300_240e_voc.yml) |
-| MobileNet v1 | SSD | 32 | 120e | ---- | 73.8 | [download](https://paddledet.bj.bcebos.com/models/ssd_mobilenet_v1_300_120e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml) |
+| VGG | SSD | 8 | 240e | ---- | 77.8 | [download](https://paddledet.bj.bcebos.com/models/ssd_vgg16_300_240e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ssd/ssd_vgg16_300_240e_voc.yml) |
+| MobileNet v1 | SSD | 32 | 120e | ---- | 73.8 | [download](https://paddledet.bj.bcebos.com/models/ssd_mobilenet_v1_300_120e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml) |
**Note:** SSD-VGG is trained for 240 epochs on 4 GPUs with a total batch size of 32; SSD-MobileNetv1 is trained for 120 epochs on 2 GPUs with a total batch size of 64.
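+
+A multi-GPU launch for the SSD-VGG setting might look like this (a sketch assuming the standard paddle.distributed.launch entry point; the per-GPU batch size comes from the reader config):
+```bash
+python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/ssd/ssd_vgg16_300_240e_voc.yml
+```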
diff --git a/configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml b/configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml
index 45de7733a3f263b7dbf84a0f0bf933eedcc9b78e..3453f027682456f738be55d7b983efde0620a570 100644
--- a/configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml
+++ b/configs/ssd/ssd_mobilenet_v1_300_120e_voc.yml
@@ -6,3 +6,7 @@ _BASE_: [
'_base_/ssd_mobilenet_reader.yml',
]
weights: output/ssd_mobilenet_v1_300_120e_voc/model_final
+
+EvalReader:
+ batch_transforms:
+ - PadBatch: {pad_gt: True}
diff --git a/configs/ssd/ssd_vgg16_300_240e_voc.yml b/configs/ssd/ssd_vgg16_300_240e_voc.yml
index 58cf4b9855a4ca414faf67eae2179f40fc63bf77..e2e2d307cbf4a67c9ff8ee806892657bfe39768c 100644
--- a/configs/ssd/ssd_vgg16_300_240e_voc.yml
+++ b/configs/ssd/ssd_vgg16_300_240e_voc.yml
@@ -6,3 +6,7 @@ _BASE_: [
'_base_/ssd_reader.yml',
]
weights: output/ssd_vgg16_300_240e_voc/model_final
+
+EvalReader:
+ batch_transforms:
+ - PadBatch: {pad_gt: True}
diff --git a/configs/ttfnet/README.md b/configs/ttfnet/README.md
index 392df1375e6e5a0fce90e09761006d4ceb461f41..a20660ecb4bb6315d5bebeb4176a005dc0e03e69 100644
--- a/configs/ttfnet/README.md
+++ b/configs/ttfnet/README.md
@@ -1,4 +1,4 @@
-# TTFNet
+# 1. TTFNet
## Introduction
@@ -13,7 +13,49 @@ TTFNet是一种用于实时目标检测且对训练时间友好的网络,对Ce
| Backbone | Model | Images/GPU | Lr schd | Inference time (fps) | Box AP | Download | Config |
| :-------------- | :------------- | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: |
-| DarkNet53 | TTFNet | 12 | 1x | ---- | 33.5 | [download](https://paddledet.bj.bcebos.com/models/ttfnet_darknet53_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ttfnet/ttfnet_darknet53_1x_coco.yml) |
+| DarkNet53 | TTFNet | 12 | 1x | ---- | 33.5 | [download](https://paddledet.bj.bcebos.com/models/ttfnet_darknet53_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ttfnet/ttfnet_darknet53_1x_coco.yml) |
+
+
+
+
+
+# 2. PAFNet
+
+## Introduction
+
+PAFNet (Paddle Anchor Free) is PaddleDetection's optimized model based on TTFNet. It reaches SOTA-level accuracy among anchor-free detectors, and a lightweight mobile model, PAFNet-Lite, is released alongside it.
+
+The PAFNet series improves on TTFNet in the following ways:
+
+- [CutMix](https://arxiv.org/abs/1905.04899)
+- A stronger backbone: ResNet50vd-DCN
+- A larger training batch size: 8 GPUs, batch_size=18 per GPU
+- Synchronized Batch Normalization
+- [Deformable Convolution](https://arxiv.org/abs/1703.06211)
+- [Exponential Moving Average](https://www.investopedia.com/terms/e/ema.asp)
+- Better pretrained weights
+
+
+## Model zoo
+
+| Backbone | Model | Images/GPU | Lr schd | Inference time (fps) | Box AP | Download | Config |
+| :-------------- | :------------- | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: |
+| ResNet50vd | PAFNet | 18 | 10x | ---- | 39.8 | [download](https://paddledet.bj.bcebos.com/models/pafnet_10x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ttfnet/pafnet_10x_coco.yml) |
+
+
+
+### PAFNet-Lite
+
+| Backbone | Model | Images/GPU | Lr schd | Box AP | Kirin 990 latency (ms) | Size (MB) | Download | Config |
+| :-------------- | :------------- | :-----: | :-----: | :-----: | :------------: | :-----: | :-----------------------------------------------------: | :-----: |
+| MobileNetv3 | PAFNet-Lite | 12 | 20x | 23.9 | 26.00 | 14 | [download](https://paddledet.bj.bcebos.com/models/pafnet_lite_mobilenet_v3_20x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ttfnet/pafnet_lite_mobilenet_v3_20x_coco.yml) |
+
+**Note:** Owing to the overall upgrade of the dynamic-graph framework, the PAFNet weights released by PaddleDetection require the --bias flag during evaluation, for example:
+
+```bash
+# evaluate with the weights released by PaddleDetection
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/ttfnet/pafnet_10x_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/pafnet_10x_coco.pdparams --bias
+```
## Citations
```
diff --git a/configs/ttfnet/_base_/optimizer_10x.yml b/configs/ttfnet/_base_/optimizer_10x.yml
new file mode 100644
index 0000000000000000000000000000000000000000..dd2c29d966650d76b0636b3f889e13efbbe5d95a
--- /dev/null
+++ b/configs/ttfnet/_base_/optimizer_10x.yml
@@ -0,0 +1,19 @@
+epoch: 120
+
+LearningRate:
+ base_lr: 0.015
+ schedulers:
+ - !PiecewiseDecay
+ gamma: 0.1
+ milestones: [80, 110]
+ - !LinearWarmup
+ start_factor: 0.2
+ steps: 500
+
+OptimizerBuilder:
+ optimizer:
+ momentum: 0.9
+ type: Momentum
+ regularizer:
+ factor: 0.0004
+ type: L2
diff --git a/configs/ttfnet/_base_/optimizer_20x.yml b/configs/ttfnet/_base_/optimizer_20x.yml
new file mode 100644
index 0000000000000000000000000000000000000000..4dd3492202a3fdf9a612541c0ecd1dc76f1b6519
--- /dev/null
+++ b/configs/ttfnet/_base_/optimizer_20x.yml
@@ -0,0 +1,20 @@
+epoch: 240
+
+LearningRate:
+ base_lr: 0.015
+ schedulers:
+ - !PiecewiseDecay
+ gamma: 0.1
+ milestones: [160, 220]
+ - !LinearWarmup
+ start_factor: 0.2
+ steps: 1000
+
+OptimizerBuilder:
+ clip_grad_by_norm: 35
+ optimizer:
+ momentum: 0.9
+ type: Momentum
+ regularizer:
+ factor: 0.0004
+ type: L2
diff --git a/configs/ttfnet/_base_/pafnet.yml b/configs/ttfnet/_base_/pafnet.yml
new file mode 100644
index 0000000000000000000000000000000000000000..5319fe6c898a8d0b31223882fbe80de17345a586
--- /dev/null
+++ b/configs/ttfnet/_base_/pafnet.yml
@@ -0,0 +1,41 @@
+architecture: TTFNet
+pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams
+norm_type: sync_bn
+use_ema: true
+ema_decay: 0.9998
+
+TTFNet:
+ backbone: ResNet
+ neck: TTFFPN
+ ttf_head: TTFHead
+ post_process: BBoxPostProcess
+
+ResNet:
+  depth: 50
+  variant: d
+  return_idx: [0, 1, 2, 3]
+  freeze_at: -1
+  norm_decay: 0.
+  dcn_v2_stages: [1, 2, 3]
+
+TTFFPN:
+ planes: [256, 128, 64]
+ shortcut_num: [3, 2, 1]
+
+TTFHead:
+ dcn_head: true
+ hm_loss:
+ name: CTFocalLoss
+ loss_weight: 1.
+ wh_loss:
+ name: GIoULoss
+ loss_weight: 5.
+ reduction: sum
+
+BBoxPostProcess:
+ decode:
+ name: TTFBox
+ max_per_img: 100
+ score_thresh: 0.01
+ down_ratio: 4
diff --git a/configs/ttfnet/_base_/pafnet_lite.yml b/configs/ttfnet/_base_/pafnet_lite.yml
new file mode 100644
index 0000000000000000000000000000000000000000..5ed2fa235b6eb0f35690183a884dabbea43b279e
--- /dev/null
+++ b/configs/ttfnet/_base_/pafnet_lite.yml
@@ -0,0 +1,44 @@
+architecture: TTFNet
+pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV3_large_x1_0_ssld_pretrained.pdparams
+norm_type: sync_bn
+
+TTFNet:
+ backbone: MobileNetV3
+ neck: TTFFPN
+ ttf_head: TTFHead
+ post_process: BBoxPostProcess
+
+MobileNetV3:
+ scale: 1.0
+ model_name: large
+ feature_maps: [5, 8, 14, 17]
+ with_extra_blocks: true
+ lr_mult_list: [0.25, 0.25, 0.5, 0.5, 0.75]
+ conv_decay: 0.00001
+ norm_decay: 0.0
+ extra_block_filters: []
+
+TTFFPN:
+ planes: [96, 48, 24]
+ shortcut_num: [2, 2, 1]
+ lite_neck: true
+ fusion_method: concat
+
+TTFHead:
+ hm_head_planes: 48
+ wh_head_planes: 24
+ lite_head: true
+ hm_loss:
+ name: CTFocalLoss
+ loss_weight: 1.
+ wh_loss:
+ name: GIoULoss
+ loss_weight: 5.
+ reduction: sum
+
+BBoxPostProcess:
+ decode:
+ name: TTFBox
+ max_per_img: 100
+ score_thresh: 0.01
+ down_ratio: 4
diff --git a/configs/ttfnet/_base_/pafnet_lite_reader.yml b/configs/ttfnet/_base_/pafnet_lite_reader.yml
new file mode 100644
index 0000000000000000000000000000000000000000..446a13a3cfae2e51f5d48c49655e56b8907fe7ad
--- /dev/null
+++ b/configs/ttfnet/_base_/pafnet_lite_reader.yml
@@ -0,0 +1,40 @@
+worker_num: 2
+TrainReader:
+ sample_transforms:
+ - Decode: {}
+ - Cutmix: {alpha: 1.5, beta: 1.5}
+ - RandomDistort: {}
+ - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
+ - RandomCrop: {aspect_ratio: NULL, cover_all_box: True}
+ - RandomFlip: {}
+ - GridMask: {upper_iter: 300000}
+ batch_transforms:
+ - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512], random_interp: True, keep_ratio: False}
+ - NormalizeImage: {mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375], is_scale: false}
+ - Permute: {}
+ - Gt2TTFTarget: {down_ratio: 4}
+ - PadBatch: {pad_to_stride: 32}
+ batch_size: 12
+ shuffle: true
+ drop_last: true
+ use_shared_memory: true
+
+EvalReader:
+ sample_transforms:
+ - Decode: {}
+ - Resize: {interp: 1, target_size: [320, 320], keep_ratio: False}
+ - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]}
+ - Permute: {}
+ batch_size: 1
+ drop_last: false
+ drop_empty: false
+
+TestReader:
+ sample_transforms:
+ - Decode: {}
+ - Resize: {interp: 1, target_size: [320, 320], keep_ratio: False}
+ - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]}
+ - Permute: {}
+ batch_size: 1
+ drop_last: false
+ drop_empty: false
diff --git a/configs/ttfnet/_base_/pafnet_reader.yml b/configs/ttfnet/_base_/pafnet_reader.yml
new file mode 100644
index 0000000000000000000000000000000000000000..ea90a134f03ab90427e994b6a9991b5e4534c5be
--- /dev/null
+++ b/configs/ttfnet/_base_/pafnet_reader.yml
@@ -0,0 +1,40 @@
+worker_num: 2
+TrainReader:
+ sample_transforms:
+ - Decode: {}
+ - Cutmix: {alpha: 1.5, beta: 1.5}
+ - RandomDistort: {random_apply: false, random_channel: true}
+ - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
+ - RandomCrop: {aspect_ratio: NULL, cover_all_box: True}
+ - RandomFlip: {prob: 0.5}
+ batch_transforms:
+ - BatchRandomResize: {target_size: [416, 448, 480, 512, 544, 576, 608, 640, 672], keep_ratio: false}
+ - NormalizeImage: {mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375], is_scale: false}
+ - Permute: {}
+ - Gt2TTFTarget: {down_ratio: 4}
+ - PadBatch: {pad_to_stride: 32}
+ batch_size: 18
+ shuffle: true
+ drop_last: true
+ use_shared_memory: true
+ mixup_epoch: 100
+
+EvalReader:
+ sample_transforms:
+ - Decode: {}
+ - Resize: {interp: 1, target_size: [512, 512], keep_ratio: False}
+ - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]}
+ - Permute: {}
+ batch_size: 1
+ drop_last: false
+ drop_empty: false
+
+TestReader:
+ sample_transforms:
+ - Decode: {}
+ - Resize: {interp: 1, target_size: [512, 512], keep_ratio: False}
+ - NormalizeImage: {is_scale: false, mean: [123.675, 116.28, 103.53], std: [58.395, 57.12, 57.375]}
+ - Permute: {}
+ batch_size: 1
+ drop_last: false
+ drop_empty: false
diff --git a/configs/ttfnet/_base_/ttfnet_darknet53.yml b/configs/ttfnet/_base_/ttfnet_darknet53.yml
index 90b0c3361b12c41d6556910efe09d2159b22e793..05c7dce6503209c76da2c62613e3e2960ce47cc0 100644
--- a/configs/ttfnet/_base_/ttfnet_darknet53.yml
+++ b/configs/ttfnet/_base_/ttfnet_darknet53.yml
@@ -14,8 +14,9 @@ DarkNet:
norm_type: bn
norm_decay: 0.0004
-# use default config
-# TTFFPN:
+TTFFPN:
+ planes: [256, 128, 64]
+ shortcut_num: [3, 2, 1]
TTFHead:
hm_loss:
diff --git a/configs/ttfnet/_base_/ttfnet_reader.yml b/configs/ttfnet/_base_/ttfnet_reader.yml
index 5a69c59ddf4cf68a8488a107360c83e34dbf7f01..f9ed6cc57d9609afb25d3027fac32570287a17d1 100644
--- a/configs/ttfnet/_base_/ttfnet_reader.yml
+++ b/configs/ttfnet/_base_/ttfnet_reader.yml
@@ -8,7 +8,7 @@ TrainReader:
- Permute: {}
batch_transforms:
- Gt2TTFTarget: {down_ratio: 4}
- - PadBatch: {pad_to_stride: 32, pad_gt: true}
+ - PadBatch: {pad_to_stride: 32}
batch_size: 12
shuffle: true
drop_last: true
diff --git a/configs/ttfnet/pafnet_10x_coco.yml b/configs/ttfnet/pafnet_10x_coco.yml
new file mode 100644
index 0000000000000000000000000000000000000000..b14a2bc912cce4cc4b0edca538cc19c3e51f65a5
--- /dev/null
+++ b/configs/ttfnet/pafnet_10x_coco.yml
@@ -0,0 +1,8 @@
+_BASE_: [
+ '../datasets/coco_detection.yml',
+ '../runtime.yml',
+ '_base_/optimizer_10x.yml',
+ '_base_/pafnet.yml',
+ '_base_/pafnet_reader.yml',
+]
+weights: output/pafnet_10x_coco/model_final
diff --git a/configs/ttfnet/pafnet_lite_mobilenet_v3_20x_coco.yml b/configs/ttfnet/pafnet_lite_mobilenet_v3_20x_coco.yml
new file mode 100644
index 0000000000000000000000000000000000000000..577af1635acc3c7778114db78775bd720727a588
--- /dev/null
+++ b/configs/ttfnet/pafnet_lite_mobilenet_v3_20x_coco.yml
@@ -0,0 +1,8 @@
+_BASE_: [
+ '../datasets/coco_detection.yml',
+ '../runtime.yml',
+ '_base_/optimizer_20x.yml',
+ '_base_/pafnet_lite.yml',
+ '_base_/pafnet_lite_reader.yml',
+]
+weights: output/pafnet_lite_mobilenet_v3_20x_coco/model_final
diff --git a/configs/vehicle/README.md b/configs/vehicle/README.md
index 9219e9c1f9ad60d84754fb618b390fbda8a0434b..56e5e19829a2842a77c03d9e3f5a305211c768c9 100644
--- a/configs/vehicle/README.md
+++ b/configs/vehicle/README.md
@@ -1,11 +1,11 @@
-English | [简体中文](CONTRIB_cn.md)
+English | [简体中文](README_cn.md)
# PaddleDetection applied for specific scenarios
We provide some models implemented by PaddlePaddle to detect objects in specific scenarios; users can download the models and use them in these scenarios.
| Task | Algorithm | Box AP | Download | Configs |
|:---------------------|:---------:|:------:| :-------------------------------------------------------------------------------------: |:------:|
-| Vehicle Detection | YOLOv3 | 54.5 | [model](https://paddledet.bj.bcebos.com/models/vehicle_yolov3_darknet.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/vehicle/vehicle_yolov3_darknet.yml) |
+| Vehicle Detection | YOLOv3 | 54.5 | [model](https://paddledet.bj.bcebos.com/models/vehicle_yolov3_darknet.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/vehicle/vehicle_yolov3_darknet.yml) |
## Vehicle Detection
@@ -17,7 +17,7 @@ The network for detecting vehicles is YOLOv3, the backbone of which is Dacknet53
### 2. Configuration for training
-PaddleDetection provides users with a configuration file [yolov3_darknet53_270e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/master/dygraph/configs/yolov3/yolov3_darknet53_270e_coco.yml) to train YOLOv3 on the COCO dataset, compared with this file, we modify some parameters as followed to conduct the training for vehicle detection:
+PaddleDetection provides users with a configuration file [yolov3_darknet53_270e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.0/configs/yolov3/yolov3_darknet53_270e_coco.yml) to train YOLOv3 on the COCO dataset. Compared with that file, we modified the following parameters for vehicle detection training:
* num_classes: 6
* anchors: [[8, 9], [10, 23], [19, 15], [23, 33], [40, 25], [54, 50], [101, 80], [139, 145], [253, 224]]
@@ -48,6 +48,6 @@ python -u tools/infer.py -c configs/vehicle/vehicle_yolov3_darknet.yml \
Some inference results are visualized below:
-
+
-
+
diff --git a/configs/vehicle/README_cn.md b/configs/vehicle/README_cn.md
index 9a030fdc278127fc8af946f756cadff4f39943d7..5fd7f66a0fcae8aed1be0d34962bc865dd25350c 100644
--- a/configs/vehicle/README_cn.md
+++ b/configs/vehicle/README_cn.md
@@ -1,11 +1,11 @@
-[English](CONTRIB.md) | 简体中文
+[English](README.md) | 简体中文
# Featured vertical-domain detection models
We provide PaddlePaddle-based detection models for different scenarios; users can download and use them directly.
| Task | Algorithm | Box AP | Download | Config |
|:---------------------|:---------:|:------:| :---------------------------------------------------------------------------------: | :------:|
-| Vehicle detection | YOLOv3 | 54.5 | [download](https://paddledet.bj.bcebos.com/models/vehicle_yolov3_darknet.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/vehicle/vehicle_yolov3_darknet.yml) |
+| Vehicle detection | YOLOv3 | 54.5 | [download](https://paddledet.bj.bcebos.com/models/vehicle_yolov3_darknet.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/vehicle/vehicle_yolov3_darknet.yml) |
## Vehicle Detection
@@ -18,7 +18,7 @@ Backbone为Dacknet53的YOLOv3。
### 2. Training configuration
-PaddleDetection provides the configuration file [yolov3_darknet53_270e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/master/dygraph/configs/yolov3/yolov3_darknet53_270e_coco.yml) for training YOLOv3 on the COCO dataset. Compared with that file, we modified the following parameters for vehicle detection training:
+PaddleDetection provides the configuration file [yolov3_darknet53_270e_coco.yml](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.0/configs/yolov3/yolov3_darknet53_270e_coco.yml) for training YOLOv3 on the COCO dataset. Compared with that file, we modified the following parameters for vehicle detection training:
* num_classes: 6
* anchors: [[8, 9], [10, 23], [19, 15], [23, 33], [40, 25], [54, 50], [101, 80], [139, 145], [253, 224]]
@@ -49,6 +49,6 @@ python -u tools/infer.py -c configs/vehicle/vehicle_yolov3_darknet.yml \
Sample inference results:
-
+
-
+
diff --git a/configs/yolov3/README.md b/configs/yolov3/README.md
index 97465271518066b886ed023372afbde7b267cd03..e4408c566c841b6cb7c24b959759a8b89a66f287 100644
--- a/configs/yolov3/README.md
+++ b/configs/yolov3/README.md
@@ -9,30 +9,53 @@
| DarkNet53(paper) | 608 | 8 | 270e | ---- | 33.0 | - | - |
| DarkNet53(paper) | 416 | 8 | 270e | ---- | 31.0 | - | - |
| DarkNet53(paper) | 320 | 8 | 270e | ---- | 28.2 | - | - |
-| DarkNet53 | 608 | 8 | 270e | ---- | 39.0 | [download](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_darknet53_270e_coco.yml) |
-| DarkNet53 | 416 | 8 | 270e | ---- | 37.5 | [download](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_darknet53_270e_coco.yml) |
-| DarkNet53 | 320 | 8 | 270e | ---- | 34.6 | [download](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_darknet53_270e_coco.yml) |
-| ResNet50_vd | 608 | 8 | 270e | ---- | 39.1 | [download](https://paddledet.bj.bcebos.com/models/yolov3_r50vd_dcn_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_r50vd_dcn_270e_coco.yml) |
-| MobileNet-V1 | 608 | 8 | 270e | ---- | 28.8 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) |
-| MobileNet-V1 | 416 | 8 | 270e | ---- | 28.7 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) |
-| MobileNet-V1 | 320 | 8 | 270e | ---- | 26.5 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) |
-| MobileNet-V3 | 608 | 8 | 270e | ---- | 31.4 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_mobilenet_v3_large_270e_coco.yml) |
-| MobileNet-V3 | 416 | 8 | 270e | ---- | 29.7 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_mobilenet_v3_large_270e_coco.yml) |
-| MobileNet-V3 | 320 | 8 | 270e | ---- | 26.9 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_mobilenet_v3_large_270e_coco.yml) |
+| DarkNet53 | 608 | 8 | 270e | ---- | 39.0 | [download](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_darknet53_270e_coco.yml) |
+| DarkNet53 | 416 | 8 | 270e | ---- | 37.5 | [download](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_darknet53_270e_coco.yml) |
+| DarkNet53 | 320 | 8 | 270e | ---- | 34.6 | [download](https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_darknet53_270e_coco.yml) |
+| ResNet50_vd | 608 | 8 | 270e | ---- | 39.1 | [download](https://paddledet.bj.bcebos.com/models/yolov3_r50vd_dcn_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_r50vd_dcn_270e_coco.yml) |
+| ResNet50_vd | 416 | 8 | 270e | ---- | 36.6 | [download](https://paddledet.bj.bcebos.com/models/yolov3_r50vd_dcn_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_r50vd_dcn_270e_coco.yml) |
+| ResNet50_vd | 320 | 8 | 270e | ---- | 33.6 | [download](https://paddledet.bj.bcebos.com/models/yolov3_r50vd_dcn_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_r50vd_dcn_270e_coco.yml) |
+| ResNet34 | 608 | 8 | 270e | ---- | 36.2 | [download](https://paddledet.bj.bcebos.com/models/yolov3_r34_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_r34_270e_coco.yml) |
+| ResNet34 | 416 | 8 | 270e | ---- | 34.3 | [download](https://paddledet.bj.bcebos.com/models/yolov3_r34_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_r34_270e_coco.yml) |
+| ResNet34 | 320 | 8 | 270e | ---- | 31.2 | [download](https://paddledet.bj.bcebos.com/models/yolov3_r34_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_r34_270e_coco.yml) |
+| MobileNet-V1 | 608 | 8 | 270e | ---- | 29.4 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) |
+| MobileNet-V1 | 416 | 8 | 270e | ---- | 29.3 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) |
+| MobileNet-V1 | 320 | 8 | 270e | ---- | 27.2 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml) |
+| MobileNet-V3 | 608 | 8 | 270e | ---- | 31.4 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v3_large_270e_coco.yml) |
+| MobileNet-V3 | 416 | 8 | 270e | ---- | 29.6 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v3_large_270e_coco.yml) |
+| MobileNet-V3 | 320 | 8 | 270e | ---- | 27.1 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v3_large_270e_coco.yml) |
+| MobileNet-V1-SSLD | 608 | 8 | 270e | ---- | 31.0 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml) |
+| MobileNet-V1-SSLD | 416 | 8 | 270e | ---- | 30.6 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml) |
+| MobileNet-V1-SSLD | 320 | 8 | 270e | ---- | 28.4 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml) |
### YOLOv3 on Pascal VOC
| Backbone | Input size | Images/GPU | Lr schd | Inference time (fps) | Box AP | Download | Config |
| :----------- | :--: | :-----: | :-----: |:------------: |:----: | :-------: | :----: |
-| MobileNet-V1 | 608 | 8 | 270e | - | 75.1 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml) |
-| MobileNet-V1 | 416 | 8 | 270e | - | 76.1 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml) |
-| MobileNet-V1 | 320 | 8 | 270e | - | 73.6 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml) |
-| MobileNet-V3 | 608 | 8 | 270e | - | 79.6 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_mobilenet_v3_large_270e_voc.yml) |
-| MobileNet-V3 | 416 | 8 | 270e | - | 78.6 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_mobilenet_v3_large_270e_voc.yml) |
-| MobileNet-V3 | 320 | 8 | 270e | - | 76.4 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/yolov3_mobilenet_v3_large_270e_voc.yml) |
+| MobileNet-V1 | 608 | 8 | 270e | - | 75.2 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml) |
+| MobileNet-V1 | 416 | 8 | 270e | - | 76.2 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml) |
+| MobileNet-V1 | 320 | 8 | 270e | - | 74.3 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml) |
+| MobileNet-V3 | 608 | 8 | 270e | - | 79.6 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v3_large_270e_voc.yml) |
+| MobileNet-V3 | 416 | 8 | 270e | - | 78.6 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v3_large_270e_voc.yml) |
+| MobileNet-V3 | 320 | 8 | 270e | - | 76.4 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v3_large_270e_voc.yml) |
+| MobileNet-V1-SSLD | 608 | 8 | 270e | - | 78.3 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml) |
+| MobileNet-V1-SSLD | 416 | 8 | 270e | - | 79.6 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml) |
+| MobileNet-V1-SSLD | 320 | 8 | 270e | - | 77.3 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml) |
+| MobileNet-V3-SSLD | 608 | 8 | 270e | - | 80.4 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml) |
+| MobileNet-V3-SSLD | 416 | 8 | 270e | - | 79.2 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml) |
+| MobileNet-V3-SSLD | 320 | 8 | 270e | - | 77.3 | [download](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml) |
-**Note:** All YOLOv3 models are trained for 270 epochs on 8 GPUs.
+**Note:** All YOLOv3 models are trained for 270 epochs on 8 GPUs. Owing to the overall upgrade of the dynamic-graph framework, a few of the weights released by PaddleDetection require the --bias flag during evaluation, for example:
+```bash
+# evaluate with the weights released by PaddleDetection
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/yolov3_darknet53_270e_coco.pdparams --bias
+```
+The affected models are:
+
+1. yolov3_darknet53_270e_coco
+2. yolov3_r50vd_dcn_270e_coco
## Citations
```
diff --git a/configs/yolov3/_base_/yolov3_r34.yml b/configs/yolov3/_base_/yolov3_r34.yml
new file mode 100644
index 0000000000000000000000000000000000000000..c2d1489f07ba65240e5b545662b8c1672750b705
--- /dev/null
+++ b/configs/yolov3/_base_/yolov3_r34.yml
@@ -0,0 +1,41 @@
+architecture: YOLOv3
+pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet34_pretrained.pdparams
+norm_type: sync_bn
+
+YOLOv3:
+ backbone: ResNet
+ neck: YOLOv3FPN
+ yolo_head: YOLOv3Head
+ post_process: BBoxPostProcess
+
+ResNet:
+ depth: 34
+ return_idx: [1, 2, 3]
+ freeze_at: -1
+ freeze_norm: false
+ norm_decay: 0.
+
+YOLOv3Head:
+ anchors: [[10, 13], [16, 30], [33, 23],
+ [30, 61], [62, 45], [59, 119],
+ [116, 90], [156, 198], [373, 326]]
+ anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
+ loss: YOLOv3Loss
+
+YOLOv3Loss:
+ ignore_thresh: 0.7
+ downsample: [32, 16, 8]
+ label_smooth: false
+
+BBoxPostProcess:
+ decode:
+ name: YOLOBox
+ conf_thresh: 0.005
+ downsample_ratio: 32
+ clip_bbox: true
+ nms:
+ name: MultiClassNMS
+ keep_top_k: 100
+ score_threshold: 0.01
+ nms_threshold: 0.45
+ nms_top_k: 1000
diff --git a/configs/yolov3/yolov3_darknet53_270e_voc.yml b/configs/yolov3/yolov3_darknet53_270e_voc.yml
index bb7a315ef29a09eb5d223447789af83ca40a1619..e24c01e311f2b084d950e6dacc95ed25ddfef774 100644
--- a/configs/yolov3/yolov3_darknet53_270e_voc.yml
+++ b/configs/yolov3/yolov3_darknet53_270e_voc.yml
@@ -8,3 +8,7 @@ _BASE_: [
snapshot_epoch: 5
weights: output/yolov3_darknet53_270e_voc/model_final
+
+EvalReader:
+ batch_transforms:
+ - PadBatch: {pad_gt: True}
diff --git a/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml b/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml
index a6b2303f9f9bdf6a85798cd36650888e069d1c5a..7b25cd0e38fa59794050048b7e1d100c1d403170 100644
--- a/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml
+++ b/configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml
@@ -9,37 +9,6 @@ _BASE_: [
snapshot_epoch: 5
weights: output/yolov3_mobilenet_v1_270e_voc/model_final
-TrainReader:
- inputs_def:
- num_max_boxes: 50
- sample_transforms:
- - Decode: {}
- - Mixup: {alpha: 1.5, beta: 1.5}
- - RandomDistort: {}
- - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
- - RandomCrop: {}
- - RandomFlip: {}
- batch_transforms:
- - BatchRandomResize:
- target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]
- random_size: True
- random_interp: True
- keep_ratio: False
- - NormalizeBox: {}
- - PadBox: {num_max_boxes: 50}
- - BboxXYXY2XYWH: {}
- - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
- - Permute: {}
- - Gt2YoloTarget:
- anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
- anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]]
- downsample_ratios: [32, 16, 8]
- num_classes: 20
- batch_size: 8
- shuffle: true
- drop_last: true
- mixup_epoch: 250
-
LearningRate:
base_lr: 0.001
schedulers:
diff --git a/configs/yolov3/yolov3_mobilenet_v1_roadsign.yml b/configs/yolov3/yolov3_mobilenet_v1_roadsign.yml
index c79397ddd1004de0508b885aeb3913dbdda9cef3..d899375244a603c38061dc3e6aae00021ef47113 100644
--- a/configs/yolov3/yolov3_mobilenet_v1_roadsign.yml
+++ b/configs/yolov3/yolov3_mobilenet_v1_roadsign.yml
@@ -5,45 +5,12 @@ _BASE_: [
'_base_/yolov3_reader.yml',
]
pretrain_weights: https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_270e_coco.pdparams
-norm_type: sync_bn
weights: output/yolov3_mobilenet_v1_roadsign/model_final
-metric: VOC
-map_type: integral
YOLOv3Loss:
ignore_thresh: 0.7
label_smooth: true
-TrainReader:
- inputs_def:
- num_max_boxes: 50
- sample_transforms:
- - Decode: {}
- - Mixup: {alpha: 1.5, beta: 1.5}
- - RandomDistort: {}
- - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
- - RandomCrop: {}
- - RandomFlip: {}
- batch_transforms:
- - BatchRandomResize:
- target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]
- random_size: True
- random_interp: True
- keep_ratio: False
- - NormalizeBox: {}
- - PadBox: {num_max_boxes: 50}
- - BboxXYXY2XYWH: {}
- - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
- - Permute: {}
- - Gt2YoloTarget:
- anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
- anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]]
- downsample_ratios: [32, 16, 8]
- num_classes: 4
- batch_size: 8
- shuffle: true
- drop_last: true
-
snapshot_epoch: 2
epoch: 40
diff --git a/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml b/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml
new file mode 100644
index 0000000000000000000000000000000000000000..10cf8166d9e9ab1a63211f873d73a3e8eee4eb91
--- /dev/null
+++ b/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml
@@ -0,0 +1,11 @@
+_BASE_: [
+ '../datasets/coco_detection.yml',
+ '../runtime.yml',
+ '_base_/optimizer_270e.yml',
+ '_base_/yolov3_mobilenet_v1.yml',
+ '_base_/yolov3_reader.yml',
+]
+
+snapshot_epoch: 5
+pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV1_ssld_pretrained.pdparams
+weights: output/yolov3_mobilenet_v1_ssld_270e_coco/model_final
diff --git a/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml b/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml
new file mode 100644
index 0000000000000000000000000000000000000000..7a3e62fa1ee1effe9e7109938a2d8e217f9d5b9e
--- /dev/null
+++ b/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml
@@ -0,0 +1,23 @@
+_BASE_: [
+ '../datasets/voc.yml',
+ '../runtime.yml',
+ '_base_/optimizer_270e.yml',
+ '_base_/yolov3_mobilenet_v1.yml',
+ '_base_/yolov3_reader.yml',
+]
+
+snapshot_epoch: 5
+pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV1_ssld_pretrained.pdparams
+weights: output/yolov3_mobilenet_v1_ssld_270e_voc/model_final
+
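+# LR decays by gamma (10x) at epochs 216 and 243, after 1000 steps of linear warmup from 0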
+LearningRate:
+ base_lr: 0.001
+ schedulers:
+ - !PiecewiseDecay
+ gamma: 0.1
+ milestones:
+ - 216
+ - 243
+ - !LinearWarmup
+ start_factor: 0.
+ steps: 1000
diff --git a/configs/yolov3/yolov3_mobilenet_v3_large_270e_voc.yml b/configs/yolov3/yolov3_mobilenet_v3_large_270e_voc.yml
index 5725accb7b12e4115f6f21d8b9e4bb2f632dd68b..abf492e235eb7e1a9a3f58905383486a4900a4aa 100644
--- a/configs/yolov3/yolov3_mobilenet_v3_large_270e_voc.yml
+++ b/configs/yolov3/yolov3_mobilenet_v3_large_270e_voc.yml
@@ -9,37 +9,6 @@ _BASE_: [
snapshot_epoch: 5
weights: output/yolov3_mobilenet_v3_large_270e_voc/model_final
-TrainReader:
- inputs_def:
- num_max_boxes: 50
- sample_transforms:
- - Decode: {}
- - Mixup: {alpha: 1.5, beta: 1.5}
- - RandomDistort: {}
- - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
- - RandomCrop: {}
- - RandomFlip: {}
- batch_transforms:
- - BatchRandomResize:
- target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]
- random_size: True
- random_interp: True
- keep_ratio: False
- - NormalizeBox: {}
- - PadBox: {num_max_boxes: 50}
- - BboxXYXY2XYWH: {}
- - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
- - Permute: {}
- - Gt2YoloTarget:
- anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
- anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]]
- downsample_ratios: [32, 16, 8]
- num_classes: 20
- batch_size: 8
- shuffle: true
- drop_last: true
- mixup_epoch: 250
-
LearningRate:
base_lr: 0.001
schedulers:
diff --git a/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml b/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml
new file mode 100644
index 0000000000000000000000000000000000000000..6d183e3e2207b0dc2023b81113513b4fbdcdd4f7
--- /dev/null
+++ b/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml
@@ -0,0 +1,23 @@
+_BASE_: [
+ '../datasets/voc.yml',
+ '../runtime.yml',
+ '_base_/optimizer_270e.yml',
+ '_base_/yolov3_mobilenet_v3_large.yml',
+ '_base_/yolov3_reader.yml',
+]
+
+snapshot_epoch: 5
+pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/MobileNetV3_large_x1_0_ssld_pretrained.pdparams
+weights: output/yolov3_mobilenet_v3_large_ssld_270e_voc/model_final
+
+LearningRate:
+ base_lr: 0.001
+ schedulers:
+ - !PiecewiseDecay
+ gamma: 0.1
+ milestones:
+ - 216
+ - 243
+ - !LinearWarmup
+ start_factor: 0.
+ steps: 1000
diff --git a/configs/yolov3/yolov3_r34_270e_coco.yml b/configs/yolov3/yolov3_r34_270e_coco.yml
new file mode 100644
index 0000000000000000000000000000000000000000..8653b06161b9145dbd23e00878d5c056986db5ec
--- /dev/null
+++ b/configs/yolov3/yolov3_r34_270e_coco.yml
@@ -0,0 +1,10 @@
+_BASE_: [
+ '../datasets/coco_detection.yml',
+ '../runtime.yml',
+ '_base_/optimizer_270e.yml',
+ '_base_/yolov3_r34.yml',
+ '_base_/yolov3_reader.yml',
+]
+
+snapshot_epoch: 5
+weights: output/yolov3_r34_270e_coco/model_final
diff --git a/demo/P0072__1.0__0___0.png b/demo/P0072__1.0__0___0.png
new file mode 100644
index 0000000000000000000000000000000000000000..aaf9c59bc18f09a342b13e88d966c590b7c16024
Binary files /dev/null and b/demo/P0072__1.0__0___0.png differ
diff --git a/demo/P0861__1.0__1154___824.png b/demo/P0861__1.0__1154___824.png
new file mode 100644
index 0000000000000000000000000000000000000000..47ab7ae3b4698c18a70513e262274f2bfeb98622
Binary files /dev/null and b/demo/P0861__1.0__1154___824.png differ
diff --git a/deploy/BENCHMARK_INFER.md b/deploy/BENCHMARK_INFER.md
new file mode 100644
index 0000000000000000000000000000000000000000..988cf30f6c672195d4b3833fe9a186b497a11c2e
--- /dev/null
+++ b/deploy/BENCHMARK_INFER.md
@@ -0,0 +1,60 @@
+# Inference Benchmark
+
+## 1. Environment
+- 1. Test environment:
+  - CUDA 10.1
+  - CUDNN 7.6
+  - TensorRT-6.0.1
+  - PaddlePaddle v2.0.1
+  - GPUs: Tesla V100, GTX 1080Ti, and Jetson AGX Xavier
+- 2. Test method:
+  - To make inference speed comparable across models, every input is an image of the same size, 3x640x640, using `demo/000000014439_640x640.jpg`.
+  - Batch Size = 1
+  - The first 100 warmup rounds are excluded; the result is the average over 100 rounds, in ms/image, and includes network computation time plus the time to copy data back to the CPU.
+  - The Fluid C++ inference engine is used, covering plain Fluid C++ inference and Fluid-TensorRT inference; both Float32 (FP32) and Float16 (FP16) speeds are measured below.
+
+**Note:** for the difference between fixed-size and dynamic-size input in TensorRT, see the [TensorRT tutorial](TENSOR_RT.md). Because fixed-size input does not fully support two-stage models, the Faster RCNN models are tested with dynamic-size input. Fixed-size and dynamic-size modes do not fuse exactly the same set of OPs, so the same model may show slightly different performance in the two modes.
+
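+The timing protocol above corresponds roughly to the following loop. This is a minimal sketch assuming a `predictor` created via `paddle.inference.create_predictor` (as in `deploy/python/infer.py`); the variable names are illustrative:
+
+```python
+import time
+
+for _ in range(100):  # warmup rounds, excluded from the measurement
+    predictor.run()
+
+start = time.time()
+for _ in range(100):
+    predictor.run()
+    # copy outputs back to the CPU so the device-to-host transfer is included
+    for name in predictor.get_output_names():
+        predictor.get_output_handle(name).copy_to_cpu()
+avg_ms = (time.time() - start) * 1000.0 / 100
+print('average latency: {:.2f} ms/image'.format(avg_ms))
+```
+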
+## 2. Inference Speed
+
+### 1. Linux
+#### (1) Tesla V100
+
+| Model | Backbone | Fixed size? | Input size | paddle_inference (ms) | trt_fp32 (ms) | trt_fp16 (ms) |
+|-------------------------------|--------------|--------|----------|------------------|----------|----------|
+| Faster RCNN FPN | ResNet50 | No | 640x640 | 27.99 | 26.15 | 21.92 |
+| Faster RCNN FPN | ResNet50 | No | 800x1312 | 32.49 | 25.54 | 21.70 |
+| YOLOv3 | Mobilenet\_v1 | Yes | 608x608 | 9.74 | 8.61 | 6.28 |
+| YOLOv3 | Darknet53 | Yes | 608x608 | 17.84 | 15.43 | 9.86 |
+| PPYOLO | ResNet50 | Yes | 608x608 | 20.77 | 18.40 | 13.53 |
+| SSD | Mobilenet\_v1 | Yes | 300x300 | 5.17 | 4.43 | 4.29 |
+| TTFNet | Darknet53 | Yes | 512x512 | 10.14 | 8.71 | 5.55 |
+| FCOS | ResNet50 | Yes | 640x640 | 35.47 | 35.02 | 34.24 |
+
+
+#### (2) Jetson AGX Xavier
+
+| Model | Backbone | Fixed size? | Input size | paddle_inference (ms) | trt_fp32 (ms) | trt_fp16 (ms) |
+|-------------------------------|--------------|--------|----------|------------------|----------|----------|
+| Faster RCNN FPN | ResNet50 | No | 640x640 | 169.45 | 158.92 | 119.25 |
+| Faster RCNN FPN | ResNet50 | No | 800x1312 | 228.07 | 156.39 | 117.03 |
+| YOLOv3 | Mobilenet\_v1 | Yes | 608x608 | 48.76 | 43.83 | 18.41 |
+| YOLOv3 | Darknet53 | Yes | 608x608 | 121.61 | 110.30 | 42.38 |
+| PPYOLO | ResNet50 | Yes | 608x608 | 111.80 | 99.40 | 48.05 |
+| SSD | Mobilenet\_v1 | Yes | 300x300 | 10.52 | 8.84 | 8.77 |
+| TTFNet | Darknet53 | Yes | 512x512 | 73.77 | 64.03 | 31.46 |
+| FCOS | ResNet50 | Yes | 640x640 | 217.11 | 214.38 | 205.78 |
+
+### 2. Windows
+#### (1) GTX 1080Ti
+
+| Model | Backbone | Fixed size? | Input size | paddle_inference (ms) | trt_fp32 (ms) | trt_fp16 (ms) |
+|-------------------------------|--------------|--------|----------|------------------|----------|----------|
+| Faster RCNN FPN | ResNet50 | No | 640x640 | 50.74 | 57.17 | 62.08 |
+| Faster RCNN FPN | ResNet50 | No | 800x1312 | 50.31 | 57.61 | 62.05 |
+| YOLOv3 | Mobilenet\_v1 | Yes | 608x608 | 14.51 | 11.23 | 11.13 |
+| YOLOv3 | Darknet53 | Yes | 608x608 | 30.26 | 23.92 | 24.02 |
+| PPYOLO | ResNet50 | Yes | 608x608 | 38.06 | 31.40 | 31.94 |
+| SSD | Mobilenet\_v1 | Yes | 300x300 | 16.47 | 13.87 | 13.76 |
+| TTFNet | Darknet53 | Yes | 512x512 | 21.83 | 17.14 | 17.09 |
+| FCOS | ResNet50 | Yes | 640x640 | 71.88 | 69.93 | 69.52 |
diff --git a/deploy/EXPORT_MODEL.md b/deploy/EXPORT_MODEL.md
index eb46e48edd8d4621915da40e81c5d85921b404c1..50f50cba230495679e9ac8f882e2a617b90b0368 100644
--- a/deploy/EXPORT_MODEL.md
+++ b/deploy/EXPORT_MODEL.md
@@ -1,35 +1,37 @@
-# PaddleDetection Model Export Tutorial
+# Model Export Tutorial
-## Model Export
+## 1. Model Export
This section describes how to export a model with the `tools/export_model.py` script.
-### Description of the Exported Model's Inputs and Outputs
-- The input variables and input shapes in `PaddleDetection` are as follows:
-| Input name | Input shape | Meaning |
-| :---------: | ----------- | ---------- |
-| image | [None, 3, H, W] | Image fed to the network; None is the batch dimension, and H, W are None if the input size is variable |
-| im_shape | [None, 2] | Image size after resize, given as H, W; None is the batch dimension |
-| scale_factor | [None, 2] | Ratio of the network input size to the original image size, given as scale_y, scale_x |
-**Note:** see the TestReader section of the config file for the exact preprocessing.
+### 1. Description of the Exported Model's Inputs and Outputs
+- The input variables and input shapes are as follows:
+  | Input name | Input shape | Meaning |
+  | :---------: | ----------- | ---------- |
+  | image | [None, 3, H, W] | Image fed to the network; None is the batch dimension, and H, W are None if the input size is variable |
+  | im_shape | [None, 2] | Image size after resize, given as H, W; None is the batch dimension |
+  | scale_factor | [None, 2] | Ratio of the network input size to the original image size, given as scale_y, scale_x |
-- The outputs of a model exported via dynamic-to-static conversion in `PaddleDetection` are uniformly:
+  **Note:** see the TestReader section of the config file for the exact preprocessing.
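+
+  For illustration, the three inputs for a single 608x608 image would look like this (a hedged sketch; the shapes follow the table above):
+
+  ```python
+  import numpy as np
+
+  inputs = {
+      'image': np.zeros((1, 3, 608, 608), dtype='float32'),   # preprocessed image
+      'im_shape': np.array([[608., 608.]], dtype='float32'),  # H, W after resize
+      'scale_factor': np.array([[1., 1.]], dtype='float32'),  # scale_y, scale_x
+  }
+  ```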
+
+
+- The outputs of a model exported via dynamic-to-static conversion are uniformly:
  - bbox: the NMS output, with shape [N, 6], where N is the number of predicted boxes and the 6 values are [class_id, score, x1, y1, x2, y2].
  - bbox\_num: the number of predicted boxes per image; for example, with batch_size 2 the output is [N1, N2], meaning the first image has N1 boxes and the second N2, and their sum equals the first dimension N of the NMS output
  - mask: if the network contains a mask branch, the mask output is returned as well
-**Note:** dynamic-to-static export does not support models whose structure contains numpy operations.
+  **Note:** dynamic-to-static export does not support models whose structure contains numpy operations.
-### Launch Arguments
+### 2. Launch Arguments
| FLAG | Purpose | Default | Notes |
|:--------------:|:--------------:|:------------:|:-----------------------------------------:|
| -c | Config file to use | None | |
| --output_dir | Directory to save the model | `./output_inference` | By default the model is saved under `output_inference/<config file name>/` |
-### Example
+### 3. Example
Use the trained model for a quick trial with the following script:
@@ -42,7 +44,7 @@ python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml --
The inference model is exported to the `inference_model/yolov3_darknet53_270e_coco` directory as `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, and `model.pdmodel`.
-### Setting the Input Size of the Exported Model
+### 4. Setting the Input Size of the Exported Model
When running inference with Fluid-TensorRT, TensorRT versions <= 5.1 only support fixed-length input, so the image size of the saved model's `data` layer must match the actual input image size. The Fluid C++ inference engine has no such restriction. Setting `image_shape` in TestReader changes the input image size of the saved model. For example:
diff --git a/deploy/README.md b/deploy/README.md
index a57e810d42fcf1fb21677bfbbd60e4d9bada69d5..b026ded94a399894490d664ab8188691397df78a 100644
--- a/deploy/README.md
+++ b/deploy/README.md
@@ -2,7 +2,7 @@
After training a model that meets your requirements, to deploy it to the chosen platform you need to export the inference model and its config file with `tools/export_model.py`.
The config file used at inference time, named `infer_cfg.yml`, is exported into the same folder.
-## The deployment options currently supported by `PaddleDetection`, grouped by target device:
+## 1. The deployment options currently supported by `PaddleDetection`, grouped by target device:
- Local deployment in `python`, supported in any environment with `python paddle` (`CPU` and `GPU`); there are two ways:
  - Use `tools/infer.py`; this approach depends on the `PaddleDetection` code base.
  - Export the model and use `deploy/python/infer.py`; this approach does not depend on the `PaddleDetection` code base and can be deployed as a single `python` file.
@@ -13,7 +13,7 @@
- Deployment on `NV Jetson` embedded devices
- For `TensorRT` acceleration, see the [TensorRT inference deployment tutorial](TENSOR_RT.md)
-## Model Export
+## 2. Model Export
Use the `tools/export_model.py` script to export the model together with the config file used at deployment time, named `infer_cfg.yml`. The export script is:
```bash
# Export a YOLOv3 model
@@ -29,12 +29,12 @@ python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o
For details on model export, see the [PaddleDetection model export tutorial](EXPORT_MODEL.md).
-## How to Choose the Versions of Deployment Dependencies
+## 3. How to Choose the Versions of Deployment Dependencies
-### Choosing the CUDA, cuDNN, and TensorRT Versions
+### (1) Choosing the CUDA, cuDNN, and TensorRT Versions
Since CUDA, cuDNN, and TensorRT are not necessarily forward compatible, deployment must use an environment identical to the one the Paddle inference library was built with.
-### Choosing the Inference Library and Inference Engine Versions
+### (2) Choosing the Inference Library and Inference Engine Versions
- For C++ deployment on Linux and Windows, the Paddle inference library is required.
  (1) The Paddle website provides inference libraries prebuilt for different platforms and environments, which you can use directly; choose one here: [Paddle inference library](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/build_and_install_lib_cn.html).
@@ -55,7 +55,7 @@ python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o
If the inference library you need is not in the list, you can build it yourself on your platform; see [Building Paddle from source](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/compile/linux-compile.html).
-## Deployment
+## 4. Deployment
- For C++ deployment, first use the cross-platform build tool `CMake` to generate a `Makefile` from `CMakeLists.txt` (supported on `Windows, Linux, NV Jetson`), then compile to produce the executable. You can build directly with the `cpp/scripts/build.sh` script:
```bash
cd cpp
@@ -69,7 +69,7 @@ sh scripts/build.sh
- For mobile deployment, see [Paddle-Lite-Demo](https://github.com/PaddlePaddle/Paddle-Lite-Demo).
-## FAQ
+## 5. FAQ
- 1. Can a model trained with `Paddle 1.8.4` be deployed with `Paddle 2.0`?
  Paddle 2.0 is compatible with Paddle 1.8.4, so in general yes; however, some models (such as SOLOv2) use OPs newly added in Paddle 2.0 and cannot be.
diff --git a/deploy/TENSOR_RT.md b/deploy/TENSOR_RT.md
index 593f7a32549d73a2900639658bd34b345574b6a6..9d97cf294b1ff5ed0950f4fa2fb8cbbb1a60d286 100644
--- a/deploy/TENSOR_RT.md
+++ b/deploy/TENSOR_RT.md
@@ -42,6 +42,13 @@ TensorRT版本<=5时,使用TensorRT预测时,只支持固定尺寸输入。
The preprocessed image size must also match the model input size you set; configure the `target_size` and `keep_ratio` parameters of the `Resize OP` in the `infer_cfg.yml` config file.
+Note: since TensorRT does not support slice operations along the batch dimension, Faster RCNN and Mask RCNN will raise an error with fixed-size input; use dynamic-size input for these two models.
+
+Taking `YOLOv3` as an example, run inference with fixed-size input:
+```
+python python/infer.py --model_dir=../inference_model/yolov3_darknet53_270e_coco/ --image_file=../demo/000000014439_640x640.jpg --use_gpu=True --run_mode=trt_fp32 --run_benchmark=True
+```
+
### 3.3 TensorRT Dynamic-Size Inference
With TensorRT version >= 6, TensorRT inference supports dynamic-size input.
@@ -59,6 +66,11 @@ Paddle预测库关于动态尺寸输入请查看[Paddle CPP预测](https://www.p
**Note: dynamic-size settings in `TensorRT` are 4-dimensional; here only the input image size is configured.**
+Taking `Faster RCNN` as an example, run inference with dynamic-size input:
+```
+python python/infer.py --model_dir=../inference_model/faster_rcnn_r50_fpn_1x_coco/ --image_file=../demo/000000014439.jpg --use_gpu=True --run_mode=trt_fp16 --run_benchmark=True --use_dynamic_shape=True --trt_max_shape=1280 --trt_min_shape=800 --trt_opt_shape=960
+```
+
## 4. FAQ
**Q:** It reports that `tensorrt_op` is missing
**A:** Check that you are using a Paddle Python package or inference library built with TensorRT.
@@ -76,3 +88,6 @@ Paddle预测库关于动态尺寸输入请查看[Paddle CPP预测](https://www.p
**Q:** How do I enable logging?
**A:** Logging is enabled by default in the inference library; just comment out `config.disable_glog_info()` to turn it on.
+
+**Q:** With TensorRT enabled, inference reports "Slice on batch axis is not supported in TensorRT"
+**A:** Try dynamic-size input.
diff --git a/deploy/cpp/CMakeLists.txt b/deploy/cpp/CMakeLists.txt
index 453be9bd742d2f4a6c10ebf3c2908a29d51c397f..0bc0be9aa949dfb89f726555bac16066127502fb 100644
--- a/deploy/cpp/CMakeLists.txt
+++ b/deploy/cpp/CMakeLists.txt
@@ -4,10 +4,10 @@ project(PaddleObjectDetector CXX C)
option(WITH_MKL        "Compile demo with MKL/OpenBlas support, default use MKL." ON)
option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." ON)
option(WITH_TENSORRT "Compile demo with TensorRT." OFF)
-option(USE_PADDLE_20RC1 "Compile demo with paddle_inference_lib 2.0rc1" ON)
SET(PADDLE_DIR "" CACHE PATH "Location of libraries")
+SET(PADDLE_LIB_NAME "" CACHE STRING "libpaddle_inference")
SET(OPENCV_DIR "" CACHE PATH "Location of libraries")
SET(CUDA_LIB "" CACHE PATH "Location of libraries")
SET(CUDNN_LIB "" CACHE PATH "Location of libraries")
@@ -153,41 +153,23 @@ endif()
if (WIN32)
- if (USE_PADDLE_20RC1)
- # 2.0rc1 win32 shared lib name is paddle_fluid.dll and paddle_fluid.lib
- if(EXISTS "${PADDLE_DIR}/paddle/fluid/inference/paddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX}")
+ if(EXISTS "${PADDLE_DIR}/paddle/fluid/inference/${PADDLE_LIB_NAME}${CMAKE_STATIC_LIBRARY_SUFFIX}")
set(DEPS
- ${PADDLE_DIR}/paddle/fluid/inference/paddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
- else()
- set(DEPS
- ${PADDLE_DIR}/paddle/lib/paddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
- endif()
+ ${PADDLE_DIR}/paddle/fluid/inference/${PADDLE_LIB_NAME}${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
- # before 2.0rc1 win32 shared lib name is libpaddle_fluid.dll and libpaddle_fluid.lib
- if(EXISTS "${PADDLE_DIR}/paddle/fluid/inference/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX}")
- set(DEPS
- ${PADDLE_DIR}/paddle/fluid/inference/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
- else()
- set(DEPS
- ${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
- endif()
+ set(DEPS
+ ${PADDLE_DIR}/paddle/lib/${PADDLE_LIB_NAME}${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
if (WIN32)
- if (USE_PADDLE_20RC1)
- # 2.0rc1 win32 shared lib name is paddle_fluid.dll and paddle_fluid.lib
- set(DEPS ${PADDLE_DIR}/paddle/lib/paddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
- else()
- # before 2.0rc1 win32 shared lib name is libpaddle_fluid.dll and libpaddle_fluid.lib
- set(DEPS ${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
- endif()
+ set(DEPS ${PADDLE_DIR}/paddle/lib/${PADDLE_LIB_NAME}${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
- # linux shared lib name is libpaddle_fluid.so
- set(DEPS ${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_SHARED_LIBRARY_SUFFIX})
+ set(DEPS ${PADDLE_DIR}/paddle/lib/${PADDLE_LIB_NAME}${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
+message("PADDLE_LIB_NAME:" ${PADDLE_LIB_NAME})
message("DEPS:" $DEPS)
if (NOT WIN32)
@@ -248,12 +230,12 @@ if (WIN32 AND WITH_MKL)
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./release/mkldnn.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/paddle/lib/${PADDLE_LIB_NAME}.dll ./release/${PADDLE_LIB_NAME}.dll
)
endif()
-
-if (WIN32 AND USE_PADDLE_20RC1)
+if (WIN32)
add_custom_command(TARGET main POST_BUILD
- COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/paddle/lib/paddle_fluid.dll ./release/paddle_fluid.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/paddle/lib/${PADDLE_LIB_NAME}.dll ./release/${PADDLE_LIB_NAME}.dll
)
endif()
diff --git a/deploy/cpp/docs/Jetson_build.md b/deploy/cpp/docs/Jetson_build.md
new file mode 100644
index 0000000000000000000000000000000000000000..d7ece3058b1c918ad52b96ca216f74d5584d32e5
--- /dev/null
+++ b/deploy/cpp/docs/Jetson_build.md
@@ -0,0 +1,188 @@
+# Jetson Platform Build Guide
+
+## Overview
+`NVIDIA Jetson` devices are embedded devices equipped with an `NVIDIA GPU`, and object detection algorithms can be deployed to them. This document is a tutorial for deploying `PaddleDetection` models on `Jetson` hardware.
+
+This document uses `Jetson TX2` hardware with `JetPack 4.3` as the example.
+
+For the `Jetson` platform development guide, see the [NVIDIA Jetson Linux Developer Guide](https://docs.nvidia.com/jetson/l4t/index.html).
+
+## Setting up the Jetson Environment
+For installing the `Jetson` system software, see the [NVIDIA Jetson Linux Developer Guide](https://docs.nvidia.com/jetson/l4t/index.html).
+
+* (1) Check the l4t version of the hardware system:
+```
+cat /etc/nv_tegra_release
+```
+* (2) Choose a `JetPack` version installable on your hardware; for the hardware-to-`JetPack` mapping, see [jetpack-archive](https://developer.nvidia.com/embedded/jetpack-archive).
+
+* (3) Download `JetPack` and flash the system image following the `Preparing a Jetson Developer Kit for Use` section of the [NVIDIA Jetson Linux Developer Guide](https://docs.nvidia.com/jetson/l4t/index.html).
+
+**Note**: pick the `JetPack` version matching your hardware at [jetpack-archive](https://developer.nvidia.com/embedded/jetpack-archive) before flashing.
+
+## Downloading or Building the `Paddle` Inference Library
+This document uses the `Paddle` inference library prebuilt for `JetPack4.3`; pick the version matching your hardware from [Install and build the Linux inference library](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/build_and_install_lib_cn.html).
+
+Here we choose [nv_jetson_cuda10_cudnn7.6_trt6(jetpack4.3)](https://paddle-inference-lib.bj.bcebos.com/2.0.0-nv-jetson-jetpack4.3-all/paddle_inference.tgz): `Paddle` version `2.0.0-rc0`, `CUDA` version `10.0`, `CUDNN` version `7.6`, `TensorRT` version `6`.
+
+If you need to build the `Paddle` library yourself on the `Jetson` platform, see the NVIDIA Jetson embedded-hardware source build section of [Install and build the Linux inference library](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html).
+
+### Step1: Download the code
+
+ `git clone https://github.com/PaddlePaddle/PaddleDetection.git`
+
+**Note**: the `C++` inference code is in the `/root/projects/PaddleDetection/deploy/cpp` directory, which does not depend on any other directory in `PaddleDetection`.
+
+
+### Step2: Download the PaddlePaddle C++ inference library fluid_inference
+
+Extract the downloaded [nv_jetson_cuda10_cudnn7.6_trt6(jetpack4.3)](https://paddle-inference-lib.bj.bcebos.com/2.0.1-nv-jetson-jetpack4.3-all/paddle_inference.tgz).
+
+After download and extraction, the `/root/projects/fluid_inference` directory contains:
+```
+fluid_inference
+├── paddle # core paddle library and headers
+|
+├── third_party # third-party dependencies and headers
+|
+└── version.txt # version and build info
+```
+
+**Note:** the prebuilt `nv-jetson-cuda10-cudnn7.6-trt6` library is built with `GCC` `7.5.0`, while all the others are built with `GCC 4.8.5`. A higher GCC version may cause `ABI` compatibility issues; consider downgrading or [building the inference library yourself](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html).
+
+
+### Step3: Build
+
+The `cmake` build commands are in `scripts/build.sh`; adjust the main parameters to your environment. They are explained below:
+
+Note: on the `TX2` platform, `CUDA` and `CUDNN` must be installed via `JetPack`.
+
+```
+# Whether to use GPU (i.e. whether to use CUDA)
+WITH_GPU=ON
+
+# Whether to use MKL or openblas; must be OFF on TX2
+WITH_MKL=OFF
+
+# Whether to integrate TensorRT (only effective with WITH_GPU=ON)
+WITH_TENSORRT=ON
+
+# TensorRT include path
+TENSORRT_INC_DIR=/usr/include/aarch64-linux-gnu
+
+# TensorRT lib path
+TENSORRT_LIB_DIR=/usr/lib/aarch64-linux-gnu
+
+# Paddle inference library path
+PADDLE_DIR=/path/to/fluid_inference/
+
+# Paddle inference library name
+PADDLE_LIB_NAME=paddle_inference
+
+# Whether to link the Paddle inference library statically
+# When using TensorRT, the Paddle inference library is usually a dynamic library
+WITH_STATIC_LIB=OFF
+
+# CUDA lib path
+CUDA_LIB=/usr/local/cuda-10.0/lib64
+
+# CUDNN lib path
+CUDNN_LIB=/usr/lib/aarch64-linux-gnu
+
+# OPENCV_DIR path
+# For Linux, download https://bj.bcebos.com/paddleseg/deploy/opencv3.4.6gcc4.8ffmpeg.tar.gz2 and extract it into the deps folder
+# For TX2, download https://paddlemodels.bj.bcebos.com/TX2_JetPack4.3_opencv_3.4.10_gcc7.5.0.zip and extract it into the deps folder
+OPENCV_DIR=/path/to/opencv
+
+# Please double-check that all paths above are correct
+
+# No changes needed below
+cmake .. \
+  -DWITH_GPU=${WITH_GPU} \
+  -DWITH_MKL=OFF \
+  -DWITH_TENSORRT=${WITH_TENSORRT} \
+  -DTENSORRT_DIR=${TENSORRT_DIR} \
+  -DPADDLE_DIR=${PADDLE_DIR} \
+  -DWITH_STATIC_LIB=${WITH_STATIC_LIB} \
+  -DCUDA_LIB=${CUDA_LIB} \
+  -DCUDNN_LIB=${CUDNN_LIB} \
+  -DOPENCV_DIR=${OPENCV_DIR} \
+  -DPADDLE_LIB_NAME=${PADDLE_LIB_NAME}
+make
+```
+
+For example, a working configuration might be:
+```
+# Whether to use GPU (i.e. whether to use CUDA)
+WITH_GPU=ON
+
+# Whether to use MKL or openblas
+WITH_MKL=OFF
+
+# Whether to integrate TensorRT (only effective with WITH_GPU=ON)
+WITH_TENSORRT=OFF
+
+# TensorRT include path
+TENSORRT_INC_DIR=/usr/include/aarch64-linux-gnu
+
+# TensorRT lib path
+TENSORRT_LIB_DIR=/usr/lib/aarch64-linux-gnu
+
+# Paddle inference library path
+PADDLE_DIR=/home/nvidia/PaddleDetection_infer/fluid_inference/
+
+# Paddle inference library name
+PADDLE_LIB_NAME=paddle_inference
+
+# Whether to link the Paddle inference library statically
+# When using TensorRT, the Paddle inference library is usually a dynamic library
+WITH_STATIC_LIB=OFF
+
+# CUDA lib path
+CUDA_LIB=/usr/local/cuda-10.0/lib64
+
+# CUDNN lib path
+CUDNN_LIB=/usr/lib/aarch64-linux-gnu/
+```
+
+After setting the main parameters in the script, run the `build` script:
+ ```shell
+ sh ./scripts/build.sh
+ ```
+
+### Step4: Inference and visualization
+After a successful build, the inference entry point is `build/main`; its main command-line arguments are:
+| Argument | Description |
+| ---- | ---- |
+| --model_dir | Path to the exported inference model |
+| --image_path | Path to the image file to predict |
+| --video_path | Path to the video file to predict |
+| --camera_id | ID of the camera to use for prediction; default -1 (no camera) |
+| --use_gpu | Whether to use GPU; 0 or 1 (default 0) |
+| --gpu_id | GPU device id for inference (default 0) |
+| --run_mode | With GPU, default fluid; options: fluid/trt_fp32/trt_fp16/trt_int8 |
+| --run_benchmark | Whether to run inference repeatedly for benchmarking |
+| --output_dir | Folder for output images; default output |
+
+**Note**: if both `video_path` and `image_path` are set, only `video_path` is used.
+
+
+`Example 1`:
+```shell
+# predict the image /root/projects/images/test.jpeg without GPU
+./main --model_dir=/root/projects/models/yolov3_darknet --image_path=/root/projects/images/test.jpeg
+```
+
+The `visualized prediction result` for an image file is saved to `output.jpg` in the current directory.
+
+
+`Example 2`:
+```shell
+# predict the video /root/projects/videos/test.mp4 with GPU
+./main --model_dir=/root/projects/models/yolov3_darknet --video_path=/root/projects/images/test.mp4 --use_gpu=1
+```
+Only `.mp4` videos are currently supported; the `visualized prediction result` is saved to `output.mp4` in the current directory.
+
+
+## Benchmark
+For benchmark results, see [BENCHMARK_INFER](../../BENCHMARK_INFER.md)
diff --git a/deploy/cpp/docs/linux_build.md b/deploy/cpp/docs/linux_build.md
index 1ddb7a150e11e4ec6c5fa1b62d6606fc60719a83..76b961955662d840dc7330c1e3924c07086dd997 100644
--- a/deploy/cpp/docs/linux_build.md
+++ b/deploy/cpp/docs/linux_build.md
@@ -1,10 +1,10 @@
# Linux Platform Build Guide
## Overview
-This document has been tested on `Linux` with `GCC 4.8.5` and `GCC 4.9.4`; to build with a newer G++ you need to rebuild the Paddle inference library, see: [Building the Paddle inference library from source](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html). The bundled opencv library was built with gcc4.8 on ubuntu 16.04; to build on a system other than ubuntu 16.04, build opencv yourself.
+This document has been tested on `Linux` with `GCC 8.2`; to build with another G++ version you need to rebuild the Paddle inference library, see: [Building the Paddle inference library from source](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html). The bundled opencv library was built with gcc4.8 on ubuntu 16.04; to build on a system other than ubuntu 16.04, build opencv yourself.
## Prerequisites
-* G++ 4.8.2 ~ 4.9.4
+* G++ 8.2
* CUDA 9.0 / CUDA 10.0, cudnn 7+ (only needed with the GPU version of the inference library)
* CMake 3.0+
@@ -19,7 +19,7 @@
### Step2: Download the PaddlePaddle C++ inference library fluid_inference
-The PaddlePaddle C++ inference library provides prebuilt versions for different `CPU` and `CUDA` configurations; download the one matching your setup: [C++ inference library download list](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.0-rc1/guides/05_inference_deployment/inference/build_and_install_lib_cn.html)
+The PaddlePaddle C++ inference library provides prebuilt versions for different `CPU` and `CUDA` configurations; download the one matching your setup: [C++ inference library download list](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/build_and_install_lib_cn.html)
After download and extraction, the `/root/projects/fluid_inference` directory contains:
@@ -58,6 +58,9 @@ TENSORRT_LIB_DIR=/path/to/TensorRT/lib
# Paddle inference library path
PADDLE_DIR=/path/to/fluid_inference
+# Paddle inference library name
+PADDLE_LIB_NAME=paddle_inference
+
# CUDA lib path
CUDA_LIB=/path/to/cuda/lib
@@ -76,7 +79,8 @@ cmake .. \
-DPADDLE_DIR=${PADDLE_DIR} \
-DCUDA_LIB=${CUDA_LIB} \
-DCUDNN_LIB=${CUDNN_LIB} \
- -DOPENCV_DIR=${OPENCV_DIR}
+ -DOPENCV_DIR=${OPENCV_DIR} \
+  -DPADDLE_LIB_NAME=${PADDLE_LIB_NAME}
make
```
@@ -98,7 +102,7 @@ make
| --camera_id | ID of the camera to use for prediction; default -1 (no camera) |
| --use_gpu | Whether to use GPU; 0 or 1 (default 0) |
| --gpu_id | GPU device id for inference (default 0) |
-| --run_mode | With GPU, default fluid; options: fluid/trt_fp32/trt_fp16 |
+| --run_mode | With GPU, default fluid; options: fluid/trt_fp32/trt_fp16/trt_int8 |
| --run_benchmark | Whether to run inference repeatedly for benchmarking |
| --output_dir | Folder for output images; default output |
@@ -120,3 +124,6 @@ make
./build/main --model_dir=/root/projects/models/yolov3_darknet --video_path=/root/projects/images/test.mp4 --use_gpu=1
```
Only `.mp4` videos are currently supported; the `visualized prediction result` is saved to `output.mp4` in the current directory.
+
+## Benchmark
+For benchmark results, see [BENCHMARK_INFER](../../BENCHMARK_INFER.md)
diff --git a/deploy/cpp/docs/windows_vs2019_build.md b/deploy/cpp/docs/windows_vs2019_build.md
index 7cfb63a62757225e7db17f7435d4847584c02825..34607b21d1bef79b36ca03fb6642d3976e5c3fef 100644
--- a/deploy/cpp/docs/windows_vs2019_build.md
+++ b/deploy/cpp/docs/windows_vs2019_build.md
@@ -24,7 +24,7 @@ git clone https://github.com/PaddlePaddle/PaddleDetection.git
### Step2: Download the PaddlePaddle C++ inference library fluid_inference
-The PaddlePaddle C++ inference library provides prebuilt versions for different `CPU` and `CUDA` configurations; download the one matching your setup: [C++ inference library download list](https://www.paddlepaddle.org.cn/documentation/docs/zh/2.0-rc1/guides/05_inference_deployment/inference/windows_cpp_inference.html)
+The PaddlePaddle C++ inference library provides prebuilt versions for different `CPU` and `CUDA` configurations; download the one matching your setup: [C++ inference library download list](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/windows_cpp_inference.html)
After extraction, the `D:\projects\fluid_inference` directory contains:
```
@@ -62,18 +62,18 @@ cd D:\projects\PaddleDetection\deploy\cpp
| *CUDNN_LIB | CUDNN library path |
| OPENCV_DIR | OpenCV installation path |
| PADDLE_DIR | Paddle inference library path |
-| USE_PADDLE_20RC1 | Whether to use the 2.0rc1 inference library. With 2.0rc1, the library name changed on windows, and only dynamic-library builds are supported |
+| PADDLE_LIB_NAME | Paddle inference library name |
**Note:** 1. With the `CPU` version of the inference library, uncheck `WITH_GPU`. 2. With the `openblas` version, uncheck `WITH_MKL`.
Run the following command to generate the project files:
```
-cmake . -G "Visual Studio 16 2019" -A x64 -T host=x64 -DWITH_GPU=ON -DWITH_MKL=ON -DCMAKE_BUILD_TYPE=Release -DCUDA_LIB=path_to_cuda_lib -DCUDNN_LIB=path_to_cudnn_lib -DPADDLE_DIR=path_to_paddle_lib -DOPENCV_DIR=path_to_opencv
+cmake . -G "Visual Studio 16 2019" -A x64 -T host=x64 -DWITH_GPU=ON -DWITH_MKL=ON -DCMAKE_BUILD_TYPE=Release -DCUDA_LIB=path_to_cuda_lib -DCUDNN_LIB=path_to_cudnn_lib -DPADDLE_DIR=path_to_paddle_lib -DPADDLE_LIB_NAME=paddle_inference -DOPENCV_DIR=path_to_opencv
```
For example:
```
-cmake . -G "Visual Studio 16 2019" -A x64 -T host=x64 -DWITH_GPU=ON -DWITH_MKL=ON -DCMAKE_BUILD_TYPE=Release -DCUDA_LIB=D:\projects\packages\cuda10_0\lib\x64 -DCUDNN_LIB=D:\projects\packages\cuda10_0\lib\x64 -DPADDLE_DIR=D:\projects\packages\fluid_inference -DOPENCV_DIR=D:\projects\packages\opencv3_4_6
+cmake . -G "Visual Studio 16 2019" -A x64 -T host=x64 -DWITH_GPU=ON -DWITH_MKL=ON -DCMAKE_BUILD_TYPE=Release -DCUDA_LIB=D:\projects\packages\cuda10_0\lib\x64 -DCUDNN_LIB=D:\projects\packages\cuda10_0\lib\x64 -DPADDLE_DIR=D:\projects\packages\fluid_inference -DPADDLE_LIB_NAME=paddle_inference -DOPENCV_DIR=D:\projects\packages\opencv3_4_6
```
3. Build
@@ -97,7 +97,7 @@ cd D:\projects\PaddleDetection\deploy\cpp\out\build\x64-Release
| --camera_id | ID of the camera to use for prediction; default -1 (no camera) |
| --use_gpu | Whether to use GPU; 0 or 1 (default 0) |
| --gpu_id | GPU device id for inference (default 0) |
-| --run_mode | With GPU, default fluid; options: fluid/trt_fp32/trt_fp16 |
+| --run_mode | With GPU, default fluid; options: fluid/trt_fp32/trt_fp16/trt_int8 |
| --run_benchmark | Whether to run inference repeatedly for benchmarking |
| --output_dir | Folder for output images; default output |
@@ -125,18 +125,4 @@ cd D:\projects\PaddleDetection\deploy\cpp\out\build\x64-Release
## Benchmark
-Test environment: Windows 10 Pro; CPU: I9-9820X; GPU: GTX 2080 Ti; Paddle inference library: 1.8.4; CUDA: 10.0; CUDNN: 7.4.
-
-The first 100 warmup rounds are excluded; the average over 100 rounds is reported in ms/image, counting only model run time and excluding data processing and copying.
-
-
-| Model | AnalysisPredictor(ms) | Input |
-|---|----|---|
-| YOLOv3-MobileNetv1 | 41.51 | 608*608
-| faster_rcnn_r50_1x | 194.47 | 1333*1333
-| faster_rcnn_r50_vd_fpn_2x | 43.35 | 1344*1344
-| mask_rcnn_r50_fpn_1x | 96.96 | 1344*1344
-| mask_rcnn_r50_vd_fpn_2x | 97.66 | 1344*1344
-| ppyolo_r18vd | 5.54 | 320*320
-| ppyolo_2x | 56.93 | 608*608
-| ttfnet_darknet | 36.17 | 512*512
+For benchmark results, see [BENCHMARK_INFER](../../BENCHMARK_INFER.md)
diff --git a/deploy/cpp/include/preprocess_op.h b/deploy/cpp/include/preprocess_op.h
index e48639b553867dff859cef4816558df7654edcfb..26a91cc9eb74008919cbac12e736afff6bb9ad72 100644
--- a/deploy/cpp/include/preprocess_op.h
+++ b/deploy/cpp/include/preprocess_op.h
@@ -38,7 +38,7 @@ class ImageBlob {
// Buffer for image data after preprocessing
  std::vector<float> im_data_;
// in net data shape(after pad)
-  std::vector<int> in_net_shape_;
+  std::vector<float> in_net_shape_;
// Evaluation image width and height
  // std::vector<float> eval_im_size_f_;
// Scale factor for image size to origin image size
diff --git a/deploy/cpp/scripts/build.sh b/deploy/cpp/scripts/build.sh
index a32b1d383256f0d775dba41d16b99468558e9135..ed901d01462779ef1f2ad74c80d27f5d8067ad17 100644
--- a/deploy/cpp/scripts/build.sh
+++ b/deploy/cpp/scripts/build.sh
@@ -7,17 +7,17 @@ WITH_MKL=ON
# Whether to integrate TensorRT (only effective with WITH_GPU=ON)
WITH_TENSORRT=OFF
-# Whether to use the 2.0rc1 inference library
-USE_PADDLE_20RC1=ON
+# paddle inference lib name; it differs across platforms and library versions, so check the name of the `lib` under the `paddle_inference/lib/` folder of the inference library you downloaded
+PADDLE_LIB_NAME=libpaddle_inference
# TensorRT include path
-TENSORRT_INC_DIR=/path/to/tensorrt/lib
+TENSORRT_INC_DIR=/path/to/tensorrt/include
# TensorRT lib path
-TENSORRT_LIB_DIR=/path/to/tensorrt/include
+TENSORRT_LIB_DIR=/path/to/tensorrt/lib
# Paddle inference library path
-PADDLE_DIR=/path/to/fluid_inference/
+PADDLE_DIR=/path/to/paddle_inference
# CUDA lib path
CUDA_LIB=/path/to/cuda/lib
@@ -72,7 +72,8 @@ cmake .. \
-DWITH_STATIC_LIB=${WITH_STATIC_LIB} \
-DCUDA_LIB=${CUDA_LIB} \
-DCUDNN_LIB=${CUDNN_LIB} \
- -DOPENCV_DIR=${OPENCV_DIR}
+ -DOPENCV_DIR=${OPENCV_DIR} \
+ -DPADDLE_LIB_NAME=${PADDLE_LIB_NAME}
make
echo "make finished!"
diff --git a/deploy/cpp/src/main.cc b/deploy/cpp/src/main.cc
index 6fb49769288f12f7c71345f03f689d6a67757f19..d2211659daa0c8ef1830904eef0ceb0e0bbfda7c 100644
--- a/deploy/cpp/src/main.cc
+++ b/deploy/cpp/src/main.cc
@@ -36,7 +36,7 @@ DEFINE_string(image_path, "", "Path of input image");
DEFINE_string(video_path, "", "Path of input video");
DEFINE_bool(use_gpu, false, "Infering with GPU or CPU");
DEFINE_bool(use_camera, false, "Use camera or not");
-DEFINE_string(run_mode, "fluid", "Mode of running(fluid/trt_fp32/trt_fp16)");
+DEFINE_string(run_mode, "fluid", "Mode of running(fluid/trt_fp32/trt_fp16/trt_int8)");
DEFINE_int32(gpu_id, 0, "Device id of GPU to execute");
DEFINE_int32(camera_id, -1, "Device id of camera to predict");
DEFINE_bool(run_benchmark, false, "Whether to predict a image_file repeatedly for benchmark");
@@ -207,9 +207,6 @@ int main(int argc, char** argv) {
return -1;
}
  // Load model and create an object detector
-  const std::vector<int> trt_min_shape = {1, FLAGS_trt_min_shape, FLAGS_trt_min_shape};
-  const std::vector<int> trt_max_shape = {1, FLAGS_trt_max_shape, FLAGS_trt_max_shape};
-  const std::vector<int> trt_opt_shape = {1, FLAGS_trt_opt_shape, FLAGS_trt_opt_shape};
PaddleDetection::ObjectDetector det(FLAGS_model_dir, FLAGS_use_gpu, FLAGS_run_mode,
FLAGS_gpu_id, FLAGS_use_dynamic_shape, FLAGS_trt_min_shape,
FLAGS_trt_max_shape, FLAGS_trt_opt_shape);
diff --git a/deploy/cpp/src/object_detector.cc b/deploy/cpp/src/object_detector.cc
index 969425dcedc1ec532951d7919aff215026a423ff..95b8dbb2217cd9dd28761f82724214a68a1a0ebe 100644
--- a/deploy/cpp/src/object_detector.cc
+++ b/deploy/cpp/src/object_detector.cc
@@ -68,9 +68,9 @@ void ObjectDetector::LoadModel(const std::string& model_dir,
// set use dynamic shape
if (use_dynamic_shape) {
    // set DynamicShape for image tensor
-      const std::vector<int> min_input_shape = {1, trt_min_shape, trt_min_shape};
-      const std::vector<int> max_input_shape = {1, trt_max_shape, trt_max_shape};
-      const std::vector<int> opt_input_shape = {1, trt_opt_shape, trt_opt_shape};
+      const std::vector<int> min_input_shape = {1, 3, trt_min_shape, trt_min_shape};
+      const std::vector<int> max_input_shape = {1, 3, trt_max_shape, trt_max_shape};
+      const std::vector<int> opt_input_shape = {1, 3, trt_opt_shape, trt_opt_shape};
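+      // TensorRT dynamic shape settings are full 4-D NCHW, so the channel dimension (3) must be included explicitly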
      const std::map<std::string, std::vector<int>> map_min_input_shape = {{"image", min_input_shape}};
      const std::map<std::string, std::vector<int>> map_max_input_shape = {{"image", max_input_shape}};
      const std::map<std::string, std::vector<int>> map_opt_input_shape = {{"image", opt_input_shape}};
diff --git a/deploy/cpp/src/preprocess_op.cc b/deploy/cpp/src/preprocess_op.cc
index 8edd3eb1f2b7957649f7075bbcd20ba582c841a7..6a2be41799620ec681aed18db05ab5df4b281052 100644
--- a/deploy/cpp/src/preprocess_op.cc
+++ b/deploy/cpp/src/preprocess_op.cc
@@ -26,8 +26,8 @@ void InitInfo::Run(cv::Mat* im, ImageBlob* data) {
};
data->scale_factor_ = {1., 1.};
data->in_net_shape_ = {
-      static_cast<int>(im->rows),
-      static_cast<int>(im->cols)
+      static_cast<float>(im->rows),
+      static_cast<float>(im->cols)
};
}
@@ -63,12 +63,12 @@ void Permute::Run(cv::Mat* im, ImageBlob* data) {
void Resize::Run(cv::Mat* im, ImageBlob* data) {
auto resize_scale = GenerateScale(*im);
data->im_shape_ = {
-      static_cast<int>(im->cols * resize_scale.first),
-      static_cast<int>(im->rows * resize_scale.second)
+      static_cast<float>(im->cols * resize_scale.first),
+      static_cast<float>(im->rows * resize_scale.second)
};
data->in_net_shape_ = {
-      static_cast<int>(im->cols * resize_scale.first),
-      static_cast<int>(im->rows * resize_scale.second)
+      static_cast<float>(im->cols * resize_scale.first),
+      static_cast<float>(im->rows * resize_scale.second)
};
cv::resize(
*im, *im, cv::Size(), resize_scale.first, resize_scale.second, interp_);
@@ -126,8 +126,8 @@ void PadStride::Run(cv::Mat* im, ImageBlob* data) {
cv::BORDER_CONSTANT,
cv::Scalar(0));
data->in_net_shape_ = {
-      static_cast<int>(im->rows),
-      static_cast<int>(im->cols),
+      static_cast<float>(im->rows),
+      static_cast<float>(im->cols),
};
}
diff --git a/deploy/python/README.md b/deploy/python/README.md
index d8874ea6260b79cb7411ce303f3f1fb06c0b2d36..e0a5a32b03c8fc7854833012efba7becb325de3a 100644
--- a/deploy/python/README.md
+++ b/deploy/python/README.md
@@ -3,7 +3,7 @@
For Python inference you can use `tools/infer.py`, which depends on the PaddleDetection source code; or you can follow this tutorial: export the model first, then run inference with a standalone file.
-This tutorial uses AnalysisPredictor for high-performance inference on an [exported model](https://github.com/PaddlePaddle/PaddleDetection/tree/dygraph/deploy/EXPORT_MODEL.md).
+This tutorial uses AnalysisPredictor for high-performance inference on an [exported model](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/EXPORT_MODEL.md).
In PaddlePaddle, the inference engine and the training engine use different underlying optimizations. The inference engine uses AnalysisPredictor, which is optimized specifically for inference; it is the Python interface to the [C++ inference library](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/native_infer.html), applies multiple graph optimizations to the model, and reduces unnecessary memory copies. For users with high performance requirements when deploying trained models, we provide this inference script independent of PaddleDetection for direct integration.
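
A minimal sketch of what AnalysisPredictor usage looks like through this Python API (the model path below is a placeholder; the classes are `paddle.inference.Config` and `paddle.inference.create_predictor`):

```python
from paddle.inference import Config, create_predictor

config = Config('output_inference/yolov3_darknet53_270e_coco/model.pdmodel',
                'output_inference/yolov3_darknet53_270e_coco/model.pdiparams')
config.enable_use_gpu(200, 0)  # initial GPU memory (MB), device id
config.switch_ir_optim(True)   # enable graph optimization passes
predictor = create_predictor(config)
```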
@@ -15,7 +15,7 @@ Python预测可以使用`tools/infer.py`,此种方式依赖PaddleDetection源
## 1. Export the Inference Model
-During training PaddleDetection keeps both the forward network and the optimizer-related parameters, while deployment only needs the forward parameters; for details see [Export Model](https://github.com/PaddlePaddle/PaddleDetection/tree/dygraph/deploy/EXPORT_MODEL.md)
+During training PaddleDetection keeps both the forward network and the optimizer-related parameters, while deployment only needs the forward parameters; for details see [Export Model](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/EXPORT_MODEL.md)
The export directory contains four files: `infer_cfg.yml`, `model.pdiparams`, `model.pdiparams.info`, and `model.pdmodel`.
@@ -43,7 +43,7 @@ python deploy/python/infer.py --model_dir=/path/to/models --image_file=/path/to/
| --video_file | Option | Video to predict |
| --camera_id | Option | Camera ID for prediction, default -1 (no camera; can be set to 0 - (number of cameras - 1)); press `q` in the visualization window during prediction to quit and write the result to output/output.mp4 |
| --use_gpu | No | Whether to use GPU, default False |
-| --run_mode | No | With GPU, default fluid; options: fluid/trt_fp32/trt_fp16 |
+| --run_mode | No | With GPU, default fluid; options: fluid/trt_fp32/trt_fp16/trt_int8 |
| --threshold | No | Score threshold for predictions, default 0.5 |
| --output_dir | No | Root directory for visualized results, default output/ |
| --run_benchmark | No | Whether to run the benchmark; --image_file must also be set |
diff --git a/deploy/python/infer.py b/deploy/python/infer.py
index b10576132b6d3f979f59ec720586ccee5f7534e2..5bfd5455410183edf9151b17b0b8b47f85cb74fc 100644
--- a/deploy/python/infer.py
+++ b/deploy/python/infer.py
@@ -321,7 +321,7 @@ def load_predictor(model_dir,
Args:
model_dir (str): root path of __model__ and __params__
use_gpu (bool): whether use gpu
- run_mode (str): mode of running(fluid/trt_fp32/trt_fp16)
+ run_mode (str): mode of running(fluid/trt_fp32/trt_fp16/trt_int8)
use_dynamic_shape (bool): use dynamic shape or not
trt_min_shape (int): min shape for dynamic shape in trt
trt_max_shape (int): max shape for dynamic shape in trt
@@ -335,11 +335,6 @@ def load_predictor(model_dir,
raise ValueError(
"Predict by TensorRT mode: {}, expect use_gpu==True, but use_gpu == {}"
.format(run_mode, use_gpu))
- if run_mode == 'trt_int8' and not os.path.exists(
- os.path.join(model_dir, '_opt_cache')):
- raise ValueError(
- "TensorRT int8 must calibration first, and model_dir must has _opt_cache dir"
- )
use_calib_mode = True if run_mode == 'trt_int8' else False
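    # with use_calib_mode=True, TensorRT generates the INT8 calibration table during inference, so a prebuilt _opt_cache directory is no longer required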
config = Config(
os.path.join(model_dir, 'model.pdmodel'),
@@ -512,7 +507,7 @@ if __name__ == '__main__':
"--run_mode",
type=str,
default='fluid',
- help="mode of running(fluid/trt_fp32/trt_fp16)")
+ help="mode of running(fluid/trt_fp32/trt_fp16/trt_int8)")
parser.add_argument(
"--use_gpu",
type=ast.literal_eval,
diff --git a/deploy/python/trt_int8_calib.py b/deploy/python/trt_int8_calib.py
deleted file mode 100644
index 32f0e0ddea30a1790428bf67ed0348a60ec74c39..0000000000000000000000000000000000000000
--- a/deploy/python/trt_int8_calib.py
+++ /dev/null
@@ -1,300 +0,0 @@
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import os
-import argparse
-import time
-import yaml
-import ast
-from functools import reduce
-
-from PIL import Image
-import cv2
-import numpy as np
-import glob
-import paddle
-from preprocess import preprocess, Resize, NormalizeImage, Permute, PadStride
-from visualize import visualize_box_mask
-from paddle.inference import Config
-from paddle.inference import create_predictor
-
-# Global dictionary
-SUPPORT_MODELS = {
- 'YOLO',
- 'RCNN',
- 'SSD',
- 'FCOS',
- 'SOLOv2',
- 'TTFNet',
-}
-
-
-class Detector(object):
- """
- Args:
- config (object): config of model, defined by `Config(model_dir)`
- model_dir (str): root path of model.pdiparams, model.pdmodel and infer_cfg.yml
- use_gpu (bool): whether use gpu
- """
-
- def __init__(self, pred_config, model_dir, use_gpu=False):
- self.pred_config = pred_config
- self.predictor = load_predictor(
- model_dir,
- min_subgraph_size=self.pred_config.min_subgraph_size,
- use_gpu=use_gpu)
-
- def preprocess(self, im):
- preprocess_ops = []
- for op_info in self.pred_config.preprocess_infos:
- new_op_info = op_info.copy()
- op_type = new_op_info.pop('type')
- preprocess_ops.append(eval(op_type)(**new_op_info))
- im, im_info = preprocess(im, preprocess_ops,
- self.pred_config.input_shape)
- inputs = create_inputs(im, im_info)
- return inputs
-
- def postprocess(self, np_boxes, np_masks, inputs, threshold=0.5):
- # postprocess output of predictor
- results = {}
- if self.pred_config.arch in ['Face']:
- h, w = inputs['im_shape']
- scale_y, scale_x = inputs['scale_factor']
- w, h = float(h) / scale_y, float(w) / scale_x
- np_boxes[:, 2] *= h
- np_boxes[:, 3] *= w
- np_boxes[:, 4] *= h
- np_boxes[:, 5] *= w
- results['boxes'] = np_boxes
- if np_masks is not None:
- results['masks'] = np_masks
- return results
-
- def predict(self,
- image,
- threshold=0.5,
- warmup=0,
- repeats=1,
- run_benchmark=False):
- '''
- Args:
- image (str/np.ndarray): path of image/ np.ndarray read by cv2
- threshold (float): threshold of predicted box' score
- Returns:
- results (dict): include 'boxes': np.ndarray: shape:[N,6], N: number of box,
- matix element:[class, score, x_min, y_min, x_max, y_max]
- MaskRCNN's results include 'masks': np.ndarray:
- shape: [N, im_h, im_w]
- '''
- inputs = self.preprocess(image)
- np_boxes, np_masks = None, None
- input_names = self.predictor.get_input_names()
- for i in range(len(input_names)):
- input_tensor = self.predictor.get_input_handle(input_names[i])
- input_tensor.copy_from_cpu(inputs[input_names[i]])
-
- for i in range(warmup):
- self.predictor.run()
- output_names = self.predictor.get_output_names()
- boxes_tensor = self.predictor.get_output_handle(output_names[0])
- np_boxes = boxes_tensor.copy_to_cpu()
- if self.pred_config.mask:
- masks_tensor = self.predictor.get_output_handle(output_names[2])
- np_masks = masks_tensor.copy_to_cpu()
-
- t1 = time.time()
- for i in range(repeats):
- self.predictor.run()
- output_names = self.predictor.get_output_names()
- boxes_tensor = self.predictor.get_output_handle(output_names[0])
- np_boxes = boxes_tensor.copy_to_cpu()
- if self.pred_config.mask:
- masks_tensor = self.predictor.get_output_handle(output_names[2])
- np_masks = masks_tensor.copy_to_cpu()
- t2 = time.time()
- ms = (t2 - t1) * 1000.0 / repeats
- print("Inference: {} ms per batch image".format(ms))
-
- # do not perform postprocess in benchmark mode
- results = []
- if not run_benchmark:
- if reduce(lambda x, y: x * y, np_boxes.shape) < 6:
- print('[WARNNING] No object detected.')
- results = {'boxes': np.array([])}
- else:
- results = self.postprocess(
- np_boxes, np_masks, inputs, threshold=threshold)
-
- return results
-
-
-def create_inputs(im, im_info):
- """generate input for different model type
- Args:
- im (np.ndarray): image (np.ndarray)
- im_info (dict): info of image
- model_arch (str): model type
- Returns:
- inputs (dict): input of model
- """
- inputs = {}
- inputs['image'] = np.array((im, )).astype('float32')
- inputs['im_shape'] = np.array((im_info['im_shape'], )).astype('float32')
- inputs['scale_factor'] = np.array(
- (im_info['scale_factor'], )).astype('float32')
-
- return inputs
-
-
-class PredictConfig():
- """set config of preprocess, postprocess and visualize
- Args:
- model_dir (str): root path of model.yml
- """
-
- def __init__(self, model_dir):
- # parsing Yaml config for Preprocess
- deploy_file = os.path.join(model_dir, 'infer_cfg.yml')
- with open(deploy_file) as f:
- yml_conf = yaml.safe_load(f)
- self.check_model(yml_conf)
- self.arch = yml_conf['arch']
- self.preprocess_infos = yml_conf['Preprocess']
- self.min_subgraph_size = yml_conf['min_subgraph_size']
- self.labels = yml_conf['label_list']
- self.mask = False
- if 'mask' in yml_conf:
- self.mask = yml_conf['mask']
- self.input_shape = yml_conf['image_shape']
- self.print_config()
-
- def check_model(self, yml_conf):
- """
- Raises:
- ValueError: loaded model not in supported model type
- """
- for support_model in SUPPORT_MODELS:
- if support_model in yml_conf['arch']:
- return True
- raise ValueError("Unsupported arch: {}, expect {}".format(yml_conf[
- 'arch'], SUPPORT_MODELS))
-
- def print_config(self):
- print('----------- Model Configuration -----------')
- print('%s: %s' % ('Model Arch', self.arch))
- print('%s: ' % ('Transform Order'))
- for op_info in self.preprocess_infos:
- print('--%s: %s' % ('transform op', op_info['type']))
- print('--------------------------------------------')
-
-
-def load_predictor(model_dir, batch_size=1, use_gpu=False, min_subgraph_size=3):
- """set AnalysisConfig, generate AnalysisPredictor
- Args:
- model_dir (str): root path of __model__ and __params__
- use_gpu (bool): whether use gpu
- Returns:
- predictor (PaddlePredictor): AnalysisPredictor
- Raises:
- ValueError: predict by TensorRT need use_gpu == True.
- """
- run_mode = 'trt_int8'
- if not use_gpu and not run_mode == 'fluid':
- raise ValueError(
- "Predict by TensorRT mode: {}, expect use_gpu==True, but use_gpu == {}"
- .format(run_mode, use_gpu))
- config = Config(
- os.path.join(model_dir, 'model.pdmodel'),
- os.path.join(model_dir, 'model.pdiparams'))
- precision_map = {
- 'trt_int8': Config.Precision.Int8,
- 'trt_fp32': Config.Precision.Float32,
- 'trt_fp16': Config.Precision.Half
- }
- if use_gpu:
- # initial GPU memory(M), device ID
- config.enable_use_gpu(200, 0)
- # optimize graph and fuse op
- config.switch_ir_optim(True)
- else:
- config.disable_gpu()
-
- if run_mode in precision_map.keys():
- config.enable_tensorrt_engine(
- workspace_size=1 << 10,
- max_batch_size=batch_size,
- min_subgraph_size=min_subgraph_size,
- precision_mode=precision_map[run_mode],
- use_static=False,
- use_calib_mode=True)
-
- # disable print log when predict
- config.disable_glog_info()
- # enable shared memory
- config.enable_memory_optim()
- # disable feed, fetch OP, needed by zero_copy_run
- config.switch_use_feed_fetch_ops(False)
- predictor = create_predictor(config)
- return predictor
-
-
-def print_arguments(args):
- print('----------- Running Arguments -----------')
- for arg, value in sorted(vars(args).items()):
- print('%s: %s' % (arg, value))
- print('------------------------------------------')
-
-
-def predict_image_dir(detector):
- for image_file in glob.glob(FLAGS.image_dir + '/*.jpg'):
- print('image_file is', image_file)
- results = detector.predict(image_file, threshold=0.5)
-
-
-def main():
- pred_config = PredictConfig(FLAGS.model_dir)
- detector = Detector(pred_config, FLAGS.model_dir, use_gpu=FLAGS.use_gpu)
- # predict from image
- if FLAGS.image_dir != '':
- predict_image_dir(detector)
-
-
-if __name__ == '__main__':
- paddle.enable_static()
- parser = argparse.ArgumentParser(description=__doc__)
- parser.add_argument(
- "--model_dir",
- type=str,
- default=None,
- help=("Directory include:'model.pdiparams', 'model.pdmodel', "
- "'infer_cfg.yml', created by tools/export_model.py."),
- required=True)
- parser.add_argument(
- "--image_dir", type=str, default='', help="Directory of image file.")
- parser.add_argument(
- "--use_gpu",
- type=ast.literal_eval,
- default=False,
- help="Whether to predict with GPU.")
- print('err?')
- parser.add_argument(
- "--output_dir",
- type=str,
- default="output",
- help="Directory of output visualization files.")
- FLAGS = parser.parse_args()
- print_arguments(FLAGS)
-
- main()
diff --git a/deploy/serving/README.md b/deploy/serving/README.md
index 812c03044ee8cda82acaea557a125a464c38df4c..38393bf36102a862e829dda345c94a4a57d44895 100644
--- a/deploy/serving/README.md
+++ b/deploy/serving/README.md
@@ -13,7 +13,7 @@ python tools/infer.py -c --infer_img=demo/000000014439.jpg -o use_gpu=True weig
Install it following the installation tutorial in [PaddleServing](https://github.com/PaddlePaddle/Serving/tree/v0.5.0)
## 3. Export the Model
-During training PaddleDetection keeps both the forward network and the optimizer-related parameters, while deployment only needs the forward parameters; for details see [Export Model](https://github.com/PaddlePaddle/PaddleDetection/blob/master/docs/advanced_tutorials/deploy/EXPORT_MODEL.md)
+During training PaddleDetection keeps both the forward network and the optimizer-related parameters, while deployment only needs the forward parameters; for details see [Export Model](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.0/docs/advanced_tutorials/deploy/EXPORT_MODEL.md)
```
python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o weights=weights/yolov3_darknet53_270e_coco.pdparams --export_serving_model=True
diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
new file mode 100644
index 0000000000000000000000000000000000000000..1f0bc5d67529f644cd54d6b2e05039f9d59a054c
--- /dev/null
+++ b/docs/CHANGELOG.md
@@ -0,0 +1,204 @@
+# Release Notes
+
+## Latest Release
+
+### 2.0(04.15/2021)
+
+ **Note:** starting with version 2.0, the dynamic graph implementation is the default for PaddleDetection; the former `dygraph` directory has become the root directory, and the former static graph implementation has moved to the `static` directory.
+
 - Richer dynamic graph models:
+    - Released the PP-YOLOv2 and PP-YOLO tiny models; PP-YOLOv2 reaches 49.5% on the COCO test set and 68.9 FPS on V100
+    - Released the rotated-box detection model S2ANet
+    - Released the practical two-stage model PSS-Det
+    - Released the face detection model Blazeface
+
+ - New base modules:
+    - Added the SENet, GhostNet, and Res2Net backbones
+    - Added VisualDL training visualization support
+    - Added per-category AP computation and PR-curve plotting
+    - YOLO-series models support the NHWC data format
+
+ - Inference deployment:
+    - Released inference benchmark data for the main models
+    - Adapted to TensorRT 6; supports TensorRT dynamic-size input and TensorRT int8 quantized inference
+    - Completed python/cpp/TRT inference deployment on Linux, Windows, and NV Jetson for 7 model families, including PP-YOLO, YOLOv3, SSD, TTFNet, FCOS, and Faster RCNN
+
+ - Detection model compression:
+    - Distillation: added dynamic graph distillation support and released a distilled YOLOv3-MobileNetV1 model
+    - Combined strategies: added a dynamic graph pruning + distillation compression scheme and released a pruned + distilled YOLOv3-MobileNetV1 model
+    - Bug fix: fixed dynamic graph quantized model export
+
+ - Documentation:
+    - Added dynamic graph documentation in English: home page, getting started, quick start, models and algorithms, adding new datasets, and more
+    - Added dynamic graph installation docs in Chinese and English
+    - Added dynamic graph config templates and option reference docs for the RCNN and YOLO series
+
+
+## Release History
+
+### 2.0-rc(02.23/2021)
+ - Richer dynamic graph models:
+    - Improved RCNN model composition and training; RCNN-series accuracy improved (requires Paddle develop or 2.0.1)
+    - Added the SSDLite, FCOS, TTFNet, and SOLOv2 model families
+    - Added vertical-domain detection models for pedestrians and vehicles
+
+ - New dynamic graph base modules:
+    - Added the MobileNetV3 and HRNet backbones
+    - Improved the RoIAlign computation; RCNN-series accuracy improved (requires Paddle develop or 2.0.1)
+    - Added Synchronized Batch Norm support
+    - Added Modulated Deformable Convolution support
+
+ - Inference deployment:
+    - Released dynamic graph python, C++, and Serving deployment solutions and docs, covering Faster RCNN, Mask RCNN, YOLOv3, PP-YOLO, SSD, TTFNet, FCOS, SOLOv2, and more
+    - Dynamic graph deployment supports FP32 and FP16 inference acceleration in TensorRT mode
+
+ - Detection model compression:
+    - Pruning: added dynamic graph pruning support and released a pruned YOLOv3-MobileNetV1 model
+    - Quantization: added dynamic graph quantization support and released quantized YOLOv3-MobileNetV1 and YOLOv3-MobileNetV3 models
+
+ - Documentation:
+    - Added dynamic graph getting-started docs: installation, quick start, data preparation, and the train/eval/infer workflow
+    - Added dynamic graph advanced docs: model compression and inference deployment
+    - Added dynamic graph model zoo docs
+
+### v2.0-beta(12.20/2020)
+ - Dynamic graph support:
+    - Supports the Faster-RCNN, Mask-RCNN, FPN, Cascade Faster/Mask RCNN, YOLOv3, and SSD models (trial release).
+ - Model improvements:
+    - Updated the PP-YOLO MobileNetV3 large and small models with higher accuracy, and added pruned and distilled variants.
+ - New features:
+    - Supports visualizing preprocessed images with VisualDL.
+
+ - Bug fixes:
+    - Fixed the BlazeFace face keypoint prediction bug.
+
+
+### v0.5.0(11/2020)
+ - Richer models:
+    - Released the SOLOv2 series; SOLOv2-Light-R50-VD-DCN-FPN reaches 38.6 FPS on a single V100 (24% faster) and 38.8% accuracy on the COCO val set (+2.4 absolute points).
+    - Added an Android mobile detection demo, including SSD and YOLO series models, installable by scanning a QR code.
+
+ - Mobile model optimization:
+    - Added the new PACT quantization strategy; YOLOv3-MobileNetV3 improves by 0.7% on COCO compared with ordinary quantization.
+
+ - Usability and components:
+    - Enhanced the generate_proposal_labels operator to avoid the risk of NaNs in models.
+    - Fixed several python and C++ inference issues under deploy.
+    - Unified the evaluation flow for the COCO and VOC datasets; supports per-category AP and P-R curves.
+    - PP-YOLO supports rectangular input images.
+
+ - Documentation:
+    - Added an end-to-end object detection tutorial and a Jetson platform deployment tutorial.
+
+
+### v0.4.0(07/2020)
+ - Richer models:
+    - Released the PP-YOLO model: 45.2% on COCO and 72.9 FPS on a single V100, beating the YOLOv4 model in both accuracy and speed.
+    - Added the TTFNet model; the base version is aligned with competing implementations at 32.9% on COCO.
+    - Added the HTC model; the base version is aligned with competing implementations at 42.2% on COCO.
+    - Added the BlazeFace face keypoint detection model: 85.2% on the Easy set of Wider-Face.
+    - Added the ACFPN model: 39.6% on COCO.
+    - Released a general server-side detection model covering 676 classes; with the same strategy on COCO it reaches 49.4% mAP at 19.5 FPS on V100.
+
+ - Mobile model optimization:
+    - Added optimized SSDLite models, including a GhostNet backbone and an FPN component, improving accuracy by 0.5%-1.5%.
+
+ - Usability and components:
+    - Added the GridMask and RandomErasing data augmentation methods.
+    - Added Matrix NMS support.
+    - Added EMA (Exponential Moving Average) training support.
+    - Added multi-machine training; two machines average an 80% speedup over one, with multi-machine support still to be validated further.
+
+### v0.3.0(05/2020)
+ - Richer models:
+    - Added the EfficientDet-D0 model, with speed and accuracy better than competing implementations.
+    - Added a YOLOv4 inference model with accuracy aligned to the reference; added YOLOv4 fine-tuning on the Pascal VOC dataset, reaching 85.5%.
+    - Added a MobileNetV3 backbone for YOLOv3, reaching 31.6% on COCO.
+    - Added the anchor-free model FCOS, more accurate than competing implementations.
+    - Added the anchor-free model CornernetSqueeze, more accurate than competing implementations; the optimized model reaches 38.2% on COCO (+3.7%) and is 5% faster than YOLOv3-Darknet53.
+    - Added the practical server-side model CascadeRCNN-ResNet50vd, with speed and accuracy better than the competing EfficientDet.
+
+ - Three mobile model families:
+    - SSDLite series: SSDLite-Mobilenetv3 small/large models, more accurate than competing models.
+    - YOLOv3 mobile solution: the compressed YOLOv3-MobileNetv3 model is 3.5x faster, leading competing SSDLite models in both speed and accuracy.
+    - RCNN mobile solution: CascadeRCNN-MobileNetv3, after a series of optimizations, ships with 320x320 and 640x640 input variants offering a strong speed/accuracy trade-off.
+
+ - Inference deployment refactor:
+    - Added a Python deployment flow supporting the RCNN, YOLO, SSD, RetinaNet, and face model series, including video inference.
+    - Refactored C++ deployment for better usability.
+
+ - Usability and components:
+    - Added AutoAugment data augmentation.
+    - Upgraded the detection library documentation structure.
+    - Supports automatic shape matching for transfer learning.
+    - Reduced memory usage of the mask branch during evaluation.
+
+### v0.2.0(02/2020)
+ - New models:
+    - Added a CBResNet-based model.
+    - Added the LibraRCNN model.
+    - Further improved YOLOv3 accuracy to 43.2% on COCO, +1.4% over the previous release.
+ - New base modules:
+    - Backbones: added CBResNet.
+    - Loss: the YOLOv3 loss supports fine-grained op composition.
+    - Regularization: added the DropBlock module.
+ - Improvements:
+    - Sped up YOLOv3 data preprocessing; overall training is 40% faster.
+    - Improved the data preprocessing logic and usability.
+    - Added face detection inference benchmark data.
+    - Added a Python API inference example for the C++ inference engine.
+ - Detection model compression:
+    - Pruning: released MobileNet-YOLOv3 pruning schemes and models (VOC: FLOPs -69.6%, mAP +1.4%; COCO: FLOPs -28.8%, mAP +0.9%); released ResNet50vd-dcn-YOLOv3 pruning schemes and models (COCO: FLOPs -18.4%, mAP +0.8%).
+    - Distillation: released MobileNet-YOLOv3 distillation schemes and models (VOC: mAP +2.8%; COCO: mAP +2.1%).
+    - Quantization: released quantized YOLOv3-MobileNet and BlazeFace models.
+    - Pruning + distillation: released MobileNet-YOLOv3 pruning + distillation schemes and models (COCO: FLOPs -69.6%, 64.5% faster with TensorRT, mAP -0.3%); released ResNet50vd-dcn-YOLOv3 pruning + distillation schemes and models (COCO: FLOPs -43.7%, 24.0% faster with TensorRT, mAP +0.6%).
+    - Search: open-sourced the complete BlazeFace-NAS search scheme.
+ - Inference deployment:
+    - Integrated TensorRT, supporting FP16, FP32, and INT8 quantized inference acceleration.
+ - Documentation:
+    - Added detailed docs for the data preprocessing module and for implementing a custom data Reader.
+    - Added a doc on adding new model algorithms.
+    - Docs published at: https://paddledetection.readthedocs.io
+
+### 12/2019
+- Added the Res2Net model.
+- Added the HRNet model.
+- Added GIoU loss and DIoU loss.
+
+
+### 21/11/2019
+- Added the CascadeClsAware RCNN model.
+- Added the CBNet, ResNet200, and Non-local models.
+- Added SoftNMS.
+- Added models for the Open Images V5 and Objects365 datasets.
+
+### 10/2019
+- Added an enhanced YOLOv3 model with accuracy up to 41.4%.
+- Added the face detection models BlazeFace and Faceboxes.
+- Enriched COCO-based models, with accuracy up to 51.9%.
+- Added CACascade-RCNN, one of the best single models of the winning entry in the Objects365 2019 Challenge.
+- Added pretrained models for pedestrian detection and vehicle detection.
+- Added FP16 training support.
+- Added a cross-platform C++ inference deployment solution.
+- Added model compression examples.
+
+
+### 2/9/2019
+- Added GroupNorm models.
+- Added CascadeRCNN+Mask models.
+
+### 5/8/2019
+- Added the Modulated Deformable Convolution model series.
+
+### 29/7/2019
+
+- Added Chinese documentation for the detection library
+- Fixed evaluation-during-training for the R-CNN model series
+- Added the ResNext101-vd + Mask R-CNN + FPN model
+- Added YOLOv3 models based on the VOC dataset
+
+### 3/7/2019
+
+- First release of the PaddleDetection library and detection model zoo
+- Models include: Faster R-CNN, Mask R-CNN, Faster R-CNN+FPN, Mask R-CNN+FPN, Cascade-Faster-RCNN+FPN, RetinaNet, YOLOv3, and SSD.
diff --git a/docs/MODEL_ZOO_cn.md b/docs/MODEL_ZOO_cn.md
index 5d17e809dfe852ac8619d495b917c850cadd46a4..80db6f2dc94d828ac6c6ab24b7680143e9980bf8 100644
--- a/docs/MODEL_ZOO_cn.md
+++ b/docs/MODEL_ZOO_cn.md
@@ -30,36 +30,36 @@ Paddle提供基于ImageNet的骨架网络预训练模型。所有预训练模型
### Faster R-CNN
-Please refer to [Faster R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/faster_rcnn/)
+Please refer to [Faster R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/faster_rcnn/)
### Mask R-CNN
-Please refer to [Mask R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/mask_rcnn/)
+Please refer to [Mask R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/mask_rcnn/)
### Cascade R-CNN
-Please refer to [Cascade R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/cascade_rcnn/)
+Please refer to [Cascade R-CNN](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/cascade_rcnn/)
### YOLOv3
-Please refer to [YOLOv3](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/yolov3/)
+Please refer to [YOLOv3](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/)
### SSD
-Please refer to [SSD](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ssd/)
+Please refer to [SSD](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ssd/)
### FCOS
-Please refer to [FCOS](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/fcos/)
+Please refer to [FCOS](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/fcos/)
### SOLOv2
-Please refer to [SOLOv2](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/solov2/)
+Please refer to [SOLOv2](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/solov2/)
### PP-YOLO
-Please refer to [PP-YOLO](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/)
+Please refer to [PP-YOLO](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ppyolo/)
### TTFNet
-Please refer to [TTFNet](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ttfnet/)
+Please refer to [TTFNet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/ttfnet/)
diff --git a/docs/advanced_tutorials/MODEL_TECHNICAL.md b/docs/advanced_tutorials/MODEL_TECHNICAL.md
new file mode 100644
index 0000000000000000000000000000000000000000..614f0985e11ecce1e4cb187818a0534c76a7a7eb
--- /dev/null
+++ b/docs/advanced_tutorials/MODEL_TECHNICAL.md
@@ -0,0 +1,407 @@
+# 新增模型算法
+为了让用户更好地使用PaddleDetection,本文档将介绍PaddleDetection的主要模型技术细节及应用。
+
+## 目录
+- [1.简介](#1.简介)
+- [2.新增模型](#2.新增模型)
+ - [2.1新增网络结构](#2.1新增网络结构)
+ - [2.1.1新增Backbone](#2.1.1新增Backbone)
+ - [2.1.2新增Neck](#2.1.2新增Neck)
+ - [2.1.3新增Head](#2.1.3新增Head)
+ - [2.1.4新增Loss](#2.1.4新增Loss)
+ - [2.1.5新增后处理模块](#2.1.5新增后处理模块)
+ - [2.1.6新增Architecture](#2.1.6新增Architecture)
+ - [2.2新增配置文件](#2.2新增配置文件)
+ - [2.2.1网络结构配置文件](#2.2.1网络结构配置文件)
+ - [2.2.2优化器配置文件](#2.2.2优化器配置文件)
+ - [2.2.3Reader配置文件](#2.2.3Reader配置文件)
+
+### 1.简介
+PaddleDetection中的每一种模型对应一个文件夹,以yolov3为例,yolov3系列的模型对应于`configs/yolov3`文件夹,其中yolov3_darknet的总配置文件`configs/yolov3/yolov3_darknet53_270e_coco.yml`的内容如下:
+```
+_BASE_: [
+ '../datasets/coco_detection.yml', # 数据集配置文件,所有模型共用
+ '../runtime.yml', # 运行时相关配置
+ '_base_/optimizer_270e.yml', # 优化器相关配置
+ '_base_/yolov3_darknet53.yml', # yolov3网络结构配置文件
+ '_base_/yolov3_reader.yml', # yolov3 Reader模块配置
+]
+
+# 定义在此处的相关配置可以覆盖上述文件中的同名配置
+snapshot_epoch: 5
+weights: output/yolov3_darknet53_270e_coco/model_final
+```
+可以看到,配置文件中的模块进行了清晰的划分,除了公共的数据集配置以及运行时配置,其他配置被划分为优化器、网络结构以及Reader模块。PaddleDetection中支持丰富的优化器、学习率调整策略、预处理算子等,因此大多数情况下不需要编写优化器以及Reader相关的代码,而只需要在配置文件中配置即可。因此,新增一个模型的主要工作在于搭建网络结构。
+
+PaddleDetection网络结构的代码在`ppdet/modeling/`中,所有网络结构以组件的形式进行定义与组合,网络结构的主要构成如下所示:
+```
+ ppdet/modeling/
+ ├── architectures
+ │   ├── faster_rcnn.py # Faster R-CNN模型
+ │ ├── ssd.py # SSD模型
+ │ ├── yolo.py # YOLOv3模型
+ │ │ ...
+ ├── heads # 检测头模块
+ │ ├── xxx_head.py # 定义各类检测头
+ │ ├── roi_extractor.py #检测感兴趣区域提取
+ ├── backbones # 骨干网络模块
+ │ ├── resnet.py # ResNet网络
+ │ ├── mobilenet.py # MobileNet网络
+ │ │ ...
+ ├── losses # 损失函数模块
+ │ ├── xxx_loss.py # 定义注册各类loss函数
+ ├── necks # 特征融合模块
+ │ ├── xxx_fpn.py # 定义各种FPN模块
+ ├── proposal_generator # anchor & proposal生成与匹配模块
+ │ ├── anchor_generator.py # anchor生成模块
+ │ ├── proposal_generator.py # proposal生成模块
+ │ ├── target.py # anchor & proposal的匹配函数
+ │ ├── target_layer.py # anchor & proposal的匹配模块
+ ├── tests # 单元测试模块
+ │ ├── test_xxx.py # 对网络中的算子以及模块结构进行单元测试
+ ├── ops.py # 封装各类PaddlePaddle物体检测相关公共检测组件/算子
+ ├── layers.py # 封装及注册各类PaddlePaddle物体检测相关公共检测组件/算子
+ ├── bbox_utils.py # 封装检测框相关的函数
+ ├── post_process.py # 封装及注册后处理相关模块
+ ├── shape_spec.py # 定义模块输出shape的类
+```
+
+
+
+### 2.新增模型
+接下来,以单阶段检测器YOLOv3为例,对建立模型过程进行详细描述,按照此思路您可以快速搭建新的模型。
+
+#### 2.1新增网络结构
+
+##### 2.1.1新增Backbone
+
+PaddleDetection中现有所有Backbone网络代码都放置在`ppdet/modeling/backbones`目录下,所以我们在其中新建`darknet.py`如下:
+```python
+import paddle.nn as nn
+from ppdet.core.workspace import register, serializable
+
+@register
+@serializable
+class DarkNet(nn.Layer):
+
+ __shared__ = ['norm_type']
+
+ def __init__(self,
+ depth=53,
+ return_idx=[2, 3, 4],
+ norm_type='bn',
+ norm_decay=0.):
+ super(DarkNet, self).__init__()
+ # 省略内容
+
+ def forward(self, inputs):
+ # 省略处理逻辑
+ pass
+
+ @property
+ def out_shape(self):
+ # 省略内容
+ pass
+```
+然后在`backbones/__init__.py`中加入引用:
+```python
+from . import darknet
+from .darknet import *
+```
+**几点说明:**
+- 为了在yaml配置文件中灵活配置网络,所有Backbone需要利用`ppdet.core.workspace`里的`register`进行注册,形式请参考如上示例。此外,可以使用`serializable`以使backbone支持序列化;
+- 所有的Backbone需继承`paddle.nn.Layer`类,并实现forward函数。此外,还需实现`out_shape`属性,定义输出的feature map的channel信息,具体可参见源码;
+- `__shared__`用于实现一些参数的配置全局共享,这些参数可以被backbone, neck, head, loss等所有注册模块共享。
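+
+下面给出一个极简的Backbone示意(仅为说明写法的假想示例,非库中真实模型;假设`ShapeSpec`可从`ppdet/modeling/shape_spec.py`中引入,字段以源码为准):
+```python
+import paddle.nn as nn
+from ppdet.core.workspace import register, serializable
+from ppdet.modeling.shape_spec import ShapeSpec
+
+@register
+@serializable
+class TinyNet(nn.Layer):
+    def __init__(self):
+        super(TinyNet, self).__init__()
+        # 仅含一个stride为2的卷积,输出单个feature map
+        self.conv = nn.Conv2D(3, 64, 3, stride=2, padding=1)
+
+    def forward(self, inputs):
+        # Backbone的输入为dict,图像位于'image'字段
+        return [self.conv(inputs['image'])]
+
+    @property
+    def out_shape(self):
+        # 每个输出feature map对应一个ShapeSpec,供下游Neck推断输入channel
+        return [ShapeSpec(channels=64, stride=2)]
+```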
+
+##### 2.1.2新增Neck
+特征融合模块放置在`ppdet/modeling/necks`目录下,我们在其中新建`yolo_fpn.py`如下:
+
+``` python
+import paddle.nn as nn
+from ppdet.core.workspace import register, serializable
+
+@register
+@serializable
+class YOLOv3FPN(nn.Layer):
+ __shared__ = ['norm_type']
+
+ def __init__(self,
+ in_channels=[256, 512, 1024],
+ norm_type='bn'):
+ super(YOLOv3FPN, self).__init__()
+ # 省略内容
+
+ def forward(self, blocks):
+ # 省略内容
+ pass
+
+ @classmethod
+ def from_config(cls, cfg, input_shape):
+ # 省略内容
+ pass
+
+ @property
+ def out_shape(self):
+ # 省略内容
+ pass
+```
+然后在`necks/__init__.py`中加入引用:
+```python
+from . import yolo_fpn
+from .yolo_fpn import *
+```
+**几点说明:**
+- neck模块需要使用`register`进行注册,可以使用`serializable`进行序列化;
+- neck模块需要继承`paddle.nn.Layer`类,并实现forward函数。除此之外,还需要实现`out_shape`属性,用于定义输出的feature map的channel信息,还需要实现类函数`from_config`用于在配置文件中推理出输入channel,并用于`YOLOv3FPN`的初始化;
+- neck模块可以使用`__shared__`实现一些参数的配置全局共享。
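+
+其中`from_config`的典型写法大致如下(示意片段;`input_shape`即上游backbone的`out_shape`,为ShapeSpec列表):
+```python
+    @classmethod
+    def from_config(cls, cfg, input_shape):
+        # 从上游模块输出的ShapeSpec中取出channel,自动填充in_channels
+        return {'in_channels': [i.channels for i in input_shape]}
+```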
+
+##### 2.1.3新增Head
+Head模块全部存放在`ppdet/modeling/heads`目录下,我们在其中新建`yolo_head.py`如下:
+``` python
+import paddle.nn as nn
+from ppdet.core.workspace import register
+
+@register
+class YOLOv3Head(nn.Layer):
+ __shared__ = ['num_classes']
+ __inject__ = ['loss']
+
+ def __init__(self,
+ anchors=[[10, 13], [16, 30], [33, 23],
+ [30, 61], [62, 45],[59, 119],
+ [116, 90], [156, 198], [373, 326]],
+ anchor_masks=[[6, 7, 8], [3, 4, 5], [0, 1, 2]],
+ num_classes=80,
+ loss='YOLOv3Loss',
+ iou_aware=False,
+ iou_aware_factor=0.4):
+ super(YOLOv3Head, self).__init__()
+ # 省略内容
+
+ def forward(self, feats, targets=None):
+ # 省略内容
+ pass
+```
+然后在`heads/__init__.py`中加入引用:
+```python
+from . import yolo_head
+from .yolo_head import *
+```
+**几点说明:**
+- Head模块需要使用`register`进行注册;
+- Head模块需要继承`paddle.nn.Layer`类,并实现forward函数。
+- `__inject__`表示引入全局字典中已经封装好的模块。如loss等。
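+
+一个演示`__inject__`用法的最小假想示例如下(`MyYOLOHead`及其loss调用方式仅为示意,非库中真实代码):
+```python
+import paddle.nn as nn
+from ppdet.core.workspace import register
+
+@register
+class MyYOLOHead(nn.Layer):
+    __inject__ = ['loss']
+
+    def __init__(self, in_channels=[1024, 512, 256], num_outputs=255,
+                 loss='YOLOv3Loss'):
+        super(MyYOLOHead, self).__init__()
+        # 注入完成后,self.loss已经是实例化好的loss模块
+        self.loss = loss
+        self.convs = nn.LayerList(
+            [nn.Conv2D(c, num_outputs, 1) for c in in_channels])
+
+    def forward(self, feats, targets=None):
+        outputs = [conv(f) for conv, f in zip(self.convs, feats)]
+        if self.training:
+            # 训练时返回由注入的loss模块计算的损失
+            return self.loss(outputs, targets)
+        return outputs
+```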
+
+##### 2.1.4新增Loss
+Loss模块全部存放在`ppdet/modeling/losses`目录下,我们在其中新建`yolo_loss.py`如下:
+```python
+import paddle.nn as nn
+from ppdet.core.workspace import register
+
+@register
+class YOLOv3Loss(nn.Layer):
+
+ __inject__ = ['iou_loss', 'iou_aware_loss']
+ __shared__ = ['num_classes']
+
+ def __init__(self,
+ num_classes=80,
+ ignore_thresh=0.7,
+ label_smooth=False,
+ downsample=[32, 16, 8],
+ scale_x_y=1.,
+ iou_loss=None,
+ iou_aware_loss=None):
+ super(YOLOv3Loss, self).__init__()
+ # 省略内容
+
+ def forward(self, inputs, targets, anchors):
+ # 省略内容
+ pass
+```
+然后在`losses/__init__.py`中加入引用:
+```python
+from . import yolo_loss
+from .yolo_loss import *
+```
+**几点说明:**
+- loss模块需要使用`register`进行注册;
+- loss模块需要继承`paddle.nn.Layer`类,并实现forward函数。
+- 可以使用`__inject__`表示引入全局字典中已经封装好的模块,使用`__shared__`可以实现一些参数的配置全局共享。
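+
+`__inject__`在配置文件中的体现可参考如下片段(示意,参数值仅供参考,实际以`configs/ppyolo`下的配置为准),其中IouLoss被注入到YOLOv3Loss中:
+```
+YOLOv3Loss:
+  ignore_thresh: 0.7
+  downsample: [32, 16, 8]
+  label_smooth: false
+  iou_loss: IouLoss
+
+IouLoss:
+  loss_weight: 2.5
+  loss_square: true
+```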
+
+##### 2.1.5新增后处理模块
+后处理模块定义在`ppdet/modeling/post_process.py`中,其中定义了`BBoxPostProcess`类来进行后处理操作,如下所示:
+``` python
+from ppdet.core.workspace import register
+
+@register
+class BBoxPostProcess(object):
+ __shared__ = ['num_classes']
+ __inject__ = ['decode', 'nms']
+
+ def __init__(self, num_classes=80, decode=None, nms=None):
+ # 省略内容
+ pass
+
+ def __call__(self, head_out, rois, im_shape, scale_factor):
+ # 省略内容
+ pass
+```
+**几点说明:**
+- 后处理模块需要使用`register`进行注册
+- `__inject__`注入了全局字典中封装好的模块,如decode和nms等。decode和nms定义在`ppdet/modeling/layers.py`中。
+
+##### 2.1.6新增Architecture
+
+所有architecture网络代码都放置在`ppdet/modeling/architectures`目录下,`meta_arch.py`中定义了`BaseArch`类,代码如下:
+``` python
+import paddle.nn as nn
+from ppdet.core.workspace import register
+
+@register
+class BaseArch(nn.Layer):
+ def __init__(self):
+ super(BaseArch, self).__init__()
+
+ def forward(self, inputs):
+ self.inputs = inputs
+ self.model_arch()
+
+ if self.training:
+ out = self.get_loss()
+ else:
+ out = self.get_pred()
+ return out
+
+ def model_arch(self, ):
+ pass
+
+ def get_loss(self, ):
+ raise NotImplementedError("Should implement get_loss method!")
+
+ def get_pred(self, ):
+ raise NotImplementedError("Should implement get_pred method!")
+```
+所有的architecture需要继承`BaseArch`类,如`yolo.py`中的`YOLOv3`定义如下:
+``` python
+@register
+class YOLOv3(BaseArch):
+ __category__ = 'architecture'
+ __inject__ = ['post_process']
+
+ def __init__(self,
+ backbone='DarkNet',
+ neck='YOLOv3FPN',
+ yolo_head='YOLOv3Head',
+ post_process='BBoxPostProcess'):
+ super(YOLOv3, self).__init__()
+ self.backbone = backbone
+ self.neck = neck
+ self.yolo_head = yolo_head
+ self.post_process = post_process
+
+ @classmethod
+ def from_config(cls, cfg, *args, **kwargs):
+ # 省略内容
+ pass
+
+ def get_loss(self):
+ # 省略内容
+ pass
+
+ def get_pred(self):
+ # 省略内容
+ pass
+```
+
+**几点说明:**
+- 所有的architecture需要使用`register`进行注册
+- 在组建一个完整的网络时必须要设定`__category__ = 'architecture'`来表示一个完整的物体检测模型;
+- backbone, neck, yolo_head以及post_process等检测组件传入到architecture中组成最终的网络。像这样将检测模块化,提升了检测模型的复用性,可以通过组合不同的检测组件得到多个模型。
+- from_config类函数实现了模块间组合时channel的自动配置。
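+
+`from_config`自动配置channel的思路大致如下(示意片段,完整实现见`ppdet/modeling/architectures/yolo.py`;假设`create`可从`ppdet.core.workspace`引入):
+```python
+from ppdet.core.workspace import create
+
+@classmethod
+def from_config(cls, cfg, *args, **kwargs):
+    backbone = create(cfg['backbone'])
+    # 利用backbone的out_shape推断neck的输入channel,再依次传递给head
+    neck = create(cfg['neck'], input_shape=backbone.out_shape)
+    yolo_head = create(cfg['yolo_head'], input_shape=neck.out_shape)
+    return {'backbone': backbone, 'neck': neck, 'yolo_head': yolo_head}
+```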
+
+#### 2.2新增配置文件
+
+##### 2.2.1网络结构配置文件
+上面详细地介绍了如何新增一个architecture,接下来演示如何配置一个模型,yolov3关于网络结构的配置在`configs/yolov3/_base_/`文件夹中定义,如`yolov3_darknet53.yml`定义了yolov3_darknet的网络结构,其定义如下:
+```
+architecture: YOLOv3
+pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/DarkNet53_pretrained.pdparams
+norm_type: sync_bn
+
+YOLOv3:
+ backbone: DarkNet
+ neck: YOLOv3FPN
+ yolo_head: YOLOv3Head
+ post_process: BBoxPostProcess
+
+DarkNet:
+ depth: 53
+ return_idx: [2, 3, 4]
+
+# use default config
+# YOLOv3FPN:
+
+YOLOv3Head:
+ anchors: [[10, 13], [16, 30], [33, 23],
+ [30, 61], [62, 45], [59, 119],
+ [116, 90], [156, 198], [373, 326]]
+ anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
+ loss: YOLOv3Loss
+
+YOLOv3Loss:
+ ignore_thresh: 0.7
+ downsample: [32, 16, 8]
+ label_smooth: false
+
+BBoxPostProcess:
+ decode:
+ name: YOLOBox
+ conf_thresh: 0.005
+ downsample_ratio: 32
+ clip_bbox: true
+ nms:
+ name: MultiClassNMS
+ keep_top_k: 100
+ score_threshold: 0.01
+ nms_threshold: 0.45
+ nms_top_k: 1000
+
+```
+可以看到在配置文件中,首先需要指定网络的architecture,pretrain_weights指定训练模型的url或者路径,norm_type等可以作为全局参数共享。模型的定义自上而下依次在文件中定义,与上节中的模型组件一一对应。对于一些模型组件,如果采用默认参数,可以不用配置,如上文中的`YOLOv3FPN`。通过改变相关配置,我们可以轻易地组合出另一个模型,比如`configs/yolov3/_base_/yolov3_mobilenet_v1.yml`将backbone从DarkNet切换成MobileNet。
+
+##### 2.2.2优化器配置文件
+优化器配置文件定义模型使用的优化器以及学习率的调度策略,目前PaddleDetection中已经集成了多种多样的优化器和学习率策略,具体可参见代码`ppdet/optimizer.py`。比如,yolov3的优化器配置文件定义在`configs/yolov3/_base_/optimizer_270e.yml`,其定义如下:
+```
+epoch: 270
+
+LearningRate:
+ base_lr: 0.001
+ schedulers:
+ - !PiecewiseDecay
+ gamma: 0.1
+ milestones:
+ # epoch数目
+ - 216
+ - 243
+ - !LinearWarmup
+ start_factor: 0.
+ steps: 4000
+
+OptimizerBuilder:
+ optimizer:
+ momentum: 0.9
+ type: Momentum
+ regularizer:
+ factor: 0.0005
+ type: L2
+```
+**几点说明:**
+- 可以通过OptimizerBuilder.optimizer指定优化器的类型及参数,目前支持的优化可以参考[PaddlePaddle官方文档](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/Overview_cn.html)
+- 可以设置LearningRate.schedulers设置不同学习率调整策略的组合,PaddlePaddle目前支持多种学习率调整策略,具体也可参考[PaddlePaddle官方文档](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/Overview_cn.html)。需要注意的是,你需要对于PaddlePaddle中的学习率调整策略进行简单的封装,具体可参考源码`ppdet/optimizer.py`。
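+
+上述配置组合出的学习率调度与优化器,大致等价于如下PaddlePaddle原生API的用法(示意;`step_per_epoch`为假设值,`model`为已构建好的网络):
+```python
+import paddle
+
+step_per_epoch = 1000  # 假设每个epoch包含1000个迭代
+boundaries = [216 * step_per_epoch, 243 * step_per_epoch]
+values = [0.001, 0.001 * 0.1, 0.001 * 0.1 * 0.1]  # base_lr按gamma=0.1逐段衰减
+decay = paddle.optimizer.lr.PiecewiseDecay(boundaries, values)
+scheduler = paddle.optimizer.lr.LinearWarmup(
+    decay, warmup_steps=4000, start_lr=0.0, end_lr=0.001)
+optimizer = paddle.optimizer.Momentum(
+    learning_rate=scheduler,
+    momentum=0.9,
+    weight_decay=paddle.regularizer.L2Decay(0.0005),
+    parameters=model.parameters())  # model为已构建好的网络(假设)
+```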
+
+##### 2.2.3Reader配置文件
+关于Reader的配置可以参考[Reader配置文档](./READER.md#5.配置及运行)。
+
+> 看过此文档,您应该对PaddleDetection中模型搭建与配置有了一定经验,结合源码会理解得更加透彻。关于模型技术,如您有其他问题或建议,请给我们提issue,我们非常欢迎您的反馈。
diff --git a/docs/advanced_tutorials/READER.md b/docs/advanced_tutorials/READER.md
new file mode 100644
index 0000000000000000000000000000000000000000..32df651c77dea9c8becc787fc3896c4947238b1d
--- /dev/null
+++ b/docs/advanced_tutorials/READER.md
@@ -0,0 +1,328 @@
+# 数据处理模块
+
+## 目录
+- [1.简介](#1.简介)
+- [2.数据集](#2.数据集)
+ - [2.1COCO数据集](#2.1COCO数据集)
+ - [2.2Pascal VOC数据集](#2.2Pascal-VOC数据集)
+ - [2.3自定义数据集](#2.3自定义数据集)
+- [3.数据预处理](#3.数据预处理)
+ - [3.1数据增强算子](#3.1数据增强算子)
+ - [3.2自定义数据增强算子](#3.2自定义数据增强算子)
+- [4.Reader](#4.Reader)
+- [5.配置及运行](#5.配置及运行)
+ - [5.1配置](#5.1配置)
+ - [5.2运行](#5.2运行)
+
+### 1.简介
+PaddleDetection的数据处理模块的所有代码逻辑在`ppdet/data/`中,数据处理模块用于加载数据并将其转换成适用于物体检测模型的训练、评估、推理所需要的格式。
+数据处理模块的主要构成如下架构所示:
+```bash
+ ppdet/data/
+ ├── reader.py # 基于Dataloader封装的Reader模块
+ ├── source # 数据源管理模块
+ │ ├── dataset.py # 定义数据源基类,各类数据集继承于此
+ │ ├── coco.py # COCO数据集解析与格式化数据
+ │ ├── voc.py # Pascal VOC数据集解析与格式化数据
+ │ ├── widerface.py # WIDER-FACE数据集解析与格式化数据
+ │ ├── category.py # 相关数据集的类别信息
+ ├── transform # 数据预处理模块
+ │ ├── batch_operators.py # 定义各类基于批量数据的预处理算子
+ │ ├── op_helper.py # 预处理算子的辅助函数
+ │ ├── operators.py # 定义各类基于单张图片的预处理算子
+ │ ├── gridmask_utils.py # GridMask数据增强函数
+ │ ├── autoaugment_utils.py # AutoAugment辅助函数
+ ├── shm_utils.py # 用于使用共享内存的辅助函数
+```
+
+
+### 2.数据集
+数据集定义在`source`目录下,其中`dataset.py`中定义了数据集的基类`DetDataSet`,所有的数据集均继承自该基类,`DetDataSet`基类里定义了如下方法:
+
+| 方法 | 输入 | 输出 | 备注 |
+| :------------------------: | :----: | :------------: | :--------------: |
+| \_\_len\_\_ | 无 | int, 数据集中样本的数量 | 过滤掉了无标注的样本 |
+| \_\_getitem\_\_ | int, 样本的索引idx | dict, 索引idx对应的样本roidb | 得到transform之后的样本roidb |
+| check_or_download_dataset | 无 | 无 | 检查数据集是否存在,如果不存在则下载,目前支持COCO, VOC,widerface等数据集 |
+| set_kwargs | 可选参数,以键值对的形式给出 | 无 | 目前用于支持接收mixup, cutmix等参数的设置 |
+| set_transform | 一系列的transform函数 | 无 | 设置数据集的transform函数 |
+| set_epoch | int, 当前的epoch | 无 | 用于dataset与训练过程的交互 |
+| parse_dataset | 无 | 无 | 用于从数据中读取所有的样本 |
+| get_anno | 无 | str, 标注文件路径 | 用于获取标注文件的路径 |
+
+一个数据集类只要继承自`DetDataSet`并实现parse_dataset函数即可。parse_dataset根据数据集设置的数据集根路径dataset_dir、图片文件夹image_dir、标注文件路径anno_path取出所有的样本,并将其保存在一个列表roidbs中,列表中的每一个元素为一个样本xxx_rec(比如coco_rec或者voc_rec),用dict表示,dict中包含样本的image, gt_bbox, gt_class等字段。COCO和Pascal-VOC数据集中的xxx_rec的数据结构定义如下:
+ ```python
+ xxx_rec = {
+ 'im_file': im_fname, # 一张图像的完整路径
+ 'im_id': np.array([img_id]), # 一张图像的ID序号
+ 'h': im_h, # 图像高度
+ 'w': im_w, # 图像宽度
+ 'is_crowd': is_crowd, # 是否是群落对象, 默认为0 (VOC中无此字段)
+ 'gt_class': gt_class, # 标注框标签名称的ID序号
+ 'gt_bbox': gt_bbox, # 标注框坐标(xmin, ymin, xmax, ymax)
+ 'gt_poly': gt_poly, # 分割掩码,此字段只在coco_rec中出现,默认为None
+ 'difficult': difficult # 是否是困难样本,此字段只在voc_rec中出现,默认为0
+ }
+ ```
+
+xxx_rec中的内容也可以通过`DetDataSet`的data_fields参数来控制,即可以过滤掉一些不需要的字段,但大多数情况下不需要修改,按照`configs/dataset`中的默认配置即可。
+
+此外,在parse_dataset函数中,保存了类别名到id的映射的一个字典`cname2cid`。在coco数据集中,会利用[COCO API](https://github.com/cocodataset/cocoapi)从标注文件中加载数据集的类别名,并设置此字典。在voc数据集中,如果设置`use_default_label=False`,将从`label_list.txt`中读取类别列表,反之将使用voc默认的类别列表。
+
+#### 2.1COCO数据集
+COCO数据集目前分为COCO2014和COCO2017,主要由json文件和image文件组成,其组织结构如下所示:
+
+ ```
+ dataset/coco/
+ ├── annotations
+ │ ├── instances_train2014.json
+ │ ├── instances_train2017.json
+ │ ├── instances_val2014.json
+ │ ├── instances_val2017.json
+ │ │ ...
+ ├── train2017
+ │ ├── 000000000009.jpg
+ │ ├── 000000580008.jpg
+ │ │ ...
+ ├── val2017
+ │ ├── 000000000139.jpg
+ │ ├── 000000000285.jpg
+ │ │ ...
+ ```
+
+在`source/coco.py`中定义并注册了`COCODataSet`数据集类,其继承自`DetDataSet`,并实现了parse_dataset方法,调用[COCO API](https://github.com/cocodataset/cocoapi)加载并解析COCO格式数据源`roidbs`和`cname2cid`,具体可参见`source/coco.py`源码。将其他数据集转换成COCO格式可以参考[用户数据转成COCO数据](../tutorials/PrepareDataSet.md#用户数据转成COCO数据)
+
+#### 2.2Pascal VOC数据集
+该数据集目前分为VOC2007和VOC2012,主要由xml文件和image文件组成,其组织结构如下所示:
+```
+ dataset/voc/
+ ├── trainval.txt
+ ├── test.txt
+ ├── label_list.txt (optional)
+ ├── VOCdevkit/VOC2007
+ │ ├── Annotations
+ │ ├── 001789.xml
+ │ │ ...
+ │ ├── JPEGImages
+ │ ├── 001789.jpg
+ │ │ ...
+ │ ├── ImageSets
+ │ | ...
+ ├── VOCdevkit/VOC2012
+ │ ├── Annotations
+ │ ├── 2011_003876.xml
+ │ │ ...
+ │ ├── JPEGImages
+ │ ├── 2011_003876.jpg
+ │ │ ...
+ │ ├── ImageSets
+ │ │ ...
+ ```
+在`source/voc.py`中定义并注册了`VOCDataSet`数据集,它继承自`DetDataSet`基类,并重写了`parse_dataset`方法,解析VOC数据集中xml格式标注文件,更新`roidbs`和`cname2cid`。将其他数据集转换成VOC格式可以参考[用户数据转成VOC数据](../tutorials/PrepareDataSet.md#用户数据转成VOC数据)
+
+#### 2.3自定义数据集
+如果COCODataSet和VOCDataSet不能满足你的需求,可以通过自定义数据集的方式来加载你的数据集。只需要以下两步即可实现自定义数据集:
+
+1. 新建`source/xxx.py`,定义类`XXXDataSet`继承自`DetDataSet`基类,完成注册与序列化,并重写`parse_dataset`方法对`roidbs`与`cname2cid`更新:
+ ```python
+ from ppdet.core.workspace import register, serializable
+
+ #注册并序列化
+ @register
+ @serializable
+ class XXXDataSet(DetDataSet):
+ def __init__(self,
+ dataset_dir=None,
+ image_dir=None,
+ anno_path=None,
+ ...
+ ):
+ self.roidbs = None
+ self.cname2cid = None
+ ...
+
+ def parse_dataset(self):
+ ...
+ 省略具体解析数据逻辑
+ ...
+ self.roidbs, self.cname2cid = records, cname2cid
+ ```
+
+2. 在`source/__init__.py`中添加引用:
+ ```python
+ from . import xxx
+ from .xxx import *
+ ```
+完成以上两步就将新的数据源`XXXDataSet`添加好了,你可以参考[配置及运行](#5.配置及运行)实现自定义数据集的使用,示意的配置片段见下。
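+
+一个示意的配置片段如下(字段名与取值仅为假设,以你的`XXXDataSet`定义为准):
+```
+TrainDataset:
+  !XXXDataSet
+    image_dir: images
+    anno_path: annotations/train.json
+    dataset_dir: dataset/xxx
+```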
+
+### 3.数据预处理
+
+#### 3.1数据增强算子
+PaddleDetection中支持了种类丰富的数据增强算子,有单图像数据增强算子与批数据增强算子两种方式,您可选取合适的算子组合使用。单图像数据增强算子定义在`transform/operators.py`中,已支持的单图像数据增强算子详见下表:
+
+| 名称 | 作用 |
+| :---------------------: | :--------------: |
+| Decode | 以RGB格式从图像文件或内存buffer中加载图像 |
+| Permute | 将输入图像从HWC格式转换为CHW格式 |
+| RandomErasingImage | 对图像进行随机擦除 |
+| NormalizeImage | 对图像像素值进行归一化,如果设置is_scale=True,则先将像素值除以255.0, 再进行归一化。 |
+| GridMask | GridMask数据增广 |
+| RandomDistort | 随机扰动图片亮度、对比度、饱和度和色相 |
+| AutoAugment | AutoAugment数据增广,包含一系列数据增强方法 |
+| RandomFlip | 随机水平翻转图像 |
+| Resize | 对于图像进行resize,并对标注进行相应的变换 |
+| MultiscaleTestResize | 将图像重新缩放为多尺度list的每个尺寸 |
+| RandomResize | 对于图像进行随机Resize,可以Resize到不同的尺寸以及使用不同的插值策略 |
+| RandomExpand | 将原始图片放入用像素均值填充的扩张图中,对此图进行裁剪、缩放和翻转 |
+| CropWithSampling | 根据缩放比例、长宽比例生成若干候选框,再依据这些候选框和标注框的面积交并比(IoU)挑选出符合要求的裁剪结果 |
+| CropImageWithDataAchorSampling | 基于CropImage,在人脸检测中,随机将图片尺度变换到一定范围的尺度,大大增强人脸的尺度变化 |
+| RandomCrop | 原理同CropImage,以随机比例与IoU阈值进行处理 |
+| RandomScaledCrop | 根据长边对图像进行随机裁剪,并对标注做相应的变换 |
+| Cutmix | Cutmix数据增强,对两张图片做拼接 |
+| Mixup | Mixup数据增强,按比例叠加两张图像 |
+| NormalizeBox | 对bounding box进行归一化 |
+| PadBox | 如果bounding box的数量少于num_max_boxes,则将零填充到bbox |
+| BboxXYXY2XYWH | 将bounding box从(xmin,ymin,xmax,ymax)形式转换为(xmin,ymin,width,height)格式 |
+| Pad | 将图片Pad某一个数的整数倍或者指定的size,并支持指定Pad的方式 |
+| Poly2Mask | Poly2Mask数据增强 |
+
+批数据增强算子定义在`transform/batch_operators.py`中,目前支持的算子列表如下:
+
+| 名称 | 作用 |
+| :---------------------: | :--------------: |
+| PadBatch | 随机对每个batch的数据图片进行Pad操作,使得batch中的图片具有相同的shape |
+| BatchRandomResize | 对一个batch的图片进行resize,使得batch中的图片随机缩放到相同的尺寸 |
+| Gt2YoloTarget | 通过gt数据生成YOLO系列模型的目标 |
+| Gt2FCOSTarget | 通过gt数据生成FCOS模型的目标 |
+| Gt2TTFTarget | 通过gt数据生成TTFNet模型的目标 |
+| Gt2Solov2Target | 通过gt数据生成SOLOv2模型的目标 |
+
+**几点说明:**
+- 数据增强算子的输入为sample或者samples,每一个sample对应上文所说的`DetDataSet`输出的roidbs中的一个样本,如coco_rec或者voc_rec
+- 单图像数据增强算子(Mixup, Cutmix等除外)也可用于批数据处理中。但是,单图像处理算子和批图像处理算子仍有一些差异,以RandomResize和BatchRandomResize为例,RandomResize会将一个Batch中的每张图片进行随机缩放,但是每一张图像Resize之后的形状不尽相同,BatchRandomResize则会将一个Batch中的所有图片随机缩放到相同的形状。
+- 除BatchRandomResize外,定义在`transform/batch_operators.py`的批数据增强算子接收的输入图像均为CHW形式,所以使用这些批数据增强算子前请先使用Permute进行处理。如果用到Gt2xxxTarget算子,需要将其放置在靠后的位置。NormalizeBox算子建议放置在Gt2xxxTarget之前。将这些限制条件总结下来,推荐的预处理算子的顺序为
+ ```
+ - XXX: {}
+ - ...
+ - BatchRandomResize: {...} # 如果不需要,可以移除,如果需要,放置在Permute之前
+ - Permute: {} # 必须项
+ - NormalizeBox: {} # 如果需要,建议放在Gt2XXXTarget之前
+ - PadBatch: {...} # 如果不需要可移除,如果需要,建议放置在Permute之后
+ - Gt2XXXTarget: {...} # 建议与PadBatch放置在最后的位置
+ ```
+
+#### 3.2自定义数据增强算子
+如果需要自定义数据增强算子,您需要先了解数据增强算子的相关逻辑。数据增强算子基类为定义在`transform/operators.py`中的`BaseOperator`类,单图像数据增强算子与批数据增强算子均继承自这个基类。完整定义参考源码,以下代码显示了`BaseOperator`类的关键函数:`apply`和`__call__`方法。
+ ``` python
+ class BaseOperator(object):
+
+ ...
+
+ def apply(self, sample, context=None):
+ return sample
+
+ def __call__(self, sample, context=None):
+ if isinstance(sample, Sequence):
+ for i in range(len(sample)):
+ sample[i] = self.apply(sample[i], context)
+ else:
+ sample = self.apply(sample, context)
+ return sample
+ ```
+`__call__`方法为`BaseOperator`的调用入口,接收一个sample(单图像)或者多个sample(多图像)作为输入,并调用`apply`函数对一个或者多个sample进行处理。大多数情况下,你只需要继承`BaseOperator`重写`apply`方法或者重写`__call__`方法即可。如下所示,定义了一个`XXXOp`继承自`BaseOperator`,并注册:
+ ```python
+ @register_op
+ class XXXOp(BaseOperator):
+ def __init__(self,...):
+
+      super(XXXOp, self).__init__()
+ ...
+
+ # 大多数情况下只需要重写apply方法
+ def apply(self, sample, context=None):
+ ...
+ 省略对输入的sample具体操作
+ ...
+ return sample
+
+ # 如果有需要,可以重写__call__方法,如Mixup, Gt2XXXTarget等
+ # def __call__(self, sample, context=None):
+ # ...
+ # 省略对输入的sample具体操作
+ # ...
+ # return sample
+ ```
+大多数情况下,只需要重写`apply`方法即可,如`transform/operators.py`中除Mixup和Cutmix外的预处理算子。对于批处理的情况一般需要重写`__call__`方法,如`transform/batch_operators.py`中的预处理算子。
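+
+下面给出一个完整的假想示例(`RandomGray`并非库中算子,仅演示自定义算子的写法;假设输入sample的'image'字段为HWC排布的RGB图像):
+```python
+import numpy as np
+from ppdet.data.transform.operators import register_op, BaseOperator
+
+@register_op
+class RandomGray(BaseOperator):
+    def __init__(self, prob=0.1):
+        super(RandomGray, self).__init__()
+        self.prob = prob
+
+    def apply(self, sample, context=None):
+        # 以prob的概率将图像转为三通道灰度图
+        if np.random.uniform(0., 1.) < self.prob:
+            im = sample['image'].astype('float32')
+            gray = im[..., 0] * 0.299 + im[..., 1] * 0.587 + im[..., 2] * 0.114
+            sample['image'] = np.stack([gray] * 3, axis=-1)
+        return sample
+```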
+
+### 4.Reader
+Reader相关的类定义在`reader.py`, 其中定义了`BaseDataLoader`类。`BaseDataLoader`在`paddle.io.DataLoader`的基础上封装了一层,其具备`paddle.io.DataLoader`的所有功能,并能够实现不同模型对于`DetDataset`的不同需求,如可以通过对Reader进行设置,以控制`DetDataset`支持Mixup, Cutmix等操作。除此之外,数据预处理算子通过`Compose`类和`BatchCompose`类组合起来分别传入`DetDataset`和`paddle.io.DataLoader`中。
+所有的Reader类都继承自`BaseDataLoader`类,具体可参见源码。
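+
+`Compose`的核心逻辑可以粗略理解为如下示意代码(实际实现见`ppdet/data/reader.py`,还包含算子实例化与异常处理等):
+```python
+class SimpleCompose(object):
+    def __init__(self, transforms):
+        self.transforms = transforms
+
+    def __call__(self, sample):
+        # 依次调用各预处理算子,前一个算子的输出作为后一个算子的输入
+        for op in self.transforms:
+            sample = op(sample)
+        return sample
+```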
+
+### 5.配置及运行
+
+#### 5.1配置
+
+与数据预处理相关的配置文件包含所有模型公用的Dataset配置文件以及不同模型专用的Reader配置文件。Dataset的配置文件存放于`configs/datasets`文件夹。比如COCO数据集的配置文件如下:
+```
+metric: COCO # 目前支持COCO, VOC, OID, WiderFace等评估标准
+num_classes: 80 # 数据集的类别数,不包含背景类
+
+TrainDataset:
+ !COCODataSet
+    image_dir: train2017 # 训练集的图片所在文件夹相对于dataset_dir的路径
+ anno_path: annotations/instances_train2017.json # 训练集的标注文件相对于dataset_dir的路径
+ dataset_dir: dataset/coco #数据集所在路径,相对于PaddleDetection路径
+ data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] # 控制dataset输出的sample所包含的字段
+
+EvalDataset:
+ !COCODataSet
+ image_dir: val2017 # 验证集的图片所在文件夹相对于dataset_dir的路径
+ anno_path: annotations/instances_val2017.json # 验证集的标注文件相对于dataset_dir的路径
+ dataset_dir: dataset/coco # 数据集所在路径,相对于PaddleDetection路径
+
+TestDataset:
+ !ImageFolder
+ anno_path: dataset/coco/annotations/instances_val2017.json # 验证集的标注文件所在路径,相对于PaddleDetection的路径
+```
+在PaddleDetection的yml配置文件中,使用`!`直接序列化模块实例(可以是函数、类实例等),上述配置文件中的TrainDataset、EvalDataset和TestDataset均以此方式序列化了Dataset实例。
+不同模型专用的Reader定义在每一个模型的文件夹下,如yolov3的Reader配置文件定义在`configs/yolov3/_base_/yolov3_reader.yml`。一个Reader的示例配置如下:
+```
+worker_num: 2
+TrainReader:
+ sample_transforms:
+ - Decode: {}
+ ...
+ batch_transforms:
+ ...
+ batch_size: 8
+ shuffle: true
+ drop_last: true
+ use_shared_memory: true
+
+EvalReader:
+ sample_transforms:
+ - Decode: {}
+ ...
+ batch_size: 1
+ drop_empty: false
+
+TestReader:
+ inputs_def:
+ image_shape: [3, 608, 608]
+ sample_transforms:
+ - Decode: {}
+ ...
+ batch_size: 1
+```
+你可以在Reader中定义不同的预处理算子,每张卡的batch_size以及DataLoader的worker_num等。
+
+#### 5.2运行
+在PaddleDetection的训练、评估和测试运行程序中,均通过创建Reader迭代器来加载数据。Reader在`ppdet/engine/trainer.py`中创建。下面的代码展示了如何创建训练时的Reader:
+``` python
+from ppdet.core.workspace import create
+# build data loader
+self.dataset = cfg['TrainDataset']
+self.loader = create('TrainReader')(self.dataset, cfg.worker_num)
+```
+相应的预测以及评估时的Reader与之类似,具体可参考`ppdet/engine/trainer.py`源码。
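+
+构建好的Reader是一个可迭代对象,训练循环中的用法大致如下(示意,`self.model`为已构建好的模型):
+``` python
+for step_id, data in enumerate(self.loader):
+    # data为一个dict,包含image、gt_bbox、gt_class等字段
+    outputs = self.model(data)
+```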
+
+> 关于数据处理模块,如您有其他问题或建议,请给我们提issue,我们非常欢迎您的反馈。
diff --git a/docs/feature_models/SSLD_PRETRAINED_MODEL.md b/docs/feature_models/SSLD_PRETRAINED_MODEL.md
new file mode 100644
index 0000000000000000000000000000000000000000..3f42a0c0ada3dc68dacca41b28209a5a3b7677c6
--- /dev/null
+++ b/docs/feature_models/SSLD_PRETRAINED_MODEL.md
@@ -0,0 +1,54 @@
+简体中文 | [English](SSLD_PRETRAINED_MODEL_en.md)
+
+### Simple semi-supervised label knowledge distillation solution (SSLD)
+
+### R-CNN on COCO
+
+| 骨架网络 | 网络类型 | 每张GPU图片个数 | 学习率策略 |推理时间(fps) | Box AP | Mask AP | 下载 | 配置文件 |
+| :------------------- | :------------| :-----: | :-----: | :------------: | :-----: | :-----: | :-----------------------------------------------------: | :-----: |
+| ResNet50-vd-SSLDv2-FPN | Faster | 1 | 1x | ---- | 41.4 | - | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/faster_rcnn/faster_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Faster | 1 | 2x | ---- | 42.3 | - | [下载链接](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/faster_rcnn/faster_rcnn_r50_vd_ssld_fpn_2x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Mask | 1 | 1x | ---- | 42.0 | 38.2 | [下载链接](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Mask | 1 | 2x | ---- | 42.7 | 38.9 | [下载链接](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Cascade Faster | 1 | 1x | ---- | 44.4 | - | [下载链接](https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/cascade_rcnn/cascade_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Cascade Faster | 1 | 2x | ---- | 45.0 | - | [下载链接](https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/cascade_rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Cascade Mask | 1 | 1x | ---- | 44.9 | 39.1 | [下载链接](https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/cascade_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Cascade Mask | 1 | 2x | ---- | 45.7 | 39.7 | [下载链接](https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/cascade_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml) |
+
+
+### YOLOv3 on COCO
+
+| 骨架网络 | 输入尺寸 | 每张GPU图片个数 | 学习率策略 |推理时间(fps) | Box AP | 下载 | 配置文件 |
+| :----------------- | :-------- | :-----------: | :------: | :---------: | :----: | :----------------------------------------------------: | :-----: |
+| MobileNet-V1-SSLD | 608 | 8 | 270e | ---- | 31.0 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml) |
+| MobileNet-V1-SSLD | 416 | 8 | 270e | ---- | 30.6 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml) |
+| MobileNet-V1-SSLD | 320 | 8 | 270e | ---- | 28.4 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml) |
+
+### YOLOv3 on Pascal VOC
+
+| 骨架网络 | 输入尺寸 | 每张GPU图片个数 | 学习率策略 |推理时间(fps) | Box AP | 下载 | 配置文件 |
+| :----------------- | :-------- | :-----------: | :------: | :---------: | :----: | :----------------------------------------------------: | :-----: |
+| MobileNet-V1-SSLD | 608 | 8 | 270e | - | 78.3 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml) |
+| MobileNet-V1-SSLD | 416 | 8 | 270e | - | 79.6 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml) |
+| MobileNet-V1-SSLD | 320 | 8 | 270e | - | 77.3 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml) |
+| MobileNet-V3-SSLD | 608 | 8 | 270e | - | 80.4 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml) |
+| MobileNet-V3-SSLD | 416 | 8 | 270e | - | 79.2 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml) |
+| MobileNet-V3-SSLD | 320 | 8 | 270e | - | 77.3 | [下载链接](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml) |
+
+**注意事项:**
+
+- [SSLD](https://arxiv.org/abs/2103.05959)是一种知识蒸馏方法,我们使用蒸馏后性能更强的backbone预训练模型,进一步提升检测精度,详细方案请参考[知识蒸馏教程](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/en/advanced_tutorials/distillation/distillation_en.md)
+
+
+
+## Citations
+```
+@misc{cui2021selfsupervision,
+ title={Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones},
+ author={Cheng Cui and Ruoyu Guo and Yuning Du and Dongliang He and Fu Li and Zewu Wu and Qiwen Liu and Shilei Wen and Jizhou Huang and Xiaoguang Hu and Dianhai Yu and Errui Ding and Yanjun Ma},
+ year={2021},
+ eprint={2103.05959},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV}
+}
+```
diff --git a/docs/feature_models/SSLD_PRETRAINED_MODEL_en.md b/docs/feature_models/SSLD_PRETRAINED_MODEL_en.md
new file mode 100644
index 0000000000000000000000000000000000000000..efa24c2111750cfbc6c670dd97ed586d01e28d1f
--- /dev/null
+++ b/docs/feature_models/SSLD_PRETRAINED_MODEL_en.md
@@ -0,0 +1,53 @@
+English | [简体中文](SSLD_PRETRAINED_MODEL.md)
+
+### Simple semi-supervised label knowledge distillation solution (SSLD)
+
+### R-CNN on COCO
+
+| Backbone | Model | Images/GPU | Lr schd | FPS | Box AP | Mask AP | Download | Config |
+| :------------------- | :------------| :-----: | :-----: | :------------: | :-----: | :-----: | :-----------------------------------------------------: | :-----: |
+| ResNet50-vd-SSLDv2-FPN | Faster | 1 | 1x | ---- | 41.4 | - | [model](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/faster_rcnn/faster_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Faster | 1 | 2x | ---- | 42.3 | - | [model](https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/faster_rcnn/faster_rcnn_r50_vd_ssld_fpn_2x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Mask | 1 | 1x | ---- | 42.0 | 38.2 | [model](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Mask | 1 | 2x | ---- | 42.7 | 38.9 | [model](https://paddledet.bj.bcebos.com/models/mask_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/mask_rcnn/mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Cascade Faster | 1 | 1x | ---- | 44.4 | - | [model](https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/cascade_rcnn/cascade_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Cascade Faster | 1 | 2x | ---- | 45.0 | - | [model](https://paddledet.bj.bcebos.com/models/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/cascade_rcnn/cascade_rcnn_r50_vd_fpn_ssld_2x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Cascade Mask | 1 | 1x | ---- | 44.9 | 39.1 | [model](https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_vd_fpn_ssld_1x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/cascade_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_1x_coco.yml) |
+| ResNet50-vd-SSLDv2-FPN | Cascade Mask | 1 | 2x | ---- | 45.7 | 39.7 | [model](https://paddledet.bj.bcebos.com/models/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/cascade_rcnn/cascade_mask_rcnn_r50_vd_fpn_ssld_2x_coco.yml) |
+
+### YOLOv3 on COCO
+
+| Backbone | Input shape | Images/GPU | Lr schd | FPS | Box AP | Download | Config |
+| :----------------- | :-------- | :-----------: | :------: | :---------: | :----: | :----------------------------------------------------: | :-----: |
+| MobileNet-V1-SSLD | 608 | 8 | 270e | ---- | 31.0 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml) |
+| MobileNet-V1-SSLD | 416 | 8 | 270e | ---- | 30.6 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml) |
+| MobileNet-V1-SSLD | 320 | 8 | 270e | ---- | 28.4 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_coco.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_coco.yml) |
+
+### YOLOv3 on Pascal VOC
+
+| Backbone | Input shape | Images/GPU | Lr schd | FPS | Box AP | Download | Config |
+| :----------------- | :-------- | :-----------: | :------: | :---------: | :----: | :----------------------------------------------------: | :-----: |
+| MobileNet-V1-SSLD | 608 | 8 | 270e | - | 78.3 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml) |
+| MobileNet-V1-SSLD | 416 | 8 | 270e | - | 79.6 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml) |
+| MobileNet-V1-SSLD | 320 | 8 | 270e | - | 77.3 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v1_ssld_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v1_ssld_270e_voc.yml) |
+| MobileNet-V3-SSLD | 608 | 8 | 270e | - | 80.4 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml) |
+| MobileNet-V3-SSLD | 416 | 8 | 270e | - | 79.2 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml) |
+| MobileNet-V3-SSLD | 320 | 8 | 270e | - | 77.3 | [model](https://paddledet.bj.bcebos.com/models/yolov3_mobilenet_v3_large_ssld_270e_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.0/configs/yolov3/yolov3_mobilenet_v3_large_ssld_270e_voc.yml) |
+
+**Notes:**
+
+- [SSLD](https://arxiv.org/abs/2103.05959) is a knowledge distillation method. We use the stronger backbone pretrained model after distillation to further improve the detection accuracy. Please refer to the [knowledge distillation tutorial](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/en/advanced_tutorials/distillation/distillation_en.md).
+
+
+
+## Citations
+```
+@misc{cui2021selfsupervision,
+ title={Beyond Self-Supervision: A Simple Yet Effective Network Distillation Alternative to Improve Backbones},
+ author={Cheng Cui and Ruoyu Guo and Yuning Du and Dongliang He and Fu Li and Zewu Wu and Qiwen Liu and Shilei Wen and Jizhou Huang and Xiaoguang Hu and Dianhai Yu and Errui Ding and Yanjun Ma},
+ year={2021},
+ eprint={2103.05959},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV}
+}
+```
diff --git a/docs/images/fps_map.png b/docs/images/fps_map.png
new file mode 100644
index 0000000000000000000000000000000000000000..d73877729c0775709e5954c008a88776bf48606a
Binary files /dev/null and b/docs/images/fps_map.png differ
diff --git a/docs/images/model_figure.png b/docs/images/model_figure.png
new file mode 100644
index 0000000000000000000000000000000000000000..72ec8cdad23a49e948f39fe3091c26f7a94d74a4
Binary files /dev/null and b/docs/images/model_figure.png differ
diff --git a/docs/images/reader_figure.png b/docs/images/reader_figure.png
new file mode 100644
index 0000000000000000000000000000000000000000..68441a20cd5bc14349bfea01a3ffa66a31ac1793
Binary files /dev/null and b/docs/images/reader_figure.png differ
diff --git a/docs/images/ssld_model.png b/docs/images/ssld_model.png
new file mode 100644
index 0000000000000000000000000000000000000000..23508712be7e6b6787575a66ca4c65037c9015c8
Binary files /dev/null and b/docs/images/ssld_model.png differ
diff --git a/docs/tutorials/GETTING_STARTED.md b/docs/tutorials/GETTING_STARTED.md
new file mode 100644
index 0000000000000000000000000000000000000000..4764d73d31c9857a48332f7263469eb02e21ccc9
--- /dev/null
+++ b/docs/tutorials/GETTING_STARTED.md
@@ -0,0 +1,144 @@
+English | [简体中文](GETTING_STARTED_cn.md)
+
+# Getting Started
+
+## Installation
+
+For setting up the running environment, please refer to [installation
+instructions](INSTALL.md).
+
+
+
+## Data preparation
+
+- Please refer to [PrepareDataSet](PrepareDataSet.md) for data preparation
+- Please set the data path in the dataset configuration files under `configs/datasets`
+
+
+## Training & Evaluation & Inference
+
+PaddleDetection provides scripts for training, evaluation and inference with various features according to different configurations.
+
+```bash
+# training on single-GPU
+export CUDA_VISIBLE_DEVICES=0
+python tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml
+# training on multi-GPU
+export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml
+# GPU evaluation
+export CUDA_VISIBLE_DEVICES=0
+python tools/eval.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml
+# Inference
+python tools/infer.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml --infer_img=demo/000000570688.jpg
+```
+
+### Other argument list
+
+The list below can be viewed by running with `--help`
+
+| FLAG | script supported | description | default | remark |
+| :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: |
+| -c | ALL | Select config file | None | **required**, such as `-c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml` |
+| -o | ALL | Set parameters in configure file | None | `-o` has higher priority than the file configured by `-c`. Such as `-o use_gpu=False` |
+| --eval | train | Whether to perform evaluation in training | False | set `--eval` if needed |
+| -r/--resume_checkpoint | train | Checkpoint path for resuming training | None | such as `-r output/faster_rcnn_r50_1x_coco/10000` |
+| --slim_config | ALL | Configure file of slim method | None | such as `--slim_config configs/slim/prune/yolov3_prune_l1_norm.yml` |
+| --use_vdl | train/infer | Whether to record the data with [VisualDL](https://github.com/paddlepaddle/visualdl), so as to display in VisualDL | False | VisualDL requires Python>=3.5 |
+| --vdl\_log_dir | train/infer | VisualDL logging directory for image | train:`vdl_log_dir/scalar` infer: `vdl_log_dir/image` | VisualDL requires Python>=3.5 |
+| --output_eval | eval | Directory for storing the evaluation output | None | such as `--output_eval=eval_output`, default is current directory |
+| --json_eval | eval | Whether to evaluate with already existed bbox.json or mask.json | False | set `--json_eval` if needed and json path is set in `--output_eval` |
+| --classwise | eval | Whether to eval AP for each class and draw PR curve | False | set `--classwise` if needed |
+| --output_dir | infer | Directory for storing the output visualization files | `./output` | such as `--output_dir output` |
+| --draw_threshold | infer | Threshold to reserve the result for visualization | 0.5 | such as `--draw_threshold 0.7` |
+| --infer_dir | infer | Directory for images to perform inference on | None | One of `infer_dir` and `infer_img` is required |
+| --infer_img | infer | Image path | None | One of `infer_dir` and `infer_img` is required, `infer_img` has higher priority over `infer_dir` |
+
+
+
+
+## Examples
+
+### Training
+
+- Perform evaluation in training
+
+ ```bash
+ export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+ python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml --eval
+ ```
+
+  Training and evaluation are performed alternately, with evaluation at the end of each epoch. Meanwhile, the best model with the highest mAP so far is saved to the `best_model` directory after each evaluation.
+
+  If the evaluation dataset is large, we suggest modifying `snapshot_epoch` in `configs/runtime.yml` to reduce the number of evaluations, or evaluating after training finishes.
+
+- Fine-tuning other tasks
+
+  When using a pre-trained model to fine-tune another task, `pretrain_weights` can be loaded directly. Parameters with mismatched shapes are ignored automatically. For example:
+
+
+ ```bash
+ export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+ # If the shape of parameters in program is different from pretrain_weights,
+ # then PaddleDetection will not use such parameters.
+ python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
+        -o pretrain_weights=output/faster_rcnn_r50_1x_coco/model_final
+ ```
+
+##### NOTES
+
+- `CUDA_VISIBLE_DEVICES` specifies which GPUs to use, e.g. `export CUDA_VISIBLE_DEVICES=0,1,2,3`.
+- Datasets will be downloaded automatically and cached in `~/.cache/paddle/dataset` if not found locally.
+- Pretrained models are downloaded automatically and cached in `~/.cache/paddle/weights`.
+- Checkpoints are saved in `output` by default, which can be changed via `save_dir` in `configs/runtime.yml`.
+
+
+### Evaluation
+
+- Evaluate by specified weights path and dataset path
+
+ ```bash
+ export CUDA_VISIBLE_DEVICES=0
+ python -u tools/eval.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
+ -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_1x_coco.pdparams
+ ```
+
+  The model to be evaluated can be specified either as a local path or as a link from the [MODEL_ZOO](../MODEL_ZOO_cn.md).
+
+- Evaluate with json
+
+ ```bash
+ export CUDA_VISIBLE_DEVICES=0
+ python tools/eval.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
+ --json_eval \
+        --output_eval evaluation/
+ ```
+
+ The json file must be named bbox.json or mask.json, placed in the `evaluation/` directory.
+
+
+### Inference
+
+- Specify the output directory && set the drawing threshold
+
+ ```bash
+ export CUDA_VISIBLE_DEVICES=0
+ python tools/infer.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
+ --infer_img=demo/000000570688.jpg \
+ --output_dir=infer_output/ \
+ --draw_threshold=0.5 \
+ -o weights=output/faster_rcnn_r50_fpn_1x_coco/model_final \
+        --use_vdl=True
+ ```
+
+ `--draw_threshold` is an optional argument. Default is 0.5.
+ Different thresholds will produce different results depending on the calculation of [NMS](https://ieeexplore.ieee.org/document/1699659).
+
+
+## Deployment
+
+Please refer to [deployment](../../deploy/README.md)
+
+## Model Compression
+
+Please refer to [slim](../../configs/slim/README.md)
diff --git a/docs/tutorials/GETTING_STARTED_cn.md b/docs/tutorials/GETTING_STARTED_cn.md
index 8c8dbe34a2edc02819f175625e87342d33d273c0..d04cac9f37578d48cde2ec5b539a451d06abf446 100644
--- a/docs/tutorials/GETTING_STARTED_cn.md
+++ b/docs/tutorials/GETTING_STARTED_cn.md
@@ -1,3 +1,6 @@
+[English](GETTING_STARTED.md) | 简体中文
+
+
# 入门使用
## 安装
@@ -12,86 +15,130 @@
## 训练/评估/预测
-PaddleDetection在[tools](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/tools)中提供了`训练`/`评估`/`预测`/`导出模型`等功能,支持通过传入不同可选参数实现特定功能
-
-### 参数列表
+PaddleDetection提供了`训练`/`评估`/`预测`等功能,支持通过传入不同可选参数实现特定功能
-以下列表可以通过`--help`查看
-
-| FLAG | 支持脚本 | 用途 | 默认值 | 备注 |
-| :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: |
-| -c | ALL | 指定配置文件 | None | **必选**,例如-c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml |
-| --eval | train | 是否边训练边测试 | False | 可选,如需指定,直接`--eval`即可 |
-| --fleet | train | 是否使用fleet API训练 | False | 可以使用--fleet来指定使用fleet API进行多机训练 |
-| --fp16 | train | 是否开启混合精度训练 | False | 可以使用--fp16来指定使用混合精度训练 |
-| -o | ALL | 设置或更改配置文件里的参数内容 | None | 可选,例如:`-o use_gpu=False` |
-| --slim_config | ALL | 模型压缩策略配置文件 | None | 可选,例如`--slim_config configs/slim/prune/yolov3_prune_l1_norm.yml` |
-| --output_dir | infer/export_model | 预测后结果或导出模型保存路径 | `./output` | 可选,例如`--output_dir=output` |
-| --draw_threshold | infer | 可视化时分数阈值 | 0.5 | 可选,`--draw_threshold=0.7` |
-| --infer_dir | infer | 用于预测的图片文件夹路径 | None | 可选 |
-| --infer_img | infer | 用于预测的图片路径 | None | 可选,`--infer_img`和`--infer_dir`必须至少设置一个 |
-| --classwise | eval | 是否评估单类AP和绘制单类PR曲线 | False | 可选 |
-
-### 训练
-
-- 单卡训练
```bash
-# 通过CUDA_VISIBLE_DEVICES指定GPU卡号
+# GPU单卡训练
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml
-```
-- 多卡训练
-
-```bash
+# GPU多卡训练
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml
+# GPU评估
+export CUDA_VISIBLE_DEVICES=0
+python tools/eval.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml
+# 预测
+python tools/infer.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml --infer_img=demo/000000570688.jpg
```
-- 混合精度训练
-```bash
-export CUDA_VISIBLE_DEVICES=0
-python tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml --fp16
-```
+### 参数列表
-- fleet API训练
+以下列表可以通过`--help`查看
-```bash
-# fleet API用于多机训练,启动方式与单机多卡训练方式基本一致,只不过需要使用--ips指定ip列表以及--fleet开启多机训练
-python -m paddle.distributed.launch --ips="xx.xx.xx.xx,yy.yy.yy.yy" --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml --fleet
-```
+| FLAG | 支持脚本 | 用途 | 默认值 | 备注 |
+| :----------------------: | :------------: | :---------------: | :--------------: | :-----------------: |
+| -c | ALL | 指定配置文件 | None | **必选**,例如-c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml |
+| -o | ALL | 设置或更改配置文件里的参数内容 | None | 相较于`-c`设置的配置文件有更高优先级,例如:`-o use_gpu=False` |
+| --eval | train | 是否边训练边测试 | False | 如需指定,直接`--eval`即可 |
+| -r/--resume_checkpoint | train | 恢复训练加载的权重路径 | None | 例如:`-r output/faster_rcnn_r50_1x_coco/10000` |
+| --slim_config | ALL | 模型压缩策略配置文件 | None | 例如`--slim_config configs/slim/prune/yolov3_prune_l1_norm.yml` |
+| --use_vdl | train/infer | 是否使用[VisualDL](https://github.com/paddlepaddle/visualdl)记录数据,进而在VisualDL面板中显示 | False | VisualDL需Python>=3.5 |
+| --vdl\_log_dir | train/infer | 指定 VisualDL 记录数据的存储路径 | train:`vdl_log_dir/scalar` infer: `vdl_log_dir/image` | VisualDL需Python>=3.5 |
+| --output_eval | eval | 评估阶段保存json路径 | None | 例如 `--output_eval=eval_output`, 默认为当前路径 |
+| --json_eval | eval | 是否通过已存在的bbox.json或者mask.json进行评估 | False | 如需指定,直接`--json_eval`即可, json文件路径在`--output_eval`中设置 |
+| --classwise | eval | 是否评估单类AP和绘制单类PR曲线 | False | 如需指定,直接`--classwise`即可 |
+| --output_dir | infer/export_model | 预测后结果或导出模型保存路径 | `./output` | 例如`--output_dir=output` |
+| --draw_threshold | infer | 可视化时分数阈值 | 0.5 | 例如`--draw_threshold=0.7` |
+| --infer_dir | infer | 用于预测的图片文件夹路径 | None | `--infer_img`和`--infer_dir`必须至少设置一个 |
+| --infer_img | infer | 用于预测的图片路径 | None | `--infer_img`和`--infer_dir`必须至少设置一个,`infer_img`具有更高优先级 |
+| --save_txt | infer | 是否在文件夹下将图片的预测结果保存到文本文件中 | False | 可选 |
+
+## 使用示例
+
+### 模型训练
- 边训练边评估
-```bash
-python tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml --eval
-```
-### 评估
-```bash
-# 目前只支持单卡评估
-CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml
-```
+ ```bash
+ export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+ python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml --eval
+ ```
-### 预测
-```bash
-CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml --infer_img={IMAGE_PATH}
-```
+  训练与评估交替执行,评估在每个epoch训练结束后进行。每次评估后还会将mAP最佳的模型保存到`best_model`文件夹下。
-## 预测部署
+ 如果验证集很大,测试将会比较耗时,建议调整`configs/runtime.yml` 文件中的 `snapshot_epoch`配置以减少评估次数,或训练完成后再进行评估。
-(1)导出模型
-```bash
-python tools/export_model.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
- -o weights=output/faster_rcnn_r50_fpn_1x_coco/model_final \
- --output_dir=output_inference
-```
+- Fine-tune其他任务
+
+ 使用预训练模型fine-tune其他任务时,可以直接加载预训练模型,形状不匹配的参数将自动忽略,例如:
+
+ ```bash
+ export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+ # 如果模型中参数形状与加载权重形状不同,将不会加载这类参数
+ python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
+        -o pretrain_weights=output/faster_rcnn_r50_1x_coco/model_final
+ ```
+
+**提示:**
+
+- `CUDA_VISIBLE_DEVICES` 参数可以指定不同的GPU。例如: `export CUDA_VISIBLE_DEVICES=0,1,2,3`
+- 若本地未找到数据集,将自动下载数据集并保存在`~/.cache/paddle/dataset`中。
+- 预训练模型自动下载并保存在`~/.cache/paddle/weights`中。
+- 模型checkpoints默认保存在`output`中,可通过修改配置文件`configs/runtime.yml`中`save_dir`进行配置。
+
+
+### 模型评估
+
+- 指定权重和数据集路径
+
+ ```bash
+ export CUDA_VISIBLE_DEVICES=0
+ python -u tools/eval.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
+ -o weights=https://paddledet.bj.bcebos.com/models/faster_rcnn_r50_fpn_1x_coco.pdparams
+ ```
+
+ 评估模型可以为本地路径,例如`output/faster_rcnn_r50_1x_coco/model_final`, 也可以是[MODEL_ZOO](../MODEL_ZOO_cn.md)中给出的模型链接。
-(2)预测部署
-参考[预测部署文档](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/deploy)。
+- 通过json文件评估
+
+ ```bash
+ export CUDA_VISIBLE_DEVICES=0
+ python tools/eval.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
+ --json_eval \
+        --output_eval evaluation/
+ ```
+
+ json文件必须命名为bbox.json或者mask.json,放在`evaluation/`目录下。
+
+
+
+### 模型预测
+
+- 设置输出路径 && 设置预测阈值
+
+ ```bash
+ export CUDA_VISIBLE_DEVICES=0
+ python tools/infer.py -c configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml \
+ --infer_img=demo/000000570688.jpg \
+ --output_dir=infer_output/ \
+ --draw_threshold=0.5 \
+ -o weights=output/faster_rcnn_r50_fpn_1x_coco/model_final \
+        --use_vdl=True
+ ```
+
+  `--draw_threshold` 是个可选参数,默认值为0.5。根据 [NMS](https://ieeexplore.ieee.org/document/1699659) 的计算,不同阈值会产生不同的结果。
+
+
+
+## 预测部署
+
+请参考[预测部署文档](../../deploy/README.md)。
## 模型压缩
-参考[模型压缩文档](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/slim)。
+请参考[模型压缩文档](../../configs/slim/README.md)。
diff --git a/docs/tutorials/INSTALL.md b/docs/tutorials/INSTALL.md
new file mode 100644
index 0000000000000000000000000000000000000000..db09adaadf1482fd0413644e9e2bbb1ee43a2559
--- /dev/null
+++ b/docs/tutorials/INSTALL.md
@@ -0,0 +1,131 @@
+English | [简体中文](INSTALL_cn.md)
+
+# Installation
+
+
+This document covers how to install PaddleDetection and its dependencies.
+
+For general information about PaddleDetection, please see [README.md](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.0).
+
+## Requirements:
+
+- PaddlePaddle 2.0.1
+- OS 64 bit
+- Python 3 (3.5.1+/3.6/3.7), 64 bit
+- pip/pip3(9.0.1+), 64 bit
+- CUDA >= 9.0
+- cuDNN >= 7.6
+
+
+Dependency of PaddleDetection and PaddlePaddle:
+
+| PaddleDetection version | PaddlePaddle version | tips |
+| :----------------: | :---------------: | :-------: |
+| release/2.0 | >= 2.0.1 | Dygraph mode is set as default |
+| release/2.0-rc | >= 2.0.1 | -- |
+| release/0.5        |       >= 1.8.4         | Cascade R-CNN and SOLOv2 depend on 2.0.0.rc |
+| release/0.4 | >= 1.8.4 | PP-YOLO depends on 1.8.4 |
+| release/0.3 | >=1.7 | -- |
+
+
+## Instruction
+
+### 1. Install PaddlePaddle
+
+```
+# CUDA9.0
+python -m pip install paddlepaddle-gpu==2.0.1.post90 -i https://mirror.baidu.com/pypi/simple
+
+# CUDA10.1
+python -m pip install paddlepaddle-gpu==2.0.1.post101 -f https://mirror.baidu.com/pypi/simple
+
+# CPU
+python -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+```
+
+For more installation methods such as conda or compile with source code, please refer to the [installation document](https://www.paddlepaddle.org.cn/documentation/docs/en/install/index_en.html)
+
+Please make sure that your PaddlePaddle is installed successfully and the version is not lower than the required version. Use the following command to verify.
+
+```
+# check
+>>> import paddle
+>>> paddle.utils.run_check()
+
+# confirm the paddle's version
+python -c "import paddle; print(paddle.__version__)"
+```
+
+**Note**
+
+1. If you want to use PaddleDetection with multiple GPUs, please install NCCL first.
+
+
+### 2. Install PaddleDetection
+
+PaddleDetection can be installed in the following two ways:
+
+#### 2.1 Install via pip
+
+**Note:** Installing via pip only supports Python3
+
+```
+# Install paddledet via pip
+pip install paddledet==2.0.1 -i https://mirror.baidu.com/pypi/simple
+
+# Download and use the configuration files and code examples in the source code
+git clone https://github.com/PaddlePaddle/PaddleDetection.git
+cd PaddleDetection
+```
+
+#### 2.2 Compile and install from Source code
+
+```
+# Clone PaddleDetection repository
+cd
+git clone https://github.com/PaddlePaddle/PaddleDetection.git
+
+# Compile and install paddledet
+cd PaddleDetection
+python setup.py install
+
+# Install other dependencies
+pip install -r requirements.txt
+
+```
+
+**Note**
+
+1. If you are working on Windows OS, installing `pycocotools` may fail because the original version of cocoapi does not support Windows. A third-party implementation, which only supports Python3, can be used instead:
+
+ ```pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI```
+
+After installation, make sure the tests pass:
+
+```shell
+python ppdet/modeling/tests/test_architectures.py
+```
+
+If the tests are passed, the following information will be prompted:
+
+```
+.....
+----------------------------------------------------------------------
+Ran 5 tests in 4.280s
+OK
+```
+
+## Inference demo
+
+**Congratulations!** You have installed PaddleDetection successfully. Now try our inference demo:
+
+```
+# Predict an image by GPU
+export CUDA_VISIBLE_DEVICES=0
+python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o use_gpu=true weights=https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams --infer_img=demo/000000014439.jpg
+```
+
+An image of the same name with the predicted result will be generated under the `output` folder.
+The result is as shown below:
+
+
diff --git a/docs/tutorials/INSTALL_cn.md b/docs/tutorials/INSTALL_cn.md
index ceefa45ab357f823be9054bd33347284058243b8..e51a1091d63a1ce66092bb90e7cf8a5dfa1941a2 100644
--- a/docs/tutorials/INSTALL_cn.md
+++ b/docs/tutorials/INSTALL_cn.md
@@ -1,79 +1,128 @@
-# 安装说明
+[English](INSTALL.md) | 简体中文
----
-## 目录
-- [安装PaddlePaddle](#安装PaddlePaddle)
-- [其他依赖安装](#其他依赖安装)
-- [PaddleDetection](#PaddleDetection)
+# 安装文档
+本文档包含了如何安装PaddleDetection以及相关依赖
-## 安装PaddlePaddle
+其他更多PaddleDetection信息,请参考[README.md](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.0).
-**环境需求:**
+## 环境要求
-- PaddlePaddle 2.0.1 或 PaddlePaddle release/2.0分支最新编译安装包
+- PaddlePaddle 2.0.1
- OS 64位操作系统
- Python 3(3.5.1+/3.6/3.7),64位版本
- pip/pip3(9.0.1+),64位版本
- CUDA >= 9.0
- cuDNN >= 7.6
-如果需要 GPU 多卡训练,请先安装NCCL。
+PaddleDetection 依赖 PaddlePaddle 版本关系:
+| PaddleDetection版本 | PaddlePaddle版本 | 备注 |
+| :------------------: | :---------------: | :-------: |
+| release/2.0 | >= 2.0.1 | 默认使用动态图模式 |
+| release/2.0-rc | >= 2.0.1 | -- |
+| release/0.5 | >= 1.8.4 | 大部分模型>=1.8.4即可运行,Cascade R-CNN系列模型与SOLOv2依赖2.0.0.rc版本 |
+| release/0.4 | >= 1.8.4 | PP-YOLO依赖1.8.4 |
+| release/0.3 | >=1.7 | -- |
-## 其他依赖安装
+## 安装说明
-[COCO-API](https://github.com/cocodataset/cocoapi):
+### 1. 安装PaddlePaddle
-运行需要COCO-API,安装方式如下:
+```
+# CUDA9.0
+python -m pip install paddlepaddle-gpu==2.0.1.post90 -i https://mirror.baidu.com/pypi/simple
+
+# CUDA10.1
+python -m pip install paddlepaddle-gpu==2.0.1.post101 -f https://mirror.baidu.com/pypi/simple
+
+# CPU
+python -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+```
+
+更多安装方式例如conda或源码编译安装方法,请参考PaddlePaddle[安装文档](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/index_cn.html)
- # 安装pycocotools
- pip install pycocotools
+请确保您的PaddlePaddle安装成功并且版本不低于需求版本。使用以下命令进行验证。
-**windows用户安装COCO-API方式:**
+```
+# 在您的Python解释器中确认PaddlePaddle安装成功
+>>> import paddle
+>>> paddle.utils.run_check()
- # 若Cython未安装,请安装Cython
- pip install Cython
+# 确认PaddlePaddle版本
+python -c "import paddle; print(paddle.__version__)"
+```
+**注意**
+1. 如果您希望在多卡环境下使用PaddleDetection,请首先安装NCCL
- # 由于原版cocoapi不支持windows,采用第三方实现版本,该版本仅支持Python3
- pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI
+### 2. 安装PaddleDetection
-## PaddleDetection
+可通过如下两种方式安装PaddleDetection
-**克隆PaddleDetection库:**
+#### 2.1 通过pip安装
-您可以通过以下命令克隆PaddleDetection:
+**注意:** pip安装方式只支持Python3
```
-cd
+# pip安装paddledet
+pip install paddledet==2.0.1 -i https://mirror.baidu.com/pypi/simple
+
+# 下载使用源码中的配置文件和代码示例
git clone https://github.com/PaddlePaddle/PaddleDetection.git
+cd PaddleDetection
```
-也可以通过 [https://gitee.com/paddlepaddle/PaddleDetection](https://gitee.com/paddlepaddle/PaddleDetection) 克隆。
+#### 2.2 源码编译安装
+
```
+# 克隆PaddleDetection仓库
cd
-git clone https://gitee.com/paddlepaddle/PaddleDetection
+git clone https://github.com/PaddlePaddle/PaddleDetection.git
+
+# 编译安装paddledet
+cd PaddleDetection
+python setup.py install
+
+# 安装其他依赖
+pip install -r requirements.txt
+
+```
+
+**注意**
+
+1. 若您使用的是Windows系统,由于原版cocoapi不支持Windows,`pycocotools`依赖可能安装失败,可采用第三方实现版本,该版本仅支持Python3
+
+  `pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI`
+
+
+安装后确认测试通过:
+
+```
+python ppdet/modeling/tests/test_architectures.py
```
-**安装PaddleDetection库:**
+测试通过后会提示如下信息:
```
-cd PaddleDetection/dygraph
-python setup.py install
+.....
+----------------------------------------------------------------------
+Ran 5 tests in 4.280s
+OK
```
-**预训练模型预测**
+## 快速体验
-使用预训练模型预测图像,快速体验模型预测效果:
+**恭喜!** 您已经成功安装了PaddleDetection,接下来快速体验目标检测效果
```
-# use_gpu参数设置是否使用GPU
-python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o use_gpu=true weights=https://paddlemodels.bj.bcebos.com/object_detection/dygraph/ppyolo_r50vd_dcn_1x_coco.pdparams --infer_img=demo/000000014439.jpg
+# 在GPU上预测一张图片
+export CUDA_VISIBLE_DEVICES=0
+python tools/infer.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml -o use_gpu=true weights=https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams --infer_img=demo/000000014439.jpg
```
会在`output`文件夹下生成一个画有预测结果的同名图像。
结果如下图:
-
+
diff --git a/docs/tutorials/PrepareDataSet.md b/docs/tutorials/PrepareDataSet.md
index 2689ebf089c0847ad2eeb4ca9f587bdbab77f43b..7f4d9dcbc3676cede6d19b71466d6b396c5d58e4 100644
--- a/docs/tutorials/PrepareDataSet.md
+++ b/docs/tutorials/PrepareDataSet.md
@@ -2,10 +2,10 @@
## 目录
- [目标检测数据说明](#目标检测数据说明)
- [准备训练数据](#准备训练数据)
- - [VOC数据数据](#VOC数据数据)
+ - [VOC数据](#VOC数据)
- [VOC数据集下载](#VOC数据集下载)
- [VOC数据标注文件介绍](#VOC数据标注文件介绍)
- - [COCO数据数据](#COCO数据数据)
+ - [COCO数据](#COCO数据)
- [COCO数据集下载](#COCO数据下载)
- [COCO数据标注文件介绍](#COCO数据标注文件介绍)
- [用户数据](#用户数据)
@@ -44,14 +44,14 @@ cd PaddleDetection/
ppdet_root=$(pwd)
```
-#### VOC数据数据
+#### VOC数据
VOC数据是[Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/) 比赛使用的数据。Pascal VOC比赛不仅包含图像分类任务,还包含图像目标检测、图像分割等任务,其标注文件中包含多个任务的标注内容。
VOC数据集指的是Pascal VOC比赛使用的数据。用户自定义的VOC数据,xml文件中的非必须字段,请根据实际情况选择是否标注或是否使用默认值。
##### VOC数据集下载
-- 通过代码自动化下载VOC数据集
+- 通过代码自动化下载VOC数据集,数据集较大,下载需要较长时间
```
# 执行代码自动化下载VOC数据集
@@ -151,11 +151,11 @@ COCO数据集指的是COCO比赛使用的数据。用户自定义的COCO数据
##### COCO数据下载
-- 通过代码自动化下载COCO数据集
+- 通过代码自动化下载COCO数据集,数据集较大,下载需要较长时间
```
# 执行代码自动化下载COCO数据集
- python dataset/voc/download_coco.py
+ python dataset/coco/download_coco.py
```
代码执行完成后COCO数据集文件组织结构为:
@@ -289,7 +289,7 @@ classname2
...
```
-##### 用户数据转成COCO
+##### 用户数据转成COCO数据
在`./tools/`中提供了`x2coco.py`用于将VOC数据集、labelme标注的数据集或cityscape数据集转换为COCO数据,例如:
(1)labelme数据转换为COCO数据:
@@ -328,7 +328,7 @@ dataset/xxx/
```
##### 用户数据自定义reader
-如果数据集有新的数据需要添加进PaddleDetection中,您可参考数据处理文档中的[添加新数据源](../advanced_tutorials/READER.md#添加新数据源)文档部分,开发相应代码完成新的数据源支持,同时数据处理具体代码解析等可阅读[数据处理文档](../advanced_tutorials/READER.md)
+如果数据集有新的数据需要添加进PaddleDetection中,您可参考数据处理文档中的[添加新数据源](../advanced_tutorials/READER.md#2.3自定义数据集)文档部分,开发相应代码完成新的数据源支持,同时数据处理具体代码解析等可阅读[数据处理文档](../advanced_tutorials/READER.md)
#### 用户数据转换示例
diff --git a/docs/tutorials/QUICK_STARTED.md b/docs/tutorials/QUICK_STARTED.md
new file mode 100644
index 0000000000000000000000000000000000000000..9b3e0dced48327c0db645c6236f9f73f66692f4b
--- /dev/null
+++ b/docs/tutorials/QUICK_STARTED.md
@@ -0,0 +1,91 @@
+English | [简体中文](QUICK_STARTED_cn.md)
+
+# Quick Start
+In order to let users experience PaddleDetection and produce a model quickly, this tutorial shows how to get a decent object detection model in about 10 minutes by finetuning on a small dataset. In practical applications, it is recommended that users select a suitable model configuration file for their specific demand.
+
+- **Set GPU**
+
+
+```bash
+export CUDA_VISIBLE_DEVICES=0
+```
+
+## Inference Demo with Pre-trained Models
+
+```
+# predict an image using PP-YOLO
+python tools/infer.py -c configs/ppyolo/ppyolo.yml -o use_gpu=true weights=https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams --infer_img=demo/000000014439.jpg
+```
+
+The result:
+
+
+
+
+## Data preparation
+The dataset is the [Kaggle road sign dataset](https://www.kaggle.com/andrewmvd/road-sign-detection), which contains 877 images in 4 categories: crosswalk, speedlimit, stop and trafficlight. It is split into a training set (701 images) and a test set (176 images); see the [download link](https://paddlemodels.bj.bcebos.com/object_detection/roadsign_voc.tar).
+
+```
+# Note: this step can be skipped, since
+# the dataset will be downloaded automatically when training starts.
+python dataset/roadsign_voc/download_roadsign_voc.py
+```
+
+## Training & Evaluation & Inference
+### 1. Training
+```
+# Training takes about 10 minutes on a 1080Ti GPU and about 1 hour on CPU
+# -c sets the configuration file
+# -o overrides settings in the configuration file
+# --eval evaluates while training; the model with the best evaluation result is saved automatically as best_model.pdparams
+
+
+python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --eval -o use_gpu=true
+```
+
+If you want to watch the loss curve in real time with VisualDL, add `--use_vdl=true` to the training command and set the log directory via `--vdl_log_dir`.
+
+**Note: VisualDL requires Python >= 3.5**
+
+Please install [VisualDL](https://github.com/PaddlePaddle/VisualDL) first
+
+```
+python -m pip install visualdl -i https://mirror.baidu.com/pypi/simple
+```
+
+```
+python -u tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \
+ --use_vdl=true \
+ --vdl_log_dir=vdl_dir/scalar \
+ --eval
+```
+View the curves in real time with the `visualdl` command:
+```
+visualdl --logdir vdl_dir/scalar/ --host <host_IP> --port <port_num>
+```
+
+### 2. Evaluation
+```
+# Evaluate best_model by default
+# -c sets the configuration file
+# -o overrides settings in the configuration file
+
+python tools/eval.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true
+```
+
+The final mAP should be around 0.85. Since the dataset is small, the precision may vary slightly from run to run.
+
+
+### 3. Inference
+```
+# -c sets the configuration file
+# -o overrides settings in the configuration file
+# --infer_img specifies the image path
+# After prediction finishes, an image with the same name, overlaid with the results, is generated in the output folder
+
+python tools/infer.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true --infer_img=demo/road554.png
+```
+
+The result is as shown below:
+
+
diff --git a/docs/tutorials/QUICK_STARTED_cn.md b/docs/tutorials/QUICK_STARTED_cn.md
index 2b2aa81e7537bf694894bfbc4fec39cf5a1514ff..2b5212af15b855759d9455037b83800d5e1d7603 100644
--- a/docs/tutorials/QUICK_STARTED_cn.md
+++ b/docs/tutorials/QUICK_STARTED_cn.md
@@ -1,3 +1,5 @@
+[English](QUICK_STARTED.md) | 简体中文
+
# 快速开始
为了使得用户能够在很短时间内快速产出模型,掌握PaddleDetection的使用方式,这篇教程通过一个预训练检测模型对小数据集进行finetune。在较短时间内即可产出一个效果不错的模型。实际业务中,建议用户根据需要选择合适模型配置文件进行适配。
@@ -35,7 +37,27 @@ python dataset/roadsign_voc/download_roadsign_voc.py
# -o 参数表示指定配置文件中的全局变量(覆盖配置文件中的设置),这里设置使用gpu
# --eval 参数表示边训练边评估,最后会自动保存一个名为model_final.pdparams的模型
-python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --eval -o use_gpu=true --weight_type finetune
+python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --eval -o use_gpu=true
+```
+
+如果想通过VisualDL实时观察loss变化曲线,在训练命令中添加--use_vdl=true,以及通过--vdl_log_dir设置日志保存路径。
+
+**注意:VisualDL需Python>=3.5**
+
+首先安装[VisualDL](https://github.com/PaddlePaddle/VisualDL)
+```
+python -m pip install visualdl -i https://mirror.baidu.com/pypi/simple
+```
+
+```
+python -u tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml \
+ --use_vdl=true \
+ --vdl_log_dir=vdl_dir/scalar \
+ --eval
+```
+通过visualdl命令实时查看变化曲线:
+```
+visualdl --logdir vdl_dir/scalar/ --host <host_IP> --port <port_num>
```
@@ -48,6 +70,7 @@ python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml --eval
python tools/eval.py -c configs/yolov3/yolov3_mobilenet_v1_roadsign.yml -o use_gpu=true
```
+最终模型精度在mAP=0.85左右,由于数据集较小因此每次训练结束后精度会有一定波动
### 3、预测
diff --git a/docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation.md b/docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation.md
new file mode 100644
index 0000000000000000000000000000000000000000..460af362bff54f708cc49f2806434a77582c9aee
--- /dev/null
+++ b/docs/tutorials/config_annotation/faster_rcnn_r50_fpn_1x_coco_annotation.md
@@ -0,0 +1,263 @@
+# RCNN系列模型参数配置教程
+
+标签: 模型参数配置
+
+以`faster_rcnn_r50_fpn_1x_coco.yml`为例,这个模型由五个子配置文件组成(完整的组合与加载方式见文末示例):
+
+- 数据配置文件 `coco_detection.yml`
+
+```yaml
+# 数据评估类型
+metric: COCO
+# 数据集的类别数
+num_classes: 80
+
+# TrainDataset
+TrainDataset:
+ !COCODataSet
+ # 图像数据路径,相对 dataset_dir 路径,os.path.join(dataset_dir, image_dir)
+ image_dir: train2017
+ # 标注文件路径,相对 dataset_dir 路径,os.path.join(dataset_dir, anno_path)
+ anno_path: annotations/instances_train2017.json
+ # 数据文件夹
+ dataset_dir: dataset/coco
+ # data_fields
+ data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+ !COCODataSet
+ # 图像数据路径,相对 dataset_dir 路径,os.path.join(dataset_dir, image_dir)
+ image_dir: val2017
+ # 标注文件路径,相对 dataset_dir 路径,os.path.join(dataset_dir, anno_path)
+ anno_path: annotations/instances_val2017.json
+ # 数据文件夹
+ dataset_dir: dataset/coco
+
+TestDataset:
+ !ImageFolder
+ # 标注文件路径,相对 dataset_dir 路径,os.path.join(dataset_dir, anno_path)
+ anno_path: annotations/instances_val2017.json
+```
+
+- 优化器配置文件 `optimizer_1x.yml`
+
+```yaml
+# 总训练轮数
+epoch: 12
+
+# 学习率设置
+LearningRate:
+ # 默认为8卡训练的学习率
+ base_lr: 0.01
+ # 学习率调整策略
+ schedulers:
+ - !PiecewiseDecay
+ gamma: 0.1
+ # 学习率变化位置(轮数)
+ milestones: [8, 11]
+ - !LinearWarmup
+ start_factor: 0.1
+ steps: 1000
+
+# 优化器
+OptimizerBuilder:
+ # 优化器
+ optimizer:
+ momentum: 0.9
+ type: Momentum
+ # 正则化
+ regularizer:
+ factor: 0.0001
+ type: L2
+```
+
+- 数据读取配置文件 `faster_fpn_reader.yml`
+
+```yaml
+# 每张GPU reader进程个数
+worker_num: 2
+# 训练数据
+TrainReader:
+ # 训练数据transforms
+ sample_transforms:
+ - Decode: {}
+ - RandomResize: {target_size: [[640, 1333], [672, 1333], [704, 1333], [736, 1333], [768, 1333], [800, 1333]], interp: 2, keep_ratio: True}
+ - RandomFlip: {prob: 0.5}
+ - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+ - Permute: {}
+ batch_transforms:
+ # 由于模型存在FPN结构,输入图片需要padding为32的倍数
+ - PadBatch: {pad_to_stride: 32}
+ # 训练时batch_size
+ batch_size: 1
+ # 读取数据时是否乱序
+ shuffle: true
+ # 是否丢弃最后不能完整组成batch的数据
+ drop_last: true
+ # 表示reader是否对gt进行组batch的操作,在rcnn系列算法中设置为false,得到的gt格式为list[Tensor]
+ collate_batch: false
+
+# 评估数据
+EvalReader:
+ # 评估数据transforms
+ sample_transforms:
+ - Decode: {}
+ - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True}
+ - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+ - Permute: {}
+ batch_transforms:
+ # 由于模型存在FPN结构,输入图片需要padding为32的倍数
+ - PadBatch: {pad_to_stride: 32}
+ # 评估时batch_size
+ batch_size: 1
+ # 读取数据时是否乱序
+ shuffle: false
+ # 是否丢弃最后不能完整组成batch的数据
+ drop_last: false
+ # 是否丢弃没有标注的数据
+ drop_empty: false
+
+# 测试数据
+TestReader:
+ # 测试数据transforms
+ sample_transforms:
+ - Decode: {}
+ - Resize: {interp: 2, target_size: [800, 1333], keep_ratio: True}
+ - NormalizeImage: {is_scale: true, mean: [0.485,0.456,0.406], std: [0.229, 0.224,0.225]}
+ - Permute: {}
+ batch_transforms:
+ # 由于模型存在FPN结构,输入图片需要padding为32的倍数
+ - PadBatch: {pad_to_stride: 32}
+ # 测试时batch_size
+ batch_size: 1
+ # 读取数据时是否乱序
+ shuffle: false
+ # 是否丢弃最后不能完整组成batch的数据
+ drop_last: false
+```
+
+- 模型配置文件 `faster_rcnn_r50_fpn.yml`
+
+```yaml
+# 模型结构类型
+architecture: FasterRCNN
+# 预训练模型地址
+pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_cos_pretrained.pdparams
+
+# FasterRCNN
+FasterRCNN:
+ # backbone
+ backbone: ResNet
+ # neck
+ neck: FPN
+ # rpn_head
+ rpn_head: RPNHead
+ # bbox_head
+ bbox_head: BBoxHead
+ # post process
+ bbox_post_process: BBoxPostProcess
+
+
+# backbone
+ResNet:
+ # depth
+ depth: 50
+ # norm_type,可设置参数:bn 或 sync_bn
+ norm_type: bn
+ # freeze_at index, 0 represents res2
+ freeze_at: 0
+ # return_idx
+ return_idx: [0,1,2,3]
+ # num_stages
+ num_stages: 4
+
+# FPN
+FPN:
+ # channel of FPN
+ out_channel: 256
+
+# RPNHead
+RPNHead:
+ # anchor generator
+ anchor_generator:
+ aspect_ratios: [0.5, 1.0, 2.0]
+ anchor_sizes: [[32], [64], [128], [256], [512]]
+ strides: [4, 8, 16, 32, 64]
+ # rpn_target_assign
+ rpn_target_assign:
+ batch_size_per_im: 256
+ fg_fraction: 0.5
+ negative_overlap: 0.3
+ positive_overlap: 0.7
+ use_random: True
+ # 训练时生成proposal的参数
+ train_proposal:
+ min_size: 0.0
+ nms_thresh: 0.7
+ pre_nms_top_n: 2000
+ post_nms_top_n: 1000
+ topk_after_collect: True
+ # 评估时生成proposal的参数
+ test_proposal:
+ min_size: 0.0
+ nms_thresh: 0.7
+ pre_nms_top_n: 1000
+ post_nms_top_n: 1000
+
+# BBoxHead
+BBoxHead:
+ # TwoFCHead as BBoxHead
+ head: TwoFCHead
+ # roi align
+ roi_extractor:
+ resolution: 7
+ sampling_ratio: 0
+ aligned: True
+ # bbox_assigner
+ bbox_assigner: BBoxAssigner
+
+# BBoxAssigner
+BBoxAssigner:
+ # batch_size_per_im
+ batch_size_per_im: 512
+ # 背景阈值
+ bg_thresh: 0.5
+ # 前景阈值
+ fg_thresh: 0.5
+ # 前景比例
+ fg_fraction: 0.25
+ # 是否随机采样
+ use_random: True
+
+# TwoFCHead
+TwoFCHead:
+ # TwoFCHead特征维度
+ out_channel: 1024
+
+
+# BBoxPostProcess
+BBoxPostProcess:
+ # 解码
+ decode: RCNNBox
+ # nms
+ nms:
+ # 使用MultiClassNMS
+ name: MultiClassNMS
+ keep_top_k: 100
+ score_threshold: 0.05
+ nms_threshold: 0.5
+
+```
+
+- 运行时配置文件 `runtime.yml`
+
+```yaml
+# 是否使用gpu
+use_gpu: true
+# 日志打印间隔
+log_iter: 20
+# save_dir
+save_dir: output
+# 模型保存间隔(单位:epoch)
+snapshot_epoch: 1
+```
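+
+上述五个子配置文件在顶层配置中通过`_BASE_`机制组合为一份完整配置。下面是一个最小示例(假设在PaddleDetection仓库根目录下执行),用于加载合并后的配置并确认各子配置生效:
+
+```python
+from ppdet.core.workspace import load_config
+
+# load_config会自动合并_BASE_中引用的各子配置文件
+cfg = load_config('configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml')
+print(cfg.architecture)   # FasterRCNN,来自模型配置文件
+print(cfg.epoch)          # 12,来自优化器配置文件
+print(cfg.worker_num)     # 2,来自数据读取配置文件
+```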
diff --git a/docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation.md b/docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation.md
new file mode 100644
index 0000000000000000000000000000000000000000..9c7985fd26971b5e412d7a920a0d46eed62204e5
--- /dev/null
+++ b/docs/tutorials/config_annotation/ppyolo_r50vd_dcn_1x_coco_annotation.md
@@ -0,0 +1,266 @@
+# YOLO系列模型参数配置教程
+
+标签: 模型参数配置
+
+以`ppyolo_r50vd_dcn_1x_coco.yml`为例,这个模型由五个子配置文件组成(完整的组合与加载方式见文末示例):
+
+- 数据配置文件 `coco_detection.yml`
+
+```yaml
+# 数据评估类型
+metric: COCO
+# 数据集的类别数
+num_classes: 80
+
+# TrainDataset
+TrainDataset:
+ !COCODataSet
+ # 图像数据路径,相对 dataset_dir 路径,os.path.join(dataset_dir, image_dir)
+ image_dir: train2017
+ # 标注文件路径,相对 dataset_dir 路径,os.path.join(dataset_dir, anno_path)
+ anno_path: annotations/instances_train2017.json
+ # 数据文件夹
+ dataset_dir: dataset/coco
+ # data_fields
+ data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+ !COCODataSet
+ # 图像数据路径,相对 dataset_dir 路径,os.path.join(dataset_dir, image_dir)
+ image_dir: val2017
+ # 标注文件路径,相对 dataset_dir 路径,os.path.join(dataset_dir, anno_path)
+ anno_path: annotations/instances_val2017.json
+ # 数据文件夹
+ dataset_dir: dataset/coco
+
+TestDataset:
+ !ImageFolder
+ # 标注文件路径,相对 dataset_dir 路径
+ anno_path: annotations/instances_val2017.json
+```
+
+- 优化器配置文件 `optimizer_1x.yml`
+
+```yaml
+# 总训练轮数
+epoch: 405
+
+# 学习率设置
+LearningRate:
+ # 默认为8卡训练的学习率
+ base_lr: 0.01
+ # 学习率调整策略
+ schedulers:
+ - !PiecewiseDecay
+ gamma: 0.1
+ # 学习率变化位置(轮数)
+ milestones:
+ - 243
+ - 324
+ # Warmup
+ - !LinearWarmup
+ start_factor: 0.
+ steps: 4000
+
+# 优化器
+OptimizerBuilder:
+ # 优化器
+ optimizer:
+ momentum: 0.9
+ type: Momentum
+ # 正则化
+ regularizer:
+ factor: 0.0005
+ type: L2
+```
+
+- 数据读取配置文件 `ppyolo_reader.yml`
+
+```yaml
+# 每张GPU reader进程个数
+worker_num: 2
+# 训练数据
+TrainReader:
+ inputs_def:
+ num_max_boxes: 50
+ # 训练数据transforms
+ sample_transforms:
+ - Decode: {}
+ - Mixup: {alpha: 1.5, beta: 1.5}
+ - RandomDistort: {}
+ - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
+ - RandomCrop: {}
+ - RandomFlip: {}
+ # batch_transforms
+ batch_transforms:
+ - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False}
+ - NormalizeBox: {}
+ - PadBox: {num_max_boxes: 50}
+ - BboxXYXY2XYWH: {}
+ - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+ - Permute: {}
+ - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]}
+ # 训练时batch_size
+ batch_size: 24
+ # 读取数据时是否乱序
+ shuffle: true
+ # 是否丢弃最后不能完整组成batch的数据
+ drop_last: true
+ # mixup_epoch,大于最大epoch,表示训练过程一直使用mixup数据增广
+ mixup_epoch: 25000
+ # 是否通过共享内存进行数据读取加速,需保证共享内存(如/dev/shm)大于1G
+ use_shared_memory: true
+
+# 评估数据
+EvalReader:
+ # 评估数据transforms
+ sample_transforms:
+ - Decode: {}
+ - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
+ - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+ - Permute: {}
+ # 评估时batch_size
+ batch_size: 8
+ # 是否丢弃没有标注的数据
+ drop_empty: false
+
+# 测试数据
+TestReader:
+ inputs_def:
+ image_shape: [3, 608, 608]
+ # 测试数据transforms
+ sample_transforms:
+ - Decode: {}
+ - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
+ - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
+ - Permute: {}
+ # 测试时batch_size
+ batch_size: 1
+```
+
+- 模型配置文件 `ppyolo_r50vd_dcn.yml`
+
+```yaml
+# 模型结构类型
+architecture: YOLOv3
+# 预训练模型地址
+pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/ResNet50_vd_ssld_pretrained.pdparams
+# norm_type
+norm_type: sync_bn
+# 是否使用ema
+use_ema: true
+# ema_decay
+ema_decay: 0.9998
+
+# YOLOv3
+YOLOv3:
+ # backbone
+ backbone: ResNet
+ # neck
+ neck: PPYOLOFPN
+ # yolo_head
+ yolo_head: YOLOv3Head
+ # post_process
+ post_process: BBoxPostProcess
+
+
+# backbone
+ResNet:
+ # depth
+ depth: 50
+ # variant
+ variant: d
+ # return_idx, 0 represents res2
+ return_idx: [1, 2, 3]
+ # dcn_v2_stages
+ dcn_v2_stages: [3]
+ # freeze_at
+ freeze_at: -1
+ # freeze_norm
+ freeze_norm: false
+ # norm_decay
+ norm_decay: 0.
+
+# PPYOLOFPN
+PPYOLOFPN:
+ # 是否coord_conv
+ coord_conv: true
+ # 是否drop_block
+ drop_block: true
+ # block_size
+ block_size: 3
+ # keep_prob
+ keep_prob: 0.9
+ # 是否spp
+ spp: true
+
+# YOLOv3Head
+YOLOv3Head:
+ # anchors
+ anchors: [[10, 13], [16, 30], [33, 23],
+ [30, 61], [62, 45], [59, 119],
+ [116, 90], [156, 198], [373, 326]]
+ # anchor_masks
+ anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
+ # loss
+ loss: YOLOv3Loss
+ # 是否使用iou_aware
+ iou_aware: true
+ # iou_aware_factor
+ iou_aware_factor: 0.4
+
+# YOLOv3Loss
+YOLOv3Loss:
+ # ignore_thresh
+ ignore_thresh: 0.7
+ # downsample
+ downsample: [32, 16, 8]
+ # 是否label_smooth
+ label_smooth: false
+ # scale_x_y
+ scale_x_y: 1.05
+ # iou_loss
+ iou_loss: IouLoss
+ # iou_aware_loss
+ iou_aware_loss: IouAwareLoss
+
+# IouLoss
+IouLoss:
+ loss_weight: 2.5
+ loss_square: true
+
+# IouAwareLoss
+IouAwareLoss:
+ loss_weight: 1.0
+
+# BBoxPostProcess
+BBoxPostProcess:
+ decode:
+ name: YOLOBox
+ conf_thresh: 0.01
+ downsample_ratio: 32
+ clip_bbox: true
+ scale_x_y: 1.05
+ # nms 配置
+ nms:
+ name: MatrixNMS
+ keep_top_k: 100
+ score_threshold: 0.01
+ post_threshold: 0.01
+ nms_top_k: -1
+ background_label: -1
+
+```
+
+- 运行时配置文件 `runtime.yml`
+
+```yaml
+# 是否使用gpu
+use_gpu: true
+# 日志打印间隔
+log_iter: 20
+# save_dir
+save_dir: output
+# 模型保存间隔(单位:epoch)
+snapshot_epoch: 1
+```
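+
+与RCNN系列相同,这五个子配置文件由顶层配置通过`_BASE_`机制组合。下面是一个最小示例(假设在PaddleDetection仓库根目录下执行),根据合并后的配置用注册器构建出完整的PP-YOLO模型:
+
+```python
+from ppdet.core.workspace import load_config, create
+
+# 按architecture字段实例化模型(YOLOv3结构 + PPYOLOFPN neck)
+cfg = load_config('configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml')
+model = create(cfg.architecture)
+model.eval()
+print(type(model).__name__)   # YOLOv3
+```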
diff --git a/ppdet/__init__.py b/ppdet/__init__.py
index 1d5bd814bd1c334bb1396e7a1cef4c9acb6b6ea6..56b687dd9d8d0ba3cca7296d540413edea3c837b 100644
--- a/ppdet/__init__.py
+++ b/ppdet/__init__.py
@@ -13,4 +13,4 @@
# limitations under the License.
from . import (core, data, engine, modeling, model_zoo, optimizer, metrics,
- py_op, utils, slim)
+ utils, slim)
diff --git a/ppdet/data/reader.py b/ppdet/data/reader.py
index 084ad7b938946681b626bbfb72cb3d86f3849919..bc34ec51e8376e9bd124c2871f5d23086839b4b8 100644
--- a/ppdet/data/reader.py
+++ b/ppdet/data/reader.py
@@ -166,6 +166,7 @@ class BaseDataLoader(object):
batch_sampler=None,
return_list=False):
self.dataset = dataset
+ self.dataset.check_or_download_dataset()
self.dataset.parse_dataset()
# get data
self.dataset.set_transform(self._sample_transforms)
diff --git a/ppdet/data/source/__init__.py b/ppdet/data/source/__init__.py
index 60c205d140cf8ac6a631be473ab816009c82ac6d..b63cba0e6a22fd238b806fb30e8112ee095f12c6 100644
--- a/ppdet/data/source/__init__.py
+++ b/ppdet/data/source/__init__.py
@@ -13,10 +13,11 @@
# limitations under the License.
from . import coco
-# TODO add voc and widerface dataset
from . import voc
-#from . import widerface
+from . import widerface
+from . import category
from .coco import *
from .voc import *
-#from .widerface import *
+from .widerface import *
+from .category import *
diff --git a/ppdet/metrics/category.py b/ppdet/data/source/category.py
similarity index 99%
rename from ppdet/metrics/category.py
rename to ppdet/data/source/category.py
index fe73af7a4fa19e6a64bfb9a93748afd640c4996d..06fbcccbb3433b1dfb5e355a79e3d91bef911d84 100644
--- a/ppdet/metrics/category.py
+++ b/ppdet/data/source/category.py
@@ -32,6 +32,8 @@ def get_categories(metric_type, anno_file=None):
to category name map from annotation file.
Args:
+ metric_type (str): metric type, currently support 'coco', 'voc', 'oid'
+ and 'widerface'.
anno_file (str): annotation file path
"""
if metric_type.lower() == 'coco':
diff --git a/ppdet/data/source/coco.py b/ppdet/data/source/coco.py
index 387229136ef5470d988e19e63c912992c3e1a801..cf08aad5116813b236334110a5013acde3817f5b 100644
--- a/ppdet/data/source/coco.py
+++ b/ppdet/data/source/coco.py
@@ -24,6 +24,17 @@ logger = setup_logger(__name__)
@register
@serializable
class COCODataSet(DetDataset):
+ """
+ Load dataset with COCO format.
+
+ Args:
+ dataset_dir (str): root directory for dataset.
+ image_dir (str): directory for images.
+ anno_path (str): coco annotation file path.
+ data_fields (list): key name of data dictionary, at least have 'image'.
+ sample_num (int): number of samples to load, -1 means all.
+ """
+
def __init__(self,
dataset_dir=None,
image_dir=None,
@@ -49,10 +60,10 @@ class COCODataSet(DetDataset):
records = []
ct = 0
- catid2clsid = dict({catid: i for i, catid in enumerate(cat_ids)})
- cname2cid = dict({
+ self.catid2clsid = dict({catid: i for i, catid in enumerate(cat_ids)})
+ self.cname2cid = dict({
coco.loadCats(catid)[0]['name']: clsid
- for catid, clsid in catid2clsid.items()
+ for catid, clsid in self.catid2clsid.items()
})
if 'annotations' not in coco.dataset:
@@ -79,6 +90,13 @@ class COCODataSet(DetDataset):
im_w, im_h, img_id))
continue
+ coco_rec = {
+ 'im_file': im_path,
+ 'im_id': np.array([img_id]),
+ 'h': im_h,
+ 'w': im_w,
+ } if 'image' in self.data_fields else {}
+
if not self.load_image_only:
ins_anno_ids = coco.getAnnIds(imgIds=[img_id], iscrowd=False)
instances = coco.loadAnns(ins_anno_ids)
@@ -91,14 +109,26 @@ class COCODataSet(DetDataset):
else:
if not any(np.array(inst['bbox'])):
continue
- x1, y1, box_w, box_h = inst['bbox']
- x2 = x1 + box_w
- y2 = y1 + box_h
+
+ # read rbox anno or not
+ is_rbox_anno = len(inst['bbox']) == 5
+ if is_rbox_anno:
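+ # rotated box (xc, yc, w, h, angle): take the axis-aligned bounds around the center, ignoring the angle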
+ xc, yc, box_w, box_h, angle = inst['bbox']
+ x1 = xc - box_w / 2.0
+ y1 = yc - box_h / 2.0
+ x2 = x1 + box_w
+ y2 = y1 + box_h
+ else:
+ x1, y1, box_w, box_h = inst['bbox']
+ x2 = x1 + box_w
+ y2 = y1 + box_h
eps = 1e-5
if inst['area'] > 0 and x2 - x1 > eps and y2 - y1 > eps:
inst['clean_bbox'] = [
round(float(x), 3) for x in [x1, y1, x2, y2]
]
+ if is_rbox_anno:
+ inst['clean_rbox'] = [xc, yc, box_w, box_h, angle]
bboxes.append(inst)
else:
logger.warning(
@@ -111,6 +141,9 @@ class COCODataSet(DetDataset):
continue
gt_bbox = np.zeros((num_bbox, 4), dtype=np.float32)
+ if is_rbox_anno:
+ gt_rbox = np.zeros((num_bbox, 5), dtype=np.float32)
+ gt_theta = np.zeros((num_bbox, 1), dtype=np.int32)
gt_class = np.zeros((num_bbox, 1), dtype=np.int32)
is_crowd = np.zeros((num_bbox, 1), dtype=np.int32)
difficult = np.zeros((num_bbox, 1), dtype=np.int32)
@@ -119,8 +152,11 @@ class COCODataSet(DetDataset):
has_segmentation = False
for i, box in enumerate(bboxes):
catid = box['category_id']
- gt_class[i][0] = catid2clsid[catid]
+ gt_class[i][0] = self.catid2clsid[catid]
gt_bbox[i, :] = box['clean_bbox']
+ # xc, yc, w, h, theta
+ if is_rbox_anno:
+ gt_rbox[i, :] = box['clean_rbox']
is_crowd[i][0] = box['iscrowd']
# check RLE format
if 'segmentation' in box and box['iscrowd'] == 1:
@@ -132,19 +168,22 @@ class COCODataSet(DetDataset):
if has_segmentation and not any(gt_poly):
continue
- coco_rec = {
- 'im_file': im_path,
- 'im_id': np.array([img_id]),
- 'h': im_h,
- 'w': im_w,
- } if 'image' in self.data_fields else {}
-
- gt_rec = {
- 'is_crowd': is_crowd,
- 'gt_class': gt_class,
- 'gt_bbox': gt_bbox,
- 'gt_poly': gt_poly,
- }
+ if is_rbox_anno:
+ gt_rec = {
+ 'is_crowd': is_crowd,
+ 'gt_class': gt_class,
+ 'gt_bbox': gt_bbox,
+ 'gt_rbox': gt_rbox,
+ 'gt_poly': gt_poly,
+ } if 'image' in self.data_fields else {}
+ else:
+ gt_rec = {
+ 'is_crowd': is_crowd,
+ 'gt_class': gt_class,
+ 'gt_bbox': gt_bbox,
+ 'gt_poly': gt_poly,
+ } if 'image' in self.data_fields else {}
+
for k, v in gt_rec.items():
if k in self.data_fields:
coco_rec[k] = v
@@ -163,4 +202,4 @@ class COCODataSet(DetDataset):
break
assert len(records) > 0, 'not found any coco record in %s' % (anno_path)
logger.debug('{} samples in file {}'.format(ct, anno_path))
- self.roidbs, self.cname2cid = records, cname2cid
+ self.roidbs = records
diff --git a/ppdet/data/source/dataset.py b/ppdet/data/source/dataset.py
index 429cdc7a5ca5053dc73aa7266d210d0e1933f771..96b81326a2729c16637da33eaba39f7743fed2a3 100644
--- a/ppdet/data/source/dataset.py
+++ b/ppdet/data/source/dataset.py
@@ -27,6 +27,18 @@ import copy
@serializable
class DetDataset(Dataset):
+ """
+ Load detection dataset.
+
+ Args:
+ dataset_dir (str): root directory for dataset.
+ image_dir (str): directory for images.
+ anno_path (str): annotation file path.
+ data_fields (list): key name of data dictionary, at least have 'image'.
+ sample_num (int): number of samples to load, -1 means all.
+ use_default_label (bool): whether to load default label list.
+ """
+
def __init__(self,
dataset_dir=None,
image_dir=None,
@@ -43,6 +55,7 @@ class DetDataset(Dataset):
self.sample_num = sample_num
self.use_default_label = use_default_label
self._epoch = 0
+ self._curr_iter = 0
def __len__(self, ):
return len(self.roidbs)
@@ -64,9 +77,19 @@ class DetDataset(Dataset):
copy.deepcopy(self.roidbs[np.random.randint(n)])
for _ in range(3)
]
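+ # attach the global iteration index to each sample; iteration-aware transforms (e.g. GridMask warmup) read it as 'curr_iter'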
+ if isinstance(roidb, Sequence):
+ for r in roidb:
+ r['curr_iter'] = self._curr_iter
+ else:
+ roidb['curr_iter'] = self._curr_iter
+ self._curr_iter += 1
return self.transform(roidb)
+ def check_or_download_dataset(self):
+ self.dataset_dir = get_dataset_path(self.dataset_dir, self.anno_path,
+ self.image_dir)
+
def set_kwargs(self, **kwargs):
self.mixup_epoch = kwargs.get('mixup_epoch', -1)
self.cutmix_epoch = kwargs.get('cutmix_epoch', -1)
@@ -125,6 +148,9 @@ class ImageFolder(DetDataset):
self.roidbs = None
self.sample_num = sample_num
+ def check_or_download_dataset(self):
+ return
+
def parse_dataset(self, ):
if not self.roidbs:
self.roidbs = self._load_images()
diff --git a/ppdet/data/source/voc.py b/ppdet/data/source/voc.py
index 00d976ce0c944620b0d249d434b6fc023a4e4fb7..56b746c14cc23a60e148a4c84149a3553b48c927 100644
--- a/ppdet/data/source/voc.py
+++ b/ppdet/data/source/voc.py
@@ -38,6 +38,7 @@ class VOCDataSet(DetDataset):
dataset_dir (str): root directory for dataset.
image_dir (str): directory for images.
anno_path (str): voc annotation file path.
+ data_fields (list): key name of data dictionary, at least have 'image'.
sample_num (int): number of samples to load, -1 means all.
label_list (str): if use_default_label is False, will load
mapping between category and class index.
@@ -116,7 +117,12 @@ class VOCDataSet(DetDataset):
difficult = []
for i, obj in enumerate(objs):
cname = obj.find('name').text
- _difficult = int(obj.find('difficult').text)
+
+ # user dataset may not contain difficult field
+ _difficult = obj.find('difficult')
+ _difficult = int(
+ _difficult.text) if _difficult is not None else 0
+
x1 = float(obj.find('bndbox').find('xmin').text)
y1 = float(obj.find('bndbox').find('ymin').text)
x2 = float(obj.find('bndbox').find('xmax').text)
diff --git a/ppdet/data/source/widerface.py b/ppdet/data/source/widerface.py
index db2b74326b3eba182143631ea3498171f6983180..b1813b0e07035ad365b92d4d5f9125094ff015de 100644
--- a/ppdet/data/source/widerface.py
+++ b/ppdet/data/source/widerface.py
@@ -31,8 +31,10 @@ class WIDERFaceDataSet(DetDataset):
Args:
dataset_dir (str): root directory for dataset.
image_dir (str): directory for images.
- anno_path (str): root directory for voc annotation data
- sample_num (int): number of samples to load, -1 means all
+ anno_path (str): WiderFace annotation data.
+ data_fields (list): key name of data dictionary, at least have 'image'.
+ sample_num (int): number of samples to load, -1 means all.
+ with_lmk (bool): whether to load face landmark keypoint labels.
"""
def __init__(self,
diff --git a/ppdet/data/transform/autoaugment_utils.py b/ppdet/data/transform/autoaugment_utils.py
index 0cd8a04eef271f1417373691b53b0d7ba5392373..78e3bb36b3c2a750744101d46667dded539426c2 100644
--- a/ppdet/data/transform/autoaugment_utils.py
+++ b/ppdet/data/transform/autoaugment_utils.py
@@ -1453,19 +1453,19 @@ def _parse_policy_info(name, prob, level, replace_value, augmentation_hparams):
# Check to see if prob is passed into function. This is used for operations
# where we alter bboxes independently.
# pytype:disable=wrong-arg-types
- if 'prob' in inspect.getargspec(func)[0]:
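+ # inspect.getargspec is deprecated in Python 3 (removed in 3.11); getfullargspec is the supported replacement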
+ if 'prob' in inspect.getfullargspec(func)[0]:
args = tuple([prob] + list(args))
# pytype:enable=wrong-arg-types
# Add in replace arg if it is required for the function that is being called.
- if 'replace' in inspect.getargspec(func)[0]:
+ if 'replace' in inspect.getfullargspec(func)[0]:
# Make sure replace is the final argument
- assert 'replace' == inspect.getargspec(func)[0][-1]
+ assert 'replace' == inspect.getfullargspec(func)[0][-1]
args = tuple(list(args) + [replace_value])
# Add bboxes as the second positional argument for the function if it does
# not already exist.
- if 'bboxes' not in inspect.getargspec(func)[0]:
+ if 'bboxes' not in inspect.getfullargspec(func)[0]:
func = bbox_wrapper(func)
return (func, prob, args)
@@ -1473,11 +1473,11 @@ def _parse_policy_info(name, prob, level, replace_value, augmentation_hparams):
def _apply_func_with_prob(func, image, args, prob, bboxes):
"""Apply `func` to image w/ `args` as input with probability `prob`."""
assert isinstance(args, tuple)
- assert 'bboxes' == inspect.getargspec(func)[0][1]
+ assert 'bboxes' == inspect.getfullargspec(func)[0][1]
# If prob is a function argument, then this randomness is being handled
# inside the function, so make sure it is always called.
- if 'prob' in inspect.getargspec(func)[0]:
+ if 'prob' in inspect.getfullargspec(func)[0]:
prob = 1.0
# Apply the function with probability `prob`.
diff --git a/ppdet/data/transform/batch_operators.py b/ppdet/data/transform/batch_operators.py
index bd99c6f9387c7d71335dd864f05b6347477008f0..3ae84774c714125f09875737b23b6d96ad0e46d7 100644
--- a/ppdet/data/transform/batch_operators.py
+++ b/ppdet/data/transform/batch_operators.py
@@ -27,16 +27,13 @@ from .operators import register_op, BaseOperator, Resize
from .op_helper import jaccard_overlap, gaussian2D
from scipy import ndimage
+from ppdet.modeling import bbox_utils
from ppdet.utils.logger import setup_logger
logger = setup_logger(__name__)
__all__ = [
- 'PadBatch',
- 'BatchRandomResize',
- 'Gt2YoloTarget',
- 'Gt2FCOSTarget',
- 'Gt2TTFTarget',
- 'Gt2Solov2Target',
+ 'PadBatch', 'BatchRandomResize', 'Gt2YoloTarget', 'Gt2FCOSTarget',
+ 'Gt2TTFTarget', 'Gt2Solov2Target', 'RboxPadBatch'
]
@@ -118,6 +115,7 @@ class PadBatch(BaseOperator):
gt_box_data = -np.ones([gt_num_max, 4], dtype=np.float32)
gt_class_data = -np.ones([gt_num_max], dtype=np.int32)
is_crowd_data = np.ones([gt_num_max], dtype=np.int32)
+ difficult_data = np.ones([gt_num_max], dtype=np.int32)
if pad_mask:
poly_num_max = max(poly_num)
@@ -130,7 +128,12 @@ class PadBatch(BaseOperator):
gt_num = data['gt_bbox'].shape[0]
gt_box_data[0:gt_num, :] = data['gt_bbox']
gt_class_data[0:gt_num] = np.squeeze(data['gt_class'])
- is_crowd_data[0:gt_num] = np.squeeze(data['is_crowd'])
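+ # 'is_crowd' and 'difficult' may be absent in some datasets; pad them only when present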
+ if 'is_crowd' in data:
+ is_crowd_data[0:gt_num] = np.squeeze(data['is_crowd'])
+ data['is_crowd'] = is_crowd_data
+ if 'difficult' in data:
+ difficult_data[0:gt_num] = np.squeeze(data['difficult'])
+ data['difficult'] = difficult_data
if pad_mask:
for j, poly in enumerate(data['gt_poly']):
for k, p_p in enumerate(poly):
@@ -139,7 +142,6 @@ class PadBatch(BaseOperator):
data['gt_poly'] = gt_masks_data
data['gt_bbox'] = gt_box_data
data['gt_class'] = gt_class_data
- data['is_crowd'] = is_crowd_data
return samples
@@ -585,6 +587,11 @@ class Gt2TTFTarget(BaseOperator):
sample['ttf_heatmap'] = heatmap
sample['ttf_box_target'] = box_target
sample['ttf_reg_weight'] = reg_weight
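+ # drop raw gt fields that are no longer needed once the TTF targets are built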
+ sample.pop('is_crowd')
+ sample.pop('gt_class')
+ sample.pop('gt_bbox')
+ if 'gt_score' in sample:
+ sample.pop('gt_score')
return samples
def draw_truncate_gaussian(self, heatmap, center, h_radius, w_radius):
@@ -787,3 +794,111 @@ class Gt2Solov2Target(BaseOperator):
data['grid_order{}'.format(idx)] = gt_grid_order
return samples
+
+
+@register_op
+class RboxPadBatch(BaseOperator):
+ """
+ Pad a batch of samples so their shapes are divisible by a stride.
+ The layout of each image should be 'CHW'. Also converts poly annotations to rbox.
+ Args:
+ pad_to_stride (int): If `pad_to_stride > 0`, pad zeros to ensure
+ height and width is divisible by `pad_to_stride`.
+ """
+
+ def __init__(self, pad_to_stride=0, pad_gt=False):
+ super(RboxPadBatch, self).__init__()
+ self.pad_to_stride = pad_to_stride
+ self.pad_gt = pad_gt
+
+ def __call__(self, samples, context=None):
+ """
+ Args:
+ samples (list): a batch of sample, each is dict.
+ """
+ coarsest_stride = self.pad_to_stride
+
+ max_shape = np.array([data['image'].shape for data in samples]).max(
+ axis=0)
+ if coarsest_stride > 0:
+ max_shape[1] = int(
+ np.ceil(max_shape[1] / coarsest_stride) * coarsest_stride)
+ max_shape[2] = int(
+ np.ceil(max_shape[2] / coarsest_stride) * coarsest_stride)
+
+ for data in samples:
+ im = data['image']
+ im_c, im_h, im_w = im.shape[:]
+ padding_im = np.zeros(
+ (im_c, max_shape[1], max_shape[2]), dtype=np.float32)
+ padding_im[:, :im_h, :im_w] = im
+ data['image'] = padding_im
+ if 'semantic' in data and data['semantic'] is not None:
+ semantic = data['semantic']
+ padding_sem = np.zeros(
+ (1, max_shape[1], max_shape[2]), dtype=np.float32)
+ padding_sem[:, :im_h, :im_w] = semantic
+ data['semantic'] = padding_sem
+ if 'gt_segm' in data and data['gt_segm'] is not None:
+ gt_segm = data['gt_segm']
+ padding_segm = np.zeros(
+ (gt_segm.shape[0], max_shape[1], max_shape[2]),
+ dtype=np.uint8)
+ padding_segm[:, :im_h, :im_w] = gt_segm
+ data['gt_segm'] = padding_segm
+ if self.pad_gt:
+ gt_num = []
+ if 'gt_poly' in data and data['gt_poly'] is not None and len(data[
+ 'gt_poly']) > 0:
+ pad_mask = True
+ else:
+ pad_mask = False
+
+ if pad_mask:
+ poly_num = []
+ poly_part_num = []
+ point_num = []
+ for data in samples:
+ gt_num.append(data['gt_bbox'].shape[0])
+ if pad_mask:
+ poly_num.append(len(data['gt_poly']))
+ for poly in data['gt_poly']:
+ poly_part_num.append(int(len(poly)))
+ for p_p in poly:
+ point_num.append(int(len(p_p) / 2))
+ gt_num_max = max(gt_num)
+
+ for i, sample in enumerate(samples):
+ assert 'gt_rbox' in sample
+ assert 'gt_rbox2poly' in sample
+ gt_box_data = -np.ones([gt_num_max, 4], dtype=np.float32)
+ gt_class_data = -np.ones([gt_num_max], dtype=np.int32)
+ is_crowd_data = np.ones([gt_num_max], dtype=np.int32)
+
+ if pad_mask:
+ poly_num_max = max(poly_num)
+ poly_part_num_max = max(poly_part_num)
+ point_num_max = max(point_num)
+ gt_masks_data = -np.ones(
+ [poly_num_max, poly_part_num_max, point_num_max, 2],
+ dtype=np.float32)
+
+ gt_num = sample['gt_bbox'].shape[0]
+ gt_box_data[0:gt_num, :] = sample['gt_bbox']
+ gt_class_data[0:gt_num] = np.squeeze(sample['gt_class'])
+ is_crowd_data[0:gt_num] = np.squeeze(sample['is_crowd'])
+ if pad_mask:
+ for j, poly in enumerate(sample['gt_poly']):
+ for k, p_p in enumerate(poly):
+ pp_np = np.array(p_p).reshape(-1, 2)
+ gt_masks_data[j, k, :pp_np.shape[0], :] = pp_np
+ sample['gt_poly'] = gt_masks_data
+ sample['gt_bbox'] = gt_box_data
+ sample['gt_class'] = gt_class_data
+ sample['is_crowd'] = is_crowd_data
+ # poly to rbox
+ polys = sample['gt_rbox2poly']
+ rbox = bbox_utils.poly_to_rbox(polys)
+ sample['gt_rbox'] = rbox
+
+ return samples
diff --git a/ppdet/data/transform/gridmask_utils.py b/ppdet/data/transform/gridmask_utils.py
index a23e69b20860fe90c7a25472e11de770d238dd07..b0a27f015443701e6f690b96101d3d33fa3fbaaa 100644
--- a/ppdet/data/transform/gridmask_utils.py
+++ b/ppdet/data/transform/gridmask_utils.py
@@ -20,7 +20,7 @@ import numpy as np
from PIL import Image
-class GridMask(object):
+class Gridmask(object):
def __init__(self,
use_h=True,
use_w=True,
@@ -30,7 +30,7 @@ class GridMask(object):
mode=1,
prob=0.7,
upper_iter=360000):
- super(GridMask, self).__init__()
+ super(Gridmask, self).__init__()
self.use_h = use_h
self.use_w = use_w
self.rotate = rotate
@@ -45,7 +45,7 @@ class GridMask(object):
self.prob = self.st_prob * min(1, 1.0 * curr_iter / self.upper_iter)
if np.random.rand() > self.prob:
return x
- _, h, w = x.shape
+ h, w, _ = x.shape
hh = int(1.5 * h)
ww = int(1.5 * w)
d = np.random.randint(2, h)
@@ -73,7 +73,7 @@ class GridMask(object):
if self.mode == 1:
mask = 1 - mask
- mask = np.expand_dims(mask, axis=0)
+ mask = np.expand_dims(mask, axis=-1)
if self.offset:
offset = (2 * (np.random.rand(h, w) - 0.5)).astype(np.float32)
x = (x * mask + offset * (1 - mask)).astype(x.dtype)
diff --git a/ppdet/data/transform/operators.py b/ppdet/data/transform/operators.py
index b07ee0cba1dfa22b7114f0054d3844df614a500e..932c7971f3551ad432b0aa893f6c398b12b7d4bb 100644
--- a/ppdet/data/transform/operators.py
+++ b/ppdet/data/transform/operators.py
@@ -39,6 +39,7 @@ from PIL import Image, ImageEnhance, ImageDraw
from ppdet.core.workspace import serializable
from ppdet.modeling.layers import AnchorGrid
+from ppdet.modeling import bbox_utils
from .op_helper import (satisfy_sample_constraint, filter_and_process,
generate_sample_bbox, clip_bbox, data_anchor_sampling,
@@ -165,7 +166,7 @@ class Permute(BaseOperator):
@register_op
class Lighting(BaseOperator):
"""
- Lighting the imagen by eigenvalues and eigenvectors
+ Lighting the image by eigenvalues and eigenvectors
Args:
eigval (list): eigenvalues
eigvec (list): eigenvectors
@@ -308,8 +309,8 @@ class GridMask(BaseOperator):
self.prob = prob
self.upper_iter = upper_iter
- from .gridmask_utils import GridMask
- self.gridmask_op = GridMask(
+ from .gridmask_utils import Gridmask
+ self.gridmask_op = Gridmask(
use_h,
use_w,
rotate=rotate,
@@ -536,6 +537,18 @@ class RandomFlip(BaseOperator):
bbox[:, 2] = width - oldx1
return bbox
+ def apply_rbox(self, bbox, width):
+ oldx1 = bbox[:, 0].copy()
+ oldx2 = bbox[:, 2].copy()
+ oldx3 = bbox[:, 4].copy()
+ oldx4 = bbox[:, 6].copy()
+ bbox[:, 0] = width - oldx1
+ bbox[:, 2] = width - oldx2
+ bbox[:, 4] = width - oldx3
+ bbox[:, 6] = width - oldx4
+ bbox = [bbox_utils.get_best_begin_point_single(e) for e in bbox]
+ return bbox
+
def apply(self, sample, context=None):
"""Filp the image and bounding box.
Operators:
@@ -567,6 +580,10 @@ class RandomFlip(BaseOperator):
if 'gt_segm' in sample and sample['gt_segm'].any():
sample['gt_segm'] = sample['gt_segm'][:, :, ::-1]
+ if 'gt_rbox2poly' in sample and sample['gt_rbox2poly'].any():
+ sample['gt_rbox2poly'] = self.apply_rbox(sample['gt_rbox2poly'],
+ width)
+
sample['flipped'] = True
sample['image'] = im
return sample
@@ -704,6 +721,16 @@ class Resize(BaseOperator):
[im_scale_x, im_scale_y],
[resize_w, resize_h])
+ # apply rbox
+ if 'gt_rbox2poly' in sample:
+ if np.array(sample['gt_rbox2poly']).shape[1] != 8:
+ logger.warn(
+ "gt_rbox2poly's length shoule be 8, but actually is {}".
+ format(len(sample['gt_rbox2poly'])))
+ sample['gt_rbox2poly'] = self.apply_bbox(sample['gt_rbox2poly'],
+ [im_scale_x, im_scale_y],
+ [resize_w, resize_h])
+
# apply polygon
if 'gt_poly' in sample and len(sample['gt_poly']) > 0:
sample['gt_poly'] = self.apply_segm(sample['gt_poly'], im_shape[:2],
@@ -1490,14 +1517,14 @@ class Cutmix(BaseOperator):
bbx2 = np.clip(cx + cut_w // 2, 0, w - 1)
bby2 = np.clip(cy + cut_h // 2, 0, h - 1)
- img_1 = np.zeros((h, w, img1.shape[2]), 'float32')
- img_1[:img1.shape[0], :img1.shape[1], :] = \
+ img_1_pad = np.zeros((h, w, img1.shape[2]), 'float32')
+ img_1_pad[:img1.shape[0], :img1.shape[1], :] = \
img1.astype('float32')
- img_2 = np.zeros((h, w, img2.shape[2]), 'float32')
- img_2[:img2.shape[0], :img2.shape[1], :] = \
+ img_2_pad = np.zeros((h, w, img2.shape[2]), 'float32')
+ img_2_pad[:img2.shape[0], :img2.shape[1], :] = \
img2.astype('float32')
- img_1[bby1:bby2, bbx1:bbx2, :] = img2[bby1:bby2, bbx1:bbx2, :]
- return img_1
+ img_1_pad[bby1:bby2, bbx1:bbx2, :] = img_2_pad[bby1:bby2, bbx1:bbx2, :]
+ return img_1_pad
def __call__(self, sample, context=None):
if not isinstance(sample, Sequence):
@@ -1520,16 +1547,27 @@ class Cutmix(BaseOperator):
gt_class1 = sample[0]['gt_class']
gt_class2 = sample[1]['gt_class']
gt_class = np.concatenate((gt_class1, gt_class2), axis=0)
- gt_score1 = sample[0]['gt_score']
- gt_score2 = sample[1]['gt_score']
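+ # raw samples carry no gt_score; assume score 1.0 for every gt before factor weighting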
+ gt_score1 = np.ones_like(sample[0]['gt_class'])
+ gt_score2 = np.ones_like(sample[1]['gt_class'])
gt_score = np.concatenate(
(gt_score1 * factor, gt_score2 * (1. - factor)), axis=0)
- sample = sample[0]
- sample['image'] = img
- sample['gt_bbox'] = gt_bbox
- sample['gt_score'] = gt_score
- sample['gt_class'] = gt_class
- return sample
+ result = copy.deepcopy(sample[0])
+ result['image'] = img
+ result['gt_bbox'] = gt_bbox
+ result['gt_score'] = gt_score
+ result['gt_class'] = gt_class
+ if 'is_crowd' in sample[0]:
+ is_crowd1 = sample[0]['is_crowd']
+ is_crowd2 = sample[1]['is_crowd']
+ is_crowd = np.concatenate((is_crowd1, is_crowd2), axis=0)
+ result['is_crowd'] = is_crowd
+ if 'difficult' in sample[0]:
+ is_difficult1 = sample[0]['difficult']
+ is_difficult2 = sample[1]['difficult']
+ is_difficult = np.concatenate(
+ (is_difficult1, is_difficult2), axis=0)
+ result['difficult'] = is_difficult
+ return result
@register_op
@@ -1774,12 +1812,13 @@ class Pad(BaseOperator):
offsets=None,
fill_value=(127.5, 127.5, 127.5)):
"""
- Pad image to a specified size or multiple of size_divisor. random target_size and interpolation method
+ Pad image to a specified size or multiple of size_divisor.
Args:
size (int, Sequence): image target size, if None, pad to multiple of size_divisor, default None
size_divisor (int): size divisor, default 32
pad_mode (int): pad mode, currently only supports four modes [-1, 0, 1, 2]. if -1, use specified offsets
if 0, only pad to right and bottom. if 1, pad according to center. if 2, only pad left and top
+ offsets (list): [offset_x, offset_y], specify offset while padding, only supported when pad_mode is -1
fill_value (tuple): rgb value of pad area, default (127.5, 127.5, 127.5)
"""
super(Pad, self).__init__()
@@ -1933,3 +1972,30 @@ class Poly2Mask(BaseOperator):
]
sample['gt_segm'] = np.asarray(masks).astype(np.uint8)
return sample
+
+
+@register_op
+class Rbox2Poly(BaseOperator):
+ """
+ Convert rbbox format to poly format.
+ """
+
+ def __init__(self):
+ super(Rbox2Poly, self).__init__()
+
+ def apply(self, sample, context=None):
+ assert 'gt_rbox' in sample
+ assert sample['gt_rbox'].shape[1] == 5
+ rrects = sample['gt_rbox']
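+ # each row is [x_ctr, y_ctr, w, h, angle]; gt_bbox below is the axis-aligned box around the center (angle not applied)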
+ x_ctr = rrects[:, 0]
+ y_ctr = rrects[:, 1]
+ width = rrects[:, 2]
+ height = rrects[:, 3]
+ x1 = x_ctr - width / 2.0
+ y1 = y_ctr - height / 2.0
+ x2 = x_ctr + width / 2.0
+ y2 = y_ctr + height / 2.0
+ sample['gt_bbox'] = np.stack([x1, y1, x2, y2], axis=1)
+ polys = bbox_utils.rbox2poly(rrects)
+ sample['gt_rbox2poly'] = polys
+ return sample
diff --git a/ppdet/engine/callbacks.py b/ppdet/engine/callbacks.py
index 1419661ca6e5213f0012ca04f60964ac5092998a..0798b91f3216977577d3026f9bbfa22486b65950 100644
--- a/ppdet/engine/callbacks.py
+++ b/ppdet/engine/callbacks.py
@@ -23,10 +23,9 @@ import six
import numpy as np
import paddle
-from paddle.distributed import ParallelEnv
+import paddle.distributed as dist
from ppdet.utils.checkpoint import save_model
-from ppdet.optimizer import ModelEMA
from ppdet.utils.logger import setup_logger
logger = setup_logger('ppdet.engine')
@@ -81,7 +80,7 @@ class LogPrinter(Callback):
super(LogPrinter, self).__init__(model)
def on_step_end(self, status):
- if ParallelEnv().nranks < 2 or ParallelEnv().local_rank == 0:
+ if dist.get_world_size() < 2 or dist.get_rank() == 0:
mode = status['mode']
if mode == 'train':
epoch_id = status['epoch_id']
@@ -129,7 +128,7 @@ class LogPrinter(Callback):
logger.info("Eval iter: {}".format(step_id))
def on_epoch_end(self, status):
- if ParallelEnv().nranks < 2 or ParallelEnv().local_rank == 0:
+ if dist.get_world_size() < 2 or dist.get_rank() == 0:
mode = status['mode']
if mode == 'eval':
sample_num = status['sample_num']
@@ -143,16 +142,12 @@ class Checkpointer(Callback):
super(Checkpointer, self).__init__(model)
cfg = self.model.cfg
self.best_ap = 0.
- self.use_ema = ('use_ema' in cfg and cfg['use_ema'])
self.save_dir = os.path.join(self.model.cfg.save_dir,
self.model.cfg.filename)
- if self.use_ema:
- self.ema = ModelEMA(
- cfg['ema_decay'], self.model.model, use_thres_step=True)
-
- def on_step_end(self, status):
- if self.use_ema:
- self.ema.update(self.model.model)
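+ # for distillation models, checkpoint only the student sub-model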
+ if hasattr(self.model.model, 'student_model'):
+ self.weight = self.model.model.student_model
+ else:
+ self.weight = self.model.model
def on_epoch_end(self, status):
# Checkpointer only performed during training
@@ -160,28 +155,27 @@ class Checkpointer(Callback):
epoch_id = status['epoch_id']
weight = None
save_name = None
- if ParallelEnv().nranks < 2 or ParallelEnv().local_rank == 0:
+ if dist.get_world_size() < 2 or dist.get_rank() == 0:
if mode == 'train':
end_epoch = self.model.cfg.epoch
if epoch_id % self.model.cfg.snapshot_epoch == 0 or epoch_id == end_epoch - 1:
save_name = str(
epoch_id) if epoch_id != end_epoch - 1 else "model_final"
- if self.use_ema:
- weight = self.ema.apply()
- else:
- weight = self.model.model
+ weight = self.weight
elif mode == 'eval':
if 'save_best_model' in status and status['save_best_model']:
for metric in self.model._metrics:
map_res = metric.get_results()
key = 'bbox' if 'bbox' in map_res else 'mask'
+ if key not in map_res:
+ logger.warn("Evaluation results empty, this may be due to " \
+ "training iterations being too few or not " \
+ "loading the correct weights.")
+ return
if map_res[key][0] > self.best_ap:
self.best_ap = map_res[key][0]
save_name = 'best_model'
- if self.use_ema:
- weight = self.ema.apply()
- else:
- weight = self.model.model
+ weight = self.weight
logger.info("Best test {} ap is {:0.3f}.".format(
key, self.best_ap))
if weight:
@@ -224,7 +218,7 @@ class VisualDLWriter(Callback):
def on_step_end(self, status):
mode = status['mode']
- if ParallelEnv().nranks < 2 or ParallelEnv().local_rank == 0:
+ if dist.get_world_size() < 2 or dist.get_rank() == 0:
if mode == 'train':
training_staus = status['training_staus']
for loss_name, loss_value in training_staus.get().items():
@@ -248,7 +242,7 @@ class VisualDLWriter(Callback):
def on_epoch_end(self, status):
mode = status['mode']
- if ParallelEnv().nranks < 2 or ParallelEnv().local_rank == 0:
+ if dist.get_world_size() < 2 or dist.get_rank() == 0:
if mode == 'eval':
for metric in self.model._metrics:
for key, map_value in metric.get_results().items():
diff --git a/ppdet/engine/env.py b/ppdet/engine/env.py
index ba0b7edd61bf39d5df9e647cefdded867f6ca86f..cfeea08c98c081083033120c9d3fbb5c02efdd35 100644
--- a/ppdet/engine/env.py
+++ b/ppdet/engine/env.py
@@ -21,7 +21,7 @@ import random
import numpy as np
import paddle
-from paddle.distributed import ParallelEnv, fleet
+from paddle.distributed import fleet
__all__ = ['init_parallel_env', 'set_random_seed', 'init_fleet_env']
diff --git a/ppdet/engine/export_utils.py b/ppdet/engine/export_utils.py
index 7fa2403d53c345fc2c1b52830fab122954af3403..744775cd981502dfc64a4695305ac9858e6cd195 100644
--- a/ppdet/engine/export_utils.py
+++ b/ppdet/engine/export_utils.py
@@ -20,7 +20,7 @@ import os
import yaml
from collections import OrderedDict
-from ppdet.metrics import get_categories
+from ppdet.data.source.category import get_categories
from ppdet.utils.logger import setup_logger
logger = setup_logger('ppdet.engine')
@@ -28,9 +28,10 @@ logger = setup_logger('ppdet.engine')
# Global dictionary
TRT_MIN_SUBGRAPH = {
'YOLO': 3,
- 'SSD': 40,
+ 'SSD': 60,
'RCNN': 40,
'RetinaNet': 40,
+ 'S2ANet': 40,
'EfficientDet': 40,
'Face': 3,
'TTFNet': 3,
diff --git a/ppdet/engine/trainer.py b/ppdet/engine/trainer.py
index 57c16e4b8c74fd2fa65ef2b086b42a7224188c60..2b17cdea94c97c3d43a784fb79ab2ab7ba08bede 100644
--- a/ppdet/engine/trainer.py
+++ b/ppdet/engine/trainer.py
@@ -24,14 +24,17 @@ import numpy as np
from PIL import Image
import paddle
-from paddle.distributed import ParallelEnv, fleet
+import paddle.distributed as dist
+from paddle.distributed import fleet
from paddle import amp
from paddle.static import InputSpec
+from ppdet.optimizer import ModelEMA
from ppdet.core.workspace import create
from ppdet.utils.checkpoint import load_weight, load_pretrain_weight
-from ppdet.utils.visualizer import visualize_results
-from ppdet.metrics import Metric, COCOMetric, VOCMetric, WiderFaceMetric, get_categories, get_infer_results
+from ppdet.utils.visualizer import visualize_results, save_result
+from ppdet.metrics import Metric, COCOMetric, VOCMetric, WiderFaceMetric, get_infer_results
+from ppdet.data.source.category import get_categories
import ppdet.utils.stats as stats
from .callbacks import Callback, ComposeCallback, LogPrinter, Checkpointer, WiferFaceEval, VisualDLWriter
@@ -50,17 +53,19 @@ class Trainer(object):
"mode should be 'train', 'eval' or 'test'"
self.mode = mode.lower()
self.optimizer = None
- self.slim = None
+ self.is_loaded_weights = False
# build model
- self.model = create(cfg.architecture)
+ if 'model' not in self.cfg:
+ self.model = create(cfg.architecture)
+ else:
+ self.model = self.cfg.model
+ self.is_loaded_weights = True
- # model slim build
- if 'slim' in cfg and cfg.slim:
- if self.mode == 'train':
- self.load_weights(cfg.pretrain_weights)
- self.slim = create(cfg.slim)
- self.slim(self.model)
+ self.use_ema = ('use_ema' in cfg and cfg['use_ema'])
+ if self.use_ema:
+ self.ema = ModelEMA(
+ cfg['ema_decay'], self.model, use_thres_step=True)
# build data loader
self.dataset = cfg['{}Dataset'.format(self.mode.capitalize())]
@@ -83,8 +88,8 @@ class Trainer(object):
self.optimizer = create('OptimizerBuilder')(self.lr,
self.model.parameters())
- self._nranks = ParallelEnv().nranks
- self._local_rank = ParallelEnv().local_rank
+ self._nranks = dist.get_world_size()
+ self._local_rank = dist.get_rank()
self.status = {}
@@ -116,8 +121,8 @@ class Trainer(object):
self._callbacks = []
self._compose_callback = None
- def _init_metrics(self):
- if self.mode == 'test':
+ def _init_metrics(self, validate=False):
+ if self.mode == 'test' or (self.mode == 'train' and not validate):
self._metrics = []
return
classwise = self.cfg['classwise'] if 'classwise' in self.cfg else False
@@ -126,12 +131,30 @@ class Trainer(object):
bias = self.cfg['bias'] if 'bias' in self.cfg else 0
output_eval = self.cfg['output_eval'] \
if 'output_eval' in self.cfg else None
+ save_prediction_only = self.cfg['save_prediction_only'] \
+ if 'save_prediction_only' in self.cfg else False
+
+ # pass clsid2catid info to metric instance to avoid multiple loading
+ # annotation file
+ clsid2catid = {v: k for k, v in self.dataset.catid2clsid.items()} \
+ if self.mode == 'eval' else None
+
+ # when do validation in train, annotation file should be get from
+ # EvalReader instead of self.dataset(which is TrainReader)
+ anno_file = self.dataset.get_anno()
+ if self.mode == 'train' and validate:
+ eval_dataset = self.cfg['EvalDataset']
+ eval_dataset.check_or_download_dataset()
+ anno_file = eval_dataset.get_anno()
+
self._metrics = [
COCOMetric(
- anno_file=self.dataset.get_anno(),
+ anno_file=anno_file,
+ clsid2catid=clsid2catid,
classwise=classwise,
output_eval=output_eval,
- bias=bias)
+ bias=bias,
+ save_prediction_only=save_prediction_only)
]
elif self.cfg.metric == 'VOC':
self._metrics = [
@@ -175,17 +198,29 @@ class Trainer(object):
self._metrics.extend(metrics)
def load_weights(self, weights):
+ if self.is_loaded_weights:
+ return
self.start_epoch = 0
load_pretrain_weight(self.model, weights)
logger.debug("Load weights {} to start training".format(weights))
def resume_weights(self, weights):
- self.start_epoch = load_weight(self.model, weights, self.optimizer)
+ # support Distill resume weights
+ if hasattr(self.model, 'student_model'):
+ self.start_epoch = load_weight(self.model.student_model, weights,
+ self.optimizer)
+ else:
+ self.start_epoch = load_weight(self.model, weights, self.optimizer)
logger.debug("Resume weights of epoch {}".format(self.start_epoch))
def train(self, validate=False):
assert self.mode == 'train', "Model not in 'train' mode"
+ # if validation in training is enabled, metrics should be re-init
+ if validate:
+ self._init_metrics(validate=validate)
+ self._reset_metrics()
+
model = self.model
if self.cfg.fleet:
model = fleet.distributed_model(model)
@@ -252,8 +287,15 @@ class Trainer(object):
self.status['batch_time'].update(time.time() - iter_tic)
self._compose_callback.on_step_end(self.status)
+ if self.use_ema:
+ self.ema.update(self.model)
iter_tic = time.time()
+ # apply ema weight on model
+ if self.use_ema:
+ weight = self.model.state_dict()
+ self.model.set_dict(self.ema.apply())
+
self._compose_callback.on_epoch_end(self.status)
if validate and (self._nranks < 2 or self._local_rank == 0) \
@@ -274,6 +316,10 @@ class Trainer(object):
self.status['save_best_model'] = True
self._eval_with_loader(self._eval_loader)
+ # restore origin weight on model
+ if self.use_ema:
+ self.model.set_dict(weight)
+
def _eval_with_loader(self, loader):
sample_num = 0
tic = time.time()
@@ -307,7 +353,11 @@ class Trainer(object):
def evaluate(self):
self._eval_with_loader(self.loader)
- def predict(self, images, draw_threshold=0.5, output_dir='output'):
+ def predict(self,
+ images,
+ draw_threshold=0.5,
+ output_dir='output',
+ save_txt=False):
self.dataset.set_images(images)
loader = create('TestReader')(self.dataset, 0)
@@ -343,6 +393,7 @@ class Trainer(object):
if 'mask' in batch_res else None
segm_res = batch_res['segm'][start:end] \
if 'segm' in batch_res else None
+
image = visualize_results(image, bbox_res, mask_res, segm_res,
int(outs['im_id']), catid2name,
draw_threshold)
@@ -354,6 +405,9 @@ class Trainer(object):
logger.info("Detection bbox results save in {}".format(
save_name))
image.save(save_name, quality=95)
+ if save_txt:
+ save_path = os.path.splitext(save_name)[0] + '.txt'
+ save_result(save_path, bbox_res, catid2name, draw_threshold)
start = end
def _get_save_image_name(self, output_dir, image_path):
@@ -397,7 +451,7 @@ class Trainer(object):
}]
# dy2st and save model
- if self.slim is None or self.cfg['slim'] != 'QAT':
+ if 'slim' not in self.cfg or self.cfg['slim'] != 'QAT':
static_model = paddle.jit.to_static(
self.model, input_spec=input_spec)
# NOTE: dy2st do not pruned program, but jit.save will prune program
@@ -411,7 +465,7 @@ class Trainer(object):
input_spec=pruned_input_spec)
logger.info("Export model and saved in {}".format(save_dir))
else:
- self.slim.save_quantized_model(
+ self.cfg.slim.save_quantized_model(
self.model,
os.path.join(save_dir, 'model'),
input_spec=input_spec)
diff --git a/ppdet/ext_op/README.md b/ppdet/ext_op/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..7ada0acf7fd75266fed6c66a9a010debc645bee8
--- /dev/null
+++ b/ppdet/ext_op/README.md
@@ -0,0 +1,38 @@
+# 自定义OP编译
+旋转框IOU计算OP参考了[自定义外部算子](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/07_new_op/new_custom_op.html) 实现。
+
+## 1. 环境依赖
+- Paddle >= 2.0.1
+- gcc 8.2
+
+## 2. 安装
+```
+python3.7 setup.py install
+```
+
+安装完成后,按照如下方式使用:
+```
+import numpy as np
+import paddle
+
+# 引入自定义op
+from rbox_iou_ops import rbox_iou
+
+paddle.set_device('gpu:0')
+paddle.disable_static()
+
+rbox1 = np.random.rand(13000, 5)
+rbox2 = np.random.rand(7, 5)
+
+pd_rbox1 = paddle.to_tensor(rbox1)
+pd_rbox2 = paddle.to_tensor(rbox2)
+
+iou = rbox_iou(pd_rbox1, pd_rbox2)
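+# iou 形状为 [13000, 7],即 rbox1 与 rbox2 两两之间的IoU矩阵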
+print('iou', iou)
+```
+
+## 3. 单元测试
+单元测试位于`test.py`文件中,通过对比Python实现的结果与自定义OP的计算结果来验证正确性。
+
+由于Python与C++的计算细节略有区别,误差区间设置为0.02。
+```
+python3.7 test.py
+```
+提示`rbox_iou OP compute right!`说明OP测试通过。
diff --git a/ppdet/ext_op/rbox_iou_op.cc b/ppdet/ext_op/rbox_iou_op.cc
new file mode 100644
index 0000000000000000000000000000000000000000..05890fd2bf7fe2e10299e608a7fb852b175f3507
--- /dev/null
+++ b/ppdet/ext_op/rbox_iou_op.cc
@@ -0,0 +1,46 @@
+/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License. */
+
+#include "paddle/extension.h"
+
+#include <vector>
+
+std::vector<paddle::Tensor> RboxIouCPUForward(const paddle::Tensor& rbox1, const paddle::Tensor& rbox2);
+std::vector<paddle::Tensor> RboxIouCUDAForward(const paddle::Tensor& rbox1, const paddle::Tensor& rbox2);
+
+
+#define CHECK_INPUT_SAME(x1, x2) PD_CHECK(x1.place() == x2.place(), "inputs must be in the same place.")
+std::vector<paddle::Tensor> RboxIouForward(const paddle::Tensor& rbox1, const paddle::Tensor& rbox2) {
+ CHECK_INPUT_SAME(rbox1, rbox2);
+ if (rbox1.place() == paddle::PlaceType::kCPU) {
+ return RboxIouCPUForward(rbox1, rbox2);
+ }
+ else if (rbox1.place() == paddle::PlaceType::kGPU) {
+ return RboxIouCUDAForward(rbox1, rbox2);
+ }
+ // unreachable for supported places; avoids a missing-return warning
+ return {};
+}
+
+std::vector<std::vector<int64_t>> InferShape(std::vector<int64_t> rbox1_shape, std::vector<int64_t> rbox2_shape) {
+ return {{rbox1_shape[0], rbox2_shape[0]}};
+}
+
+std::vector<paddle::DataType> InferDtype(paddle::DataType t1, paddle::DataType t2) {
+ return {t1};
+}
+
+PD_BUILD_OP(rbox_iou)
+ .Inputs({"RBOX1", "RBOX2"})
+ .Outputs({"Output"})
+ .SetKernelFn(PD_KERNEL(RboxIouForward))
+ .SetInferShapeFn(PD_INFER_SHAPE(InferShape))
+ .SetInferDtypeFn(PD_INFER_DTYPE(InferDtype));
diff --git a/ppdet/ext_op/rbox_iou_op.cu b/ppdet/ext_op/rbox_iou_op.cu
new file mode 100644
index 0000000000000000000000000000000000000000..0581f782f3ed1e2e8c7be81f58adc88a431387d8
--- /dev/null
+++ b/ppdet/ext_op/rbox_iou_op.cu
@@ -0,0 +1,507 @@
+/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License. */
+
+
+#include <cassert>
+#include <cmath>
+
+#ifdef __CUDACC__
+// Designates functions callable from the host (CPU) and the device (GPU)
+#define HOST_DEVICE __host__ __device__
+#define HOST_DEVICE_INLINE HOST_DEVICE __forceinline__
+#else
+#include <algorithm>
+#define HOST_DEVICE
+#define HOST_DEVICE_INLINE HOST_DEVICE inline
+#endif
+
+#include "paddle/extension.h"
+
+#include <vector>
+
+namespace {
+
+template <typename T>
+struct RotatedBox {
+ T x_ctr, y_ctr, w, h, a;
+};
+
+template <typename T>
+struct Point {
+ T x, y;
+ HOST_DEVICE_INLINE Point(const T& px = 0, const T& py = 0) : x(px), y(py) {}
+ HOST_DEVICE_INLINE Point operator+(const Point& p) const {
+ return Point(x + p.x, y + p.y);
+ }
+ HOST_DEVICE_INLINE Point& operator+=(const Point& p) {
+ x += p.x;
+ y += p.y;
+ return *this;
+ }
+ HOST_DEVICE_INLINE Point operator-(const Point& p) const {
+ return Point(x - p.x, y - p.y);
+ }
+ HOST_DEVICE_INLINE Point operator*(const T coeff) const {
+ return Point(x * coeff, y * coeff);
+ }
+};
+
+template <typename T>
+HOST_DEVICE_INLINE T dot_2d(const Point<T>& A, const Point<T>& B) {
+ return A.x * B.x + A.y * B.y;
+}
+
+template <typename T>
+HOST_DEVICE_INLINE T cross_2d(const Point<T>& A, const Point<T>& B) {
+ return A.x * B.y - B.x * A.y;
+}
+
+template <typename T>
+HOST_DEVICE_INLINE void get_rotated_vertices(
+ const RotatedBox<T>& box,
+ Point<T> (&pts)[4]) {
+ // NOTE: the reference implementation converts degrees to radians here:
+ //   double theta = box.a * 0.01745329251;  // M_PI / 180.
+ // In this codebase the box angle is already in radians, so it is used directly.
+ double theta = box.a;
+ T cosTheta2 = (T)cos(theta) * 0.5f;
+ T sinTheta2 = (T)sin(theta) * 0.5f;
+
+ // y: top --> down; x: left --> right
+ pts[0].x = box.x_ctr - sinTheta2 * box.h - cosTheta2 * box.w;
+ pts[0].y = box.y_ctr + cosTheta2 * box.h - sinTheta2 * box.w;
+ pts[1].x = box.x_ctr + sinTheta2 * box.h - cosTheta2 * box.w;
+ pts[1].y = box.y_ctr - cosTheta2 * box.h - sinTheta2 * box.w;
+ pts[2].x = 2 * box.x_ctr - pts[0].x;
+ pts[2].y = 2 * box.y_ctr - pts[0].y;
+ pts[3].x = 2 * box.x_ctr - pts[1].x;
+ pts[3].y = 2 * box.y_ctr - pts[1].y;
+}
+
+template <typename T>
+HOST_DEVICE_INLINE int get_intersection_points(
+ const Point<T> (&pts1)[4],
+ const Point<T> (&pts2)[4],
+ Point<T> (&intersections)[24]) {
+ // Line vector
+ // A line from p1 to p2 is: p1 + (p2-p1)*t, t=[0,1]
+ Point<T> vec1[4], vec2[4];
+ for (int i = 0; i < 4; i++) {
+ vec1[i] = pts1[(i + 1) % 4] - pts1[i];
+ vec2[i] = pts2[(i + 1) % 4] - pts2[i];
+ }
+
+ // Line test - test all line combos for intersection
+ int num = 0; // number of intersections
+ for (int i = 0; i < 4; i++) {
+ for (int j = 0; j < 4; j++) {
+ // Solve for 2x2 Ax=b
+ T det = cross_2d(vec2[j], vec1[i]);
+
+ // This takes care of parallel lines
+ if (fabs(det) <= 1e-14) {
+ continue;
+ }
+
+ auto vec12 = pts2[j] - pts1[i];
+
+ T t1 = cross_2d(vec2[j], vec12) / det;
+ T t2 = cross_2d(vec1[i], vec12) / det;
+
+ if (t1 >= 0.0f && t1 <= 1.0f && t2 >= 0.0f && t2 <= 1.0f) {
+ intersections[num++] = pts1[i] + vec1[i] * t1;
+ }
+ }
+ }
+
+ // Check for vertices of rect1 inside rect2
+ {
+ const auto& AB = vec2[0];
+ const auto& DA = vec2[3];
+ auto ABdotAB = dot_2d(AB, AB);
+ auto ADdotAD = dot_2d(DA, DA);
+ for (int i = 0; i < 4; i++) {
+ // assume ABCD is the rectangle, and P is the point to be judged
+ // P is inside ABCD iff. P's projection on AB lies within AB
+ // and P's projection on AD lies within AD
+
+ auto AP = pts1[i] - pts2[0];
+
+ auto APdotAB = dot_2d(AP, AB);
+ auto APdotAD = -dot_2d(AP, DA);
+
+ if ((APdotAB >= 0) && (APdotAD >= 0) && (APdotAB <= ABdotAB) &&
+ (APdotAD <= ADdotAD)) {
+ intersections[num++] = pts1[i];
+ }
+ }
+ }
+
+ // Reverse the check - check for vertices of rect2 inside rect1
+ {
+ const auto& AB = vec1[0];
+ const auto& DA = vec1[3];
+ auto ABdotAB = dot_2d(AB, AB);
+ auto ADdotAD = dot_2d(DA, DA);
+ for (int i = 0; i < 4; i++) {
+ auto AP = pts2[i] - pts1[0];
+
+ auto APdotAB = dot_2d(AP, AB);
+ auto APdotAD = -dot_2d(AP, DA);
+
+ if ((APdotAB >= 0) && (APdotAD >= 0) && (APdotAB <= ABdotAB) &&
+ (APdotAD <= ADdotAD)) {
+ intersections[num++] = pts2[i];
+ }
+ }
+ }
+
+ return num;
+}
+
+template <typename T>
+HOST_DEVICE_INLINE int convex_hull_graham(
+ const Point<T> (&p)[24],
+ const int& num_in,
+ Point<T> (&q)[24],
+ bool shift_to_zero = false) {
+ assert(num_in >= 2);
+
+ // Step 1:
+ // Find point with minimum y
+ // if more than 1 points have the same minimum y,
+ // pick the one with the minimum x.
+ int t = 0;
+ for (int i = 1; i < num_in; i++) {
+ if (p[i].y < p[t].y || (p[i].y == p[t].y && p[i].x < p[t].x)) {
+ t = i;
+ }
+ }
+ auto& start = p[t]; // starting point
+
+ // Step 2:
+ // Subtract starting point from every points (for sorting in the next step)
+ for (int i = 0; i < num_in; i++) {
+ q[i] = p[i] - start;
+ }
+
+ // Swap the starting point to position 0
+ auto tmp = q[0];
+ q[0] = q[t];
+ q[t] = tmp;
+
+ // Step 3:
+ // Sort point 1 ~ num_in according to their relative cross-product values
+ // (essentially sorting according to angles)
+ // If the angles are the same, sort according to their distance to origin
+ T dist[24];
+ for (int i = 0; i < num_in; i++) {
+ dist[i] = dot_2d(q[i], q[i]);
+ }
+
+#ifdef __CUDACC__
+ // CUDA version
+ // In the future, we can potentially use thrust
+ // for sorting here to improve speed (though not guaranteed)
+ for (int i = 1; i < num_in - 1; i++) {
+ for (int j = i + 1; j < num_in; j++) {
+ T crossProduct = cross_2d(q[i], q[j]);
+ if ((crossProduct < -1e-6) ||
+ (fabs(crossProduct) < 1e-6 && dist[i] > dist[j])) {
+ auto q_tmp = q[i];
+ q[i] = q[j];
+ q[j] = q_tmp;
+ auto dist_tmp = dist[i];
+ dist[i] = dist[j];
+ dist[j] = dist_tmp;
+ }
+ }
+ }
+#else
+ // CPU version
+ std::sort(
+ q + 1, q + num_in, [](const Point<T>& A, const Point<T>& B) -> bool {
+ T temp = cross_2d(A, B);
+ if (fabs(temp) < 1e-6) {
+ return dot_2d(A, A) < dot_2d(B, B);
+ } else {
+ return temp > 0;
+ }
+ });
+#endif
+
+ // Step 4:
+ // Make sure there are at least 2 points (that don't overlap with each other)
+ // in the stack
+ int k; // index of the non-overlapped second point
+ for (k = 1; k < num_in; k++) {
+ if (dist[k] > 1e-8) {
+ break;
+ }
+ }
+ if (k == num_in) {
+ // We reach the end, which means the convex hull is just one point
+ q[0] = p[t];
+ return 1;
+ }
+ q[1] = q[k];
+ int m = 2; // 2 points in the stack
+ // Step 5:
+ // Finally we can start the scanning process.
+ // When a non-convex relationship between the 3 points is found
+ // (either concave shape or duplicated points),
+ // we pop the previous point from the stack
+ // until the 3-point relationship is convex again, or
+ // until the stack only contains two points
+ for (int i = k + 1; i < num_in; i++) {
+ while (m > 1 && cross_2d(q[i] - q[m - 2], q[m - 1] - q[m - 2]) >= 0) {
+ m--;
+ }
+ q[m++] = q[i];
+ }
+
+ // Step 6 (Optional):
+ // In general sense we need the original coordinates, so we
+ // need to shift the points back (reverting Step 2)
+ // But if we're only interested in getting the area/perimeter of the shape
+ // We can simply return.
+ if (!shift_to_zero) {
+ for (int i = 0; i < m; i++) {
+ q[i] += start;
+ }
+ }
+
+ return m;
+}
+
+template <typename T>
+HOST_DEVICE_INLINE T polygon_area(const Point<T> (&q)[24], const int& m) {
+ if (m <= 2) {
+ return 0;
+ }
+
+ T area = 0;
+ for (int i = 1; i < m - 1; i++) {
+ area += fabs(cross_2d(q[i] - q[0], q[i + 1] - q[0]));
+ }
+
+ return area / 2.0;
+}
+
+template <typename T>
+HOST_DEVICE_INLINE T rboxes_intersection(
+ const RotatedBox<T>& box1,
+ const RotatedBox<T>& box2) {
+ // There are up to 4 x 4 + 4 + 4 = 24 intersections (including dups) returned
+ // from rotated_rect_intersection_pts
+ Point<T> intersectPts[24], orderedPts[24];
+
+ Point<T> pts1[4];
+ Point<T> pts2[4];
+ get_rotated_vertices(box1, pts1);
+ get_rotated_vertices(box2, pts2);
+
+ int num = get_intersection_points(pts1, pts2, intersectPts);
+
+ if (num <= 2) {
+ return 0.0;
+ }
+
+ // Convex Hull to order the intersection points in clockwise order and find
+ // the contour area.
+ int num_convex = convex_hull_graham(intersectPts, num, orderedPts, true);
+ return polygon_area(orderedPts, num_convex);
+}
+
+} // namespace
+
+template <typename T>
+HOST_DEVICE_INLINE T
+rbox_iou_single(T const* const box1_raw, T const* const box2_raw) {
+ // shift center to the middle point to achieve higher precision in result
+ RotatedBox<T> box1, box2;
+ auto center_shift_x = (box1_raw[0] + box2_raw[0]) / 2.0;
+ auto center_shift_y = (box1_raw[1] + box2_raw[1]) / 2.0;
+ box1.x_ctr = box1_raw[0] - center_shift_x;
+ box1.y_ctr = box1_raw[1] - center_shift_y;
+ box1.w = box1_raw[2];
+ box1.h = box1_raw[3];
+ box1.a = box1_raw[4];
+ box2.x_ctr = box2_raw[0] - center_shift_x;
+ box2.y_ctr = box2_raw[1] - center_shift_y;
+ box2.w = box2_raw[2];
+ box2.h = box2_raw[3];
+ box2.a = box2_raw[4];
+
+ const T area1 = box1.w * box1.h;
+ const T area2 = box2.w * box2.h;
+ if (area1 < 1e-14 || area2 < 1e-14) {
+ return 0.f;
+ }
+
+ const T intersection = rboxes_intersection(box1, box2);
+ const T iou = intersection / (area1 + area2 - intersection);
+ return iou;
+}
+
+
+// 2D block with 32 * 16 = 512 threads per block
+const int BLOCK_DIM_X = 32;
+const int BLOCK_DIM_Y = 16;
+
+/**
+ Computes ceil(a / b)
+*/
+template <typename T>
+__host__ __device__ __forceinline__ T CeilDiv0(T a, T b) {
+ return (a + b - 1) / b;
+}
+
+static inline int CeilDiv(const int a, const int b) {
+ return (a + b - 1) / b;
+}
+
+template <typename T>
+__global__ void rbox_iou_cuda_kernel(
+ const int rbox1_num,
+ const int rbox2_num,
+ const T* rbox1_data_ptr,
+ const T* rbox2_data_ptr,
+ T* output_data_ptr) {
+
+ // get row_start and col_start
+ const int rbox1_block_idx = blockIdx.x * blockDim.x;
+ const int rbox2_block_idx = blockIdx.y * blockDim.y;
+
+ const int rbox1_thread_num = min(rbox1_num - rbox1_block_idx, blockDim.x);
+ const int rbox2_thread_num = min(rbox2_num - rbox2_block_idx, blockDim.y);
+
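+ // stage this tile's box parameters (5 values per box) in shared memory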
+ __shared__ T block_boxes1[BLOCK_DIM_X * 5];
+ __shared__ T block_boxes2[BLOCK_DIM_Y * 5];
+
+
+ // It's safe to copy using threadIdx.x since BLOCK_DIM_X >= BLOCK_DIM_Y
+ if (threadIdx.x < rbox1_thread_num && threadIdx.y == 0) {
+ block_boxes1[threadIdx.x * 5 + 0] =
+ rbox1_data_ptr[(rbox1_block_idx + threadIdx.x) * 5 + 0];
+ block_boxes1[threadIdx.x * 5 + 1] =
+ rbox1_data_ptr[(rbox1_block_idx + threadIdx.x) * 5 + 1];
+ block_boxes1[threadIdx.x * 5 + 2] =
+ rbox1_data_ptr[(rbox1_block_idx + threadIdx.x) * 5 + 2];
+ block_boxes1[threadIdx.x * 5 + 3] =
+ rbox1_data_ptr[(rbox1_block_idx + threadIdx.x) * 5 + 3];
+ block_boxes1[threadIdx.x * 5 + 4] =
+ rbox1_data_ptr[(rbox1_block_idx + threadIdx.x) * 5 + 4];
+ }
+
+ // rbox2 boxes are also loaded along threadIdx.x (safe since BLOCK_DIM_X >= BLOCK_DIM_Y), with the same threadIdx.y == 0 guard as above
+ if (threadIdx.x < rbox2_thread_num && threadIdx.y == 0) {
+ block_boxes2[threadIdx.x * 5 + 0] =
+ rbox2_data_ptr[(rbox2_block_idx + threadIdx.x) * 5 + 0];
+ block_boxes2[threadIdx.x * 5 + 1] =
+ rbox2_data_ptr[(rbox2_block_idx + threadIdx.x) * 5 + 1];
+ block_boxes2[threadIdx.x * 5 + 2] =
+ rbox2_data_ptr[(rbox2_block_idx + threadIdx.x) * 5 + 2];
+ block_boxes2[threadIdx.x * 5 + 3] =
+ rbox2_data_ptr[(rbox2_block_idx + threadIdx.x) * 5 + 3];
+ block_boxes2[threadIdx.x * 5 + 4] =
+ rbox2_data_ptr[(rbox2_block_idx + threadIdx.x) * 5 + 4];
+ }
+
+ // sync
+ __syncthreads();
+
+ if (threadIdx.x < rbox1_thread_num && threadIdx.y < rbox2_thread_num) {
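+ // row-major index into the [rbox1_num, rbox2_num] output matrix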
+ int offset = (rbox1_block_idx + threadIdx.x) * rbox2_num + rbox2_block_idx + threadIdx.y;
+ output_data_ptr[offset] = rbox_iou_single(block_boxes1 + threadIdx.x * 5, block_boxes2 + threadIdx.y * 5);
+ }
+}
+
+#define CHECK_INPUT_GPU(x) PD_CHECK(x.place() == paddle::PlaceType::kGPU, #x " must be a GPU Tensor.")
+
+std::vector<paddle::Tensor> RboxIouCUDAForward(const paddle::Tensor& rbox1, const paddle::Tensor& rbox2) {
+ CHECK_INPUT_GPU(rbox1);
+ CHECK_INPUT_GPU(rbox2);
+
+ auto rbox1_num = rbox1.shape()[0];
+ auto rbox2_num = rbox2.shape()[0];
+
+ auto output = paddle::Tensor(paddle::PlaceType::kGPU);
+ output.reshape({rbox1_num, rbox2_num});
+
+ const int blocks_x = CeilDiv(rbox1_num, BLOCK_DIM_X);
+ const int blocks_y = CeilDiv(rbox2_num, BLOCK_DIM_Y);
+
+ dim3 blocks(blocks_x, blocks_y);
+ dim3 threads(BLOCK_DIM_X, BLOCK_DIM_Y);
+
+ PD_DISPATCH_FLOATING_TYPES(
+ rbox1.type(),
+ "rbox_iou_cuda_kernel",
+ ([&] {
+ rbox_iou_cuda_kernel<data_t><<<blocks, threads>>>(
+ rbox1_num,
+ rbox2_num,
+ rbox1.data<data_t>(),
+ rbox2.data<data_t>(),
+ output.mutable_data<data_t>());
+ }));
+
+ return {output};
+}
+
+
+template <typename T>
+void rbox_iou_cpu_kernel(
+ const int rbox1_num,
+ const int rbox2_num,
+ const T* rbox1_data_ptr,
+ const T* rbox2_data_ptr,
+ T* output_data_ptr) {
+
+ int i, j;
+ for (i = 0; i < rbox1_num; i++) {
+ for (j = 0; j < rbox2_num; j++) {
+ int offset = i * rbox2_num + j;
+ output_data_ptr[offset] = rbox_iou_single(rbox1_data_ptr + i * 5, rbox2_data_ptr + j * 5);
+ }
+ }
+}
+
+
+#define CHECK_INPUT_CPU(x) PD_CHECK(x.place() == paddle::PlaceType::kCPU, #x " must be a CPU Tensor.")
+
+std::vector<paddle::Tensor> RboxIouCPUForward(const paddle::Tensor& rbox1, const paddle::Tensor& rbox2) {
+ CHECK_INPUT_CPU(rbox1);
+ CHECK_INPUT_CPU(rbox2);
+
+ auto rbox1_num = rbox1.shape()[0];
+ auto rbox2_num = rbox2.shape()[0];
+
+ auto output = paddle::Tensor(paddle::PlaceType::kCPU);
+ output.reshape({rbox1_num, rbox2_num});
+
+ PD_DISPATCH_FLOATING_TYPES(
+ rbox1.type(),
+ "rbox_iou_cpu_kernel",
+ ([&] {
+ rbox_iou_cpu_kernel<data_t>(
+ rbox1_num,
+ rbox2_num,
+ rbox1.data<data_t>(),
+ rbox2.data<data_t>(),
+ output.mutable_data<data_t>());
+ }));
+
+ return {output};
+}
diff --git a/ppdet/ext_op/setup.py b/ppdet/ext_op/setup.py
new file mode 100644
index 0000000000000000000000000000000000000000..6859f0cc29b80a171534eb385654f24f92a60921
--- /dev/null
+++ b/ppdet/ext_op/setup.py
@@ -0,0 +1,6 @@
+from paddle.utils.cpp_extension import CppExtension, CUDAExtension, setup
+
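+# Build both rbox_iou_op.cc (CPU kernel) and rbox_iou_op.cu (CUDA kernel)
+# into a single extension module named rbox_iou_ops.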
+if __name__ == "__main__":
+ setup(
+ name='rbox_iou_ops',
+ ext_modules=CUDAExtension(sources=['rbox_iou_op.cc', 'rbox_iou_op.cu']))
diff --git a/ppdet/ext_op/test.py b/ppdet/ext_op/test.py
new file mode 100644
index 0000000000000000000000000000000000000000..83403edd3a9e6a34accd386aac26d0bdb1d77b20
--- /dev/null
+++ b/ppdet/ext_op/test.py
@@ -0,0 +1,154 @@
+import sys
+import time
+
+import numpy as np
+from shapely.geometry import Polygon
+import paddle
+
+paddle.set_device('gpu:0')
+paddle.disable_static()
+
+try:
+ from rbox_iou_ops import rbox_iou
+except Exception as e:
+ print('import custom_ops error', e)
+ sys.exit(-1)
+
+# generate random data
+rbox1 = np.random.rand(13000, 5)
+rbox2 = np.random.rand(7, 5)
+
+# x_ctr, y_ctr, w, h scaled into (0, 0.45]
+rbox1[:, 0:4] = rbox1[:, 0:4] * 0.45 + 0.001
+rbox2[:, 0:4] = rbox2[:, 0:4] * 0.45 + 0.001
+
+# shift angle into [-0.5, 0.5] radians
+rbox1[:, 4] = rbox1[:, 4] - 0.5
+rbox2[:, 4] = rbox2[:, 4] - 0.5
+
+print('rbox1', rbox1.shape, 'rbox2', rbox2.shape)
+
+# to paddle tensor
+pd_rbox1 = paddle.to_tensor(rbox1)
+pd_rbox2 = paddle.to_tensor(rbox2)
+
+start_time = time.time()
+iou = rbox_iou(pd_rbox1, pd_rbox2)
+print('paddle time:', time.time() - start_time)
+print('iou shape', iou.cpu().shape)
+
+
+# get gt
+def rbox2poly_single(rrect, get_best_begin_point=False):
+ """
+ rrect:[x_ctr,y_ctr,w,h,angle]
+ to
+ poly:[x0,y0,x1,y1,x2,y2,x3,y3]
+ """
+ x_ctr, y_ctr, width, height, angle = rrect[:5]
+ tl_x, tl_y, br_x, br_y = -width / 2, -height / 2, width / 2, height / 2
+ # rect 2x4
+ rect = np.array([[tl_x, br_x, br_x, tl_x], [tl_y, tl_y, br_y, br_y]])
+ R = np.array([[np.cos(angle), -np.sin(angle)],
+ [np.sin(angle), np.cos(angle)]])
+ # poly
+ poly = R.dot(rect)
+ x0, x1, x2, x3 = poly[0, :4] + x_ctr
+ y0, y1, y2, y3 = poly[1, :4] + y_ctr
+ poly = np.array([x0, y0, x1, y1, x2, y2, x3, y3], dtype=np.float32)
+ return poly
+
+
+def intersection(g, p):
+ """
+ Intersection.
+ """
+
+ g = g[:8].reshape((4, 2))
+ p = p[:8].reshape((4, 2))
+
+ a = g
+ b = p
+
+ use_filter = True
+ if use_filter:
+ # step 1: fast reject when the axis-aligned bounding boxes do not overlap
+ inter_x1 = np.maximum(np.min(a[:, 0]), np.min(b[:, 0]))
+ inter_x2 = np.minimum(np.max(a[:, 0]), np.max(b[:, 0]))
+ inter_y1 = np.maximum(np.min(a[:, 1]), np.min(b[:, 1]))
+ inter_y2 = np.minimum(np.max(a[:, 1]), np.max(b[:, 1]))
+ if inter_x1 >= inter_x2 or inter_y1 >= inter_y2:
+ return 0.
+ x1 = np.minimum(np.min(a[:, 0]), np.min(b[:, 0]))
+ x2 = np.maximum(np.max(a[:, 0]), np.max(b[:, 0]))
+ y1 = np.minimum(np.min(a[:, 1]), np.min(b[:, 1]))
+ y2 = np.maximum(np.max(a[:, 1]), np.max(b[:, 1]))
+ if x1 >= x2 or y1 >= y2 or (x2 - x1) < 2 or (y2 - y1) < 2:
+ return 0.
+
+ g = Polygon(g)
+ p = Polygon(p)
+ #g = g.buffer(0)
+ #p = p.buffer(0)
+ if not g.is_valid or not p.is_valid:
+ return 0
+
+ inter = g.intersection(p).area
+ union = g.area + p.area - inter
+ if union == 0:
+ return 0
+ else:
+ return inter / union
+
+
+# rbox_iou by python
+def rbox_overlaps(anchors, gt_bboxes, use_cv2=False):
+ """
+
+ Args:
+ anchors: [NA, 5] x1,y1,x2,y2,angle
+ gt_bboxes: [M, 5] x1,y1,x2,y2,angle
+
+ Returns:
+
+ """
+ assert anchors.shape[1] == 5
+ assert gt_bboxes.shape[1] == 5
+
+ gt_bboxes_poly = [rbox2poly_single(e) for e in gt_bboxes]
+ anchors_poly = [rbox2poly_single(e) for e in anchors]
+
+ num_gt, num_anchors = len(gt_bboxes_poly), len(anchors_poly)
+ iou = np.zeros((num_gt, num_anchors), dtype=np.float32)
+
+ start_time = time.time()
+ for i in range(num_gt):
+ for j in range(num_anchors):
+ try:
+ iou[i, j] = intersection(gt_bboxes_poly[i], anchors_poly[j])
+ except Exception as e:
+ print('cur gt_bboxes_poly[i]', gt_bboxes_poly[i],
+ 'anchors_poly[j]', anchors_poly[j], e)
+ iou = iou.T
+ print('intersection all sp_time', time.time() - start_time)
+ return iou
+
+
+# scale coordinates up to pixel-like magnitudes
+poly_rbox1 = rbox1
+poly_rbox2 = rbox2
+poly_rbox1[:, 0:4] = rbox1[:, 0:4] * 1024
+poly_rbox2[:, 0:4] = rbox2[:, 0:4] * 1024
+
+start_time = time.time()
+iou_py = rbox_overlaps(poly_rbox1, poly_rbox2, use_cv2=False)
+print('rbox time', time.time() - start_time)
+print(iou_py.shape)
+
+iou_pd = iou.cpu().numpy()
+sum_abs_diff = np.sum(np.abs(iou_pd - iou_py))
+print('sum of abs diff', sum_abs_diff)
+if sum_abs_diff < 0.02:
+ print("rbox_iou OP compute right!")
diff --git a/ppdet/metrics/__init__.py b/ppdet/metrics/__init__.py
index fb7add57c2e45a8ff411863918b795d4f0a5a4c7..460b12dea5ec265c618d4d35223b16cd528491ba 100644
--- a/ppdet/metrics/__init__.py
+++ b/ppdet/metrics/__init__.py
@@ -15,8 +15,4 @@
from . import metrics
from .metrics import *
-from . import category
-from .category import *
-
-__all__ = metrics.__all__ \
- + category.__all__
+__all__ = metrics.__all__
diff --git a/ppdet/metrics/coco_utils.py b/ppdet/metrics/coco_utils.py
index 984abfbf764ea3d5c63ded9dd4507a29e79caa72..a7ac32226566d56f9c993a6b02f6c21397c36be5 100644
--- a/ppdet/metrics/coco_utils.py
+++ b/ppdet/metrics/coco_utils.py
@@ -21,7 +21,7 @@ import sys
import numpy as np
import itertools
-from ppdet.py_op.post_process import get_det_res, get_seg_res, get_solov2_segm_res
+from ppdet.metrics.json_results import get_det_res, get_det_poly_res, get_seg_res, get_solov2_segm_res
from ppdet.metrics.map_utils import draw_pr_curve
from ppdet.utils.logger import setup_logger
@@ -45,8 +45,12 @@ def get_infer_results(outs, catid, bias=0):
infer_res = {}
if 'bbox' in outs:
- infer_res['bbox'] = get_det_res(
- outs['bbox'], outs['bbox_num'], im_id, catid, bias=bias)
+ if len(outs['bbox']) > 0 and len(outs['bbox'][0]) > 6:
+ infer_res['bbox'] = get_det_poly_res(
+ outs['bbox'], outs['bbox_num'], im_id, catid, bias=bias)
+ else:
+ infer_res['bbox'] = get_det_res(
+ outs['bbox'], outs['bbox_num'], im_id, catid, bias=bias)
if 'mask' in outs:
# mask post process
@@ -67,13 +71,13 @@ def cocoapi_eval(jsonfile,
classwise=False):
"""
Args:
- jsonfile: Evaluation json file, eg: bbox.json, mask.json.
- style: COCOeval style, can be `bbox` , `segm` and `proposal`.
- coco_gt: Whether to load COCOAPI through anno_file,
+ jsonfile (str): Evaluation json file, eg: bbox.json, mask.json.
+ style (str): COCOeval style, can be `bbox` , `segm` and `proposal`.
+ coco_gt (COCO): COCO ground-truth object; if None, it is loaded from anno_file,
eg: coco_gt = COCO(anno_file)
- anno_file: COCO annotations file.
- max_dets: COCO evaluation maxDets.
- classwise: whether per-category AP and draw P-R Curve or not.
+ anno_file (str): COCO annotations file.
+ max_dets (tuple): COCO evaluation maxDets.
+ classwise (bool): Whether per-category AP and draw P-R Curve or not.
"""
assert coco_gt != None or anno_file != None
from pycocotools.coco import COCO
@@ -142,9 +146,7 @@ def cocoapi_eval(jsonfile,
return coco_eval.stats
-def json_eval_results(metric: object,
- json_directory: object=None,
- dataset: object=None) -> object:
+def json_eval_results(metric, json_directory, dataset):
"""
cocoapi eval with already exists proposal.json, bbox.json or mask.json
"""
diff --git a/ppdet/py_op/post_process.py b/ppdet/metrics/json_results.py
similarity index 69%
rename from ppdet/py_op/post_process.py
rename to ppdet/metrics/json_results.py
index 0c02cdba317290829422b8723de570490b098b0d..f5607666103e804bc4cd42ca87797c032e3b0a97 100755
--- a/ppdet/py_op/post_process.py
+++ b/ppdet/metrics/json_results.py
@@ -43,6 +43,54 @@ def get_det_res(bboxes, bbox_nums, image_id, label_to_cat_id_map, bias=0):
return det_res
+def get_det_poly_res(bboxes, bbox_nums, image_id, label_to_cat_id_map, bias=0):
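+ """Convert rotated-box (quadrangle) detections to COCO-style result dicts.
+
+ Each row of `bboxes` is [class_id, score, x1, y1, x2, y2, x3, y3, x4, y4].
+ """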
+ det_res = []
+ k = 0
+ for i in range(len(bbox_nums)):
+ cur_image_id = int(image_id[i][0])
+ det_nums = bbox_nums[i]
+ for j in range(det_nums):
+ dt = bboxes[k]
+ k = k + 1
+ num_id, score, x1, y1, x2, y2, x3, y3, x4, y4 = dt.tolist()
+ if int(num_id) < 0:
+ continue
+ category_id = int(num_id)
+ rbox = [x1, y1, x2, y2, x3, y3, x4, y4]
+ dt_res = {
+ 'image_id': cur_image_id,
+ 'category_id': category_id,
+ 'bbox': rbox,
+ 'score': score
+ }
+ det_res.append(dt_res)
+ return det_res
+
+
def get_seg_res(masks, bboxes, mask_nums, image_id, label_to_cat_id_map):
import pycocotools.mask as mask_util
seg_res = []
diff --git a/ppdet/metrics/map_utils.py b/ppdet/metrics/map_utils.py
index 21c0e3922588b51a903be8441fe5f5e2cf05c6e6..17730bcdf782e9fcd78f20d975d807555d2c2197 100644
--- a/ppdet/metrics/map_utils.py
+++ b/ppdet/metrics/map_utils.py
@@ -101,19 +101,20 @@ class DetectionMAP(object):
Currently support two types: 11point and integral
Args:
- class_num (int): the class number.
+ class_num (int): The class number.
overlap_thresh (float): The threshold of overlap
ratio between prediction bounding box and
ground truth bounding box for deciding
true/false positive. Default 0.5.
- map_type (str): calculation method of mean average
+ map_type (str): Calculation method of mean average
precision, currently support '11point' and
'integral'. Default '11point'.
- is_bbox_normalized (bool): whther bounding boxes
+ is_bbox_normalized (bool): Whether bounding boxes
is normalized to range[0, 1]. Default False.
- evaluate_difficult (bool): whether to evaluate
+ evaluate_difficult (bool): Whether to evaluate
difficult bounding boxes. Default False.
- classwise (bool): whether per-category AP and draw
+ catid2name (dict): Mapping between category id and category name.
+ classwise (bool): Whether per-category AP and draw
P-R Curve or not.
"""
diff --git a/ppdet/metrics/metrics.py b/ppdet/metrics/metrics.py
index efd34c21b772b53aab52c8c2cbbd0ea804b6b901..e4ad1544f4808f721445390f07b5c81441ef21ca 100644
--- a/ppdet/metrics/metrics.py
+++ b/ppdet/metrics/metrics.py
@@ -22,10 +22,10 @@ import json
import paddle
import numpy as np
-from .category import get_categories
from .map_utils import prune_zero_padding, DetectionMAP
from .coco_utils import get_infer_results, cocoapi_eval
from .widerface_utils import face_eval_run
+from ppdet.data.source.category import get_categories
from ppdet.utils.logger import setup_logger
logger = setup_logger(__name__)
@@ -62,11 +62,14 @@ class COCOMetric(Metric):
assert os.path.isfile(anno_file), \
"anno_file {} not a file".format(anno_file)
self.anno_file = anno_file
- self.clsid2catid, self.catid2name = get_categories('COCO', anno_file)
+ self.clsid2catid = kwargs.get('clsid2catid', None)
+ if self.clsid2catid is None:
+ self.clsid2catid, _ = get_categories('COCO', anno_file)
self.classwise = kwargs.get('classwise', False)
self.output_eval = kwargs.get('output_eval', None)
# TODO: bias should be unified
self.bias = kwargs.get('bias', 0)
+ self.save_prediction_only = kwargs.get('save_prediction_only', False)
self.reset()
def reset(self):
@@ -102,13 +105,17 @@ class COCOMetric(Metric):
json.dump(self.results['bbox'], f)
logger.info('The bbox result is saved to bbox.json.')
- bbox_stats = cocoapi_eval(
- output,
- 'bbox',
- anno_file=self.anno_file,
- classwise=self.classwise)
- self.eval_results['bbox'] = bbox_stats
- sys.stdout.flush()
+ if self.save_prediction_only:
+ logger.info('The bbox result is saved to {}; mAP is not '
+ 'evaluated.'.format(output))
+ else:
+ bbox_stats = cocoapi_eval(
+ output,
+ 'bbox',
+ anno_file=self.anno_file,
+ classwise=self.classwise)
+ self.eval_results['bbox'] = bbox_stats
+ sys.stdout.flush()
if len(self.results['mask']) > 0:
output = "mask.json"
@@ -118,13 +125,17 @@ class COCOMetric(Metric):
json.dump(self.results['mask'], f)
logger.info('The mask result is saved to mask.json.')
- seg_stats = cocoapi_eval(
- output,
- 'segm',
- anno_file=self.anno_file,
- classwise=self.classwise)
- self.eval_results['mask'] = seg_stats
- sys.stdout.flush()
+ if self.save_prediction_only:
+ logger.info('The mask result is saved to {}; mAP is not '
+ 'evaluated.'.format(output))
+ else:
+ seg_stats = cocoapi_eval(
+ output,
+ 'segm',
+ anno_file=self.anno_file,
+ classwise=self.classwise)
+ self.eval_results['mask'] = seg_stats
+ sys.stdout.flush()
if len(self.results['segm']) > 0:
output = "segm.json"
@@ -134,13 +145,17 @@ class COCOMetric(Metric):
json.dump(self.results['segm'], f)
logger.info('The segm result is saved to segm.json.')
- seg_stats = cocoapi_eval(
- output,
- 'segm',
- anno_file=self.anno_file,
- classwise=self.classwise)
- self.eval_results['mask'] = seg_stats
- sys.stdout.flush()
+ if self.save_prediction_only:
+ logger.info('The segm result is saved to {}; mAP is not '
+ 'evaluated.'.format(output))
+ else:
+ seg_stats = cocoapi_eval(
+ output,
+ 'segm',
+ anno_file=self.anno_file,
+ classwise=self.classwise)
+ self.eval_results['mask'] = seg_stats
+ sys.stdout.flush()
def log(self):
pass
diff --git a/ppdet/model_zoo/model_zoo.py b/ppdet/model_zoo/model_zoo.py
index 0f7f2446760139b609ba92c589cc45f284fdc638..17af46f7970a46ab889d332c59e9e7a06de7bdbe 100644
--- a/ppdet/model_zoo/model_zoo.py
+++ b/ppdet/model_zoo/model_zoo.py
@@ -68,11 +68,11 @@ def list_model(filters=[]):
# models and configs save on bcebos under dygraph directory
def get_config_file(model_name):
- return get_config_path("ppdet://dygraph/configs/{}.yml".format(model_name))
+ return get_config_path("ppdet://configs/{}.yml".format(model_name))
def get_weights_url(model_name):
- return "ppdet://dygraph/{}.pdparams".format(model_name)
+ return "ppdet://models/{}.pdparams".format(osp.split(model_name)[-1])
def get_model(model_name, pretrained=True):
diff --git a/ppdet/modeling/__init__.py b/ppdet/modeling/__init__.py
index 5171d205cf3992f70c3187eea595504215560ef2..01968ba3c2a8d0785065730f99c4fc4b9656aaf7 100644
--- a/ppdet/modeling/__init__.py
+++ b/ppdet/modeling/__init__.py
@@ -1,3 +1,9 @@
+# OP docs may contain math formulas which may cause
+# DeprecationWarning during string parsing
+import warnings
+warnings.filterwarnings(
+ action='ignore', category=DeprecationWarning, module='ops')
+
from . import ops
from . import backbones
from . import necks
@@ -7,7 +13,6 @@ from . import losses
from . import architectures
from . import post_process
from . import layers
-from . import utils
from .ops import *
from .backbones import *
@@ -18,4 +23,3 @@ from .losses import *
from .architectures import *
from .post_process import *
from .layers import *
-from .utils import *
diff --git a/ppdet/modeling/architectures/__init__.py b/ppdet/modeling/architectures/__init__.py
index 6ffb47115548fc513245e517ebb776fbb9b72fc7..ae881607c6544709dbfdc7f6e73ae4ae30bbe48b 100644
--- a/ppdet/modeling/architectures/__init__.py
+++ b/ppdet/modeling/architectures/__init__.py
@@ -14,6 +14,7 @@ from . import ssd
from . import fcos
from . import solov2
from . import ttfnet
+from . import s2anet
from .meta_arch import *
from .faster_rcnn import *
@@ -24,3 +25,4 @@ from .ssd import *
from .fcos import *
from .solov2 import *
from .ttfnet import *
+from .s2anet import *
diff --git a/ppdet/modeling/architectures/cascade_rcnn.py b/ppdet/modeling/architectures/cascade_rcnn.py
index 987d7a77a3a7ac571899a6c05516a9ee3723b193..ac29b775d5af90a1be32e5260a65d90b9972fd4a 100644
--- a/ppdet/modeling/architectures/cascade_rcnn.py
+++ b/ppdet/modeling/architectures/cascade_rcnn.py
@@ -25,6 +25,18 @@ __all__ = ['CascadeRCNN']
@register
class CascadeRCNN(BaseArch):
+ """
+ Cascade R-CNN network, see https://arxiv.org/abs/1712.00726
+
+ Args:
+ backbone (object): backbone instance
+ rpn_head (object): `RPNHead` instance
+ bbox_head (object): `BBoxHead` instance
+ bbox_post_process (object): `BBoxPostProcess` instance
+ neck (object): `FPN` instance
+ mask_head (object): `MaskHead` instance
+ mask_post_process (object): `MaskPostProcess` instance
+ """
__category__ = 'architecture'
__inject__ = [
'bbox_post_process',
diff --git a/ppdet/modeling/architectures/faster_rcnn.py b/ppdet/modeling/architectures/faster_rcnn.py
index b7cd9308fa7546bdb7904fed3da502d88d01fd54..26a2672d60f49aa989c7945b65ce3ecd9beec182 100644
--- a/ppdet/modeling/architectures/faster_rcnn.py
+++ b/ppdet/modeling/architectures/faster_rcnn.py
@@ -25,6 +25,16 @@ __all__ = ['FasterRCNN']
@register
class FasterRCNN(BaseArch):
+ """
+ Faster R-CNN network, see https://arxiv.org/abs/1506.01497
+
+ Args:
+ backbone (object): backbone instance
+ rpn_head (object): `RPNHead` instance
+ bbox_head (object): `BBoxHead` instance
+ bbox_post_process (object): `BBoxPostProcess` instance
+ neck (object): `FPN` instance
+ """
__category__ = 'architecture'
__inject__ = ['bbox_post_process']
@@ -34,13 +44,6 @@ class FasterRCNN(BaseArch):
bbox_head,
bbox_post_process,
neck=None):
- """
- backbone (nn.Layer): backbone instance.
- rpn_head (nn.Layer): generates proposals using backbone features.
- bbox_head (nn.Layer): a head that performs per-region computation.
- mask_head (nn.Layer): generates mask from bbox and backbone features.
- """
-
super(FasterRCNN, self).__init__()
self.backbone = backbone
self.neck = neck
diff --git a/ppdet/modeling/architectures/mask_rcnn.py b/ppdet/modeling/architectures/mask_rcnn.py
index 3b5618655e0624961a8f15898fc9f1337fcf4cc8..071a326f4f3655aaa79fbf6e35fb7b8945c2c9d9 100644
--- a/ppdet/modeling/architectures/mask_rcnn.py
+++ b/ppdet/modeling/architectures/mask_rcnn.py
@@ -25,6 +25,19 @@ __all__ = ['MaskRCNN']
@register
class MaskRCNN(BaseArch):
+ """
+ Mask R-CNN network, see https://arxiv.org/abs/1703.06870
+
+ Args:
+ backbone (object): backbone instance
+ rpn_head (object): `RPNHead` instance
+ bbox_head (object): `BBoxHead` instance
+ mask_head (object): `MaskHead` instance
+ bbox_post_process (object): `BBoxPostProcess` instance
+ mask_post_process (object): `MaskPostProcess` instance
+ neck (object): `FPN` instance
+ """
+
__category__ = 'architecture'
__inject__ = [
'bbox_post_process',
@@ -39,12 +52,6 @@ class MaskRCNN(BaseArch):
bbox_post_process,
mask_post_process,
neck=None):
- """
- backbone (nn.Layer): backbone instance.
- rpn_head (nn.Layer): generates proposals using backbone features.
- bbox_head (nn.Layer): a head that performs per-region computation.
- mask_head (nn.Layer): generates mask from bbox and backbone features.
- """
super(MaskRCNN, self).__init__()
self.backbone = backbone
self.neck = neck
diff --git a/ppdet/modeling/architectures/s2anet.py b/ppdet/modeling/architectures/s2anet.py
new file mode 100644
index 0000000000000000000000000000000000000000..72e9e820adcf230c5dd4a0d6c51c0496779e424a
--- /dev/null
+++ b/ppdet/modeling/architectures/s2anet.py
@@ -0,0 +1,100 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import paddle
+from ppdet.core.workspace import register, create
+from .meta_arch import BaseArch
+import numpy as np
+
+__all__ = ['S2ANet']
+
+
+@register
+class S2ANet(BaseArch):
+ __category__ = 'architecture'
+ __inject__ = [
+ 's2anet_head',
+ 's2anet_bbox_post_process',
+ ]
+
+ def __init__(self, backbone, neck, s2anet_head, s2anet_bbox_post_process):
+ """
+ S2ANet, see https://arxiv.org/pdf/2008.09397.pdf
+
+ Args:
+ backbone (object): backbone instance
+ neck (object): `FPN` instance
+ s2anet_head (object): `S2ANetHead` instance
+ s2anet_bbox_post_process (object): `S2ANetBBoxPostProcess` instance
+ """
+ super(S2ANet, self).__init__()
+ self.backbone = backbone
+ self.neck = neck
+ self.s2anet_head = s2anet_head
+ self.s2anet_bbox_post_process = s2anet_bbox_post_process
+
+ @classmethod
+ def from_config(cls, cfg, *args, **kwargs):
+ backbone = create(cfg['backbone'])
+ kwargs = {'input_shape': backbone.out_shape}
+ neck = cfg['neck'] and create(cfg['neck'], **kwargs)
+
+ out_shape = neck and neck.out_shape or backbone.out_shape
+ kwargs = {'input_shape': out_shape}
+ s2anet_head = create(cfg['s2anet_head'], **kwargs)
+ s2anet_bbox_post_process = create(cfg['s2anet_bbox_post_process'],
+ **kwargs)
+
+ return {
+ 'backbone': backbone,
+ 'neck': neck,
+ "s2anet_head": s2anet_head,
+ "s2anet_bbox_post_process": s2anet_bbox_post_process,
+ }
+
+ def _forward(self):
+ body_feats = self.backbone(self.inputs)
+ if self.neck is not None:
+ body_feats = self.neck(body_feats)
+ self.s2anet_head(body_feats)
+ if self.training:
+ loss = self.s2anet_head.get_loss(self.inputs)
+ total_loss = paddle.add_n(list(loss.values()))
+ loss.update({'loss': total_loss})
+ return loss
+ else:
+ im_shape = self.inputs['im_shape']
+ scale_factor = self.inputs['scale_factor']
+ nms_pre = self.s2anet_bbox_post_process.nms_pre
+ pred_scores, pred_bboxes = self.s2anet_head.get_prediction(nms_pre)
+
+ # post_process
+ pred_cls_score_bbox, bbox_num, index = self.s2anet_bbox_post_process.get_prediction(
+ pred_scores, pred_bboxes, im_shape, scale_factor)
+
+ # output
+ output = {'bbox': pred_cls_score_bbox, 'bbox_num': bbox_num}
+ return output
+
+ def get_loss(self):
+ loss = self._forward()
+ return loss
+
+ def get_pred(self):
+ output = self._forward()
+ return output
diff --git a/ppdet/modeling/architectures/ssd.py b/ppdet/modeling/architectures/ssd.py
index 49b01bde0d6e4d399f35b43660d608ea44b06644..136e34fd5259486f7e922d455707e9f4d5019a06 100644
--- a/ppdet/modeling/architectures/ssd.py
+++ b/ppdet/modeling/architectures/ssd.py
@@ -24,6 +24,15 @@ __all__ = ['SSD']
@register
class SSD(BaseArch):
+ """
+ Single Shot MultiBox Detector, see https://arxiv.org/abs/1512.02325
+
+ Args:
+ backbone (nn.Layer): backbone instance
+ ssd_head (nn.Layer): `SSDHead` instance
+ post_process (object): `BBoxPostProcess` instance
+ """
+
__category__ = 'architecture'
__inject__ = ['post_process']
diff --git a/ppdet/modeling/architectures/yolo.py b/ppdet/modeling/architectures/yolo.py
index bf6c19ecfe26a07c42e75d38e4acd9b1c443b499..6c0444480b1de27c96fba217e531d75005e92d70 100644
--- a/ppdet/modeling/architectures/yolo.py
+++ b/ppdet/modeling/architectures/yolo.py
@@ -20,6 +20,16 @@ class YOLOv3(BaseArch):
yolo_head='YOLOv3Head',
post_process='BBoxPostProcess',
data_format='NCHW'):
+ """
+ YOLOv3 network, see https://arxiv.org/abs/1804.02767
+
+ Args:
+ backbone (nn.Layer): backbone instance
+ neck (nn.Layer): neck instance
+ yolo_head (nn.Layer): `YOLOv3Head` instance
+ post_process (object): `BBoxPostProcess` instance
+ data_format (str): data format, NCHW or NHWC
+ """
super(YOLOv3, self).__init__(data_format=data_format)
self.backbone = backbone
self.neck = neck
diff --git a/ppdet/modeling/backbones/__init__.py b/ppdet/modeling/backbones/__init__.py
index d027c323cf565b90b84b06088e109f16e9a697a7..4937c9b8ddbba112da6cf315ada86a977ddfa04f 100644
--- a/ppdet/modeling/backbones/__init__.py
+++ b/ppdet/modeling/backbones/__init__.py
@@ -20,6 +20,7 @@ from . import mobilenet_v3
from . import hrnet
from . import blazenet
from . import ghostnet
+from . import senet
from .vgg import *
from .resnet import *
@@ -29,3 +30,4 @@ from .mobilenet_v3 import *
from .hrnet import *
from .blazenet import *
from .ghostnet import *
+from .senet import *
diff --git a/ppdet/modeling/backbones/darknet.py b/ppdet/modeling/backbones/darknet.py
index 7981306a912b7d893e9c63b76b0aee9247019ba0..8d3d07a25fc07f86ad5e32ea201f2a14b5e32476 100755
--- a/ppdet/modeling/backbones/darknet.py
+++ b/ppdet/modeling/backbones/darknet.py
@@ -18,7 +18,7 @@ import paddle.nn.functional as F
from paddle import ParamAttr
from paddle.regularizer import L2Decay
from ppdet.core.workspace import register, serializable
-from ppdet.modeling.ops import batch_norm
+from ppdet.modeling.ops import batch_norm, mish
from ..shape_spec import ShapeSpec
__all__ = ['DarkNet', 'ConvBNLayer']
@@ -35,8 +35,23 @@ class ConvBNLayer(nn.Layer):
norm_type='bn',
norm_decay=0.,
act="leaky",
- name=None,
- data_format='NCHW'):
+ data_format='NCHW',
+ name=''):
+ """
+ conv + bn + activation layer
+
+ Args:
+ ch_in (int): input channel
+ ch_out (int): output channel
+ filter_size (int): filter size, default 3
+ stride (int): stride, default 1
+ groups (int): number of groups of conv layer, default 1
+ padding (int): padding size, default 0
+ norm_type (str): batch norm type, default bn
+ norm_decay (float): decay for weight and bias of batch norm layer, default 0.
+ act (str): activation function type, default 'leaky', which means leaky_relu
+ data_format (str): data format, NCHW or NHWC
+ """
super(ConvBNLayer, self).__init__()
self.conv = nn.Conv2D(
@@ -46,14 +61,12 @@ class ConvBNLayer(nn.Layer):
stride=stride,
padding=padding,
groups=groups,
- weight_attr=ParamAttr(name=name + '.conv.weights'),
data_format=data_format,
bias_attr=False)
self.batch_norm = batch_norm(
ch_out,
norm_type=norm_type,
norm_decay=norm_decay,
- name=name,
data_format=data_format)
self.act = act
@@ -62,6 +75,8 @@ class ConvBNLayer(nn.Layer):
out = self.batch_norm(out)
if self.act == 'leaky':
out = F.leaky_relu(out, 0.1)
+ elif self.act == 'mish':
+ out = mish(out)
return out
@@ -74,8 +89,20 @@ class DownSample(nn.Layer):
padding=1,
norm_type='bn',
norm_decay=0.,
- name=None,
data_format='NCHW'):
+ """
+ downsample layer
+
+ Args:
+ ch_in (int): input channel
+ ch_out (int): output channel
+ filter_size (int): filter size, default 3
+ stride (int): stride, default 2
+ padding (int): padding size, default 1
+ norm_type (str): batch norm type, default bn
+ norm_decay (float): decay for weight and bias of batch norm layer, default 0.
+ data_format (str): data format, NCHW or NHWC
+ """
super(DownSample, self).__init__()
@@ -87,8 +114,7 @@ class DownSample(nn.Layer):
padding=padding,
norm_type=norm_type,
norm_decay=norm_decay,
- data_format=data_format,
- name=name)
+ data_format=data_format)
self.ch_out = ch_out
def forward(self, inputs):
@@ -102,8 +128,18 @@ class BasicBlock(nn.Layer):
ch_out,
norm_type='bn',
norm_decay=0.,
- name=None,
data_format='NCHW'):
+ """
+ BasicBlock layer of DarkNet
+
+ Args:
+ ch_in (int): input channel
+ ch_out (int): output channel
+ norm_type (str): batch norm type, default bn
+ norm_decay (float): decay for weight and bias of batch norm layer, default 0.
+ data_format (str): data format, NCHW or NHWC
+ """
+
super(BasicBlock, self).__init__()
self.conv1 = ConvBNLayer(
@@ -114,8 +150,7 @@ class BasicBlock(nn.Layer):
padding=0,
norm_type=norm_type,
norm_decay=norm_decay,
- data_format=data_format,
- name=name + '.0')
+ data_format=data_format)
self.conv2 = ConvBNLayer(
ch_in=ch_out,
ch_out=ch_out * 2,
@@ -124,8 +159,7 @@ class BasicBlock(nn.Layer):
padding=1,
norm_type=norm_type,
norm_decay=norm_decay,
- data_format=data_format,
- name=name + '.1')
+ data_format=data_format)
def forward(self, inputs):
conv1 = self.conv1(inputs)
@@ -143,6 +177,18 @@ class Blocks(nn.Layer):
norm_decay=0.,
name=None,
data_format='NCHW'):
+ """
+ Blocks layer, which consists of several BasicBlock layers
+
+ Args:
+ ch_in (int): input channel
+ ch_out (int): output channel
+ count (int): number of BasicBlock layer
+ norm_type (str): batch norm type, default bn
+ norm_decay (float): decay for weight and bias of batch norm layer, default 0.
+ name (str): layer name
+ data_format (str): data format, NCHW or NHWC
+ """
super(Blocks, self).__init__()
self.basicblock0 = BasicBlock(
@@ -150,8 +196,7 @@ class Blocks(nn.Layer):
ch_out,
norm_type=norm_type,
norm_decay=norm_decay,
- data_format=data_format,
- name=name + '.0')
+ data_format=data_format)
self.res_out_list = []
for i in range(1, count):
block_name = '{}.{}'.format(name, i)
@@ -162,8 +207,7 @@ class Blocks(nn.Layer):
ch_out,
norm_type=norm_type,
norm_decay=norm_decay,
- data_format=data_format,
- name=block_name))
+ data_format=data_format))
self.res_out_list.append(res_out)
self.ch_out = ch_out
@@ -190,6 +234,18 @@ class DarkNet(nn.Layer):
norm_type='bn',
norm_decay=0.,
data_format='NCHW'):
+ """
+ Darknet, see https://pjreddie.com/darknet/yolo/
+
+ Args:
+ depth (int): depth of network
+ freeze_at (int): freeze the backbone at which stage
+ return_idx (list): index of stages whose feature maps are returned
+ norm_type (str): batch norm type, default bn
+ norm_decay (float): decay for weight and bias of batch norm layer, default 0.
+ data_format (str): data format, NCHW or NHWC
+ """
super(DarkNet, self).__init__()
self.depth = depth
self.freeze_at = freeze_at
@@ -205,16 +261,14 @@ class DarkNet(nn.Layer):
padding=1,
norm_type=norm_type,
norm_decay=norm_decay,
- data_format=data_format,
- name='yolo_input')
+ data_format=data_format)
self.downsample0 = DownSample(
ch_in=32,
ch_out=32 * 2,
norm_type=norm_type,
norm_decay=norm_decay,
- data_format=data_format,
- name='yolo_input.downsample')
+ data_format=data_format)
self._out_channels = []
self.darknet_conv_block_list = []
@@ -244,8 +298,7 @@ class DarkNet(nn.Layer):
ch_out=32 * (2**(i + 2)),
norm_type=norm_type,
norm_decay=norm_decay,
- data_format=data_format,
- name=down_name))
+ data_format=data_format))
self.downsample_list.append(downsample)
def forward(self, inputs):
diff --git a/ppdet/modeling/backbones/hrnet.py b/ppdet/modeling/backbones/hrnet.py
index 4450bd9a597cf8f0edda241b2ac1201bd4906684..f93f5fd9c0403724f2d0b0495cdf5b5bbb82136a 100644
--- a/ppdet/modeling/backbones/hrnet.py
+++ b/ppdet/modeling/backbones/hrnet.py
@@ -97,7 +97,12 @@ class ConvNormLayer(nn.Layer):
class Layer1(nn.Layer):
- def __init__(self, num_channels, has_se=False, freeze_norm=True, name=None):
+ def __init__(self,
+ num_channels,
+ has_se=False,
+ norm_decay=0.,
+ freeze_norm=True,
+ name=None):
super(Layer1, self).__init__()
self.bottleneck_block_list = []
@@ -111,6 +116,7 @@ class Layer1(nn.Layer):
has_se=has_se,
stride=1,
downsample=True if i == 0 else False,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
name=name + '_' + str(i + 1)))
self.bottleneck_block_list.append(bottleneck_block)
@@ -123,7 +129,12 @@ class Layer1(nn.Layer):
class TransitionLayer(nn.Layer):
- def __init__(self, in_channels, out_channels, freeze_norm=True, name=None):
+ def __init__(self,
+ in_channels,
+ out_channels,
+ norm_decay=0.,
+ freeze_norm=True,
+ name=None):
super(TransitionLayer, self).__init__()
num_in = len(in_channels)
@@ -140,6 +151,7 @@ class TransitionLayer(nn.Layer):
ch_in=in_channels[i],
ch_out=out_channels[i],
filter_size=3,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
act='relu',
name=name + '_layer_' + str(i + 1)))
@@ -151,6 +163,7 @@ class TransitionLayer(nn.Layer):
ch_out=out_channels[i],
filter_size=3,
stride=2,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
act='relu',
name=name + '_layer_' + str(i + 1)))
@@ -175,6 +188,7 @@ class Branches(nn.Layer):
in_channels,
out_channels,
has_se=False,
+ norm_decay=0.,
freeze_norm=True,
name=None):
super(Branches, self).__init__()
@@ -190,6 +204,7 @@ class Branches(nn.Layer):
num_channels=in_ch,
num_filters=out_channels[i],
has_se=has_se,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
name=name + '_branch_layer_' + str(i + 1) + '_' +
str(j + 1)))
@@ -213,6 +228,7 @@ class BottleneckBlock(nn.Layer):
has_se,
stride=1,
downsample=False,
+ norm_decay=0.,
freeze_norm=True,
name=None):
super(BottleneckBlock, self).__init__()
@@ -224,6 +240,7 @@ class BottleneckBlock(nn.Layer):
ch_in=num_channels,
ch_out=num_filters,
filter_size=1,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
act="relu",
name=name + "_conv1")
@@ -232,6 +249,7 @@ class BottleneckBlock(nn.Layer):
ch_out=num_filters,
filter_size=3,
stride=stride,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
act="relu",
name=name + "_conv2")
@@ -239,6 +257,7 @@ class BottleneckBlock(nn.Layer):
ch_in=num_filters,
ch_out=num_filters * 4,
filter_size=1,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
act=None,
name=name + "_conv3")
@@ -248,6 +267,7 @@ class BottleneckBlock(nn.Layer):
ch_in=num_channels,
ch_out=num_filters * 4,
filter_size=1,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
act=None,
name=name + "_downsample")
@@ -283,6 +303,7 @@ class BasicBlock(nn.Layer):
stride=1,
has_se=False,
downsample=False,
+ norm_decay=0.,
freeze_norm=True,
name=None):
super(BasicBlock, self).__init__()
@@ -293,6 +314,7 @@ class BasicBlock(nn.Layer):
ch_in=num_channels,
ch_out=num_filters,
filter_size=3,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
stride=stride,
act="relu",
@@ -301,6 +323,7 @@ class BasicBlock(nn.Layer):
ch_in=num_filters,
ch_out=num_filters,
filter_size=3,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
stride=1,
act=None,
@@ -311,6 +334,7 @@ class BasicBlock(nn.Layer):
ch_in=num_channels,
ch_out=num_filters * 4,
filter_size=1,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
act=None,
name=name + "_downsample")
@@ -381,6 +405,7 @@ class Stage(nn.Layer):
num_modules,
num_filters,
has_se=False,
+ norm_decay=0.,
freeze_norm=True,
multi_scale_output=True,
name=None):
@@ -396,6 +421,7 @@ class Stage(nn.Layer):
num_channels=num_channels,
num_filters=num_filters,
has_se=has_se,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
multi_scale_output=False,
name=name + '_' + str(i + 1)))
@@ -406,6 +432,7 @@ class Stage(nn.Layer):
num_channels=num_channels,
num_filters=num_filters,
has_se=has_se,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
name=name + '_' + str(i + 1)))
@@ -424,6 +451,7 @@ class HighResolutionModule(nn.Layer):
num_filters,
has_se=False,
multi_scale_output=True,
+ norm_decay=0.,
freeze_norm=True,
name=None):
super(HighResolutionModule, self).__init__()
@@ -432,6 +460,7 @@ class HighResolutionModule(nn.Layer):
in_channels=num_channels,
out_channels=num_filters,
has_se=has_se,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
name=name)
@@ -439,6 +468,7 @@ class HighResolutionModule(nn.Layer):
in_channels=num_filters,
out_channels=num_filters,
multi_scale_output=multi_scale_output,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
name=name)
@@ -453,6 +483,7 @@ class FuseLayers(nn.Layer):
in_channels,
out_channels,
multi_scale_output=True,
+ norm_decay=0.,
freeze_norm=True,
name=None):
super(FuseLayers, self).__init__()
@@ -473,6 +504,7 @@ class FuseLayers(nn.Layer):
filter_size=1,
stride=1,
act=None,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
name=name + '_layer_' + str(i + 1) + '_' +
str(j + 1)))
@@ -489,6 +521,7 @@ class FuseLayers(nn.Layer):
ch_out=out_channels[i],
filter_size=3,
stride=2,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
act=None,
name=name + '_layer_' + str(i + 1) + '_' +
@@ -503,6 +536,7 @@ class FuseLayers(nn.Layer):
ch_out=out_channels[j],
filter_size=3,
stride=2,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
act="relu",
name=name + '_layer_' + str(i + 1) + '_' +
@@ -544,6 +578,7 @@ class HRNet(nn.Layer):
has_se (bool): whether to add SE block for each stage
freeze_at (int): the stage to freeze
freeze_norm (bool): whether to freeze norm in HRNet
+ norm_decay (float): weight decay for normalization layer weights
return_idx (List): the stage to return
"""
@@ -586,6 +621,7 @@ class HRNet(nn.Layer):
ch_out=64,
filter_size=3,
stride=2,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
act='relu',
name="layer1_1")
@@ -595,6 +631,7 @@ class HRNet(nn.Layer):
ch_out=64,
filter_size=3,
stride=2,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
act='relu',
name="layer1_2")
@@ -602,12 +639,14 @@ class HRNet(nn.Layer):
self.la1 = Layer1(
num_channels=64,
has_se=has_se,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
name="layer2")
self.tr1 = TransitionLayer(
in_channels=[256],
out_channels=channels_2,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
name="tr1")
@@ -616,12 +655,14 @@ class HRNet(nn.Layer):
num_modules=num_modules_2,
num_filters=channels_2,
has_se=self.has_se,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
name="st2")
self.tr2 = TransitionLayer(
in_channels=channels_2,
out_channels=channels_3,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
name="tr2")
@@ -630,12 +671,14 @@ class HRNet(nn.Layer):
num_modules=num_modules_3,
num_filters=channels_3,
has_se=self.has_se,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
name="st3")
self.tr3 = TransitionLayer(
in_channels=channels_3,
out_channels=channels_4,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
name="tr3")
self.st4 = Stage(
@@ -643,6 +686,7 @@ class HRNet(nn.Layer):
num_modules=num_modules_4,
num_filters=channels_4,
has_se=self.has_se,
+ norm_decay=norm_decay,
freeze_norm=freeze_norm,
name="st4")
diff --git a/ppdet/modeling/backbones/mobilenet_v1.py b/ppdet/modeling/backbones/mobilenet_v1.py
index 5b4d1287847808f0e4c0d20b7809a1efee3cd82b..cecc6a5b5e79db265936fa149019ac9323811eaf 100644
--- a/ppdet/modeling/backbones/mobilenet_v1.py
+++ b/ppdet/modeling/backbones/mobilenet_v1.py
@@ -55,14 +55,11 @@ class ConvBNLayer(nn.Layer):
weight_attr=ParamAttr(
learning_rate=conv_lr,
initializer=KaimingNormal(),
- regularizer=L2Decay(conv_decay),
- name=name + "_weights"),
+ regularizer=L2Decay(conv_decay)),
bias_attr=False)
- param_attr = ParamAttr(
- name=name + "_bn_scale", regularizer=L2Decay(norm_decay))
- bias_attr = ParamAttr(
- name=name + "_bn_offset", regularizer=L2Decay(norm_decay))
+ param_attr = ParamAttr(regularizer=L2Decay(norm_decay))
+ bias_attr = ParamAttr(regularizer=L2Decay(norm_decay))
if norm_type == 'sync_bn':
self._batch_norm = nn.SyncBatchNorm(
out_channels, weight_attr=param_attr, bias_attr=bias_attr)
@@ -72,9 +69,7 @@ class ConvBNLayer(nn.Layer):
act=None,
param_attr=param_attr,
bias_attr=bias_attr,
- use_global_stats=False,
- moving_mean_name=name + '_bn_mean',
- moving_variance_name=name + '_bn_variance')
+ use_global_stats=False)
def forward(self, x):
x = self._conv(x)
diff --git a/ppdet/modeling/backbones/mobilenet_v3.py b/ppdet/modeling/backbones/mobilenet_v3.py
index 1cebf5ef1e08c341f4ffb2670f1737974750ba6b..d7178c9132e214392d3ad8d4cd70b7a4854c8c4e 100644
--- a/ppdet/modeling/backbones/mobilenet_v3.py
+++ b/ppdet/modeling/backbones/mobilenet_v3.py
@@ -330,16 +330,16 @@ class MobileNetV3(nn.Layer):
[3, 16, 16, False, "relu", 1],
[3, 64, 24, False, "relu", 2],
[3, 72, 24, False, "relu", 1],
- [5, 72, 40, True, "relu", 2],
+ [5, 72, 40, True, "relu", 2], # RCNN output
[5, 120, 40, True, "relu", 1],
[5, 120, 40, True, "relu", 1], # YOLOv3 output
- [3, 240, 80, False, "hard_swish", 2],
+ [3, 240, 80, False, "hard_swish", 2], # RCNN output
[3, 200, 80, False, "hard_swish", 1],
[3, 184, 80, False, "hard_swish", 1],
[3, 184, 80, False, "hard_swish", 1],
[3, 480, 112, True, "hard_swish", 1],
[3, 672, 112, True, "hard_swish", 1], # YOLOv3 output
- [5, 672, 160, True, "hard_swish", 2], # SSD/SSDLite output
+ [5, 672, 160, True, "hard_swish", 2], # SSD/SSDLite/RCNN output
[5, 960, 160, True, "hard_swish", 1],
[5, 960, 160, True, "hard_swish", 1], # YOLOv3 output
]
@@ -347,14 +347,14 @@ class MobileNetV3(nn.Layer):
self.cfg = [
# k, exp, c, se, nl, s,
[3, 16, 16, True, "relu", 2],
- [3, 72, 24, False, "relu", 2],
+ [3, 72, 24, False, "relu", 2], # RCNN output
[3, 88, 24, False, "relu", 1], # YOLOv3 output
- [5, 96, 40, True, "hard_swish", 2],
+ [5, 96, 40, True, "hard_swish", 2], # RCNN output
[5, 240, 40, True, "hard_swish", 1],
[5, 240, 40, True, "hard_swish", 1],
[5, 120, 48, True, "hard_swish", 1],
[5, 144, 48, True, "hard_swish", 1], # YOLOv3 output
- [5, 288, 96, True, "hard_swish", 2], # SSD/SSDLite output
+ [5, 288, 96, True, "hard_swish", 2], # SSD/SSDLite/RCNN output
[5, 576, 96, True, "hard_swish", 1],
[5, 576, 96, True, "hard_swish", 1], # YOLOv3 output
]
diff --git a/ppdet/modeling/backbones/resnet.py b/ppdet/modeling/backbones/resnet.py
index e59f1761464a59538b692e5413292577175b2be9..6be2fc6e16cfa94695aad596ffa2ecfd0705f7b8 100755
--- a/ppdet/modeling/backbones/resnet.py
+++ b/ppdet/modeling/backbones/resnet.py
@@ -20,11 +20,14 @@ import paddle.nn as nn
import paddle.nn.functional as F
from ppdet.core.workspace import register, serializable
from paddle.regularizer import L2Decay
-from ppdet.modeling.layers import DeformableConvV2
+from paddle import ParamAttr
+from paddle.nn.initializer import Uniform, Constant
+from paddle.vision.ops import DeformConv2D
from .name_adapter import NameAdapter
from ..shape_spec import ShapeSpec
-__all__ = ['ResNet', 'Res5Head']
+__all__ = ['ResNet', 'Res5Head', 'Blocks', 'BasicBlock', 'BottleNeck']
ResNet_cfg = {
18: [2, 2, 2, 2],
@@ -41,21 +44,20 @@ class ConvNormLayer(nn.Layer):
ch_out,
filter_size,
stride,
- name_adapter,
groups=1,
act=None,
norm_type='bn',
norm_decay=0.,
freeze_norm=True,
lr=1.0,
- dcn_v2=False,
- name=None):
+ dcn_v2=False):
super(ConvNormLayer, self).__init__()
assert norm_type in ['bn', 'sync_bn']
self.norm_type = norm_type
self.act = act
+ self.dcn_v2 = dcn_v2
- if not dcn_v2:
+ if not self.dcn_v2:
self.conv = nn.Conv2D(
in_channels=ch_in,
out_channels=ch_out,
@@ -63,33 +65,39 @@ class ConvNormLayer(nn.Layer):
stride=stride,
padding=(filter_size - 1) // 2,
groups=groups,
- weight_attr=paddle.ParamAttr(
- learning_rate=lr, name=name + "_weights"),
+ weight_attr=ParamAttr(learning_rate=lr),
bias_attr=False)
else:
- self.conv = DeformableConvV2(
+ self.offset_channel = 2 * filter_size**2
+ self.mask_channel = filter_size**2
+
+ self.conv_offset = nn.Conv2D(
+ in_channels=ch_in,
+ out_channels=3 * filter_size**2,
+ kernel_size=filter_size,
+ stride=stride,
+ padding=(filter_size - 1) // 2,
+ weight_attr=ParamAttr(initializer=Constant(0.)),
+ bias_attr=ParamAttr(initializer=Constant(0.)))
+ self.conv = DeformConv2D(
in_channels=ch_in,
out_channels=ch_out,
kernel_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
+ dilation=1,
groups=groups,
- weight_attr=paddle.ParamAttr(
- learning_rate=lr, name=name + '_weights'),
- bias_attr=False,
- name=name)
+ weight_attr=ParamAttr(learning_rate=lr),
+ bias_attr=False)
- bn_name = name_adapter.fix_conv_norm_name(name)
norm_lr = 0. if freeze_norm else lr
- param_attr = paddle.ParamAttr(
+ param_attr = ParamAttr(
learning_rate=norm_lr,
regularizer=L2Decay(norm_decay),
- name=bn_name + "_scale",
trainable=False if freeze_norm else True)
- bias_attr = paddle.ParamAttr(
+ bias_attr = ParamAttr(
learning_rate=norm_lr,
regularizer=L2Decay(norm_decay),
- name=bn_name + "_offset",
trainable=False if freeze_norm else True)
global_stats = True if freeze_norm else False
@@ -102,9 +110,7 @@ class ConvNormLayer(nn.Layer):
act=None,
param_attr=param_attr,
bias_attr=bias_attr,
- use_global_stats=global_stats,
- moving_mean_name=bn_name + '_mean',
- moving_variance_name=bn_name + '_variance')
+ use_global_stats=global_stats)
norm_params = self.norm.parameters()
if freeze_norm:
@@ -112,7 +118,17 @@ class ConvNormLayer(nn.Layer):
param.stop_gradient = True
def forward(self, inputs):
- out = self.conv(inputs)
+ if not self.dcn_v2:
+ out = self.conv(inputs)
+ else:
+ offset_mask = self.conv_offset(inputs)
+ offset, mask = paddle.split(
+ offset_mask,
+ num_or_sections=[self.offset_channel, self.mask_channel],
+ axis=1)
+ mask = F.sigmoid(mask)
+ out = self.conv(inputs, offset, mask=mask)
+
if self.norm_type in ['bn', 'sync_bn']:
out = self.norm(out)
if self.act:
@@ -120,24 +136,58 @@ class ConvNormLayer(nn.Layer):
return out
+class SELayer(nn.Layer):
+ def __init__(self, ch, reduction_ratio=16):
+ super(SELayer, self).__init__()
+ self.pool = nn.AdaptiveAvgPool2D(1)
+ stdv = 1.0 / math.sqrt(ch)
+ c_ = ch // reduction_ratio
+ self.squeeze = nn.Linear(
+ ch,
+ c_,
+ weight_attr=paddle.ParamAttr(initializer=Uniform(-stdv, stdv)),
+ bias_attr=True)
+
+ stdv = 1.0 / math.sqrt(c_)
+ self.extract = nn.Linear(
+ c_,
+ ch,
+ weight_attr=paddle.ParamAttr(initializer=Uniform(-stdv, stdv)),
+ bias_attr=True)
+
+ def forward(self, inputs):
+ out = self.pool(inputs)
+ out = paddle.squeeze(out, axis=[2, 3])
+ out = self.squeeze(out)
+ out = F.relu(out)
+ out = self.extract(out)
+ out = F.sigmoid(out)
+ out = paddle.unsqueeze(out, axis=[2, 3])
+ scale = out * inputs
+ return scale
+
+
class BasicBlock(nn.Layer):
+
+ expansion = 1
+
def __init__(self,
ch_in,
ch_out,
stride,
shortcut,
- name_adapter,
- name,
variant='b',
+ groups=1,
+ base_width=64,
lr=1.0,
norm_type='bn',
norm_decay=0.,
freeze_norm=True,
- dcn_v2=False):
+ dcn_v2=False,
+ std_senet=False):
super(BasicBlock, self).__init__()
assert dcn_v2 is False, "Not implemented yet."
- conv_name1, conv_name2, shortcut_name = name_adapter.fix_basicblock_name(
- name)
+ assert groups == 1 and base_width == 64, 'BasicBlock only supports groups=1 and base_width=64'
self.shortcut = shortcut
if not shortcut:
@@ -154,54 +204,52 @@ class BasicBlock(nn.Layer):
ch_out=ch_out,
filter_size=1,
stride=1,
- name_adapter=name_adapter,
norm_type=norm_type,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
- lr=lr,
- name=shortcut_name))
+ lr=lr))
else:
self.short = ConvNormLayer(
ch_in=ch_in,
ch_out=ch_out,
filter_size=1,
stride=stride,
- name_adapter=name_adapter,
norm_type=norm_type,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
- lr=lr,
- name=shortcut_name)
+ lr=lr)
self.branch2a = ConvNormLayer(
ch_in=ch_in,
ch_out=ch_out,
filter_size=3,
stride=stride,
- name_adapter=name_adapter,
act='relu',
norm_type=norm_type,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
- lr=lr,
- name=conv_name1)
+ lr=lr)
self.branch2b = ConvNormLayer(
ch_in=ch_out,
ch_out=ch_out,
filter_size=3,
stride=1,
- name_adapter=name_adapter,
act=None,
norm_type=norm_type,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
- lr=lr,
- name=conv_name2)
+ lr=lr)
+
+ self.std_senet = std_senet
+ if self.std_senet:
+ self.se = SELayer(ch_out)
def forward(self, inputs):
out = self.branch2a(inputs)
out = self.branch2b(out)
+ if self.std_senet:
+ out = self.se(out)
if self.shortcut:
short = inputs
@@ -215,22 +263,23 @@ class BasicBlock(nn.Layer):
class BottleNeck(nn.Layer):
+
+ expansion = 4
+
def __init__(self,
ch_in,
ch_out,
stride,
shortcut,
- name_adapter,
- name,
variant='b',
groups=1,
base_width=4,
- base_channels=64,
lr=1.0,
norm_type='bn',
norm_decay=0.,
freeze_norm=True,
- dcn_v2=False):
+ dcn_v2=False,
+ std_senet=False):
super(BottleNeck, self).__init__()
if variant == 'a':
stride1, stride2 = stride, 1
@@ -238,15 +287,7 @@ class BottleNeck(nn.Layer):
stride1, stride2 = 1, stride
# ResNeXt
- if groups == 1:
- width = ch_out
- else:
- width = int(
- math.floor(ch_out * (base_width * 1.0 / base_channels)) *
- groups)
-
- conv_name1, conv_name2, conv_name3, \
- shortcut_name = name_adapter.fix_bottleneck_name(name)
+ width = int(ch_out * (base_width / 64.)) * groups
self.shortcut = shortcut
if not shortcut:
@@ -260,75 +301,73 @@ class BottleNeck(nn.Layer):
'conv',
ConvNormLayer(
ch_in=ch_in,
- ch_out=ch_out * 4,
+ ch_out=ch_out * self.expansion,
filter_size=1,
stride=1,
- name_adapter=name_adapter,
norm_type=norm_type,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
- lr=lr,
- name=shortcut_name))
+ lr=lr))
else:
self.short = ConvNormLayer(
ch_in=ch_in,
- ch_out=ch_out * 4,
+ ch_out=ch_out * self.expansion,
filter_size=1,
stride=stride,
- name_adapter=name_adapter,
norm_type=norm_type,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
- lr=lr,
- name=shortcut_name)
+ lr=lr)
self.branch2a = ConvNormLayer(
ch_in=ch_in,
ch_out=width,
filter_size=1,
stride=stride1,
- name_adapter=name_adapter,
groups=1,
act='relu',
norm_type=norm_type,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
- lr=lr,
- name=conv_name1)
+ lr=lr)
self.branch2b = ConvNormLayer(
ch_in=width,
ch_out=width,
filter_size=3,
stride=stride2,
- name_adapter=name_adapter,
groups=groups,
act='relu',
norm_type=norm_type,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
lr=lr,
- dcn_v2=dcn_v2,
- name=conv_name2)
+ dcn_v2=dcn_v2)
self.branch2c = ConvNormLayer(
ch_in=width,
- ch_out=ch_out * 4,
+ ch_out=ch_out * self.expansion,
filter_size=1,
stride=1,
- name_adapter=name_adapter,
groups=1,
norm_type=norm_type,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
- lr=lr,
- name=conv_name3)
+ lr=lr)
+
+ self.std_senet = std_senet
+ if self.std_senet:
+ self.se = SELayer(ch_out * self.expansion)
def forward(self, inputs):
out = self.branch2a(inputs)
out = self.branch2b(out)
out = self.branch2c(out)
+
+ if self.std_senet:
+ out = self.se(out)
+
if self.shortcut:
short = inputs
else:
@@ -342,7 +381,7 @@ class BottleNeck(nn.Layer):
class Blocks(nn.Layer):
def __init__(self,
- depth,
+ block,
ch_in,
ch_out,
count,
@@ -350,55 +389,37 @@ class Blocks(nn.Layer):
stage_num,
variant='b',
groups=1,
- base_width=-1,
- base_channels=-1,
+ base_width=64,
lr=1.0,
norm_type='bn',
norm_decay=0.,
freeze_norm=True,
- dcn_v2=False):
+ dcn_v2=False,
+ std_senet=False):
super(Blocks, self).__init__()
self.blocks = []
for i in range(count):
conv_name = name_adapter.fix_layer_warp_name(stage_num, count, i)
- if depth >= 50:
- block = self.add_sublayer(
- conv_name,
- BottleNeck(
- ch_in=ch_in if i == 0 else ch_out * 4,
- ch_out=ch_out,
- stride=2 if i == 0 and stage_num != 2 else 1,
- shortcut=False if i == 0 else True,
- name_adapter=name_adapter,
- name=conv_name,
- variant=variant,
- groups=groups,
- base_width=base_width,
- base_channels=base_channels,
- lr=lr,
- norm_type=norm_type,
- norm_decay=norm_decay,
- freeze_norm=freeze_norm,
- dcn_v2=dcn_v2))
- else:
- ch_in = ch_in // 4 if i > 0 else ch_in
- block = self.add_sublayer(
- conv_name,
- BasicBlock(
- ch_in=ch_in if i == 0 else ch_out,
- ch_out=ch_out,
- stride=2 if i == 0 and stage_num != 2 else 1,
- shortcut=False if i == 0 else True,
- name_adapter=name_adapter,
- name=conv_name,
- variant=variant,
- lr=lr,
- norm_type=norm_type,
- norm_decay=norm_decay,
- freeze_norm=freeze_norm,
- dcn_v2=dcn_v2))
- self.blocks.append(block)
+ layer = self.add_sublayer(
+ conv_name,
+ block(
+ ch_in=ch_in,
+ ch_out=ch_out,
+ stride=2 if i == 0 and stage_num != 2 else 1,
+ shortcut=False if i == 0 else True,
+ variant=variant,
+ groups=groups,
+ base_width=base_width,
+ lr=lr,
+ norm_type=norm_type,
+ norm_decay=norm_decay,
+ freeze_norm=freeze_norm,
+ dcn_v2=dcn_v2,
+ std_senet=std_senet))
+ self.blocks.append(layer)
+ if i == 0:
+ ch_in = ch_out * block.expansion
def forward(self, inputs):
block_out = inputs
@@ -414,23 +435,47 @@ class ResNet(nn.Layer):
def __init__(self,
depth=50,
+ ch_in=64,
variant='b',
lr_mult_list=[1.0, 1.0, 1.0, 1.0],
groups=1,
- base_width=-1,
- base_channels=-1,
+ base_width=64,
norm_type='bn',
norm_decay=0,
freeze_norm=True,
freeze_at=0,
return_idx=[0, 1, 2, 3],
dcn_v2_stages=[-1],
- num_stages=4):
+ num_stages=4,
+ std_senet=False):
+ """
+ Residual Network, see https://arxiv.org/abs/1512.03385
+
+ Args:
+ depth (int): ResNet depth, should be 18, 34, 50, 101, 152.
+ ch_in (int): output channel of first stage, default 64
+ variant (str): ResNet variant, supports 'a', 'b', 'c', 'd' currently
+            lr_mult_list (list): learning rate ratio of different resnet stages (2, 3, 4, 5);
+                             a lower learning rate ratio is needed for pretrained models
+                             obtained via distillation (default: [1.0, 1.0, 1.0, 1.0]).
+ groups (int): group convolution cardinality
+ base_width (int): base width of each group convolution
+ norm_type (str): normalization type, 'bn', 'sync_bn' or 'affine_channel'
+ norm_decay (float): weight decay for normalization layer weights
+ freeze_norm (bool): freeze normalization layers
+ freeze_at (int): freeze the backbone at which stage
+ return_idx (list): index of the stages whose feature maps are returned
+            dcn_v2_stages (list): index of stages that use deformable conv v2
+            num_stages (int): total number of stages
+            std_senet (bool): whether to use the SE block, default False
+ """
super(ResNet, self).__init__()
self._model_type = 'ResNet' if groups == 1 else 'ResNeXt'
assert num_stages >= 1 and num_stages <= 4
self.depth = depth
self.variant = variant
+ self.groups = groups
+ self.base_width = base_width
self.norm_type = norm_type
self.norm_decay = norm_decay
self.freeze_norm = freeze_norm
@@ -460,12 +505,12 @@ class ResNet(nn.Layer):
conv1_name = na.fix_c1_stage_name()
if variant in ['c', 'd']:
conv_def = [
- [3, 32, 3, 2, "conv1_1"],
- [32, 32, 3, 1, "conv1_2"],
- [32, 64, 3, 1, "conv1_3"],
+ [3, ch_in // 2, 3, 2, "conv1_1"],
+ [ch_in // 2, ch_in // 2, 3, 1, "conv1_2"],
+ [ch_in // 2, ch_in, 3, 1, "conv1_3"],
]
else:
- conv_def = [[3, 64, 7, 2, conv1_name]]
+ conv_def = [[3, ch_in, 7, 2, conv1_name]]
self.conv1 = nn.Sequential()
for (c_in, c_out, k, s, _name) in conv_def:
self.conv1.add_sublayer(
@@ -475,20 +520,18 @@ class ResNet(nn.Layer):
ch_out=c_out,
filter_size=k,
stride=s,
- name_adapter=na,
groups=1,
act='relu',
norm_type=norm_type,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
- lr=1.0,
- name=_name))
+ lr=1.0))
- ch_in_list = [64, 256, 512, 1024]
+ self.ch_in = ch_in
ch_out_list = [64, 128, 256, 512]
- self.expansion = 4 if depth >= 50 else 1
+ block = BottleNeck if depth >= 50 else BasicBlock
- self._out_channels = [self.expansion * v for v in ch_out_list]
+ self._out_channels = [block.expansion * v for v in ch_out_list]
self._out_strides = [4, 8, 16, 32]
self.res_layers = []
@@ -499,9 +542,8 @@ class ResNet(nn.Layer):
res_layer = self.add_sublayer(
res_name,
Blocks(
- depth,
- ch_in_list[i] // 4
- if i > 0 and depth < 50 else ch_in_list[i],
+ block,
+ self.ch_in,
ch_out_list[i],
count=block_nums[i],
name_adapter=na,
@@ -509,13 +551,14 @@ class ResNet(nn.Layer):
variant=variant,
groups=groups,
base_width=base_width,
- base_channels=base_channels,
lr=lr_mult,
norm_type=norm_type,
norm_decay=norm_decay,
freeze_norm=freeze_norm,
- dcn_v2=(i in self.dcn_v2_stages)))
+ dcn_v2=(i in self.dcn_v2_stages),
+ std_senet=std_senet))
self.res_layers.append(res_layer)
+ self.ch_in = self._out_channels[i]
@property
def out_shape(self):
@@ -547,8 +590,9 @@ class Res5Head(nn.Layer):
if depth < 50:
feat_in = 256
na = NameAdapter(self)
+ block = BottleNeck if depth >= 50 else BasicBlock
self.res5 = Blocks(
- depth, feat_in, feat_out, count=3, name_adapter=na, stage_num=5)
+ block, feat_in, feat_out, count=3, name_adapter=na, stage_num=5)
self.feat_out = feat_out if depth < 50 else feat_out * 4
@property
diff --git a/ppdet/modeling/backbones/senet.py b/ppdet/modeling/backbones/senet.py
new file mode 100644
index 0000000000000000000000000000000000000000..a621c69b94df0de9c3f34445d6a48b8a61640177
--- /dev/null
+++ b/ppdet/modeling/backbones/senet.py
@@ -0,0 +1,140 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+from ppdet.core.workspace import register, serializable
+from .resnet import ResNet, Blocks, BasicBlock, BottleNeck
+from .name_adapter import NameAdapter
+from ..shape_spec import ShapeSpec
+
+__all__ = ['SENet', 'SERes5Head']
+
+
+@register
+@serializable
+class SENet(ResNet):
+ __shared__ = ['norm_type']
+
+ def __init__(self,
+ depth=50,
+ variant='b',
+ lr_mult_list=[1.0, 1.0, 1.0, 1.0],
+ groups=1,
+ base_width=64,
+ norm_type='bn',
+ norm_decay=0,
+ freeze_norm=True,
+ freeze_at=0,
+ return_idx=[0, 1, 2, 3],
+ dcn_v2_stages=[-1],
+ std_senet=True,
+ num_stages=4):
+ """
+ Squeeze-and-Excitation Networks, see https://arxiv.org/abs/1709.01507
+
+ Args:
+ depth (int): SENet depth, should be 50, 101, 152
+ variant (str): ResNet variant, supports 'a', 'b', 'c', 'd' currently
+            lr_mult_list (list): learning rate ratio of different resnet stages (2, 3, 4, 5);
+                             a lower learning rate ratio is needed for pretrained models
+                             obtained via distillation (default: [1.0, 1.0, 1.0, 1.0]).
+ groups (int): group convolution cardinality
+ base_width (int): base width of each group convolution
+ norm_type (str): normalization type, 'bn', 'sync_bn' or 'affine_channel'
+ norm_decay (float): weight decay for normalization layer weights
+ freeze_norm (bool): freeze normalization layers
+ freeze_at (int): freeze the backbone at which stage
+ return_idx (list): index of the stages whose feature maps are returned
+            dcn_v2_stages (list): index of stages that use deformable conv v2
+            std_senet (bool): whether to use the SE block, default True
+            num_stages (int): total number of stages
+ """
+
+ super(SENet, self).__init__(
+ depth=depth,
+ variant=variant,
+ lr_mult_list=lr_mult_list,
+ ch_in=128,
+ groups=groups,
+ base_width=base_width,
+ norm_type=norm_type,
+ norm_decay=norm_decay,
+ freeze_norm=freeze_norm,
+ freeze_at=freeze_at,
+ return_idx=return_idx,
+ dcn_v2_stages=dcn_v2_stages,
+ std_senet=std_senet,
+ num_stages=num_stages)
+
+
+@register
+class SERes5Head(nn.Layer):
+ def __init__(self,
+ depth=50,
+ variant='b',
+ lr_mult=1.0,
+ groups=1,
+ base_width=64,
+ norm_type='bn',
+ norm_decay=0,
+ dcn_v2=False,
+ freeze_norm=False,
+ std_senet=True):
+ """
+ SERes5Head layer
+
+ Args:
+ depth (int): SENet depth, should be 50, 101, 152
+ variant (str): ResNet variant, supports 'a', 'b', 'c', 'd' currently
+            lr_mult (float): learning rate multiplier of SERes5Head, default 1.0.
+ groups (int): group convolution cardinality
+ base_width (int): base width of each group convolution
+ norm_type (str): normalization type, 'bn', 'sync_bn' or 'affine_channel'
+ norm_decay (float): weight decay for normalization layer weights
+            dcn_v2 (bool): whether to use deformable conv v2, default False
+            std_senet (bool): whether to use the SE block, default True
+
+ """
+ super(SERes5Head, self).__init__()
+ ch_out = 512
+ ch_in = 256 if depth < 50 else 1024
+ na = NameAdapter(self)
+ block = BottleNeck if depth >= 50 else BasicBlock
+ self.res5 = Blocks(
+ block,
+ ch_in,
+ ch_out,
+ count=3,
+ name_adapter=na,
+ stage_num=5,
+ variant=variant,
+ groups=groups,
+ base_width=base_width,
+ lr=lr_mult,
+ norm_type=norm_type,
+ norm_decay=norm_decay,
+ freeze_norm=freeze_norm,
+ dcn_v2=dcn_v2,
+ std_senet=std_senet)
+ self.ch_out = ch_out * block.expansion
+
+ @property
+ def out_shape(self):
+ return [ShapeSpec(
+ channels=self.ch_out,
+ stride=16, )]
+
+ def forward(self, roi_feat):
+ y = self.res5(roi_feat)
+ return y
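+
+
+# A minimal construction sketch (illustrative only: in practice these modules
+# are built from YAML configs via ppdet's registry, and the shapes below are
+# assumptions):
+#
+#     import paddle
+#     head = SERes5Head(depth=50, std_senet=True)
+#     roi_feat = paddle.rand([8, 1024, 14, 14])   # [num_rois, C, H, W]
+#     out = head(roi_feat)                        # [8, 2048, 7, 7]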
diff --git a/ppdet/modeling/bbox_utils.py b/ppdet/modeling/bbox_utils.py
index 9d90831b6fcdcfccae89a0d30a8e3248f00b58bc..c77a5aec50406e257acef33235c6a2fce544c5cb 100644
--- a/ppdet/modeling/bbox_utils.py
+++ b/ppdet/modeling/bbox_utils.py
@@ -14,6 +14,9 @@
import math
import paddle
+import paddle.nn.functional as F
+import numpy as np
def bbox2delta(src_boxes, tgt_boxes, weights):
@@ -111,6 +114,16 @@ def bbox_area(boxes):
def bbox_overlaps(boxes1, boxes2):
+ """
+ Calculate overlaps between boxes1 and boxes2
+
+ Args:
+ boxes1 (Tensor): boxes with shape [M, 4]
+ boxes2 (Tensor): boxes with shape [N, 4]
+
+ Return:
+ overlaps (Tensor): overlaps between boxes1 and boxes2 with shape [M, N]
+ """
area1 = bbox_area(boxes1)
area2 = bbox_area(boxes2)
@@ -126,3 +139,390 @@ def bbox_overlaps(boxes1, boxes2):
(paddle.unsqueeze(area1, 1) + area2 - inter),
paddle.zeros_like(inter))
return overlaps
+
+
+def xywh2xyxy(box):
+ x, y, w, h = box
+ x1 = x - w * 0.5
+ y1 = y - h * 0.5
+ x2 = x + w * 0.5
+ y2 = y + h * 0.5
+ return [x1, y1, x2, y2]
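+
+
+# For example, xywh2xyxy([4.0, 4.0, 2.0, 2.0]) returns [3.0, 3.0, 5.0, 5.0],
+# i.e. center-size form converted to corner form.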
+
+
+def make_grid(h, w, dtype):
+ yv, xv = paddle.meshgrid([paddle.arange(h), paddle.arange(w)])
+ return paddle.stack((xv, yv), 2).cast(dtype=dtype)
+
+
+def decode_yolo(box, anchor, downsample_ratio):
+ """decode yolo box
+
+ Args:
+ box (list): [x, y, w, h], all have the shape [b, na, h, w, 1]
+ anchor (list): anchor with the shape [na, 2]
+        downsample_ratio (int): downsample ratio of the feature map, e.g. 32
+
+ Return:
+ box (list): decoded box, [x, y, w, h], all have the shape [b, na, h, w, 1]
+ """
+ x, y, w, h = box
+ na, grid_h, grid_w = x.shape[1:4]
+ grid = make_grid(grid_h, grid_w, x.dtype).reshape((1, 1, grid_h, grid_w, 2))
+ x1 = (x + grid[:, :, :, :, 0:1]) / grid_w
+ y1 = (y + grid[:, :, :, :, 1:2]) / grid_h
+
+ anchor = paddle.to_tensor(anchor)
+ anchor = paddle.cast(anchor, x.dtype)
+ anchor = anchor.reshape((1, na, 1, 1, 2))
+ w1 = paddle.exp(w) * anchor[:, :, :, :, 0:1] / (downsample_ratio * grid_w)
+ h1 = paddle.exp(h) * anchor[:, :, :, :, 1:2] / (downsample_ratio * grid_h)
+
+ return [x1, y1, w1, h1]
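+
+
+# Shape sketch (sizes are assumptions): with batch 2, 3 anchors and a 13x13
+# grid at downsample_ratio=32,
+#
+#     import paddle
+#     xywh = [paddle.rand([2, 3, 13, 13, 1]) for _ in range(4)]
+#     x1, y1, w1, h1 = decode_yolo(xywh, [[10, 13], [16, 30], [33, 23]], 32)
+#     # each output keeps the shape [2, 3, 13, 13, 1]; x, y are normalized
+#     # by the grid size and w, h by the network input size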
+
+
+def iou_similarity(box1, box2, eps=1e-9):
+ """Calculate iou of box1 and box2
+
+ Args:
+ box1 (Tensor): box with the shape [N, M1, 4]
+ box2 (Tensor): box with the shape [N, M2, 4]
+
+ Return:
+ iou (Tensor): iou between box1 and box2 with the shape [N, M1, M2]
+ """
+ box1 = box1.unsqueeze(2) # [N, M1, 4] -> [N, M1, 1, 4]
+ box2 = box2.unsqueeze(1) # [N, M2, 4] -> [N, 1, M2, 4]
+ px1y1, px2y2 = box1[:, :, :, 0:2], box1[:, :, :, 2:4]
+ gx1y1, gx2y2 = box2[:, :, :, 0:2], box2[:, :, :, 2:4]
+ x1y1 = paddle.maximum(px1y1, gx1y1)
+ x2y2 = paddle.minimum(px2y2, gx2y2)
+ overlap = (x2y2 - x1y1).clip(0).prod(-1)
+ area1 = (px2y2 - px1y1).clip(0).prod(-1)
+ area2 = (gx2y2 - gx1y1).clip(0).prod(-1)
+ union = area1 + area2 - overlap + eps
+ return overlap / union
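+
+
+# Usage sketch (shapes are assumptions):
+#
+#     import paddle
+#     b1 = paddle.rand([2, 8, 4])     # [N, M1, 4], boxes as x1y1x2y2
+#     b2 = paddle.rand([2, 16, 4])    # [N, M2, 4]
+#     iou = iou_similarity(b1, b2)    # -> [2, 8, 16]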
+
+
+def bbox_iou(box1, box2, giou=False, diou=False, ciou=False, eps=1e-9):
+ """calculate the iou of box1 and box2
+
+ Args:
+        box1 (list): [x1, y1, x2, y2], all have the shape [b, na, h, w, 1]
+        box2 (list): [x1, y1, x2, y2], all have the shape [b, na, h, w, 1]
+ giou (bool): whether use giou or not, default False
+ diou (bool): whether use diou or not, default False
+ ciou (bool): whether use ciou or not, default False
+ eps (float): epsilon to avoid divide by zero
+
+ Return:
+        iou (Tensor): iou of box1 and box2, with the shape [b, na, h, w, 1]
+ """
+ px1, py1, px2, py2 = box1
+ gx1, gy1, gx2, gy2 = box2
+ x1 = paddle.maximum(px1, gx1)
+ y1 = paddle.maximum(py1, gy1)
+ x2 = paddle.minimum(px2, gx2)
+ y2 = paddle.minimum(py2, gy2)
+
+ overlap = ((x2 - x1).clip(0)) * ((y2 - y1).clip(0))
+
+ area1 = (px2 - px1) * (py2 - py1)
+ area1 = area1.clip(0)
+
+ area2 = (gx2 - gx1) * (gy2 - gy1)
+ area2 = area2.clip(0)
+
+ union = area1 + area2 - overlap + eps
+ iou = overlap / union
+
+ if giou or ciou or diou:
+ # convex w, h
+ cw = paddle.maximum(px2, gx2) - paddle.minimum(px1, gx1)
+ ch = paddle.maximum(py2, gy2) - paddle.minimum(py1, gy1)
+ if giou:
+ c_area = cw * ch + eps
+ return iou - (c_area - union) / c_area
+ else:
+ # convex diagonal squared
+ c2 = cw**2 + ch**2 + eps
+ # center distance
+ rho2 = ((px1 + px2 - gx1 - gx2)**2 + (py1 + py2 - gy1 - gy2)**2) / 4
+ if diou:
+ return iou - rho2 / c2
+ else:
+ w1, h1 = px2 - px1, py2 - py1 + eps
+ w2, h2 = gx2 - gx1, gy2 - gy1 + eps
+ delta = paddle.atan(w1 / h1) - paddle.atan(w2 / h2)
+ v = (4 / math.pi**2) * paddle.pow(delta, 2)
+ alpha = v / (1 + eps - iou + v)
+ alpha.stop_gradient = True
+ return iou - (rho2 / c2 + v * alpha)
+ else:
+ return iou
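+
+
+# The flags pick one variant per call: giou takes priority, then diou, then
+# ciou; with all flags False the plain IoU is returned. A hedged sketch,
+# reusing decode_yolo's component-list layout:
+#
+#     pbox = xywh2xyxy(decode_yolo(pred_xywh, anchors, 32))
+#     gbox = xywh2xyxy(decode_yolo(gt_xywh, anchors, 32))
+#     loss_iou = 1.0 - bbox_iou(pbox, gbox, ciou=True)   # [b, na, h, w, 1]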
+
+
+def rect2rbox(bboxes):
+ """
+ :param bboxes: shape (n, 4) (xmin, ymin, xmax, ymax)
+ :return: dbboxes: shape (n, 5) (x_ctr, y_ctr, w, h, angle)
+ """
+ bboxes = bboxes.reshape(-1, 4)
+ num_boxes = bboxes.shape[0]
+
+ x_ctr = (bboxes[:, 2] + bboxes[:, 0]) / 2.0
+ y_ctr = (bboxes[:, 3] + bboxes[:, 1]) / 2.0
+ edges1 = np.abs(bboxes[:, 2] - bboxes[:, 0])
+ edges2 = np.abs(bboxes[:, 3] - bboxes[:, 1])
+ angles = np.zeros([num_boxes], dtype=bboxes.dtype)
+
+ inds = edges1 < edges2
+
+ rboxes = np.stack((x_ctr, y_ctr, edges1, edges2, angles), axis=1)
+ rboxes[inds, 2] = edges2[inds]
+ rboxes[inds, 3] = edges1[inds]
+ rboxes[inds, 4] = np.pi / 2.0
+ return rboxes
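+
+
+# For example, with a single horizontal box (numpy input assumed):
+#
+#     import numpy as np
+#     rect2rbox(np.array([[0., 0., 10., 4.]], dtype='float32'))
+#     # -> [[5., 2., 10., 4., 0.]]   (x_ctr, y_ctr, w, h, angle; angle stays 0
+#     #    because the x-edge is the longer one)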
+
+
+def delta2rbox(Rrois,
+ deltas,
+ means=[0, 0, 0, 0, 0],
+ stds=[1, 1, 1, 1, 1],
+ wh_ratio_clip=1e-6):
+ """
+ :param Rrois: (cx, cy, w, h, theta)
+ :param deltas: (dx, dy, dw, dh, dtheta)
+ :param means:
+ :param stds:
+ :param wh_ratio_clip:
+ :return:
+ """
+ means = paddle.to_tensor(means)
+ stds = paddle.to_tensor(stds)
+ deltas = paddle.reshape(deltas, [-1, deltas.shape[-1]])
+ denorm_deltas = deltas * stds + means
+
+ dx = denorm_deltas[:, 0]
+ dy = denorm_deltas[:, 1]
+ dw = denorm_deltas[:, 2]
+ dh = denorm_deltas[:, 3]
+ dangle = denorm_deltas[:, 4]
+
+ max_ratio = np.abs(np.log(wh_ratio_clip))
+ dw = paddle.clip(dw, min=-max_ratio, max=max_ratio)
+ dh = paddle.clip(dh, min=-max_ratio, max=max_ratio)
+
+ Rroi_x = Rrois[:, 0]
+ Rroi_y = Rrois[:, 1]
+ Rroi_w = Rrois[:, 2]
+ Rroi_h = Rrois[:, 3]
+ Rroi_angle = Rrois[:, 4]
+
+ gx = dx * Rroi_w * paddle.cos(Rroi_angle) - dy * Rroi_h * paddle.sin(
+ Rroi_angle) + Rroi_x
+ gy = dx * Rroi_w * paddle.sin(Rroi_angle) + dy * Rroi_h * paddle.cos(
+ Rroi_angle) + Rroi_y
+ gw = Rroi_w * dw.exp()
+ gh = Rroi_h * dh.exp()
+ ga = np.pi * dangle + Rroi_angle
+ ga = (ga + np.pi / 4) % np.pi - np.pi / 4
+ ga = paddle.to_tensor(ga)
+
+ gw = paddle.to_tensor(gw, dtype='float32')
+ gh = paddle.to_tensor(gh, dtype='float32')
+ bboxes = paddle.stack([gx, gy, gw, gh, ga], axis=-1)
+ return bboxes
+
+
+def rbox2delta(proposals, gt, means=[0, 0, 0, 0, 0], stds=[1, 1, 1, 1, 1]):
+ """
+
+ Args:
+ proposals:
+ gt:
+ means: 1x5
+ stds: 1x5
+
+ Returns:
+
+ """
+ proposals = proposals.astype(np.float64)
+
+ PI = np.pi
+
+ gt_widths = gt[..., 2]
+ gt_heights = gt[..., 3]
+ gt_angle = gt[..., 4]
+
+ proposals_widths = proposals[..., 2]
+ proposals_heights = proposals[..., 3]
+ proposals_angle = proposals[..., 4]
+
+ coord = gt[..., 0:2] - proposals[..., 0:2]
+ dx = (np.cos(proposals[..., 4]) * coord[..., 0] + np.sin(proposals[..., 4])
+ * coord[..., 1]) / proposals_widths
+ dy = (-np.sin(proposals[..., 4]) * coord[..., 0] + np.cos(proposals[..., 4])
+ * coord[..., 1]) / proposals_heights
+ dw = np.log(gt_widths / proposals_widths)
+ dh = np.log(gt_heights / proposals_heights)
+ da = (gt_angle - proposals_angle)
+
+ da = (da + PI / 4) % PI - PI / 4
+ da /= PI
+
+ deltas = np.stack([dx, dy, dw, dh, da], axis=-1)
+ means = np.array(means, dtype=deltas.dtype)
+ stds = np.array(stds, dtype=deltas.dtype)
+ deltas = (deltas - means) / stds
+ deltas = deltas.astype(np.float32)
+ return deltas
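+
+
+# rbox2delta (numpy) is the encoder paired with the paddle decoder delta2rbox:
+# decoding the deltas of (proposal, gt) against the same proposal recovers gt,
+# up to the angle wrapping into [-pi/4, 3*pi/4). A hedged sketch:
+#
+#     d = rbox2delta(prop, gt)                        # np.ndarray, (n, 5)
+#     rec = delta2rbox(paddle.to_tensor(prop, dtype='float32'),
+#                      paddle.to_tensor(d))           # ~= gt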
+
+
+def bbox_decode(bbox_preds,
+ anchors,
+ means=[0, 0, 0, 0, 0],
+ stds=[1, 1, 1, 1, 1]):
+ """decode bbox from deltas
+ Args:
+ bbox_preds: [N,H,W,5]
+ anchors: [H*W,5]
+ return:
+ bboxes: [N,H,W,5]
+ """
+ means = paddle.to_tensor(means)
+ stds = paddle.to_tensor(stds)
+ num_imgs, H, W, _ = bbox_preds.shape
+ bboxes_list = []
+ for img_id in range(num_imgs):
+ bbox_pred = bbox_preds[img_id]
+        # bbox_pred shape: [H, W, 5]
+ bbox_delta = bbox_pred
+ anchors = paddle.to_tensor(anchors)
+ bboxes = delta2rbox(
+ anchors, bbox_delta, means, stds, wh_ratio_clip=1e-6)
+ bboxes = paddle.reshape(bboxes, [H, W, 5])
+ bboxes_list.append(bboxes)
+ return paddle.stack(bboxes_list, axis=0)
+
+
+def poly_to_rbox(polys):
+ """
+ poly:[x0,y0,x1,y1,x2,y2,x3,y3]
+ to
+ rotated_boxes:[x_ctr,y_ctr,w,h,angle]
+ """
+ rotated_boxes = []
+ for poly in polys:
+ poly = np.array(poly[:8], dtype=np.float32)
+
+ pt1 = (poly[0], poly[1])
+ pt2 = (poly[2], poly[3])
+ pt3 = (poly[4], poly[5])
+ pt4 = (poly[6], poly[7])
+
+ edge1 = np.sqrt((pt1[0] - pt2[0]) * (pt1[0] - pt2[0]) + (pt1[1] - pt2[
+ 1]) * (pt1[1] - pt2[1]))
+ edge2 = np.sqrt((pt2[0] - pt3[0]) * (pt2[0] - pt3[0]) + (pt2[1] - pt3[
+ 1]) * (pt2[1] - pt3[1]))
+
+ width = max(edge1, edge2)
+ height = min(edge1, edge2)
+
+        # np.float was removed from NumPy; plain float is equivalent here,
+        # and the two branches below are exhaustive
+        if edge1 > edge2:
+            rbox_angle = np.arctan2(
+                float(pt2[1] - pt1[1]), float(pt2[0] - pt1[0]))
+        else:
+            rbox_angle = np.arctan2(
+                float(pt4[1] - pt1[1]), float(pt4[0] - pt1[0]))
+
+        def norm_angle(angle, angle_range=(-np.pi / 4, np.pi)):
+            # wrap into [angle_range[0], angle_range[0] + angle_range[1])
+            return (angle - angle_range[0]) % angle_range[1] + angle_range[0]
+
+ rbox_angle = norm_angle(rbox_angle)
+
+        x_ctr = float(pt1[0] + pt3[0]) / 2
+        y_ctr = float(pt1[1] + pt3[1]) / 2
+ rotated_box = np.array([x_ctr, y_ctr, width, height, rbox_angle])
+ rotated_boxes.append(rotated_box)
+ ret_rotated_boxes = np.array(rotated_boxes)
+ assert ret_rotated_boxes.shape[1] == 5
+ return ret_rotated_boxes
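+
+
+# Example with an axis-aligned quadrilateral:
+#
+#     poly_to_rbox([[0., 0., 10., 0., 10., 4., 0., 4.]])
+#     # -> [[5., 2., 10., 4., 0.]]   width is the longer edge and the angle
+#     #    is normalized into [-pi/4, 3*pi/4)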
+
+
+def cal_line_length(point1, point2):
+ return math.sqrt(
+ math.pow(point1[0] - point2[0], 2) + math.pow(point1[1] - point2[1], 2))
+
+
+def get_best_begin_point_single(coordinate):
+ x1, y1, x2, y2, x3, y3, x4, y4 = coordinate
+ xmin = min(x1, x2, x3, x4)
+ ymin = min(y1, y2, y3, y4)
+ xmax = max(x1, x2, x3, x4)
+ ymax = max(y1, y2, y3, y4)
+ combinate = [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]],
+ [[x4, y4], [x1, y1], [x2, y2], [x3, y3]],
+ [[x3, y3], [x4, y4], [x1, y1], [x2, y2]],
+ [[x2, y2], [x3, y3], [x4, y4], [x1, y1]]]
+ dst_coordinate = [[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]]
+ force = 100000000.0
+ force_flag = 0
+ for i in range(4):
+ temp_force = cal_line_length(combinate[i][0], dst_coordinate[0]) \
+ + cal_line_length(combinate[i][1], dst_coordinate[1]) \
+ + cal_line_length(combinate[i][2], dst_coordinate[2]) \
+ + cal_line_length(combinate[i][3], dst_coordinate[3])
+ if temp_force < force:
+ force = temp_force
+ force_flag = i
+ return np.array(combinate[force_flag]).reshape(8)
+
+
+def rbox2poly_single(rrect):
+ """
+ rrect:[x_ctr,y_ctr,w,h,angle]
+ to
+ poly:[x0,y0,x1,y1,x2,y2,x3,y3]
+ """
+ x_ctr, y_ctr, width, height, angle = rrect[:5]
+ tl_x, tl_y, br_x, br_y = -width / 2, -height / 2, width / 2, height / 2
+ # rect 2x4
+ rect = np.array([[tl_x, br_x, br_x, tl_x], [tl_y, tl_y, br_y, br_y]])
+ R = np.array([[np.cos(angle), -np.sin(angle)],
+ [np.sin(angle), np.cos(angle)]])
+ # poly
+ poly = R.dot(rect)
+ x0, x1, x2, x3 = poly[0, :4] + x_ctr
+ y0, y1, y2, y3 = poly[1, :4] + y_ctr
+ poly = np.array([x0, y0, x1, y1, x2, y2, x3, y3], dtype=np.float32)
+ poly = get_best_begin_point_single(poly)
+ return poly
+
+
+def rbox2poly(rrects):
+ """
+ rrect:[x_ctr,y_ctr,w,h,angle]
+ to
+ poly:[x0,y0,x1,y1,x2,y2,x3,y3]
+ """
+ polys = []
+ for rrect in rrects:
+ x_ctr, y_ctr, width, height, angle = rrect[:5]
+ tl_x, tl_y, br_x, br_y = -width / 2, -height / 2, width / 2, height / 2
+ rect = np.array([[tl_x, br_x, br_x, tl_x], [tl_y, tl_y, br_y, br_y]])
+ R = np.array([[np.cos(angle), -np.sin(angle)],
+ [np.sin(angle), np.cos(angle)]])
+ poly = R.dot(rect)
+ x0, x1, x2, x3 = poly[0, :4] + x_ctr
+ y0, y1, y2, y3 = poly[1, :4] + y_ctr
+ poly = np.array([x0, y0, x1, y1, x2, y2, x3, y3], dtype=np.float32)
+ poly = get_best_begin_point_single(poly)
+ polys.append(poly)
+ polys = np.array(polys)
+ return polys
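+
+
+# rbox2poly inverts poly_to_rbox up to the vertex ordering chosen by
+# get_best_begin_point_single, e.g.:
+#
+#     import numpy as np
+#     rbox2poly(np.array([[5., 2., 10., 4., 0.]]))
+#     # -> [[0., 0., 10., 0., 10., 4., 0., 4.]]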
diff --git a/ppdet/modeling/heads/__init__.py b/ppdet/modeling/heads/__init__.py
index 9ed5fc2e3e5e8b8989084d1f1b3eb6dac93d24ff..9263aa812f7c112f21615649be1f7a946a16f83d 100644
--- a/ppdet/modeling/heads/__init__.py
+++ b/ppdet/modeling/heads/__init__.py
@@ -22,6 +22,7 @@ from . import solov2_head
from . import ttf_head
from . import cascade_head
from . import face_head
+from . import s2anet_head
from .bbox_head import *
from .mask_head import *
@@ -33,3 +34,4 @@ from .solov2_head import *
from .ttf_head import *
from .cascade_head import *
from .face_head import *
+from .s2anet_head import *
diff --git a/ppdet/modeling/heads/bbox_head.py b/ppdet/modeling/heads/bbox_head.py
index a6480961cd1b6a4cdcbfa29b3143bf48e25eb8ec..09796372ef81a911543374ff68b7bf16d7e64b53 100644
--- a/ppdet/modeling/heads/bbox_head.py
+++ b/ppdet/modeling/heads/bbox_head.py
@@ -12,6 +12,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.
+import numpy as np
+
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
@@ -31,31 +33,40 @@ __all__ = ['TwoFCHead', 'XConvNormHead', 'BBoxHead']
@register
class TwoFCHead(nn.Layer):
- def __init__(self, in_dim=256, mlp_dim=1024, resolution=7):
+ """
+ RCNN bbox head with Two fc layers to extract feature
+
+ Args:
+ in_channel (int): Input channel which can be derived by from_config
+ out_channel (int): Output channel
+ resolution (int): Resolution of input feature map, default 7
+ """
+
+ def __init__(self, in_channel=256, out_channel=1024, resolution=7):
super(TwoFCHead, self).__init__()
- self.in_dim = in_dim
- self.mlp_dim = mlp_dim
- fan = in_dim * resolution * resolution
+ self.in_channel = in_channel
+ self.out_channel = out_channel
+ fan = in_channel * resolution * resolution
self.fc6 = nn.Linear(
- in_dim * resolution * resolution,
- mlp_dim,
+ in_channel * resolution * resolution,
+ out_channel,
weight_attr=paddle.ParamAttr(
initializer=XavierUniform(fan_out=fan)))
self.fc7 = nn.Linear(
- mlp_dim,
- mlp_dim,
+ out_channel,
+ out_channel,
weight_attr=paddle.ParamAttr(initializer=XavierUniform()))
@classmethod
def from_config(cls, cfg, input_shape):
s = input_shape
s = s[0] if isinstance(s, (list, tuple)) else s
- return {'in_dim': s.channels}
+ return {'in_channel': s.channels}
@property
def out_shape(self):
- return [ShapeSpec(channels=self.mlp_dim, )]
+ return [ShapeSpec(channels=self.out_channel, )]
def forward(self, rois_feat):
rois_feat = paddle.flatten(rois_feat, start_axis=1, stop_axis=-1)
@@ -68,34 +79,36 @@ class TwoFCHead(nn.Layer):
@register
class XConvNormHead(nn.Layer):
+ __shared__ = ['norm_type', 'freeze_norm']
"""
    RCNN bbox head with several convolution layers
+
Args:
- in_dim(int): num of channels for the input rois_feat
- num_convs(int): num of convolution layers for the rcnn bbox head
- conv_dim(int): num of channels for the conv layers
- mlp_dim(int): num of channels for the fc layers
- resolution(int): resolution of the rois_feat
- norm_type(str): norm type, 'gn' by defalut
- freeze_norm(bool): whether to freeze the norm
- stage_name(str): used in CascadeXConvNormHead, '' by default
+ in_channel (int): Input channels which can be derived by from_config
+ num_convs (int): The number of conv layers
+ conv_dim (int): The number of channels for the conv layers
+ out_channel (int): Output channels
+ resolution (int): Resolution of input feature map
+ norm_type (string): Norm type, bn, gn, sync_bn are available,
+ default `gn`
+ freeze_norm (bool): Whether to freeze the norm
+ stage_name (string): Prefix name for conv layer, '' by default
"""
- __shared__ = ['norm_type', 'freeze_norm']
def __init__(self,
- in_dim=256,
+ in_channel=256,
num_convs=4,
conv_dim=256,
- mlp_dim=1024,
+ out_channel=1024,
resolution=7,
norm_type='gn',
freeze_norm=False,
stage_name=''):
super(XConvNormHead, self).__init__()
- self.in_dim = in_dim
+ self.in_channel = in_channel
self.num_convs = num_convs
self.conv_dim = conv_dim
- self.mlp_dim = mlp_dim
+ self.out_channel = out_channel
self.norm_type = norm_type
self.freeze_norm = freeze_norm
@@ -103,7 +116,7 @@ class XConvNormHead(nn.Layer):
fan = conv_dim * 3 * 3
initializer = KaimingNormal(fan_in=fan)
for i in range(self.num_convs):
- in_c = in_dim if i == 0 else conv_dim
+ in_c = in_channel if i == 0 else conv_dim
head_conv_name = stage_name + 'bbox_head_conv{}'.format(i)
head_conv = self.add_sublayer(
head_conv_name,
@@ -113,16 +126,14 @@ class XConvNormHead(nn.Layer):
filter_size=3,
stride=1,
norm_type=self.norm_type,
- norm_name=head_conv_name + '_norm',
freeze_norm=self.freeze_norm,
- initializer=initializer,
- name=head_conv_name))
+ initializer=initializer))
self.bbox_head_convs.append(head_conv)
fan = conv_dim * resolution * resolution
self.fc6 = nn.Linear(
conv_dim * resolution * resolution,
- mlp_dim,
+ out_channel,
weight_attr=paddle.ParamAttr(
initializer=XavierUniform(fan_out=fan)),
bias_attr=paddle.ParamAttr(
@@ -132,11 +143,11 @@ class XConvNormHead(nn.Layer):
def from_config(cls, cfg, input_shape):
s = input_shape
s = s[0] if isinstance(s, (list, tuple)) else s
- return {'in_dim': s.channels}
+ return {'in_channel': s.channels}
@property
def out_shape(self):
- return [ShapeSpec(channels=self.mlp_dim, )]
+ return [ShapeSpec(channels=self.out_channel, )]
def forward(self, rois_feat):
for i in range(self.num_convs):
@@ -149,16 +160,19 @@ class XConvNormHead(nn.Layer):
@register
class BBoxHead(nn.Layer):
__shared__ = ['num_classes']
- __inject__ = ['bbox_assigner']
+ __inject__ = ['bbox_assigner', 'bbox_loss']
"""
- head (nn.Layer): Extract feature in bbox head
- in_channel (int): Input channel after RoI extractor
- roi_extractor (object): The module of RoI Extractor
- bbox_assigner (object): The module of Box Assigner, label and sample the
- box.
- with_pool (bool): Whether to use pooling for the RoI feature.
- num_classes (int): The number of classes
- bbox_weight (List[float]): The weight to get the decode box
+ RCNN bbox head
+
+ Args:
+ head (nn.Layer): Extract feature in bbox head
+ in_channel (int): Input channel after RoI extractor
+ roi_extractor (object): The module of RoI Extractor
+ bbox_assigner (object): The module of Box Assigner, label and sample the
+ box.
+ with_pool (bool): Whether to use pooling for the RoI feature.
+ num_classes (int): The number of classes
+        bbox_weight (List[float]): The weight to get the decode box
+        bbox_loss (object): The loss module for box regression; if None,
+            a plain L1 loss on the deltas is used
"""
def __init__(self,
@@ -168,7 +182,8 @@ class BBoxHead(nn.Layer):
bbox_assigner='BboxAssigner',
with_pool=False,
num_classes=80,
- bbox_weight=[10., 10., 5., 5.]):
+ bbox_weight=[10., 10., 5., 5.],
+ bbox_loss=None):
super(BBoxHead, self).__init__()
self.head = head
self.roi_extractor = roi_extractor
@@ -179,6 +194,7 @@ class BBoxHead(nn.Layer):
self.with_pool = with_pool
self.num_classes = num_classes
self.bbox_weight = bbox_weight
+ self.bbox_loss = bbox_loss
self.bbox_score = nn.Linear(
in_channel,
@@ -293,14 +309,51 @@ class BBoxHead(nn.Layer):
reg_target = paddle.gather(reg_target, fg_inds)
reg_target.stop_gradient = True
- loss_bbox_reg = paddle.abs(reg_delta - reg_target).sum(
- ) / tgt_labels.shape[0]
+ if self.bbox_loss is not None:
+ reg_delta = self.bbox_transform(reg_delta)
+ reg_target = self.bbox_transform(reg_target)
+ loss_bbox_reg = self.bbox_loss(
+ reg_delta, reg_target).sum() / tgt_labels.shape[0]
+ loss_bbox_reg *= self.num_classes
+ else:
+ loss_bbox_reg = paddle.abs(reg_delta - reg_target).sum(
+ ) / tgt_labels.shape[0]
loss_bbox[cls_name] = loss_bbox_cls * loss_weight
loss_bbox[reg_name] = loss_bbox_reg * loss_weight
return loss_bbox
+ def bbox_transform(self, deltas, weights=[0.1, 0.1, 0.2, 0.2]):
+ wx, wy, ww, wh = weights
+
+ deltas = paddle.reshape(deltas, shape=(0, -1, 4))
+
+ dx = paddle.slice(deltas, axes=[2], starts=[0], ends=[1]) * wx
+ dy = paddle.slice(deltas, axes=[2], starts=[1], ends=[2]) * wy
+ dw = paddle.slice(deltas, axes=[2], starts=[2], ends=[3]) * ww
+ dh = paddle.slice(deltas, axes=[2], starts=[3], ends=[4]) * wh
+
+ dw = paddle.clip(dw, -1.e10, np.log(1000. / 16))
+ dh = paddle.clip(dh, -1.e10, np.log(1000. / 16))
+
+ pred_ctr_x = dx
+ pred_ctr_y = dy
+ pred_w = paddle.exp(dw)
+ pred_h = paddle.exp(dh)
+
+ x1 = pred_ctr_x - 0.5 * pred_w
+ y1 = pred_ctr_y - 0.5 * pred_h
+ x2 = pred_ctr_x + 0.5 * pred_w
+ y2 = pred_ctr_y + 0.5 * pred_h
+
+ x1 = paddle.reshape(x1, shape=(-1, ))
+ y1 = paddle.reshape(y1, shape=(-1, ))
+ x2 = paddle.reshape(x2, shape=(-1, ))
+ y2 = paddle.reshape(y2, shape=(-1, ))
+
+ return paddle.concat([x1, y1, x2, y2])
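+
+    # Design note: when an IoU-style bbox_loss (e.g. GIoULoss) is configured,
+    # get_loss decodes both predicted and target deltas with bbox_transform
+    # into corner-form boxes first, since such losses are defined on boxes
+    # rather than on raw regression deltas.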
+
def get_prediction(self, score, delta):
bbox_prob = F.softmax(score)
return delta, bbox_prob
diff --git a/ppdet/modeling/heads/cascade_head.py b/ppdet/modeling/heads/cascade_head.py
index 99c43c83e90232de63663f872b95b88e0c81f1a5..aed59661bdfde4011882a3ec97b7ec9f42fec1d4 100644
--- a/ppdet/modeling/heads/cascade_head.py
+++ b/ppdet/modeling/heads/cascade_head.py
@@ -32,32 +32,41 @@ __all__ = ['CascadeTwoFCHead', 'CascadeXConvNormHead', 'CascadeHead']
@register
class CascadeTwoFCHead(nn.Layer):
__shared__ = ['num_cascade_stage']
+ """
+ Cascade RCNN bbox head with Two fc layers to extract feature
+
+ Args:
+ in_channel (int): Input channel which can be derived by from_config
+ out_channel (int): Output channel
+ resolution (int): Resolution of input feature map, default 7
+ num_cascade_stage (int): The number of cascade stage, default 3
+ """
def __init__(self,
- in_dim=256,
- mlp_dim=1024,
+ in_channel=256,
+ out_channel=1024,
resolution=7,
num_cascade_stage=3):
super(CascadeTwoFCHead, self).__init__()
- self.in_dim = in_dim
- self.mlp_dim = mlp_dim
+ self.in_channel = in_channel
+ self.out_channel = out_channel
self.head_list = []
for stage in range(num_cascade_stage):
head_per_stage = self.add_sublayer(
- str(stage), TwoFCHead(in_dim, mlp_dim, resolution))
+ str(stage), TwoFCHead(in_channel, out_channel, resolution))
self.head_list.append(head_per_stage)
@classmethod
def from_config(cls, cfg, input_shape):
s = input_shape
s = s[0] if isinstance(s, (list, tuple)) else s
- return {'in_dim': s.channels}
+ return {'in_channel': s.channels}
@property
def out_shape(self):
- return [ShapeSpec(channels=self.mlp_dim, )]
+ return [ShapeSpec(channels=self.out_channel, )]
def forward(self, rois_feat, stage=0):
out = self.head_list[stage](rois_feat)
@@ -67,29 +76,43 @@ class CascadeTwoFCHead(nn.Layer):
@register
class CascadeXConvNormHead(nn.Layer):
__shared__ = ['norm_type', 'freeze_norm', 'num_cascade_stage']
+ """
+    Cascade RCNN bbox head with several convolution layers
+
+ Args:
+ in_channel (int): Input channels which can be derived by from_config
+ num_convs (int): The number of conv layers
+ conv_dim (int): The number of channels for the conv layers
+ out_channel (int): Output channels
+ resolution (int): Resolution of input feature map
+ norm_type (string): Norm type, bn, gn, sync_bn are available,
+ default `gn`
+ freeze_norm (bool): Whether to freeze the norm
+ num_cascade_stage (int): The number of cascade stage, default 3
+ """
def __init__(self,
- in_dim=256,
+ in_channel=256,
num_convs=4,
conv_dim=256,
- mlp_dim=1024,
+ out_channel=1024,
resolution=7,
norm_type='gn',
freeze_norm=False,
num_cascade_stage=3):
super(CascadeXConvNormHead, self).__init__()
- self.in_dim = in_dim
- self.mlp_dim = mlp_dim
+ self.in_channel = in_channel
+ self.out_channel = out_channel
self.head_list = []
for stage in range(num_cascade_stage):
head_per_stage = self.add_sublayer(
str(stage),
XConvNormHead(
- in_dim,
+ in_channel,
num_convs,
conv_dim,
- mlp_dim,
+ out_channel,
resolution,
norm_type,
freeze_norm,
@@ -100,11 +123,11 @@ class CascadeXConvNormHead(nn.Layer):
def from_config(cls, cfg, input_shape):
s = input_shape
s = s[0] if isinstance(s, (list, tuple)) else s
- return {'in_dim': s.channels}
+ return {'in_channel': s.channels}
@property
def out_shape(self):
- return [ShapeSpec(channels=self.mlp_dim, )]
+ return [ShapeSpec(channels=self.out_channel, )]
def forward(self, rois_feat, stage=0):
out = self.head_list[stage](rois_feat)
@@ -114,18 +137,20 @@ class CascadeXConvNormHead(nn.Layer):
@register
class CascadeHead(BBoxHead):
__shared__ = ['num_classes', 'num_cascade_stages']
- __inject__ = ['bbox_assigner']
+ __inject__ = ['bbox_assigner', 'bbox_loss']
"""
- head (nn.Layer): Extract feature in bbox head
- in_channel (int): Input channel after RoI extractor
- roi_extractor (object): The module of RoI Extractor
- bbox_assigner (object): The module of Box Assigner, label and sample the
- box.
- num_classes (int): The number of classes
- bbox_weight (List[List[float]]): The weight to get the decode box and the
- length of weight is the number of cascade
- stage
- num_cascade_stages (int): THe number of stage to refine the box
+ Cascade RCNN bbox head
+
+ Args:
+ head (nn.Layer): Extract feature in bbox head
+ in_channel (int): Input channel after RoI extractor
+ roi_extractor (object): The module of RoI Extractor
+ bbox_assigner (object): The module of Box Assigner, label and sample the
+ box.
+ num_classes (int): The number of classes
+        bbox_weight (List[List[float]]): The weight to get the decode box and the
+            length of weight is the number of cascade stages
+        num_cascade_stages (int): The number of stages to refine the box
+        bbox_loss (object): The loss module for box regression; if None,
+            a plain L1 loss on the deltas is used
"""
def __init__(self,
@@ -136,7 +161,8 @@ class CascadeHead(BBoxHead):
num_classes=80,
bbox_weight=[[10., 10., 5., 5.], [20.0, 20.0, 10.0, 10.0],
[30.0, 30.0, 15.0, 15.0]],
- num_cascade_stages=3):
+ num_cascade_stages=3,
+ bbox_loss=None):
nn.Layer.__init__(self, )
self.head = head
self.roi_extractor = roi_extractor
@@ -147,6 +173,7 @@ class CascadeHead(BBoxHead):
self.num_classes = num_classes
self.bbox_weight = bbox_weight
self.num_cascade_stages = num_cascade_stages
+ self.bbox_loss = bbox_loss
self.bbox_score_list = []
self.bbox_delta_list = []
diff --git a/ppdet/modeling/heads/fcos_head.py b/ppdet/modeling/heads/fcos_head.py
index 1776d8c3810784df3d1052109c91d70fc5e4b675..3b8fd7f785d77ee8c18576cc4d7d71b44e86c509 100644
--- a/ppdet/modeling/heads/fcos_head.py
+++ b/ppdet/modeling/heads/fcos_head.py
@@ -28,6 +28,10 @@ from ppdet.modeling.layers import ConvNormLayer
class ScaleReg(nn.Layer):
+ """
+ Parameter for scaling the regression outputs.
+ """
+
def __init__(self):
super(ScaleReg, self).__init__()
self.scale_reg = self.create_parameter(
@@ -77,10 +81,8 @@ class FCOSFeat(nn.Layer):
stride=1,
norm_type=norm_type,
use_dcn=use_dcn,
- norm_name=cls_conv_name + '_norm',
bias_on=True,
- lr_scale=2.,
- name=cls_conv_name))
+ lr_scale=2.))
self.cls_subnet_convs.append(cls_conv)
reg_conv_name = 'fcos_head_reg_tower_conv_{}'.format(i)
@@ -93,10 +95,8 @@ class FCOSFeat(nn.Layer):
stride=1,
norm_type=norm_type,
use_dcn=use_dcn,
- norm_name=reg_conv_name + '_norm',
bias_on=True,
- lr_scale=2.,
- name=reg_conv_name))
+ lr_scale=2.))
self.reg_subnet_convs.append(reg_conv)
def forward(self, fpn_feat):
@@ -113,12 +113,13 @@ class FCOSHead(nn.Layer):
"""
FCOSHead
Args:
- num_classes(int): Number of classes
- fpn_stride(list): The stride of each FPN Layer
- prior_prob(float): Used to set the bias init for the class prediction layer
- fcos_loss(object): Instance of 'FCOSLoss'
- norm_reg_targets(bool): Normalization the regression target if true
- centerness_on_reg(bool): The prediction of centerness on regression or clssification branch
+ fcos_feat (object): Instance of 'FCOSFeat'
+ num_classes (int): Number of classes
+ fpn_stride (list): The stride of each FPN Layer
+ prior_prob (float): Used to set the bias init for the class prediction layer
+ fcos_loss (object): Instance of 'FCOSLoss'
+        norm_reg_targets (bool): Normalize the regression targets if true
+        centerness_on_reg (bool): Whether to predict centerness on the regression or classification branch
"""
__inject__ = ['fcos_feat', 'fcos_loss']
__shared__ = ['num_classes']
@@ -199,7 +200,15 @@ class FCOSHead(nn.Layer):
scale_reg = self.add_sublayer(feat_name, ScaleReg())
self.scales_regs.append(scale_reg)
- def _compute_locatioins_by_level(self, fpn_stride, feature):
+ def _compute_locations_by_level(self, fpn_stride, feature):
+ """
+ Compute locations of anchor points of each FPN layer
+ Args:
+ fpn_stride (int): The stride of current FPN feature map
+ feature (Tensor): Tensor of current FPN feature map
+ Return:
+ Anchor points locations of current FPN feature map
+ """
shape_fm = paddle.shape(feature)
shape_fm.stop_gradient = True
h, w = shape_fm[2], shape_fm[3]
@@ -247,8 +256,7 @@ class FCOSHead(nn.Layer):
if not is_training:
locations_list = []
for fpn_stride, feature in zip(self.fpn_stride, fpn_feats):
- location = self._compute_locatioins_by_level(fpn_stride,
- feature)
+ location = self._compute_locations_by_level(fpn_stride, feature)
locations_list.append(location)
return locations_list, cls_logits_list, bboxes_reg_list, centerness_list
diff --git a/ppdet/modeling/heads/mask_head.py b/ppdet/modeling/heads/mask_head.py
index dc624ff838e8b9dcb66e024fcbf83fcdbb08cf4a..e5df8d234e1696456eca945a7a732437a1917106 100644
--- a/ppdet/modeling/heads/mask_head.py
+++ b/ppdet/modeling/heads/mask_head.py
@@ -27,18 +27,29 @@ from .roi_extractor import RoIAlign
@register
class MaskFeat(nn.Layer):
+ """
+ Feature extraction in Mask head
+
+ Args:
+ in_channel (int): Input channels
+ out_channel (int): Output channels
+ num_convs (int): The number of conv layers, default 4
+ norm_type (string | None): Norm type, bn, gn, sync_bn are available,
+ default None
+ """
+
def __init__(self,
+ in_channel=256,
+ out_channel=256,
num_convs=4,
- in_channels=256,
- out_channels=256,
norm_type=None):
super(MaskFeat, self).__init__()
self.num_convs = num_convs
- self.in_channels = in_channels
- self.out_channels = out_channels
+ self.in_channel = in_channel
+ self.out_channel = out_channel
self.norm_type = norm_type
- fan_conv = out_channels * 3 * 3
- fan_deconv = out_channels * 2 * 2
+ fan_conv = out_channel * 3 * 3
+ fan_deconv = out_channel * 2 * 2
mask_conv = nn.Sequential()
if norm_type == 'gn':
@@ -47,33 +58,30 @@ class MaskFeat(nn.Layer):
mask_conv.add_sublayer(
conv_name,
ConvNormLayer(
- ch_in=in_channels if i == 0 else out_channels,
- ch_out=out_channels,
+ ch_in=in_channel if i == 0 else out_channel,
+ ch_out=out_channel,
filter_size=3,
stride=1,
norm_type=self.norm_type,
- norm_name=conv_name + '_norm',
- initializer=KaimingNormal(fan_in=fan_conv),
- name=conv_name))
+ initializer=KaimingNormal(fan_in=fan_conv)))
mask_conv.add_sublayer(conv_name + 'act', nn.ReLU())
else:
for i in range(self.num_convs):
conv_name = 'mask_inter_feat_{}'.format(i + 1)
- mask_conv.add_sublayer(
- conv_name,
- nn.Conv2D(
- in_channels=in_channels if i == 0 else out_channels,
- out_channels=out_channels,
- kernel_size=3,
- padding=1,
- weight_attr=paddle.ParamAttr(
- initializer=KaimingNormal(fan_in=fan_conv))))
+ conv = nn.Conv2D(
+ in_channels=in_channel if i == 0 else out_channel,
+ out_channels=out_channel,
+ kernel_size=3,
+ padding=1,
+ weight_attr=paddle.ParamAttr(
+ initializer=KaimingNormal(fan_in=fan_conv)))
+ mask_conv.add_sublayer(conv_name, conv)
mask_conv.add_sublayer(conv_name + 'act', nn.ReLU())
mask_conv.add_sublayer(
'conv5_mask',
nn.Conv2DTranspose(
- in_channels=self.in_channels,
- out_channels=self.out_channels,
+ in_channels=self.in_channel,
+ out_channels=self.out_channel,
kernel_size=2,
stride=2,
weight_attr=paddle.ParamAttr(
@@ -85,10 +93,10 @@ class MaskFeat(nn.Layer):
def from_config(cls, cfg, input_shape):
if isinstance(input_shape, (list, tuple)):
input_shape = input_shape[0]
- return {'in_channels': input_shape.channels, }
+ return {'in_channel': input_shape.channels, }
- def out_channel(self):
- return self.out_channels
+ def out_channels(self):
+ return self.out_channel
def forward(self, feats):
return self.upsample(feats)
@@ -98,6 +106,18 @@ class MaskFeat(nn.Layer):
class MaskHead(nn.Layer):
__shared__ = ['num_classes']
__inject__ = ['mask_assigner']
+ """
+ RCNN mask head
+
+ Args:
+ head (nn.Layer): Extract feature in mask head
+ roi_extractor (object): The module of RoI Extractor
+ mask_assigner (object): The module of Mask Assigner,
+ label and sample the mask
+ num_classes (int): The number of classes
+ share_bbox_feat (bool): Whether to share the feature from bbox head,
+ default false
+ """
def __init__(self,
head,
@@ -112,7 +132,7 @@ class MaskHead(nn.Layer):
if isinstance(roi_extractor, dict):
self.roi_extractor = RoIAlign(**roi_extractor)
self.head = head
- self.in_channels = head.out_channel()
+ self.in_channels = head.out_channels()
self.mask_assigner = mask_assigner
self.share_bbox_feat = share_bbox_feat
self.bbox_head = None
@@ -159,7 +179,6 @@ class MaskHead(nn.Layer):
rois_num (Tensor): The number of proposals for each batch
inputs (dict): ground truth info
"""
- #assert self.bbox_head
tgt_labels, _, tgt_gt_inds = targets
rois, rois_num, tgt_classes, tgt_masks, mask_index, tgt_weights = self.mask_assigner(
rois, tgt_labels, tgt_gt_inds, inputs)
diff --git a/ppdet/modeling/heads/roi_extractor.py b/ppdet/modeling/heads/roi_extractor.py
index 1e2f658a7b1a67a9b86a3ca881177bc74ecde737..35c3924e36c60ddbc82f38f6b828197e31833b01 100644
--- a/ppdet/modeling/heads/roi_extractor.py
+++ b/ppdet/modeling/heads/roi_extractor.py
@@ -25,6 +25,31 @@ def _to_list(v):
@register
class RoIAlign(object):
+ """
+ RoI Align module
+
+    For more details, please refer to the documentation of roi_align
+    in ppdet/modeling/ops.py
+
+ Args:
+ resolution (int): The output size, default 14
+ spatial_scale (float): Multiplicative spatial scale factor to translate
+ ROI coords from their input scale to the scale used when pooling.
+ default 0.0625
+ sampling_ratio (int): The number of sampling points in the interpolation
+ grid, default 0
+        canconical_level (int): The level of the canonical FPN layer used
+            for RoI-to-level assignment. default 4
+        canonical_size (int): The canonical RoI size that maps to the
+            canonical level. default 224
+ start_level (int): The start level of FPN layer to extract RoI feature,
+ default 0
+ end_level (int): The end level of FPN layer to extract RoI feature,
+ default 3
+ aligned (bool): Whether to add offset to rois' coord in roi_align.
+ default false
+ """
+
def __init__(self,
resolution=14,
spatial_scale=0.0625,
diff --git a/ppdet/modeling/heads/rpn_head.py b/ppdet/modeling/heads/rpn_head.py
deleted file mode 100644
index 64f7acc495326d4edbbff389e5351f602e67f0de..0000000000000000000000000000000000000000
--- a/ppdet/modeling/heads/rpn_head.py
+++ /dev/null
@@ -1,115 +0,0 @@
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import paddle
-import paddle.nn as nn
-import paddle.nn.functional as F
-from paddle import ParamAttr
-from paddle.nn.initializer import Normal
-from paddle.regularizer import L2Decay
-from paddle.nn import Conv2D
-
-from ppdet.core.workspace import register
-from ppdet.modeling import ops
-
-
-@register
-class RPNFeat(nn.Layer):
- def __init__(self, feat_in=1024, feat_out=1024):
- super(RPNFeat, self).__init__()
- # rpn feat is shared with each level
- self.rpn_conv = Conv2D(
- in_channels=feat_in,
- out_channels=feat_out,
- kernel_size=3,
- padding=1,
- weight_attr=ParamAttr(initializer=Normal(
- mean=0., std=0.01)),
- bias_attr=ParamAttr(
- learning_rate=2., regularizer=L2Decay(0.)))
-
- def forward(self, inputs, feats):
- rpn_feats = []
- for feat in feats:
- rpn_feats.append(F.relu(self.rpn_conv(feat)))
- return rpn_feats
-
-
-@register
-class RPNHead(nn.Layer):
- __inject__ = ['rpn_feat']
-
- def __init__(self, rpn_feat, anchor_per_position=15, rpn_channel=1024):
- super(RPNHead, self).__init__()
- self.rpn_feat = rpn_feat
- if isinstance(rpn_feat, dict):
- self.rpn_feat = RPNFeat(**rpn_feat)
- # rpn head is shared with each level
- # rpn roi classification scores
- self.rpn_rois_score = Conv2D(
- in_channels=rpn_channel,
- out_channels=anchor_per_position,
- kernel_size=1,
- padding=0,
- weight_attr=ParamAttr(initializer=Normal(
- mean=0., std=0.01)),
- bias_attr=ParamAttr(
- learning_rate=2., regularizer=L2Decay(0.)))
-
- # rpn roi bbox regression deltas
- self.rpn_rois_delta = Conv2D(
- in_channels=rpn_channel,
- out_channels=4 * anchor_per_position,
- kernel_size=1,
- padding=0,
- weight_attr=ParamAttr(initializer=Normal(
- mean=0., std=0.01)),
- bias_attr=ParamAttr(
- learning_rate=2., regularizer=L2Decay(0.)))
-
- def forward(self, inputs, feats):
- rpn_feats = self.rpn_feat(inputs, feats)
- rpn_head_out = []
- for rpn_feat in rpn_feats:
- rrs = self.rpn_rois_score(rpn_feat)
- rrd = self.rpn_rois_delta(rpn_feat)
- rpn_head_out.append((rrs, rrd))
- return rpn_feats, rpn_head_out
-
- def get_loss(self, loss_inputs):
- # cls loss
- score_tgt = paddle.cast(
- x=loss_inputs['rpn_score_target'], dtype='float32')
- score_tgt.stop_gradient = True
- loss_rpn_cls = ops.sigmoid_cross_entropy_with_logits(
- input=loss_inputs['rpn_score_pred'], label=score_tgt)
- loss_rpn_cls = paddle.mean(loss_rpn_cls, name='loss_rpn_cls')
-
- # reg loss
- loc_tgt = paddle.cast(x=loss_inputs['rpn_rois_target'], dtype='float32')
- loc_tgt.stop_gradient = True
- loss_rpn_reg = ops.smooth_l1(
- input=loss_inputs['rpn_rois_pred'],
- label=loc_tgt,
- inside_weight=loss_inputs['rpn_rois_weight'],
- outside_weight=loss_inputs['rpn_rois_weight'],
- sigma=3.0, )
- loss_rpn_reg = paddle.sum(loss_rpn_reg)
- score_shape = paddle.shape(score_tgt)
- score_shape = paddle.cast(score_shape, dtype='float32')
- norm = paddle.prod(score_shape)
- norm.stop_gradient = True
- loss_rpn_reg = loss_rpn_reg / norm
-
- return {'loss_rpn_cls': loss_rpn_cls, 'loss_rpn_reg': loss_rpn_reg}
diff --git a/ppdet/modeling/heads/s2anet_head.py b/ppdet/modeling/heads/s2anet_head.py
new file mode 100644
index 0000000000000000000000000000000000000000..12e0c3144622d2cdb6c3f70f7cdf9d2c4c2483f9
--- /dev/null
+++ b/ppdet/modeling/heads/s2anet_head.py
@@ -0,0 +1,872 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import paddle
+from paddle import ParamAttr
+import paddle.nn as nn
+import paddle.nn.functional as F
+from paddle.nn.initializer import Normal, Constant
+from ppdet.core.workspace import register
+from ppdet.modeling import ops
+from ppdet.modeling import bbox_utils
+from ppdet.modeling.proposal_generator.target_layer import RBoxAssigner
+import numpy as np
+
+
+class S2ANetAnchorGenerator(object):
+ """
+    Generate base anchors and grid anchors for S2ANet with NumPy.
+ """
+
+ def __init__(self,
+ base_size=8,
+                 # scales/ratios must be sequences; gen_base_anchors slices them
+                 scales=[1.0],
+                 ratios=[1.0],
+ scale_major=True,
+ ctr=None):
+ self.base_size = base_size
+ self.scales = scales
+ self.ratios = ratios
+ self.scale_major = scale_major
+ self.ctr = ctr
+ self.base_anchors = self.gen_base_anchors()
+
+ @property
+ def num_base_anchors(self):
+ return self.base_anchors.shape[0]
+
+ def gen_base_anchors(self):
+ w = self.base_size
+ h = self.base_size
+ if self.ctr is None:
+ x_ctr = 0.5 * (w - 1)
+ y_ctr = 0.5 * (h - 1)
+ else:
+ x_ctr, y_ctr = self.ctr
+
+ h_ratios = np.sqrt(self.ratios)
+ w_ratios = 1 / h_ratios
+ if self.scale_major:
+ ws = (w * w_ratios[:] * self.scales[:]).reshape([-1])
+ hs = (h * h_ratios[:] * self.scales[:]).reshape([-1])
+ else:
+ ws = (w * self.scales[:] * w_ratios[:]).reshape([-1])
+ hs = (h * self.scales[:] * h_ratios[:]).reshape([-1])
+
+ # yapf: disable
+ base_anchors = np.stack(
+ [
+ x_ctr - 0.5 * (ws - 1), y_ctr - 0.5 * (hs - 1),
+ x_ctr + 0.5 * (ws - 1), y_ctr + 0.5 * (hs - 1)
+ ],
+ axis=-1)
+ base_anchors = np.round(base_anchors)
+ # yapf: enable
+
+ return base_anchors
+
+ def _meshgrid(self, x, y, row_major=True):
+ xx, yy = np.meshgrid(x, y)
+ xx = xx.reshape(-1)
+ yy = yy.reshape(-1)
+ if row_major:
+ return xx, yy
+ else:
+ return yy, xx
+
+ def grid_anchors(self, featmap_size, stride=16):
+        # featmap_size * stride maps the feature grid back onto the input image
+ base_anchors = self.base_anchors
+ feat_h, feat_w = featmap_size
+ shift_x = np.arange(0, feat_w, 1, 'int32') * stride
+ shift_y = np.arange(0, feat_h, 1, 'int32') * stride
+ shift_xx, shift_yy = self._meshgrid(shift_x, shift_y)
+ shifts = np.stack([shift_xx, shift_yy, shift_xx, shift_yy], axis=-1)
+        # add A base anchors (1, A, 4) to K shifts (K, 1, 4) to get shifted
+        # anchors of shape (K, A, 4); the first feat_w shifts walk the first
+        # row of the feature map
+        all_anchors = base_anchors[None, :, :] + shifts[:, None, :]
+        return all_anchors
+
+ def valid_flags(self, featmap_size, valid_size):
+ feat_h, feat_w = featmap_size
+ valid_h, valid_w = valid_size
+ assert valid_h <= feat_h and valid_w <= feat_w
+ valid_x = np.zeros([feat_w], dtype='uint8')
+ valid_y = np.zeros([feat_h], dtype='uint8')
+ valid_x[:valid_w] = 1
+ valid_y[:valid_h] = 1
+ valid_xx, valid_yy = self._meshgrid(valid_x, valid_y)
+ valid = valid_xx & valid_yy
+ valid = valid.reshape([-1])
+ return valid
+
+
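+# A minimal usage sketch (illustrative; the head below drives the generator
+# with its configured, list-valued scales/ratios):
+#
+#     gen = S2ANetAnchorGenerator(base_size=8, scales=[4], ratios=[1.0])
+#     anchors = gen.grid_anchors((64, 64), stride=8)  # (64*64, num_base, 4)
+#     rboxes = bbox_utils.rect2rbox(anchors.reshape(-1, 4))  # (xc,yc,w,h,angle)
+#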
+class AlignConv(nn.Layer):
+ def __init__(self, in_channels, out_channels, kernel_size=3, groups=1):
+ super(AlignConv, self).__init__()
+ self.kernel_size = kernel_size
+ self.align_conv = paddle.vision.ops.DeformConv2D(
+ in_channels,
+ out_channels,
+ kernel_size=self.kernel_size,
+ padding=(self.kernel_size - 1) // 2,
+ groups=groups,
+ weight_attr=ParamAttr(initializer=Normal(0, 0.01)),
+ bias_attr=None)
+
+ @paddle.no_grad()
+ def get_offset(self, anchors, featmap_size, stride):
+ """
+        Args:
+            anchors (Tensor): [M, 5] rotated anchors as (xc, yc, w, h, angle)
+            featmap_size (tuple): (feat_h, feat_w) of the current level
+            stride (int): stride of the current level, e.g. 8
+        Returns:
+            offset (Tensor): [1, kernel_size * kernel_size * 2, feat_h, feat_w]
+                offset field for the deformable conv: the gap between the
+                sampling grid aligned to each rotated anchor and the default
+                conv grid
+        """
+ anchors = paddle.reshape(anchors, [-1, 5]) # (NA,5)
+ dtype = anchors.dtype
+ feat_h, feat_w = featmap_size
+ pad = (self.kernel_size - 1) // 2
+ idx = paddle.arange(-pad, pad + 1, dtype=dtype)
+
+ yy, xx = paddle.meshgrid(idx, idx)
+ xx = paddle.reshape(xx, [-1])
+ yy = paddle.reshape(yy, [-1])
+
+ # get sampling locations of default conv
+ xc = paddle.arange(0, feat_w, dtype=dtype)
+ yc = paddle.arange(0, feat_h, dtype=dtype)
+ yc, xc = paddle.meshgrid(yc, xc)
+
+ xc = paddle.reshape(xc, [-1, 1])
+ yc = paddle.reshape(yc, [-1, 1])
+ x_conv = xc + xx
+ y_conv = yc + yy
+
+ # get sampling locations of anchors
+ x_ctr = anchors[:, 0]
+ y_ctr = anchors[:, 1]
+ w = anchors[:, 2]
+ h = anchors[:, 3]
+ a = anchors[:, 4]
+
+ x_ctr = paddle.reshape(x_ctr, [x_ctr.shape[0], 1])
+ y_ctr = paddle.reshape(y_ctr, [y_ctr.shape[0], 1])
+ w = paddle.reshape(w, [w.shape[0], 1])
+ h = paddle.reshape(h, [h.shape[0], 1])
+ a = paddle.reshape(a, [a.shape[0], 1])
+
+ x_ctr = x_ctr / stride
+ y_ctr = y_ctr / stride
+ w_s = w / stride
+ h_s = h / stride
+ cos, sin = paddle.cos(a), paddle.sin(a)
+ dw, dh = w_s / self.kernel_size, h_s / self.kernel_size
+ x, y = dw * xx, dh * yy
+ xr = cos * x - sin * y
+ yr = sin * x + cos * y
+ x_anchor, y_anchor = xr + x_ctr, yr + y_ctr
+        # get offset field
+ offset_x = x_anchor - x_conv
+ offset_y = y_anchor - y_conv
+        # the (x, y) order of anchors is the reverse of the image coordinate
+        # order, so stack the offsets as (y, x) rather than (x, y)
+        offset = paddle.stack([offset_y, offset_x], axis=-1)
+        # [NA, ks*ks, 2] --> [NA, ks*ks*2]
+        offset = paddle.reshape(offset, [offset.shape[0], -1])
+        # [NA, ks*ks*2] --> [ks*ks*2, NA]
+        offset = paddle.transpose(offset, [1, 0])
+        # [ks*ks*2, NA] --> [1, ks*ks*2, H, W]
+        offset = paddle.reshape(offset, [1, -1, feat_h, feat_w])
+ return offset
+
+ def forward(self, x, refine_anchors, stride):
+ featmap_size = (x.shape[2], x.shape[3])
+ offset = self.get_offset(refine_anchors, featmap_size, stride)
+ x = F.relu(self.align_conv(x, offset))
+ return x
+
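+# How the head below drives AlignConv (shapes as produced in forward():
+# NCHW features, refined rotated anchors per level):
+#
+#     align = AlignConv(in_channels=256, out_channels=256, kernel_size=3)
+#     align_feat = align(feat, refine_anchor.clone(), stride)
+#
+# get_offset() rotates and rescales the default 3x3 sampling grid to each
+# refined anchor, so the deformable conv samples features aligned with the
+# rotated box instead of the axis-aligned grid.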
+
+@register
+class S2ANetHead(nn.Layer):
+ """
+ S2Anet head
+ Args:
+ stacked_convs (int): number of stacked_convs
+ feat_in (int): input channels of feat
+ feat_out (int): output channels of feat
+ num_classes (int): num_classes
+ anchor_strides (list): stride of anchors
+ anchor_scales (list): scale of anchors
+ anchor_ratios (list): ratios of anchors
+ target_means (list): target_means
+ target_stds (list): target_stds
+ align_conv_type (str): align_conv_type ['Conv', 'AlignConv']
+ align_conv_size (int): kernel size of align_conv
+ use_sigmoid_cls (bool): use sigmoid_cls or not
+ reg_loss_weight (list): loss weight for regression
+ """
+ __shared__ = ['num_classes']
+ __inject__ = ['anchor_assign']
+
+ def __init__(self,
+ stacked_convs=2,
+ feat_in=256,
+ feat_out=256,
+ num_classes=15,
+ anchor_strides=[8, 16, 32, 64, 128],
+ anchor_scales=[4],
+ anchor_ratios=[1.0],
+ target_means=(.0, .0, .0, .0, .0),
+ target_stds=(1.0, 1.0, 1.0, 1.0, 1.0),
+ align_conv_type='AlignConv',
+ align_conv_size=3,
+ use_sigmoid_cls=True,
+ anchor_assign=RBoxAssigner().__dict__,
+ reg_loss_weight=[1.0, 1.0, 1.0, 1.0, 1.0]):
+ super(S2ANetHead, self).__init__()
+ self.stacked_convs = stacked_convs
+ self.feat_in = feat_in
+ self.feat_out = feat_out
+ self.anchor_list = None
+ self.anchor_scales = anchor_scales
+ self.anchor_ratios = anchor_ratios
+ self.anchor_strides = anchor_strides
+ self.anchor_base_sizes = list(anchor_strides)
+ self.target_means = target_means
+ self.target_stds = target_stds
+ assert align_conv_type in ['AlignConv', 'Conv', 'DCN']
+ self.align_conv_type = align_conv_type
+ self.align_conv_size = align_conv_size
+
+ self.use_sigmoid_cls = use_sigmoid_cls
+ self.cls_out_channels = num_classes if self.use_sigmoid_cls else 1
+ self.sampling = False
+ self.anchor_assign = anchor_assign
+ self.reg_loss_weight = reg_loss_weight
+
+ self.s2anet_head_out = None
+
+ # anchor
+ self.anchor_generators = []
+ for anchor_base in self.anchor_base_sizes:
+ self.anchor_generators.append(
+ S2ANetAnchorGenerator(anchor_base, anchor_scales,
+ anchor_ratios))
+
+ self.fam_cls_convs = nn.Sequential()
+ self.fam_reg_convs = nn.Sequential()
+
+ for i in range(self.stacked_convs):
+ chan_in = self.feat_in if i == 0 else self.feat_out
+
+ self.fam_cls_convs.add_sublayer(
+ 'fam_cls_conv_{}'.format(i),
+ nn.Conv2D(
+ in_channels=chan_in,
+ out_channels=self.feat_out,
+ kernel_size=3,
+ padding=1,
+ weight_attr=ParamAttr(initializer=Normal(0.0, 0.01)),
+ bias_attr=ParamAttr(initializer=Constant(0))))
+
+ self.fam_cls_convs.add_sublayer('fam_cls_conv_{}_act'.format(i),
+ nn.ReLU())
+
+ self.fam_reg_convs.add_sublayer(
+ 'fam_reg_conv_{}'.format(i),
+ nn.Conv2D(
+ in_channels=chan_in,
+ out_channels=self.feat_out,
+ kernel_size=3,
+ padding=1,
+ weight_attr=ParamAttr(initializer=Normal(0.0, 0.01)),
+ bias_attr=ParamAttr(initializer=Constant(0))))
+
+ self.fam_reg_convs.add_sublayer('fam_reg_conv_{}_act'.format(i),
+ nn.ReLU())
+
+ self.fam_reg = nn.Conv2D(
+ self.feat_out,
+ 5,
+ 1,
+ weight_attr=ParamAttr(initializer=Normal(0.0, 0.01)),
+ bias_attr=ParamAttr(initializer=Constant(0)))
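+        # focal-loss style prior: initialize the classification bias b so
+        # that sigmoid(b) = prior_prob, which keeps early training stable
+        # when nearly all anchors are background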
+ prior_prob = 0.01
+ bias_init = float(-np.log((1 - prior_prob) / prior_prob))
+ self.fam_cls = nn.Conv2D(
+ self.feat_out,
+ self.cls_out_channels,
+ 1,
+ weight_attr=ParamAttr(initializer=Normal(0.0, 0.01)),
+ bias_attr=ParamAttr(initializer=Constant(bias_init)))
+
+ if self.align_conv_type == "AlignConv":
+ self.align_conv = AlignConv(self.feat_out, self.feat_out,
+ self.align_conv_size)
+ elif self.align_conv_type == "Conv":
+ self.align_conv = nn.Conv2D(
+ self.feat_out,
+ self.feat_out,
+ self.align_conv_size,
+ padding=(self.align_conv_size - 1) // 2,
+ bias_attr=ParamAttr(initializer=Constant(0)))
+
+ elif self.align_conv_type == "DCN":
+ self.align_conv_offset = nn.Conv2D(
+ self.feat_out,
+ 2 * self.align_conv_size**2,
+ 1,
+ weight_attr=ParamAttr(initializer=Normal(0.0, 0.01)),
+ bias_attr=ParamAttr(initializer=Constant(0)))
+
+ self.align_conv = paddle.vision.ops.DeformConv2D(
+ self.feat_out,
+ self.feat_out,
+ self.align_conv_size,
+ padding=(self.align_conv_size - 1) // 2,
+ weight_attr=ParamAttr(initializer=Normal(0.0, 0.01)),
+ bias_attr=False)
+
+ self.or_conv = nn.Conv2D(
+ self.feat_out,
+ self.feat_out,
+ kernel_size=3,
+ padding=1,
+ weight_attr=ParamAttr(initializer=Normal(0.0, 0.01)),
+ bias_attr=ParamAttr(initializer=Constant(0)))
+
+ # ODM
+ self.odm_cls_convs = nn.Sequential()
+ self.odm_reg_convs = nn.Sequential()
+
+ for i in range(self.stacked_convs):
+ ch_in = self.feat_out
+
+ self.odm_cls_convs.add_sublayer(
+ 'odm_cls_conv_{}'.format(i),
+ nn.Conv2D(
+ in_channels=ch_in,
+ out_channels=self.feat_out,
+ kernel_size=3,
+ stride=1,
+ padding=1,
+ weight_attr=ParamAttr(initializer=Normal(0.0, 0.01)),
+ bias_attr=ParamAttr(initializer=Constant(0))))
+
+ self.odm_cls_convs.add_sublayer('odm_cls_conv_{}_act'.format(i),
+ nn.ReLU())
+
+ self.odm_reg_convs.add_sublayer(
+ 'odm_reg_conv_{}'.format(i),
+ nn.Conv2D(
+ in_channels=self.feat_out,
+ out_channels=self.feat_out,
+ kernel_size=3,
+ stride=1,
+ padding=1,
+ weight_attr=ParamAttr(initializer=Normal(0.0, 0.01)),
+ bias_attr=ParamAttr(initializer=Constant(0))))
+
+ self.odm_reg_convs.add_sublayer('odm_reg_conv_{}_act'.format(i),
+ nn.ReLU())
+
+ self.odm_cls = nn.Conv2D(
+ self.feat_out,
+ self.cls_out_channels,
+ 3,
+ padding=1,
+ weight_attr=ParamAttr(initializer=Normal(0.0, 0.01)),
+ bias_attr=ParamAttr(initializer=Constant(bias_init)))
+ self.odm_reg = nn.Conv2D(
+ self.feat_out,
+ 5,
+ 3,
+ padding=1,
+ weight_attr=ParamAttr(initializer=Normal(0.0, 0.01)),
+ bias_attr=ParamAttr(initializer=Constant(0)))
+
+        self.featmap_sizes = dict()
+        self.base_anchors = dict()
+ self.refine_anchor_list = []
+
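+    # forward() runs four stages per FPN level:
+    #   1. FAM: stacked convs produce fam_cls / fam_reg predictions;
+    #   2. anchor refinement: fam_reg (detached) is decoded against the
+    #      level's initial rotated anchors to get refine_anchor;
+    #   3. feature alignment: AlignConv / DCN / plain conv, conditioned on
+    #      refine_anchor in the AlignConv case;
+    #   4. ODM: stacked convs on the aligned feature produce odm_cls /
+    #      odm_reg predictions.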
+ def forward(self, feats):
+ fam_reg_branch_list = []
+ fam_cls_branch_list = []
+
+ odm_reg_branch_list = []
+ odm_cls_branch_list = []
+
+ self.featmap_sizes = dict()
+ self.base_anchors = dict()
+ self.refine_anchor_list = []
+
+ for i, feat in enumerate(feats):
+ fam_cls_feat = self.fam_cls_convs(feat)
+
+ fam_cls = self.fam_cls(fam_cls_feat)
+ # [N, CLS, H, W] --> [N, H, W, CLS]
+ fam_cls = fam_cls.transpose([0, 2, 3, 1])
+ fam_cls_reshape = paddle.reshape(
+ fam_cls, [fam_cls.shape[0], -1, self.cls_out_channels])
+ fam_cls_branch_list.append(fam_cls_reshape)
+
+ fam_reg_feat = self.fam_reg_convs(feat)
+
+ fam_reg = self.fam_reg(fam_reg_feat)
+ # [N, 5, H, W] --> [N, H, W, 5]
+ fam_reg = fam_reg.transpose([0, 2, 3, 1])
+ fam_reg_reshape = paddle.reshape(fam_reg, [fam_reg.shape[0], -1, 5])
+ fam_reg_branch_list.append(fam_reg_reshape)
+
+ # prepare anchor
+ featmap_size = feat.shape[-2:]
+ self.featmap_sizes[i] = featmap_size
+ init_anchors = self.anchor_generators[i].grid_anchors(
+ featmap_size, self.anchor_strides[i])
+
+ init_anchors = bbox_utils.rect2rbox(init_anchors)
+ self.base_anchors[(i, featmap_size[0])] = init_anchors
+
+            # decode refined anchors from the detached FAM output so anchor
+            # refinement does not backpropagate into the FAM branch
+ refine_anchor = bbox_utils.bbox_decode(
+ fam_reg.detach(), init_anchors, self.target_means,
+ self.target_stds)
+
+ self.refine_anchor_list.append(refine_anchor)
+
+ if self.align_conv_type == 'AlignConv':
+ align_feat = self.align_conv(feat,
+ refine_anchor.clone(),
+ self.anchor_strides[i])
+ elif self.align_conv_type == 'DCN':
+ align_offset = self.align_conv_offset(feat)
+ align_feat = self.align_conv(feat, align_offset)
+ elif self.align_conv_type == 'Conv':
+ align_feat = self.align_conv(feat)
+
+ or_feat = self.or_conv(align_feat)
+ odm_reg_feat = or_feat
+ odm_cls_feat = or_feat
+
+ odm_reg_feat = self.odm_reg_convs(odm_reg_feat)
+ odm_cls_feat = self.odm_cls_convs(odm_cls_feat)
+
+ odm_cls_score = self.odm_cls(odm_cls_feat)
+ # [N, CLS, H, W] --> [N, H, W, CLS]
+ odm_cls_score = odm_cls_score.transpose([0, 2, 3, 1])
+ odm_cls_score_reshape = paddle.reshape(
+ odm_cls_score,
+ [odm_cls_score.shape[0], -1, self.cls_out_channels])
+
+ odm_cls_branch_list.append(odm_cls_score_reshape)
+
+ odm_bbox_pred = self.odm_reg(odm_reg_feat)
+ # [N, 5, H, W] --> [N, H, W, 5]
+ odm_bbox_pred = odm_bbox_pred.transpose([0, 2, 3, 1])
+ odm_bbox_pred_reshape = paddle.reshape(
+ odm_bbox_pred, [odm_bbox_pred.shape[0], -1, 5])
+ odm_reg_branch_list.append(odm_bbox_pred_reshape)
+
+ self.s2anet_head_out = (fam_cls_branch_list, fam_reg_branch_list,
+ odm_cls_branch_list, odm_reg_branch_list)
+ return self.s2anet_head_out
+
+ def get_prediction(self, nms_pre):
+ refine_anchors = self.refine_anchor_list
+ fam_cls_branch_list, fam_reg_branch_list, odm_cls_branch_list, odm_reg_branch_list = self.s2anet_head_out
+ pred_scores, pred_bboxes = self.get_bboxes(
+ odm_cls_branch_list,
+ odm_reg_branch_list,
+ refine_anchors,
+ nms_pre,
+ cls_out_channels=self.cls_out_channels,
+ use_sigmoid_cls=self.use_sigmoid_cls)
+ return pred_scores, pred_bboxes
+
+ def smooth_l1_loss(self, pred, label, delta=1.0 / 9.0):
+ """
+        Args:
+            pred (Tensor): predicted regression deltas
+            label (Tensor): target regression deltas, same shape as pred
+            delta (float): transition point between the quadratic and linear
+                branches of the loss
+        Returns:
+            loss (Tensor): elementwise smooth-L1 loss
+        """
+ assert pred.shape == label.shape and label.numel() > 0
+ assert delta > 0
+ diff = paddle.abs(pred - label)
+ loss = paddle.where(diff < delta, 0.5 * diff * diff / delta,
+ diff - 0.5 * delta)
+ return loss
+
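+    # smooth_l1_loss above is the standard piecewise form
+    #     loss(d) = 0.5 * d^2 / delta    if |d| < delta
+    #             = |d| - 0.5 * delta    otherwise
+    # e.g. with delta = 1/9: d = 0.05 gives 0.5 * 0.05^2 * 9 = 0.01125, while
+    # d = 1.0 gives 1.0 - 0.5 / 9 ≈ 0.944.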
+ def get_fam_loss(self, fam_target, s2anet_head_out):
+ (labels, label_weights, bbox_targets, bbox_weights, pos_inds,
+ neg_inds) = fam_target
+ fam_cls_branch_list, fam_reg_branch_list, odm_cls_branch_list, odm_reg_branch_list = s2anet_head_out
+
+ fam_cls_losses = []
+ fam_bbox_losses = []
+ st_idx = 0
+        featmap_sizes = list(self.featmap_sizes.values())
+ num_total_samples = len(pos_inds) + len(
+ neg_inds) if self.sampling else len(pos_inds)
+ num_total_samples = max(1, num_total_samples)
+
+ for idx, feat_size in enumerate(featmap_sizes):
+ feat_anchor_num = feat_size[0] * feat_size[1]
+
+ # step1: get data
+ feat_labels = labels[st_idx:st_idx + feat_anchor_num]
+ feat_label_weights = label_weights[st_idx:st_idx + feat_anchor_num]
+
+ feat_bbox_targets = bbox_targets[st_idx:st_idx + feat_anchor_num, :]
+ feat_bbox_weights = bbox_weights[st_idx:st_idx + feat_anchor_num, :]
+ st_idx += feat_anchor_num
+
+ # step2: calc cls loss
+ feat_labels = feat_labels.reshape(-1)
+ feat_label_weights = feat_label_weights.reshape(-1)
+
+ fam_cls_score = fam_cls_branch_list[idx]
+ fam_cls_score = paddle.squeeze(fam_cls_score, axis=0)
+
+            # gt classes are 0~14 in the data and shifted by +1 upstream, so
+            # label 0 means background; one-hot over num_classes + 1 columns,
+            # then drop the background column for sigmoid focal loss
+ feat_labels = paddle.to_tensor(feat_labels)
+ feat_labels_one_hot = paddle.nn.functional.one_hot(
+ feat_labels, self.cls_out_channels + 1)
+ feat_labels_one_hot = feat_labels_one_hot[:, 1:]
+ feat_labels_one_hot.stop_gradient = True
+
+ num_total_samples = paddle.to_tensor(
+ num_total_samples, dtype='float32', stop_gradient=True)
+
+ fam_cls = F.sigmoid_focal_loss(
+                fam_cls_score,
+ feat_labels_one_hot,
+ normalizer=num_total_samples,
+ reduction='none')
+
+ feat_label_weights = feat_label_weights.reshape(
+ feat_label_weights.shape[0], 1)
+ feat_label_weights = np.repeat(
+ feat_label_weights, self.cls_out_channels, axis=1)
+ feat_label_weights = paddle.to_tensor(
+ feat_label_weights, stop_gradient=True)
+
+ fam_cls = fam_cls * feat_label_weights
+ fam_cls_total = paddle.sum(fam_cls)
+ fam_cls_losses.append(fam_cls_total)
+
+            # step3: regression loss
+            feat_bbox_targets = paddle.to_tensor(
+                feat_bbox_targets, dtype='float32', stop_gradient=True)
+            feat_bbox_targets = paddle.reshape(feat_bbox_targets, [-1, 5])
+
+            fam_bbox_pred = fam_reg_branch_list[idx]
+ fam_bbox_pred = paddle.squeeze(fam_bbox_pred, axis=0)
+ fam_bbox_pred = paddle.reshape(fam_bbox_pred, [-1, 5])
+ fam_bbox = self.smooth_l1_loss(fam_bbox_pred, feat_bbox_targets)
+ loss_weight = paddle.to_tensor(
+ self.reg_loss_weight, dtype='float32', stop_gradient=True)
+ fam_bbox = paddle.multiply(fam_bbox, loss_weight)
+ feat_bbox_weights = paddle.to_tensor(
+ feat_bbox_weights, stop_gradient=True)
+ fam_bbox = fam_bbox * feat_bbox_weights
+ fam_bbox_total = paddle.sum(fam_bbox) / num_total_samples
+
+ fam_bbox_losses.append(fam_bbox_total)
+
+ fam_cls_loss = paddle.add_n(fam_cls_losses)
+ fam_cls_loss = fam_cls_loss * 2.0
+ fam_reg_loss = paddle.add_n(fam_bbox_losses)
+ return fam_cls_loss, fam_reg_loss
+
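+    # get_odm_loss below mirrors get_fam_loss step for step; the only
+    # difference is that it consumes the ODM branch outputs.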
+ def get_odm_loss(self, odm_target, s2anet_head_out):
+ (labels, label_weights, bbox_targets, bbox_weights, pos_inds,
+ neg_inds) = odm_target
+ fam_cls_branch_list, fam_reg_branch_list, odm_cls_branch_list, odm_reg_branch_list = s2anet_head_out
+
+ odm_cls_losses = []
+ odm_bbox_losses = []
+ st_idx = 0
+        featmap_sizes = list(self.featmap_sizes.values())
+ num_total_samples = len(pos_inds) + len(
+ neg_inds) if self.sampling else len(pos_inds)
+ num_total_samples = max(1, num_total_samples)
+ for idx, feat_size in enumerate(featmap_sizes):
+ feat_anchor_num = feat_size[0] * feat_size[1]
+
+ # step1: get data
+ feat_labels = labels[st_idx:st_idx + feat_anchor_num]
+ feat_label_weights = label_weights[st_idx:st_idx + feat_anchor_num]
+
+ feat_bbox_targets = bbox_targets[st_idx:st_idx + feat_anchor_num, :]
+ feat_bbox_weights = bbox_weights[st_idx:st_idx + feat_anchor_num, :]
+ st_idx += feat_anchor_num
+
+ # step2: calc cls loss
+ feat_labels = feat_labels.reshape(-1)
+ feat_label_weights = feat_label_weights.reshape(-1)
+
+ odm_cls_score = odm_cls_branch_list[idx]
+ odm_cls_score = paddle.squeeze(odm_cls_score, axis=0)
+
+            # gt classes are 0~14 in the data and shifted by +1 upstream, so
+            # label 0 means background; one-hot over num_classes + 1 columns,
+            # then drop the background column for sigmoid focal loss
+ feat_labels = paddle.to_tensor(feat_labels)
+ feat_labels_one_hot = paddle.nn.functional.one_hot(
+ feat_labels, self.cls_out_channels + 1)
+ feat_labels_one_hot = feat_labels_one_hot[:, 1:]
+ feat_labels_one_hot.stop_gradient = True
+
+ num_total_samples = paddle.to_tensor(
+ num_total_samples, dtype='float32', stop_gradient=True)
+ odm_cls = F.sigmoid_focal_loss(
+                odm_cls_score,
+ feat_labels_one_hot,
+ normalizer=num_total_samples,
+ reduction='none')
+
+ feat_label_weights = feat_label_weights.reshape(
+ feat_label_weights.shape[0], 1)
+ feat_label_weights = np.repeat(
+ feat_label_weights, self.cls_out_channels, axis=1)
+ feat_label_weights = paddle.to_tensor(feat_label_weights)
+ feat_label_weights.stop_gradient = True
+
+ odm_cls = odm_cls * feat_label_weights
+ odm_cls_total = paddle.sum(odm_cls)
+ odm_cls_losses.append(odm_cls_total)
+
+            # step3: regression loss
+ feat_bbox_targets = paddle.to_tensor(
+ feat_bbox_targets, dtype='float32')
+ feat_bbox_targets = paddle.reshape(feat_bbox_targets, [-1, 5])
+ feat_bbox_targets.stop_gradient = True
+
+ odm_bbox_pred = odm_reg_branch_list[idx]
+ odm_bbox_pred = paddle.squeeze(odm_bbox_pred, axis=0)
+ odm_bbox_pred = paddle.reshape(odm_bbox_pred, [-1, 5])
+ odm_bbox = self.smooth_l1_loss(odm_bbox_pred, feat_bbox_targets)
+ loss_weight = paddle.to_tensor(
+ self.reg_loss_weight, dtype='float32', stop_gradient=True)
+ odm_bbox = paddle.multiply(odm_bbox, loss_weight)
+ feat_bbox_weights = paddle.to_tensor(
+ feat_bbox_weights, stop_gradient=True)
+ odm_bbox = odm_bbox * feat_bbox_weights
+ odm_bbox_total = paddle.sum(odm_bbox) / num_total_samples
+ odm_bbox_losses.append(odm_bbox_total)
+
+ odm_cls_loss = paddle.add_n(odm_cls_losses)
+ odm_cls_loss = odm_cls_loss * 2.0
+ odm_reg_loss = paddle.add_n(odm_bbox_losses)
+ return odm_cls_loss, odm_reg_loss
+
+ def get_loss(self, inputs):
+        # inputs: im_id, image, im_shape, scale_factor, gt_rbox, gt_class, is_crowd
+
+ # compute loss
+ fam_cls_loss_lst = []
+ fam_reg_loss_lst = []
+ odm_cls_loss_lst = []
+ odm_reg_loss_lst = []
+
+ im_shape = inputs['im_shape']
+ for im_id in range(im_shape.shape[0]):
+ np_im_shape = inputs['im_shape'][im_id].numpy()
+ np_scale_factor = inputs['scale_factor'][im_id].numpy()
+ # data_format: (xc, yc, w, h, theta)
+ gt_bboxes = inputs['gt_rbox'][im_id].numpy()
+ gt_labels = inputs['gt_class'][im_id].numpy()
+ is_crowd = inputs['is_crowd'][im_id].numpy()
+ gt_labels = gt_labels + 1
+
+            featmap_sizes = list(self.featmap_sizes.values())
+ anchors_list, valid_flag_list = self.get_init_anchors(featmap_sizes,
+ np_im_shape)
+ anchors_list_all = []
+            for anchor in anchors_list:
+ anchor = anchor.reshape(-1, 4)
+ anchor = bbox_utils.rect2rbox(anchor)
+ anchors_list_all.extend(anchor)
+ anchors_list_all = np.array(anchors_list_all)
+
+ # get im_feat
+ fam_cls_feats_list = [e[im_id] for e in self.s2anet_head_out[0]]
+ fam_reg_feats_list = [e[im_id] for e in self.s2anet_head_out[1]]
+ odm_cls_feats_list = [e[im_id] for e in self.s2anet_head_out[2]]
+ odm_reg_feats_list = [e[im_id] for e in self.s2anet_head_out[3]]
+ im_s2anet_head_out = (fam_cls_feats_list, fam_reg_feats_list,
+ odm_cls_feats_list, odm_reg_feats_list)
+
+ # FAM
+ im_fam_target = self.anchor_assign(anchors_list_all, gt_bboxes,
+ gt_labels, is_crowd)
+ if im_fam_target is not None:
+ im_fam_cls_loss, im_fam_reg_loss = self.get_fam_loss(
+ im_fam_target, im_s2anet_head_out)
+ fam_cls_loss_lst.append(im_fam_cls_loss)
+ fam_reg_loss_lst.append(im_fam_reg_loss)
+
+ # ODM
+ refine_anchors_list, valid_flag_list = self.get_refine_anchors(
+ featmap_sizes, image_shape=np_im_shape)
+ refine_anchors_list = np.array(refine_anchors_list)
+ im_odm_target = self.anchor_assign(refine_anchors_list, gt_bboxes,
+ gt_labels, is_crowd)
+
+ if im_odm_target is not None:
+ im_odm_cls_loss, im_odm_reg_loss = self.get_odm_loss(
+ im_odm_target, im_s2anet_head_out)
+ odm_cls_loss_lst.append(im_odm_cls_loss)
+ odm_reg_loss_lst.append(im_odm_reg_loss)
+ fam_cls_loss = paddle.add_n(fam_cls_loss_lst)
+ fam_reg_loss = paddle.add_n(fam_reg_loss_lst)
+ odm_cls_loss = paddle.add_n(odm_cls_loss_lst)
+ odm_reg_loss = paddle.add_n(odm_reg_loss_lst)
+ return {
+ 'fam_cls_loss': fam_cls_loss,
+ 'fam_reg_loss': fam_reg_loss,
+ 'odm_cls_loss': odm_cls_loss,
+ 'odm_reg_loss': odm_reg_loss
+ }
+
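+    # The four loss terms are returned separately; the overall objective is
+    # their sum (a sketch, assuming the trainer reduces losses by summation):
+    #
+    #     losses = head.get_loss(inputs)
+    #     total_loss = paddle.add_n(list(losses.values()))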
+ def get_init_anchors(self, featmap_sizes, image_shape):
+ """Get anchors according to feature map sizes.
+
+ Args:
+ featmap_sizes (list[tuple]): Multi-level feature map sizes.
+            image_shape (list|ndarray): input image shape as (h, w).
+ Returns:
+ tuple: anchors of each image, valid flags of each image
+ """
+ num_levels = len(featmap_sizes)
+
+ # since feature map sizes of all images are the same, we only compute
+ # anchors for one time
+ anchor_list = []
+ for i in range(num_levels):
+ anchors = self.anchor_generators[i].grid_anchors(
+ featmap_sizes[i], self.anchor_strides[i])
+ anchor_list.append(anchors)
+
+ # for each image, we compute valid flags of multi level anchors
+ valid_flag_list = []
+ for i in range(num_levels):
+ anchor_stride = self.anchor_strides[i]
+ feat_h, feat_w = featmap_sizes[i]
+ h, w = image_shape
+ valid_feat_h = min(int(np.ceil(h / anchor_stride)), feat_h)
+ valid_feat_w = min(int(np.ceil(w / anchor_stride)), feat_w)
+ flags = self.anchor_generators[i].valid_flags(
+ (feat_h, feat_w), (valid_feat_h, valid_feat_w))
+ valid_flag_list.append(flags)
+
+ return anchor_list, valid_flag_list
+
+    def get_refine_anchors(self, featmap_sizes, image_shape):
+        """Collect the refined anchors produced in forward() and compute
+        their valid flags, mirroring get_init_anchors."""
+        num_levels = len(featmap_sizes)
+
+ refine_anchors_list = []
+ for i in range(num_levels):
+ refine_anchor = self.refine_anchor_list[i]
+ refine_anchor = paddle.squeeze(refine_anchor, axis=0)
+ refine_anchor = refine_anchor.numpy()
+ refine_anchor = np.reshape(refine_anchor,
+ [-1, refine_anchor.shape[-1]])
+ refine_anchors_list.extend(refine_anchor)
+
+ # for each image, we compute valid flags of multi level anchors
+ valid_flag_list = []
+ for i in range(num_levels):
+ anchor_stride = self.anchor_strides[i]
+ feat_h, feat_w = featmap_sizes[i]
+ h, w = image_shape
+ valid_feat_h = min(int(np.ceil(h / anchor_stride)), feat_h)
+ valid_feat_w = min(int(np.ceil(w / anchor_stride)), feat_w)
+ flags = self.anchor_generators[i].valid_flags(
+ (feat_h, feat_w), (valid_feat_h, valid_feat_w))
+ valid_flag_list.append(flags)
+
+ return refine_anchors_list, valid_flag_list
+
+ def get_bboxes(self, cls_score_list, bbox_pred_list, mlvl_anchors, nms_pre,
+ cls_out_channels, use_sigmoid_cls):
+ assert len(cls_score_list) == len(bbox_pred_list) == len(mlvl_anchors)
+
+ mlvl_bboxes = []
+ mlvl_scores = []
+
+ for cls_score, bbox_pred, anchors in zip(cls_score_list, bbox_pred_list,
+ mlvl_anchors):
+ cls_score = paddle.reshape(cls_score, [-1, cls_out_channels])
+ if use_sigmoid_cls:
+ scores = F.sigmoid(cls_score)
+ else:
+ scores = F.softmax(cls_score, axis=-1)
+
+ bbox_pred = paddle.transpose(bbox_pred, [1, 2, 0])
+ bbox_pred = paddle.reshape(bbox_pred, [-1, 5])
+ anchors = paddle.reshape(anchors, [-1, 5])
+
+ if nms_pre > 0 and scores.shape[0] > nms_pre:
+ # Get maximum scores for foreground classes.
+ if use_sigmoid_cls:
+ max_scores = paddle.max(scores, axis=1)
+ else:
+ max_scores = paddle.max(scores[:, 1:], axis=1)
+
+ topk_val, topk_inds = paddle.topk(max_scores, nms_pre)
+ anchors = paddle.gather(anchors, topk_inds)
+ bbox_pred = paddle.gather(bbox_pred, topk_inds)
+ scores = paddle.gather(scores, topk_inds)
+
+            # NOTE: decoding assumes unit means/stds here, matching this
+            # head's defaults rather than reading self.target_means/stds
+            target_means = (.0, .0, .0, .0, .0)
+            target_stds = (1.0, 1.0, 1.0, 1.0, 1.0)
+ bboxes = bbox_utils.delta2rbox(anchors, bbox_pred, target_means,
+ target_stds)
+ mlvl_bboxes.append(bboxes)
+ mlvl_scores.append(scores)
+
+ mlvl_bboxes = paddle.concat(mlvl_bboxes, axis=0)
+ mlvl_scores = paddle.concat(mlvl_scores)
+ if use_sigmoid_cls:
+ # Add a dummy background class to the front when using sigmoid
+ padding = paddle.zeros(
+ [mlvl_scores.shape[0], 1], dtype=mlvl_scores.dtype)
+ mlvl_scores = paddle.concat([padding, mlvl_scores], axis=1)
+
+ return mlvl_scores, mlvl_bboxes
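+
+
+# Consumer-side sketch (assumed wiring; the actual post-processing lives in
+# the S2ANet architecture / post-process modules). With use_sigmoid_cls=True
+# the scores come back with a dummy background column prepended, so a
+# multiclass rotated NMS can treat column 0 as background:
+#
+#     scores, bboxes = head.get_prediction(nms_pre=2000)
+#     # scores: [M, 1 + num_classes]; bboxes: [M, 5] as (xc, yc, w, h, angle)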
diff --git a/ppdet/modeling/heads/solov2_head.py b/ppdet/modeling/heads/solov2_head.py
index d24b0b029fc3a5a15ee4831451c918f42b2a88f6..5f15461fa7fac5b2b8ba2b642fc8082fdaa15e53 100644
--- a/ppdet/modeling/heads/solov2_head.py
+++ b/ppdet/modeling/heads/solov2_head.py
@@ -75,9 +75,7 @@ class SOLOv2MaskHead(nn.Layer):
ch_out=self.mid_channels,
filter_size=3,
stride=1,
- norm_type='gn',
- norm_name=conv_feat_name + '.conv' + str(i) + '.gn',
- name=conv_feat_name + '.conv' + str(i)))
+ norm_type='gn'))
self.add_sublayer('conv_pre_feat' + str(i), conv_pre_feat)
self.convs_all_levels.append(conv_pre_feat)
else:
@@ -94,9 +92,7 @@ class SOLOv2MaskHead(nn.Layer):
ch_out=self.mid_channels,
filter_size=3,
stride=1,
- norm_type='gn',
- norm_name=conv_feat_name + '.conv' + str(j) + '.gn',
- name=conv_feat_name + '.conv' + str(j)))
+ norm_type='gn'))
conv_pre_feat.add_sublayer(
conv_feat_name + '.conv' + str(j) + 'act', nn.ReLU())
conv_pre_feat.add_sublayer(
@@ -114,9 +110,7 @@ class SOLOv2MaskHead(nn.Layer):
ch_out=self.out_channels,
filter_size=1,
stride=1,
- norm_type='gn',
- norm_name=conv_pred_name + '.gn',
- name=conv_pred_name))
+ norm_type='gn'))
def forward(self, inputs):
"""
@@ -216,9 +210,7 @@ class SOLOv2Head(nn.Layer):
ch_out=self.seg_feat_channels,
filter_size=3,
stride=1,
- norm_type='gn',
- norm_name='bbox_head.kernel_convs.{}.gn'.format(i),
- name='bbox_head.kernel_convs.{}'.format(i)))
+ norm_type='gn'))
self.kernel_pred_convs.append(kernel_conv)
ch_in = self.in_channels if i == 0 else self.seg_feat_channels
cate_conv = self.add_sublayer(
@@ -228,9 +220,7 @@ class SOLOv2Head(nn.Layer):
ch_out=self.seg_feat_channels,
filter_size=3,
stride=1,
- norm_type='gn',
- norm_name='bbox_head.cate_convs.{}.gn'.format(i),
- name='bbox_head.cate_convs.{}'.format(i)))
+ norm_type='gn'))
self.cate_pred_convs.append(cate_conv)
self.solo_kernel = self.add_sublayer(
@@ -241,11 +231,9 @@ class SOLOv2Head(nn.Layer):
kernel_size=3,
stride=1,
padding=1,
- weight_attr=ParamAttr(
- name="bbox_head.solo_kernel.weight",
- initializer=Normal(
- mean=0., std=0.01)),
- bias_attr=ParamAttr(name="bbox_head.solo_kernel.bias")))
+ weight_attr=ParamAttr(initializer=Normal(
+ mean=0., std=0.01)),
+ bias_attr=True))
self.solo_cate = self.add_sublayer(
'bbox_head.solo_cate',
nn.Conv2D(
@@ -254,14 +242,10 @@ class SOLOv2Head(nn.Layer):
kernel_size=3,
stride=1,
padding=1,
- weight_attr=ParamAttr(
- name="bbox_head.solo_cate.weight",
- initializer=Normal(
- mean=0., std=0.01)),
- bias_attr=ParamAttr(
- name="bbox_head.solo_cate.bias",
- initializer=Constant(
- value=float(-np.log((1 - 0.01) / 0.01))))))
+ weight_attr=ParamAttr(initializer=Normal(
+ mean=0., std=0.01)),
+ bias_attr=ParamAttr(initializer=Constant(
+ value=float(-np.log((1 - 0.01) / 0.01))))))
def _points_nms(self, heat, kernel_size=2):
hmax = F.max_pool2d(heat, kernel_size=kernel_size, stride=1, padding=1)
diff --git a/ppdet/modeling/heads/ssd_head.py b/ppdet/modeling/heads/ssd_head.py
index a280c01432049c82396f301736cf197006fdbc80..96ed5e424d659f96778b66fe95b2c799a1dfb92f 100644
--- a/ppdet/modeling/heads/ssd_head.py
+++ b/ppdet/modeling/heads/ssd_head.py
@@ -28,8 +28,7 @@ class SepConvLayer(nn.Layer):
out_channels,
kernel_size=3,
padding=1,
- conv_decay=0,
- name=None):
+ conv_decay=0):
super(SepConvLayer, self).__init__()
self.dw_conv = nn.Conv2D(
in_channels=in_channels,
@@ -38,16 +37,13 @@ class SepConvLayer(nn.Layer):
stride=1,
padding=padding,
groups=in_channels,
- weight_attr=ParamAttr(
- name=name + "_dw_weights", regularizer=L2Decay(conv_decay)),
+ weight_attr=ParamAttr(regularizer=L2Decay(conv_decay)),
bias_attr=False)
self.bn = nn.BatchNorm2D(
in_channels,
- weight_attr=ParamAttr(
- name=name + "_bn_scale", regularizer=L2Decay(0.)),
- bias_attr=ParamAttr(
- name=name + "_bn_offset", regularizer=L2Decay(0.)))
+ weight_attr=ParamAttr(regularizer=L2Decay(0.)),
+ bias_attr=ParamAttr(regularizer=L2Decay(0.)))
self.pw_conv = nn.Conv2D(
in_channels=in_channels,
@@ -55,8 +51,7 @@ class SepConvLayer(nn.Layer):
kernel_size=1,
stride=1,
padding=0,
- weight_attr=ParamAttr(
- name=name + "_pw_weights", regularizer=L2Decay(conv_decay)),
+ weight_attr=ParamAttr(regularizer=L2Decay(conv_decay)),
bias_attr=False)
def forward(self, x):
@@ -68,6 +63,20 @@ class SepConvLayer(nn.Layer):
@register
class SSDHead(nn.Layer):
+ """
+ SSDHead
+
+ Args:
+ num_classes (int): Number of classes
+ in_channels (list): Number of channels per input feature
+ anchor_generator (dict): Configuration of 'AnchorGeneratorSSD' instance
+ kernel_size (int): Conv kernel size
+ padding (int): Conv padding
+ use_sepconv (bool): Use SepConvLayer if true
+ conv_decay (float): Conv regularization coeff
+ loss (object): 'SSDLoss' instance
+ """
+
__shared__ = ['num_classes']
__inject__ = ['anchor_generator', 'loss']
@@ -111,8 +120,7 @@ class SSDHead(nn.Layer):
out_channels=num_prior * 4,
kernel_size=kernel_size,
padding=padding,
- conv_decay=conv_decay,
- name=box_conv_name))
+ conv_decay=conv_decay))
self.box_convs.append(box_conv)
score_conv_name = "scores{}".format(i)
@@ -132,8 +140,7 @@ class SSDHead(nn.Layer):
out_channels=num_prior * self.num_classes,
kernel_size=kernel_size,
padding=padding,
- conv_decay=conv_decay,
- name=score_conv_name))
+ conv_decay=conv_decay))
self.score_convs.append(score_conv)
@classmethod
diff --git a/ppdet/modeling/heads/ttf_head.py b/ppdet/modeling/heads/ttf_head.py
index 632fd06306ec515deec3a8355dd422600c960ebc..9e2eb6add8c4d0e4c7ea9a19a654d9d67de07e78 100644
--- a/ppdet/modeling/heads/ttf_head.py
+++ b/ppdet/modeling/heads/ttf_head.py
@@ -19,43 +19,82 @@ from paddle import ParamAttr
from paddle.nn.initializer import Constant, Uniform, Normal
from paddle.regularizer import L2Decay
from ppdet.core.workspace import register
+from ppdet.modeling.layers import DeformableConvV2, LiteConv
import numpy as np
@register
class HMHead(nn.Layer):
+ """
+ Args:
+ ch_in (int): The channel number of input Tensor.
+ ch_out (int): The channel number of output Tensor.
+ num_classes (int): Number of classes.
+ conv_num (int): The convolution number of hm_feat.
+        dcn_head (bool): whether to use deformable conv in the head,
+            False by default.
+        lite_head (bool): whether to use the lite version, False by default.
+        norm_type (str): normalization type, one of 'sync_bn', 'bn' or 'gn',
+            'bn' by default.
- __shared__ = ['num_classes']
+ Return:
+ Heatmap head output
+ """
+ __shared__ = ['num_classes', 'norm_type']
- def __init__(self, ch_in, ch_out=128, num_classes=80, conv_num=2):
+ def __init__(
+ self,
+ ch_in,
+ ch_out=128,
+ num_classes=80,
+ conv_num=2,
+ dcn_head=False,
+ lite_head=False,
+ norm_type='bn', ):
super(HMHead, self).__init__()
head_conv = nn.Sequential()
for i in range(conv_num):
name = 'conv.{}'.format(i)
- head_conv.add_sublayer(
- name,
- nn.Conv2D(
- in_channels=ch_in if i == 0 else ch_out,
- out_channels=ch_out,
- kernel_size=3,
- padding=1,
- weight_attr=ParamAttr(initializer=Normal(0, 0.01)),
- bias_attr=ParamAttr(
- learning_rate=2., regularizer=L2Decay(0.))))
- head_conv.add_sublayer(name + '.act', nn.ReLU())
- self.feat = self.add_sublayer('hm_feat', head_conv)
+ if lite_head:
+ lite_name = 'hm.' + name
+ head_conv.add_sublayer(
+ lite_name,
+ LiteConv(
+ in_channels=ch_in if i == 0 else ch_out,
+ out_channels=ch_out,
+ norm_type=norm_type))
+ head_conv.add_sublayer(lite_name + '.act', nn.ReLU6())
+ else:
+ if dcn_head:
+ head_conv.add_sublayer(
+ name,
+ DeformableConvV2(
+ in_channels=ch_in if i == 0 else ch_out,
+ out_channels=ch_out,
+ kernel_size=3,
+ weight_attr=ParamAttr(initializer=Normal(0, 0.01))))
+ else:
+ head_conv.add_sublayer(
+ name,
+ nn.Conv2D(
+ in_channels=ch_in if i == 0 else ch_out,
+ out_channels=ch_out,
+ kernel_size=3,
+ padding=1,
+ weight_attr=ParamAttr(initializer=Normal(0, 0.01)),
+ bias_attr=ParamAttr(
+ learning_rate=2., regularizer=L2Decay(0.))))
+ head_conv.add_sublayer(name + '.act', nn.ReLU())
+ self.feat = head_conv
bias_init = float(-np.log((1 - 0.01) / 0.01))
- self.head = self.add_sublayer(
- 'hm_head',
- nn.Conv2D(
- in_channels=ch_out,
- out_channels=num_classes,
- kernel_size=1,
- weight_attr=ParamAttr(initializer=Normal(0, 0.01)),
- bias_attr=ParamAttr(
- learning_rate=2.,
- regularizer=L2Decay(0.),
- initializer=Constant(bias_init))))
+ self.head = nn.Conv2D(
+ in_channels=ch_out,
+ out_channels=num_classes,
+ kernel_size=1,
+ weight_attr=ParamAttr(initializer=Normal(0, 0.01)),
+ bias_attr=ParamAttr(
+ learning_rate=2.,
+ regularizer=L2Decay(0.),
+ initializer=Constant(bias_init)))
def forward(self, feat):
out = self.feat(feat)
@@ -65,32 +104,70 @@ class HMHead(nn.Layer):
@register
class WHHead(nn.Layer):
- def __init__(self, ch_in, ch_out=64, conv_num=2):
+ """
+ Args:
+ ch_in (int): The channel number of input Tensor.
+ ch_out (int): The channel number of output Tensor.
+ conv_num (int): The convolution number of wh_feat.
+        dcn_head (bool): whether to use deformable conv in the head,
+            False by default.
+        lite_head (bool): whether to use the lite version, False by default.
+        norm_type (str): normalization type, one of 'sync_bn', 'bn' or 'gn',
+            'bn' by default.
+ Return:
+ Width & Height head output
+ """
+ __shared__ = ['norm_type']
+
+ def __init__(self,
+ ch_in,
+ ch_out=64,
+ conv_num=2,
+ dcn_head=False,
+ lite_head=False,
+ norm_type='bn'):
super(WHHead, self).__init__()
head_conv = nn.Sequential()
for i in range(conv_num):
name = 'conv.{}'.format(i)
- head_conv.add_sublayer(
- name,
- nn.Conv2D(
- in_channels=ch_in if i == 0 else ch_out,
- out_channels=ch_out,
- kernel_size=3,
- padding=1,
- weight_attr=ParamAttr(initializer=Normal(0, 0.001)),
- bias_attr=ParamAttr(
- learning_rate=2., regularizer=L2Decay(0.))))
- head_conv.add_sublayer(name + '.act', nn.ReLU())
- self.feat = self.add_sublayer('wh_feat', head_conv)
- self.head = self.add_sublayer(
- 'wh_head',
- nn.Conv2D(
- in_channels=ch_out,
- out_channels=4,
- kernel_size=1,
- weight_attr=ParamAttr(initializer=Normal(0, 0.001)),
- bias_attr=ParamAttr(
- learning_rate=2., regularizer=L2Decay(0.))))
+ if lite_head:
+ lite_name = 'wh.' + name
+ head_conv.add_sublayer(
+ lite_name,
+ LiteConv(
+ in_channels=ch_in if i == 0 else ch_out,
+ out_channels=ch_out,
+ norm_type=norm_type))
+ head_conv.add_sublayer(lite_name + '.act', nn.ReLU6())
+ else:
+ if dcn_head:
+ head_conv.add_sublayer(
+ name,
+ DeformableConvV2(
+ in_channels=ch_in if i == 0 else ch_out,
+ out_channels=ch_out,
+ kernel_size=3,
+ weight_attr=ParamAttr(initializer=Normal(0, 0.01))))
+ else:
+ head_conv.add_sublayer(
+ name,
+ nn.Conv2D(
+ in_channels=ch_in if i == 0 else ch_out,
+ out_channels=ch_out,
+ kernel_size=3,
+ padding=1,
+ weight_attr=ParamAttr(initializer=Normal(0, 0.01)),
+ bias_attr=ParamAttr(
+ learning_rate=2., regularizer=L2Decay(0.))))
+ head_conv.add_sublayer(name + '.act', nn.ReLU())
+
+ self.feat = head_conv
+ self.head = nn.Conv2D(
+ in_channels=ch_out,
+ out_channels=4,
+ kernel_size=1,
+ weight_attr=ParamAttr(initializer=Normal(0, 0.001)),
+ bias_attr=ParamAttr(
+ learning_rate=2., regularizer=L2Decay(0.)))
def forward(self, feat):
out = self.feat(feat)
@@ -104,20 +181,28 @@ class TTFHead(nn.Layer):
"""
TTFHead
Args:
- in_channels(int): the channel number of input to TTFHead.
- num_classes(int): the number of classes, 80 by default.
- hm_head_planes(int): the channel number in wh head, 128 by default.
- wh_head_planes(int): the channel number in wh head, 64 by default.
- hm_head_conv_num(int): the number of convolution in wh head, 2 by default.
- wh_head_conv_num(int): the number of convolution in wh head, 2 by default.
- hm_loss(object): Instance of 'CTFocalLoss'.
- wh_loss(object): Instance of 'GIoULoss'.
- wh_offset_base(flaot): the base offset of width and height, 16. by default.
- down_ratio(int): the actual down_ratio is calculated by base_down_ratio(default 16)
- and the number of upsample layers.
+ in_channels (int): the channel number of input to TTFHead.
+ num_classes (int): the number of classes, 80 by default.
+ hm_head_planes (int): the channel number in heatmap head,
+ 128 by default.
+ wh_head_planes (int): the channel number in width & height head,
+ 64 by default.
+ hm_head_conv_num (int): the number of convolution in heatmap head,
+ 2 by default.
+ wh_head_conv_num (int): the number of convolution in width & height
+ head, 2 by default.
+ hm_loss (object): Instance of 'CTFocalLoss'.
+ wh_loss (object): Instance of 'GIoULoss'.
+ wh_offset_base (float): the base offset of width and height,
+ 16.0 by default.
+ down_ratio (int): the actual down_ratio is calculated by base_down_ratio
+ (default 16) and the number of upsample layers.
+        dcn_head (bool): whether to use deformable conv in the head,
+            False by default.
+        lite_head (bool): whether to use the lite version, False by default.
+        norm_type (str): normalization type, one of 'sync_bn', 'bn' or 'gn',
+            'bn' by default.
"""
- __shared__ = ['num_classes', 'down_ratio']
+ __shared__ = ['num_classes', 'down_ratio', 'norm_type']
__inject__ = ['hm_loss', 'wh_loss']
def __init__(self,
@@ -130,12 +215,16 @@ class TTFHead(nn.Layer):
hm_loss='CTFocalLoss',
wh_loss='GIoULoss',
wh_offset_base=16.,
- down_ratio=4):
+ down_ratio=4,
+ dcn_head=False,
+ lite_head=False,
+ norm_type='bn'):
super(TTFHead, self).__init__()
self.in_channels = in_channels
self.hm_head = HMHead(in_channels, hm_head_planes, num_classes,
- hm_head_conv_num)
- self.wh_head = WHHead(in_channels, wh_head_planes, wh_head_conv_num)
+ hm_head_conv_num, dcn_head, lite_head, norm_type)
+ self.wh_head = WHHead(in_channels, wh_head_planes, wh_head_conv_num,
+ dcn_head, lite_head, norm_type)
self.hm_loss = hm_loss
self.wh_loss = wh_loss
@@ -154,6 +243,9 @@ class TTFHead(nn.Layer):
return hm, wh
def filter_box_by_weight(self, pred, target, weight):
+ """
+ Filter out boxes where ttf_reg_weight is 0, only keep positive samples.
+ """
index = paddle.nonzero(weight > 0)
index.stop_gradient = True
weight = paddle.gather_nd(weight, index)
diff --git a/ppdet/modeling/heads/yolo_head.py b/ppdet/modeling/heads/yolo_head.py
index 723bf4fc6e541021a3d0f7c3a782f843b4272fff..a0817747f68c04743afc6e7da20d1485a0fcc196 100644
--- a/ppdet/modeling/heads/yolo_head.py
+++ b/ppdet/modeling/heads/yolo_head.py
@@ -4,7 +4,6 @@ import paddle.nn.functional as F
from paddle import ParamAttr
from paddle.regularizer import L2Decay
from ppdet.core.workspace import register
-from ..backbones.darknet import ConvBNLayer
def _de_sigmoid(x, eps=1e-7):
@@ -20,6 +19,7 @@ class YOLOv3Head(nn.Layer):
__inject__ = ['loss']
def __init__(self,
+ in_channels=[1024, 512, 256],
anchors=[[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
[59, 119], [116, 90], [156, 198], [373, 326]],
anchor_masks=[[6, 7, 8], [3, 4, 5], [0, 1, 2]],
@@ -28,7 +28,21 @@ class YOLOv3Head(nn.Layer):
iou_aware=False,
iou_aware_factor=0.4,
data_format='NCHW'):
+ """
+ Head for YOLOv3 network
+
+        Args:
+            in_channels (list): number of input channels per scale
+            num_classes (int): number of foreground classes
+            anchors (list): all anchor sizes as [w, h] pairs
+            anchor_masks (list): anchor indices used by each output scale
+            loss (object): YOLOv3Loss instance
+            iou_aware (bool): whether to use the IoU-aware branch
+            iou_aware_factor (float): IoU-aware factor
+            data_format (str): data format, NCHW or NHWC
+ """
super(YOLOv3Head, self).__init__()
+        assert len(in_channels) > 0, "in_channels length should be greater than 0"
+ self.in_channels = in_channels
self.num_classes = num_classes
self.loss = loss
@@ -47,18 +61,15 @@ class YOLOv3Head(nn.Layer):
else:
num_filters = len(self.anchors[i]) * (self.num_classes + 5)
name = 'yolo_output.{}'.format(i)
- yolo_output = self.add_sublayer(
- name,
- nn.Conv2D(
- in_channels=128 * (2**self.num_outputs) // (2**i),
- out_channels=num_filters,
- kernel_size=1,
- stride=1,
- padding=0,
- data_format=data_format,
- weight_attr=ParamAttr(name=name + '.conv.weights'),
- bias_attr=ParamAttr(
- name=name + '.conv.bias', regularizer=L2Decay(0.))))
+ conv = nn.Conv2D(
+ in_channels=self.in_channels[i],
+ out_channels=num_filters,
+ kernel_size=1,
+ stride=1,
+ padding=0,
+ data_format=data_format,
+ bias_attr=ParamAttr(regularizer=L2Decay(0.)))
+ yolo_output = self.add_sublayer(name, conv)
self.yolo_outputs.append(yolo_output)
def parse_anchor(self, anchors, anchor_masks):
@@ -106,3 +117,7 @@ class YOLOv3Head(nn.Layer):
return y
else:
return yolo_outputs
+
+ @classmethod
+ def from_config(cls, cfg, input_shape):
+ return {'in_channels': [i.channels for i in input_shape], }
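+
+
+# from_config lets the config system infer in_channels from upstream output
+# shapes instead of hard-coding them (a sketch, assuming the neck exposes an
+# out_shape list of ShapeSpec-like objects carrying a .channels field):
+#
+#     kwargs = YOLOv3Head.from_config(cfg, neck.out_shape)
+#     head = YOLOv3Head(loss=yolo_loss, **kwargs)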
diff --git a/ppdet/modeling/layers.py b/ppdet/modeling/layers.py
index a35bc6273f6aa4512282cbb86d5110dc5b90ae33..5877b5f37566e9f2e58213e785e56bdea9d330f9 100644
--- a/ppdet/modeling/layers.py
+++ b/ppdet/modeling/layers.py
@@ -23,7 +23,7 @@ from paddle import ParamAttr
from paddle import to_tensor
from paddle.nn import Conv2D, BatchNorm2D, GroupNorm
import paddle.nn.functional as F
-from paddle.nn.initializer import Normal, Constant
+from paddle.nn.initializer import Normal, Constant, XavierUniform
from paddle.regularizer import L2Decay
from ppdet.core.workspace import register, serializable
@@ -51,37 +51,30 @@ class DeformableConvV2(nn.Layer):
weight_attr=None,
bias_attr=None,
lr_scale=1,
- regularizer=None,
- name=None):
+ regularizer=None):
super(DeformableConvV2, self).__init__()
self.offset_channel = 2 * kernel_size**2
self.mask_channel = kernel_size**2
if lr_scale == 1 and regularizer is None:
- offset_bias_attr = ParamAttr(
- initializer=Constant(0.),
- name='{}._conv_offset.bias'.format(name))
+ offset_bias_attr = ParamAttr(initializer=Constant(0.))
else:
offset_bias_attr = ParamAttr(
initializer=Constant(0.),
learning_rate=lr_scale,
- regularizer=regularizer,
- name='{}._conv_offset.bias'.format(name))
+ regularizer=regularizer)
self.conv_offset = nn.Conv2D(
in_channels,
3 * kernel_size**2,
kernel_size,
stride=stride,
padding=(kernel_size - 1) // 2,
- weight_attr=ParamAttr(
- initializer=Constant(0.0),
- name='{}._conv_offset.weight'.format(name)),
+ weight_attr=ParamAttr(initializer=Constant(0.0)),
bias_attr=offset_bias_attr)
if bias_attr:
# in FCOS-DCN head, specifically need learning_rate and regularizer
dcn_bias_attr = ParamAttr(
- name=name + "_bias",
initializer=Constant(value=0),
regularizer=L2Decay(0.),
learning_rate=2.)
@@ -116,25 +109,22 @@ class ConvNormLayer(nn.Layer):
ch_out,
filter_size,
stride,
+ groups=1,
norm_type='bn',
norm_decay=0.,
norm_groups=32,
use_dcn=False,
- norm_name=None,
bias_on=False,
lr_scale=1.,
freeze_norm=False,
initializer=Normal(
- mean=0., std=0.01),
- name=None):
+ mean=0., std=0.01)):
super(ConvNormLayer, self).__init__()
assert norm_type in ['bn', 'sync_bn', 'gn']
if bias_on:
bias_attr = ParamAttr(
- name=name + "_bias",
- initializer=Constant(value=0.),
- learning_rate=lr_scale)
+ initializer=Constant(value=0.), learning_rate=lr_scale)
else:
bias_attr = False
@@ -145,11 +135,9 @@ class ConvNormLayer(nn.Layer):
kernel_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
- groups=1,
+ groups=groups,
weight_attr=ParamAttr(
- name=name + "_weight",
- initializer=initializer,
- learning_rate=1.),
+ initializer=initializer, learning_rate=1.),
bias_attr=bias_attr)
else:
# in FCOS-DCN head, specifically need learning_rate and regularizer
@@ -159,25 +147,18 @@ class ConvNormLayer(nn.Layer):
kernel_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
- groups=1,
+ groups=groups,
weight_attr=ParamAttr(
- name=name + "_weight",
- initializer=initializer,
- learning_rate=1.),
+ initializer=initializer, learning_rate=1.),
bias_attr=True,
lr_scale=2.,
- regularizer=L2Decay(norm_decay),
- name=name)
+ regularizer=L2Decay(norm_decay))
norm_lr = 0. if freeze_norm else 1.
param_attr = ParamAttr(
- name=norm_name + "_scale",
- learning_rate=norm_lr,
- regularizer=L2Decay(norm_decay))
+ learning_rate=norm_lr, regularizer=L2Decay(norm_decay))
bias_attr = ParamAttr(
- name=norm_name + "_offset",
- learning_rate=norm_lr,
- regularizer=L2Decay(norm_decay))
+ learning_rate=norm_lr, regularizer=L2Decay(norm_decay))
if norm_type == 'bn':
self.norm = nn.BatchNorm2D(
ch_out, weight_attr=param_attr, bias_attr=bias_attr)
@@ -197,6 +178,63 @@ class ConvNormLayer(nn.Layer):
return out
+class LiteConv(nn.Layer):
+ def __init__(self,
+ in_channels,
+ out_channels,
+ stride=1,
+ with_act=True,
+ norm_type='sync_bn',
+ name=None):
+ super(LiteConv, self).__init__()
+ self.lite_conv = nn.Sequential()
+ conv1 = ConvNormLayer(
+ in_channels,
+ in_channels,
+ filter_size=5,
+ stride=stride,
+ groups=in_channels,
+ norm_type=norm_type,
+ initializer=XavierUniform())
+ conv2 = ConvNormLayer(
+ in_channels,
+ out_channels,
+ filter_size=1,
+ stride=stride,
+ norm_type=norm_type,
+ initializer=XavierUniform())
+ conv3 = ConvNormLayer(
+ out_channels,
+ out_channels,
+ filter_size=1,
+ stride=stride,
+ norm_type=norm_type,
+ initializer=XavierUniform())
+ conv4 = ConvNormLayer(
+ out_channels,
+ out_channels,
+ filter_size=5,
+ stride=stride,
+ groups=out_channels,
+ norm_type=norm_type,
+ initializer=XavierUniform())
+ self.lite_conv.add_sublayer('conv1', conv1)
+ self.lite_conv.add_sublayer('relu6_1', nn.ReLU6())
+ self.lite_conv.add_sublayer('conv2', conv2)
+ if with_act:
+ self.lite_conv.add_sublayer('relu6_2', nn.ReLU6())
+ self.lite_conv.add_sublayer('conv3', conv3)
+ self.lite_conv.add_sublayer('relu6_3', nn.ReLU6())
+ self.lite_conv.add_sublayer('conv4', conv4)
+ if with_act:
+ self.lite_conv.add_sublayer('relu6_4', nn.ReLU6())
+
+ def forward(self, inputs):
+ out = self.lite_conv(inputs)
+ return out
+
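+# LiteConv stacks depthwise 5x5 -> pointwise 1x1 -> pointwise 1x1 ->
+# depthwise 5x5 ConvNormLayers with ReLU6 activations (the activations after
+# conv2 and conv4 are dropped when with_act=False): a depthwise-separable
+# sandwich that keeps a large receptive field at a fraction of the FLOPs of
+# a dense 5x5 conv.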
+
@register
@serializable
class AnchorGeneratorRPN(object):
@@ -616,20 +654,20 @@ class AnchorGrid(object):
@register
@serializable
class FCOSBox(object):
- __shared__ = ['num_classes', 'batch_size']
+ __shared__ = ['num_classes']
- def __init__(self, num_classes=80, batch_size=1):
+ def __init__(self, num_classes=80):
super(FCOSBox, self).__init__()
self.num_classes = num_classes
- self.batch_size = batch_size
def _merge_hw(self, inputs, ch_type="channel_first"):
"""
+ Merge h and w of the feature map into one dimension.
Args:
- inputs (Variables): Feature map whose H and W will be merged into one dimension
- ch_type (str): channel_first / channel_last
+ inputs (Tensor): Tensor of the input feature map
+ ch_type (str): "channel_first" or "channel_last" style
Return:
- new_shape (Variables): The new shape after h and w merged into one dimension
+        new_shape (Tensor): The new shape after h and w are merged
"""
shape_ = paddle.shape(inputs)
bs, ch, hi, wi = shape_[0], shape_[1], shape_[2], shape_[3]
@@ -647,16 +685,18 @@ class FCOSBox(object):
def _postprocessing_by_level(self, locations, box_cls, box_reg, box_ctn,
scale_factor):
"""
+ Postprocess each layer of the output with corresponding locations.
Args:
- locations (Variables): anchor points for current layer, [H*W, 2]
- box_cls (Variables): categories prediction, [N, C, H, W], C is the number of classes
- box_reg (Variables): bounding box prediction, [N, 4, H, W]
- box_ctn (Variables): centerness prediction, [N, 1, H, W]
- scale_factor (Variables): [h_scale, w_scale] for input images
+ locations (Tensor): anchor points for current layer, [H*W, 2]
+ box_cls (Tensor): categories prediction, [N, C, H, W],
+ C is the number of classes
+ box_reg (Tensor): bounding box prediction, [N, 4, H, W]
+ box_ctn (Tensor): centerness prediction, [N, 1, H, W]
+ scale_factor (Tensor): [h_scale, w_scale] for input images
Return:
- box_cls_ch_last (Variables): score for each category, in [N, C, M]
+ box_cls_ch_last (Tensor): score for each category, in [N, C, M]
C is the number of classes and M is the number of anchor points
- box_reg_decoding (Variables): decoded bounding box, in [N, M, 4]
+ box_reg_decoding (Tensor): decoded bounding box, in [N, M, 4]
last dimension is [x1, y1, x2, y2]
"""
act_shape_cls = self._merge_hw(box_cls)
@@ -712,12 +752,18 @@ class TTFBox(object):
self.down_ratio = down_ratio
def _simple_nms(self, heat, kernel=3):
+ """
+        Keep only local peaks: max-pooling suppresses any score that is
+        not the maximum of its neighborhood.
+ """
pad = (kernel - 1) // 2
hmax = F.max_pool2d(heat, kernel, stride=1, padding=pad)
keep = paddle.cast(hmax == heat, 'float32')
return heat * keep
def _topk(self, scores):
+ """
+ Select top k scores and decode to get xy coordinates.
+ """
k = self.max_per_img
shape_fm = paddle.shape(scores)
shape_fm.stop_gradient = True
diff --git a/ppdet/modeling/losses/fcos_loss.py b/ppdet/modeling/losses/fcos_loss.py
index 350011accd0dce47177f744665791afa9eccb5a0..201786c9a559206fcbf8176d025b72a5334b0237 100644
--- a/ppdet/modeling/losses/fcos_loss.py
+++ b/ppdet/modeling/losses/fcos_loss.py
@@ -20,6 +20,7 @@ import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from ppdet.core.workspace import register
+from ppdet.modeling import ops
INF = 1e8
__all__ = ['FCOSLoss']
@@ -44,19 +45,6 @@ def flatten_tensor(inputs, channel_first=False):
return output_channel_last
-def sigmoid_cross_entropy_with_logits_loss(inputs,
- label,
- ignore_index=-100,
- normalize=False):
- output = F.binary_cross_entropy_with_logits(inputs, label, reduction='none')
- mask_tensor = paddle.cast(label != ignore_index, 'float32')
- output = paddle.multiply(output, mask_tensor)
- if normalize:
- sum_valid_mask = paddle.sum(mask_tensor)
- output = output / sum_valid_mask
- return output
-
-
@register
class FCOSLoss(nn.Layer):
"""
@@ -226,8 +214,8 @@ class FCOSLoss(nn.Layer):
# 3. centerness: sigmoid_cross_entropy_with_logits_loss
centerness_flatten = paddle.squeeze(centerness_flatten, axis=-1)
- ctn_loss = sigmoid_cross_entropy_with_logits_loss(centerness_flatten,
- tag_center_flatten)
+ ctn_loss = ops.sigmoid_cross_entropy_with_logits(centerness_flatten,
+ tag_center_flatten)
ctn_loss = ctn_loss * mask_positive_float / num_positive_fp32
loss_all = {
diff --git a/ppdet/modeling/losses/iou_aware_loss.py b/ppdet/modeling/losses/iou_aware_loss.py
index 2cc6f2a2c4077558c93ec55baa11f3d12a2e8476..1e6aa8bf068354cddca8b2dd7c89f8a2be7f5ab6 100644
--- a/ppdet/modeling/losses/iou_aware_loss.py
+++ b/ppdet/modeling/losses/iou_aware_loss.py
@@ -20,7 +20,7 @@ import paddle
import paddle.nn.functional as F
from ppdet.core.workspace import register, serializable
from .iou_loss import IouLoss
-from ..utils import xywh2xyxy, bbox_iou, decode_yolo
+from ..bbox_utils import xywh2xyxy, bbox_iou
@register
@@ -42,7 +42,7 @@ class IouAwareLoss(IouLoss):
iou = bbox_iou(
pbox, gbox, giou=self.giou, diou=self.diou, ciou=self.ciou)
iou.stop_gradient = True
- ioup = F.sigmoid(ioup)
- loss_iou_aware = (-iou * paddle.log(ioup)).sum(-2, keepdim=True)
+ loss_iou_aware = F.binary_cross_entropy_with_logits(
+ ioup, iou, reduction='none')
loss_iou_aware = loss_iou_aware * self.loss_weight
return loss_iou_aware
diff --git a/ppdet/modeling/losses/iou_loss.py b/ppdet/modeling/losses/iou_loss.py
index 72613297d5df262519be019375bb2f9cf91aee0e..3ac857b9ca3cb1796355b8185cbfbc70ae8ff951 100644
--- a/ppdet/modeling/losses/iou_loss.py
+++ b/ppdet/modeling/losses/iou_loss.py
@@ -16,12 +16,14 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
+import numpy as np
+
import paddle
import paddle.nn.functional as F
from ppdet.core.workspace import register, serializable
-from ..utils import xywh2xyxy, bbox_iou, decode_yolo
+from ..bbox_utils import xywh2xyxy, bbox_iou
-__all__ = ['IouLoss', 'GIoULoss']
+__all__ = ['IouLoss', 'GIoULoss', 'DIouLoss']
@register
@@ -129,3 +131,74 @@ class GIoULoss(object):
else:
loss = paddle.mean(giou * iou_weight)
return loss * self.loss_weight
+
+
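+# The class below implements Distance-IoU loss (https://arxiv.org/abs/1911.08287):
+#     L_DIoU = 1 - IoU + rho^2(b, b_gt) / c^2
+# where rho is the distance between the box centers and c is the diagonal of
+# the smallest box enclosing both. With use_complete_iou_loss the CIoU term
+# alpha * v is added, where v = (4 / pi^2) * (atan(w_gt/h_gt) - atan(w/h))^2
+# and alpha = v / (1 - IoU + v).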
+@register
+@serializable
+class DIouLoss(GIoULoss):
+ """
+ Distance-IoU Loss, see https://arxiv.org/abs/1911.08287
+ Args:
+        loss_weight (float): DIoU loss weight, 1.0 by default
+        eps (float): epsilon to avoid division by zero, 1e-10 by default
+        use_complete_iou_loss (bool): whether to add the CIoU aspect-ratio
+            term (Complete-IoU loss)
+ """
+
+ def __init__(self, loss_weight=1., eps=1e-10, use_complete_iou_loss=True):
+ super(DIouLoss, self).__init__(loss_weight=loss_weight, eps=eps)
+ self.use_complete_iou_loss = use_complete_iou_loss
+
+ def __call__(self, pbox, gbox, iou_weight=1.):
+ x1, y1, x2, y2 = paddle.split(pbox, num_or_sections=4, axis=-1)
+ x1g, y1g, x2g, y2g = paddle.split(gbox, num_or_sections=4, axis=-1)
+ cx = (x1 + x2) / 2
+ cy = (y1 + y2) / 2
+ w = x2 - x1
+ h = y2 - y1
+
+ cxg = (x1g + x2g) / 2
+ cyg = (y1g + y2g) / 2
+ wg = x2g - x1g
+ hg = y2g - y1g
+
+        # guard against degenerate predicted boxes
+        x2 = paddle.maximum(x1, x2)
+        y2 = paddle.maximum(y1, y2)
+
+        # intersection of the two boxes (A and B)
+ xkis1 = paddle.maximum(x1, x1g)
+ ykis1 = paddle.maximum(y1, y1g)
+ xkis2 = paddle.minimum(x2, x2g)
+ ykis2 = paddle.minimum(y2, y2g)
+
+        # smallest box enclosing both (A or B)
+ xc1 = paddle.minimum(x1, x1g)
+ yc1 = paddle.minimum(y1, y1g)
+ xc2 = paddle.maximum(x2, x2g)
+ yc2 = paddle.maximum(y2, y2g)
+
+ intsctk = (xkis2 - xkis1) * (ykis2 - ykis1)
+ intsctk = intsctk * paddle.greater_than(
+ xkis2, xkis1) * paddle.greater_than(ykis2, ykis1)
+ unionk = (x2 - x1) * (y2 - y1) + (x2g - x1g) * (y2g - y1g
+ ) - intsctk + self.eps
+ iouk = intsctk / unionk
+
+ # DIOU term
+ dist_intersection = (cx - cxg) * (cx - cxg) + (cy - cyg) * (cy - cyg)
+ dist_union = (xc2 - xc1) * (xc2 - xc1) + (yc2 - yc1) * (yc2 - yc1)
+ diou_term = (dist_intersection + self.eps) / (dist_union + self.eps)
+
+ # CIOU term
+ ciou_term = 0
+ if self.use_complete_iou_loss:
+ ar_gt = wg / hg
+ ar_pred = w / h
+ arctan = paddle.atan(ar_gt) - paddle.atan(ar_pred)
+ ar_loss = 4. / np.pi / np.pi * arctan * arctan
+ alpha = ar_loss / (1 - iouk + ar_loss + self.eps)
+ alpha.stop_gradient = True
+ ciou_term = alpha * ar_loss
+
+ diou = paddle.mean((1 - iouk + ciou_term + diou_term) * iou_weight)
+
+ return diou * self.loss_weight
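A self-contained numeric check of the DIoU penalty implemented above (NumPy, toy xyxy boxes; the class itself operates on batched tensors):

```python
import numpy as np

p = np.array([0., 0., 4., 4.])  # predicted box, xyxy
g = np.array([1., 1., 5., 5.])  # ground-truth box, xyxy

inter = max(0., min(p[2], g[2]) - max(p[0], g[0])) * \
        max(0., min(p[3], g[3]) - max(p[1], g[1]))
union = (p[2] - p[0]) * (p[3] - p[1]) + (g[2] - g[0]) * (g[3] - g[1]) - inter
iou = inter / union                                    # 9 / 23

pc, gc = (p[:2] + p[2:]) / 2., (g[:2] + g[2:]) / 2.
dist_centers = ((pc - gc) ** 2).sum()                  # rho^2 = 2
dist_enclose = ((np.maximum(p[2:], g[2:]) -
                 np.minimum(p[:2], g[:2])) ** 2).sum() # c^2 = 50
diou_loss = 1. - iou + dist_centers / dist_enclose     # ~0.649
```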
diff --git a/ppdet/modeling/losses/ssd_loss.py b/ppdet/modeling/losses/ssd_loss.py
index 04ba75b64c4eee4b4460e2acefb130a4741be81d..0b68f317f15e736f1c741535363652c3c5e8a5e7 100644
--- a/ppdet/modeling/losses/ssd_loss.py
+++ b/ppdet/modeling/losses/ssd_loss.py
@@ -19,189 +19,144 @@ from __future__ import print_function
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
-import numpy as np
from ppdet.core.workspace import register
-from ..ops import bipartite_match, box_coder, iou_similarity
+from ..ops import iou_similarity
+from ..bbox_utils import bbox2delta
__all__ = ['SSDLoss']
@register
class SSDLoss(nn.Layer):
+ """
+ SSDLoss
+
+ Args:
+ overlap_threshold (float32, optional): IoU threshold for matching prior
+ boxes as positive or negative, 0.5 by default.
+ neg_pos_ratio (float): The ratio of negative samples / positive samples.
+ loc_loss_weight (float): The weight of loc_loss.
+ conf_loss_weight (float): The weight of conf_loss.
+ prior_box_var (list): Variances corresponding to prior box coord, [0.1,
+ 0.1, 0.2, 0.2] by default.
+ """
+
def __init__(self,
- match_type='per_prediction',
overlap_threshold=0.5,
neg_pos_ratio=3.0,
- neg_overlap=0.5,
loc_loss_weight=1.0,
- conf_loss_weight=1.0):
+ conf_loss_weight=1.0,
+ prior_box_var=[0.1, 0.1, 0.2, 0.2]):
super(SSDLoss, self).__init__()
- self.match_type = match_type
self.overlap_threshold = overlap_threshold
self.neg_pos_ratio = neg_pos_ratio
- self.neg_overlap = neg_overlap
self.loc_loss_weight = loc_loss_weight
self.conf_loss_weight = conf_loss_weight
-
- def _label_target_assign(self,
- gt_label,
- matched_indices,
- neg_mask=None,
- mismatch_value=0):
- gt_label = gt_label.numpy()
- matched_indices = matched_indices.numpy()
- if neg_mask is not None:
- neg_mask = neg_mask.numpy()
-
- batch_size, num_priors = matched_indices.shape
- trg_lbl = np.ones((batch_size, num_priors, 1)).astype('int32')
- trg_lbl *= mismatch_value
- trg_lbl_wt = np.zeros((batch_size, num_priors, 1)).astype('float32')
-
- for i in range(batch_size):
- col_ids = np.where(matched_indices[i] > -1)
- col_val = matched_indices[i][col_ids]
- trg_lbl[i][col_ids] = gt_label[i][col_val]
- trg_lbl_wt[i][col_ids] = 1.0
-
- if neg_mask is not None:
- trg_lbl_wt += neg_mask[:, :, np.newaxis]
-
- return paddle.to_tensor(trg_lbl), paddle.to_tensor(trg_lbl_wt)
-
- def _bbox_target_assign(self, encoded_box, matched_indices):
- encoded_box = encoded_box.numpy()
- matched_indices = matched_indices.numpy()
-
- batch_size, num_priors = matched_indices.shape
- trg_bbox = np.zeros((batch_size, num_priors, 4)).astype('float32')
- trg_bbox_wt = np.zeros((batch_size, num_priors, 1)).astype('float32')
-
+ self.prior_box_var = [1. / a for a in prior_box_var]
+
+ def _bipartite_match_for_batch(self, gt_bbox, gt_label, prior_boxes,
+ bg_index):
+ """
+ Args:
+ gt_bbox (Tensor): [B, N, 4]
+ gt_label (Tensor): [B, N, 1]
+ prior_boxes (Tensor): [A, 4]
+ bg_index (int): Background class index
+ """
+ batch_size, num_priors = gt_bbox.shape[0], prior_boxes.shape[0]
+ ious = iou_similarity(gt_bbox.reshape((-1, 4)), prior_boxes).reshape(
+ (batch_size, -1, num_priors))
+
+ # Calculate the number of objects per sample.
+ num_object = (ious.sum(axis=-1) > 0).astype('int64').sum(axis=-1)
+
+ # For each prior box, get the max IoU of all GTs.
+ prior_max_iou, prior_argmax_iou = ious.max(axis=1), ious.argmax(axis=1)
+ # For each GT, get the max IoU of all prior boxes.
+ gt_max_iou, gt_argmax_iou = ious.max(axis=2), ious.argmax(axis=2)
+
+ # Gather target bbox and label according to 'prior_argmax_iou' index.
+ batch_ind = paddle.arange(
+ 0, batch_size, dtype='int64').unsqueeze(-1).tile([1, num_priors])
+ prior_argmax_iou = paddle.stack([batch_ind, prior_argmax_iou], axis=-1)
+ targets_bbox = paddle.gather_nd(gt_bbox, prior_argmax_iou)
+ targets_label = paddle.gather_nd(gt_label, prior_argmax_iou)
+ # Assign background label to priors whose max IoU is below the threshold
+ bg_index_tensor = paddle.full([batch_size, num_priors, 1], bg_index,
+ 'int64')
+ targets_label = paddle.where(
+ prior_max_iou.unsqueeze(-1) < self.overlap_threshold,
+ bg_index_tensor, targets_label)
+
+ # Ensure each GT can match the max IoU prior box.
for i in range(batch_size):
- col_ids = np.where(matched_indices[i] > -1)
- col_val = matched_indices[i][col_ids]
- for v, c in zip(col_val.tolist(), col_ids[0]):
- trg_bbox[i][c] = encoded_box[i][v][c]
- trg_bbox_wt[i][col_ids] = 1.0
-
- return paddle.to_tensor(trg_bbox), paddle.to_tensor(trg_bbox_wt)
-
- def _mine_hard_example(self,
- conf_loss,
- matched_indices,
- matched_dist,
- neg_pos_ratio=3.0,
- neg_overlap=0.5):
- pos = (matched_indices > -1).astype(conf_loss.dtype)
+ if num_object[i] > 0:
+ targets_bbox[i] = paddle.scatter(
+ targets_bbox[i], gt_argmax_iou[i, :int(num_object[i])],
+ gt_bbox[i, :int(num_object[i])])
+ targets_label[i] = paddle.scatter(
+ targets_label[i], gt_argmax_iou[i, :int(num_object[i])],
+ gt_label[i, :int(num_object[i])])
+
+ # Encode matched gt boxes as regression deltas w.r.t. prior boxes
+ prior_boxes = prior_boxes.unsqueeze(0).tile([batch_size, 1, 1])
+ targets_bbox = bbox2delta(
+ prior_boxes.reshape([-1, 4]),
+ targets_bbox.reshape([-1, 4]), self.prior_box_var)
+ targets_bbox = targets_bbox.reshape([batch_size, -1, 4])
+
+ return targets_bbox, targets_label
+
+ def _mine_hard_example(self, conf_loss, targets_label, bg_index):
+ pos = (targets_label != bg_index).astype(conf_loss.dtype)
num_pos = pos.sum(axis=1, keepdim=True)
- neg = (matched_dist < neg_overlap).astype(conf_loss.dtype)
+ neg = (targets_label == bg_index).astype(conf_loss.dtype)
- conf_loss = conf_loss * (1.0 - pos) * neg
+ conf_loss = conf_loss.clone() * neg
loss_idx = conf_loss.argsort(axis=1, descending=True)
idx_rank = loss_idx.argsort(axis=1)
num_negs = []
- for i in range(matched_indices.shape[0]):
- cur_idx = loss_idx[i]
+ for i in range(conf_loss.shape[0]):
cur_num_pos = num_pos[i]
- num_neg = paddle.clip(cur_num_pos * neg_pos_ratio, max=pos.shape[1])
+ num_neg = paddle.clip(
+ cur_num_pos * self.neg_pos_ratio, max=pos.shape[1])
num_negs.append(num_neg)
- num_neg = paddle.stack(num_negs, axis=0).expand_as(idx_rank)
+ num_neg = paddle.stack(num_negs).expand_as(idx_rank)
neg_mask = (idx_rank < num_neg).astype(conf_loss.dtype)
- return neg_mask
- def forward(self, boxes, scores, gt_box, gt_class, anchors):
+ return (neg_mask + pos).astype('bool')
+
+ def forward(self, boxes, scores, gt_bbox, gt_label, prior_boxes):
boxes = paddle.concat(boxes, axis=1)
scores = paddle.concat(scores, axis=1)
- prior_boxes = paddle.concat(anchors, axis=0)
- gt_label = gt_class.unsqueeze(-1)
- batch_size, num_priors = scores.shape[:2]
- num_classes = scores.shape[-1] - 1
-
- def _reshape_to_2d(x):
- return paddle.flatten(x, start_axis=2)
-
- # 1. Find matched bounding box by prior box.
- # 1.1 Compute IOU similarity between ground-truth boxes and prior boxes.
- # 1.2 Compute matched bounding box by bipartite matching algorithm.
- matched_indices = []
- matched_dist = []
- for i in range(gt_box.shape[0]):
- iou = iou_similarity(gt_box[i], prior_boxes)
- matched_indice, matched_d = bipartite_match(iou, self.match_type,
- self.overlap_threshold)
- matched_indices.append(matched_indice)
- matched_dist.append(matched_d)
- matched_indices = paddle.concat(matched_indices, axis=0)
- matched_indices.stop_gradient = True
- matched_dist = paddle.concat(matched_dist, axis=0)
- matched_dist.stop_gradient = True
-
- # 2. Compute confidence for mining hard examples
- # 2.1. Get the target label based on matched indices
- target_label, _ = self._label_target_assign(
- gt_label, matched_indices, mismatch_value=num_classes)
- confidence = _reshape_to_2d(scores)
- # 2.2. Compute confidence loss.
- # Reshape confidence to 2D tensor.
- target_label = _reshape_to_2d(target_label).astype('int64')
- conf_loss = F.softmax_with_cross_entropy(confidence, target_label)
- conf_loss = paddle.reshape(conf_loss, [batch_size, num_priors])
-
- # 3. Mining hard examples
- neg_mask = self._mine_hard_example(
- conf_loss,
- matched_indices,
- matched_dist,
- neg_pos_ratio=self.neg_pos_ratio,
- neg_overlap=self.neg_overlap)
-
- # 4. Assign classification and regression targets
- # 4.1. Encoded bbox according to the prior boxes.
- prior_box_var = paddle.to_tensor(
- np.array(
- [0.1, 0.1, 0.2, 0.2], dtype='float32')).reshape(
- [1, 4]).expand_as(prior_boxes)
- encoded_bbox = []
- for i in range(gt_box.shape[0]):
- encoded_bbox.append(
- box_coder(
- prior_box=prior_boxes,
- prior_box_var=prior_box_var,
- target_box=gt_box[i],
- code_type='encode_center_size'))
- encoded_bbox = paddle.stack(encoded_bbox, axis=0)
- # 4.2. Assign regression targets
- target_bbox, target_loc_weight = self._bbox_target_assign(
- encoded_bbox, matched_indices)
- # 4.3. Assign classification targets
- target_label, target_conf_weight = self._label_target_assign(
- gt_label,
- matched_indices,
- neg_mask=neg_mask,
- mismatch_value=num_classes)
-
- # 5. Compute loss.
- # 5.1 Compute confidence loss.
- target_label = _reshape_to_2d(target_label).astype('int64')
- conf_loss = F.softmax_with_cross_entropy(confidence, target_label)
-
- target_conf_weight = _reshape_to_2d(target_conf_weight)
- conf_loss = conf_loss * target_conf_weight * self.conf_loss_weight
-
- # 5.2 Compute regression loss.
- location = _reshape_to_2d(boxes)
- target_bbox = _reshape_to_2d(target_bbox)
-
- loc_loss = F.smooth_l1_loss(location, target_bbox, reduction='none')
- loc_loss = paddle.sum(loc_loss, axis=-1, keepdim=True)
- target_loc_weight = _reshape_to_2d(target_loc_weight)
- loc_loss = loc_loss * target_loc_weight * self.loc_loss_weight
-
- # 5.3 Compute overall weighted loss.
- loss = conf_loss + loc_loss
- loss = paddle.reshape(loss, [batch_size, num_priors])
- loss = paddle.sum(loss, axis=1, keepdim=True)
- normalizer = paddle.sum(target_loc_weight)
- loss = paddle.sum(loss / normalizer)
+ gt_label = gt_label.unsqueeze(-1).astype('int64')
+ prior_boxes = paddle.concat(prior_boxes, axis=0)
+ bg_index = scores.shape[-1] - 1
+
+ # Match bbox and get targets.
+ targets_bbox, targets_label = \
+ self._bipartite_match_for_batch(gt_bbox, gt_label, prior_boxes, bg_index)
+ targets_bbox.stop_gradient = True
+ targets_label.stop_gradient = True
+
+ # Compute regression loss.
+ # Select positive samples.
+ bbox_mask = (targets_label != bg_index).astype(boxes.dtype)
+ loc_loss = bbox_mask * F.smooth_l1_loss(
+ boxes, targets_bbox, reduction='none')
+ loc_loss = loc_loss.sum() * self.loc_loss_weight
+
+ # Compute confidence loss.
+ conf_loss = F.softmax_with_cross_entropy(scores, targets_label)
+ # Mining hard examples.
+ label_mask = self._mine_hard_example(
+ conf_loss.squeeze(-1), targets_label.squeeze(-1), bg_index)
+ conf_loss = conf_loss * label_mask.unsqueeze(-1).astype(conf_loss.dtype)
+ conf_loss = conf_loss.sum() * self.conf_loss_weight
+
+ # Compute overall weighted loss.
+ normalizer = (targets_label != bg_index).astype('float32').sum().clip(
+ min=1)
+ loss = (conf_loss + loc_loss) / (normalizer + 1e-9)
return loss
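The new `_mine_hard_example` relies on the argsort-of-argsort rank trick: sorting losses descending and argsorting the sort indices gives each prior its loss rank, so `idx_rank < num_neg` selects exactly the hardest negatives. A toy standalone check:

```python
import paddle

conf_loss = paddle.to_tensor([[0.2, 0.9, 0.1, 0.7]])   # per-prior loss, B=1
loss_idx = conf_loss.argsort(axis=1, descending=True)  # [[1, 3, 0, 2]]
idx_rank = loss_idx.argsort(axis=1)                    # [[2, 0, 3, 1]]
num_neg = paddle.to_tensor([[2]])                      # keep the 2 hardest
neg_mask = idx_rank < num_neg.expand_as(idx_rank)
# neg_mask == [[False, True, False, True]]: priors 1 and 3 survive
```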
diff --git a/ppdet/modeling/losses/yolo_loss.py b/ppdet/modeling/losses/yolo_loss.py
index 149139989a425fad61648c4ee8de43e2fbe7f798..657959cd7e55cf43d6362f03e1a4c1204b814c07 100644
--- a/ppdet/modeling/losses/yolo_loss.py
+++ b/ppdet/modeling/losses/yolo_loss.py
@@ -21,7 +21,7 @@ import paddle.nn as nn
import paddle.nn.functional as F
from ppdet.core.workspace import register
-from ..utils import decode_yolo, xywh2xyxy, iou_similarity
+from ..bbox_utils import decode_yolo, xywh2xyxy, iou_similarity
__all__ = ['YOLOv3Loss']
@@ -46,6 +46,18 @@ class YOLOv3Loss(nn.Layer):
scale_x_y=1.,
iou_loss=None,
iou_aware_loss=None):
+ """
+ YOLOv3Loss layer
+
+ Args:
+ num_classes (int): number of foreground classes
+ ignore_thresh (float): threshold to ignore confidence loss
+ label_smooth (bool): whether to use label smoothing
+ downsample (list): downsample ratio for each detection block
+ scale_x_y (float): scale_x_y factor
+ iou_loss (object): IoULoss instance
+ iou_aware_loss (object): IouAwareLoss instance
+ """
super(YOLOv3Loss, self).__init__()
self.num_classes = num_classes
self.ignore_thresh = ignore_thresh
@@ -54,6 +66,7 @@ class YOLOv3Loss(nn.Layer):
self.scale_x_y = scale_x_y
self.iou_loss = iou_loss
self.iou_aware_loss = iou_aware_loss
+ self.distill_pairs = []
def obj_loss(self, pbox, gbox, pobj, tobj, anchor, downsample):
# pbox
@@ -108,6 +121,7 @@ class YOLOv3Loss(nn.Layer):
x, y = p[:, :, :, :, 0:1], p[:, :, :, :, 1:2]
w, h = p[:, :, :, :, 2:3], p[:, :, :, :, 3:4]
obj, pcls = p[:, :, :, :, 4:5], p[:, :, :, :, 5:]
+ self.distill_pairs.append([x, y, w, h, obj, pcls])
t = t.transpose((0, 1, 3, 4, 2))
tx, ty = t[:, :, :, :, 0:1], t[:, :, :, :, 1:2]
@@ -173,6 +187,7 @@ class YOLOv3Loss(nn.Layer):
gt_targets = [targets['target{}'.format(i)] for i in range(np)]
gt_box = targets['gt_bbox']
yolo_losses = dict()
+ self.distill_pairs.clear()
for x, t, anchor, downsample in zip(inputs, gt_targets, anchors,
self.downsample):
yolo_loss = self.yolov3_loss(x, t, gt_box, anchor, downsample,
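`distill_pairs` stashes each block's raw `x, y, w, h, obj, pcls` tensors so an external distillation loss can pair a student model's activations with a teacher's. A hypothetical consumer (the function name and the MSE pairing are assumptions for illustration, not part of this patch):

```python
import paddle.nn.functional as F

def yolo_distill_loss(student_pairs, teacher_pairs):
    # Hypothetical sketch: pull student activations toward the teacher's.
    loss = 0.
    for s, t in zip(student_pairs, teacher_pairs):
        sx, sy, sw, sh, sobj, scls = s
        tx, ty, tw, th, tobj, tcls = t
        loss += (F.mse_loss(sx, tx) + F.mse_loss(sy, ty) +
                 F.mse_loss(sw, tw) + F.mse_loss(sh, th))
    return loss
```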
diff --git a/ppdet/modeling/necks/fpn.py b/ppdet/modeling/necks/fpn.py
index 85767bb105dd4d134b339e37b9f759016ce9f369..867b7dc451a85773a1e902232c260b47d08ece4a 100644
--- a/ppdet/modeling/necks/fpn.py
+++ b/ppdet/modeling/necks/fpn.py
@@ -29,6 +29,34 @@ __all__ = ['FPN']
@register
@serializable
class FPN(nn.Layer):
+ """
+ Feature Pyramid Network, see https://arxiv.org/abs/1612.03144
+
+ Args:
+ in_channels (list[int]): input channels of each level which can be
+ derived from the output shape of backbone by from_config
+ out_channel (int): output channel of each level
+ spatial_scales (list[float]): the spatial scales between input feature
+ maps and original input image which can be derived from the output
+ shape of backbone by from_config
+ has_extra_convs (bool): whether to add extra conv to the last level.
+ default False
+ extra_stage (int): the number of extra stages added to the last level.
+ default 1
+ use_c5 (bool): Whether to use c5 as the input of extra stage,
+ otherwise p5 is used. default True
+ norm_type (string|None): the normalization type in FPN module. If
+ norm_type is None, no normalization is applied after conv;
+ otherwise bn, gn and sync_bn are available. default None
+ norm_decay (float): weight decay for normalization layer weights.
+ default 0.
+ freeze_norm (bool): whether to freeze normalization layer.
+ default False
+ relu_before_extra_convs (bool): whether to add relu before extra convs.
+ default False
+
+ """
+
def __init__(self,
in_channels,
out_channel,
@@ -67,7 +95,7 @@ class FPN(nn.Layer):
else:
lateral_name = 'fpn_inner_res{}_sum_lateral'.format(i + 2)
in_c = in_channels[i - st_stage]
- if self.norm_type == 'gn':
+ if self.norm_type is not None:
lateral = self.add_sublayer(
lateral_name,
ConvNormLayer(
@@ -77,10 +105,8 @@ class FPN(nn.Layer):
stride=1,
norm_type=self.norm_type,
norm_decay=self.norm_decay,
- norm_name=lateral_name + '_norm',
freeze_norm=self.freeze_norm,
- initializer=XavierUniform(fan_out=in_c),
- name=lateral_name))
+ initializer=XavierUniform(fan_out=in_c)))
else:
lateral = self.add_sublayer(
lateral_name,
@@ -93,7 +119,7 @@ class FPN(nn.Layer):
self.lateral_convs.append(lateral)
fpn_name = 'fpn_res{}_sum'.format(i + 2)
- if self.norm_type == 'gn':
+ if self.norm_type is not None:
fpn_conv = self.add_sublayer(
fpn_name,
ConvNormLayer(
@@ -103,10 +129,8 @@ class FPN(nn.Layer):
stride=1,
norm_type=self.norm_type,
norm_decay=self.norm_decay,
- norm_name=fpn_name + '_norm',
freeze_norm=self.freeze_norm,
- initializer=XavierUniform(fan_out=fan),
- name=fpn_name))
+ initializer=XavierUniform(fan_out=fan)))
else:
fpn_conv = self.add_sublayer(
fpn_name,
@@ -128,7 +152,7 @@ class FPN(nn.Layer):
else:
in_c = out_channel
extra_fpn_name = 'fpn_{}'.format(lvl + 2)
- if self.norm_type == 'gn':
+ if self.norm_type is not None:
extra_fpn_conv = self.add_sublayer(
extra_fpn_name,
ConvNormLayer(
@@ -138,10 +162,8 @@ class FPN(nn.Layer):
stride=2,
norm_type=self.norm_type,
norm_decay=self.norm_decay,
- norm_name=extra_fpn_name + '_norm',
freeze_norm=self.freeze_norm,
- initializer=XavierUniform(fan_out=fan),
- name=extra_fpn_name))
+ initializer=XavierUniform(fan_out=fan)))
else:
extra_fpn_conv = self.add_sublayer(
extra_fpn_name,
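All three hunks above make the same change: `ConvNormLayer` is now chosen for any non-None `norm_type` rather than only for 'gn'. A condensed sketch of the new dispatch (illustration only, not the full constructor):

```python
def pick_lateral_conv(norm_type):
    if norm_type is not None:   # was: norm_type == 'gn'
        return 'ConvNormLayer'  # conv followed by bn / gn / sync_bn
    return 'Conv2D'             # bare conv, Xavier-initialized
```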
diff --git a/ppdet/modeling/necks/hrfpn.py b/ppdet/modeling/necks/hrfpn.py
index 7afbbc0ea2cf25584a234ed731da25628b36c29b..4b737c9fb3153a3536fdb571318e38d407b89050 100644
--- a/ppdet/modeling/necks/hrfpn.py
+++ b/ppdet/modeling/necks/hrfpn.py
@@ -30,8 +30,8 @@ class HRFPN(nn.Layer):
in_channels (list): number of input feature channels from backbone
out_channel (int): number of output feature channels
share_conv (bool): whether to share conv for different layers' reduction
- spatial_scales (list): feature map scaling factor
extra_stage (int): add extra stage for returning HRFPN fpn_feats
+ spatial_scales (list): feature map scaling factor
"""
def __init__(self,
diff --git a/ppdet/modeling/necks/ttf_fpn.py b/ppdet/modeling/necks/ttf_fpn.py
index 16f808240d2f05217e048ce89f3a3effcbaffe94..9c7f3924f0c2f611be5aab73cfd23921226e5eec 100644
--- a/ppdet/modeling/necks/ttf_fpn.py
+++ b/ppdet/modeling/necks/ttf_fpn.py
@@ -16,23 +16,20 @@ import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle import ParamAttr
-from paddle.nn.initializer import Constant, Uniform, Normal
-from paddle.nn import Conv2D, ReLU, Sequential
+from paddle.nn.initializer import Constant, Uniform, Normal, XavierUniform
from paddle import ParamAttr
from ppdet.core.workspace import register, serializable
from paddle.regularizer import L2Decay
-from ppdet.modeling.layers import DeformableConvV2
+from ppdet.modeling.layers import DeformableConvV2, ConvNormLayer, LiteConv
import math
from ppdet.modeling.ops import batch_norm
from ..shape_spec import ShapeSpec
__all__ = ['TTFFPN']
-__all__ = ['TTFFPN']
-
class Upsample(nn.Layer):
- def __init__(self, ch_in, ch_out, name=None):
+ def __init__(self, ch_in, ch_out, norm_type='bn'):
super(Upsample, self).__init__()
fan_in = ch_in * 3 * 3
stdv = 1. / math.sqrt(fan_in)
@@ -46,11 +43,10 @@ class Upsample(nn.Layer):
regularizer=L2Decay(0.),
learning_rate=2.),
lr_scale=2.,
- regularizer=L2Decay(0.),
- name=name)
+ regularizer=L2Decay(0.))
self.bn = batch_norm(
- ch_out, norm_type='bn', initializer=Constant(1.), name=name)
+ ch_out, norm_type=norm_type, initializer=Constant(1.))
def forward(self, feat):
dcn = self.dcn(feat)
@@ -60,29 +56,98 @@ class Upsample(nn.Layer):
return out
+class DeConv(nn.Layer):
+ def __init__(self, ch_in, ch_out, norm_type='bn'):
+ super(DeConv, self).__init__()
+ self.deconv = nn.Sequential()
+ conv1 = ConvNormLayer(
+ ch_in=ch_in,
+ ch_out=ch_out,
+ stride=1,
+ filter_size=1,
+ norm_type=norm_type,
+ initializer=XavierUniform())
+ conv2 = nn.Conv2DTranspose(
+ in_channels=ch_out,
+ out_channels=ch_out,
+ kernel_size=4,
+ padding=1,
+ stride=2,
+ groups=ch_out,
+ weight_attr=ParamAttr(initializer=XavierUniform()),
+ bias_attr=False)
+ bn = batch_norm(ch_out, norm_type=norm_type, norm_decay=0.)
+ conv3 = ConvNormLayer(
+ ch_in=ch_out,
+ ch_out=ch_out,
+ stride=1,
+ filter_size=1,
+ norm_type=norm_type,
+ initializer=XavierUniform())
+
+ self.deconv.add_sublayer('conv1', conv1)
+ self.deconv.add_sublayer('relu6_1', nn.ReLU6())
+ self.deconv.add_sublayer('conv2', conv2)
+ self.deconv.add_sublayer('bn', bn)
+ self.deconv.add_sublayer('relu6_2', nn.ReLU6())
+ self.deconv.add_sublayer('conv3', conv3)
+ self.deconv.add_sublayer('relu6_3', nn.ReLU6())
+
+ def forward(self, inputs):
+ return self.deconv(inputs)
+
+
+class LiteUpsample(nn.Layer):
+ def __init__(self, ch_in, ch_out, norm_type='bn'):
+ super(LiteUpsample, self).__init__()
+ self.deconv = DeConv(ch_in, ch_out, norm_type=norm_type)
+ self.conv = LiteConv(ch_in, ch_out, norm_type=norm_type)
+
+ def forward(self, inputs):
+ deconv_up = self.deconv(inputs)
+ conv = self.conv(inputs)
+ interp_up = F.interpolate(conv, scale_factor=2., mode='bilinear')
+ return deconv_up + interp_up
+
+
class ShortCut(nn.Layer):
- def __init__(self, layer_num, ch_out, name=None):
+ def __init__(self,
+ layer_num,
+ ch_in,
+ ch_out,
+ norm_type='bn',
+ lite_neck=False,
+ name=None):
super(ShortCut, self).__init__()
- shortcut_conv = Sequential()
- ch_in = ch_out * 2
+ shortcut_conv = nn.Sequential()
for i in range(layer_num):
fan_out = 3 * 3 * ch_out
std = math.sqrt(2. / fan_out)
in_channels = ch_in if i == 0 else ch_out
shortcut_name = name + '.conv.{}'.format(i)
- shortcut_conv.add_sublayer(
- shortcut_name,
- Conv2D(
- in_channels=in_channels,
- out_channels=ch_out,
- kernel_size=3,
- padding=1,
- weight_attr=ParamAttr(initializer=Normal(0, std)),
- bias_attr=ParamAttr(
- learning_rate=2., regularizer=L2Decay(0.))))
- if i < layer_num - 1:
- shortcut_conv.add_sublayer(shortcut_name + '.act', ReLU())
- self.shortcut = self.add_sublayer('short', shortcut_conv)
+ if lite_neck:
+ shortcut_conv.add_sublayer(
+ shortcut_name,
+ LiteConv(
+ in_channels=in_channels,
+ out_channels=ch_out,
+ with_act=i < layer_num - 1,
+ norm_type=norm_type))
+ else:
+ shortcut_conv.add_sublayer(
+ shortcut_name,
+ nn.Conv2D(
+ in_channels=in_channels,
+ out_channels=ch_out,
+ kernel_size=3,
+ padding=1,
+ weight_attr=ParamAttr(initializer=Normal(0, std)),
+ bias_attr=ParamAttr(
+ learning_rate=2., regularizer=L2Decay(0.))))
+ if i < layer_num - 1:
+ shortcut_conv.add_sublayer(shortcut_name + '.act',
+ nn.ReLU())
+ self.shortcut = self.add_sublayer('shortcut', shortcut_conv)
def forward(self, feat):
out = self.shortcut(feat)
@@ -97,35 +162,65 @@ class TTFFPN(nn.Layer):
in_channels (list): number of input feature channels from backbone.
[128,256,512,1024] by default, means the channels of DarkNet53
backbone return_idx [1,2,3,4].
+ planes (list): the number of output feature channels of FPN.
+ [256, 128, 64] by default
shortcut_num (list): the number of convolution layers in each shortcut.
[3,2,1] by default, means DarkNet53 backbone return_idx_1 has 3 convs
in its shortcut, return_idx_2 has 2 convs and return_idx_3 has 1 conv.
+ norm_type (string): norm type, 'sync_bn', 'bn' and 'gn' are available.
+ bn by default
+ lite_neck (bool): whether to use lite conv in TTFNet FPN,
+ False by default
+ fusion_method (string): the method to fuse the upsampled and lateral
+ features, 'add' and 'concat' are available. add by default
"""
+ __shared__ = ['norm_type']
+
def __init__(self,
- in_channels=[128, 256, 512, 1024],
- shortcut_num=[3, 2, 1]):
+ in_channels,
+ planes=[256, 128, 64],
+ shortcut_num=[3, 2, 1],
+ norm_type='bn',
+ lite_neck=False,
+ fusion_method='add'):
super(TTFFPN, self).__init__()
- self.planes = [c // 2 for c in in_channels[:-1]][::-1]
+ self.planes = planes
self.shortcut_num = shortcut_num[::-1]
self.shortcut_len = len(shortcut_num)
self.ch_in = in_channels[::-1]
+ self.fusion_method = fusion_method
self.upsample_list = []
self.shortcut_list = []
+ self.upper_list = []
for i, out_c in enumerate(self.planes):
- in_c = self.ch_in[i] if i == 0 else self.ch_in[i] // 2
+ in_c = self.ch_in[i] if i == 0 else self.upper_list[-1]
+ upsample_module = LiteUpsample if lite_neck else Upsample
upsample = self.add_sublayer(
'upsample.' + str(i),
- Upsample(
- in_c, out_c, name='upsample.' + str(i)))
+ upsample_module(
+ in_c, out_c, norm_type=norm_type))
self.upsample_list.append(upsample)
if i < self.shortcut_len:
shortcut = self.add_sublayer(
'shortcut.' + str(i),
ShortCut(
- self.shortcut_num[i], out_c, name='shortcut.' + str(i)))
+ self.shortcut_num[i],
+ self.ch_in[i + 1],
+ out_c,
+ norm_type=norm_type,
+ lite_neck=lite_neck,
+ name='shortcut.' + str(i)))
self.shortcut_list.append(shortcut)
+ if self.fusion_method == 'add':
+ upper_c = out_c
+ elif self.fusion_method == 'concat':
+ upper_c = out_c * 2
+ else:
+ raise ValueError('Illegal fusion method. Expected add or\
+ concat, but received {}'.format(self.fusion_method))
+ self.upper_list.append(upper_c)
def forward(self, inputs):
feat = inputs[-1]
@@ -133,7 +228,10 @@ class TTFFPN(nn.Layer):
feat = self.upsample_list[i](feat)
if i < self.shortcut_len:
shortcut = self.shortcut_list[i](inputs[-i - 2])
- feat = feat + shortcut
+ if self.fusion_method == 'add':
+ feat = feat + shortcut
+ else:
+ feat = paddle.concat([feat, shortcut], axis=1)
return feat
@classmethod
@@ -142,4 +240,4 @@ class TTFFPN(nn.Layer):
@property
def out_shape(self):
- return [ShapeSpec(channels=self.planes[-1], )]
+ return [ShapeSpec(channels=self.upper_list[-1], )]
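Sketch of the channel bookkeeping behind the `out_shape` change above: with `fusion_method='concat'` each fused level doubles in width, which is why the output shape now reports `upper_list[-1]` instead of `planes[-1]`:

```python
def fused_channels(planes, fusion_method='add'):
    # mirrors how TTFFPN builds upper_list from planes
    return [c if fusion_method == 'add' else c * 2 for c in planes]

assert fused_channels([256, 128, 64], 'add')[-1] == 64
assert fused_channels([256, 128, 64], 'concat')[-1] == 128
```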
diff --git a/ppdet/modeling/necks/yolo_fpn.py b/ppdet/modeling/necks/yolo_fpn.py
index 456bfae2097f3187ced647367986d20fee2730a5..25458974aa21c10d4b3635aba05dccebd2dfd141 100644
--- a/ppdet/modeling/necks/yolo_fpn.py
+++ b/ppdet/modeling/necks/yolo_fpn.py
@@ -25,8 +25,44 @@ from ..shape_spec import ShapeSpec
__all__ = ['YOLOv3FPN', 'PPYOLOFPN']
+def add_coord(x, data_format):
+ b = x.shape[0]
+ if data_format == 'NCHW':
+ h = x.shape[2]
+ w = x.shape[3]
+ else:
+ h = x.shape[1]
+ w = x.shape[2]
+
+ gx = paddle.arange(w, dtype='float32') / (w - 1.) * 2.0 - 1.
+ if data_format == 'NCHW':
+ gx = gx.reshape([1, 1, 1, w]).expand([b, 1, h, w])
+ else:
+ gx = gx.reshape([1, 1, w, 1]).expand([b, h, w, 1])
+ gx.stop_gradient = True
+
+ gy = paddle.arange(h, dtype='float32') / (h - 1.) * 2.0 - 1.
+ if data_format == 'NCHW':
+ gy = gy.reshape([1, 1, h, 1]).expand([b, 1, h, w])
+ else:
+ gy = gy.reshape([1, h, 1, 1]).expand([b, h, w, 1])
+ gy.stop_gradient = True
+
+ return gx, gy
+
+
class YoloDetBlock(nn.Layer):
def __init__(self, ch_in, channel, norm_type, name, data_format='NCHW'):
+ """
+ YOLODetBlock layer for yolov3, see https://arxiv.org/abs/1804.02767
+
+ Args:
+ ch_in (int): input channel
+ channel (int): base channel
+ norm_type (str): batch norm type
+ name (str): layer name
+ data_format (str): data format, NCHW or NHWC
+ """
super(YoloDetBlock, self).__init__()
self.ch_in = ch_in
self.channel = channel
@@ -77,9 +113,22 @@ class SPP(nn.Layer):
pool_size,
norm_type,
name,
+ act='leaky',
data_format='NCHW'):
+ """
+ SPP layer, which consists of several parallel pooling layers followed by a conv layer
+
+ Args:
+ ch_in (int): input channel of conv layer
+ ch_out (int): output channel of conv layer
+ k (int): kernel size of conv layer
+ pool_size (list): kernel sizes of the parallel pooling layers
+ norm_type (str): batch norm type
+ act (str): activation function, default leaky
+ name (str): layer name
+ data_format (str): data format, NCHW or NHWC
+ """
super(SPP, self).__init__()
self.pool = []
+ self.data_format = data_format
for size in pool_size:
pool = self.add_sublayer(
'{}.pool1'.format(name),
@@ -97,19 +146,33 @@ class SPP(nn.Layer):
padding=k // 2,
norm_type=norm_type,
name=name,
+ act=act,
data_format=data_format)
def forward(self, x):
outs = [x]
for pool in self.pool:
outs.append(pool(x))
- y = paddle.concat(outs, axis=1)
+ if self.data_format == "NCHW":
+ y = paddle.concat(outs, axis=1)
+ else:
+ y = paddle.concat(outs, axis=-1)
+
y = self.conv(y)
return y
class DropBlock(nn.Layer):
def __init__(self, block_size, keep_prob, name, data_format='NCHW'):
+ """
+ DropBlock layer, see https://arxiv.org/abs/1810.12890
+
+ Args:
+ block_size (int): block size
+ keep_prob (float): keep probability
+ name (str): layer name
+ data_format (str): data format, NCHW or NHWC
+ """
super(DropBlock, self).__init__()
self.block_size = block_size
self.keep_prob = keep_prob
@@ -149,6 +212,19 @@ class CoordConv(nn.Layer):
norm_type,
name,
data_format='NCHW'):
+ """
+ CoordConv layer
+
+ Args:
+ ch_in (int): input channel
+ ch_out (int): output channel
+ filter_size (int): filter size
+ padding (int): padding size
+ norm_type (str): batch norm type, default bn
+ name (str): layer name
+ data_format (str): data format, NCHW or NHWC
+
+ """
super(CoordConv, self).__init__()
self.conv = ConvBNLayer(
ch_in + 2,
@@ -161,28 +237,7 @@ class CoordConv(nn.Layer):
self.data_format = data_format
def forward(self, x):
- b = x.shape[0]
- if self.data_format == 'NCHW':
- h = x.shape[2]
- w = x.shape[3]
- else:
- h = x.shape[1]
- w = x.shape[2]
-
- gx = paddle.arange(w, dtype='float32') / (w - 1.) * 2.0 - 1.
- if self.data_format == 'NCHW':
- gx = gx.reshape([1, 1, 1, w]).expand([b, 1, h, w])
- else:
- gx = gx.reshape([1, 1, w, 1]).expand([b, h, w, 1])
- gx.stop_gradient = True
-
- gy = paddle.arange(h, dtype='float32') / (h - 1.) * 2.0 - 1.
- if self.data_format == 'NCHW':
- gy = gy.reshape([1, 1, h, 1]).expand([b, 1, h, w])
- else:
- gy = gy.reshape([1, h, 1, 1]).expand([b, h, w, 1])
- gy.stop_gradient = True
-
+ gx, gy = add_coord(x, self.data_format)
if self.data_format == 'NCHW':
y = paddle.concat([x, gx, gy], axis=1)
else:
@@ -193,6 +248,14 @@ class CoordConv(nn.Layer):
class PPYOLODetBlock(nn.Layer):
def __init__(self, cfg, name, data_format='NCHW'):
+ """
+ PPYOLODetBlock layer
+
+ Args:
+ cfg (list): layer configs for this block
+ name (str): block name
+ data_format (str): data format, NCHW or NHWC
+ """
super(PPYOLODetBlock, self).__init__()
self.conv_module = nn.Sequential()
for idx, (conv_name, layer, args, kwargs) in enumerate(cfg[:-1]):
@@ -211,6 +274,143 @@ class PPYOLODetBlock(nn.Layer):
return route, tip
+class PPYOLOTinyDetBlock(nn.Layer):
+ def __init__(self,
+ ch_in,
+ ch_out,
+ name,
+ drop_block=False,
+ block_size=3,
+ keep_prob=0.9,
+ data_format='NCHW'):
+ """
+ PPYOLO Tiny DetBlock layer
+ Args:
+ ch_in (int): input channel number
+ ch_out (int): output channel number
+ name (str): block name
+ drop_block (bool): whether to use DropBlock
+ block_size (int): drop block size
+ keep_prob (float): probability to keep block in DropBlock
+ data_format (str): data format, NCHW or NHWC
+ """
+ super(PPYOLOTinyDetBlock, self).__init__()
+ self.drop_block_ = drop_block
+ self.conv_module = nn.Sequential()
+
+ cfgs = [
+ # name, in channels, out channels, filter_size,
+ # stride, padding, groups
+ ['.0', ch_in, ch_out, 1, 1, 0, 1],
+ ['.1', ch_out, ch_out, 5, 1, 2, ch_out],
+ ['.2', ch_out, ch_out, 1, 1, 0, 1],
+ ['.route', ch_out, ch_out, 5, 1, 2, ch_out],
+ ]
+ for cfg in cfgs:
+ conv_name, conv_ch_in, conv_ch_out, filter_size, stride, padding, \
+ groups = cfg
+ self.conv_module.add_sublayer(
+ name + conv_name,
+ ConvBNLayer(
+ ch_in=conv_ch_in,
+ ch_out=conv_ch_out,
+ filter_size=filter_size,
+ stride=stride,
+ padding=padding,
+ groups=groups,
+ name=name + conv_name))
+
+ self.tip = ConvBNLayer(
+ ch_in=ch_out,
+ ch_out=ch_out,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ groups=1,
+ name=name + conv_name)
+
+ if self.drop_block_:
+ self.drop_block = DropBlock(
+ block_size=block_size,
+ keep_prob=keep_prob,
+ data_format=data_format,
+ name=name + '.dropblock')
+
+ def forward(self, inputs):
+ if self.drop_block_:
+ inputs = self.drop_block(inputs)
+ route = self.conv_module(inputs)
+ tip = self.tip(route)
+ return route, tip
+
+
+class PPYOLODetBlockCSP(nn.Layer):
+ def __init__(self,
+ cfg,
+ ch_in,
+ ch_out,
+ act,
+ norm_type,
+ name,
+ data_format='NCHW'):
+ """
+ PPYOLODetBlockCSP layer
+
+ Args:
+ cfg (list): layer configs for this block
+ ch_in (int): input channel
+ ch_out (int): output channel
+ act (str): activation function, default mish
+ norm_type (str): batch norm type, default bn
+ name (str): block name
+ data_format (str): data format, NCHW or NHWC
+ """
+ super(PPYOLODetBlockCSP, self).__init__()
+ self.data_format = data_format
+ self.conv1 = ConvBNLayer(
+ ch_in,
+ ch_out,
+ 1,
+ padding=0,
+ act=act,
+ norm_type=norm_type,
+ name=name + '.left',
+ data_format=data_format)
+ self.conv2 = ConvBNLayer(
+ ch_in,
+ ch_out,
+ 1,
+ padding=0,
+ act=act,
+ norm_type=norm_type,
+ name=name + '.right',
+ data_format=data_format)
+ self.conv3 = ConvBNLayer(
+ ch_out * 2,
+ ch_out * 2,
+ 1,
+ padding=0,
+ act=act,
+ norm_type=norm_type,
+ name=name,
+ data_format=data_format)
+ self.conv_module = nn.Sequential()
+ for idx, (layer_name, layer, args, kwargs) in enumerate(cfg):
+ kwargs.update(name=name + layer_name, data_format=data_format)
+ self.conv_module.add_sublayer(layer_name, layer(*args, **kwargs))
+
+ def forward(self, inputs):
+ conv_left = self.conv1(inputs)
+ conv_right = self.conv2(inputs)
+ conv_left = self.conv_module(conv_left)
+ if self.data_format == 'NCHW':
+ conv = paddle.concat([conv_left, conv_right], axis=1)
+ else:
+ conv = paddle.concat([conv_left, conv_right], axis=-1)
+
+ conv = self.conv3(conv)
+ return conv, conv
+
+
@register
@serializable
class YOLOv3FPN(nn.Layer):
@@ -220,6 +420,15 @@ class YOLOv3FPN(nn.Layer):
in_channels=[256, 512, 1024],
norm_type='bn',
data_format='NCHW'):
+ """
+ YOLOv3FPN layer
+
+ Args:
+ in_channels (list): input channels for fpn
+ norm_type (str): batch norm type, default bn
+ data_format (str): data format, NCHW or NHWC
+
+ """
super(YOLOv3FPN, self).__init__()
assert len(in_channels) > 0, "in_channels length should > 0"
self.in_channels = in_channels
@@ -299,20 +508,38 @@ class PPYOLOFPN(nn.Layer):
in_channels=[512, 1024, 2048],
norm_type='bn',
data_format='NCHW',
- **kwargs):
+ coord_conv=False,
+ conv_block_num=2,
+ drop_block=False,
+ block_size=3,
+ keep_prob=0.9,
+ spp=False):
+ """
+ PPYOLOFPN layer
+
+ Args:
+ in_channels (list): input channels for fpn
+ norm_type (str): batch norm type, default bn
+ data_format (str): data format, NCHW or NHWC
+ coord_conv (bool): whether use CoordConv or not
+ conv_block_num (int): conv block num of each pan block
+ drop_block (bool): whether use DropBlock or not
+ block_size (int): block size of DropBlock
+ keep_prob (float): keep probability of DropBlock
+ spp (bool): whether use spp or not
+
+ """
super(PPYOLOFPN, self).__init__()
assert len(in_channels) > 0, "in_channels length should > 0"
self.in_channels = in_channels
self.num_blocks = len(in_channels)
# parse kwargs
- self.coord_conv = kwargs.get('coord_conv', False)
- self.drop_block = kwargs.get('drop_block', False)
- if self.drop_block:
- self.block_size = kwargs.get('block_size', 3)
- self.keep_prob = kwargs.get('keep_prob', 0.9)
-
- self.spp = kwargs.get('spp', False)
- self.conv_block_num = kwargs.get('conv_block_num', 2)
+ self.coord_conv = coord_conv
+ self.drop_block = drop_block
+ self.block_size = block_size
+ self.keep_prob = keep_prob
+ self.spp = spp
+ self.conv_block_num = conv_block_num
self.data_format = data_format
if self.coord_conv:
ConvLayer = CoordConv
@@ -427,3 +654,308 @@ class PPYOLOFPN(nn.Layer):
@property
def out_shape(self):
return [ShapeSpec(channels=c) for c in self._out_channels]
+
+
+@register
+@serializable
+class PPYOLOTinyFPN(nn.Layer):
+ __shared__ = ['norm_type', 'data_format']
+
+ def __init__(self,
+ in_channels=[80, 56, 34],
+ detection_block_channels=[160, 128, 96],
+ norm_type='bn',
+ data_format='NCHW',
+ **kwargs):
+ """
+ PPYOLO Tiny FPN layer
+ Args:
+ in_channels (list): input channels for fpn
+ detection_block_channels (list): channels in fpn
+ norm_type (str): batch norm type, default bn
+ data_format (str): data format, NCHW or NHWC
+ kwargs: extra key-value pairs, such as parameters of DropBlock and spp
+ """
+ super(PPYOLOTinyFPN, self).__init__()
+ assert len(in_channels) > 0, "in_channels length should > 0"
+ self.in_channels = in_channels[::-1]
+ assert len(detection_block_channels
+ ) > 0, "detection_block_channels length should > 0"
+ self.detection_block_channels = detection_block_channels
+ self.data_format = data_format
+ self.num_blocks = len(in_channels)
+ # parse kwargs
+ self.drop_block = kwargs.get('drop_block', False)
+ self.block_size = kwargs.get('block_size', 3)
+ self.keep_prob = kwargs.get('keep_prob', 0.9)
+
+ self.spp_ = kwargs.get('spp', False)
+ if self.spp_:
+ self.spp = SPP(self.in_channels[0] * 4,
+ self.in_channels[0],
+ k=1,
+ pool_size=[5, 9, 13],
+ norm_type=norm_type,
+ name='spp')
+
+ self._out_channels = []
+ self.yolo_blocks = []
+ self.routes = []
+ for i, (
+ ch_in, ch_out
+ ) in enumerate(zip(self.in_channels, self.detection_block_channels)):
+ name = 'yolo_block.{}'.format(i)
+ if i > 0:
+ ch_in += self.detection_block_channels[i - 1]
+ yolo_block = self.add_sublayer(
+ name,
+ PPYOLOTinyDetBlock(
+ ch_in,
+ ch_out,
+ name,
+ drop_block=self.drop_block,
+ block_size=self.block_size,
+ keep_prob=self.keep_prob))
+ self.yolo_blocks.append(yolo_block)
+ self._out_channels.append(ch_out)
+
+ if i < self.num_blocks - 1:
+ name = 'yolo_transition.{}'.format(i)
+ route = self.add_sublayer(
+ name,
+ ConvBNLayer(
+ ch_in=ch_out,
+ ch_out=ch_out,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ norm_type=norm_type,
+ data_format=data_format,
+ name=name))
+ self.routes.append(route)
+
+ def forward(self, blocks):
+ assert len(blocks) == self.num_blocks
+ blocks = blocks[::-1]
+
+ yolo_feats = []
+ for i, block in enumerate(blocks):
+ if i == 0 and self.spp_:
+ block = self.spp(block)
+
+ if i > 0:
+ if self.data_format == 'NCHW':
+ block = paddle.concat([route, block], axis=1)
+ else:
+ block = paddle.concat([route, block], axis=-1)
+ route, tip = self.yolo_blocks[i](block)
+ yolo_feats.append(tip)
+
+ if i < self.num_blocks - 1:
+ route = self.routes[i](route)
+ route = F.interpolate(
+ route, scale_factor=2., data_format=self.data_format)
+
+ return yolo_feats
+
+ @classmethod
+ def from_config(cls, cfg, input_shape):
+ return {'in_channels': [i.channels for i in input_shape], }
+
+ @property
+ def out_shape(self):
+ return [ShapeSpec(channels=c) for c in self._out_channels]
+
+
+@register
+@serializable
+class PPYOLOPAN(nn.Layer):
+ __shared__ = ['norm_type', 'data_format']
+
+ def __init__(self,
+ in_channels=[512, 1024, 2048],
+ norm_type='bn',
+ data_format='NCHW',
+ act='mish',
+ conv_block_num=3,
+ drop_block=False,
+ block_size=3,
+ keep_prob=0.9,
+ spp=False):
+ """
+ PPYOLOPAN layer with SPP, DropBlock and CSP connection.
+
+ Args:
+ in_channels (list): input channels for fpn
+ norm_type (str): batch norm type, default bn
+ data_format (str): data format, NCHW or NHWC
+ act (str): activation function, default mish
+ conv_block_num (int): conv block num of each pan block
+ drop_block (bool): whether use DropBlock or not
+ block_size (int): block size of DropBlock
+ keep_prob (float): keep probability of DropBlock
+ spp (bool): whether use spp or not
+
+ """
+ super(PPYOLOPAN, self).__init__()
+ assert len(in_channels) > 0, "in_channels length should > 0"
+ self.in_channels = in_channels
+ self.num_blocks = len(in_channels)
+ # parse kwargs
+ self.drop_block = drop_block
+ self.block_size = block_size
+ self.keep_prob = keep_prob
+ self.spp = spp
+ self.conv_block_num = conv_block_num
+ self.data_format = data_format
+ if self.drop_block:
+ dropblock_cfg = [[
+ 'dropblock', DropBlock, [self.block_size, self.keep_prob],
+ dict()
+ ]]
+ else:
+ dropblock_cfg = []
+
+ # fpn
+ self.fpn_blocks = []
+ self.fpn_routes = []
+ fpn_channels = []
+ for i, ch_in in enumerate(self.in_channels[::-1]):
+ if i > 0:
+ ch_in += 512 // (2**(i - 1))
+ channel = 512 // (2**i)
+ base_cfg = []
+ for j in range(self.conv_block_num):
+ base_cfg += [
+ # name, layer, args
+ [
+ '{}.0'.format(j), ConvBNLayer, [channel, channel, 1],
+ dict(
+ padding=0, act=act, norm_type=norm_type)
+ ],
+ [
+ '{}.1'.format(j), ConvBNLayer, [channel, channel, 3],
+ dict(
+ padding=1, act=act, norm_type=norm_type)
+ ]
+ ]
+
+ if i == 0 and self.spp:
+ base_cfg[3] = [
+ 'spp', SPP, [channel * 4, channel, 1], dict(
+ pool_size=[5, 9, 13], act=act, norm_type=norm_type)
+ ]
+
+ cfg = base_cfg[:4] + dropblock_cfg + base_cfg[4:]
+ name = 'fpn.{}'.format(i)
+ fpn_block = self.add_sublayer(
+ name,
+ PPYOLODetBlockCSP(cfg, ch_in, channel, act, norm_type, name,
+ data_format))
+ self.fpn_blocks.append(fpn_block)
+ fpn_channels.append(channel * 2)
+ if i < self.num_blocks - 1:
+ name = 'fpn_transition.{}'.format(i)
+ route = self.add_sublayer(
+ name,
+ ConvBNLayer(
+ ch_in=channel * 2,
+ ch_out=channel,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ act=act,
+ norm_type=norm_type,
+ data_format=data_format,
+ name=name))
+ self.fpn_routes.append(route)
+ # pan
+ self.pan_blocks = []
+ self.pan_routes = []
+ self._out_channels = [512 // (2**(self.num_blocks - 2)), ]
+ for i in reversed(range(self.num_blocks - 1)):
+ name = 'pan_transition.{}'.format(i)
+ route = self.add_sublayer(
+ name,
+ ConvBNLayer(
+ ch_in=fpn_channels[i + 1],
+ ch_out=fpn_channels[i + 1],
+ filter_size=3,
+ stride=2,
+ padding=1,
+ act=act,
+ norm_type=norm_type,
+ data_format=data_format,
+ name=name))
+ self.pan_routes = [route, ] + self.pan_routes
+ base_cfg = []
+ ch_in = fpn_channels[i] + fpn_channels[i + 1]
+ channel = 512 // (2**i)
+ for j in range(self.conv_block_num):
+ base_cfg += [
+ # name, layer, args
+ [
+ '{}.0'.format(j), ConvBNLayer, [channel, channel, 1],
+ dict(
+ padding=0, act=act, norm_type=norm_type)
+ ],
+ [
+ '{}.1'.format(j), ConvBNLayer, [channel, channel, 3],
+ dict(
+ padding=1, act=act, norm_type=norm_type)
+ ]
+ ]
+
+ cfg = base_cfg[:4] + dropblock_cfg + base_cfg[4:]
+ name = 'pan.{}'.format(i)
+ pan_block = self.add_sublayer(
+ name,
+ PPYOLODetBlockCSP(cfg, ch_in, channel, act, norm_type, name,
+ data_format))
+
+ self.pan_blocks = [pan_block, ] + self.pan_blocks
+ self._out_channels.append(channel * 2)
+
+ self._out_channels = self._out_channels[::-1]
+
+ def forward(self, blocks):
+ assert len(blocks) == self.num_blocks
+ blocks = blocks[::-1]
+ # fpn
+ fpn_feats = []
+ for i, block in enumerate(blocks):
+ if i > 0:
+ if self.data_format == 'NCHW':
+ block = paddle.concat([route, block], axis=1)
+ else:
+ block = paddle.concat([route, block], axis=-1)
+ route, tip = self.fpn_blocks[i](block)
+ fpn_feats.append(tip)
+
+ if i < self.num_blocks - 1:
+ route = self.fpn_routes[i](route)
+ route = F.interpolate(
+ route, scale_factor=2., data_format=self.data_format)
+
+ pan_feats = [fpn_feats[-1], ]
+ route = fpn_feats[self.num_blocks - 1]
+ for i in reversed(range(self.num_blocks - 1)):
+ block = fpn_feats[i]
+ route = self.pan_routes[i](route)
+ if self.data_format == 'NCHW':
+ block = paddle.concat([route, block], axis=1)
+ else:
+ block = paddle.concat([route, block], axis=-1)
+
+ route, tip = self.pan_blocks[i](block)
+ pan_feats.append(tip)
+
+ return pan_feats[::-1]
+
+ @classmethod
+ def from_config(cls, cfg, input_shape):
+ return {'in_channels': [i.channels for i in input_shape], }
+
+ @property
+ def out_shape(self):
+ return [ShapeSpec(channels=c) for c in self._out_channels]
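A quick check (assuming the patch is applied) of the factored-out `add_coord` helper: it returns per-pixel x/y grids normalized to [-1, 1] that `CoordConv` concatenates onto its input, hence the `ch_in + 2` in its conv:

```python
import paddle
from ppdet.modeling.necks.yolo_fpn import add_coord

x = paddle.zeros([2, 3, 4, 5])        # NCHW: b=2, c=3, h=4, w=5
gx, gy = add_coord(x, 'NCHW')         # each of shape [2, 1, 4, 5]
# gx runs -1 -> 1 along the width axis, gy runs -1 -> 1 along the height axis
```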
diff --git a/ppdet/modeling/ops.py b/ppdet/modeling/ops.py
index ef961dd1c220f8cee5a46745259add9a3d0cbbe1..f190a489580e114d06b39bc10bd9868833ed5bec 100644
--- a/ppdet/modeling/ops.py
+++ b/ppdet/modeling/ops.py
@@ -41,16 +41,19 @@ __all__ = [
'collect_fpn_proposals',
'matrix_nms',
'batch_norm',
+ 'mish',
]
+def mish(x):
+ return x * paddle.tanh(F.softplus(x))
+
+
def batch_norm(ch,
norm_type='bn',
norm_decay=0.,
initializer=None,
- name=None,
data_format='NCHW'):
- bn_name = name + '.bn'
if norm_type == 'sync_bn':
batch_norm = nn.SyncBatchNorm
else:
@@ -59,11 +62,8 @@ def batch_norm(ch,
return batch_norm(
ch,
weight_attr=ParamAttr(
- name=bn_name + '.scale',
- initializer=initializer,
- regularizer=L2Decay(norm_decay)),
- bias_attr=ParamAttr(
- name=bn_name + '.offset', regularizer=L2Decay(norm_decay)),
+ initializer=initializer, regularizer=L2Decay(norm_decay)),
+ bias_attr=ParamAttr(regularizer=L2Decay(norm_decay)),
data_format=data_format)
@@ -1558,7 +1558,6 @@ def sigmoid_cross_entropy_with_logits(input,
output = F.binary_cross_entropy_with_logits(input, label, reduction='none')
mask_tensor = paddle.cast(label != ignore_index, 'float32')
output = paddle.multiply(output, mask_tensor)
- output = paddle.reshape(output, shape=[output.shape[0], -1])
if normalize:
sum_valid_mask = paddle.sum(mask_tensor)
output = output / sum_valid_mask
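The newly exported `mish` is the standard `x * tanh(softplus(x))` activation (https://arxiv.org/abs/1908.08681). A quick numeric illustration with the patch applied:

```python
import paddle
from ppdet.modeling.ops import mish

x = paddle.to_tensor([-3., 0., 3.])
print(mish(x))  # ~[-0.146, 0., 2.986]: smooth, slightly non-monotonic below zero
```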
diff --git a/ppdet/modeling/post_process.py b/ppdet/modeling/post_process.py
index 2b2fc4483fad99f3541c96e50c4773467526adc5..ca69ac3d01227796e66c44a509968b5bf365f71b 100644
--- a/ppdet/modeling/post_process.py
+++ b/ppdet/modeling/post_process.py
@@ -17,13 +17,15 @@ import paddle
import paddle.nn as nn
import paddle.nn.functional as F
from ppdet.core.workspace import register
-from ppdet.modeling.bbox_utils import nonempty_bbox
+from ppdet.modeling.bbox_utils import nonempty_bbox, rbox2poly
from . import ops
try:
from collections.abc import Sequence
except Exception:
from collections import Sequence
+__all__ = ['BBoxPostProcess', 'MaskPostProcess', 'FCOSPostProcess']
+
@register
class BBoxPostProcess(object):
@@ -40,13 +42,17 @@ class BBoxPostProcess(object):
"""
Decode the bbox and do NMS if needed.
+ Args:
+ head_out (tuple): bbox_pred and cls_prob of bbox_head output.
+ rois (tuple): roi and rois_num of rpn_head output.
+ im_shape (Tensor): The shape of the input image.
+ scale_factor (Tensor): The scale factor of the input image.
Returns:
- bbox_pred(Tensor): The output is the prediction with shape [N, 6]
- including labels, scores and bboxes. The size of
- bboxes are corresponding to the input image and
- the bboxes may be used in other brunch.
- bbox_num(Tensor): The number of prediction of each batch with shape
- [N, 6].
+ bbox_pred (Tensor): The output prediction with shape [N, 6], including
+ labels, scores and bboxes. The size of bboxes are corresponding
+ to the input image, the bboxes may be used in other branch.
+ bbox_num (Tensor): The number of prediction boxes of each batch with
+ shape [1], and is N.
"""
if self.nms is not None:
bboxes, score = self.decode(head_out, rois, im_shape, scale_factor)
@@ -54,6 +60,9 @@ class BBoxPostProcess(object):
else:
bbox_pred, bbox_num = self.decode(head_out, rois, im_shape,
scale_factor)
+
+ # Prevent empty bbox_pred from decode or NMS.
+ # Bboxes and score before NMS may be empty due to the score threshold.
if bbox_pred.shape[0] == 0:
bbox_pred = paddle.to_tensor(
np.array(
@@ -64,16 +73,22 @@ class BBoxPostProcess(object):
def get_pred(self, bboxes, bbox_num, im_shape, scale_factor):
"""
Rescale, clip and filter the bbox from the output of NMS to
- get final prediction.
+ get final prediction.
+
+ Notes:
+ Currently only supports bs = 1.
Args:
- bboxes(Tensor): The output of __call__ with shape [N, 6]
+ bbox_pred (Tensor): The output bboxes with shape [N, 6] after decode
+ and NMS, including labels, scores and bboxes.
+ bbox_num (Tensor): The number of prediction boxes of each batch with
+ shape [1], and is N.
+ im_shape (Tensor): The shape of the input image.
+ scale_factor (Tensor): The scale factor of the input image.
Returns:
- bbox_pred(Tensor): The output is the prediction with shape [N, 6]
- including labels, scores and bboxes. The size of
- bboxes are corresponding to the original image.
+ pred_result (Tensor): The final prediction results with shape [N, 6]
+ including labels, scores and bboxes.
"""
-
origin_shape = paddle.floor(im_shape / scale_factor + 0.5)
origin_shape_list = []
@@ -125,7 +140,9 @@ class MaskPostProcess(object):
self.binary_thresh = binary_thresh
def paste_mask(self, masks, boxes, im_h, im_w):
- # paste each mask on image
+ """
+ Paste the mask prediction to the original image.
+ """
x0, y0, x1, y1 = paddle.split(boxes, 4, axis=1)
masks = paddle.unsqueeze(masks, [0, 1])
img_y = paddle.arange(0, im_h, dtype='float32') + 0.5
@@ -148,7 +165,19 @@ class MaskPostProcess(object):
def __call__(self, mask_out, bboxes, bbox_num, origin_shape):
"""
- Paste the mask prediction to the original image.
+ Decode the mask_out and paste the mask to the original image.
+
+ Args:
+ mask_out (Tensor): mask_head output with shape [N, 28, 28].
+ bbox_pred (Tensor): The output bboxes with shape [N, 6] after decode
+ and NMS, including labels, scores and bboxes.
+ bbox_num (Tensor): The number of prediction boxes of each batch with
+ shape [1], and is N.
+ origin_shape (Tensor): The origin shape of the input image, the tensor
+ shape is [N, 2], and each row is [h, w].
+ Returns:
+ pred_result (Tensor): The final prediction mask results with shape
+ [N, h, w] in binary mask style.
"""
num_mask = mask_out.shape[0]
origin_shape = paddle.cast(origin_shape, 'int32')
@@ -186,3 +215,89 @@ class FCOSPostProcess(object):
centerness, scale_factor)
bbox_pred, bbox_num, _ = self.nms(bboxes, score)
return bbox_pred, bbox_num
+
+
+@register
+class S2ANetBBoxPostProcess(object):
+ __inject__ = ['nms']
+
+ def __init__(self, nms_pre=2000, min_bbox_size=0, nms=None):
+ super(S2ANetBBoxPostProcess, self).__init__()
+ self.nms_pre = nms_pre
+ self.min_bbox_size = min_bbox_size
+ self.nms = nms
+ self.origin_shape_list = []
+
+ def get_prediction(self, pred_scores, pred_bboxes, im_shape, scale_factor):
+ """
+ pred_scores : [N, M] score
+ pred_bboxes : [N, 5] xc, yc, w, h, a
+ im_shape : [N, 2] im_shape
+ scale_factor : [N, 2] scale_factor
+ """
+ # TODO: support bs>1
+ pred_polys = rbox2poly(pred_bboxes.numpy())
+ pred_polys = paddle.to_tensor(pred_polys)
+ pred_polys = paddle.reshape(
+ pred_polys, [1, pred_polys.shape[0], pred_polys.shape[1]])
+
+ pred_scores = paddle.to_tensor(pred_scores)
+ # pred_scores [NA, 16] --> [16, NA]
+ pred_scores = paddle.transpose(pred_scores, [1, 0])
+ pred_scores = paddle.reshape(
+ pred_scores, [1, pred_scores.shape[0], pred_scores.shape[1]])
+ pred_cls_score_bbox, bbox_num, index = self.nms(pred_polys, pred_scores)
+
+ # post process scale
+ # result [n, 10]
+ if bbox_num > 0:
+ pred_bbox, bbox_num = self.post_process(pred_cls_score_bbox[:, 2:],
+ bbox_num, im_shape[0],
+ scale_factor[0])
+
+ pred_cls_score_bbox = paddle.concat(
+ [pred_cls_score_bbox[:, 0:2], pred_bbox], axis=1)
+ else:
+ pred_cls_score_bbox = paddle.to_tensor(
+ np.array(
+ [[-1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]],
+ dtype='float32'))
+ bbox_num = paddle.to_tensor(np.array([1], dtype='int32'))
+ return pred_cls_score_bbox, bbox_num, index
+
+ def post_process(self, bboxes, bbox_num, im_shape, scale_factor):
+ """
+ Rescale, clip and filter the bbox from the output of NMS to
+ get final prediction.
+
+ Args:
+ bboxes(Tensor): bboxes [N, 8]
+ bbox_num(Tensor): bbox_num
+ im_shape(Tensor): [1 2]
+ scale_factor(Tensor): [1 2]
+ Returns:
+ bbox_pred(Tensor): The output is the prediction with shape [N, 8]
+ including labels, scores and bboxes. The size of
+ bboxes are corresponding to the original image.
+ """
+
+ origin_shape = paddle.floor(im_shape / scale_factor + 0.5)
+
+ origin_h = origin_shape[0]
+ origin_w = origin_shape[1]
+
+ bboxes[:, 0::2] = bboxes[:, 0::2] / scale_factor[0]
+ bboxes[:, 1::2] = bboxes[:, 1::2] / scale_factor[1]
+
+ zeros = paddle.zeros_like(origin_h)
+ x1 = paddle.maximum(paddle.minimum(bboxes[:, 0], origin_w - 1), zeros)
+ y1 = paddle.maximum(paddle.minimum(bboxes[:, 1], origin_h - 1), zeros)
+ x2 = paddle.maximum(paddle.minimum(bboxes[:, 2], origin_w - 1), zeros)
+ y2 = paddle.maximum(paddle.minimum(bboxes[:, 3], origin_h - 1), zeros)
+ x3 = paddle.maximum(paddle.minimum(bboxes[:, 4], origin_w - 1), zeros)
+ y3 = paddle.maximum(paddle.minimum(bboxes[:, 5], origin_h - 1), zeros)
+ x4 = paddle.maximum(paddle.minimum(bboxes[:, 6], origin_w - 1), zeros)
+ y4 = paddle.maximum(paddle.minimum(bboxes[:, 7], origin_h - 1), zeros)
+ bbox = paddle.stack([x1, y1, x2, y2, x3, y3, x4, y4], axis=-1)
+ bboxes = (bbox, bbox_num)
+ return bboxes
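For context on `rbox2poly` used in `get_prediction`, a reference sketch of the conversion it performs under the usual `[xc, yc, w, h, angle]` convention (an illustration, not the ppdet implementation):

```python
import numpy as np

def rbox2poly_ref(rboxes):
    # [xc, yc, w, h, angle] -> four corners [x1, y1, ..., x4, y4]
    polys = []
    for xc, yc, w, h, angle in rboxes:
        c, s = np.cos(angle), np.sin(angle)
        dx, dy = w / 2., h / 2.
        corners = np.array([[-dx, -dy], [dx, -dy], [dx, dy], [-dx, dy]])
        rot = corners @ np.array([[c, -s], [s, c]]).T  # rotate about center
        polys.append((rot + np.array([xc, yc])).reshape(-1))
    return np.stack(polys)
```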
diff --git a/ppdet/modeling/proposal_generator/anchor_generator.py b/ppdet/modeling/proposal_generator/anchor_generator.py
index 1ca0319d3ad13d3650022d8d958c0f92954914c9..8088ffa04affa3c2ecd81348d9eb7cb749f7b4f8 100644
--- a/ppdet/modeling/proposal_generator/anchor_generator.py
+++ b/ppdet/modeling/proposal_generator/anchor_generator.py
@@ -25,6 +25,24 @@ from .. import ops
@register
class AnchorGenerator(nn.Layer):
+ """
+ Generate anchors according to the feature maps
+
+ Args:
+ anchor_sizes (list[float] | list[list[float]]): The anchor sizes at
+ each feature point. list[float] means all feature levels share the
+ same sizes. list[list[float]] means the anchor sizes for
+ each level. The sizes stand for the scale of input size.
+ aspect_ratios (list[float] | list[list[float]]): The aspect ratios at
+ each feature point. list[float] means all feature levels share the
+ same ratios. list[list[float]] means the aspect ratios for
+ each level.
+ strides (list[float]): The strides of the feature maps on which
+ anchors are generated
+ offset (float): The offset of the coordinate of anchors, default 0.
+
+ """
+
def __init__(self,
anchor_sizes=[32, 64, 128, 256, 512],
aspect_ratios=[0.5, 1.0, 2.0],
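A hedged usage sketch of `AnchorGenerator` matching the documented arguments (the per-level sizes and strides are illustrative; the layer is called on the list of FPN feature maps):

```python
from ppdet.modeling.proposal_generator.anchor_generator import AnchorGenerator

anchor_gen = AnchorGenerator(
    anchor_sizes=[[32], [64], [128], [256], [512]],  # one size list per level
    aspect_ratios=[0.5, 1.0, 2.0],                   # shared across levels
    strides=[4., 8., 16., 32., 64.])
# Calling it on per-level feature maps yields per-level anchor tensors,
# roughly [H_i * W_i * num_anchors_per_point, 4] each.
```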
diff --git a/ppdet/modeling/proposal_generator/proposal_generator.py b/ppdet/modeling/proposal_generator/proposal_generator.py
index 8a5df53255d080ec83d083bd0db72b41ca8700b4..12518e48817233e705638163d59e9fe9a9986938 100644
--- a/ppdet/modeling/proposal_generator/proposal_generator.py
+++ b/ppdet/modeling/proposal_generator/proposal_generator.py
@@ -25,6 +25,28 @@ from .. import ops
@register
@serializable
class ProposalGenerator(object):
+ """
+ Proposal generation module
+
+ For more details, please refer to the document of generate_proposals
+ in ppdet/modeling/ops.py
+
+ Args:
+ pre_nms_top_n (int): Number of total bboxes to be kept per
+ image before NMS. default 6000
+ post_nms_top_n (int): Number of total bboxes to be kept per
+ image after NMS. default 1000
+ nms_thresh (float): Threshold in NMS. default 0.5
+ min_size (float): Remove predicted boxes with either height or
+ width < min_size. default 0.1
+ eta (float): Used in adaptive NMS: if the adaptive `threshold > 0.5`,
+ `adaptive_threshold = adaptive_threshold * eta` at each iteration.
+ default 1.
+ topk_after_collect (bool): whether to adopt topk after batch
+ collection. If topk_after_collect is true, box filter will not be
+ used after NMS at each image in proposal generation. default false
+ """
+
def __init__(self,
pre_nms_top_n=12000,
post_nms_top_n=2000,
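A hedged instantiation sketch matching the documented defaults above (train-time budgets shown; smaller pre/post NMS budgets are typical at test time):

```python
from ppdet.modeling.proposal_generator.proposal_generator import ProposalGenerator

train_proposal = ProposalGenerator(
    pre_nms_top_n=12000,   # boxes kept per image before NMS
    post_nms_top_n=2000,   # boxes kept per image after NMS
    nms_thresh=0.5)
```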
diff --git a/ppdet/modeling/proposal_generator/rpn_head.py b/ppdet/modeling/proposal_generator/rpn_head.py
index 6a1c980a452f0390b6e7355210cfc190ceab184a..2b1e6c77b7cb30511794d1e2c283cf81759c2857 100644
--- a/ppdet/modeling/proposal_generator/rpn_head.py
+++ b/ppdet/modeling/proposal_generator/rpn_head.py
@@ -27,12 +27,20 @@ from .proposal_generator import ProposalGenerator
class RPNFeat(nn.Layer):
- def __init__(self, feat_in=1024, feat_out=1024):
+ """
+ Feature extraction in RPN head
+
+ Args:
+ in_channel (int): Input channel
+ out_channel (int): Output channel
+ """
+
+ def __init__(self, in_channel=1024, out_channel=1024):
super(RPNFeat, self).__init__()
# rpn feat is shared with each level
self.rpn_conv = nn.Conv2D(
- in_channels=feat_in,
- out_channels=feat_out,
+ in_channels=in_channel,
+ out_channels=out_channel,
kernel_size=3,
padding=1,
weight_attr=paddle.ParamAttr(initializer=Normal(
@@ -47,6 +55,20 @@ class RPNFeat(nn.Layer):
@register
class RPNHead(nn.Layer):
+ """
+ Region Proposal Network
+
+ Args:
+ anchor_generator (dict): configure of anchor generation
+ rpn_target_assign (dict): configure of rpn targets assignment
+ train_proposal (dict): configure of proposals generation
+ at the stage of training
+ test_proposal (dict): configure of proposals generation
+ at the stage of prediction
+ in_channel (int): channel of input feature maps which can be
+ derived by from_config
+ """
+
def __init__(self,
anchor_generator=AnchorGenerator().__dict__,
rpn_target_assign=RPNTargetAssign().__dict__,
diff --git a/ppdet/modeling/proposal_generator/target.py b/ppdet/modeling/proposal_generator/target.py
index b66f0d9cd5f837dc109c6aba0e75a5b9262781b3..8e45ef3c0e8265f9287fe242db72a823ea4e3970 100644
--- a/ppdet/modeling/proposal_generator/target.py
+++ b/ppdet/modeling/proposal_generator/target.py
@@ -135,11 +135,15 @@ def generate_proposal_target(rpn_rois,
tgt_gt_inds = []
new_rois_num = []
+ # In cascade rcnn, the threshold for foreground and background
+ # is taken from cascade_iou
fg_thresh = cascade_iou if is_cascade else fg_thresh
bg_thresh = cascade_iou if is_cascade else bg_thresh
for i, rpn_roi in enumerate(rpn_rois):
gt_bbox = gt_boxes[i]
gt_class = gt_classes[i]
+
+ # Concat RoIs and gt boxes except cascade rcnn
if not is_cascade:
bbox = paddle.concat([rpn_roi, gt_bbox])
else:
@@ -269,9 +273,12 @@ def generate_mask_target(gt_segms, rois, labels_int32, sampled_gt_inds,
rois_per_im = rois[k]
gt_segms_per_im = gt_segms[k]
labels_per_im = labels_int32[k]
+ # select rois labeled with foreground
fg_inds = paddle.nonzero(
paddle.logical_and(labels_per_im != -1, labels_per_im !=
num_classes))
+
+ # generate fake roi if foreground is empty
if fg_inds.numel() == 0:
has_fg = False
fg_inds = paddle.ones([1], dtype='int32')
@@ -313,3 +320,287 @@ def generate_mask_target(gt_segms, rois, labels_int32, sampled_gt_inds,
tgt_weights = paddle.concat(tgt_weights, axis=0)
return mask_rois, mask_rois_num, tgt_classes, tgt_masks, mask_index, tgt_weights
+
+
+def libra_sample_pos(max_overlaps, max_classes, pos_inds, num_expected):
+ if len(pos_inds) <= num_expected:
+ return pos_inds
+ else:
+ unique_gt_inds = np.unique(max_classes[pos_inds])
+ num_gts = len(unique_gt_inds)
+ num_per_gt = int(round(num_expected / float(num_gts)) + 1)
+
+ sampled_inds = []
+ for i in unique_gt_inds:
+ inds = np.nonzero(max_classes == i)[0]
+            # keep only candidates that are also positive samples
+            inds = list(set(inds) & set(pos_inds))
+ if len(inds) > num_per_gt:
+ inds = np.random.choice(inds, size=num_per_gt, replace=False)
+            sampled_inds.extend(list(inds))  # accumulate samples per group
+ if len(sampled_inds) < num_expected:
+ num_extra = num_expected - len(sampled_inds)
+ extra_inds = np.array(list(set(pos_inds) - set(sampled_inds)))
+ assert len(sampled_inds) + len(extra_inds) == len(pos_inds), \
+ "sum of sampled_inds({}) and extra_inds({}) length must be equal with pos_inds({})!".format(
+ len(sampled_inds), len(extra_inds), len(pos_inds))
+ if len(extra_inds) > num_extra:
+ extra_inds = np.random.choice(
+ extra_inds, size=num_extra, replace=False)
+ sampled_inds.extend(extra_inds.tolist())
+ elif len(sampled_inds) > num_expected:
+ sampled_inds = np.random.choice(
+ sampled_inds, size=num_expected, replace=False)
+ return paddle.to_tensor(sampled_inds)
+
+
+def libra_sample_via_interval(max_overlaps, full_set, num_expected, floor_thr,
+ num_bins, bg_thresh):
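+    # IoU-balanced negative sampling: split [floor_thr, max_iou) into
+    # num_bins equal IoU intervals and draw about num_expected / num_bins
+    # negatives from each, so hard negatives (higher IoU) are not
+    # under-represented as they would be under uniform sampling.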
+ max_iou = max_overlaps.max()
+ iou_interval = (max_iou - floor_thr) / num_bins
+ per_num_expected = int(num_expected / num_bins)
+
+ sampled_inds = []
+ for i in range(num_bins):
+ start_iou = floor_thr + i * iou_interval
+ end_iou = floor_thr + (i + 1) * iou_interval
+
+ tmp_set = set(
+ np.where(
+ np.logical_and(max_overlaps >= start_iou, max_overlaps <
+ end_iou))[0])
+ tmp_inds = list(tmp_set & full_set)
+
+ if len(tmp_inds) > per_num_expected:
+ tmp_sampled_set = np.random.choice(
+ tmp_inds, size=per_num_expected, replace=False)
+ else:
+            tmp_sampled_set = np.array(tmp_inds, dtype=np.int32)
+ sampled_inds.append(tmp_sampled_set)
+
+ sampled_inds = np.concatenate(sampled_inds)
+ if len(sampled_inds) < num_expected:
+ num_extra = num_expected - len(sampled_inds)
+ extra_inds = np.array(list(full_set - set(sampled_inds)))
+ assert len(sampled_inds) + len(extra_inds) == len(full_set), \
+ "sum of sampled_inds({}) and extra_inds({}) length must be equal with full_set({})!".format(
+ len(sampled_inds), len(extra_inds), len(full_set))
+
+ if len(extra_inds) > num_extra:
+ extra_inds = np.random.choice(extra_inds, num_extra, replace=False)
+ sampled_inds = np.concatenate([sampled_inds, extra_inds])
+
+ return sampled_inds
+
+
+def libra_sample_neg(max_overlaps,
+ max_classes,
+ neg_inds,
+ num_expected,
+ floor_thr=-1,
+ floor_fraction=0,
+ num_bins=3,
+ bg_thresh=0.5):
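+    # Negatives are split into a "floor" set (IoU below floor_thr, easy
+    # samples) and an IoU-sampling set; roughly floor_fraction of the quota
+    # comes from the floor set and the rest is IoU-balanced over num_bins
+    # bins via libra_sample_via_interval.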
+ if len(neg_inds) <= num_expected:
+ return neg_inds
+ else:
+ # balance sampling for negative samples
+ neg_set = set(neg_inds.tolist())
+ if floor_thr > 0:
+ floor_set = set(
+ np.where(
+ np.logical_and(max_overlaps >= 0, max_overlaps < floor_thr))
+ [0])
+ iou_sampling_set = set(np.where(max_overlaps >= floor_thr)[0])
+ elif floor_thr == 0:
+ floor_set = set(np.where(max_overlaps == 0)[0])
+ iou_sampling_set = set(np.where(max_overlaps > floor_thr)[0])
+ else:
+ floor_set = set()
+ iou_sampling_set = set(np.where(max_overlaps > floor_thr)[0])
+ floor_thr = 0
+
+ floor_neg_inds = list(floor_set & neg_set)
+ iou_sampling_neg_inds = list(iou_sampling_set & neg_set)
+
+ num_expected_iou_sampling = int(num_expected * (1 - floor_fraction))
+ if len(iou_sampling_neg_inds) > num_expected_iou_sampling:
+ if num_bins >= 2:
+ iou_sampled_inds = libra_sample_via_interval(
+ max_overlaps,
+ set(iou_sampling_neg_inds), num_expected_iou_sampling,
+ floor_thr, num_bins, bg_thresh)
+ else:
+ iou_sampled_inds = np.random.choice(
+ iou_sampling_neg_inds,
+ size=num_expected_iou_sampling,
+ replace=False)
+ else:
+            iou_sampled_inds = np.array(iou_sampling_neg_inds, dtype=np.int32)
+ num_expected_floor = num_expected - len(iou_sampled_inds)
+ if len(floor_neg_inds) > num_expected_floor:
+ sampled_floor_inds = np.random.choice(
+ floor_neg_inds, size=num_expected_floor, replace=False)
+ else:
+            sampled_floor_inds = np.array(floor_neg_inds, dtype=np.int32)
+ sampled_inds = np.concatenate((sampled_floor_inds, iou_sampled_inds))
+ if len(sampled_inds) < num_expected:
+ num_extra = num_expected - len(sampled_inds)
+ extra_inds = np.array(list(neg_set - set(sampled_inds)))
+ if len(extra_inds) > num_extra:
+ extra_inds = np.random.choice(
+ extra_inds, size=num_extra, replace=False)
+ sampled_inds = np.concatenate((sampled_inds, extra_inds))
+ return paddle.to_tensor(sampled_inds)
+
+
+def libra_label_box(anchors, gt_boxes, gt_classes, positive_overlap,
+ negative_overlap, num_classes):
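+    # Label each candidate box by its max IoU with the gt boxes:
+    # >= positive_overlap -> 1 (foreground), < negative_overlap -> 0
+    # (background), in between -> -1 (ignored).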
+ # TODO: use paddle API to speed up
+ gt_classes = gt_classes.numpy()
+ gt_overlaps = np.zeros((anchors.shape[0], num_classes))
+ matches = np.zeros((anchors.shape[0]), dtype=np.int32)
+ if len(gt_boxes) > 0:
+ proposal_to_gt_overlaps = bbox_overlaps(anchors, gt_boxes).numpy()
+ overlaps_argmax = proposal_to_gt_overlaps.argmax(axis=1)
+ overlaps_max = proposal_to_gt_overlaps.max(axis=1)
+        # Boxes with non-zero overlap with gt boxes
+ overlapped_boxes_ind = np.where(overlaps_max > 0)[0]
+ overlapped_boxes_gt_classes = gt_classes[overlaps_argmax[
+ overlapped_boxes_ind]]
+
+ for idx in range(len(overlapped_boxes_ind)):
+ gt_overlaps[overlapped_boxes_ind[idx], overlapped_boxes_gt_classes[
+ idx]] = overlaps_max[overlapped_boxes_ind[idx]]
+ matches[overlapped_boxes_ind[idx]] = overlaps_argmax[
+ overlapped_boxes_ind[idx]]
+
+ gt_overlaps = paddle.to_tensor(gt_overlaps)
+ matches = paddle.to_tensor(matches)
+
+ matched_vals = paddle.max(gt_overlaps, axis=1)
+ match_labels = paddle.full(matches.shape, -1, dtype='int32')
+ match_labels = paddle.where(matched_vals < negative_overlap,
+ paddle.zeros_like(match_labels), match_labels)
+ match_labels = paddle.where(matched_vals >= positive_overlap,
+ paddle.ones_like(match_labels), match_labels)
+
+ return matches, match_labels, matched_vals
+
+
+def libra_sample_bbox(matches,
+ match_labels,
+ matched_vals,
+ gt_classes,
+ batch_size_per_im,
+ num_classes,
+ fg_fraction,
+ fg_thresh,
+ bg_thresh,
+ num_bins,
+ use_random=True,
+ is_cascade_rcnn=False):
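+    # Per-image quotas: fg_rois_per_im = round(fg_fraction * batch_size),
+    # and the remaining RoIs are filled with (Libra-sampled) background.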
+ rois_per_image = int(batch_size_per_im)
+ fg_rois_per_im = int(np.round(fg_fraction * rois_per_image))
+ bg_rois_per_im = rois_per_image - fg_rois_per_im
+
+ if is_cascade_rcnn:
+ fg_inds = paddle.nonzero(matched_vals >= fg_thresh)
+ bg_inds = paddle.nonzero(matched_vals < bg_thresh)
+ else:
+ matched_vals_np = matched_vals.numpy()
+ match_labels_np = match_labels.numpy()
+
+ # sample fg
+ fg_inds = paddle.nonzero(matched_vals >= fg_thresh).flatten()
+ fg_nums = int(np.minimum(fg_rois_per_im, fg_inds.shape[0]))
+ if (fg_inds.shape[0] > fg_nums) and use_random:
+ fg_inds = libra_sample_pos(matched_vals_np, match_labels_np,
+ fg_inds.numpy(), fg_rois_per_im)
+ fg_inds = fg_inds[:fg_nums]
+
+ # sample bg
+ bg_inds = paddle.nonzero(matched_vals < bg_thresh).flatten()
+ bg_nums = int(np.minimum(rois_per_image - fg_nums, bg_inds.shape[0]))
+ if (bg_inds.shape[0] > bg_nums) and use_random:
+ bg_inds = libra_sample_neg(
+ matched_vals_np,
+ match_labels_np,
+ bg_inds.numpy(),
+ bg_rois_per_im,
+ num_bins=num_bins,
+ bg_thresh=bg_thresh)
+ bg_inds = bg_inds[:bg_nums]
+
+ sampled_inds = paddle.concat([fg_inds, bg_inds])
+
+ gt_classes = paddle.gather(gt_classes, matches)
+ gt_classes = paddle.where(match_labels == 0,
+ paddle.ones_like(gt_classes) * num_classes,
+ gt_classes)
+ gt_classes = paddle.where(match_labels == -1,
+ paddle.ones_like(gt_classes) * -1, gt_classes)
+ sampled_gt_classes = paddle.gather(gt_classes, sampled_inds)
+
+ return sampled_inds, sampled_gt_classes
+
+
+def libra_generate_proposal_target(rpn_rois,
+ gt_classes,
+ gt_boxes,
+ batch_size_per_im,
+ fg_fraction,
+ fg_thresh,
+ bg_thresh,
+ num_classes,
+ use_random=True,
+ is_cascade_rcnn=False,
+ max_overlaps=None,
+ num_bins=3):
+
+ rois_with_gt = []
+ tgt_labels = []
+ tgt_bboxes = []
+ sampled_max_overlaps = []
+ tgt_gt_inds = []
+ new_rois_num = []
+
+ for i, rpn_roi in enumerate(rpn_rois):
+ max_overlap = max_overlaps[i] if is_cascade_rcnn else None
+ gt_bbox = gt_boxes[i]
+ gt_class = gt_classes[i]
+ if is_cascade_rcnn:
+ rpn_roi = filter_roi(rpn_roi, max_overlap)
+ bbox = paddle.concat([rpn_roi, gt_bbox])
+
+ # Step1: label bbox
+ matches, match_labels, matched_vals = libra_label_box(
+ bbox, gt_bbox, gt_class, fg_thresh, bg_thresh, num_classes)
+
+ # Step2: sample bbox
+ sampled_inds, sampled_gt_classes = libra_sample_bbox(
+ matches, match_labels, matched_vals, gt_class, batch_size_per_im,
+ num_classes, fg_fraction, fg_thresh, bg_thresh, num_bins,
+ use_random, is_cascade_rcnn)
+
+ # Step3: make output
+ rois_per_image = paddle.gather(bbox, sampled_inds)
+ sampled_gt_ind = paddle.gather(matches, sampled_inds)
+ sampled_bbox = paddle.gather(gt_bbox, sampled_gt_ind)
+ sampled_overlap = paddle.gather(matched_vals, sampled_inds)
+
+ rois_per_image.stop_gradient = True
+ sampled_gt_ind.stop_gradient = True
+ sampled_bbox.stop_gradient = True
+ sampled_overlap.stop_gradient = True
+
+ tgt_labels.append(sampled_gt_classes)
+ tgt_bboxes.append(sampled_bbox)
+ rois_with_gt.append(rois_per_image)
+ sampled_max_overlaps.append(sampled_overlap)
+ tgt_gt_inds.append(sampled_gt_ind)
+ new_rois_num.append(paddle.shape(sampled_inds)[0])
+ new_rois_num = paddle.concat(new_rois_num)
+ # rois_with_gt, tgt_labels, tgt_bboxes, tgt_gt_inds, new_rois_num
+ return rois_with_gt, tgt_labels, tgt_bboxes, tgt_gt_inds, new_rois_num
diff --git a/ppdet/modeling/proposal_generator/target_layer.py b/ppdet/modeling/proposal_generator/target_layer.py
index 4586cadf3f0684220bbff524f8a87370f01ba1ae..cdf405e3e8c0b7e1136dd56e82ce1ed2e4e138d8 100644
--- a/ppdet/modeling/proposal_generator/target_layer.py
+++ b/ppdet/modeling/proposal_generator/target_layer.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -11,17 +11,43 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
-
+import sys
import paddle
-
from ppdet.core.workspace import register, serializable
-
-from .target import rpn_anchor_target, generate_proposal_target, generate_mask_target
+from .target import rpn_anchor_target, generate_proposal_target, generate_mask_target, libra_generate_proposal_target
+from ppdet.modeling import bbox_utils
+import numpy as np
@register
@serializable
class RPNTargetAssign(object):
+ """
+ RPN targets assignment module
+
+ The assignment consists of three steps:
+    1. Match anchors and ground-truth boxes, and label each anchor as a
+    foreground or background sample
+    2. Sample anchors to keep the proper ratio between foreground and
+    background
+    3. Generate the targets for the classification and regression branches
+
+ Args:
+ batch_size_per_im (int): Total number of RPN samples per image.
+ default 256
+        fg_fraction (float): Fraction of anchors that are labeled
+ foreground, default 0.5
+ positive_overlap (float): Minimum overlap required between an anchor
+ and ground-truth box for the (anchor, gt box) pair to be
+ a foreground sample. default 0.7
+ negative_overlap (float): Maximum overlap allowed between an anchor
+ and ground-truth box for the (anchor, gt box) pair to be
+ a background sample. default 0.3
+ use_random (bool): Use random sampling to choose foreground and
+ background boxes, default true.
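+
+    Example:
+        with batch_size_per_im=256 and fg_fraction=0.5, at most
+        256 * 0.5 = 128 anchors are sampled as foreground and the
+        remaining samples as background.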
+ """
+
def __init__(self,
batch_size_per_im=256,
fg_fraction=0.5,
@@ -54,6 +80,33 @@ class RPNTargetAssign(object):
@register
class BBoxAssigner(object):
__shared__ = ['num_classes']
+ """
+ RCNN targets assignment module
+
+ The assignment consists of three steps:
+    1. Match RoIs and ground-truth boxes, and label each RoI as a
+    foreground or background sample
+    2. Sample RoIs to keep the proper ratio between foreground and
+    background
+    3. Generate the targets for the classification and regression branches
+
+ Args:
+ batch_size_per_im (int): Total number of RoIs per image.
+ default 512
+        fg_fraction (float): Fraction of RoIs that are labeled
+ foreground, default 0.25
+ fg_thresh (float): Minimum overlap required between a RoI
+ and ground-truth box for the (roi, gt box) pair to be
+ a foreground sample. default 0.5
+ bg_thresh (float): Maximum overlap allowed between a RoI
+ and ground-truth box for the (roi, gt box) pair to be
+ a background sample. default 0.5
+ use_random (bool): Use random sampling to choose foreground and
+ background boxes, default true
+        cascade_iou (list[float]): The list of IoU thresholds used to select
+            foreground and background at each stage; only used in Cascade
+            RCNN. default [0.5, 0.6, 0.7]
+        num_classes (int): The number of classes. default 80
+ """
def __init__(self,
batch_size_per_im=512,
@@ -61,7 +114,6 @@ class BBoxAssigner(object):
fg_thresh=.5,
bg_thresh=.5,
use_random=True,
- is_cls_agnostic=False,
cascade_iou=[0.5, 0.6, 0.7],
num_classes=80):
super(BBoxAssigner, self).__init__()
@@ -70,7 +122,6 @@ class BBoxAssigner(object):
self.fg_thresh = fg_thresh
self.bg_thresh = bg_thresh
self.use_random = use_random
- self.is_cls_agnostic = is_cls_agnostic
self.cascade_iou = cascade_iou
self.num_classes = num_classes
@@ -95,10 +146,93 @@ class BBoxAssigner(object):
return rois, rois_num, targets
+@register
+class BBoxLibraAssigner(object):
+ __shared__ = ['num_classes']
+ """
+ Libra-RCNN targets assignment module
+
+ The assignment consists of three steps:
+    1. Match RoIs and ground-truth boxes, and label each RoI as a
+    foreground or background sample
+    2. Sample RoIs to keep the proper ratio between foreground and
+    background
+    3. Generate the targets for the classification and regression branches
+
+ Args:
+ batch_size_per_im (int): Total number of RoIs per image.
+ default 512
+        fg_fraction (float): Fraction of RoIs that are labeled
+ foreground, default 0.25
+ fg_thresh (float): Minimum overlap required between a RoI
+ and ground-truth box for the (roi, gt box) pair to be
+ a foreground sample. default 0.5
+ bg_thresh (float): Maximum overlap allowed between a RoI
+ and ground-truth box for the (roi, gt box) pair to be
+ a background sample. default 0.5
+ use_random (bool): Use random sampling to choose foreground and
+ background boxes, default true
+        cascade_iou (list[float]): The list of IoU thresholds used to select
+            foreground and background at each stage; only used in Cascade
+            RCNN. default [0.5, 0.6, 0.7]
+        num_classes (int): The number of classes. default 80
+        num_bins (int): The number of IoU bins used by Libra negative
+            sampling. default 3
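+
+    Example:
+        a minimal config sketch (values are the defaults listed above):
+
+        BBoxLibraAssigner:
+          batch_size_per_im: 512
+          fg_fraction: 0.25
+          num_bins: 3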
+ """
+
+ def __init__(self,
+ batch_size_per_im=512,
+ fg_fraction=.25,
+ fg_thresh=.5,
+ bg_thresh=.5,
+ use_random=True,
+ cascade_iou=[0.5, 0.6, 0.7],
+ num_classes=80,
+ num_bins=3):
+ super(BBoxLibraAssigner, self).__init__()
+ self.batch_size_per_im = batch_size_per_im
+ self.fg_fraction = fg_fraction
+ self.fg_thresh = fg_thresh
+ self.bg_thresh = bg_thresh
+ self.use_random = use_random
+ self.cascade_iou = cascade_iou
+ self.num_classes = num_classes
+ self.num_bins = num_bins
+
+ def __call__(self,
+ rpn_rois,
+ rpn_rois_num,
+ inputs,
+ stage=0,
+ is_cascade=False):
+ gt_classes = inputs['gt_class']
+ gt_boxes = inputs['gt_bbox']
+ # rois, tgt_labels, tgt_bboxes, tgt_gt_inds
+ outs = libra_generate_proposal_target(
+ rpn_rois, gt_classes, gt_boxes, self.batch_size_per_im,
+ self.fg_fraction, self.fg_thresh, self.bg_thresh, self.num_classes,
+ self.use_random, is_cascade, self.cascade_iou[stage], self.num_bins)
+ rois = outs[0]
+ rois_num = outs[-1]
+ # tgt_labels, tgt_bboxes, tgt_gt_inds
+ targets = outs[1:4]
+ return rois, rois_num, targets
+
+
@register
@serializable
class MaskAssigner(object):
__shared__ = ['num_classes', 'mask_resolution']
+ """
+ Mask targets assignment module
+
+    The assignment consists of two steps:
+    1. Select the RoIs labeled as foreground.
+    2. Encode the RoIs and the corresponding gt polygons to generate
+    the mask targets
+
+ Args:
+ num_classes (int): The number of class
+ mask_resolution (int): The resolution of mask target, default 14
+ """
def __init__(self, num_classes=80, mask_resolution=14):
super(MaskAssigner, self).__init__()
@@ -113,3 +247,172 @@ class MaskAssigner(object):
# mask_rois, mask_rois_num, tgt_classes, tgt_masks, mask_index, tgt_weights
return outs
+
+
+@register
+class RBoxAssigner(object):
+ """
+    Assigner of rotated boxes (rbox)
+
+    Args:
+        pos_iou_thr (float): IoU threshold above which an anchor is labeled
+            as a positive sample. default 0.5
+        neg_iou_thr (float): IoU threshold below which an anchor is labeled
+            as a negative sample. default 0.4
+        min_iou_thr (float): minimum IoU for an anchor to be considered;
+            anchors below it are labeled as ignored. default 0.0
+        ignore_iof_thr (int): label value assigned to ignored anchors.
+            default -2
+ """
+
+ def __init__(self,
+ pos_iou_thr=0.5,
+ neg_iou_thr=0.4,
+ min_iou_thr=0.0,
+ ignore_iof_thr=-2):
+ super(RBoxAssigner, self).__init__()
+
+ self.pos_iou_thr = pos_iou_thr
+ self.neg_iou_thr = neg_iou_thr
+ self.min_iou_thr = min_iou_thr
+ self.ignore_iof_thr = ignore_iof_thr
+
+ def anchor_valid(self, anchors):
+ """
+
+ Args:
+ anchor: M x 4
+
+ Returns:
+
+ """
+ if anchors.ndim == 3:
+ anchors = anchors.reshape(-1, anchor.shape[-1])
+ assert anchors.ndim == 2
+ anchor_num = anchors.shape[0]
+ anchor_valid = np.ones((anchor_num), np.uint8)
+ anchor_inds = np.arange(anchor_num)
+ return anchor_inds
+
+ def assign_anchor(self,
+ anchors,
+ gt_bboxes,
+ gt_lables,
+ pos_iou_thr,
+ neg_iou_thr,
+ min_iou_thr=0.0,
+ ignore_iof_thr=-2):
+ """
+
+ Args:
+ anchors:
+ gt_bboxes:[M, 5] rc,yc,w,h,angle
+ gt_lables:
+
+ Returns:
+
+ """
+ assert anchors.shape[1] == 4 or anchors.shape[1] == 5
+ assert gt_bboxes.shape[1] == 4 or gt_bboxes.shape[1] == 5
+ anchors_xc_yc = anchors
+ gt_bboxes_xc_yc = gt_bboxes
+
+ # calc rbox iou
+ anchors_xc_yc = anchors_xc_yc.astype(np.float32)
+ gt_bboxes_xc_yc = gt_bboxes_xc_yc.astype(np.float32)
+ anchors_xc_yc = paddle.to_tensor(anchors_xc_yc, place=paddle.CPUPlace())
+ gt_bboxes_xc_yc = paddle.to_tensor(
+ gt_bboxes_xc_yc, place=paddle.CPUPlace())
+
+ try:
+ from rbox_iou_ops import rbox_iou
+ except Exception as e:
+ print("import custom_ops error, try install rbox_iou_ops " \
+ "following ppdet/ext_op/README.md", e)
+ sys.stdout.flush()
+ sys.exit(-1)
+
+ iou = rbox_iou(gt_bboxes_xc_yc, anchors_xc_yc)
+ iou = iou.numpy()
+ iou = iou.T
+
+ # every gt's anchor's index
+ gt_bbox_anchor_inds = iou.argmax(axis=0)
+ gt_bbox_anchor_iou = iou[gt_bbox_anchor_inds, np.arange(iou.shape[1])]
+ gt_bbox_anchor_iou_inds = np.where(iou == gt_bbox_anchor_iou)[0]
+
+ # every anchor's gt bbox's index
+ anchor_gt_bbox_inds = iou.argmax(axis=1)
+ anchor_gt_bbox_iou = iou[np.arange(iou.shape[0]), anchor_gt_bbox_inds]
+
+ # (1) set labels=-2 as default
+ labels = np.ones((iou.shape[0], ), dtype=np.int32) * ignore_iof_thr
+
+ # (2) assign ignore
+ labels[anchor_gt_bbox_iou < min_iou_thr] = ignore_iof_thr
+
+ # (3) assign neg_ids -1
+ assign_neg_ids1 = anchor_gt_bbox_iou >= min_iou_thr
+ assign_neg_ids2 = anchor_gt_bbox_iou < neg_iou_thr
+ assign_neg_ids = np.logical_and(assign_neg_ids1, assign_neg_ids2)
+ labels[assign_neg_ids] = -1
+
+ # anchor_gt_bbox_iou_inds
+ # (4) assign max_iou as pos_ids >=0
+ anchor_gt_bbox_iou_inds = anchor_gt_bbox_inds[gt_bbox_anchor_iou_inds]
+ # gt_bbox_anchor_iou_inds = np.logical_and(gt_bbox_anchor_iou_inds, anchor_gt_bbox_iou >= min_iou_thr)
+ labels[gt_bbox_anchor_iou_inds] = gt_lables[anchor_gt_bbox_iou_inds]
+
+ # (5) assign >= pos_iou_thr as pos_ids
+ iou_pos_iou_thr_ids = anchor_gt_bbox_iou >= pos_iou_thr
+ iou_pos_iou_thr_ids_box_inds = anchor_gt_bbox_inds[iou_pos_iou_thr_ids]
+ labels[iou_pos_iou_thr_ids] = gt_lables[iou_pos_iou_thr_ids_box_inds]
+ return anchor_gt_bbox_inds, anchor_gt_bbox_iou, labels
+
+ def __call__(self, anchors, gt_bboxes, gt_labels, is_crowd):
+
+ assert anchors.ndim == 2
+ assert anchors.shape[1] == 5
+ assert gt_bboxes.ndim == 2
+ assert gt_bboxes.shape[1] == 5
+
+ pos_iou_thr = self.pos_iou_thr
+ neg_iou_thr = self.neg_iou_thr
+ min_iou_thr = self.min_iou_thr
+ ignore_iof_thr = self.ignore_iof_thr
+
+ anchor_num = anchors.shape[0]
+ anchors_inds = self.anchor_valid(anchors)
+ anchors = anchors[anchors_inds]
+        not_crowd_inds = np.where(is_crowd == 0)
+
+ # Step1: match anchor and gt_bbox
+ anchor_gt_bbox_inds, anchor_gt_bbox_iou, labels = self.assign_anchor(
+ anchors, gt_bboxes,
+ gt_labels.reshape(-1), pos_iou_thr, neg_iou_thr, min_iou_thr,
+ ignore_iof_thr)
+
+ # Step2: sample anchor
+ pos_inds = np.where(labels >= 0)[0]
+ neg_inds = np.where(labels == -1)[0]
+
+ # Step3: make output
+ anchors_num = anchors.shape[0]
+ bbox_targets = np.zeros_like(anchors)
+ bbox_weights = np.zeros_like(anchors)
+ pos_labels = np.ones(anchors_num, dtype=np.int32) * -1
+ pos_labels_weights = np.zeros(anchors_num, dtype=np.float32)
+
+ pos_sampled_anchors = anchors[pos_inds]
+ pos_sampled_gt_boxes = gt_bboxes[anchor_gt_bbox_inds[pos_inds]]
+ if len(pos_inds) > 0:
+ pos_bbox_targets = bbox_utils.rbox2delta(pos_sampled_anchors,
+ pos_sampled_gt_boxes)
+ bbox_targets[pos_inds, :] = pos_bbox_targets
+ bbox_weights[pos_inds, :] = 1.0
+
+ pos_labels[pos_inds] = labels[pos_inds]
+ pos_labels_weights[pos_inds] = 1.0
+
+ if len(neg_inds) > 0:
+ pos_labels_weights[neg_inds] = 1.0
+ return (pos_labels, pos_labels_weights, bbox_targets, bbox_weights,
+ pos_inds, neg_inds)
diff --git a/ppdet/modeling/tests/test_architectures.py b/ppdet/modeling/tests/test_architectures.py
new file mode 100644
index 0000000000000000000000000000000000000000..95cb212037fd8bfb44b5037e1117241b721d7358
--- /dev/null
+++ b/ppdet/modeling/tests/test_architectures.py
@@ -0,0 +1,59 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import unittest
+import ppdet
+
+
+class TestFasterRCNN(unittest.TestCase):
+ def setUp(self):
+ self.set_config()
+
+ def set_config(self):
+ self.cfg_file = 'configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.yml'
+
+ def test_trainer(self):
+ # Trainer __init__ will build model and DataLoader
+ # 'train' and 'eval' mode include dataset loading
+ # use 'test' mode to simplify tests
+ cfg = ppdet.core.workspace.load_config(self.cfg_file)
+ trainer = ppdet.engine.Trainer(cfg, mode='test')
+
+
+class TestMaskRCNN(TestFasterRCNN):
+ def set_config(self):
+ self.cfg_file = 'configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.yml'
+
+
+class TestCascadeRCNN(TestFasterRCNN):
+ def set_config(self):
+ self.cfg_file = 'configs/cascade_rcnn/cascade_rcnn_r50_fpn_1x_coco.yml'
+
+
+class TestYolov3(TestFasterRCNN):
+ def set_config(self):
+ self.cfg_file = 'configs/yolov3/yolov3_darknet53_270e_coco.yml'
+
+
+class TestSSD(TestFasterRCNN):
+ def set_config(self):
+ self.cfg_file = 'configs/ssd/ssd_vgg16_300_240e_voc.yml'
+
+
+if __name__ == '__main__':
+ unittest.main()
diff --git a/ppdet/modeling/utils/__init__.py b/ppdet/modeling/utils/__init__.py
deleted file mode 100644
index e27f26a6f1254241a760af2b41a7eb26eb463ad6..0000000000000000000000000000000000000000
--- a/ppdet/modeling/utils/__init__.py
+++ /dev/null
@@ -1,17 +0,0 @@
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from . import bbox_util
-
-from .bbox_util import *
diff --git a/ppdet/modeling/utils/bbox_util.py b/ppdet/modeling/utils/bbox_util.py
deleted file mode 100644
index 6ea3682b40ab3a48a04bdd78f64ca529dd2c9587..0000000000000000000000000000000000000000
--- a/ppdet/modeling/utils/bbox_util.py
+++ /dev/null
@@ -1,143 +0,0 @@
-# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import paddle
-import paddle.nn.functional as F
-import math
-
-
-def xywh2xyxy(box):
- x, y, w, h = box
- x1 = x - w * 0.5
- y1 = y - h * 0.5
- x2 = x + w * 0.5
- y2 = y + h * 0.5
- return [x1, y1, x2, y2]
-
-
-def make_grid(h, w, dtype):
- yv, xv = paddle.meshgrid([paddle.arange(h), paddle.arange(w)])
- return paddle.stack((xv, yv), 2).cast(dtype=dtype)
-
-
-def decode_yolo(box, anchor, downsample_ratio):
- """decode yolo box
-
- Args:
- box (list): [x, y, w, h], all have the shape [b, na, h, w, 1]
- anchor (list): anchor with the shape [na, 2]
- downsample_ratio (int): downsample ratio, default 32
- scale (float): scale, default 1.
-
- Return:
- box (list): decoded box, [x, y, w, h], all have the shape [b, na, h, w, 1]
- """
- x, y, w, h = box
- na, grid_h, grid_w = x.shape[1:4]
- grid = make_grid(grid_h, grid_w, x.dtype).reshape((1, 1, grid_h, grid_w, 2))
- x1 = (x + grid[:, :, :, :, 0:1]) / grid_w
- y1 = (y + grid[:, :, :, :, 1:2]) / grid_h
-
- anchor = paddle.to_tensor(anchor)
- anchor = paddle.cast(anchor, x.dtype)
- anchor = anchor.reshape((1, na, 1, 1, 2))
- w1 = paddle.exp(w) * anchor[:, :, :, :, 0:1] / (downsample_ratio * grid_w)
- h1 = paddle.exp(h) * anchor[:, :, :, :, 1:2] / (downsample_ratio * grid_h)
-
- return [x1, y1, w1, h1]
-
-
-def iou_similarity(box1, box2, eps=1e-9):
- """Calculate iou of box1 and box2
-
- Args:
- box1 (Tensor): box with the shape [N, M1, 4]
- box2 (Tensor): box with the shape [N, M2, 4]
-
- Return:
- iou (Tensor): iou between box1 and box2 with the shape [N, M1, M2]
- """
- box1 = box1.unsqueeze(2) # [N, M1, 4] -> [N, M1, 1, 4]
- box2 = box2.unsqueeze(1) # [N, M2, 4] -> [N, 1, M2, 4]
- px1y1, px2y2 = box1[:, :, :, 0:2], box1[:, :, :, 2:4]
- gx1y1, gx2y2 = box2[:, :, :, 0:2], box2[:, :, :, 2:4]
- x1y1 = paddle.maximum(px1y1, gx1y1)
- x2y2 = paddle.minimum(px2y2, gx2y2)
- overlap = (x2y2 - x1y1).clip(0).prod(-1)
- area1 = (px2y2 - px1y1).clip(0).prod(-1)
- area2 = (gx2y2 - gx1y1).clip(0).prod(-1)
- union = area1 + area2 - overlap + eps
- return overlap / union
-
-
-def bbox_iou(box1, box2, giou=False, diou=False, ciou=False, eps=1e-9):
- """calculate the iou of box1 and box2
-
- Args:
- box1 (list): [x, y, w, h], all have the shape [b, na, h, w, 1]
- box2 (list): [x, y, w, h], all have the shape [b, na, h, w, 1]
- giou (bool): whether use giou or not, default False
- diou (bool): whether use diou or not, default False
- ciou (bool): whether use ciou or not, default False
- eps (float): epsilon to avoid divide by zero
-
- Return:
- iou (Tensor): iou of box1 and box1, with the shape [b, na, h, w, 1]
- """
- px1, py1, px2, py2 = box1
- gx1, gy1, gx2, gy2 = box2
- x1 = paddle.maximum(px1, gx1)
- y1 = paddle.maximum(py1, gy1)
- x2 = paddle.minimum(px2, gx2)
- y2 = paddle.minimum(py2, gy2)
-
- overlap = ((x2 - x1).clip(0)) * ((y2 - y1).clip(0))
-
- area1 = (px2 - px1) * (py2 - py1)
- area1 = area1.clip(0)
-
- area2 = (gx2 - gx1) * (gy2 - gy1)
- area2 = area2.clip(0)
-
- union = area1 + area2 - overlap + eps
- iou = overlap / union
-
- if giou or ciou or diou:
- # convex w, h
- cw = paddle.maximum(px2, gx2) - paddle.minimum(px1, gx1)
- ch = paddle.maximum(py2, gy2) - paddle.minimum(py1, gy1)
- if giou:
- c_area = cw * ch + eps
- return iou - (c_area - union) / c_area
- else:
- # convex diagonal squared
- c2 = cw**2 + ch**2 + eps
- # center distance
- rho2 = ((px1 + px2 - gx1 - gx2)**2 + (py1 + py2 - gy1 - gy2)**2) / 4
- if diou:
- return iou - rho2 / c2
- else:
- w1, h1 = px2 - px1, py2 - py1 + eps
- w2, h2 = gx2 - gx1, gy2 - gy1 + eps
- delta = paddle.atan(w1 / h1) - paddle.atan(w2 / h2)
- v = (4 / math.pi**2) * paddle.pow(delta, 2)
- alpha = v / (1 + eps - iou + v)
- alpha.stop_gradient = True
- return iou - (rho2 / c2 + v * alpha)
- else:
- return iou
diff --git a/ppdet/optimizer.py b/ppdet/optimizer.py
index c476e2edb7b01a64795e099b4e3de1dad6141841..5334eba724d9b7ecfd5ec080ba58a02c20bb6434 100644
--- a/ppdet/optimizer.py
+++ b/ppdet/optimizer.py
@@ -249,6 +249,8 @@ class ModelEMA(object):
self.step += 1
def apply(self):
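+        # the bias-correction denominator (1 - decay**step) is zero at
+        # step 0, so return the raw EMA state before any update has run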
+ if self.step == 0:
+ return self.state_dict
state_dict = dict()
for k, v in self.state_dict.items():
v = v / (1 - self._decay**self.step)
diff --git a/ppdet/py_op/__init__.py b/ppdet/py_op/__init__.py
deleted file mode 100644
index d48118906e65f80ebd15bdba7f5779a97b67bbdb..0000000000000000000000000000000000000000
--- a/ppdet/py_op/__init__.py
+++ /dev/null
@@ -1 +0,0 @@
-from .post_process import *
diff --git a/ppdet/slim/__init__.py b/ppdet/slim/__init__.py
index 7a58bf591c895d699301c0f88dd55268552581e5..ab286647d89b70ea2aef01f831a5a3b2c68266fe 100644
--- a/ppdet/slim/__init__.py
+++ b/ppdet/slim/__init__.py
@@ -14,6 +14,48 @@
from . import prune
from . import quant
+from . import distill
from .prune import *
from .quant import *
+from .distill import *
+
+import yaml
+from ppdet.core.workspace import create, load_config
+from ppdet.utils.checkpoint import load_pretrain_weight
+
+
+def build_slim_model(cfg, slim_cfg, mode='train'):
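+    """
+    Wrap the model in cfg with the slim strategy (Distill, DistillPrune,
+    prune or quant) described in slim_cfg, and return the updated cfg.
+    """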
+ with open(slim_cfg) as f:
+ slim_load_cfg = yaml.load(f, Loader=yaml.Loader)
+ if mode != 'train' and slim_load_cfg['slim'] == 'Distill':
+ return cfg
+
+ if slim_load_cfg['slim'] == 'Distill':
+ model = DistillModel(cfg, slim_cfg)
+ cfg['model'] = model
+ elif slim_load_cfg['slim'] == 'DistillPrune':
+ if mode == 'train':
+ model = DistillModel(cfg, slim_cfg)
+ pruner = create(cfg.pruner)
+ pruner(model.student_model)
+ else:
+ model = create(cfg.architecture)
+ weights = cfg.weights
+ load_config(slim_cfg)
+ pruner = create(cfg.pruner)
+ model = pruner(model)
+ load_pretrain_weight(model, weights)
+ cfg['model'] = model
+ else:
+ load_config(slim_cfg)
+ model = create(cfg.architecture)
+ if mode == 'train':
+ load_pretrain_weight(model, cfg.pretrain_weights)
+ slim = create(cfg.slim)
+ cfg['model'] = slim(model)
+ cfg['slim'] = slim
+ if mode != 'train':
+ load_pretrain_weight(cfg['model'], cfg.weights)
+
+ return cfg
diff --git a/ppdet/slim/distill.py b/ppdet/slim/distill.py
new file mode 100644
index 0000000000000000000000000000000000000000..d5c9d72669a601ce331afde69ce92ca6642fb3d2
--- /dev/null
+++ b/ppdet/slim/distill.py
@@ -0,0 +1,110 @@
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import paddle
+import paddle.nn as nn
+import paddle.nn.functional as F
+
+from ppdet.core.workspace import register, serializable, load_config
+from ppdet.core.workspace import create
+from ppdet.utils.logger import setup_logger
+from ppdet.modeling import ops
+from ppdet.utils.checkpoint import load_pretrain_weight
+from ppdet.modeling.losses import YOLOv3Loss
+logger = setup_logger(__name__)
+
+
+class DistillModel(nn.Layer):
+ def __init__(self, cfg, slim_cfg):
+ super(DistillModel, self).__init__()
+
+ self.student_model = create(cfg.architecture)
+ logger.debug('Load student model pretrain_weights:{}'.format(
+ cfg.pretrain_weights))
+ load_pretrain_weight(self.student_model, cfg.pretrain_weights)
+
+ slim_cfg = load_config(slim_cfg)
+ self.teacher_model = create(slim_cfg.architecture)
+ self.distill_loss = create(slim_cfg.distill_loss)
+ logger.debug('Load teacher model pretrain_weights:{}'.format(
+ slim_cfg.pretrain_weights))
+ load_pretrain_weight(self.teacher_model, slim_cfg.pretrain_weights)
+
+ for param in self.teacher_model.parameters():
+ param.trainable = False
+
+ def parameters(self):
+ return self.student_model.parameters()
+
+ def forward(self, inputs):
+ if self.training:
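+            # run the teacher forward as well so its distill_pairs are
+            # recorded for the distillation loss below; the teacher
+            # parameters are frozen in __init__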
+ teacher_loss = self.teacher_model(inputs)
+ student_loss = self.student_model(inputs)
+ loss = self.distill_loss(self.teacher_model, self.student_model)
+ student_loss['distill_loss'] = loss
+ student_loss['teacher_loss'] = teacher_loss['loss']
+ student_loss['loss'] += student_loss['distill_loss']
+ return student_loss
+ else:
+ return self.student_model(inputs)
+
+
+@register
+class DistillYOLOv3Loss(nn.Layer):
+ def __init__(self, weight=1000):
+ super(DistillYOLOv3Loss, self).__init__()
+ self.weight = weight
+
+ def obj_weighted_reg(self, sx, sy, sw, sh, tx, ty, tw, th, tobj):
+ loss_x = ops.sigmoid_cross_entropy_with_logits(sx, F.sigmoid(tx))
+ loss_y = ops.sigmoid_cross_entropy_with_logits(sy, F.sigmoid(ty))
+ loss_w = paddle.abs(sw - tw)
+ loss_h = paddle.abs(sh - th)
+ loss = paddle.add_n([loss_x, loss_y, loss_w, loss_h])
+ weighted_loss = paddle.mean(loss * F.sigmoid(tobj))
+ return weighted_loss
+
+ def obj_weighted_cls(self, scls, tcls, tobj):
+ loss = ops.sigmoid_cross_entropy_with_logits(scls, F.sigmoid(tcls))
+ weighted_loss = paddle.mean(paddle.multiply(loss, F.sigmoid(tobj)))
+ return weighted_loss
+
+ def obj_loss(self, sobj, tobj):
+ obj_mask = paddle.cast(tobj > 0., dtype="float32")
+ obj_mask.stop_gradient = True
+ loss = paddle.mean(
+ ops.sigmoid_cross_entropy_with_logits(sobj, obj_mask))
+ return loss
+
+ def forward(self, teacher_model, student_model):
+ teacher_distill_pairs = teacher_model.yolo_head.loss.distill_pairs
+ student_distill_pairs = student_model.yolo_head.loss.distill_pairs
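+        # each pair holds the (x, y, w, h, obj, cls) predictions recorded
+        # per FPN level by the YOLOv3 loss during the forward pass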
+ distill_reg_loss, distill_cls_loss, distill_obj_loss = [], [], []
+ for s_pair, t_pair in zip(student_distill_pairs, teacher_distill_pairs):
+ distill_reg_loss.append(
+ self.obj_weighted_reg(s_pair[0], s_pair[1], s_pair[2], s_pair[
+ 3], t_pair[0], t_pair[1], t_pair[2], t_pair[3], t_pair[4]))
+ distill_cls_loss.append(
+ self.obj_weighted_cls(s_pair[5], t_pair[5], t_pair[4]))
+ distill_obj_loss.append(self.obj_loss(s_pair[4], t_pair[4]))
+ distill_reg_loss = paddle.add_n(distill_reg_loss)
+ distill_cls_loss = paddle.add_n(distill_cls_loss)
+ distill_obj_loss = paddle.add_n(distill_obj_loss)
+ loss = (distill_reg_loss + distill_cls_loss + distill_obj_loss
+ ) * self.weight
+ return loss
diff --git a/ppdet/utils/bbox_utils.py b/ppdet/utils/bbox_utils.py
deleted file mode 100644
index 63c93976c1c63cf85e35a6ddc79ee32f0eb3b716..0000000000000000000000000000000000000000
--- a/ppdet/utils/bbox_utils.py
+++ /dev/null
@@ -1,81 +0,0 @@
-# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import numpy as np
-
-from .logger import setup_logger
-logger = setup_logger(__name__)
-
-__all__ = ["bbox_overlaps", "box_to_delta"]
-
-
-def bbox_overlaps(boxes_1, boxes_2):
- '''
- bbox_overlaps
- boxes_1: x1, y, x2, y2
- boxes_2: x1, y, x2, y2
- '''
- assert boxes_1.shape[1] == 4 and boxes_2.shape[1] == 4
-
- num_1 = boxes_1.shape[0]
- num_2 = boxes_2.shape[0]
-
- x1_1 = boxes_1[:, 0:1]
- y1_1 = boxes_1[:, 1:2]
- x2_1 = boxes_1[:, 2:3]
- y2_1 = boxes_1[:, 3:4]
- area_1 = (x2_1 - x1_1 + 1) * (y2_1 - y1_1 + 1)
-
- x1_2 = boxes_2[:, 0].transpose()
- y1_2 = boxes_2[:, 1].transpose()
- x2_2 = boxes_2[:, 2].transpose()
- y2_2 = boxes_2[:, 3].transpose()
- area_2 = (x2_2 - x1_2 + 1) * (y2_2 - y1_2 + 1)
-
- xx1 = np.maximum(x1_1, x1_2)
- yy1 = np.maximum(y1_1, y1_2)
- xx2 = np.minimum(x2_1, x2_2)
- yy2 = np.minimum(y2_1, y2_2)
-
- w = np.maximum(0.0, xx2 - xx1 + 1)
- h = np.maximum(0.0, yy2 - yy1 + 1)
- inter = w * h
-
- ovr = inter / (area_1 + area_2 - inter)
- return ovr
-
-
-def box_to_delta(ex_boxes, gt_boxes, weights):
- """ box_to_delta """
- ex_w = ex_boxes[:, 2] - ex_boxes[:, 0] + 1
- ex_h = ex_boxes[:, 3] - ex_boxes[:, 1] + 1
- ex_ctr_x = ex_boxes[:, 0] + 0.5 * ex_w
- ex_ctr_y = ex_boxes[:, 1] + 0.5 * ex_h
-
- gt_w = gt_boxes[:, 2] - gt_boxes[:, 0] + 1
- gt_h = gt_boxes[:, 3] - gt_boxes[:, 1] + 1
- gt_ctr_x = gt_boxes[:, 0] + 0.5 * gt_w
- gt_ctr_y = gt_boxes[:, 1] + 0.5 * gt_h
-
- dx = (gt_ctr_x - ex_ctr_x) / ex_w / weights[0]
- dy = (gt_ctr_y - ex_ctr_y) / ex_h / weights[1]
- dw = (np.log(gt_w / ex_w)) / weights[2]
- dh = (np.log(gt_h / ex_h)) / weights[3]
-
- targets = np.vstack([dx, dy, dw, dh]).transpose()
- return targets
diff --git a/ppdet/utils/checkpoint.py b/ppdet/utils/checkpoint.py
index f70cad4fce8c6bb32bde9ae577621e1b69b25334..d4f08097f11b9cdc974b076a947ab04efc397f42 100644
--- a/ppdet/utils/checkpoint.py
+++ b/ppdet/utils/checkpoint.py
@@ -157,22 +157,21 @@ def load_pretrain_weight(model, pretrain_weight):
weights_path = path + '.pdparams'
param_state_dict = paddle.load(weights_path)
- ignore_set = set()
- lack_modules = set()
- for name, weight in model_dict.items():
- if name in param_state_dict.keys():
- if weight.shape != list(param_state_dict[name].shape):
+ ignore_weights = set()
+
+ for name, weight in param_state_dict.items():
+ if name in model_dict.keys():
+ if list(weight.shape) != list(model_dict[name].shape):
logger.info(
'{} not used, shape {} unmatched with {} in model.'.format(
- name, list(param_state_dict[name].shape), weight.shape))
- param_state_dict.pop(name, None)
+ name, weight.shape, list(model_dict[name].shape)))
+ ignore_weights.add(name)
else:
- lack_modules.add(name.split('.')[0])
- logger.debug('Lack weights: {}'.format(name))
+                logger.info('Redundant weight {} is ignored.'.format(name))
+ ignore_weights.add(name)
- if len(lack_modules) > 0:
- logger.info('Lack weights of modules: {}'.format(', '.join(
- list(lack_modules))))
+ for weight in ignore_weights:
+ param_state_dict.pop(weight, None)
model.set_dict(param_state_dict)
logger.info('Finish loading model weights: {}'.format(weights_path))
diff --git a/ppdet/utils/download.py b/ppdet/utils/download.py
index 3b50ddd010d1263b30199350418f2d02c085f497..99635c75f3af96eb266cbf0545eb4b312648a6e1 100644
--- a/ppdet/utils/download.py
+++ b/ppdet/utils/download.py
@@ -23,6 +23,8 @@ import shutil
import requests
import tqdm
import hashlib
+import base64
+import binascii
import tarfile
import zipfile
@@ -257,20 +259,22 @@ def get_path(url, root_dir, md5sum=None, check_exist=True):
if fullpath.find(k) >= 0:
fullpath = osp.join(osp.split(fullpath)[0], v)
- exist_flag = False
if osp.exists(fullpath) and check_exist:
- exist_flag = True
- logger.debug("Found {}".format(fullpath))
- else:
- exist_flag = False
- fullname = _download(url, root_dir, md5sum)
+ if not osp.isfile(fullpath) or \
+ _check_exist_file_md5(fullpath, md5sum, url):
+ logger.debug("Found {}".format(fullpath))
+ return fullpath, True
+ else:
+ os.remove(fullpath)
- # new weights format which postfix is 'pdparams' not
- # need to decompress
- if osp.splitext(fullname)[-1] not in ['.pdparams', '.yml']:
- _decompress(fullname)
+ fullname = _download(url, root_dir, md5sum)
- return fullpath, exist_flag
+ # new weights format which postfix is 'pdparams' not
+ # need to decompress
+ if osp.splitext(fullname)[-1] not in ['.pdparams', '.yml']:
+ _decompress(fullname)
+
+ return fullpath, False
def download_dataset(path, dataset=None):
@@ -324,7 +328,8 @@ def _download(url, path, md5sum=None):
fullname = osp.join(path, fname)
retry_cnt = 0
- while not (osp.exists(fullname) and _md5check(fullname, md5sum)):
+ while not (osp.exists(fullname) and _check_exist_file_md5(fullname, md5sum,
+ url)):
if retry_cnt < DOWNLOAD_RETRY_LIMIT:
retry_cnt += 1
else:
@@ -355,8 +360,30 @@ def _download(url, path, md5sum=None):
if chunk:
f.write(chunk)
shutil.move(tmp_fullname, fullname)
-
- return fullname
+ return fullname
+
+
+def _check_exist_file_md5(filename, md5sum, url):
+    # if md5sum is None and the file to check is a weights file,
+    # read the md5sum from the url and check it; else check md5sum directly
+ return _md5check_from_url(filename, url) if md5sum is None \
+ and filename.endswith('pdparams') \
+ else _md5check(filename, md5sum)
+
+
+def _md5check_from_url(filename, url):
+ # For weights in bcebos URLs, MD5 value is contained
+    # in the response header as 'content-md5'
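+    # the header value is the base64-encoded raw digest, so decode it and
+    # hexlify to compare against the file's hex md5sum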
+ req = requests.get(url, stream=True)
+ content_md5 = req.headers.get('content-md5')
+ req.close()
+ if not content_md5 or _md5check(
+ filename,
+ binascii.hexlify(base64.b64decode(content_md5.strip('"'))).decode(
+ )):
+ return True
+ else:
+ return False
def _md5check(fullname, md5sum=None):
diff --git a/ppdet/utils/logger.py b/ppdet/utils/logger.py
index 9f02313ed57b9ca19668ecd7ceecbddc8fa693e1..99b82f995e4ec1a8ec19b8253aa1c2b3948d1e2d 100644
--- a/ppdet/utils/logger.py
+++ b/ppdet/utils/logger.py
@@ -17,7 +17,7 @@ import logging
import os
import sys
-from paddle.distributed import ParallelEnv
+import paddle.distributed as dist
__all__ = ['setup_logger']
@@ -47,7 +47,7 @@ def setup_logger(name="ppdet", output=None):
"[%(asctime)s] %(name)s %(levelname)s: %(message)s",
datefmt="%m/%d %H:%M:%S")
# stdout logging: master only
- local_rank = ParallelEnv().local_rank
+ local_rank = dist.get_rank()
if local_rank == 0:
ch = logging.StreamHandler(stream=sys.stdout)
ch.setLevel(logging.DEBUG)
diff --git a/ppdet/utils/post_process.py b/ppdet/utils/post_process.py
deleted file mode 100644
index 45f9f9908af32f816c8965587d4e428b1b34ad5e..0000000000000000000000000000000000000000
--- a/ppdet/utils/post_process.py
+++ /dev/null
@@ -1,326 +0,0 @@
-# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import numpy as np
-import cv2
-
-from .logger import setup_logger
-logger = setup_logger(__name__)
-
-__all__ = ['nms']
-
-
-def box_flip(boxes, im_shape):
- im_width = im_shape[0][1]
- flipped_boxes = boxes.copy()
-
- flipped_boxes[:, 0::4] = im_width - boxes[:, 2::4] - 1
- flipped_boxes[:, 2::4] = im_width - boxes[:, 0::4] - 1
- return flipped_boxes
-
-
-def nms(dets, thresh):
- """Apply classic DPM-style greedy NMS."""
- if dets.shape[0] == 0:
- return dets[[], :]
- scores = dets[:, 0]
- x1 = dets[:, 1]
- y1 = dets[:, 2]
- x2 = dets[:, 3]
- y2 = dets[:, 4]
-
- areas = (x2 - x1 + 1) * (y2 - y1 + 1)
- order = scores.argsort()[::-1]
-
- ndets = dets.shape[0]
- suppressed = np.zeros((ndets), dtype=np.int)
-
- # nominal indices
- # _i, _j
- # sorted indices
- # i, j
- # temp variables for box i's (the box currently under consideration)
- # ix1, iy1, ix2, iy2, iarea
-
- # variables for computing overlap with box j (lower scoring box)
- # xx1, yy1, xx2, yy2
- # w, h
- # inter, ovr
-
- for _i in range(ndets):
- i = order[_i]
- if suppressed[i] == 1:
- continue
- ix1 = x1[i]
- iy1 = y1[i]
- ix2 = x2[i]
- iy2 = y2[i]
- iarea = areas[i]
- for _j in range(_i + 1, ndets):
- j = order[_j]
- if suppressed[j] == 1:
- continue
- xx1 = max(ix1, x1[j])
- yy1 = max(iy1, y1[j])
- xx2 = min(ix2, x2[j])
- yy2 = min(iy2, y2[j])
- w = max(0.0, xx2 - xx1 + 1)
- h = max(0.0, yy2 - yy1 + 1)
- inter = w * h
- ovr = inter / (iarea + areas[j] - inter)
- if ovr >= thresh:
- suppressed[j] = 1
- keep = np.where(suppressed == 0)[0]
- dets = dets[keep, :]
- return dets
-
-
-def soft_nms(dets, sigma, thres):
- dets_final = []
- while len(dets) > 0:
- maxpos = np.argmax(dets[:, 0])
- dets_final.append(dets[maxpos].copy())
- ts, tx1, ty1, tx2, ty2 = dets[maxpos]
- scores = dets[:, 0]
- # force remove bbox at maxpos
- scores[maxpos] = -1
- x1 = dets[:, 1]
- y1 = dets[:, 2]
- x2 = dets[:, 3]
- y2 = dets[:, 4]
- areas = (x2 - x1 + 1) * (y2 - y1 + 1)
- xx1 = np.maximum(tx1, x1)
- yy1 = np.maximum(ty1, y1)
- xx2 = np.minimum(tx2, x2)
- yy2 = np.minimum(ty2, y2)
- w = np.maximum(0.0, xx2 - xx1 + 1)
- h = np.maximum(0.0, yy2 - yy1 + 1)
- inter = w * h
- ovr = inter / (areas + areas[maxpos] - inter)
- weight = np.exp(-(ovr * ovr) / sigma)
- scores = scores * weight
- idx_keep = np.where(scores >= thres)
- dets[:, 0] = scores
- dets = dets[idx_keep]
- dets_final = np.array(dets_final).reshape(-1, 5)
- return dets_final
-
-
-def bbox_area(box):
- w = box[2] - box[0] + 1
- h = box[3] - box[1] + 1
- return w * h
-
-
-def bbox_overlaps(x, y):
- N = x.shape[0]
- K = y.shape[0]
- overlaps = np.zeros((N, K), dtype=np.float32)
- for k in range(K):
- y_area = bbox_area(y[k])
- for n in range(N):
- iw = min(x[n, 2], y[k, 2]) - max(x[n, 0], y[k, 0]) + 1
- if iw > 0:
- ih = min(x[n, 3], y[k, 3]) - max(x[n, 1], y[k, 1]) + 1
- if ih > 0:
- x_area = bbox_area(x[n])
- ua = x_area + y_area - iw * ih
- overlaps[n, k] = iw * ih / ua
- return overlaps
-
-
-def box_voting(nms_dets, dets, vote_thresh):
- top_dets = nms_dets.copy()
- top_boxes = nms_dets[:, 1:]
- all_boxes = dets[:, 1:]
- all_scores = dets[:, 0]
- top_to_all_overlaps = bbox_overlaps(top_boxes, all_boxes)
- for k in range(nms_dets.shape[0]):
- inds_to_vote = np.where(top_to_all_overlaps[k] >= vote_thresh)[0]
- boxes_to_vote = all_boxes[inds_to_vote, :]
- ws = all_scores[inds_to_vote]
- top_dets[k, 1:] = np.average(boxes_to_vote, axis=0, weights=ws)
-
- return top_dets
-
-
-def get_nms_result(boxes,
- scores,
- config,
- num_classes,
- background_label=0,
- labels=None):
- has_labels = labels is not None
- cls_boxes = [[] for _ in range(num_classes)]
- start_idx = 1 if background_label == 0 else 0
- for j in range(start_idx, num_classes):
- inds = np.where(labels == j)[0] if has_labels else np.where(
- scores[:, j] > config['score_thresh'])[0]
- scores_j = scores[inds] if has_labels else scores[inds, j]
- boxes_j = boxes[inds, :] if has_labels else boxes[inds, j * 4:(j + 1) *
- 4]
- dets_j = np.hstack((scores_j[:, np.newaxis], boxes_j)).astype(
- np.float32, copy=False)
- if config.get('use_soft_nms', False):
- nms_dets = soft_nms(dets_j, config['sigma'], config['nms_thresh'])
- else:
- nms_dets = nms(dets_j, config['nms_thresh'])
- if config.get('enable_voting', False):
- nms_dets = box_voting(nms_dets, dets_j, config['vote_thresh'])
- #add labels
- label = np.array([j for _ in range(len(nms_dets))])
- nms_dets = np.hstack((label[:, np.newaxis], nms_dets)).astype(
- np.float32, copy=False)
- cls_boxes[j] = nms_dets
- # Limit to max_per_image detections **over all classes**
- image_scores = np.hstack(
- [cls_boxes[j][:, 1] for j in range(start_idx, num_classes)])
- if len(image_scores) > config['detections_per_im']:
- image_thresh = np.sort(image_scores)[-config['detections_per_im']]
- for j in range(start_idx, num_classes):
- keep = np.where(cls_boxes[j][:, 1] >= image_thresh)[0]
- cls_boxes[j] = cls_boxes[j][keep, :]
-
- im_results = np.vstack(
- [cls_boxes[j] for j in range(start_idx, num_classes)])
- return im_results
-
-
-def mstest_box_post_process(result, config, num_classes):
- """
- Multi-scale Test
- Only available for batch_size=1 now.
- """
- post_bbox = {}
- use_flip = False
- ms_boxes = []
- ms_scores = []
- im_shape = result['im_shape'][0]
- for k in result.keys():
- if 'bbox' in k:
- boxes = result[k][0]
- boxes = np.reshape(boxes, (-1, 4 * num_classes))
- scores = result['score' + k[4:]][0]
- if 'flip' in k:
- boxes = box_flip(boxes, im_shape)
- use_flip = True
- ms_boxes.append(boxes)
- ms_scores.append(scores)
-
- ms_boxes = np.concatenate(ms_boxes)
- ms_scores = np.concatenate(ms_scores)
- bbox_pred = get_nms_result(ms_boxes, ms_scores, config, num_classes)
- post_bbox.update({'bbox': (bbox_pred, [[len(bbox_pred)]])})
- if use_flip:
- bbox = bbox_pred[:, 2:]
- bbox_flip = np.append(
- bbox_pred[:, :2], box_flip(bbox, im_shape), axis=1)
- post_bbox.update({'bbox_flip': (bbox_flip, [[len(bbox_flip)]])})
- return post_bbox
-
-
-def mstest_mask_post_process(result, cfg):
- mask_list = []
- im_shape = result['im_shape'][0]
- M = cfg.FPNRoIAlign['mask_resolution']
- for k in result.keys():
- if 'mask' in k:
- masks = result[k][0]
- if len(masks.shape) != 4:
- masks = np.zeros((0, M, M))
- mask_list.append(masks)
- continue
- if 'flip' in k:
- masks = masks[:, :, :, ::-1]
- mask_list.append(masks)
-
- mask_pred = np.mean(mask_list, axis=0)
- return {'mask': (mask_pred, [[len(mask_pred)]])}
-
-
-def mask_encode(results, resolution, thresh_binarize=0.5):
- import pycocotools.mask as mask_util
- from ppdet.utils.coco_eval import expand_boxes
- scale = (resolution + 2.0) / resolution
- bboxes = results['bbox'][0]
- masks = results['mask'][0]
- lengths = results['mask'][1][0]
- im_shapes = results['im_shape'][0]
- segms = []
- if bboxes.shape == (1, 1) or bboxes is None:
- return segms
- if len(bboxes.tolist()) == 0:
- return segms
-
- s = 0
- # for each sample
- for i in range(len(lengths)):
- num = lengths[i]
- im_shape = im_shapes[i]
-
- bbox = bboxes[s:s + num][:, 2:]
- clsid_scores = bboxes[s:s + num][:, 0:2]
- mask = masks[s:s + num]
- s += num
-
- im_h = int(im_shape[0])
- im_w = int(im_shape[1])
- expand_bbox = expand_boxes(bbox, scale)
- expand_bbox = expand_bbox.astype(np.int32)
- padded_mask = np.zeros(
- (resolution + 2, resolution + 2), dtype=np.float32)
-
- for j in range(num):
- xmin, ymin, xmax, ymax = expand_bbox[j].tolist()
- clsid, score = clsid_scores[j].tolist()
- clsid = int(clsid)
- padded_mask[1:-1, 1:-1] = mask[j, clsid, :, :]
-
- w = xmax - xmin + 1
- h = ymax - ymin + 1
- w = np.maximum(w, 1)
- h = np.maximum(h, 1)
- resized_mask = cv2.resize(padded_mask, (w, h))
- resized_mask = np.array(
- resized_mask > thresh_binarize, dtype=np.uint8)
- im_mask = np.zeros((im_h, im_w), dtype=np.uint8)
-
- x0 = min(max(xmin, 0), im_w)
- x1 = min(max(xmax + 1, 0), im_w)
- y0 = min(max(ymin, 0), im_h)
- y1 = min(max(ymax + 1, 0), im_h)
-
- im_mask[y0:y1, x0:x1] = resized_mask[(y0 - ymin):(y1 - ymin), (
- x0 - xmin):(x1 - xmin)]
- segm = mask_util.encode(
- np.array(
- im_mask[:, :, np.newaxis], order='F'))[0]
- segms.append(segm)
- return segms
-
-
-def corner_post_process(results, config, num_classes):
- detections = results['bbox'][0]
- keep_inds = (detections[:, 1] > -1)
- detections = detections[keep_inds]
- labels = detections[:, 0]
- scores = detections[:, 1]
- boxes = detections[:, 2:6]
- cls_boxes = get_nms_result(
- boxes, scores, config, num_classes, background_label=-1, labels=labels)
- results.update({'bbox': (cls_boxes, [[len(cls_boxes)]])})
diff --git a/ppdet/utils/visualizer.py b/ppdet/utils/visualizer.py
index 5327fef1d2cc92910347ae96f014d79453f70802..ecf95954107ebad2f43f37d135c7f381c50bd7cc 100644
--- a/ppdet/utils/visualizer.py
+++ b/ppdet/utils/visualizer.py
@@ -20,8 +20,9 @@ from __future__ import unicode_literals
import numpy as np
from PIL import Image, ImageDraw
import cv2
-
from .colormap import colormap
+from ppdet.utils.logger import setup_logger
+logger = setup_logger(__name__)
__all__ = ['visualize_results']
@@ -86,21 +87,32 @@ def draw_bbox(image, im_id, catid2name, bboxes, threshold):
if score < threshold:
continue
- xmin, ymin, w, h = bbox
- xmax = xmin + w
- ymax = ymin + h
-
if catid not in catid2color:
idx = np.random.randint(len(color_list))
catid2color[catid] = color_list[idx]
color = tuple(catid2color[catid])
# draw bbox
- draw.line(
- [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
- (xmin, ymin)],
- width=2,
- fill=color)
+ if len(bbox) == 4:
+ xmin, ymin, w, h = bbox
+ xmax = xmin + w
+ ymax = ymin + h
+ draw.line(
+ [(xmin, ymin), (xmin, ymax), (xmax, ymax), (xmax, ymin),
+ (xmin, ymin)],
+ width=2,
+ fill=color)
+ elif len(bbox) == 8:
+ x1, y1, x2, y2, x3, y3, x4, y4 = bbox
+ draw.line(
+ [(x1, y1), (x2, y2), (x3, y3), (x4, y4), (x1, y1)],
+ width=2,
+ fill=color)
+ xmin = min(x1, x2, x3, x4)
+ ymin = min(y1, y2, y3, y4)
+ else:
+            logger.error('bbox must have 4 or 8 coordinates, but got {}!'.format(len(bbox)))
# draw label
text = "{} {:.2f}".format(catid2name[catid], score)
@@ -112,6 +124,23 @@ def draw_bbox(image, im_id, catid2name, bboxes, threshold):
return image
+def save_result(save_path, bbox_res, catid2name, threshold):
+ """
+    Save detection results to a txt file, one bbox per line.
+ """
+ with open(save_path, 'w') as f:
+ for dt in bbox_res:
+ catid, bbox, score = dt['category_id'], dt['bbox'], dt['score']
+ if score < threshold:
+ continue
+ # each bbox result as a line
+ # for rbox: classname score x1 y1 x2 y2 x3 y3 x4 y4
+ # for bbox: classname score x1 y1 w h
+ bbox_pred = '{} {} '.format(catid2name[catid], score) + ' '.join(
+ [str(e) for e in bbox])
+ f.write(bbox_pred + '\n')
+
+
def draw_segm(image,
im_id,
catid2name,
diff --git a/requirements.txt b/requirements.txt
index 9fcef3658ce6677fa37490dd3f66134a20874352..8ce34b5f06700535e9753504bb0f0d7df30a6df9 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -5,4 +5,6 @@ opencv-python
PyYAML
shapely
scipy
-terminaltables
\ No newline at end of file
+terminaltables
+pycocotools
+setuptools>=42.0.0
diff --git a/setup.py b/setup.py
index 5c2f365b3cc9e5f1d46411f965c09fbf40ae96a6..c2d03ade3a24a14c878032ce428cfb85c51473fd 100644
--- a/setup.py
+++ b/setup.py
@@ -17,6 +17,7 @@ import os.path as osp
import glob
import shutil
from setuptools import find_packages, setup
+from paddle.utils import cpp_extension
def readme():
@@ -32,7 +33,6 @@ def parse_requirements(fname):
def package_model_zoo():
- from ppdet.model_zoo import MODEL_ZOO_FILENAME
cur_dir = osp.dirname(osp.realpath(__file__))
cfg_dir = osp.join(cur_dir, "configs")
cfgs = glob.glob(osp.join(cfg_dir, '*/*.yml'))
@@ -42,9 +42,9 @@ def package_model_zoo():
# exclude dataset base config
if osp.split(osp.split(cfg)[0])[1] not in ['datasets']:
valid_cfgs.append(cfg)
- model_names = [osp.splitext(osp.split(cfg)[1])[0] for cfg in valid_cfgs]
+ model_names = [osp.relpath(cfg, cfg_dir).replace(".yml", "") for cfg in valid_cfgs]
- model_zoo_file = osp.join(cur_dir, 'ppdet', 'model_zoo', MODEL_ZOO_FILENAME)
+ model_zoo_file = osp.join(cur_dir, 'ppdet', 'model_zoo', 'MODEL_ZOO')
with open(model_zoo_file, 'w') as wf:
for model_name in model_names:
wf.write("{}\n".format(model_name))
@@ -60,17 +60,17 @@ packages = [
'ppdet.metrics',
'ppdet.modeling',
'ppdet.model_zoo',
- 'ppdet.py_op',
+ 'ppdet.slim',
'ppdet.utils',
]
if __name__ == "__main__":
setup(
- name='ppdet',
+ name='paddledet',
packages=find_packages(exclude=("configs", "tools", "deploy")),
package_data={'ppdet.model_zoo': package_model_zoo()},
author='PaddlePaddle',
- version='2.0-rc',
+ version='2.0.1',
install_requires=parse_requirements('./requirements.txt'),
description='Object detection and instance segmentation toolkit based on PaddlePaddle',
long_description=readme(),
@@ -89,4 +89,4 @@ if __name__ == "__main__":
'Programming Language :: Python :: 3.7', 'Topic :: Utilities'
],
license='Apache License 2.0',
- ext_modules=[], )
+ ext_modules=[])
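
The `osp.relpath` change above keeps each config's subdirectory in its model-zoo name. A standalone sketch of that naming (paths are illustrative):

```python
import os.path as osp

# Illustrative reproduction of the relpath-based naming in package_model_zoo().
cfg_dir = '/repo/configs'
valid_cfgs = ['/repo/configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml']
model_names = [osp.relpath(cfg, cfg_dir).replace('.yml', '') for cfg in valid_cfgs]
print(model_names)  # ['ppyolo/ppyolo_r50vd_dcn_1x_coco']
```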
diff --git a/static/configs/anchor_free/pafnet_10x_coco.yml b/static/configs/anchor_free/pafnet_10x_coco.yml
new file mode 100644
index 0000000000000000000000000000000000000000..4c6728bcda05654e1e5c383e084e4aad1bbc6c6e
--- /dev/null
+++ b/static/configs/anchor_free/pafnet_10x_coco.yml
@@ -0,0 +1,170 @@
+architecture: TTFNet
+use_gpu: true
+max_iters: 150000
+log_smooth_window: 20
+save_dir: output
+snapshot_iter: 10000
+metric: COCO
+pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar
+weights: output/pafnet_10x_coco/model_final
+num_classes: 80
+use_ema: true
+ema_decay: 0.9998
+
+TTFNet:
+ backbone: ResNet
+ ttf_head: TTFHead
+
+ResNet:
+ norm_type: sync_bn
+ freeze_at: 0
+ freeze_norm: false
+ norm_decay: 0.
+ depth: 50
+ feature_maps: [2, 3, 4, 5]
+ variant: d
+ dcn_v2_stages: [3, 4, 5]
+
+TTFHead:
+ head_conv: 128
+ wh_conv: 64
+ hm_head_conv_num: 2
+ wh_head_conv_num: 2
+ wh_offset_base: 16
+ wh_loss: GiouLoss
+ dcn_head: True
+
+GiouLoss:
+ loss_weight: 5.
+ do_average: false
+ use_class_weight: false
+
+LearningRate:
+ base_lr: 0.015
+ schedulers:
+ - !PiecewiseDecay
+ gamma: 0.1
+ milestones:
+ - 112500
+ - 137500
+ - !LinearWarmup
+ start_factor: 0.2
+ steps: 500
+
+OptimizerBuilder:
+ optimizer:
+ momentum: 0.9
+ type: Momentum
+ regularizer:
+ factor: 0.0004
+ type: L2
+
+TrainReader:
+ inputs_def:
+ fields: ['image', 'ttf_heatmap', 'ttf_box_target', 'ttf_reg_weight']
+ dataset:
+ !COCODataSet
+ image_dir: train2017
+ anno_path: annotations/instances_train2017.json
+ dataset_dir: dataset/coco
+ with_background: false
+ sample_transforms:
+ - !DecodeImage
+ to_rgb: true
+ with_cutmix: True
+ - !CutmixImage
+ alpha: 1.5
+ beta: 1.5
+ - !ColorDistort
+ hue: [-18., 18., 0.5]
+ saturation: [0.5, 1.5, 0.5]
+ contrast: [0.5, 1.5, 0.5]
+ brightness: [-32., 32., 0.5]
+ random_apply: False
+ hsv_format: True
+ random_channel: True
+ - !RandomExpand
+ ratio: 4
+ prob: 0.5
+ fill_value: [123.675, 116.28, 103.53]
+ - !RandomCrop
+ aspect_ratio: NULL
+ cover_all_box: True
+ - !RandomFlipImage
+ prob: 0.5
+ batch_transforms:
+ - !RandomShape
+ sizes: [416, 448, 480, 512, 544, 576, 608, 640, 672]
+ random_inter: True
+ resize_box: True
+ - !NormalizeImage
+ is_channel_first: false
+ is_scale: false
+ mean: [123.675, 116.28, 103.53]
+ std: [58.395, 57.12, 57.375]
+ - !Permute
+ to_bgr: false
+ channel_first: true
+ - !Gt2TTFTarget
+ num_classes: 80
+ down_ratio: 4
+ - !PadBatch
+ pad_to_stride: 32
+ batch_size: 12
+ shuffle: true
+ worker_num: 8
+ bufsize: 2
+ use_process: false
+ cutmix_epoch: 100
+
+EvalReader:
+ inputs_def:
+ image_shape: [3, 512, 512]
+ fields: ['image', 'im_id', 'scale_factor']
+ dataset:
+ !COCODataSet
+ image_dir: val2017
+ anno_path: annotations/instances_val2017.json
+ dataset_dir: dataset/coco
+ with_background: false
+ sample_transforms:
+ - !DecodeImage
+ to_rgb: True
+ - !Resize
+ target_dim: 512
+ - !NormalizeImage
+ mean: [123.675, 116.28, 103.53]
+ std: [58.395, 57.12, 57.375]
+ is_scale: false
+ is_channel_first: false
+ - !Permute
+ to_bgr: false
+ channel_first: True
+ batch_size: 1
+ drop_empty: false
+ worker_num: 8
+ bufsize: 16
+
+TestReader:
+ inputs_def:
+ image_shape: [3, 512, 512]
+ fields: ['image', 'im_id', 'scale_factor']
+ dataset:
+ !ImageFolder
+ anno_path: annotations/instances_val2017.json
+ with_background: false
+ sample_transforms:
+ - !DecodeImage
+ to_rgb: True
+ - !Resize
+ interp: 1
+ target_dim: 512
+ - !NormalizeImage
+ mean: [123.675, 116.28, 103.53]
+ std: [58.395, 57.12, 57.375]
+ is_scale: false
+ is_channel_first: false
+ - !Permute
+ to_bgr: false
+ channel_first: True
+ batch_size: 1
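
The `LearningRate` block in this config composes a 500-step linear warmup (starting at 0.2 × base_lr) with piecewise decay by 0.1 at iterations 112500 and 137500. A sketch of the resulting schedule, mirroring the config semantics rather than the PaddleDetection implementation:

```python
# Sketch of the LearningRate schedule in pafnet_10x_coco.yml; not ppdet code.
def lr_at(step, base_lr=0.015, start_factor=0.2, warmup_steps=500,
          milestones=(112500, 137500), gamma=0.1):
    if step < warmup_steps:  # linear warmup from start_factor * base_lr
        alpha = step / warmup_steps
        return base_lr * (start_factor + (1.0 - start_factor) * alpha)
    lr = base_lr
    for m in milestones:     # multiply by gamma at each passed milestone
        if step >= m:
            lr *= gamma
    return lr

assert abs(lr_at(0) - 0.003) < 1e-9        # 0.2 * 0.015 at step 0
assert abs(lr_at(120000) - 0.0015) < 1e-9  # after the first milestone
```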
diff --git a/static/configs/anchor_free/pafnet_lite_mobilenet_v3_20x_coco.yml b/static/configs/anchor_free/pafnet_lite_mobilenet_v3_20x_coco.yml
new file mode 100644
index 0000000000000000000000000000000000000000..1b14238839e52a48707395de301c986c6f263334
--- /dev/null
+++ b/static/configs/anchor_free/pafnet_lite_mobilenet_v3_20x_coco.yml
@@ -0,0 +1,171 @@
+architecture: TTFNet
+use_gpu: true
+max_iters: 300000
+log_smooth_window: 20
+save_dir: output
+snapshot_iter: 50000
+metric: COCO
+pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x1_0_ssld_pretrained.tar
+weights: output/pafnet_lite_mobilenet_v3_20x_coco/model_final
+num_classes: 80
+
+TTFNet:
+ backbone: MobileNetV3RCNN
+ ttf_head: TTFLiteHead
+
+MobileNetV3RCNN:
+ norm_type: sync_bn
+ norm_decay: 0.0
+ model_name: large
+ scale: 1.0
+ conv_decay: 0.00001
+ lr_mult_list: [0.25, 0.25, 0.5, 0.5, 0.75]
+ freeze_norm: false
+
+TTFLiteHead:
+ head_conv: 48
+
+GiouLoss:
+ loss_weight: 5.
+ do_average: false
+ use_class_weight: false
+
+LearningRate:
+ base_lr: 0.015
+ schedulers:
+ - !PiecewiseDecay
+ gamma: 0.1
+ milestones:
+ - 225000
+ - 275000
+ - !LinearWarmup
+ start_factor: 0.2
+ steps: 1000
+
+OptimizerBuilder:
+ clip_grad_by_norm: 35
+ optimizer:
+ momentum: 0.9
+ type: Momentum
+ regularizer:
+ factor: 0.0004
+ type: L2
+
+TrainReader:
+ inputs_def:
+ fields: ['image', 'ttf_heatmap', 'ttf_box_target', 'ttf_reg_weight']
+ dataset:
+ !COCODataSet
+ image_dir: train2017
+ anno_path: annotations/instances_train2017.json
+ dataset_dir: dataset/coco
+ with_background: false
+ sample_transforms:
+ - !DecodeImage
+ to_rgb: true
+ with_cutmix: True
+ - !ColorDistort
+ hue: [-18., 18., 0.5]
+ saturation: [0.5, 1.5, 0.5]
+ contrast: [0.5, 1.5, 0.5]
+ brightness: [-32., 32., 0.5]
+ random_apply: False
+ hsv_format: False
+ random_channel: True
+ - !RandomExpand
+ ratio: 4
+ prob: 0.5
+ fill_value: [123.675, 116.28, 103.53]
+ - !RandomCrop
+ aspect_ratio: NULL
+ cover_all_box: True
+ - !CutmixImage
+ alpha: 1.5
+ beta: 1.5
+ - !RandomFlipImage
+ prob: 0.5
+ - !GridMaskOp
+ use_h: true
+ use_w: true
+ rotate: 1
+ offset: false
+ ratio: 0.5
+ mode: 1
+ prob: 0.7
+ upper_iter: 300000
+ batch_transforms:
+ - !RandomShape
+ sizes: [320, 352, 384, 416, 448, 480, 512]
+ random_inter: True
+ resize_box: True
+ - !NormalizeImage
+ is_channel_first: false
+ is_scale: false
+ mean: [123.675, 116.28, 103.53]
+ std: [58.395, 57.12, 57.375]
+ - !Permute
+ to_bgr: false
+ channel_first: true
+ - !Gt2TTFTarget
+ num_classes: 80
+ down_ratio: 4
+ - !PadBatch
+ pad_to_stride: 32
+ batch_size: 12
+ shuffle: true
+ worker_num: 8
+ bufsize: 2
+ use_process: false
+ cutmix_epoch: 200
+
+EvalReader:
+ inputs_def:
+ image_shape: [3, 320, 320]
+ fields: ['image', 'im_id', 'scale_factor']
+ dataset:
+ !COCODataSet
+ image_dir: val2017
+ anno_path: annotations/instances_val2017.json
+ dataset_dir: dataset/coco
+ with_background: false
+ sample_transforms:
+ - !DecodeImage
+ to_rgb: True
+ - !Resize
+ target_dim: 320
+ - !NormalizeImage
+ mean: [123.675, 116.28, 103.53]
+ std: [58.395, 57.12, 57.375]
+ is_scale: false
+ is_channel_first: false
+ - !Permute
+ to_bgr: false
+ channel_first: True
+ batch_size: 1
+ drop_empty: false
+ worker_num: 2
+ bufsize: 2
+
+TestReader:
+ inputs_def:
+ image_shape: [3, 320, 320]
+ fields: ['image', 'im_id', 'scale_factor']
+ dataset:
+ !ImageFolder
+ anno_path: annotations/instances_val2017.json
+ with_background: false
+ sample_transforms:
+ - !DecodeImage
+ to_rgb: True
+ - !Resize
+ interp: 1
+ target_dim: 320
+ - !NormalizeImage
+ mean: [123.675, 116.28, 103.53]
+ std: [58.395, 57.12, 57.375]
+ is_scale: false
+ is_channel_first: false
+ - !Permute
+ to_bgr: false
+ channel_first: True
+ batch_size: 1
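
`GridMaskOp` above is configured with `prob: 0.7` and `upper_iter: 300000`. Assuming the apply probability ramps linearly with the training iteration and saturates at `prob` once `upper_iter` is reached (a common GridMask schedule; the actual operator may differ), the effective rate looks like:

```python
# Hedged sketch: linear ramp of the GridMask apply probability.
def gridmask_prob(cur_iter, prob=0.7, upper_iter=300000):
    return prob * min(1.0, cur_iter / upper_iter)

print(gridmask_prob(150000))  # 0.35 halfway through the ramp
print(gridmask_prob(400000))  # 0.7 once upper_iter is passed
```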
diff --git a/static/configs/faster_reader.yml b/static/configs/faster_reader.yml
index 26835d4c6d2ef6187cd1b5ce23283cf049194ba8..3099bb656d641d99c71e12aa6f61fb625d23d6bd 100644
--- a/static/configs/faster_reader.yml
+++ b/static/configs/faster_reader.yml
@@ -26,7 +26,7 @@ TrainReader:
channel_first: true
batch_transforms:
- !PadBatch
- pad_to_stride: -1.
+ pad_to_stride: -1
use_padded_im_info: false
batch_size: 1
shuffle: true
diff --git a/static/configs/mask_reader.yml b/static/configs/mask_reader.yml
index 31742f2057f4a86d6a4a6d5f6d5bc6ca7dfa4b54..165a09b82bb448dede9273ae3b7da297e318c131 100644
--- a/static/configs/mask_reader.yml
+++ b/static/configs/mask_reader.yml
@@ -27,7 +27,7 @@ TrainReader:
channel_first: true
batch_transforms:
- !PadBatch
- pad_to_stride: -1.
+ pad_to_stride: -1
use_padded_im_info: false
batch_size: 1
shuffle: true
diff --git a/static/configs/mask_reader_cocome.yml b/static/configs/mask_reader_cocome.yml
index a7760a16279475d8f10b22bd4b3fcb437ff7c088..1b44491c5c3a7dfbc12138c682ac00b24946ef4a 100644
--- a/static/configs/mask_reader_cocome.yml
+++ b/static/configs/mask_reader_cocome.yml
@@ -27,7 +27,7 @@ TrainReader:
channel_first: true
batch_transforms:
- !PadBatch
- pad_to_stride: -1.
+ pad_to_stride: -1
use_padded_im_info: false
batch_size: 1
shuffle: true
diff --git a/static/configs/ppyolo/README.md b/static/configs/ppyolo/README.md
index 3d8f2af9ec805fda072d7d9a80f4babde21ddd8a..73425c7f1387bc73949aa4217270856812d874b7 100644
--- a/static/configs/ppyolo/README.md
+++ b/static/configs/ppyolo/README.md
@@ -38,21 +38,24 @@ PP-YOLO improved performance and speed of YOLOv3 with following methods:
| Model | GPU number | images/GPU | backbone | input shape | Box APval | Box APtest | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | download | config |
|:------------------------:|:----------:|:----------:|:----------:| :----------:| :------------------: | :-------------------: | :------------: | :---------------------: | :------: | :-----: |
-| YOLOv4(AlexyAB) | - | - | CSPDarknet | 608 | - | 43.5 | 62 | 105.5 | [model](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/yolov4/yolov4_csdarknet.yml) |
-| YOLOv4(AlexyAB) | - | - | CSPDarknet | 512 | - | 43.0 | 83 | 138.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/yolov4/yolov4_csdarknet.yml) |
-| YOLOv4(AlexyAB) | - | - | CSPDarknet | 416 | - | 41.2 | 96 | 164.0 | [model](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/yolov4/yolov4_csdarknet.yml) |
-| YOLOv4(AlexyAB) | - | - | CSPDarknet | 320 | - | 38.0 | 123 | 199.0 | [model](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/yolov4/yolov4_csdarknet.yml) |
-| PP-YOLO | 8 | 24 | ResNet50vd | 608 | 44.8 | 45.2 | 72.9 | 155.6 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo.yml) |
-| PP-YOLO | 8 | 24 | ResNet50vd | 512 | 43.9 | 44.4 | 89.9 | 188.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo.yml) |
-| PP-YOLO | 8 | 24 | ResNet50vd | 416 | 42.1 | 42.5 | 109.1 | 215.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo.yml) |
-| PP-YOLO | 8 | 24 | ResNet50vd | 320 | 38.9 | 39.3 | 132.2 | 242.2 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo.yml) |
-| PP-YOLO_2x | 8 | 24 | ResNet50vd | 608 | 45.3 | 45.9 | 72.9 | 155.6 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo_2x.yml) |
-| PP-YOLO_2x | 8 | 24 | ResNet50vd | 512 | 44.4 | 45.0 | 89.9 | 188.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo_2x.yml) |
-| PP-YOLO_2x | 8 | 24 | ResNet50vd | 416 | 42.7 | 43.2 | 109.1 | 215.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo_2x.yml) |
-| PP-YOLO_2x | 8 | 24 | ResNet50vd | 320 | 39.5 | 40.1 | 132.2 | 242.2 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo_2x.yml) |
-| PP-YOLO_ResNet18vd | 4 | 32 | ResNet18vd | 512 | 29.3 | 29.5 | 357.1 | 657.9 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_r18vd.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo_r18vd.yml) |
-| PP-YOLO_ResNet18vd | 4 | 32 | ResNet18vd | 416 | 28.6 | 28.9 | 409.8 | 719.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_r18vd.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo_r18vd.yml) |
-| PP-YOLO_ResNet18vd | 4 | 32 | ResNet18vd | 320 | 26.2 | 26.4 | 480.7 | 763.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_r18vd.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo_r18vd.yml) |
+| YOLOv4(AlexyAB) | - | - | CSPDarknet | 608 | - | 43.5 | 62 | 105.5 | [model](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/yolov4/yolov4_csdarknet.yml) |
+| YOLOv4(AlexyAB) | - | - | CSPDarknet | 512 | - | 43.0 | 83 | 138.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/yolov4/yolov4_csdarknet.yml) |
+| YOLOv4(AlexyAB) | - | - | CSPDarknet | 416 | - | 41.2 | 96 | 164.0 | [model](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/yolov4/yolov4_csdarknet.yml) |
+| YOLOv4(AlexyAB) | - | - | CSPDarknet | 320 | - | 38.0 | 123 | 199.0 | [model](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/yolov4/yolov4_csdarknet.yml) |
+| PP-YOLO | 8 | 24 | ResNet50vd | 608 | 44.8 | 45.2 | 72.9 | 155.6 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo.yml) |
+| PP-YOLO | 8 | 24 | ResNet50vd | 512 | 43.9 | 44.4 | 89.9 | 188.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo.yml) |
+| PP-YOLO | 8 | 24 | ResNet50vd | 416 | 42.1 | 42.5 | 109.1 | 215.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo.yml) |
+| PP-YOLO | 8 | 24 | ResNet50vd | 320 | 38.9 | 39.3 | 132.2 | 242.2 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 608 | 45.3 | 45.9 | 72.9 | 155.6 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_2x.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 512 | 44.4 | 45.0 | 89.9 | 188.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_2x.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 416 | 42.7 | 43.2 | 109.1 | 215.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_2x.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 320 | 39.5 | 40.1 | 132.2 | 242.2 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_2x.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_2x.yml) |
+| PP-YOLO | 4 | 32 | ResNet18vd | 512 | 29.3 | 29.5 | 357.1 | 657.9 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_r18vd.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_r18vd.yml) |
+| PP-YOLO | 4 | 32 | ResNet18vd | 416 | 28.6 | 28.9 | 409.8 | 719.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_r18vd.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_r18vd.yml) |
+| PP-YOLO | 4 | 32 | ResNet18vd | 320 | 26.2 | 26.4 | 480.7 | 763.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_r18vd.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_r18vd.yml) |
+| PP-YOLOv2 | 8 | 12 | ResNet50vd | 640 | 49.1 | 49.5 | 68.9 | 106.5 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolov2_r50vd_dcn.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolov2_r50vd_dcn.yml) |
+| PP-YOLOv2 | 8 | 12 | ResNet101vd | 640 | 49.7 | 50.3 | 49.5 | 87.0 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolov2_r101vd_dcn.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolov2_r101vd_dcn.yml) |
+
**Notes:**
@@ -69,8 +72,8 @@ PP-YOLO improved performance and speed of YOLOv3 with following methods:
| Model | GPU number | images/GPU | Model Size | input shape | Box APval | Box AP50val | Kirin 990 1xCore(FPS) | download | inference model download | config |
|:----------------------------:|:----------:|:----------:| :--------: | :----------:| :------------------: | :--------------------: | :-------------------: | :------: | :----------------------: | :-----: |
-| PP-YOLO_MobileNetV3_large | 4 | 32 | 18MB | 320 | 23.2 | 42.6 | 15.6 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_large.pdparams) | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_large.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo_mobilenet_v3_large.yml) |
-| PP-YOLO_MobileNetV3_small | 4 | 32 | 11MB | 320 | 17.2 | 33.8 | 28.6 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_small.pdparams) | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_small.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo_mobilenet_v3_small.yml) |
+| PP-YOLO_MobileNetV3_large | 4 | 32 | 18MB | 320 | 23.2 | 42.6 | 15.6 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_large.pdparams) | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_large.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_mobilenet_v3_large.yml) |
+| PP-YOLO_MobileNetV3_small | 4 | 32 | 11MB | 320 | 17.2 | 33.8 | 28.6 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_small.pdparams) | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_small.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_mobilenet_v3_small.yml) |
**Notes:**
@@ -82,11 +85,25 @@ PP-YOLO improved performance and speed of YOLOv3 with following methods:
| Model | GPU number | images/GPU | Prune Ratio | Teacher Model | Model Size | input shape | Box APval | Kirin 990 1xCore(FPS) | download | inference model download | config |
|:----------------------------:|:----------:|:----------:| :---------: | :-----------------------: | :--------: | :----------:| :------------------: | :-------------------: | :------: | :----------------------: | :-----: |
-| PP-YOLO_MobileNetV3_small | 4 | 32 | 75% | PP-YOLO_MobileNetV3_large | 4.2MB | 320 | 16.2 | 39.8 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_small_prune75_distillby_mobilenet_v3_large.pdparams) | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_small_prune75_distillby_mobilenet_v3_large.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo_mobilenet_v3_small.yml) |
+| PP-YOLO_MobileNetV3_small | 4 | 32 | 75% | PP-YOLO_MobileNetV3_large | 4.2MB | 320 | 16.2 | 39.8 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_small_prune75_distillby_mobilenet_v3_large.pdparams) | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_small_prune75_distillby_mobilenet_v3_large.tar) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_mobilenet_v3_small.yml) |
-- Slim PP-YOLO is trained by slim traing method from [Distill pruned model](../../slim/extentions/distill_pruned_model/README.md),distill training pruned PP-YOLO_MobileNetV3_small model with PP-YOLO_MobileNetV3_large model as the teacher model
+- Slim PP-YOLO is trained by the slim training method from [Distill pruned model](../../slim/extensions/distill_pruned_model/README.md): the pruned PP-YOLO_MobileNetV3_small model is distill-trained with the PP-YOLO_MobileNetV3_large model as the teacher model
- Pruning detectiom head of PP-YOLO model with ratio as 75%, while the arguments are `--pruned_params="yolo_block.0.2.conv.weights,yolo_block.0.tip.conv.weights,yolo_block.1.2.conv.weights,yolo_block.1.tip.conv.weights" --pruned_ratios="0.75,0.75,0.75,0.75"`
-- For Slim PP-YOLO training, evaluation, inference and model exporting, please see [Distill pruned model](../../slim/extentions/distill_pruned_model/README.md)
+- For Slim PP-YOLO training, evaluation, inference and model exporting, please see [Distill pruned model](../../slim/extensions/distill_pruned_model/README.md)
+
+### PP-YOLO tiny
+
+| Model | GPU number | images/GPU | Model Size | Post Quant Model Size | input shape | Box APval | Kirin 990 4xCore(FPS) | download | config | post quant model |
+|:----------------------------:|:-------:|:-------------:|:----------:| :-------------------: | :----------:| :------------------: | :-------------------: | :------: | :----: | :--------------: |
+| PP-YOLO tiny | 8 | 32 | 4.2MB | **1.3M** | 320 | 20.6 | 92.3 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_tiny.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/static/configs/ppyolo/ppyolo_tiny.yml) | [inference model](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_quant.tar) |
+| PP-YOLO tiny | 8 | 32 | 4.2MB | **1.3M** | 416 | 22.7 | 65.4 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_tiny.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/static/configs/ppyolo/ppyolo_tiny.yml) | [inference model](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_quant.tar) |
+
+**Notes:**
+
+- PP-YOLO-tiny is trained on the COCO train2017 dataset and evaluated on the val2017 dataset. Box APval is the `mAP(IoU=0.5:0.95)` evaluation result, and Box AP50val is the `mAP(IoU=0.5)` evaluation result.
+- PP-YOLO-tiny is trained on 8 GPUs with a mini-batch size of 32 images per GPU. If the GPU number or mini-batch size is changed, the learning rate and iteration number should be adjusted according to the [FAQ](https://github.com/PaddlePaddle/PaddleDetection/blob/master/docs/FAQ.md); see the sketch after these notes.
+- PP-YOLO-tiny inference speed is tested on a Kirin 990 chip with 4 threads on the ARMv8 architecture.
+- We also provide a post-quantization inference model for PP-YOLO-tiny, which compresses the model to **1.3MB** with nearly no impact on inference speed or accuracy.
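
A hedged sketch of the usual linear-scaling adjustment such FAQ entries describe (scale the learning rate with the total batch size and stretch the iteration count inversely); the exact procedure is defined in the FAQ itself:

```python
# Hedged sketch of linear-scaling LR adjustment; consult the FAQ for the
# authoritative rule. Reference values are illustrative.
def rescale(base_lr, base_iters, base_total_bs, new_total_bs):
    ratio = new_total_bs / base_total_bs
    return base_lr * ratio, int(base_iters / ratio)

# e.g. halving the GPUs of an 8 GPU x 32 images/GPU run:
print(rescale(0.005, 300000, 8 * 32, 4 * 32))  # (0.0025, 600000)
```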
### PP-YOLO on Pascal VOC
@@ -94,9 +111,9 @@ PP-YOLO trained on Pascal VOC dataset as follows:
| Model | GPU number | images/GPU | backbone | input shape | Box AP50val | download | config |
|:------------------:|:----------:|:----------:|:----------:| :----------:| :--------------------: | :------: | :-----: |
-| PP-YOLO | 8 | 12 | ResNet50vd | 608 | 84.9 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo_voc.yml) |
-| PP-YOLO | 8 | 12 | ResNet50vd | 416 | 84.3 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo_voc.yml) |
-| PP-YOLO | 8 | 12 | ResNet50vd | 320 | 82.2 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo_voc.yml) |
+| PP-YOLO | 8 | 12 | ResNet50vd | 608 | 84.9 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_voc.yml) |
+| PP-YOLO | 8 | 12 | ResNet50vd | 416 | 84.3 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_voc.yml) |
+| PP-YOLO | 8 | 12 | ResNet50vd | 320 | 82.2 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_voc.yml) |
## Getting Start
diff --git a/static/configs/ppyolo/README_cn.md b/static/configs/ppyolo/README_cn.md
index 68ca89d41a8f11d64ff56583a34bd145c5eee63c..5c3e5f1298f9d5373430160500ae3fdc512e68e5 100644
--- a/static/configs/ppyolo/README_cn.md
+++ b/static/configs/ppyolo/README_cn.md
@@ -38,21 +38,23 @@ PP-YOLO从如下方面优化和提升YOLOv3模型的精度和速度:
| 模型 | GPU个数 | 每GPU图片个数 | 骨干网络 | 输入尺寸 | Box APval | Box APtest | V100 FP32(FPS) | V100 TensorRT FP16(FPS) | 模型下载 | 配置文件 |
|:------------------------:|:-------:|:-------------:|:----------:| :-------:| :------------------: | :-------------------: | :------------: | :---------------------: | :------: | :------: |
-| YOLOv4(AlexyAB) | - | - | CSPDarknet | 608 | - | 43.5 | 62 | 105.5 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/yolov4/yolov4_csdarknet.yml) |
-| YOLOv4(AlexyAB) | - | - | CSPDarknet | 512 | - | 43.0 | 83 | 138.4 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/yolov4/yolov4_csdarknet.yml) |
-| YOLOv4(AlexyAB) | - | - | CSPDarknet | 416 | - | 41.2 | 96 | 164.0 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/yolov4/yolov4_csdarknet.yml) |
-| YOLOv4(AlexyAB) | - | - | CSPDarknet | 320 | - | 38.0 | 123 | 199.0 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/yolov4/yolov4_csdarknet.yml) |
-| PP-YOLO | 8 | 24 | ResNet50vd | 608 | 44.8 | 45.2 | 72.9 | 155.6 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo.yml) |
-| PP-YOLO | 8 | 24 | ResNet50vd | 512 | 43.9 | 44.4 | 89.9 | 188.4 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo.yml) |
-| PP-YOLO | 8 | 24 | ResNet50vd | 416 | 42.1 | 42.5 | 109.1 | 215.4 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo.yml) |
-| PP-YOLO | 8 | 24 | ResNet50vd | 320 | 38.9 | 39.3 | 132.2 | 242.2 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo.yml) |
-| PP-YOLO_2x | 8 | 24 | ResNet50vd | 608 | 45.3 | 45.9 | 72.9 | 155.6 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo.yml) |
-| PP-YOLO_2x | 8 | 24 | ResNet50vd | 512 | 44.4 | 45.0 | 89.9 | 188.4 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo.yml) |
-| PP-YOLO_2x | 8 | 24 | ResNet50vd | 416 | 42.7 | 43.2 | 109.1 | 215.4 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo.yml) |
-| PP-YOLO_2x | 8 | 24 | ResNet50vd | 320 | 39.5 | 40.1 | 132.2 | 242.2 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo.yml) |
-| PP-YOLO_ResNet18vd | 4 | 32 | ResNet18vd | 512 | 29.3 | 29.5 | 357.1 | 657.9 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_r18vd.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo_r18vd.yml) |
-| PP-YOLO_ResNet18vd | 4 | 32 | ResNet18vd | 416 | 28.6 | 28.9 | 409.8 | 719.4 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_r18vd.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo_r18vd.yml) |
-| PP-YOLO_ResNet18vd | 4 | 32 | ResNet18vd | 320 | 26.2 | 26.4 | 480.7 | 763.4 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_r18vd.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo_r18vd.yml) |
+| YOLOv4(AlexyAB) | - | - | CSPDarknet | 608 | - | 43.5 | 62 | 105.5 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/yolov4/yolov4_csdarknet.yml) |
+| YOLOv4(AlexyAB) | - | - | CSPDarknet | 512 | - | 43.0 | 83 | 138.4 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/yolov4/yolov4_csdarknet.yml) |
+| YOLOv4(AlexyAB) | - | - | CSPDarknet | 416 | - | 41.2 | 96 | 164.0 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/yolov4/yolov4_csdarknet.yml) |
+| YOLOv4(AlexyAB) | - | - | CSPDarknet | 320 | - | 38.0 | 123 | 199.0 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/yolov4_cspdarknet.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/yolov4/yolov4_csdarknet.yml) |
+| PP-YOLO | 8 | 24 | ResNet50vd | 608 | 44.8 | 45.2 | 72.9 | 155.6 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo.yml) |
+| PP-YOLO | 8 | 24 | ResNet50vd | 512 | 43.9 | 44.4 | 89.9 | 188.4 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo.yml) |
+| PP-YOLO | 8 | 24 | ResNet50vd | 416 | 42.1 | 42.5 | 109.1 | 215.4 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo.yml) |
+| PP-YOLO | 8 | 24 | ResNet50vd | 320 | 38.9 | 39.3 | 132.2 | 242.2 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 608 | 45.3 | 45.9 | 72.9 | 155.6 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 512 | 44.4 | 45.0 | 89.9 | 188.4 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 416 | 42.7 | 43.2 | 109.1 | 215.4 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo.yml) |
+| PP-YOLO_2x | 8 | 24 | ResNet50vd | 320 | 39.5 | 40.1 | 132.2 | 242.2 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo.yml) |
+| PP-YOLO | 4 | 32 | ResNet18vd | 512 | 29.3 | 29.5 | 357.1 | 657.9 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_r18vd.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_r18vd.yml) |
+| PP-YOLO | 4 | 32 | ResNet18vd | 416 | 28.6 | 28.9 | 409.8 | 719.4 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_r18vd.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_r18vd.yml) |
+| PP-YOLO | 4 | 32 | ResNet18vd | 320 | 26.2 | 26.4 | 480.7 | 763.4 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_r18vd.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_r18vd.yml) |
+| PP-YOLOv2 | 8 | 12 | ResNet50vd | 640 | 49.1 | 49.5 | 68.9 | 106.5 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolov2_r50vd_dcn.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolov2_r50vd_dcn.yml) |
+| PP-YOLOv2 | 8 | 12 | ResNet101vd | 640 | 49.7 | 50.3 | 49.5 | 87.0 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolov2_r101vd_dcn.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolov2_r101vd_dcn.yml) |
**注意:**
@@ -70,8 +72,8 @@ PP-YOLO从如下方面优化和提升YOLOv3模型的精度和速度:
| 模型 | GPU个数 | 每GPU图片个数 | 模型体积 | 输入尺寸 | Box APval | Box AP50val | Kirin 990 1xCore (FPS) | 模型下载 | 预测模型下载 | 配置文件 |
|:----------------------------:|:-------:|:-------------:|:----------:| :-------:| :------------------: | :--------------------: | :--------------------: | :------: | :----------: | :------: |
-| PP-YOLO_MobileNetV3_large | 4 | 32 | 18MB | 320 | 23.2 | 42.6 | 14.1 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_large.pdparams) | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_large.tar) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo_mobilenet_v3_large.yml) |
-| PP-YOLO_MobileNetV3_small | 4 | 32 | 11MB | 320 | 17.2 | 33.8 | 21.5 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_small.pdparams) | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_large.tar) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo_mobilenet_v3_small.yml) |
+| PP-YOLO_MobileNetV3_large | 4 | 32 | 18MB | 320 | 23.2 | 42.6 | 14.1 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_large.pdparams) | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_large.tar) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_mobilenet_v3_large.yml) |
+| PP-YOLO_MobileNetV3_small | 4 | 32 | 11MB | 320 | 17.2 | 33.8 | 21.5 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_small.pdparams) | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_large.tar) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_mobilenet_v3_small.yml) |
- PP-YOLO_MobileNetV3 模型使用COCO数据集中train2017作为训练集,使用val2017作为测试集,Box APval为`mAP(IoU=0.5:0.95)`评估结果, Box AP50val为`mAP(IoU=0.5)`评估结果。
- PP-YOLO_MobileNetV3 模型训练过程中使用4GPU,每GPU batch size为32进行训练,如训练GPU数和batch size不使用上述配置,须参考[FAQ](../../docs/FAQ.md)调整学习率和迭代次数。
@@ -81,11 +83,23 @@ PP-YOLO从如下方面优化和提升YOLOv3模型的精度和速度:
| 模型 | GPU 个数 | 每GPU图片个数 | 裁剪率 | Teacher模型 | 模型体积 | 输入尺寸 | Box APval | Kirin 990 1xCore (FPS) | 模型下载 | 预测模型下载 | 配置文件 |
|:----------------------------:|:----------:|:-------------:| :---------: | :-----------------------: | :--------: | :----------:| :------------------: | :--------------------: | :------: | :----------: | :------: |
-| PP-YOLO_MobileNetV3_small | 4 | 32 | 75% | PP-YOLO_MobileNetV3_large | 4.2MB | 320 | 16.2 | 39.8 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_small_prune75_distillby_mobilenet_v3_large.pdparams) | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_small_prune75_distillby_mobilenet_v3_large.tar) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo_mobilenet_v3_small.yml) |
+| PP-YOLO_MobileNetV3_small | 4 | 32 | 75% | PP-YOLO_MobileNetV3_large | 4.2MB | 320 | 16.2 | 39.8 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_small_prune75_distillby_mobilenet_v3_large.pdparams) | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_mobilenet_v3_small_prune75_distillby_mobilenet_v3_large.tar) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_mobilenet_v3_small.yml) |
-- PP-YOLO 轻量级裁剪模型采用[蒸馏通道剪裁模型](../../slim/extentions/distill_pruned_model/README.md) 的方式训练得到,基于 PP-YOLO_MobileNetV3_small 模型对Head部分做卷积通道剪裁后使用 PP-YOLO_MobileNetV3_large 模型进行蒸馏训练
+- PP-YOLO 轻量级裁剪模型采用[蒸馏通道剪裁模型](../../slim/extensions/distill_pruned_model/README.md) 的方式训练得到,基于 PP-YOLO_MobileNetV3_small 模型对Head部分做卷积通道剪裁后使用 PP-YOLO_MobileNetV3_large 模型进行蒸馏训练
- 卷积通道检测对Head部分剪裁掉75%的通道数,及剪裁参数为`--pruned_params="yolo_block.0.2.conv.weights,yolo_block.0.tip.conv.weights,yolo_block.1.2.conv.weights,yolo_block.1.tip.conv.weights" --pruned_ratios="0.75,0.75,0.75,0.75"`
-- PP-YOLO 轻量级裁剪模型的训练、评估、预测及模型导出方法见[蒸馏通道剪裁模型](../../slim/extentions/distill_pruned_model/README.md)
+- PP-YOLO 轻量级裁剪模型的训练、评估、预测及模型导出方法见[蒸馏通道剪裁模型](../../slim/extensions/distill_pruned_model/README.md)
+
+### PP-YOLO tiny模型
+
+| 模型 | GPU 个数 | 每GPU图片个数 | 模型体积 | 后量化模型体积 | 输入尺寸 | Box APval | Kirin 990 4xCore (FPS) | 模型下载 | 配置文件 | 后量化模型 |
+|:----------------------------:|:----------:|:-------------:| :--------: | :------------: | :----------:| :------------------: | :--------------------: | :------: | :------: | :--------: |
+| PP-YOLO tiny | 8 | 32 | 4.2MB | **1.3M** | 320 | 20.6 | 92.3 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_650e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_tiny_650e_coco.yml) | [预测模型](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_quant.tar) |
+| PP-YOLO tiny | 8 | 32 | 4.2MB | **1.3M** | 416 | 22.7 | 65.4 | [下载链接](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_650e_coco.pdparams) | [配置文件](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph/configs/ppyolo/ppyolo_tiny_650e_coco.yml) | [预测模型](https://paddledet.bj.bcebos.com/models/ppyolo_tiny_quant.tar) |
+
+- PP-YOLO-tiny 模型使用COCO数据集中train2017作为训练集,使用val2017作为测试集,Box APval为`mAP(IoU=0.5:0.95)`评估结果, Box AP50val为`mAP(IoU=0.5)`评估结果。
+- PP-YOLO-tiny 模型训练过程中使用8GPU,每GPU batch size为32进行训练,如训练GPU数和batch size不使用上述配置,须参考[FAQ](../../docs/FAQ.md)调整学习率和迭代次数。
+- PP-YOLO-tiny 模型推理速度测试环境配置为麒麟990芯片4线程,arm8架构。
+- 我们也提供了PP-YOLO-tiny的后量化压缩模型,将模型体积压缩到**1.3M**,对精度和预测速度基本无影响。
### Pascal VOC数据集上的PP-YOLO
@@ -93,9 +107,9 @@ PP-YOLO在Pascal VOC数据集上训练模型如下:
| 模型 | GPU个数 | 每GPU图片个数 | 骨干网络 | 输入尺寸 | Box AP50val | 模型下载 | 配置文件 |
|:------------------:|:-------:|:-------------:|:----------:| :----------:| :--------------------: | :------: | :-----: |
-| PP-YOLO | 8 | 12 | ResNet50vd | 608 | 84.9 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo_voc.yml) |
-| PP-YOLO | 8 | 12 | ResNet50vd | 416 | 84.3 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo_voc.yml) |
-| PP-YOLO | 8 | 12 | ResNet50vd | 320 | 82.2 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/master/configs/ppyolo/ppyolo_voc.yml) |
+| PP-YOLO | 8 | 12 | ResNet50vd | 608 | 84.9 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_voc.yml) |
+| PP-YOLO | 8 | 12 | ResNet50vd | 416 | 84.3 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_voc.yml) |
+| PP-YOLO | 8 | 12 | ResNet50vd | 320 | 82.2 | [model](https://paddlemodels.bj.bcebos.com/object_detection/ppyolo_voc.pdparams) | [config](https://github.com/PaddlePaddle/PaddleDetection/tree/develop/static/configs/ppyolo/ppyolo_voc.yml) |
## 使用说明
diff --git a/static/configs/ppyolo/ppyolo_tiny.yml b/static/configs/ppyolo/ppyolo_tiny.yml
new file mode 100755
index 0000000000000000000000000000000000000000..aa80ebf41ee035863ad76417de7a01915b383598
--- /dev/null
+++ b/static/configs/ppyolo/ppyolo_tiny.yml
@@ -0,0 +1,193 @@
+architecture: YOLOv3
+use_gpu: true
+max_iters: 300000
+log_smooth_window: 100
+log_iter: 100
+save_dir: output
+snapshot_iter: 10000
+metric: COCO
+pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/MobileNetV3_large_x0_5_pretrained.tar
+weights: output/ppyolo_tiny/model_final
+num_classes: 80
+use_fine_grained_loss: true
+use_ema: true
+ema_decay: 0.9998
+
+YOLOv3:
+ backbone: MobileNetV3
+ yolo_head: PPYOLOTinyHead
+ use_fine_grained_loss: true
+
+MobileNetV3:
+ norm_type: sync_bn
+ norm_decay: 0.
+ model_name: large
+ scale: .5
+ extra_block_filters: []
+ feature_maps: [1, 2, 3, 4, 6]
+
+PPYOLOTinyHead:
+ anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
+ anchors: [[10, 15], [24, 36], [72, 42],
+ [35, 87], [102, 96], [60, 170],
+ [220, 125], [128, 222], [264, 266]]
+ detection_block_channels: [160, 128, 96]
+ norm_decay: 0.
+ scale_x_y: 1.05
+ yolo_loss: YOLOv3Loss
+ spp: true
+ drop_block: true
+ nms:
+ background_label: -1
+ keep_top_k: 100
+ nms_threshold: 0.45
+ nms_top_k: 1000
+ normalized: false
+ score_threshold: 0.01
+
+YOLOv3Loss:
+ ignore_thresh: 0.5
+ scale_x_y: 1.05
+ label_smooth: false
+ use_fine_grained_loss: true
+ iou_loss: IouLoss
+
+IouLoss:
+ loss_weight: 2.5
+ max_height: 512
+ max_width: 512
+
+LearningRate:
+ base_lr: 0.005
+ schedulers:
+ - !PiecewiseDecay
+ gamma: 0.1
+ milestones:
+ - 200000
+ - 250000
+ - 280000
+ - !LinearWarmup
+ start_factor: 0.
+ steps: 4000
+
+OptimizerBuilder:
+ optimizer:
+ momentum: 0.949
+ type: Momentum
+ regularizer:
+ factor: 0.0005
+ type: L2
+
+TrainReader:
+ inputs_def:
+ fields: ['image', 'gt_bbox', 'gt_class', 'gt_score']
+ num_max_boxes: 100
+ dataset:
+ !COCODataSet
+ image_dir: train2017
+ anno_path: annotations/instances_train2017.json
+ dataset_dir: train_data/dataset/coco
+ with_background: false
+ sample_transforms:
+ - !DecodeImage
+ to_rgb: True
+ with_mixup: True
+ - !MixupImage
+ alpha: 1.5
+ beta: 1.5
+ - !ColorDistort {}
+ - !RandomExpand
+ fill_value: [123.675, 116.28, 103.53]
+ ratio: 2
+ - !RandomCrop {}
+ - !RandomFlipImage
+ is_normalized: false
+ - !NormalizeBox {}
+ - !PadBox
+ num_max_boxes: 100
+ - !BboxXYXY2XYWH {}
+ batch_transforms:
+ - !RandomShape
+ sizes: [192, 224, 256, 288, 320, 352, 384, 416, 448, 480, 512]
+ random_inter: True
+ - !NormalizeImage
+ mean: [0.485, 0.456, 0.406]
+ std: [0.229, 0.224, 0.225]
+ is_scale: True
+ is_channel_first: false
+ - !Permute
+ to_bgr: false
+ channel_first: True
+ # Gt2YoloTarget is only used when use_fine_grained_loss set as true,
+ # this operator will be deleted automatically if use_fine_grained_loss
+ # is set as false
+ - !Gt2YoloTarget
+ anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
+ anchors: [[10, 15], [24, 36], [72, 42],
+ [35, 87], [102, 96], [60, 170],
+ [220, 125], [128, 222], [264, 266]]
+ downsample_ratios: [32, 16, 8]
+ iou_thresh: 0.25
+ num_classes: 80
+ batch_size: 32
+ shuffle: true
+ mixup_epoch: 200
+ drop_last: true
+ worker_num: 16
+ bufsize: 4
+ use_process: true
+
+EvalReader:
+ inputs_def:
+ fields: ['image', 'im_size', 'im_id']
+ num_max_boxes: 100
+ dataset:
+ !COCODataSet
+ image_dir: val2017
+ anno_path: annotations/instances_val2017.json
+ dataset_dir: train_data/dataset/coco
+ with_background: false
+ sample_transforms:
+ - !DecodeImage
+ to_rgb: True
+ - !ResizeImage
+ target_size: 320
+ interp: 2
+ - !NormalizeImage
+ mean: [0.485, 0.456, 0.406]
+ std: [0.229, 0.224, 0.225]
+ is_scale: True
+ is_channel_first: false
+ - !PadBox
+ num_max_boxes: 100
+ - !Permute
+ to_bgr: false
+ channel_first: True
+ batch_size: 1
+ drop_empty: false
+ worker_num: 2
+ bufsize: 4
+
+TestReader:
+ inputs_def:
+ image_shape: [3, 320, 320]
+ fields: ['image', 'im_size', 'im_id']
+ dataset:
+ !ImageFolder
+ anno_path: annotations/instances_val2017.json
+ with_background: false
+ sample_transforms:
+ - !DecodeImage
+ to_rgb: True
+ - !ResizeImage
+ target_size: 320
+ interp: 2
+ - !NormalizeImage
+ mean: [0.485, 0.456, 0.406]
+ std: [0.229, 0.224, 0.225]
+ is_scale: True
+ is_channel_first: false
+ - !Permute
+ to_bgr: false
+ channel_first: True
+ batch_size: 1
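
`use_ema: true` with `ema_decay: 0.9998` in this config keeps an exponential moving average of the weights for evaluation and export. A minimal illustration of the update rule (not the ppdet implementation):

```python
# Minimal EMA update sketch: shadow <- decay * shadow + (1 - decay) * weights.
def ema_update(shadow, weights, decay=0.9998):
    return {k: decay * shadow[k] + (1.0 - decay) * weights[k] for k in weights}

shadow = {'w': 0.0}
for _ in range(3):
    shadow = ema_update(shadow, {'w': 1.0})
print(shadow['w'])  # creeps toward 1.0 at rate (1 - decay) per step
```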
diff --git a/static/configs/ppyolo/ppyolov2_r101vd_dcn.yml b/static/configs/ppyolo/ppyolov2_r101vd_dcn.yml
new file mode 100644
index 0000000000000000000000000000000000000000..9ba339912fe791ef3d9f48442f7e91bf1e2bec12
--- /dev/null
+++ b/static/configs/ppyolo/ppyolov2_r101vd_dcn.yml
@@ -0,0 +1,89 @@
+architecture: YOLOv3
+use_gpu: true
+max_iters: 450000
+log_iter: 100
+save_dir: output
+snapshot_iter: 10000
+metric: COCO
+pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_ssld_pretrained.tar
+weights: output/ppyolov2_r101vd_dcn/model_final
+num_classes: 80
+use_fine_grained_loss: true
+use_ema: true
+ema_decay: 0.9998
+
+YOLOv3:
+ backbone: ResNet
+ yolo_head: YOLOv3PANHead
+ use_fine_grained_loss: true
+
+ResNet:
+ norm_type: sync_bn
+ freeze_at: 0
+ freeze_norm: false
+ norm_decay: 0.
+ depth: 101
+ feature_maps: [3, 4, 5]
+ variant: d
+ dcn_v2_stages: [5]
+
+YOLOv3PANHead:
+ anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
+ anchors: [[10, 13], [16, 30], [33, 23],
+ [30, 61], [62, 45], [59, 119],
+ [116, 90], [156, 198], [373, 326]]
+ norm_decay: 0.
+ iou_aware: true
+ iou_aware_factor: 0.5
+ scale_x_y: 1.05
+ spp: true
+ yolo_loss: YOLOv3Loss
+ nms: MatrixNMS
+ drop_block: true
+
+YOLOv3Loss:
+ ignore_thresh: 0.7
+ scale_x_y: 1.05
+ label_smooth: false
+ use_fine_grained_loss: true
+ iou_loss: IouLoss
+ iou_aware_loss: IouAwareLoss
+
+IouLoss:
+ loss_weight: 2.5
+ max_height: 768
+ max_width: 768
+
+IouAwareLoss:
+ loss_weight: 1.0
+ max_height: 768
+ max_width: 768
+
+MatrixNMS:
+ background_label: -1
+ keep_top_k: 100
+ normalized: false
+ score_threshold: 0.01
+ post_threshold: 0.01
+
+LearningRate:
+ base_lr: 0.005
+ schedulers:
+ - !PiecewiseDecay
+ gamma: 0.1
+ milestones:
+ - 300000
+ - !LinearWarmup
+ start_factor: 0.
+ steps: 4000
+
+OptimizerBuilder:
+ clip_grad_by_norm: 35.
+ optimizer:
+ momentum: 0.9
+ type: Momentum
+ regularizer:
+ factor: 0.0005
+ type: L2
+
+_READER_: 'ppyolov2_reader.yml'
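
`_READER_` points this model config at the shared reader file, and the two documents are composed into one config. A hedged, dict-level sketch of that merge (these YAMLs use custom tags such as `!COCODataSet`, so plain `yaml.safe_load` would reject them; dicts stand in for parsed content):

```python
# Hedged sketch of _READER_ composition; not the actual ppdet config loader.
model_cfg = {'architecture': 'YOLOv3', '_READER_': 'ppyolov2_reader.yml'}
reader_cfg = {'TrainReader': {'batch_size': 12}, 'EvalReader': {'batch_size': 8}}

merged = {k: v for k, v in model_cfg.items() if k != '_READER_'}
merged.update(reader_cfg)  # reader sections join the model sections
print(sorted(merged))      # ['EvalReader', 'TrainReader', 'architecture']
```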
diff --git a/static/configs/ppyolo/ppyolov2_r50vd_dcn.yml b/static/configs/ppyolo/ppyolov2_r50vd_dcn.yml
new file mode 100644
index 0000000000000000000000000000000000000000..7ceb75833767b60c76528e6fb07786b40dbd6bdd
--- /dev/null
+++ b/static/configs/ppyolo/ppyolov2_r50vd_dcn.yml
@@ -0,0 +1,89 @@
+architecture: YOLOv3
+use_gpu: true
+max_iters: 450000
+log_iter: 100
+save_dir: output
+snapshot_iter: 10000
+metric: COCO
+pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_ssld_pretrained.tar
+weights: output/ppyolov2_r50vd_dcn/model_final
+num_classes: 80
+use_fine_grained_loss: true
+use_ema: true
+ema_decay: 0.9998
+
+YOLOv3:
+ backbone: ResNet
+ yolo_head: YOLOv3PANHead
+ use_fine_grained_loss: true
+
+ResNet:
+ norm_type: sync_bn
+ freeze_at: 0
+ freeze_norm: false
+ norm_decay: 0.
+ depth: 50
+ feature_maps: [3, 4, 5]
+ variant: d
+ dcn_v2_stages: [5]
+
+YOLOv3PANHead:
+ anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
+ anchors: [[10, 13], [16, 30], [33, 23],
+ [30, 61], [62, 45], [59, 119],
+ [116, 90], [156, 198], [373, 326]]
+ norm_decay: 0.
+ iou_aware: true
+ iou_aware_factor: 0.5
+ scale_x_y: 1.05
+ spp: true
+ yolo_loss: YOLOv3Loss
+ nms: MatrixNMS
+ drop_block: true
+
+YOLOv3Loss:
+ ignore_thresh: 0.7
+ scale_x_y: 1.05
+ label_smooth: false
+ use_fine_grained_loss: true
+ iou_loss: IouLoss
+ iou_aware_loss: IouAwareLoss
+
+IouLoss:
+ loss_weight: 2.5
+ max_height: 768
+ max_width: 768
+
+IouAwareLoss:
+ loss_weight: 1.0
+ max_height: 768
+ max_width: 768
+
+MatrixNMS:
+ background_label: -1
+ keep_top_k: 100
+ normalized: false
+ score_threshold: 0.01
+ post_threshold: 0.01
+
+LearningRate:
+ base_lr: 0.005
+ schedulers:
+ - !PiecewiseDecay
+ gamma: 0.1
+ milestones:
+ - 300000
+ - !LinearWarmup
+ start_factor: 0.
+ steps: 4000
+
+OptimizerBuilder:
+ clip_grad_by_norm: 35.
+ optimizer:
+ momentum: 0.9
+ type: Momentum
+ regularizer:
+ factor: 0.0005
+ type: L2
+
+_READER_: 'ppyolov2_reader.yml'
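
`iou_aware: true` adds an IoU prediction branch whose output is blended into the confidence at inference; in PP-YOLO this is conventionally `score = conf**(1 - alpha) * iou**alpha` with `alpha = iou_aware_factor`. A sketch under that assumption:

```python
# Hedged sketch of IoU-aware confidence fusion (alpha = iou_aware_factor).
def iou_aware_score(conf, pred_iou, alpha=0.5):
    return (conf ** (1.0 - alpha)) * (pred_iou ** alpha)

# A well-classified but loosely localized box is down-weighted:
print(round(iou_aware_score(0.9, 0.6), 3))  # 0.735
```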
diff --git a/static/configs/ppyolo/ppyolov2_reader.yml b/static/configs/ppyolo/ppyolov2_reader.yml
new file mode 100644
index 0000000000000000000000000000000000000000..02a385c1898f2f698b0adc852f42acf2712bab2b
--- /dev/null
+++ b/static/configs/ppyolo/ppyolov2_reader.yml
@@ -0,0 +1,111 @@
+TrainReader:
+ inputs_def:
+ fields: ['image', 'gt_bbox', 'gt_class', 'gt_score']
+ num_max_boxes: 100
+ dataset:
+ !COCODataSet
+ image_dir: train2017
+ anno_path: annotations/instances_train2017.json
+ dataset_dir: dataset/coco
+ with_background: false
+ sample_transforms:
+ - !DecodeImage
+ to_rgb: True
+ with_mixup: True
+ - !MixupImage
+ alpha: 1.5
+ beta: 1.5
+ - !ColorDistort {}
+ - !RandomExpand
+ ratio: 2.0
+ fill_value: [123.675, 116.28, 103.53]
+ - !RandomCrop {}
+ - !RandomFlipImage
+ is_normalized: false
+ - !NormalizeBox {}
+ - !PadBox
+ num_max_boxes: 100
+ - !BboxXYXY2XYWH {}
+ batch_transforms:
+ - !RandomShape
+ sizes: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768]
+ random_inter: True
+ - !NormalizeImage
+ mean: [0.485, 0.456, 0.406]
+ std: [0.229, 0.224, 0.225]
+ is_scale: True
+ is_channel_first: false
+ - !Permute
+ to_bgr: false
+ channel_first: True
+ # Gt2YoloTarget is only used when use_fine_grained_loss set as true,
+ # this operator will be deleted automatically if use_fine_grained_loss
+ # is set as false
+ - !Gt2YoloTarget
+ anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
+ anchors: [[10, 13], [16, 30], [33, 23],
+ [30, 61], [62, 45], [59, 119],
+ [116, 90], [156, 198], [373, 326]]
+ downsample_ratios: [32, 16, 8]
+ batch_size: 12
+ shuffle: true
+ mixup_epoch: 25000
+ drop_last: true
+ worker_num: 8
+ bufsize: 4
+ use_process: true
+
+EvalReader:
+ inputs_def:
+ fields: ['image', 'im_size', 'im_id']
+ num_max_boxes: 100
+ dataset:
+ !COCODataSet
+ image_dir: val2017
+ anno_path: annotations/instances_val2017.json
+ dataset_dir: dataset/coco
+ with_background: false
+ sample_transforms:
+ - !DecodeImage
+ to_rgb: True
+ - !ResizeImage
+ target_size: 640
+ interp: 2
+ - !NormalizeImage
+ mean: [0.485, 0.456, 0.406]
+ std: [0.229, 0.224, 0.225]
+ is_scale: True
+ is_channel_first: false
+ - !PadBox
+ num_max_boxes: 50
+ - !Permute
+ to_bgr: false
+ channel_first: True
+ batch_size: 8
+ drop_empty: false
+ worker_num: 8
+ bufsize: 4
+
+TestReader:
+ inputs_def:
+ image_shape: [3, 640, 640]
+ fields: ['image', 'im_size', 'im_id']
+ dataset:
+ !ImageFolder
+ anno_path: annotations/instances_val2017.json
+ with_background: false
+ sample_transforms:
+ - !DecodeImage
+ to_rgb: True
+ - !ResizeImage
+ target_size: 640
+ interp: 2
+ - !NormalizeImage
+ mean: [0.485, 0.456, 0.406]
+ std: [0.229, 0.224, 0.225]
+ is_scale: True
+ is_channel_first: false
+ - !Permute
+ to_bgr: false
+ channel_first: True
+ batch_size: 1
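
The `RandomShape` batch transform above draws one target size per batch from its `sizes` list, so all images in a batch share a shape while shapes vary across batches. A minimal sketch of that selection (not the operator itself):

```python
import random

# Minimal sketch of per-batch size selection as configured in RandomShape.
SIZES = [320, 352, 384, 416, 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768]

def pick_batch_shape(sizes=SIZES):
    return random.choice(sizes)  # one size applied to the whole batch

print(pick_batch_shape())  # e.g. 608
```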
diff --git a/static/deploy/cpp/CMakeLists.txt b/static/deploy/cpp/CMakeLists.txt
index 0517825795dce9ae4c550e174eb642bd2273d6bd..0bc0be9aa949dfb89f726555bac16066127502fb 100644
--- a/static/deploy/cpp/CMakeLists.txt
+++ b/static/deploy/cpp/CMakeLists.txt
@@ -3,10 +3,11 @@ project(PaddleObjectDetector CXX C)
option(WITH_MKL "Compile demo with MKL/OpenBlas support,defaultuseMKL." ON)
option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." ON)
-option(WITH_STATIC_LIB "Compile demo with static/shared library, default use static." ON)
-option(WITH_TENSORRT "Compile demo with TensorRT." OFF)
+option(WITH_TENSORRT "Compile demo with TensorRT." OFF)
+
SET(PADDLE_DIR "" CACHE PATH "Location of libraries")
+SET(PADDLE_LIB_NAME "" CACHE STRING "Name of the Paddle inference library, e.g. libpaddle_inference")
SET(OPENCV_DIR "" CACHE PATH "Location of libraries")
SET(CUDA_LIB "" CACHE PATH "Location of libraries")
SET(CUDNN_LIB "" CACHE PATH "Location of libraries")
@@ -36,6 +37,7 @@ endif()
if (NOT DEFINED PADDLE_DIR OR ${PADDLE_DIR} STREQUAL "")
message(FATAL_ERROR "please set PADDLE_DIR with -DPADDLE_DIR=/path/paddle_influence_dir")
endif()
+message("PADDLE_DIR IS:"${PADDLE_DIR})
if (NOT DEFINED OPENCV_DIR OR ${OPENCV_DIR} STREQUAL "")
message(FATAL_ERROR "please set OPENCV_DIR with -DOPENCV_DIR=/path/opencv")
@@ -70,6 +72,8 @@ link_directories("${PADDLE_DIR}/third_party/install/xxhash/lib")
link_directories("${PADDLE_DIR}/paddle/lib/")
link_directories("${CMAKE_CURRENT_BINARY_DIR}")
+
+
if (WIN32)
include_directories("${PADDLE_DIR}/paddle/fluid/inference")
include_directories("${PADDLE_DIR}/paddle/include")
@@ -89,10 +93,6 @@ if (WIN32)
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /bigobj /MT")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} /bigobj /MTd")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /bigobj /MT")
- if (WITH_STATIC_LIB)
- safe_set_static_flag()
- add_definitions(-DSTATIC_LIB)
- endif()
else()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -o2 -fopenmp -std=c++11")
set(CMAKE_STATIC_LIBRARY_PREFIX "")
@@ -113,8 +113,8 @@ endif()
if (NOT WIN32)
if (WITH_TENSORRT AND WITH_GPU)
- include_directories("${TENSORRT_INC_DIR}")
- link_directories("${TENSORRT_LIB_DIR}")
+ include_directories("${TENSORRT_INC_DIR}/")
+ link_directories("${TENSORRT_LIB_DIR}/")
endif()
endif(NOT WIN32)
@@ -148,31 +148,30 @@ if(WITH_MKL)
endif ()
endif()
else()
- if (WIN32)
- set(MATH_LIB ${PADDLE_DIR}/third_party/install/openblas/lib/openblas${CMAKE_STATIC_LIBRARY_SUFFIX})
- else()
- set(MATH_LIB ${PADDLE_DIR}/third_party/install/openblas/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX})
- endif()
+ set(MATH_LIB ${PADDLE_DIR}/third_party/install/openblas/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
+
if (WIN32)
- if(EXISTS "${PADDLE_DIR}/paddle/fluid/inference/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX}")
+ if(EXISTS "${PADDLE_DIR}/paddle/fluid/inference/${PADDLE_LIB_NAME}${CMAKE_STATIC_LIBRARY_SUFFIX}")
set(DEPS
- ${PADDLE_DIR}/paddle/fluid/inference/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
+ ${PADDLE_DIR}/paddle/fluid/inference/${PADDLE_LIB_NAME}${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(DEPS
- ${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
+ ${PADDLE_DIR}/paddle/lib/${PADDLE_LIB_NAME}${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
endif()
-if(WITH_STATIC_LIB)
- set(DEPS
- ${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
+
+if (WIN32)
+ set(DEPS ${PADDLE_DIR}/paddle/lib/${PADDLE_LIB_NAME}${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
- set(DEPS
- ${PADDLE_DIR}/paddle/lib/libpaddle_fluid${CMAKE_SHARED_LIBRARY_SUFFIX})
+ set(DEPS ${PADDLE_DIR}/paddle/lib/${PADDLE_LIB_NAME}${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
+message("PADDLE_LIB_NAME:" ${PADDLE_LIB_NAME})
+message("DEPS:" $DEPS)
+
if (NOT WIN32)
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
@@ -220,6 +219,7 @@ endif()
set(DEPS ${DEPS} ${OpenCV_LIBS})
add_executable(main src/main.cc src/preprocess_op.cc src/object_detector.cc)
ADD_DEPENDENCIES(main ext-yaml-cpp)
+message("DEPS:" $DEPS)
target_link_libraries(main ${DEPS})
if (WIN32 AND WITH_MKL)
@@ -230,5 +230,12 @@ if (WIN32 AND WITH_MKL)
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/mklml.dll ./release/mklml.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mklml/lib/libiomp5md.dll ./release/libiomp5md.dll
COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/third_party/install/mkldnn/lib/mkldnn.dll ./release/mkldnn.dll
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/paddle/lib/${PADDLE_LIB_NAME}.dll ./release/${PADDLE_LIB_NAME}.dll
+ )
+endif()
+
+if (WIN32)
+ add_custom_command(TARGET main POST_BUILD
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different ${PADDLE_DIR}/paddle/lib/${PADDLE_LIB_NAME}.dll ./release/${PADDLE_LIB_NAME}.dll
)
endif()
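The new `PADDLE_LIB_NAME` cache variable replaces the removed `WITH_STATIC_LIB` switch: instead of toggling static versus shared, the caller names the library and CMake appends the platform suffix. A minimal Python sketch (illustration only, not part of the PR; the suffixes are the usual toolchain defaults) of the file name the build ends up resolving:

```python
# Illustration only: reproduce the name CMake composes from PADDLE_LIB_NAME
# and the platform library suffix.
def inference_lib_filename(lib_name: str, windows: bool, shared: bool) -> str:
    if windows:
        # CMAKE_STATIC_LIBRARY_SUFFIX is ".lib"; the DLL is copied at POST_BUILD
        return lib_name + (".dll" if shared else ".lib")
    # On Linux the build links the shared library by default
    return lib_name + (".so" if shared else ".a")

# build.sh sets PADDLE_LIB_NAME=libpaddle_inference on Linux:
assert inference_lib_filename("libpaddle_inference", False, True) == "libpaddle_inference.so"
# the Windows docs pass -DPADDLE_LIB_NAME=paddle_inference:
assert inference_lib_filename("paddle_inference", True, False) == "paddle_inference.lib"
```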
diff --git a/static/deploy/cpp/docs/Jetson_build.md b/static/deploy/cpp/docs/Jetson_build.md
index 0262ffbd1bf8bea2d62fcac7fda5080c6d472697..8bd0c1efc6453c243252676533ed68b691f1f29e 100644
--- a/static/deploy/cpp/docs/Jetson_build.md
+++ b/static/deploy/cpp/docs/Jetson_build.md
@@ -34,7 +34,7 @@ cat /etc/nv_tegra_release
### Step2: Download the PaddlePaddle C++ inference library fluid_inference

-Extract the downloaded [nv_jetson_cuda10_cudnn7.6_trt6(jetpack4.3)](https://paddle-inference-lib.bj.bcebos.com/2.0.0-nv-jetson-jetpack4.3-all/paddle_inference.tgz).
+Extract the downloaded [nv_jetson_cuda10_cudnn7.6_trt6(jetpack4.3)](https://paddle-inference-lib.bj.bcebos.com/2.0.1-nv-jetson-jetpack4.3-all/paddle_inference.tgz).
```
@@ -74,6 +74,9 @@ TENSORRT_LIB_DIR=/usr/lib/aarch64-linux-gnu
# Path to the Paddle inference library
PADDLE_DIR=/path/to/fluid_inference/
+# Name of the Paddle inference library
+PADDLE_LIB_NAME=paddle_inference
+
# Whether to compile against the static Paddle inference library
# When using TensorRT, the Paddle inference library is usually a dynamic library
WITH_STATIC_LIB=OFF
@@ -101,7 +104,8 @@ cmake .. \
-DWITH_STATIC_LIB=${WITH_STATIC_LIB} \
-DCUDA_LIB=${CUDA_LIB} \
-DCUDNN_LIB=${CUDNN_LIB} \
- -DOPENCV_DIR=${OPENCV_DIR}
+ -DOPENCV_DIR=${OPENCV_DIR} \
+ -DPADDLE_LIB_NAME=${PADDLE_LIB_NAME}
make
```
@@ -151,7 +155,7 @@ CUDNN_LIB=/usr/lib/aarch64-linux-gnu/
| --camera_id | Option | ID of the camera used for prediction, default is -1 (meaning no camera is used) |
| --use_gpu | Whether to use GPU for prediction, supported values are 0 or 1 (default 0) |
| --gpu_id | GPU device id used for inference (default 0) |
-| --run_mode | When using GPU, default is fluid; options are (fluid/trt_fp32/trt_fp16) |
+| --run_mode | When using GPU, default is fluid; options are (fluid/trt_fp32/trt_fp16/trt_int8) |
| --run_benchmark | Whether to run prediction repeatedly for benchmarking |
| --output_dir | Folder for output images, default is output |
diff --git a/static/deploy/cpp/docs/linux_build.md b/static/deploy/cpp/docs/linux_build.md
index ab95dc786ba766e88d3ec069ef79124b506e9900..14e171191ae5637d3efe60c5eee34fb8233c492a 100644
--- a/static/deploy/cpp/docs/linux_build.md
+++ b/static/deploy/cpp/docs/linux_build.md
@@ -1,10 +1,10 @@
# Linux Build Guide
## Notes
-This document has been tested on `Linux` with `GCC 4.8.5` and `GCC 4.9.4`. To build with a higher G++ version, the Paddle inference library must be recompiled; see [Compile the Paddle inference library from source](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/build_and_install_lib_cn.html#zhijiexiazaianzhuang). The prebuilt opencv library used by this document was compiled with gcc4.8 on ubuntu 16.04; to build on a system other than ubuntu 16.04, compile the opencv library yourself.
+This document has been tested on `Linux` with `GCC 8.2`. To build with a different G++ version, the Paddle inference library must be recompiled; see [Compile the Paddle inference library from source](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html). The prebuilt opencv library used by this document was compiled with gcc4.8 on ubuntu 16.04; to build on a system other than ubuntu 16.04, compile the opencv library yourself.
## Prerequisites
-* G++ 4.8.2 ~ 4.9.4
+* G++ 8.2
* CUDA 9.0 / CUDA 10.0, cudnn 7+ (required only when using the GPU version of the inference library)
* CMake 3.0+
@@ -19,7 +19,7 @@
### Step2: Download the PaddlePaddle C++ inference library fluid_inference

-The PaddlePaddle C++ inference library provides prebuilt packages for different `CPU` and `CUDA` versions; download the one matching your environment: [C++ inference library download list](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/build_and_install_lib_cn.html#linux)
+The PaddlePaddle C++ inference library provides prebuilt packages for different `CPU` and `CUDA` versions; download the one matching your environment: [C++ inference library download list](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/build_and_install_lib_cn.html)
@@ -58,9 +58,8 @@ TENSORRT_LIB_DIR=/path/to/TensorRT/lib
# Path to the Paddle inference library
PADDLE_DIR=/path/to/fluid_inference

-# Whether to compile against the static Paddle inference library
-# When using TensorRT, the Paddle inference library is usually a dynamic library
-WITH_STATIC_LIB=OFF
+# Name of the Paddle inference library
+PADDLE_LIB_NAME=paddle_inference
# CUDA 的 lib 路径
CUDA_LIB=/path/to/cuda/lib
@@ -68,10 +67,6 @@ CUDA_LIB=/path/to/cuda/lib
# Path to the CUDNN lib directory
CUDNN_LIB=/path/to/cudnn/lib
-# After setting the main parameters in the script, run the `build` script:
-sh ./scripts/build.sh
-
-
# Please check that the paths above are correct
# No changes are needed below this line
@@ -82,10 +77,10 @@ cmake .. \
-DTENSORRT_LIB_DIR=${TENSORRT_LIB_DIR} \
-DTENSORRT_INC_DIR=${TENSORRT_INC_DIR} \
-DPADDLE_DIR=${PADDLE_DIR} \
- -DWITH_STATIC_LIB=${WITH_STATIC_LIB} \
-DCUDA_LIB=${CUDA_LIB} \
-DCUDNN_LIB=${CUDNN_LIB} \
- -DOPENCV_DIR=${OPENCV_DIR}
+ -DOPENCV_DIR=${OPENCV_DIR} \
+ -DPADDLE_LIB_NAME=${PADDLE_LIB_NAME}
make
```
@@ -94,6 +89,7 @@ make
```shell
sh ./scripts/build.sh
```
+
**Note**: OPENCV depends on OPENBLAS; Ubuntu users should check whether `libopenblas.so` is present on the system. If not, install it with apt-get install libopenblas-dev.

### Step5: Inference and visualization
@@ -106,7 +102,7 @@ make
| --camera_id | Option | ID of the camera used for prediction, default is -1 (meaning no camera is used) |
| --use_gpu | Whether to use GPU for prediction, supported values are 0 or 1 (default 0) |
| --gpu_id | GPU device id used for inference (default 0) |
-| --run_mode | When using GPU, default is fluid; options are (fluid/trt_fp32/trt_fp16) |
+| --run_mode | When using GPU, default is fluid; options are (fluid/trt_fp32/trt_fp16/trt_int8) |
| --run_benchmark | Whether to run prediction repeatedly for benchmarking |
| --output_dir | Folder for output images, default is output |
diff --git a/static/deploy/cpp/docs/windows_vs2019_build.md b/static/deploy/cpp/docs/windows_vs2019_build.md
index 7b8dcff078281d39773d862c48df2f88d645105b..efb2d75c89ad04a1c064233cae929375ade21074 100644
--- a/static/deploy/cpp/docs/windows_vs2019_build.md
+++ b/static/deploy/cpp/docs/windows_vs2019_build.md
@@ -24,7 +24,7 @@ git clone https://github.com/PaddlePaddle/PaddleDetection.git
### Step2: Download the PaddlePaddle C++ inference library fluid_inference

-The PaddlePaddle C++ inference library provides prebuilt packages for different `CPU` and `CUDA` versions; download the one matching your environment: [C++ inference library download list](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/windows_cpp_inference.html#windows)
+The PaddlePaddle C++ inference library provides prebuilt packages for different `CPU` and `CUDA` versions; download the one matching your environment: [C++ inference library download list](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/windows_cpp_inference.html)

After extracting, the `D:\projects\fluid_inference` directory contains:
```
@@ -62,18 +62,23 @@ cd D:\projects\PaddleDetection\deploy\cpp
| *CUDNN_LIB | Path to the CUDNN library |
| OPENCV_DIR | OpenCV installation path |
| PADDLE_DIR | Path to the Paddle inference library |
+| PADDLE_LIB_NAME | Name of the Paddle inference library |
+**Note:** 1. When using the `CPU` version of the inference library, uncheck `WITH_GPU`. 2. When using the `openblas` version, uncheck `WITH_MKL`.
+
+Run the following command to generate the project files:
```
-cmake . -G "Visual Studio 16 2019" -A x64 -T host=x64 -DWITH_GPU=ON -DWITH_MKL=ON -DCMAKE_BUILD_TYPE=Release -DCUDA_LIB=path_to_cuda_lib -DCUDNN_LIB=path_to_cudnn_lib -DPADDLE_DIR=path_to_paddle_lib -DOPENCV_DIR=path_to_opencv
+cmake . -G "Visual Studio 16 2019" -A x64 -T host=x64 -DWITH_GPU=ON -DWITH_MKL=ON -DCMAKE_BUILD_TYPE=Release -DCUDA_LIB=path_to_cuda_lib -DCUDNN_LIB=path_to_cudnn_lib -DPADDLE_DIR=path_to_paddle_lib -DPADDLE_LIB_NAME=paddle_inference -DOPENCV_DIR=path_to_opencv
```
For example:
```
-cmake . -G "Visual Studio 16 2019" -A x64 -T host=x64 -DWITH_GPU=ON -DWITH_MKL=ON -DCMAKE_BUILD_TYPE=Release -DCUDA_LIB=D:\projects\packages\cuda10_0\lib\x64 -DCUDNN_LIB=D:\projects\packages\cuda10_0\lib\x64 -DPADDLE_DIR=D:\projects\packages\fluid_inference -DOPENCV_DIR=D:\projects\packages\opencv3_4_6
+cmake . -G "Visual Studio 16 2019" -A x64 -T host=x64 -DWITH_GPU=ON -DWITH_MKL=ON -DCMAKE_BUILD_TYPE=Release -DCUDA_LIB=D:\projects\packages\cuda10_0\lib\x64 -DCUDNN_LIB=D:\projects\packages\cuda10_0\lib\x64 -DPADDLE_DIR=D:\projects\packages\fluid_inference -DPADDLE_LIB_NAME=paddle_inference -DOPENCV_DIR=D:\projects\packages\opencv3_4_6
```
3. Build

-Open `PaddleObjectDetector.sln` under the `cpp` folder with `Visual Studio 16 2019` and click `Build`->`Build All`
+Open `PaddleObjectDetector.sln` under the `cpp` folder with `Visual Studio 16 2019`, set the build mode to `Release`, and click `Build`->`Build All`
+
### Step5: Inference and visualization
@@ -92,7 +97,7 @@ cd D:\projects\PaddleDetection\deploy\cpp\out\build\x64-Release
| --camera_id | Option | ID of the camera used for prediction, default is -1 (meaning no camera is used) |
| --use_gpu | Whether to use GPU for prediction, supported values are 0 or 1 (default 0) |
| --gpu_id | GPU device id used for inference (default 0) |
-| --run_mode | When using GPU, default is fluid; options are (fluid/trt_fp32/trt_fp16) |
+| --run_mode | When using GPU, default is fluid; options are (fluid/trt_fp32/trt_fp16/trt_int8) |
| --run_benchmark | Whether to run prediction repeatedly for benchmarking |
| --output_dir | Folder for output images, default is output |
diff --git a/static/deploy/cpp/scripts/build.sh b/static/deploy/cpp/scripts/build.sh
index fb6ca625e9bcc1a84dbf6728e809b08e6432d075..ed901d01462779ef1f2ad74c80d27f5d8067ad17 100644
--- a/static/deploy/cpp/scripts/build.sh
+++ b/static/deploy/cpp/scripts/build.sh
@@ -7,21 +7,17 @@ WITH_MKL=ON
# Whether to integrate TensorRT (effective only when WITH_GPU=ON)
WITH_TENSORRT=OFF
-# Whether to use the 2.0rc1 inference library
-USE_PADDLE_20RC1=OFF
+# Name of the paddle inference lib; it differs across platforms and versions, so check the actual lib name under the `paddle_inference/lib/` folder of the downloaded inference library
+PADDLE_LIB_NAME=libpaddle_inference
# TensorRT include path
-TENSORRT_INC_DIR=/path/to/tensorrt/lib
+TENSORRT_INC_DIR=/path/to/tensorrt/include

# TensorRT lib path
-TENSORRT_LIB_DIR=/path/to/tensorrt/include
+TENSORRT_LIB_DIR=/path/to/tensorrt/lib
# Path to the Paddle inference library
-PADDLE_DIR=/path/to/fluid_inference/
-
-# Whether to compile against the static Paddle inference library
-# When using TensorRT, the Paddle inference library is usually a dynamic library
-WITH_STATIC_LIB=OFF
+PADDLE_DIR=/path/to/paddle_inference
# CUDA 的 lib 路径
CUDA_LIB=/path/to/cuda/lib
@@ -39,11 +35,11 @@ then
echo "set OPENCV_DIR for x86_64"
# On linux, download the precompiled opencv with the following commands
mkdir -p $(pwd)/deps && cd $(pwd)/deps
- wget -c https://bj.bcebos.com/paddleseg/deploy/opencv3.4.6gcc4.8ffmpeg.tar.gz2
- tar xvfj opencv3.4.6gcc4.8ffmpeg.tar.gz2 && cd ..
+ wget -c https://paddledet.bj.bcebos.com/data/opencv3.4.6gcc8.2ffmpeg.zip
+ unzip opencv3.4.6gcc8.2ffmpeg.zip && cd ..
# set OPENCV_DIR
- OPENCV_DIR=$(pwd)/deps/opencv3.4.6gcc4.8ffmpeg/
+ OPENCV_DIR=$(pwd)/deps/opencv3.4.6gcc8.2ffmpeg
elif [ "$MACHINE_TYPE" = "aarch64" ]
then
@@ -76,7 +72,8 @@ cmake .. \
-DWITH_STATIC_LIB=${WITH_STATIC_LIB} \
-DCUDA_LIB=${CUDA_LIB} \
-DCUDNN_LIB=${CUDNN_LIB} \
- -DOPENCV_DIR=${OPENCV_DIR}
+ -DOPENCV_DIR=${OPENCV_DIR} \
+ -DPADDLE_LIB_NAME=${PADDLE_LIB_NAME}
make
echo "make finished!"
diff --git a/static/deploy/cpp/src/main.cc b/static/deploy/cpp/src/main.cc
index f2b4b1ad26b923df7f597f8ef88c9a4850bff23f..9c2831600689b6e43783c25782bf6b7a7bea0470 100644
--- a/static/deploy/cpp/src/main.cc
+++ b/static/deploy/cpp/src/main.cc
@@ -198,8 +198,8 @@ int main(int argc, char** argv) {
return -1;
}
if (!(FLAGS_run_mode == "fluid" || FLAGS_run_mode == "trt_fp32"
- || FLAGS_run_mode == "trt_fp16")) {
- std::cout << "run_mode should be 'fluid', 'trt_fp32' or 'trt_fp16'.";
+ || FLAGS_run_mode == "trt_fp16" || FLAGS_run_mode == "trt_int8")) {
+ std::cout << "run_mode should be 'fluid', 'trt_fp32', 'trt_fp16' or 'trt_int8'.";
return -1;
}
diff --git a/static/deploy/cpp/src/object_detector.cc b/static/deploy/cpp/src/object_detector.cc
index 0e5b814eb0f9cec1c557cc5ce666f9b171a8fdcb..50673bdb4e42669152fd3a011ff774f0740ed9bf 100644
--- a/static/deploy/cpp/src/object_detector.cc
+++ b/static/deploy/cpp/src/object_detector.cc
@@ -32,17 +32,17 @@ void ObjectDetector::LoadModel(const std::string& model_dir,
config.SetModel(prog_file, params_file);
if (use_gpu) {
config.EnableUseGpu(100, gpu_id);
+ config.SwitchIrOptim(true);
+ bool use_calib_mode = false;
if (run_mode != "fluid") {
auto precision = paddle::AnalysisConfig::Precision::kFloat32;
if (run_mode == "trt_fp16") {
precision = paddle::AnalysisConfig::Precision::kHalf;
} else if (run_mode == "trt_int8") {
- printf("TensorRT int8 mode is not supported now, "
- "please use 'trt_fp32' or 'trt_fp16' instead");
+ precision = paddle::AnalysisConfig::Precision::kInt8;
+ use_calib_mode = true;
} else {
- if (run_mode != "trt_fp32") {
- printf("run_mode should be 'fluid', 'trt_fp32' or 'trt_fp16'");
- }
+ printf("run_mode should be 'fluid', 'trt_fp32', 'trt_fp16' or 'trt_int8'");
}
config.EnableTensorRtEngine(
1 << 10,
@@ -50,7 +50,7 @@ void ObjectDetector::LoadModel(const std::string& model_dir,
min_subgraph_size,
precision,
false,
- false);
+ use_calib_mode);
}
} else {
config.DisableGpu();
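The C++ predictor now maps `trt_int8` to real INT8 inference with offline calibration instead of rejecting it. A Python rendition of the selection logic above (illustration only; the strings mirror the `paddle::AnalysisConfig::Precision` enum names):

```python
# run_mode -> (TensorRT precision, use_calib_mode), as wired up in
# ObjectDetector::LoadModel after this change.
RUN_MODES = {
    "trt_fp32": ("kFloat32", False),
    "trt_fp16": ("kHalf", False),
    "trt_int8": ("kInt8", True),  # INT8 now runs with offline calibration
}

def select_precision(run_mode: str):
    if run_mode == "fluid":
        return None  # TensorRT disabled, plain GPU inference
    if run_mode not in RUN_MODES:
        raise ValueError(
            "run_mode should be 'fluid', 'trt_fp32', 'trt_fp16' or 'trt_int8'")
    return RUN_MODES[run_mode]

print(select_precision("trt_int8"))  # ('kInt8', True)
```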
diff --git a/static/deploy/python/README.md b/static/deploy/python/README.md
index b8b3b87be13dec4279f90e4ed04bab09842de3a9..928910f6bcb9e7fe445b6592fb733e5fc8cd4b70 100644
--- a/static/deploy/python/README.md
+++ b/static/deploy/python/README.md
@@ -46,7 +46,7 @@ python deploy/python/infer.py --model_dir=/path/to/models --image_file=/path/to/
| --video_file | Option | Video to run inference on |
| --camera_id | Option | ID of the camera used for prediction, default is -1 (meaning no camera is used; can be set to 0 - (number of cameras - 1)); press `q` in the visualization window during prediction to exit and write the results to: output/output.mp4 |
| --use_gpu | No | Whether to use GPU, default is False |
-| --run_mode | No | When using GPU, default is fluid; options are (fluid/trt_fp32/trt_fp16) |
+| --run_mode | No | When using GPU, default is fluid; options are (fluid/trt_fp32/trt_fp16/trt_int8) |
| --threshold | No | Score threshold for predictions, default is 0.5 |
| --output_dir | No | Root directory for saving visualized results, default is output/ |
| --run_benchmark | No | Whether to run a benchmark; --image_file must also be specified |
diff --git a/static/deploy/python/infer.py b/static/deploy/python/infer.py
index ae0ff80e95a970010d65d52c8d6e03d66b30e091..59989a6cb801cb8194acadb44e1b8528838bb952 100644
--- a/static/deploy/python/infer.py
+++ b/static/deploy/python/infer.py
@@ -393,9 +393,7 @@ def load_predictor(model_dir,
raise ValueError(
"Predict by TensorRT mode: {}, expect use_gpu==True, but use_gpu == {}"
.format(run_mode, use_gpu))
- if run_mode == 'trt_int8':
- raise ValueError("TensorRT int8 mode is not supported now, "
- "please use trt_fp32 or trt_fp16 instead.")
+ use_calib_mode = True if run_mode == 'trt_int8' else False
precision_map = {
'trt_int8': fluid.core.AnalysisConfig.Precision.Int8,
'trt_fp32': fluid.core.AnalysisConfig.Precision.Float32,
@@ -419,7 +417,7 @@ def load_predictor(model_dir,
min_subgraph_size=min_subgraph_size,
precision_mode=precision_map[run_mode],
use_static=False,
- use_calib_mode=False)
+ use_calib_mode=use_calib_mode)
# disable print log when predict
config.disable_glog_info()
@@ -574,7 +572,7 @@ if __name__ == '__main__':
"--run_mode",
type=str,
default='fluid',
- help="mode of running(fluid/trt_fp32/trt_fp16)")
+ help="mode of running(fluid/trt_fp32/trt_fp16/trt_int8)")
parser.add_argument(
"--use_gpu",
type=ast.literal_eval,
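Putting the Python-side changes together, a minimal sketch of a `load_predictor`-style helper after this PR. It assumes the static-graph `AnalysisConfig` API that `infer.py` already uses; the `__model__`/`__params__` file names and the 100 MB memory pool are illustrative defaults, not mandated by the PR:

```python
import os

from paddle import fluid

def build_trt_config(model_dir, run_mode="trt_int8", gpu_id=0, min_subgraph_size=3):
    """Sketch: configure GPU + TensorRT inference, calibrating only for INT8."""
    config = fluid.core.AnalysisConfig(
        os.path.join(model_dir, "__model__"),
        os.path.join(model_dir, "__params__"))
    config.enable_use_gpu(100, gpu_id)  # initial GPU memory pool in MB
    precision_map = {
        "trt_int8": fluid.core.AnalysisConfig.Precision.Int8,
        "trt_fp32": fluid.core.AnalysisConfig.Precision.Float32,
        "trt_fp16": fluid.core.AnalysisConfig.Precision.Half,
    }
    config.enable_tensorrt_engine(
        workspace_size=1 << 10,
        max_batch_size=1,
        min_subgraph_size=min_subgraph_size,
        precision_mode=precision_map[run_mode],
        use_static=False,
        use_calib_mode=(run_mode == "trt_int8"))  # calibration only for INT8
    config.disable_glog_info()  # silence logging during prediction
    return config
```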
diff --git a/static/ppdet/data/transform/gridmask_utils.py b/static/ppdet/data/transform/gridmask_utils.py
index a23e69b20860fe90c7a25472e11de770d238dd07..af1f8d56fd75e75271834de0cf10285a93177319 100644
--- a/static/ppdet/data/transform/gridmask_utils.py
+++ b/static/ppdet/data/transform/gridmask_utils.py
@@ -45,7 +45,8 @@ class GridMask(object):
self.prob = self.st_prob * min(1, 1.0 * curr_iter / self.upper_iter)
if np.random.rand() > self.prob:
return x
- _, h, w = x.shape
+ # image is expected in H, W, C (channel-last) format
+ h, w, _ = x.shape
hh = int(1.5 * h)
ww = int(1.5 * w)
d = np.random.randint(2, h)
@@ -73,7 +74,7 @@ class GridMask(object):
if self.mode == 1:
mask = 1 - mask
- mask = np.expand_dims(mask, axis=0)
+ mask = np.expand_dims(mask, axis=-1)
if self.offset:
offset = (2 * (np.random.rand(h, w) - 0.5)).astype(np.float32)
x = (x * mask + offset * (1 - mask)).astype(x.dtype)
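The fix above switches GridMask from CHW to HWC image layout: the mask now gets a trailing channel axis so it broadcasts over `(H, W, C)` arrays. A NumPy sketch of just the broadcasting step:

```python
import numpy as np

def apply_grid_mask_hwc(img: np.ndarray, mask_hw: np.ndarray) -> np.ndarray:
    """Broadcast an (H, W) binary mask over an (H, W, C) image."""
    h, w, _ = img.shape                      # HWC, not CHW
    assert mask_hw.shape == (h, w)
    mask = np.expand_dims(mask_hw, axis=-1)  # (H, W, 1): trailing channel axis
    return (img * mask).astype(img.dtype)

img = np.ones((4, 4, 3), dtype=np.float32)
mask = np.eye(4, dtype=np.float32)           # keep only the diagonal pixels
print(apply_grid_mask_hwc(img, mask)[..., 0])
```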
diff --git a/static/ppdet/data/transform/operators.py b/static/ppdet/data/transform/operators.py
index 4646e2582146d0ebde5d19668a069f4e6907dcd0..78ae281e9c1c652a319eda9bf354ee214934773b 100644
--- a/static/ppdet/data/transform/operators.py
+++ b/static/ppdet/data/transform/operators.py
@@ -626,7 +626,7 @@ class GridMaskOp(BaseOperator):
sample['curr_iter'])
if not batch_input:
samples = samples[0]
- return sample
+ return samples
@register_op
@@ -2100,7 +2100,7 @@ class BboxXYXY2XYWH(BaseOperator):
@register_op
class Lighting(BaseOperator):
"""
- Lighting the imagen by eigenvalues and eigenvectors
+ Lighting the image by eigenvalues and eigenvectors
Args:
eigval (list): eigenvalues
eigvec (list): eigenvectors
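For reference, the PCA lighting jitter that the corrected docstring describes, as a NumPy sketch. The eigenvalues below are the commonly quoted ImageNet RGB values, used here only as illustrative inputs; the operator takes its actual values from the config:

```python
import numpy as np

def pca_lighting(img, eigval, eigvec, alphastd=0.1, rng=np.random):
    """Add alpha-weighted principal components of RGB space to each pixel."""
    alpha = rng.normal(0.0, alphastd, size=(3,))
    rgb_shift = np.asarray(eigvec) @ (np.asarray(eigval) * alpha)  # shape (3,)
    return img + rgb_shift  # broadcasts over an (H, W, 3) image

img = np.zeros((2, 2, 3), dtype=np.float32)
out = pca_lighting(img, eigval=[0.2175, 0.0188, 0.0045], eigvec=np.eye(3))
print(out.shape)  # (2, 2, 3)
```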
diff --git a/static/ppdet/modeling/anchor_heads/ttf_head.py b/static/ppdet/modeling/anchor_heads/ttf_head.py
index ba9ec802e80cc37afd4bce8c157b7840eb3ec289..31add344d3a1b0dcac694de08c3b802bfcc1c11f 100644
--- a/static/ppdet/modeling/anchor_heads/ttf_head.py
+++ b/static/ppdet/modeling/anchor_heads/ttf_head.py
@@ -24,10 +24,10 @@ from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.initializer import Normal, Constant, Uniform, Xavier
from paddle.fluid.regularizer import L2Decay
from ppdet.core.workspace import register
-from ppdet.modeling.ops import DeformConv, DropBlock
+from ppdet.modeling.ops import DeformConv, DropBlock, ConvNorm
from ppdet.modeling.losses import GiouLoss
-__all__ = ['TTFHead']
+__all__ = ['TTFHead', 'TTFLiteHead']
@register
@@ -65,6 +65,8 @@ class TTFHead(object):
drop_block(bool): whether use dropblock. False by default.
block_size(int): block_size parameter for drop_block. 3 by default.
keep_prob(float): keep_prob parameter for drop_block. 0.9 by default.
+ fusion_method (string): method used to fuse the upsample and lateral branches;
+ either 'add' or 'concat'. 'add' by default.
"""
__inject__ = ['wh_loss']
@@ -90,7 +92,8 @@ class TTFHead(object):
dcn_head=False,
drop_block=False,
block_size=3,
- keep_prob=0.9):
+ keep_prob=0.9,
+ fusion_method='add'):
super(TTFHead, self).__init__()
self.head_conv = head_conv
self.num_classes = num_classes
@@ -115,6 +118,7 @@ class TTFHead(object):
self.drop_block = drop_block
self.block_size = block_size
self.keep_prob = keep_prob
+ self.fusion_method = fusion_method
def shortcut(self, x, out_c, layer_num, kernel_size=3, padding=1,
name=None):
@@ -255,7 +259,14 @@ class TTFHead(object):
out_c,
self.shortcut_num[i],
name=name + '.shortcut_layers.' + str(i))
- feat = fluid.layers.elementwise_add(feat, shortcut)
+ if self.fusion_method == 'add':
+ feat = fluid.layers.elementwise_add(feat, shortcut)
+ elif self.fusion_method == 'concat':
+ feat = fluid.layers.concat([feat, shortcut], axis=1)
+ else:
+ raise ValueError(
+ "Illegal fusion method, expected 'add' or 'concat', but received {}".
+ format(self.fusion_method))
hm = self.hm_head(feat, name=name + '.hm', is_test=is_test)
wh = self.wh_head(feat, name=name + '.wh') * self.wh_offset_base
@@ -273,12 +284,13 @@ class TTFHead(object):
# batch size is 1
scores_r = fluid.layers.reshape(scores, [cat, -1])
topk_scores, topk_inds = fluid.layers.topk(scores_r, k)
- topk_ys = topk_inds / width
+ topk_ys = topk_inds // width
topk_xs = topk_inds % width
topk_score_r = fluid.layers.reshape(topk_scores, [-1])
topk_score, topk_ind = fluid.layers.topk(topk_score_r, k)
- topk_clses = fluid.layers.cast(topk_ind / k, 'float32')
+ k_t = fluid.layers.assign(np.array([k], dtype='int64'))
+ topk_clses = fluid.layers.cast(topk_ind / k_t, 'float32')
topk_inds = fluid.layers.reshape(topk_inds, [-1])
topk_ys = fluid.layers.reshape(topk_ys, [-1, 1])
@@ -384,3 +396,172 @@ class TTFHead(object):
ttf_loss = {'hm_loss': hm_loss, 'wh_loss': wh_loss}
return ttf_loss
+
+
+@register
+class TTFLiteHead(TTFHead):
+ """
+ TTFLiteHead
+
+ Lite version for TTFNet
+ Args:
+ head_conv(int): the default channel number of convolution in head.
+ 32 by default.
+ num_classes(int): the number of classes, 80 by default.
+ planes(tuple): the channel number of convolution in each upsample.
+ (96, 48, 24) by default.
+ wh_conv(int): the channel number of convolution in wh head.
+ 24 by default.
+ wh_loss(object): `GiouLoss` instance.
+ shortcut_num(tuple): the number of convolution layers in each shortcut.
+ (1, 2, 2) by default.
+ fusion_method (string): method used to fuse the upsample and lateral branches;
+ either 'add' or 'concat'. 'add' by default.
+ """
+ __inject__ = ['wh_loss']
+ __shared__ = ['num_classes']
+
+ def __init__(self,
+ head_conv=32,
+ num_classes=80,
+ planes=(96, 48, 24),
+ wh_conv=24,
+ wh_loss='GiouLoss',
+ shortcut_num=(1, 2, 2),
+ fusion_method='concat'):
+ super(TTFLiteHead, self).__init__(
+ head_conv=head_conv,
+ num_classes=num_classes,
+ planes=planes,
+ wh_conv=wh_conv,
+ wh_loss=wh_loss,
+ shortcut_num=shortcut_num,
+ fusion_method=fusion_method)
+
+ def _lite_conv(self, x, out_c, act=None, name=None):
+ conv1 = ConvNorm(
+ input=x,
+ num_filters=x.shape[1],
+ filter_size=5,
+ groups=x.shape[1],
+ norm_type='bn',
+ act='relu6',
+ initializer=Xavier(),
+ name=name + '.depthwise',
+ norm_name=name + '.depthwise.bn')
+
+ conv2 = ConvNorm(
+ input=conv1,
+ num_filters=out_c,
+ filter_size=1,
+ norm_type='bn',
+ act=act,
+ initializer=Xavier(),
+ name=name + '.pointwise_linear',
+ norm_name=name + '.pointwise_linear.bn')
+
+ conv3 = ConvNorm(
+ input=conv2,
+ num_filters=out_c,
+ filter_size=1,
+ norm_type='bn',
+ act='relu6',
+ initializer=Xavier(),
+ name=name + '.pointwise',
+ norm_name=name + '.pointwise.bn')
+
+ conv4 = ConvNorm(
+ input=conv3,
+ num_filters=out_c,
+ filter_size=5,
+ groups=out_c,
+ norm_type='bn',
+ act=act,
+ initializer=Xavier(),
+ name=name + '.depthwise_linear',
+ norm_name=name + '.depthwise_linear.bn')
+
+ return conv4
+
+ def shortcut(self, x, out_c, layer_num, name=None):
+ assert layer_num > 0
+ for i in range(layer_num):
+ param_name = name + '.layers.' + str(i * 2)
+ act = 'relu6' if i < layer_num - 1 else None
+ x = self._lite_conv(x, out_c, act, param_name)
+ return x
+
+ def _deconv_upsample(self, x, out_c, name=None):
+ conv1 = ConvNorm(
+ input=x,
+ num_filters=out_c,
+ filter_size=1,
+ norm_type='bn',
+ act='relu6',
+ name=name + '.pointwise',
+ initializer=Xavier(),
+ norm_name=name + '.pointwise.bn')
+ conv2 = fluid.layers.conv2d_transpose(
+ input=conv1,
+ num_filters=out_c,
+ filter_size=4,
+ padding=1,
+ stride=2,
+ groups=out_c,
+ param_attr=ParamAttr(
+ name=name + '.deconv.weights', initializer=Xavier()),
+ bias_attr=False)
+ bn = fluid.layers.batch_norm(
+ input=conv2,
+ act='relu6',
+ param_attr=ParamAttr(
+ name=name + '.deconv.bn.scale', regularizer=L2Decay(0.)),
+ bias_attr=ParamAttr(
+ name=name + '.deconv.bn.offset', regularizer=L2Decay(0.)),
+ moving_mean_name=name + '.deconv.bn.mean',
+ moving_variance_name=name + '.deconv.bn.variance')
+ conv3 = ConvNorm(
+ input=bn,
+ num_filters=out_c,
+ filter_size=1,
+ norm_type='bn',
+ act='relu6',
+ name=name + '.normal',
+ initializer=Xavier(),
+ norm_name=name + '.normal.bn')
+ return conv3
+
+ def _interp_upsample(self, x, out_c, name=None):
+ conv = self._lite_conv(x, out_c, 'relu6', name)
+ up = fluid.layers.resize_bilinear(conv, scale=2)
+ return up
+
+ def upsample(self, x, out_c, name=None):
+ deconv_up = self._deconv_upsample(x, out_c, name=name + '.dilation_up')
+ interp_up = self._interp_upsample(x, out_c, name=name + '.interp_up')
+ return deconv_up + interp_up
+
+ def _head(self,
+ x,
+ out_c,
+ conv_num=1,
+ head_out_c=None,
+ name=None,
+ is_test=False):
+ head_out_c = self.head_conv if not head_out_c else head_out_c
+ for i in range(conv_num):
+ conv_name = '{}.{}.conv'.format(name, i)
+ x = self._lite_conv(x, head_out_c, 'relu6', conv_name)
+ bias_init = float(-np.log((1 - 0.01) / 0.01)) if '.hm' in name else 0.
+ conv_b_init = Constant(bias_init)
+ x = fluid.layers.conv2d(
+ x,
+ out_c,
+ 1,
+ param_attr=ParamAttr(name='{}.{}.weight'.format(name, conv_num)),
+ bias_attr=ParamAttr(
+ learning_rate=2.,
+ regularizer=L2Decay(0.),
+ name='{}.{}.bias'.format(name, conv_num),
+ initializer=conv_b_init))
+ return x
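The decode fix earlier in this file replaces `/` with `//` so that splitting flat top-k indices into rows and columns stays in integer arithmetic; under true-division semantics a plain `/` would silently produce fractional coordinates. The same computation in NumPy:

```python
import numpy as np

def decode_topk(scores_chw: np.ndarray, k: int):
    """Split per-class flat top-k indices into integer (y, x) coordinates."""
    cat, height, width = scores_chw.shape
    flat = scores_chw.reshape(cat, -1)
    inds = np.argsort(flat, axis=1)[:, ::-1][:, :k]  # per-class top-k indices
    ys = inds // width   # row index; `inds / width` would yield floats
    xs = inds % width    # column index
    return ys, xs

scores = np.random.rand(3, 5, 7)
ys, xs = decode_topk(scores, k=4)
assert ys.dtype.kind == "i" and xs.max() < 7
```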
diff --git a/static/ppdet/modeling/anchor_heads/yolo_head.py b/static/ppdet/modeling/anchor_heads/yolo_head.py
index 645b1e835bf1bc08515d36ff4aa079819b077506..49b211ff6327d1d63ad31b13dcbb7bacdd239b9f 100644
--- a/static/ppdet/modeling/anchor_heads/yolo_head.py
+++ b/static/ppdet/modeling/anchor_heads/yolo_head.py
@@ -163,6 +163,7 @@ class YOLOv3Head(object):
filter_size,
stride,
padding,
+ groups=None,
act='leaky',
name=None):
conv = fluid.layers.conv2d(
@@ -171,6 +172,7 @@ class YOLOv3Head(object):
filter_size=filter_size,
stride=stride,
padding=padding,
+ groups=groups,
act=None,
param_attr=ParamAttr(name=name + ".conv.weights"),
bias_attr=False)
@@ -190,6 +192,8 @@ class YOLOv3Head(object):
if act == 'leaky':
out = fluid.layers.leaky_relu(x=out, alpha=0.1)
+ elif act == 'mish':
+ out = fluid.layers.mish(out)
return out
def _spp_module(self, input, name=""):
@@ -649,3 +653,416 @@ class YOLOv4Head(YOLOv3Head):
outputs.append(block_out)
return outputs
+
+
+@register
+class PPYOLOTinyHead(YOLOv3Head):
+ """
+ Head block for PP-YOLO tiny network
+ Args:
+ norm_decay (float): weight decay for normalization layer weights
+ num_classes (int): number of output classes
+ anchors (list): anchors
+ anchor_masks (list): anchor masks
+ nms (object): an instance of `MultiClassNMS`
+ detection_block_channels (list): the channel number of each
+ detection block.
+ """
+ __inject__ = ['yolo_loss', 'nms']
+ __shared__ = ['num_classes', 'weight_prefix_name']
+
+ def __init__(self,
+ norm_decay=0.,
+ num_classes=80,
+ anchors=[[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
+ [59, 119], [116, 90], [156, 198], [373, 326]],
+ anchor_masks=[[6, 7, 8], [3, 4, 5], [0, 1, 2]],
+ detection_block_channels=[128, 96],
+ drop_block=False,
+ block_size=3,
+ keep_prob=0.9,
+ yolo_loss="YOLOv3Loss",
+ spp=False,
+ nms=MultiClassNMS(
+ score_threshold=0.01,
+ nms_top_k=1000,
+ keep_top_k=100,
+ nms_threshold=0.45,
+ background_label=-1).__dict__,
+ weight_prefix_name='',
+ downsample=[32, 16, 8],
+ scale_x_y=1.0,
+ clip_bbox=True):
+ super(PPYOLOTinyHead, self).__init__(
+ norm_decay=norm_decay,
+ num_classes=num_classes,
+ anchors=anchors,
+ anchor_masks=anchor_masks,
+ drop_block=drop_block,
+ block_size=block_size,
+ keep_prob=keep_prob,
+ spp=spp,
+ yolo_loss=yolo_loss,
+ nms=nms,
+ weight_prefix_name=weight_prefix_name,
+ downsample=downsample,
+ scale_x_y=scale_x_y,
+ clip_bbox=clip_bbox)
+ self.detection_block_channels = detection_block_channels
+
+ def _detection_block(self,
+ input,
+ channel,
+ is_first=False,
+ is_test=True,
+ name=None):
+ assert channel % 2 == 0, \
+ "channel {} cannot be divided by 2 in detection block {}" \
+ .format(channel, name)
+
+ conv = input
+ if self.use_spp and is_first:
+ c = conv.shape[1]
+ conv = self._spp_module(conv, name="spp")
+ conv = self._conv_bn(
+ conv,
+ c,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ name='{}.spp.conv'.format(name))
+
+ if self.drop_block:
+ conv = DropBlock(
+ conv,
+ block_size=self.block_size,
+ keep_prob=self.keep_prob,
+ is_test=is_test)
+
+ conv = self._conv_bn(
+ conv,
+ ch_out=channel,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ groups=1,
+ name='{}.0'.format(name))
+ conv = self._conv_bn(
+ conv,
+ channel,
+ filter_size=5,
+ stride=1,
+ padding=2,
+ groups=channel,
+ name='{}.1'.format(name))
+ conv = self._conv_bn(
+ conv,
+ channel,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ name='{}.2'.format(name))
+ route = self._conv_bn(
+ conv,
+ channel,
+ filter_size=5,
+ stride=1,
+ padding=2,
+ groups=channel,
+ name='{}.route'.format(name))
+ tip = self._conv_bn(
+ route,
+ channel,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ name='{}.tip'.format(name))
+ return route, tip
+
+ def _get_outputs(self, input, is_train=True):
+ """
+ Get PP-YOLO tiny head output
+ Args:
+ input (list): List of Variables, output of backbone stages
+ is_train (bool): whether in train or test mode
+ Returns:
+ outputs (list): Variables of each output layer
+ """
+
+ outputs = []
+
+ # get last out_layer_num blocks in reverse order
+ out_layer_num = len(self.anchor_masks)
+ blocks = input[-1:-out_layer_num - 1:-1]
+
+ route = None
+ for i, block in enumerate(blocks):
+ if i > 0: # concat route from the previous stage for all but the first block
+ block = fluid.layers.concat(input=[route, block], axis=1)
+ route, tip = self._detection_block(
+ block,
+ channel=self.detection_block_channels[i],
+ is_first=i == 0,
+ is_test=(not is_train),
+ name=self.prefix_name + "yolo_block.{}".format(i))
+
+ # out channel number = mask_num * (5 + class_num)
+ num_filters = len(self.anchor_masks[i]) * (self.num_classes + 5)
+ with fluid.name_scope('yolo_output'):
+ block_out = fluid.layers.conv2d(
+ input=tip,
+ num_filters=num_filters,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ act=None,
+ param_attr=ParamAttr(
+ name=self.prefix_name +
+ "yolo_output.{}.conv.weights".format(i)),
+ bias_attr=ParamAttr(
+ regularizer=L2Decay(0.),
+ name=self.prefix_name +
+ "yolo_output.{}.conv.bias".format(i)))
+ outputs.append(block_out)
+
+ if i < len(blocks) - 1:
+ # upsample
+ route = self._conv_bn(
+ input=route,
+ ch_out=self.detection_block_channels[i],
+ filter_size=1,
+ stride=1,
+ padding=0,
+ name=self.prefix_name + "yolo_transition.{}".format(i))
+ route = self._upsample(route)
+
+ return outputs
+
+
+@register
+class YOLOv3PANHead(YOLOv3Head):
+ """
+ Head block for the YOLOv3 PAN network
+
+ Args:
+ conv_block_num (int): number of conv block in each detection block
+ norm_decay (float): weight decay for normalization layer weights
+ num_classes (int): number of output classes
+ anchors (list): anchors
+ anchor_masks (list): anchor masks
+ nms (object): an instance of `MultiClassNMS`
+ """
+ __inject__ = ['yolo_loss', 'nms']
+ __shared__ = ['num_classes', 'weight_prefix_name']
+
+ def __init__(self,
+ conv_block_num=3,
+ norm_decay=0.,
+ num_classes=80,
+ anchors=[[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
+ [59, 119], [116, 90], [156, 198], [373, 326]],
+ anchor_masks=[[6, 7, 8], [3, 4, 5], [0, 1, 2]],
+ drop_block=False,
+ iou_aware=False,
+ iou_aware_factor=0.4,
+ block_size=3,
+ keep_prob=0.9,
+ yolo_loss="YOLOv3Loss",
+ spp=False,
+ nms=MultiClassNMS(
+ score_threshold=0.01,
+ nms_top_k=1000,
+ keep_top_k=100,
+ nms_threshold=0.45,
+ background_label=-1).__dict__,
+ weight_prefix_name='',
+ downsample=[32, 16, 8],
+ scale_x_y=1.0,
+ clip_bbox=True,
+ act='mish'):
+ super(YOLOv3PANHead, self).__init__(
+ conv_block_num=conv_block_num,
+ norm_decay=norm_decay,
+ num_classes=num_classes,
+ anchors=anchors,
+ anchor_masks=anchor_masks,
+ drop_block=drop_block,
+ iou_aware=iou_aware,
+ iou_aware_factor=iou_aware_factor,
+ block_size=block_size,
+ keep_prob=keep_prob,
+ yolo_loss=yolo_loss,
+ spp=spp,
+ nms=nms,
+ weight_prefix_name=weight_prefix_name,
+ downsample=downsample,
+ scale_x_y=scale_x_y,
+ clip_bbox=clip_bbox)
+ self.act = act
+
+ def _detection_block(self,
+ input,
+ channel,
+ conv_block_num=2,
+ is_first=False,
+ is_test=True,
+ name=None):
+ conv_left = self._conv_bn(
+ input,
+ channel,
+ act=self.act,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ name='{}.left'.format(name))
+ conv_right = self._conv_bn(
+ input,
+ channel,
+ act=self.act,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ name='{}.right'.format(name))
+ for j in range(conv_block_num):
+ conv_left = self._conv_bn(
+ conv_left,
+ channel,
+ act=self.act,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ name='{}.left.{}'.format(name, 2 * j))
+ if self.use_spp and is_first and j == 1:
+ c = conv_left.shape[1]
+ conv_left = self._spp_module(conv_left, name="spp")
+ conv_left = self._conv_bn(
+ conv_left,
+ c,
+ act=self.act,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ name='{}.left.{}'.format(name, 2 * j + 1))
+ else:
+ conv_left = self._conv_bn(
+ conv_left,
+ channel,
+ act=self.act,
+ filter_size=3,
+ stride=1,
+ padding=1,
+ name='{}.left.{}'.format(name, 2 * j + 1))
+ if self.drop_block and j == 1:
+ conv_left = DropBlock(
+ conv_left,
+ block_size=self.block_size,
+ keep_prob=self.keep_prob,
+ is_test=is_test)
+
+ conv = fluid.layers.concat(input=[conv_left, conv_right], axis=1)
+ conv = self._conv_bn(
+ conv,
+ channel * 2,
+ act=self.act,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ name=name)
+ return conv, conv
+
+ def _get_outputs(self, input, is_train=True):
+ """
+ Get YOLOv3 head output
+
+ Args:
+ input (list): List of Variables, output of backbone stages
+ is_train (bool): whether in train or test mode
+
+ Returns:
+ outputs (list): Variables of each output layer
+ """
+
+ # get last out_layer_num blocks in reverse order
+ out_layer_num = len(self.anchor_masks)
+ blocks = input[-1:-out_layer_num - 1:-1]
+
+ # fpn
+ yolo_feats = []
+ route = None
+ for i, block in enumerate(blocks):
+ if i > 0: # concat route from the previous stage for all but the first block
+ block = fluid.layers.concat(input=[route, block], axis=1)
+ route, tip = self._detection_block(
+ block,
+ channel=512 // (2**i),
+ is_first=i == 0,
+ is_test=(not is_train),
+ conv_block_num=self.conv_block_num,
+ name=self.prefix_name + "fpn.{}".format(i))
+
+ yolo_feats.append(tip)
+
+ if i < len(blocks) - 1:
+ # do not perform upsample in the last detection_block
+ route = self._conv_bn(
+ input=route,
+ ch_out=512 // (2**i),
+ filter_size=1,
+ stride=1,
+ padding=0,
+ act=self.act,
+ name=self.prefix_name + "fpn_transition.{}".format(i))
+ # upsample
+ route = self._upsample(route)
+
+ # pan
+ pan_feats = [yolo_feats[-1]]
+ route = yolo_feats[out_layer_num - 1]
+ for i in reversed(range(out_layer_num - 1)):
+ channel = 512 // (2**i)
+ route = self._conv_bn(
+ input=route,
+ ch_out=channel,
+ filter_size=3,
+ stride=2,
+ padding=1,
+ act=self.act,
+ name=self.prefix_name + "pan_transition.{}".format(i))
+ block = yolo_feats[i]
+ block = fluid.layers.concat(input=[route, block], axis=1)
+
+ route, tip = self._detection_block(
+ block,
+ channel=channel,
+ is_first=False,
+ is_test=(not is_train),
+ conv_block_num=self.conv_block_num,
+ name=self.prefix_name + "pan.{}".format(i))
+
+ pan_feats.append(tip)
+
+ pan_feats = pan_feats[::-1]
+ outputs = []
+ for i, block in enumerate(pan_feats):
+ if self.iou_aware:
+ num_filters = len(self.anchor_masks[i]) * (self.num_classes + 6)
+ else:
+ num_filters = len(self.anchor_masks[i]) * (self.num_classes + 5)
+ with fluid.name_scope('yolo_output'):
+ block_out = fluid.layers.conv2d(
+ input=block,
+ num_filters=num_filters,
+ filter_size=1,
+ stride=1,
+ padding=0,
+ act=None,
+ param_attr=ParamAttr(
+ name=self.prefix_name +
+ "yolo_output.{}.conv.weights".format(i)),
+ bias_attr=ParamAttr(
+ regularizer=L2Decay(0.),
+ name=self.prefix_name +
+ "yolo_output.{}.conv.bias".format(i)))
+ outputs.append(block_out)
+
+ return outputs
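`_conv_bn` can now apply `mish`, which the new `YOLOv3PANHead` uses throughout. For reference, mish(x) = x * tanh(softplus(x)); a numerically stable NumPy sketch:

```python
import numpy as np

def mish(x: np.ndarray) -> np.ndarray:
    # stable softplus: log(1 + e^x) = max(x, 0) + log1p(exp(-|x|))
    softplus = np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))
    return x * np.tanh(softplus)

x = np.linspace(-4, 4, 9)
print(np.round(mish(x), 3))  # smooth and slightly non-monotonic below zero
```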
diff --git a/static/ppdet/modeling/losses/iou_aware_loss.py b/static/ppdet/modeling/losses/iou_aware_loss.py
index c68c7a7076412a8005b46aad0641151fe3ccd298..d0aeb9df38579475158ca871d2f4efa7b0ffec6a 100644
--- a/static/ppdet/modeling/losses/iou_aware_loss.py
+++ b/static/ppdet/modeling/losses/iou_aware_loss.py
@@ -74,6 +74,7 @@ class IouAwareLoss(IouLoss):
iouk = self._iou(pred, gt, ioup, eps)
iouk.stop_gradient = True
- loss_iou_aware = fluid.layers.cross_entropy(ioup, iouk, soft_label=True)
+ loss_iou_aware = fluid.layers.sigmoid_cross_entropy_with_logits(ioup,
+ iouk)
loss_iou_aware = loss_iou_aware * self._loss_weight
return loss_iou_aware
diff --git a/static/ppdet/modeling/losses/yolo_loss.py b/static/ppdet/modeling/losses/yolo_loss.py
index c16c6cb11b1c6f0d5925fd68955c94b4f9e07dbf..553e633224b67ed15ed772be09acf35901fb7713 100644
--- a/static/ppdet/modeling/losses/yolo_loss.py
+++ b/static/ppdet/modeling/losses/yolo_loss.py
@@ -238,7 +238,6 @@ class YOLOv3Loss(object):
along channel dimension
"""
ioup = fluid.layers.slice(output, axes=[1], starts=[0], ends=[an_num])
- ioup = fluid.layers.sigmoid(ioup)
oriout = fluid.layers.slice(
output,
axes=[1],
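These two loss changes belong together: `sigmoid_cross_entropy_with_logits` applies the sigmoid internally, so the explicit `fluid.layers.sigmoid` on `ioup` had to be removed, otherwise the activation would be applied twice. A NumPy check of the equivalence between the logits form and sigmoid-then-cross-entropy:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_with_logits(logits, labels):
    # max(z, 0) - z * y + log(1 + exp(-|z|)): the numerically stable form
    return (np.maximum(logits, 0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))

def bce(prob, labels, eps=1e-12):
    return -(labels * np.log(prob + eps) + (1 - labels) * np.log(1 - prob + eps))

z = np.array([-2.0, 0.5, 3.0])
y = np.array([0.0, 1.0, 1.0])
assert np.allclose(bce_with_logits(z, y), bce(sigmoid(z), y), atol=1e-6)
```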
diff --git a/static/ppdet/utils/download.py b/static/ppdet/utils/download.py
index 6e4cb4019a0a4252e6a11c7f15c16b2bf5d7f562..2c53406e867da4a308642d012bc469b7231a92aa 100644
--- a/static/ppdet/utils/download.py
+++ b/static/ppdet/utils/download.py
@@ -37,7 +37,7 @@ __all__ = [
'create_voc_list'
]
-WEIGHTS_HOME = osp.expanduser("~/.cache/paddle/weights")
+WEIGHTS_HOME = osp.expanduser("~/.cache/paddle/weights/static")
DATASET_HOME = osp.expanduser("~/.cache/paddle/dataset")
# dict of {dataset_name: (download_info, sub_dirs)}
diff --git a/static/ppdet/utils/export_utils.py b/static/ppdet/utils/export_utils.py
index 1904e7cfd9ba7497c2aa86139f2cdbc599c70799..3579ddb495f8e6055a512f9d8fe32897ca7abe22 100644
--- a/static/ppdet/utils/export_utils.py
+++ b/static/ppdet/utils/export_utils.py
@@ -37,7 +37,7 @@ TRT_MIN_SUBGRAPH = {
'EfficientDet': 40,
'Face': 3,
'TTFNet': 3,
- 'FCOS': 3,
+ 'FCOS': 33,
'SOLOv2': 60,
}
RESIZE_SCALE_SET = {
diff --git a/static/slim/distillation/distill.py b/static/slim/distillation/distill.py
index 5e186fffaa8d9633f757dc35cbe5bea9886a322a..d19ef2eb446a7059ae61896aff0c54754df81c15 100644
--- a/static/slim/distillation/distill.py
+++ b/static/slim/distillation/distill.py
@@ -22,24 +22,38 @@ parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 3)))
if parent_path not in sys.path:
sys.path.append(parent_path)
+import logging
+FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
+logging.basicConfig(level=logging.INFO, format=FORMAT)
+logger = logging.getLogger(__name__)
+
import numpy as np
from collections import OrderedDict
from paddleslim.dist.single_distiller import merge, l2_loss
import paddle
from paddle import fluid
-from ppdet.core.workspace import load_config, merge_config, create
-from ppdet.data.reader import create_reader
-from ppdet.utils.eval_utils import parse_fetches, eval_results, eval_run
-from ppdet.utils.stats import TrainingStats
-from ppdet.utils.cli import ArgsParser
-from ppdet.utils.check import check_gpu, check_version, check_config, enable_static_mode
-import ppdet.utils.checkpoint as checkpoint
-import logging
-FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
-logging.basicConfig(level=logging.INFO, format=FORMAT)
-logger = logging.getLogger(__name__)
+try:
+ from ppdet.core.workspace import load_config, merge_config, create
+ from ppdet.data.reader import create_reader
+ from ppdet.utils.eval_utils import parse_fetches, eval_results, eval_run
+ from ppdet.utils.stats import TrainingStats
+ from ppdet.utils.cli import ArgsParser
+ from ppdet.utils.check import check_gpu, check_version, check_config, enable_static_mode
+ import ppdet.utils.checkpoint as checkpoint
+except ImportError as e:
+ if sys.argv[0].find('static') >= 0:
+ logger.error("Importing ppdet failed when running static model "
+ "with error: {}\n"
+ "please try:\n"
+ "\t1. run static model under PaddleDetection/static "
+ "directory\n"
+ "\t2. run 'pip uninstall ppdet' to uninstall ppdet "
+ "dynamic version firstly.".format(e))
+ sys.exit(-1)
+ else:
+ raise e
def l2_distill(pairs, weight):
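The same guarded-import block is stamped into every static entry point below. Its skeleton, reduced to a single import for readability (`load_config` stands in for the full import list):

```python
import logging
import sys

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s-%(levelname)s: %(message)s')
logger = logging.getLogger(__name__)

try:
    from ppdet.core.workspace import load_config  # static-graph ppdet
except ImportError as e:
    if sys.argv[0].find('static') >= 0:
        # A dygraph `ppdet` installed via pip shadows the local static copy,
        # so fail with an actionable message instead of a bare traceback.
        logger.error("Importing ppdet failed when running static model "
                     "with error: {}".format(e))
        sys.exit(-1)
    else:
        raise e
```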
diff --git a/static/slim/extensions/distill_pruned_model/distill_pruned_model.py b/static/slim/extensions/distill_pruned_model/distill_pruned_model.py
index 19516ce93d97d25a6246729a1c1263a5129d8729..9824024b4023a20573dc82828b55196d8343e9a8 100644
--- a/static/slim/extensions/distill_pruned_model/distill_pruned_model.py
+++ b/static/slim/extensions/distill_pruned_model/distill_pruned_model.py
@@ -23,6 +23,11 @@ parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 4)))
if parent_path not in sys.path:
sys.path.append(parent_path)
+import logging
+FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
+logging.basicConfig(level=logging.INFO, format=FORMAT)
+logger = logging.getLogger(__name__)
+
import numpy as np
from collections import OrderedDict
from paddleslim.dist.single_distiller import merge, l2_loss
@@ -31,18 +36,27 @@ from paddleslim.analysis import flops
import paddle
from paddle import fluid
-from ppdet.core.workspace import load_config, merge_config, create
-from ppdet.data.reader import create_reader
-from ppdet.utils.eval_utils import parse_fetches, eval_results, eval_run
-from ppdet.utils.stats import TrainingStats
-from ppdet.utils.cli import ArgsParser
-from ppdet.utils.check import check_gpu, check_version, check_config, enable_static_mode
-import ppdet.utils.checkpoint as checkpoint
-import logging
-FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
-logging.basicConfig(level=logging.INFO, format=FORMAT)
-logger = logging.getLogger(__name__)
+try:
+ from ppdet.core.workspace import load_config, merge_config, create
+ from ppdet.data.reader import create_reader
+ from ppdet.utils.eval_utils import parse_fetches, eval_results, eval_run
+ from ppdet.utils.stats import TrainingStats
+ from ppdet.utils.cli import ArgsParser
+ from ppdet.utils.check import check_gpu, check_version, check_config, enable_static_mode
+ import ppdet.utils.checkpoint as checkpoint
+except ImportError as e:
+ if sys.argv[0].find('static') >= 0:
+ logger.error("Importing ppdet failed when running static model "
+ "with error: {}\n"
+ "please try:\n"
+ "\t1. run static model under PaddleDetection/static "
+ "directory\n"
+ "\t2. run 'pip uninstall ppdet' to uninstall ppdet "
+ "dynamic version firstly.".format(e))
+ sys.exit(-1)
+ else:
+ raise e
def split_distill(split_output_names, weight, target_number):
diff --git a/static/slim/nas/train_nas.py b/static/slim/nas/train_nas.py
index df9198a611488fcf3226c20828f19e80dcfd0047..12709c82d8a9db5120c1016f3562fa1b46c5c9e8 100644
--- a/static/slim/nas/train_nas.py
+++ b/static/slim/nas/train_nas.py
@@ -30,25 +30,39 @@ from collections import deque
import paddle
from paddle import fluid
-from ppdet.experimental import mixed_precision_context
-from ppdet.core.workspace import load_config, merge_config, create, register
-from ppdet.data.reader import create_reader
-
-from ppdet.utils import dist_utils
-from ppdet.utils.eval_utils import parse_fetches, eval_run
-from ppdet.utils.stats import TrainingStats
-from ppdet.utils.cli import ArgsParser
-from ppdet.utils.check import check_gpu, check_version, check_config, enable_static_mode
-import ppdet.utils.checkpoint as checkpoint
-from paddleslim.analysis import flops, TableLatencyEvaluator
-from paddleslim.nas import SANAS
-import search_space
-
import logging
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
+try:
+ from ppdet.experimental import mixed_precision_context
+ from ppdet.core.workspace import load_config, merge_config, create, register
+ from ppdet.data.reader import create_reader
+
+ from ppdet.utils import dist_utils
+ from ppdet.utils.eval_utils import parse_fetches, eval_run
+ from ppdet.utils.stats import TrainingStats
+ from ppdet.utils.cli import ArgsParser
+ from ppdet.utils.check import check_gpu, check_version, check_config, enable_static_mode
+ import ppdet.utils.checkpoint as checkpoint
+except ImportError as e:
+ if sys.argv[0].find('static') >= 0:
+ logger.error("Importing ppdet failed when running static model "
+ "with error: {}\n"
+ "please try:\n"
+ "\t1. run static model under PaddleDetection/static "
+ "directory\n"
+ "\t2. run 'pip uninstall ppdet' to uninstall ppdet "
+ "dynamic version firstly.".format(e))
+ sys.exit(-1)
+ else:
+ raise e
+
+from paddleslim.analysis import flops, TableLatencyEvaluator
+from paddleslim.nas import SANAS
+import search_space
+
@register
class Constraint(object):
diff --git a/static/slim/prune/eval.py b/static/slim/prune/eval.py
index 6dae100b17591512c1f4eab346140e0168c73ce2..fa76ac8bbe352ec50c23b8bb7f3b3147af743fcc 100644
--- a/static/slim/prune/eval.py
+++ b/static/slim/prune/eval.py
@@ -28,20 +28,33 @@ import paddle.fluid as fluid
from paddleslim.prune import Pruner
from paddleslim.analysis import flops
-from ppdet.utils.eval_utils import parse_fetches, eval_run, eval_results, json_eval_results
-import ppdet.utils.checkpoint as checkpoint
-from ppdet.utils.check import check_gpu, check_version, check_config, enable_static_mode
-
-from ppdet.data.reader import create_reader
-
-from ppdet.core.workspace import load_config, merge_config, create
-from ppdet.utils.cli import ArgsParser
-
import logging
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
+try:
+ from ppdet.utils.eval_utils import parse_fetches, eval_run, eval_results, json_eval_results
+ import ppdet.utils.checkpoint as checkpoint
+ from ppdet.utils.check import check_gpu, check_version, check_config, enable_static_mode
+
+ from ppdet.data.reader import create_reader
+
+ from ppdet.core.workspace import load_config, merge_config, create
+ from ppdet.utils.cli import ArgsParser
+except ImportError as e:
+ if sys.argv[0].find('static') >= 0:
+ logger.error("Importing ppdet failed when running static model "
+ "with error: {}\n"
+ "please try:\n"
+ "\t1. run static model under PaddleDetection/static "
+ "directory\n"
+ "\t2. run 'pip uninstall ppdet' to uninstall ppdet "
+ "dynamic version firstly.".format(e))
+ sys.exit(-1)
+ else:
+ raise e
+
def main():
"""
diff --git a/static/slim/prune/export_model.py b/static/slim/prune/export_model.py
index 342182878eef6a4eb4734c8e3cc8da3a8751ac07..d8fcdb8fccac085b9bbab3f59000cd00a474586c 100644
--- a/static/slim/prune/export_model.py
+++ b/static/slim/prune/export_model.py
@@ -25,19 +25,33 @@ if parent_path not in sys.path:
import paddle
from paddle import fluid
-from ppdet.core.workspace import load_config, merge_config, create
-from ppdet.utils.cli import ArgsParser
-import ppdet.utils.checkpoint as checkpoint
-from ppdet.utils.export_utils import save_infer_model, dump_infer_config
-from ppdet.utils.check import check_config, check_version, enable_static_mode
-from paddleslim.prune import Pruner
-from paddleslim.analysis import flops
-
import logging
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
+try:
+ from ppdet.core.workspace import load_config, merge_config, create
+ from ppdet.utils.cli import ArgsParser
+ import ppdet.utils.checkpoint as checkpoint
+ from ppdet.utils.export_utils import save_infer_model, dump_infer_config
+ from ppdet.utils.check import check_config, check_version, enable_static_mode
+except ImportError as e:
+ if sys.argv[0].find('static') >= 0:
+ logger.error("Importing ppdet failed when running static model "
+ "with error: {}\n"
+ "please try:\n"
+ "\t1. run static model under PaddleDetection/static "
+ "directory\n"
+ "\t2. run 'pip uninstall ppdet' to uninstall ppdet "
+ "dynamic version firstly.".format(e))
+ sys.exit(-1)
+ else:
+ raise e
+
+from paddleslim.prune import Pruner
+from paddleslim.analysis import flops
+
def main():
cfg = load_config(FLAGS.config)
diff --git a/static/slim/prune/prune.py b/static/slim/prune/prune.py
index bb260fcafd90f3a13b560900b0288702ddc13835..52b0b0c7c5519ef326242207479d9926c6febd18 100644
--- a/static/slim/prune/prune.py
+++ b/static/slim/prune/prune.py
@@ -32,21 +32,34 @@ from paddleslim.analysis import flops
import paddle
from paddle import fluid
-from ppdet.experimental import mixed_precision_context
-from ppdet.core.workspace import load_config, merge_config, create
-from ppdet.data.reader import create_reader
-from ppdet.utils import dist_utils
-from ppdet.utils.eval_utils import parse_fetches, eval_run, eval_results
-from ppdet.utils.stats import TrainingStats
-from ppdet.utils.cli import ArgsParser
-from ppdet.utils.check import check_gpu, check_version, check_config, enable_static_mode
-import ppdet.utils.checkpoint as checkpoint
-
import logging
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
+try:
+ from ppdet.experimental import mixed_precision_context
+ from ppdet.core.workspace import load_config, merge_config, create
+ from ppdet.data.reader import create_reader
+ from ppdet.utils import dist_utils
+ from ppdet.utils.eval_utils import parse_fetches, eval_run, eval_results
+ from ppdet.utils.stats import TrainingStats
+ from ppdet.utils.cli import ArgsParser
+ from ppdet.utils.check import check_gpu, check_version, check_config, enable_static_mode
+ import ppdet.utils.checkpoint as checkpoint
+except ImportError as e:
+ if sys.argv[0].find('static') >= 0:
+ logger.error("Importing ppdet failed when running static model "
+ "with error: {}\n"
+ "please try:\n"
+ "\t1. run static model under PaddleDetection/static "
+ "directory\n"
+ "\t2. run 'pip uninstall ppdet' to uninstall ppdet "
+ "dynamic version firstly.".format(e))
+ sys.exit(-1)
+ else:
+ raise e
+
def main():
env = os.environ
diff --git a/static/slim/quantization/eval.py b/static/slim/quantization/eval.py
index 04142c98250ef0fac58c41dcb2f3078b9baa4825..b16d8007d768ee0dc66d913e5dc1fc09cd1174a8 100644
--- a/static/slim/quantization/eval.py
+++ b/static/slim/quantization/eval.py
@@ -26,20 +26,33 @@ if parent_path not in sys.path:
import paddle
import paddle.fluid as fluid
-from ppdet.utils.eval_utils import parse_fetches, eval_run, eval_results, json_eval_results
-import ppdet.utils.checkpoint as checkpoint
-from ppdet.utils.check import check_gpu, check_version, check_config, enable_static_mode
-
-from ppdet.data.reader import create_reader
-
-from ppdet.core.workspace import load_config, merge_config, create
-from ppdet.utils.cli import ArgsParser
-
import logging
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
+try:
+ from ppdet.utils.eval_utils import parse_fetches, eval_run, eval_results, json_eval_results
+ import ppdet.utils.checkpoint as checkpoint
+ from ppdet.utils.check import check_gpu, check_version, check_config, enable_static_mode
+
+ from ppdet.data.reader import create_reader
+
+ from ppdet.core.workspace import load_config, merge_config, create
+ from ppdet.utils.cli import ArgsParser
+except ImportError as e:
+ if sys.argv[0].find('static') >= 0:
+ logger.error("Importing ppdet failed when running static model "
+ "with error: {}\n"
+ "please try:\n"
+ "\t1. run static model under PaddleDetection/static "
+ "directory\n"
+ "\t2. run 'pip uninstall ppdet' to uninstall ppdet "
+ "dynamic version firstly.".format(e))
+ sys.exit(-1)
+ else:
+ raise e
+
# import paddleslim
from paddleslim.quant import quant_aware, convert
diff --git a/static/slim/quantization/export_model.py b/static/slim/quantization/export_model.py
index 41585c883712ca82607e552bb7235c55121d7084..067c21c376c350715190a056be330650102498f1 100644
--- a/static/slim/quantization/export_model.py
+++ b/static/slim/quantization/export_model.py
@@ -22,19 +22,33 @@ parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 3)))
if parent_path not in sys.path:
sys.path.append(parent_path)
-import paddle
-from paddle import fluid
-
-from ppdet.core.workspace import load_config, merge_config, create
-from ppdet.utils.cli import ArgsParser
-import ppdet.utils.checkpoint as checkpoint
-from ppdet.utils.export_utils import save_infer_model, dump_infer_config
-from ppdet.utils.check import check_config, check_version, enable_static_mode
-
import logging
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
+
+import paddle
+from paddle import fluid
+
+try:
+ from ppdet.core.workspace import load_config, merge_config, create
+ from ppdet.utils.cli import ArgsParser
+ import ppdet.utils.checkpoint as checkpoint
+ from ppdet.utils.export_utils import save_infer_model, dump_infer_config
+ from ppdet.utils.check import check_config, check_version, enable_static_mode
+except ImportError as e:
+ if sys.argv[0].find('static') >= 0:
+ logger.error("Importing ppdet failed when running static model "
+ "with error: {}\n"
+ "please try:\n"
+ "\t1. run static model under PaddleDetection/static "
+ "directory\n"
+ "\t2. run 'pip uninstall ppdet' to uninstall ppdet "
+ "dynamic version firstly.".format(e))
+ sys.exit(-1)
+ else:
+ raise e
+
from paddleslim.quant import quant_aware, convert
diff --git a/static/slim/quantization/infer.py b/static/slim/quantization/infer.py
index 1051043e48c4555fb442a139ba9e4df6f9c5f771..58c1dac714d4aa3c61eb206e42b885e50fc0dab5 100644
--- a/static/slim/quantization/infer.py
+++ b/static/slim/quantization/infer.py
@@ -29,19 +29,33 @@ from PIL import Image
import paddle
from paddle import fluid
-from ppdet.core.workspace import load_config, merge_config, create
-from ppdet.utils.eval_utils import parse_fetches
-from ppdet.utils.cli import ArgsParser
-from ppdet.utils.check import check_gpu, check_version, check_config, enable_static_mode
-from ppdet.utils.visualizer import visualize_results
-import ppdet.utils.checkpoint as checkpoint
-
-from ppdet.data.reader import create_reader
-from tools.infer import get_test_images, get_save_image_name
import logging
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
+
+try:
+ from ppdet.core.workspace import load_config, merge_config, create
+ from ppdet.utils.eval_utils import parse_fetches
+ from ppdet.utils.cli import ArgsParser
+ from ppdet.utils.check import check_gpu, check_version, check_config, enable_static_mode
+ from ppdet.utils.visualizer import visualize_results
+ import ppdet.utils.checkpoint as checkpoint
+ from ppdet.data.reader import create_reader
+ from tools.infer import get_test_images, get_save_image_name
+except ImportError as e:
+ if sys.argv[0].find('static') >= 0:
+ logger.error("Importing ppdet failed when running static model "
+ "with error: {}\n"
+ "please try:\n"
+ "\t1. run static model under PaddleDetection/static "
+ "directory\n"
+ "\t2. run 'pip uninstall ppdet' to uninstall ppdet "
+ "dynamic version firstly.".format(e))
+ sys.exit(-1)
+ else:
+ raise e
+
from paddleslim.quant import quant_aware, convert
diff --git a/static/slim/quantization/train.py b/static/slim/quantization/train.py
index b9fe275809f7bc44c3d2e28d6b89ee33bb073bf8..12ed7ef508fda5c6a5d1435d30735d3718626c88 100644
--- a/static/slim/quantization/train.py
+++ b/static/slim/quantization/train.py
@@ -31,21 +31,36 @@ import shutil
import paddle
from paddle import fluid
-from ppdet.core.workspace import load_config, merge_config, create
-from ppdet.data.reader import create_reader
-from ppdet.utils import dist_utils
-from ppdet.utils.eval_utils import parse_fetches, eval_run, eval_results
-from ppdet.utils.stats import TrainingStats
-from ppdet.utils.cli import ArgsParser
-from ppdet.utils.check import check_gpu, check_version, check_config, enable_static_mode
-import ppdet.utils.checkpoint as checkpoint
-from paddleslim.quant import quant_aware, convert
-from pact import pact, get_optimizer
import logging
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
+try:
+ from ppdet.core.workspace import load_config, merge_config, create
+ from ppdet.data.reader import create_reader
+ from ppdet.utils import dist_utils
+ from ppdet.utils.eval_utils import parse_fetches, eval_run, eval_results
+ from ppdet.utils.stats import TrainingStats
+ from ppdet.utils.cli import ArgsParser
+ from ppdet.utils.check import check_gpu, check_version, check_config, enable_static_mode
+ import ppdet.utils.checkpoint as checkpoint
+except ImportError as e:
+ if sys.argv[0].find('static') >= 0:
+ logger.error("Importing ppdet failed when running static model "
+ "with error: {}\n"
+ "please try:\n"
+ "\t1. run static model under PaddleDetection/static "
+ "directory\n"
+ "\t2. run 'pip uninstall ppdet' to uninstall ppdet "
+ "dynamic version firstly.".format(e))
+ sys.exit(-1)
+ else:
+ raise e
+
+from paddleslim.quant import quant_aware, convert
+from pact import pact, get_optimizer
+
def save_checkpoint(exe, prog, path, train_prog):
if os.path.isdir(path):
diff --git a/static/tools/anchor_cluster.py b/static/tools/anchor_cluster.py
index 5ec26355c00ec283c230deea7cbeedf2b521c87f..67ad2d9cd9a692322e0b2c9d330fed1a9bf6e0cc 100644
--- a/static/tools/anchor_cluster.py
+++ b/static/tools/anchor_cluster.py
@@ -23,18 +23,32 @@ parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
if parent_path not in sys.path:
sys.path.append(parent_path)
+import logging
+FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
+logging.basicConfig(level=logging.INFO, format=FORMAT)
+logger = logging.getLogger(__name__)
+
from scipy.cluster.vq import kmeans
import random
import numpy as np
from tqdm import tqdm
-from ppdet.utils.cli import ArgsParser
-from ppdet.utils.check import check_gpu, check_version, check_config
-from ppdet.core.workspace import load_config, merge_config, create
-import logging
-FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
-logging.basicConfig(level=logging.INFO, format=FORMAT)
-logger = logging.getLogger(__name__)
+try:
+ from ppdet.utils.cli import ArgsParser
+ from ppdet.utils.check import check_gpu, check_version, check_config
+ from ppdet.core.workspace import load_config, merge_config, create
+except ImportError as e:
+ if sys.argv[0].find('static') >= 0:
+ logger.error("Importing ppdet failed when running static model "
+ "with error: {}\n"
+ "please try:\n"
+ "\t1. run static model under PaddleDetection/static "
+ "directory\n"
+                     "\t2. run 'pip uninstall ppdet' to uninstall the ppdet "
+                     "dynamic graph version first.".format(e))
+ sys.exit(-1)
+ else:
+ raise e
class BaseAnchorCluster(object):
diff --git a/static/tools/configure.py b/static/tools/configure.py
index fdf826a5521e080765fc30cffc5a4cefa1a9c56b..64ff575b4a77f2a8dea6db9c08b37907948806be 100644
--- a/static/tools/configure.py
+++ b/static/tools/configure.py
@@ -23,10 +23,28 @@ parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
if parent_path not in sys.path:
sys.path.append(parent_path)
+import logging
+FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
+logging.basicConfig(level=logging.INFO, format=FORMAT)
+logger = logging.getLogger(__name__)
+
import yaml
-from ppdet.core.workspace import get_registered_modules, load_config, dump_value
-from ppdet.utils.cli import ColorTTY, print_total_cfg
+try:
+ from ppdet.core.workspace import get_registered_modules, load_config, dump_value
+ from ppdet.utils.cli import ColorTTY, print_total_cfg
+except ImportError as e:
+ if sys.argv[0].find('static') >= 0:
+ logger.error("Importing ppdet failed when running static model "
+ "with error: {}\n"
+ "please try:\n"
+ "\t1. run static model under PaddleDetection/static "
+ "directory\n"
+                     "\t2. run 'pip uninstall ppdet' to uninstall the ppdet "
+                     "dynamic graph version first.".format(e))
+ sys.exit(-1)
+ else:
+ raise e
color_tty = ColorTTY()
diff --git a/static/tools/eval.py b/static/tools/eval.py
index 6b54f87a90b3bcf37a047f0098f8395b907b3078..dfaf70dfe6c555b4382958b50718724e8831e90b 100644
--- a/static/tools/eval.py
+++ b/static/tools/eval.py
@@ -25,20 +25,33 @@ if parent_path not in sys.path:
import paddle
import paddle.fluid as fluid
-from ppdet.utils.eval_utils import parse_fetches, eval_run, eval_results, json_eval_results
-import ppdet.utils.checkpoint as checkpoint
-from ppdet.utils.check import check_gpu, check_xpu, check_version, check_config, enable_static_mode
-
-from ppdet.data.reader import create_reader
-
-from ppdet.core.workspace import load_config, merge_config, create
-from ppdet.utils.cli import ArgsParser
-
import logging
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
+try:
+ from ppdet.utils.eval_utils import parse_fetches, eval_run, eval_results, json_eval_results
+ import ppdet.utils.checkpoint as checkpoint
+ from ppdet.utils.check import check_gpu, check_xpu, check_version, check_config, enable_static_mode
+
+ from ppdet.data.reader import create_reader
+
+ from ppdet.core.workspace import load_config, merge_config, create
+ from ppdet.utils.cli import ArgsParser
+except ImportError as e:
+ if sys.argv[0].find('static') >= 0:
+ logger.error("Importing ppdet failed when running static model "
+ "with error: {}\n"
+ "please try:\n"
+ "\t1. run static model under PaddleDetection/static "
+ "directory\n"
+                     "\t2. run 'pip uninstall ppdet' to uninstall the ppdet "
+                     "dynamic graph version first.".format(e))
+ sys.exit(-1)
+ else:
+ raise e
+
def main():
"""
diff --git a/static/tools/export_model.py b/static/tools/export_model.py
index 6827b6fe1eebff6a25265caccbed5948eaad7c64..d6f6013b7162dc1b03679d514387634703c522e0 100644
--- a/static/tools/export_model.py
+++ b/static/tools/export_model.py
@@ -26,16 +26,30 @@ if parent_path not in sys.path:
import paddle
from paddle import fluid
-from ppdet.core.workspace import load_config, merge_config, create
-from ppdet.utils.cli import ArgsParser
-import ppdet.utils.checkpoint as checkpoint
-from ppdet.utils.export_utils import save_infer_model, dump_infer_config
-from ppdet.utils.check import check_config, check_version, check_py_func, enable_static_mode
import logging
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
+try:
+ from ppdet.core.workspace import load_config, merge_config, create
+ from ppdet.utils.cli import ArgsParser
+ import ppdet.utils.checkpoint as checkpoint
+ from ppdet.utils.export_utils import save_infer_model, dump_infer_config
+ from ppdet.utils.check import check_config, check_version, check_py_func, enable_static_mode
+except ImportError as e:
+ if sys.argv[0].find('static') >= 0:
+ logger.error("Importing ppdet failed when running static model "
+ "with error: {}\n"
+ "please try:\n"
+ "\t1. run static model under PaddleDetection/static "
+ "directory\n"
+                     "\t2. run 'pip uninstall ppdet' to uninstall the ppdet "
+                     "dynamic graph version first.".format(e))
+ sys.exit(-1)
+ else:
+ raise e
+
def main():
cfg = load_config(FLAGS.config)
diff --git a/static/tools/export_serving_model.py b/static/tools/export_serving_model.py
index 368ee157599eb08087f4f344b8cb09aa84ab0138..ae11a60cae5cda489014d3b48e18a7a0b22117c6 100644
--- a/static/tools/export_serving_model.py
+++ b/static/tools/export_serving_model.py
@@ -22,20 +22,34 @@ parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
if parent_path not in sys.path:
sys.path.append(parent_path)
+import yaml
import paddle
from paddle import fluid
-from ppdet.core.workspace import load_config, merge_config, create
-from ppdet.utils.cli import ArgsParser
-from ppdet.utils.check import check_config, check_version, enable_static_mode
-import ppdet.utils.checkpoint as checkpoint
-import yaml
import logging
-from ppdet.utils.export_utils import dump_infer_config, prune_feed_vars
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
+try:
+ from ppdet.core.workspace import load_config, merge_config, create
+ from ppdet.utils.cli import ArgsParser
+ from ppdet.utils.check import check_config, check_version, enable_static_mode
+ import ppdet.utils.checkpoint as checkpoint
+ from ppdet.utils.export_utils import dump_infer_config, prune_feed_vars
+except ImportError as e:
+ if sys.argv[0].find('static') >= 0:
+ logger.error("Importing ppdet failed when running static model "
+ "with error: {}\n"
+ "please try:\n"
+ "\t1. run static model under PaddleDetection/static "
+ "directory\n"
+                     "\t2. run 'pip uninstall ppdet' to uninstall the ppdet "
+                     "dynamic graph version first.".format(e))
+ sys.exit(-1)
+ else:
+ raise e
+
def save_serving_model(FLAGS, exe, feed_vars, test_fetches, infer_prog):
cfg_name = os.path.basename(FLAGS.config).split('.')[0]
diff --git a/static/tools/face_eval.py b/static/tools/face_eval.py
index 3ee0e3041629289e79e972d33ede634ef4481360..e47ded2674ccfd371d07ec76aa1d0c931d55399f 100644
--- a/static/tools/face_eval.py
+++ b/static/tools/face_eval.py
@@ -29,18 +29,31 @@ import numpy as np
import cv2
from collections import OrderedDict
-import ppdet.utils.checkpoint as checkpoint
-from ppdet.utils.cli import ArgsParser
-from ppdet.utils.check import check_gpu, check_version, check_config, enable_static_mode
-from ppdet.utils.widerface_eval_utils import get_shrink, bbox_vote, \
- save_widerface_bboxes, save_fddb_bboxes, to_chw_bgr
-from ppdet.core.workspace import load_config, merge_config, create
-
import logging
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
+try:
+ import ppdet.utils.checkpoint as checkpoint
+ from ppdet.utils.cli import ArgsParser
+ from ppdet.utils.check import check_gpu, check_version, check_config, enable_static_mode
+ from ppdet.utils.widerface_eval_utils import get_shrink, bbox_vote, \
+ save_widerface_bboxes, save_fddb_bboxes, to_chw_bgr
+ from ppdet.core.workspace import load_config, merge_config, create
+except ImportError as e:
+ if sys.argv[0].find('static') >= 0:
+ logger.error("Importing ppdet failed when running static model "
+ "with error: {}\n"
+ "please try:\n"
+ "\t1. run static model under PaddleDetection/static "
+ "directory\n"
+                     "\t2. run 'pip uninstall ppdet' to uninstall the ppdet "
+                     "dynamic graph version first.".format(e))
+ sys.exit(-1)
+ else:
+ raise e
+
def face_img_process(image,
mean=[104., 117., 123.],
diff --git a/static/tools/infer.py b/static/tools/infer.py
index bd26650bbec56e890507927874ec93a9375851c2..df831fd547924f468107131d2c45819d87071b93 100644
--- a/static/tools/infer.py
+++ b/static/tools/infer.py
@@ -30,21 +30,34 @@ from PIL import Image, ImageOps
import paddle
from paddle import fluid
-from ppdet.core.workspace import load_config, merge_config, create
-
-from ppdet.utils.eval_utils import parse_fetches
-from ppdet.utils.cli import ArgsParser
-from ppdet.utils.check import check_gpu, check_version, check_config, enable_static_mode
-from ppdet.utils.visualizer import visualize_results
-import ppdet.utils.checkpoint as checkpoint
-
-from ppdet.data.reader import create_reader
-
import logging
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
+try:
+ from ppdet.core.workspace import load_config, merge_config, create
+
+ from ppdet.utils.eval_utils import parse_fetches
+ from ppdet.utils.cli import ArgsParser
+ from ppdet.utils.check import check_gpu, check_version, check_config, enable_static_mode
+ from ppdet.utils.visualizer import visualize_results
+ import ppdet.utils.checkpoint as checkpoint
+
+ from ppdet.data.reader import create_reader
+except ImportError as e:
+ if sys.argv[0].find('static') >= 0:
+ logger.error("Importing ppdet failed when running static model "
+ "with error: {}\n"
+ "please try:\n"
+ "\t1. run static model under PaddleDetection/static "
+ "directory\n"
+                     "\t2. run 'pip uninstall ppdet' to uninstall the ppdet "
+                     "dynamic graph version first.".format(e))
+ sys.exit(-1)
+ else:
+ raise e
+
def get_save_image_name(output_dir, image_path):
"""
diff --git a/static/tools/train.py b/static/tools/train.py
index 104946167a5c54947907288a3838f364bce91659..4417a9d9b8b36d5a23ecde7dbc5988b2b4e4c9fd 100644
--- a/static/tools/train.py
+++ b/static/tools/train.py
@@ -35,22 +35,35 @@ from paddle import fluid
from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter
from paddle.fluid.optimizer import ExponentialMovingAverage
-from ppdet.experimental import mixed_precision_context
-from ppdet.core.workspace import load_config, merge_config, create
-from ppdet.data.reader import create_reader
-
-from ppdet.utils import dist_utils
-from ppdet.utils.eval_utils import parse_fetches, eval_run, eval_results
-from ppdet.utils.stats import TrainingStats
-from ppdet.utils.cli import ArgsParser
-from ppdet.utils.check import check_gpu, check_xpu, check_version, check_config, enable_static_mode
-import ppdet.utils.checkpoint as checkpoint
-
import logging
FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
logging.basicConfig(level=logging.INFO, format=FORMAT)
logger = logging.getLogger(__name__)
+try:
+ from ppdet.experimental import mixed_precision_context
+ from ppdet.core.workspace import load_config, merge_config, create
+ from ppdet.data.reader import create_reader
+
+ from ppdet.utils import dist_utils
+ from ppdet.utils.eval_utils import parse_fetches, eval_run, eval_results
+ from ppdet.utils.stats import TrainingStats
+ from ppdet.utils.cli import ArgsParser
+ from ppdet.utils.check import check_gpu, check_xpu, check_version, check_config, enable_static_mode
+ import ppdet.utils.checkpoint as checkpoint
+except ImportError as e:
+ if sys.argv[0].find('static') >= 0:
+ logger.error("Importing ppdet failed when running static model "
+ "with error: {}\n"
+ "please try:\n"
+ "\t1. run static model under PaddleDetection/static "
+ "directory\n"
+                     "\t2. run 'pip uninstall ppdet' to uninstall the ppdet "
+                     "dynamic graph version first.".format(e))
+ sys.exit(-1)
+ else:
+ raise e
+
def main():
env = os.environ
diff --git a/static/tools/train_multi_machine.py b/static/tools/train_multi_machine.py
index 1ac4901a6de23b0766baaed9d3bed8de0e5576f0..31a7f706a832ac179ab5ef450be1eef64d357aec 100644
--- a/static/tools/train_multi_machine.py
+++ b/static/tools/train_multi_machine.py
@@ -22,6 +22,11 @@ parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
if parent_path not in sys.path:
sys.path.append(parent_path)
+import logging
+FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
+logging.basicConfig(level=logging.INFO, format=FORMAT)
+logger = logging.getLogger(__name__)
+
import time
import numpy as np
import random
@@ -35,24 +40,32 @@ from paddle import fluid
from paddle.fluid.layers.learning_rate_scheduler import _decay_step_counter
from paddle.fluid.optimizer import ExponentialMovingAverage
-from ppdet.experimental import mixed_precision_context
-from ppdet.core.workspace import load_config, merge_config, create
-from ppdet.data.reader import create_reader
-
-from ppdet.utils import dist_utils
-from ppdet.utils.eval_utils import parse_fetches, eval_run, eval_results
-from ppdet.utils.stats import TrainingStats
-from ppdet.utils.cli import ArgsParser
-from ppdet.utils.check import check_gpu, check_version, check_config, enable_static_mode
-import ppdet.utils.checkpoint as checkpoint
-
from paddle.fluid.incubate.fleet.collective import fleet, DistributedStrategy # new line 1
from paddle.fluid.incubate.fleet.base import role_maker # new line 2
-import logging
-FORMAT = '%(asctime)s-%(levelname)s: %(message)s'
-logging.basicConfig(level=logging.INFO, format=FORMAT)
-logger = logging.getLogger(__name__)
+try:
+ from ppdet.experimental import mixed_precision_context
+ from ppdet.core.workspace import load_config, merge_config, create
+ from ppdet.data.reader import create_reader
+
+ from ppdet.utils import dist_utils
+ from ppdet.utils.eval_utils import parse_fetches, eval_run, eval_results
+ from ppdet.utils.stats import TrainingStats
+ from ppdet.utils.cli import ArgsParser
+ from ppdet.utils.check import check_gpu, check_version, check_config, enable_static_mode
+ import ppdet.utils.checkpoint as checkpoint
+except ImportError as e:
+ if sys.argv[0].find('static') >= 0:
+ logger.error("Importing ppdet failed when running static model "
+ "with error: {}\n"
+ "please try:\n"
+ "\t1. run static model under PaddleDetection/static "
+ "directory\n"
+                     "\t2. run 'pip uninstall ppdet' to uninstall the ppdet "
+                     "dynamic graph version first.".format(e))
+ sys.exit(-1)
+ else:
+ raise e
def main():
diff --git a/tools/anchor_cluster.py b/tools/anchor_cluster.py
new file mode 100644
index 0000000000000000000000000000000000000000..0b339bb367b91057bb6c428be2800d6836460454
--- /dev/null
+++ b/tools/anchor_cluster.py
@@ -0,0 +1,363 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+import sys
+# add python path of PaddleDetection to sys.path
+parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
+if parent_path not in sys.path:
+ sys.path.append(parent_path)
+
+from ppdet.utils.logger import setup_logger
+logger = setup_logger('ppdet.anchor_cluster')
+
+from scipy.cluster.vq import kmeans
+import random
+import numpy as np
+from tqdm import tqdm
+
+from ppdet.utils.cli import ArgsParser
+from ppdet.utils.check import check_gpu, check_version, check_config
+from ppdet.core.workspace import load_config, merge_config, create
+
+
+class BaseAnchorCluster(object):
+ def __init__(self, n, cache_path, cache, verbose=True):
+ """
+ Base Anchor Cluster
+
+ Args:
+ n (int): number of clusters
+ cache_path (str): cache directory path
+            cache (bool): whether to use the cache
+            verbose (bool): whether to print results
+ """
+ super(BaseAnchorCluster, self).__init__()
+ self.n = n
+ self.cache_path = cache_path
+ self.cache = cache
+ self.verbose = verbose
+
+ def print_result(self, centers):
+ raise NotImplementedError('%s.print_result is not available' %
+ self.__class__.__name__)
+
+ def get_whs(self):
+ whs_cache_path = os.path.join(self.cache_path, 'whs.npy')
+ shapes_cache_path = os.path.join(self.cache_path, 'shapes.npy')
+ if self.cache and os.path.exists(whs_cache_path) and os.path.exists(
+ shapes_cache_path):
+ self.whs = np.load(whs_cache_path)
+ self.shapes = np.load(shapes_cache_path)
+ return self.whs, self.shapes
+ whs = np.zeros((0, 2))
+ shapes = np.zeros((0, 2))
+ self.dataset.parse_dataset()
+ roidbs = self.dataset.roidbs
+ for rec in tqdm(roidbs):
+ h, w = rec['h'], rec['w']
+ bbox = rec['gt_bbox']
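+            # gt_bbox is [x1, y1, x2, y2]; convert to (w, h) and normalize by
+            # the image size so anchors can later be scaled to any input size.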
+ wh = bbox[:, 2:4] - bbox[:, 0:2] + 1
+ wh = wh / np.array([[w, h]])
+ shape = np.ones_like(wh) * np.array([[w, h]])
+ whs = np.vstack((whs, wh))
+ shapes = np.vstack((shapes, shape))
+
+ if self.cache:
+ os.makedirs(self.cache_path, exist_ok=True)
+ np.save(whs_cache_path, whs)
+ np.save(shapes_cache_path, shapes)
+
+ self.whs = whs
+ self.shapes = shapes
+ return self.whs, self.shapes
+
+ def calc_anchors(self):
+ raise NotImplementedError('%s.calc_anchors is not available' %
+ self.__class__.__name__)
+
+ def __call__(self):
+ self.get_whs()
+ centers = self.calc_anchors()
+ if self.verbose:
+ self.print_result(centers)
+ return centers
+
+
+class YOLOv2AnchorCluster(BaseAnchorCluster):
+ def __init__(self,
+ n,
+ dataset,
+ size,
+ cache_path,
+ cache,
+ iters=1000,
+ verbose=True):
+        """
+        YOLOv2 Anchor Cluster
+
+        Reference:
+            https://github.com/AlexeyAB/darknet/blob/master/scripts/gen_anchors.py
+
+        Args:
+            n (int): number of clusters
+            dataset (DataSet): DataSet instance, VOC or COCO
+            size (list): [w, h]
+            cache_path (str): cache directory path
+            cache (bool): whether to use the cache
+            iters (int): number of k-means iterations
+            verbose (bool): whether to print results
+        """
+        super(YOLOv2AnchorCluster, self).__init__(
+            n, cache_path, cache, verbose=verbose)
+ self.dataset = dataset
+ self.size = size
+ self.iters = iters
+
+ def print_result(self, centers):
+ logger.info('%d anchor cluster result: [w, h]' % self.n)
+ for w, h in centers:
+ logger.info('[%d, %d]' % (round(w), round(h)))
+
+ def metric(self, whs, centers):
+ wh1 = whs[:, None]
+ wh2 = centers[None]
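+        # IoU between (w, h) pairs anchored at a common origin: e.g.
+        # (illustrative) whs=[[10, 20]], centers=[[10, 10]] gives inter=100,
+        # union=200+100-100=200, so the IoU is 0.5.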
+ inter = np.minimum(wh1, wh2).prod(2)
+ return inter / (wh1.prod(2) + wh2.prod(2) - inter)
+
+ def kmeans_expectation(self, whs, centers, assignments):
+ dist = self.metric(whs, centers)
+ new_assignments = dist.argmax(1)
+ converged = (new_assignments == assignments).all()
+ return converged, new_assignments
+
+ def kmeans_maximizations(self, whs, centers, assignments):
+ new_centers = np.zeros_like(centers)
+ for i in range(centers.shape[0]):
+ mask = (assignments == i)
+ if mask.sum():
+ new_centers[i, :] = whs[mask].mean(0)
+ return new_centers
+
+ def calc_anchors(self):
+ self.whs = self.whs * np.array([self.size])
+ # random select k centers
+ whs, n, iters = self.whs, self.n, self.iters
+ logger.info('Running kmeans for %d anchors on %d points...' %
+ (n, len(whs)))
+ idx = np.random.choice(whs.shape[0], size=n, replace=False)
+ centers = whs[idx]
+        assignments = np.ones(whs.shape[0:1]) * -1
+ # kmeans
+ if n == 1:
+ return self.kmeans_maximizations(whs, centers, assignments)
+
+ pbar = tqdm(range(iters), desc='Cluster anchors with k-means algorithm')
+ for _ in pbar:
+ # E step
+ converged, assignments = self.kmeans_expectation(whs, centers,
+ assignments)
+ if converged:
+ logger.info('kmeans algorithm has converged')
+ break
+ # M step
+ centers = self.kmeans_maximizations(whs, centers, assignments)
+ ious = self.metric(whs, centers)
+ pbar.desc = 'avg_iou: %.4f' % (ious.max(1).mean())
+
+ centers = sorted(centers, key=lambda x: x[0] * x[1])
+ return centers
+
+
+class YOLOv5AnchorCluster(BaseAnchorCluster):
+ def __init__(self,
+ n,
+ dataset,
+ size,
+ cache_path,
+ cache,
+ iters=300,
+ gen_iters=1000,
+ thresh=0.25,
+ verbose=True):
+        """
+        YOLOv5 Anchor Cluster
+
+        Reference:
+            https://github.com/ultralytics/yolov5/blob/master/utils/general.py
+
+        Args:
+            n (int): number of clusters
+            dataset (DataSet): DataSet instance, VOC or COCO
+            size (list): [w, h]
+            cache_path (str): cache directory path
+            cache (bool): whether to use the cache
+            iters (int): number of k-means iterations
+            gen_iters (int): number of genetic algorithm iterations
+            thresh (float): anchor scale threshold
+            verbose (bool): whether to print results
+        """
+        super(YOLOv5AnchorCluster, self).__init__(
+            n, cache_path, cache, verbose=verbose)
+ self.dataset = dataset
+ self.size = size
+ self.iters = iters
+ self.gen_iters = gen_iters
+ self.thresh = thresh
+
+ def print_result(self, centers):
+ whs = self.whs
+ centers = centers[np.argsort(centers.prod(1))]
+ x, best = self.metric(whs, centers)
+ bpr, aat = (
+ best > self.thresh).mean(), (x > self.thresh).mean() * self.n
+ logger.info(
+ 'thresh=%.2f: %.4f best possible recall, %.2f anchors past thr' %
+ (self.thresh, bpr, aat))
+ logger.info(
+ 'n=%g, img_size=%s, metric_all=%.3f/%.3f-mean/best, past_thresh=%.3f-mean: '
+ % (self.n, self.size, x.mean(), best.mean(),
+ x[x > self.thresh].mean()))
+ logger.info('%d anchor cluster result: [w, h]' % self.n)
+ for w, h in centers:
+ logger.info('[%d, %d]' % (round(w), round(h)))
+
+ def metric(self, whs, centers):
+ r = whs[:, None] / centers[None]
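+        # Symmetric ratio metric from YOLOv5: x is the worst-case dimension
+        # ratio per (box, anchor) pair, e.g. (illustrative) r=[[0.5, 1.0]]
+        # gives x=0.5; x.max(1) is each box's best anchor match.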
+ x = np.minimum(r, 1. / r).min(2)
+ return x, x.max(1)
+
+ def fitness(self, whs, centers):
+ _, best = self.metric(whs, centers)
+ return (best * (best > self.thresh)).mean()
+
+ def calc_anchors(self):
+ self.whs = self.whs * self.shapes / self.shapes.max(
+ 1, keepdims=True) * np.array([self.size])
+ wh0 = self.whs
+ i = (wh0 < 3.0).any(1).sum()
+ if i:
+            logger.warn('Extremely small objects found. %d of %d '
+ 'labels are < 3 pixels in width or height' %
+ (i, len(wh0)))
+
+ wh = wh0[(wh0 >= 2.0).any(1)]
+ logger.info('Running kmeans for %g anchors on %g points...' %
+ (self.n, len(wh)))
+ s = wh.std(0)
+ centers, dist = kmeans(wh / s, self.n, iter=self.iters)
+ centers *= s
+
+ f, sh, mp, s = self.fitness(wh, centers), centers.shape, 0.9, 0.1
+ pbar = tqdm(
+ range(self.gen_iters),
+ desc='Evolving anchors with Genetic Algorithm')
+ for _ in pbar:
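+            # Mutation step: each anchor dimension is perturbed with
+            # probability mp by a small random factor (scale s), clipped to
+            # [0.3, 3.0]; resample while no dimension changed (all v == 1).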
+ v = np.ones(sh)
+ while (v == 1).all():
+ v = ((np.random.random(sh) < mp) * np.random.random() *
+ np.random.randn(*sh) * s + 1).clip(0.3, 3.0)
+ new_centers = (centers.copy() * v).clip(min=2.0)
+ new_f = self.fitness(wh, new_centers)
+ if new_f > f:
+ f, centers = new_f, new_centers.copy()
+ pbar.desc = 'Evolving anchors with Genetic Algorithm: fitness = %.4f' % f
+
+ return centers
+
+
+def main():
+ parser = ArgsParser()
+ parser.add_argument(
+ '--n', '-n', default=9, type=int, help='num of clusters')
+ parser.add_argument(
+ '--iters',
+ '-i',
+ default=1000,
+ type=int,
+ help='num of iterations for kmeans')
+ parser.add_argument(
+ '--gen_iters',
+ '-gi',
+ default=1000,
+ type=int,
+ help='num of iterations for genetic algorithm')
+ parser.add_argument(
+ '--thresh',
+ '-t',
+ default=0.25,
+ type=float,
+ help='anchor scale threshold')
+ parser.add_argument(
+        '--verbose', '-v', default=True, type=bool, help='whether to print results')
+ parser.add_argument(
+ '--size',
+ '-s',
+ default=None,
+ type=str,
+ help='image size: w,h, using comma as delimiter')
+ parser.add_argument(
+ '--method',
+ '-m',
+ default='v2',
+ type=str,
+ help='cluster method, [v2, v5] are supported now')
+ parser.add_argument(
+ '--cache_path', default='cache', type=str, help='cache path')
+ parser.add_argument(
+        '--cache', action='store_true', help='whether to use cache')
+ FLAGS = parser.parse_args()
+
+ cfg = load_config(FLAGS.config)
+ merge_config(FLAGS.opt)
+ check_config(cfg)
+ # check if set use_gpu=True in paddlepaddle cpu version
+ check_gpu(cfg.use_gpu)
+ # check if paddlepaddle version is satisfied
+ check_version()
+
+ # get dataset
+ dataset = cfg['TrainDataset']
+ if FLAGS.size:
+ if ',' in FLAGS.size:
+ size = list(map(int, FLAGS.size.split(',')))
+ assert len(size) == 2, "the format of size is incorrect"
+ else:
+ size = int(FLAGS.size)
+ size = [size, size]
+ elif 'inputs_def' in cfg['TrainReader'] and 'image_shape' in cfg[
+ 'TrainReader']['inputs_def']:
+ size = cfg['TrainReader']['inputs_def']['image_shape'][1:]
+ else:
+ raise ValueError('size is not specified')
+
+ if FLAGS.method == 'v2':
+ cluster = YOLOv2AnchorCluster(FLAGS.n, dataset, size, FLAGS.cache_path,
+ FLAGS.cache, FLAGS.iters, FLAGS.verbose)
+ elif FLAGS.method == 'v5':
+ cluster = YOLOv5AnchorCluster(FLAGS.n, dataset, size, FLAGS.cache_path,
+ FLAGS.cache, FLAGS.iters, FLAGS.gen_iters,
+ FLAGS.thresh, FLAGS.verbose)
+ else:
+ raise ValueError('cluster method: %s is not supported' % FLAGS.method)
+
+ anchors = cluster()
+
+
+if __name__ == "__main__":
+ main()
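+
+# A hypothetical invocation (config path and flag values are illustrative):
+#   python tools/anchor_cluster.py -c configs/ppyolo/ppyolo_r50vd_dcn_1x_coco.yml \
+#       -n 9 -s 608 -m v2 -i 1000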
diff --git a/tools/eval.py b/tools/eval.py
index 8b0064762d4cbbad8bc0b6461ee289a60d1eec72..5df7a07745b86009b02f34f81edb27a81bed7822 100755
--- a/tools/eval.py
+++ b/tools/eval.py
@@ -27,13 +27,13 @@ import warnings
warnings.filterwarnings('ignore')
import paddle
-from paddle.distributed import ParallelEnv
from ppdet.core.workspace import load_config, merge_config
from ppdet.utils.check import check_gpu, check_version, check_config
from ppdet.utils.cli import ArgsParser
from ppdet.engine import Trainer, init_parallel_env
from ppdet.metrics.coco_utils import json_eval_results
+from ppdet.slim import build_slim_model
from ppdet.utils.logger import setup_logger
logger = setup_logger('eval')
@@ -70,6 +70,12 @@ def parse_args():
action="store_true",
help="whether per-category AP and draw P-R Curve or not.")
+ parser.add_argument(
+ '--save_prediction_only',
+ action='store_true',
+ default=False,
+ help='Whether to save the evaluation results only')
+
args = parser.parse_args()
return args
@@ -89,7 +95,7 @@ def run(FLAGS, cfg):
# init parallel environment if nranks > 1
init_parallel_env()
- # build trainer
+ # build trainer
trainer = Trainer(cfg, mode='eval')
# load weights
@@ -101,23 +107,26 @@ def run(FLAGS, cfg):
def main():
FLAGS = parse_args()
-
cfg = load_config(FLAGS.config)
# TODO: bias should be unified
cfg['bias'] = 1 if FLAGS.bias else 0
cfg['classwise'] = True if FLAGS.classwise else False
cfg['output_eval'] = FLAGS.output_eval
+ cfg['save_prediction_only'] = FLAGS.save_prediction_only
merge_config(FLAGS.opt)
+
+ place = paddle.set_device('gpu' if cfg.use_gpu else 'cpu')
+
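+    # sync_bn is not available on CPU; fall back to plain bn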
+ if 'norm_type' in cfg and cfg['norm_type'] == 'sync_bn' and not cfg.use_gpu:
+ cfg['norm_type'] = 'bn'
+
if FLAGS.slim_config:
- slim_cfg = load_config(FLAGS.slim_config)
- merge_config(slim_cfg)
+ cfg = build_slim_model(cfg, FLAGS.slim_config, mode='eval')
+
check_config(cfg)
check_gpu(cfg.use_gpu)
check_version()
- place = 'gpu:{}'.format(ParallelEnv().dev_id) if cfg.use_gpu else 'cpu'
- place = paddle.set_device(place)
-
run(FLAGS, cfg)
diff --git a/tools/export_model.py b/tools/export_model.py
index d04422873d06e49e2de596d09496c91778251532..8cf3885c88552ca9b48f8b8d6796377d96912a2e 100644
--- a/tools/export_model.py
+++ b/tools/export_model.py
@@ -31,6 +31,7 @@ from ppdet.core.workspace import load_config, merge_config
from ppdet.utils.check import check_gpu, check_version, check_config
from ppdet.utils.cli import ArgsParser
from ppdet.engine import Trainer
+from ppdet.slim import build_slim_model
from ppdet.utils.logger import setup_logger
logger = setup_logger('export_model')
@@ -84,15 +85,15 @@ def run(FLAGS, cfg):
def main():
paddle.set_device("cpu")
FLAGS = parse_args()
-
cfg = load_config(FLAGS.config)
# TODO: to be refined in the future
if 'norm_type' in cfg and cfg['norm_type'] == 'sync_bn':
FLAGS.opt['norm_type'] = 'bn'
merge_config(FLAGS.opt)
+
if FLAGS.slim_config:
- slim_cfg = load_config(FLAGS.slim_config)
- merge_config(slim_cfg)
+ cfg = build_slim_model(cfg, FLAGS.slim_config, mode='test')
+
check_config(cfg)
check_gpu(cfg.use_gpu)
check_version()
diff --git a/tools/infer.py b/tools/infer.py
index 9226e1eea84c0ca15d14b917623b193b5edcb779..7ea0d2353a2472127e5cbfeb07c63f66d88f2830 100755
--- a/tools/infer.py
+++ b/tools/infer.py
@@ -27,11 +27,11 @@ warnings.filterwarnings('ignore')
import glob
import paddle
-from paddle.distributed import ParallelEnv
from ppdet.core.workspace import load_config, merge_config
from ppdet.engine import Trainer
from ppdet.utils.check import check_gpu, check_version, check_config
from ppdet.utils.cli import ArgsParser
+from ppdet.slim import build_slim_model
from ppdet.utils.logger import setup_logger
logger = setup_logger('train')
@@ -68,12 +68,17 @@ def parse_args():
"--use_vdl",
type=bool,
default=False,
- help="whether to record the data to VisualDL.")
+ help="Whether to record the data to VisualDL.")
parser.add_argument(
'--vdl_log_dir',
type=str,
default="vdl_log_dir/image",
help='VisualDL logging directory for image.')
+ parser.add_argument(
+ "--save_txt",
+ type=bool,
+ default=False,
+ help="Whether to save inference result in txt.")
args = parser.parse_args()
return args
@@ -123,25 +128,29 @@ def run(FLAGS, cfg):
trainer.predict(
images,
draw_threshold=FLAGS.draw_threshold,
- output_dir=FLAGS.output_dir)
+ output_dir=FLAGS.output_dir,
+ save_txt=FLAGS.save_txt)
def main():
FLAGS = parse_args()
-
cfg = load_config(FLAGS.config)
cfg['use_vdl'] = FLAGS.use_vdl
cfg['vdl_log_dir'] = FLAGS.vdl_log_dir
merge_config(FLAGS.opt)
+
+ place = paddle.set_device('gpu' if cfg.use_gpu else 'cpu')
+
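+    # sync_bn is not available on CPU; fall back to plain bn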
+ if 'norm_type' in cfg and cfg['norm_type'] == 'sync_bn' and not cfg.use_gpu:
+ cfg['norm_type'] = 'bn'
+
if FLAGS.slim_config:
- slim_cfg = load_config(FLAGS.slim_config)
- merge_config(slim_cfg)
+ cfg = build_slim_model(cfg, FLAGS.slim_config, mode='test')
+
check_config(cfg)
check_gpu(cfg.use_gpu)
check_version()
- place = 'gpu:{}'.format(ParallelEnv().dev_id) if cfg.use_gpu else 'cpu'
- place = paddle.set_device(place)
run(FLAGS, cfg)
diff --git a/tools/train.py b/tools/train.py
index e7efcd07a30a729435437ae5db1a780f5fc6d7b2..d9ef6d6f24d89d6322b6ecd01d358049386af971 100755
--- a/tools/train.py
+++ b/tools/train.py
@@ -22,18 +22,18 @@ parent_path = os.path.abspath(os.path.join(__file__, *(['..'] * 2)))
if parent_path not in sys.path:
sys.path.append(parent_path)
+import random
+import numpy as np
# ignore warning log
import warnings
warnings.filterwarnings('ignore')
-import random
-import numpy as np
import paddle
-from paddle.distributed import ParallelEnv
from ppdet.core.workspace import load_config, merge_config, create
-from ppdet.utils.checkpoint import load_weight, load_pretrain_weight
+from ppdet.utils.checkpoint import load_weight
from ppdet.engine import Trainer, init_parallel_env, set_random_seed, init_fleet_env
+from ppdet.slim import build_slim_model
import ppdet.utils.cli as cli
import ppdet.utils.check as check
@@ -78,6 +78,11 @@ def parse_args():
type=str,
default="vdl_log_dir/scalar",
help='VisualDL logging directory for scalar.')
+ parser.add_argument(
+ '--save_prediction_only',
+ action='store_true',
+ default=False,
+ help='Whether to save the evaluation results only')
args = parser.parse_args()
return args
@@ -99,7 +104,7 @@ def run(FLAGS, cfg):
# load weights
if FLAGS.resume is not None:
trainer.resume_weights(FLAGS.resume)
- elif not FLAGS.slim_config and 'pretrain_weights' in cfg and cfg.pretrain_weights:
+ elif 'pretrain_weights' in cfg and cfg.pretrain_weights:
trainer.load_weights(cfg.pretrain_weights)
# training
@@ -108,23 +113,26 @@ def run(FLAGS, cfg):
def main():
FLAGS = parse_args()
-
cfg = load_config(FLAGS.config)
cfg['fp16'] = FLAGS.fp16
cfg['fleet'] = FLAGS.fleet
cfg['use_vdl'] = FLAGS.use_vdl
cfg['vdl_log_dir'] = FLAGS.vdl_log_dir
+ cfg['save_prediction_only'] = FLAGS.save_prediction_only
merge_config(FLAGS.opt)
+
+ place = paddle.set_device('gpu' if cfg.use_gpu else 'cpu')
+
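+    # sync_bn is not available on CPU; fall back to plain bn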
+ if 'norm_type' in cfg and cfg['norm_type'] == 'sync_bn' and not cfg.use_gpu:
+ cfg['norm_type'] = 'bn'
+
if FLAGS.slim_config:
- slim_cfg = load_config(FLAGS.slim_config)
- merge_config(slim_cfg)
+ cfg = build_slim_model(cfg, FLAGS.slim_config)
+
check.check_config(cfg)
check.check_gpu(cfg.use_gpu)
check.check_version()
- place = 'gpu:{}'.format(ParallelEnv().dev_id) if cfg.use_gpu else 'cpu'
- place = paddle.set_device(place)
-
run(FLAGS, cfg)
diff --git a/ppdet/data/tools/x2coco.py b/tools/x2coco.py
similarity index 45%
rename from ppdet/data/tools/x2coco.py
rename to tools/x2coco.py
index 53faa3f5c6c48ea3b81d65df1b18aa7aa1a1f5e1..ef2f0d7172da4f678b5405a8b56dbfe3e808a590 100644
--- a/ppdet/data/tools/x2coco.py
+++ b/tools/x2coco.py
@@ -21,6 +21,9 @@ import os
import os.path as osp
import sys
import shutil
+import xml.etree.ElementTree as ET
+from tqdm import tqdm
+import re
import numpy as np
import PIL.ImageDraw
@@ -42,18 +45,15 @@ class MyEncoder(json.JSONEncoder):
return super(MyEncoder, self).default(obj)
-def getbbox(self, points):
- polygons = points
- mask = self.polygons_to_mask([self.height, self.width], polygons)
- return self.mask2box(mask)
-
-
def images_labelme(data, num):
image = {}
image['height'] = data['imageHeight']
image['width'] = data['imageWidth']
image['id'] = num + 1
- image['file_name'] = data['imagePath'].split('/')[-1]
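+    # imagePath may use backslashes (e.g. annotations created on Windows);
+    # handle both separators when extracting the bare file name.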
+ if '\\' in data['imagePath']:
+ image['file_name'] = data['imagePath'].split('\\')[-1]
+ else:
+ image['file_name'] = data['imagePath'].split('/')[-1]
return image
@@ -154,17 +154,19 @@ def deal_json(ds_type, img_path, json_path):
categories_list.append(categories(label, labels_list))
labels_list.append(label)
label_to_num[label] = len(labels_list)
- points = shapes['points']
p_type = shapes['shape_type']
if p_type == 'polygon':
+ points = shapes['points']
annotations_list.append(
annotations_polygon(data['imageHeight'], data[
'imageWidth'], points, label, image_num,
object_num, label_to_num))
if p_type == 'rectangle':
- points.append([points[0][0], points[1][1]])
- points.append([points[1][0], points[0][1]])
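+                # Normalize the two labelme corner points so (x1, y1) is the
+                # top-left and (x2, y2) the bottom-right, then emit all four
+                # rectangle corners.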
+ (x1, y1), (x2, y2) = shapes['points']
+ x1, x2 = sorted([x1, x2])
+ y1, y2 = sorted([y1, y2])
+ points = [[x1, y1], [x2, y2], [x1, y2], [x2, y1]]
annotations_list.append(
annotations_rectangle(points, label, image_num,
object_num, label_to_num))
@@ -187,14 +189,108 @@ def deal_json(ds_type, img_path, json_path):
return data_coco
+def voc_get_label_anno(ann_dir_path, ann_ids_path, labels_path):
+ with open(labels_path, 'r') as f:
+ labels_str = f.read().split()
+ labels_ids = list(range(1, len(labels_str) + 1))
+
+ with open(ann_ids_path, 'r') as f:
+ ann_ids = f.read().split()
+ ann_paths = []
+ for aid in ann_ids:
+ if aid.endswith('xml'):
+ ann_path = os.path.join(ann_dir_path, aid)
+ else:
+ ann_path = os.path.join(ann_dir_path, aid + '.xml')
+ ann_paths.append(ann_path)
+
+ return dict(zip(labels_str, labels_ids)), ann_paths
+
+
+def voc_get_image_info(annotation_root, im_id):
+ filename = annotation_root.findtext('filename')
+ assert filename is not None
+ img_name = os.path.basename(filename)
+
+ size = annotation_root.find('size')
+ width = float(size.findtext('width'))
+ height = float(size.findtext('height'))
+
+ image_info = {
+ 'file_name': filename,
+ 'height': height,
+ 'width': width,
+ 'id': im_id
+ }
+ return image_info
+
+
+def voc_get_coco_annotation(obj, label2id):
+ label = obj.findtext('name')
+ assert label in label2id, "label is not in label2id."
+ category_id = label2id[label]
+ bndbox = obj.find('bndbox')
+ xmin = float(bndbox.findtext('xmin'))
+ ymin = float(bndbox.findtext('ymin'))
+ xmax = float(bndbox.findtext('xmax'))
+ ymax = float(bndbox.findtext('ymax'))
+ assert xmax > xmin and ymax > ymin, "Box size error."
+ o_width = xmax - xmin
+ o_height = ymax - ymin
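+    # COCO boxes are [x, y, width, height]: e.g. (illustrative) xmin=10,
+    # ymin=20, xmax=50, ymax=80 gives bbox=[10, 20, 40, 60] and area=2400.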
+ anno = {
+ 'area': o_width * o_height,
+ 'iscrowd': 0,
+ 'bbox': [xmin, ymin, o_width, o_height],
+ 'category_id': category_id,
+ 'ignore': 0,
+ }
+ return anno
+
+
+def voc_xmls_to_cocojson(annotation_paths, label2id, output_dir, output_file):
+ output_json_dict = {
+ "images": [],
+ "type": "instances",
+ "annotations": [],
+ "categories": []
+ }
+ bnd_id = 1 # bounding box start id
+ im_id = 0
+    print('Start converting!')
+ for a_path in tqdm(annotation_paths):
+ # Read annotation xml
+ ann_tree = ET.parse(a_path)
+ ann_root = ann_tree.getroot()
+
+ img_info = voc_get_image_info(ann_root, im_id)
+ output_json_dict['images'].append(img_info)
+
+ for obj in ann_root.findall('object'):
+ ann = voc_get_coco_annotation(obj=obj, label2id=label2id)
+ ann.update({'image_id': im_id, 'id': bnd_id})
+ output_json_dict['annotations'].append(ann)
+ bnd_id = bnd_id + 1
+ im_id += 1
+
+ for label, label_id in label2id.items():
+ category_info = {'supercategory': 'none', 'id': label_id, 'name': label}
+ output_json_dict['categories'].append(category_info)
+ output_file = os.path.join(output_dir, output_file)
+ with open(output_file, 'w') as f:
+ output_json = json.dumps(output_json_dict)
+ f.write(output_json)
+
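+# A hypothetical VOC-to-COCO invocation (paths are illustrative):
+#   python tools/x2coco.py --dataset_type voc \
+#       --voc_anno_dir dataset/voc/Annotations \
+#       --voc_anno_list dataset/voc/trainval.txt \
+#       --voc_label_list dataset/voc/label_list.txt \
+#       --voc_out_name voc_trainval.json --output_dir ./dataset/voc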
+
def main():
parser = argparse.ArgumentParser(
formatter_class=argparse.ArgumentDefaultsHelpFormatter)
- parser.add_argument('--dataset_type', help='the type of dataset')
+ parser.add_argument(
+ '--dataset_type',
+ help='the type of dataset, can be `voc`, `labelme` or `cityscape`')
parser.add_argument('--json_input_dir', help='input annotated directory')
parser.add_argument('--image_input_dir', help='image directory')
parser.add_argument(
- '--output_dir', help='output dataset directory', default='../../../')
+ '--output_dir', help='output dataset directory', default='./')
parser.add_argument(
'--train_proportion',
help='the proportion of train dataset',
@@ -210,96 +306,143 @@ def main():
help='the proportion of test dataset',
type=float,
default=0.0)
+ parser.add_argument(
+ '--voc_anno_dir',
+        help='In VOC format dataset, the directory containing annotation files.',
+ type=str,
+ default=None)
+ parser.add_argument(
+ '--voc_anno_list',
+        help='In VOC format dataset, path to the list of annotation file ids.',
+ type=str,
+ default=None)
+ parser.add_argument(
+ '--voc_label_list',
+        help='In VOC format dataset, path to the label list; each line is one category.',
+ type=str,
+ default=None)
+ parser.add_argument(
+ '--voc_out_name',
+ type=str,
+ default='voc.json',
+        help='In VOC format dataset, path to the output json file')
args = parser.parse_args()
try:
- assert args.dataset_type in ['labelme', 'cityscape']
- except AssertionError as e:
- print('Now only support the cityscape dataset and labelme dataset!!')
- os._exit(0)
- try:
- assert os.path.exists(args.json_input_dir)
- except AssertionError as e:
- print('The json folder does not exist!')
- os._exit(0)
- try:
- assert os.path.exists(args.image_input_dir)
- except AssertionError as e:
- print('The image folder does not exist!')
- os._exit(0)
- try:
- assert abs(args.train_proportion + args.val_proportion \
- + args.test_proportion - 1.0) < 1e-5
+ assert args.dataset_type in ['voc', 'labelme', 'cityscape']
except AssertionError as e:
print(
- 'The sum of pqoportion of training, validation and test datase must be 1!'
- )
+            'Only voc, labelme and cityscape datasets are supported now!')
os._exit(0)
- # Allocate the dataset.
- total_num = len(glob.glob(osp.join(args.json_input_dir, '*.json')))
- if args.train_proportion != 0:
- train_num = int(total_num * args.train_proportion)
- os.makedirs(args.output_dir + '/train')
- else:
- train_num = 0
- if args.val_proportion == 0.0:
- val_num = 0
- test_num = total_num - train_num
- if args.test_proportion != 0.0:
- os.makedirs(args.output_dir + '/test')
+ if args.dataset_type == 'voc':
+ assert args.voc_anno_dir and args.voc_anno_list and args.voc_label_list
+ label2id, ann_paths = voc_get_label_anno(
+ args.voc_anno_dir, args.voc_anno_list, args.voc_label_list)
+ voc_xmls_to_cocojson(
+ annotation_paths=ann_paths,
+ label2id=label2id,
+ output_dir=args.output_dir,
+ output_file=args.voc_out_name)
else:
- val_num = int(total_num * args.val_proportion)
- test_num = total_num - train_num - val_num
- os.makedirs(args.output_dir + '/val')
- if args.test_proportion != 0.0:
- os.makedirs(args.output_dir + '/test')
- count = 1
- for img_name in os.listdir(args.image_input_dir):
- if count <= train_num:
- if osp.exists(args.output_dir + '/train/'):
- shutil.copyfile(
- osp.join(args.image_input_dir, img_name),
- osp.join(args.output_dir + '/train/', img_name))
+ try:
+ assert os.path.exists(args.json_input_dir)
+ except AssertionError as e:
+ print('The json folder does not exist!')
+ os._exit(0)
+ try:
+ assert os.path.exists(args.image_input_dir)
+ except AssertionError as e:
+ print('The image folder does not exist!')
+ os._exit(0)
+ try:
+ assert abs(args.train_proportion + args.val_proportion \
+ + args.test_proportion - 1.0) < 1e-5
+ except AssertionError as e:
+ print(
+                'The sum of the proportions of the training, validation and test datasets must be 1!'
+ )
+ os._exit(0)
+
+ # Allocate the dataset.
+ total_num = len(glob.glob(osp.join(args.json_input_dir, '*.json')))
+ if args.train_proportion != 0:
+ train_num = int(total_num * args.train_proportion)
+ out_dir = args.output_dir + '/train'
+ if not os.path.exists(out_dir):
+ os.makedirs(out_dir)
+ else:
+ train_num = 0
+ if args.val_proportion == 0.0:
+ val_num = 0
+ test_num = total_num - train_num
+ out_dir = args.output_dir + '/test'
+ if args.test_proportion != 0.0 and not os.path.exists(out_dir):
+ os.makedirs(out_dir)
else:
- if count <= train_num + val_num:
- if osp.exists(args.output_dir + '/val/'):
+ val_num = int(total_num * args.val_proportion)
+ test_num = total_num - train_num - val_num
+ val_out_dir = args.output_dir + '/val'
+ if not os.path.exists(val_out_dir):
+ os.makedirs(val_out_dir)
+ test_out_dir = args.output_dir + '/test'
+ if args.test_proportion != 0.0 and not os.path.exists(test_out_dir):
+ os.makedirs(test_out_dir)
+ count = 1
+ for img_name in os.listdir(args.image_input_dir):
+ if count <= train_num:
+ if osp.exists(args.output_dir + '/train/'):
shutil.copyfile(
osp.join(args.image_input_dir, img_name),
- osp.join(args.output_dir + '/val/', img_name))
+ osp.join(args.output_dir + '/train/', img_name))
else:
- if osp.exists(args.output_dir + '/test/'):
- shutil.copyfile(
- osp.join(args.image_input_dir, img_name),
- osp.join(args.output_dir + '/test/', img_name))
- count = count + 1
-
- # Deal with the json files.
- if not os.path.exists(args.output_dir + '/annotations'):
- os.makedirs(args.output_dir + '/annotations')
- if args.train_proportion != 0:
- train_data_coco = deal_json(
- args.dataset_type, args.output_dir + '/train', args.json_input_dir)
- train_json_path = osp.join(args.output_dir + '/annotations',
- 'instance_train.json')
- json.dump(
- train_data_coco,
- open(train_json_path, 'w'),
- indent=4,
- cls=MyEncoder)
- if args.val_proportion != 0:
- val_data_coco = deal_json(args.dataset_type, args.output_dir + '/val',
- args.json_input_dir)
- val_json_path = osp.join(args.output_dir + '/annotations',
- 'instance_val.json')
- json.dump(
- val_data_coco, open(val_json_path, 'w'), indent=4, cls=MyEncoder)
- if args.test_proportion != 0:
- test_data_coco = deal_json(args.dataset_type, args.output_dir + '/test',
- args.json_input_dir)
- test_json_path = osp.join(args.output_dir + '/annotations',
- 'instance_test.json')
- json.dump(
- test_data_coco, open(test_json_path, 'w'), indent=4, cls=MyEncoder)
+ if count <= train_num + val_num:
+ if osp.exists(args.output_dir + '/val/'):
+ shutil.copyfile(
+ osp.join(args.image_input_dir, img_name),
+ osp.join(args.output_dir + '/val/', img_name))
+ else:
+ if osp.exists(args.output_dir + '/test/'):
+ shutil.copyfile(
+ osp.join(args.image_input_dir, img_name),
+ osp.join(args.output_dir + '/test/', img_name))
+ count = count + 1
+
+ # Deal with the json files.
+ if not os.path.exists(args.output_dir + '/annotations'):
+ os.makedirs(args.output_dir + '/annotations')
+ if args.train_proportion != 0:
+ train_data_coco = deal_json(args.dataset_type,
+ args.output_dir + '/train',
+ args.json_input_dir)
+ train_json_path = osp.join(args.output_dir + '/annotations',
+ 'instance_train.json')
+ json.dump(
+ train_data_coco,
+ open(train_json_path, 'w'),
+ indent=4,
+ cls=MyEncoder)
+ if args.val_proportion != 0:
+ val_data_coco = deal_json(args.dataset_type,
+ args.output_dir + '/val',
+ args.json_input_dir)
+ val_json_path = osp.join(args.output_dir + '/annotations',
+ 'instance_val.json')
+ json.dump(
+ val_data_coco,
+ open(val_json_path, 'w'),
+ indent=4,
+ cls=MyEncoder)
+ if args.test_proportion != 0:
+ test_data_coco = deal_json(args.dataset_type,
+ args.output_dir + '/test',
+ args.json_input_dir)
+ test_json_path = osp.join(args.output_dir + '/annotations',
+ 'instance_test.json')
+ json.dump(
+ test_data_coco,
+ open(test_json_path, 'w'),
+ indent=4,
+ cls=MyEncoder)
if __name__ == '__main__':