
Merge branch 'develop' of github.com:joey12300/PaddleX into develop

jack 5 years ago
parent
commit
4c63156591
55 changed files with 1852 additions and 711 deletions
  1. 1 0
      README.md
  2. 4 4
      docs/FAQ.md
  3. 1 1
      docs/apis/models/classification.md
  4. 2 2
      docs/apis/models/semantic_segmentation.md
  5. 1 1
      docs/apis/transforms/seg_transforms.md
  6. 6 4
      docs/apis/visualize.md
  7. 1 0
      docs/appendix/index.rst
  8. 14 3
      docs/appendix/interpret.md
  9. 2 2
      docs/appendix/model_zoo.md
  10. 121 0
      docs/appendix/slim_model_zoo.md
  11. 105 36
      docs/cv_solutions.md
  12. BIN
      docs/images/lime.png
  13. BIN
      docs/images/normlime.png
  14. 1 1
      docs/tutorials/deploy/deploy_lite.md
  15. 2 2
      docs/tutorials/deploy/deploy_server/deploy_cpp/deploy_cpp_linux.md
  16. 181 0
      examples/human_segmentation/README.md
  17. 314 0
      examples/human_segmentation/bg_replace.py
  18. 33 0
      examples/human_segmentation/data/download_data.py
  19. 85 0
      examples/human_segmentation/eval.py
  20. 109 0
      examples/human_segmentation/infer.py
  21. 125 0
      examples/human_segmentation/postprocess.py
  22. 40 0
      examples/human_segmentation/pretrain_weights/download_pretrain_weights.py
  23. 85 0
      examples/human_segmentation/quant_offline.py
  24. 156 0
      examples/human_segmentation/train.py
  25. 187 0
      examples/human_segmentation/video_infer.py
  26. 0 21
      new_tutorials/train/README.md
  27. 0 47
      new_tutorials/train/classification/mobilenetv2.py
  28. 0 56
      new_tutorials/train/classification/resnet50.py
  29. 0 49
      new_tutorials/train/detection/faster_rcnn_r50_fpn.py
  30. 0 48
      new_tutorials/train/detection/mask_rcnn_r50_fpn.py
  31. 0 48
      new_tutorials/train/detection/yolov3_darknet53.py
  32. 0 51
      new_tutorials/train/segmentation/deeplabv3p.py
  33. 0 47
      new_tutorials/train/segmentation/hrnet.py
  34. 0 47
      new_tutorials/train/segmentation/unet.py
  35. 1 1
      paddlex/__init__.py
  36. 12 11
      paddlex/command.py
  37. 15 117
      paddlex/convertor.py
  38. 2 5
      paddlex/cv/datasets/coco.py
  39. 5 6
      paddlex/cv/datasets/easydata_cls.py
  40. 3 3
      paddlex/cv/datasets/imagenet.py
  41. 10 9
      paddlex/cv/datasets/seg_dataset.py
  42. 49 10
      paddlex/cv/datasets/voc.py
  43. 8 7
      paddlex/cv/models/hrnet.py
  44. 1 0
      paddlex/cv/models/load_model.py
  45. 2 0
      paddlex/cv/models/slim/prune.py
  46. 32 1
      paddlex/cv/models/slim/prune_config.py
  47. 1 1
      paddlex/cv/models/utils/pretrain_weights.py
  48. 55 22
      paddlex/cv/nets/hrnet.py
  49. 2 1
      paddlex/cv/nets/segmentation/hrnet.py
  50. 15 12
      paddlex/cv/transforms/cls_transforms.py
  51. 36 21
      paddlex/cv/transforms/det_transforms.py
  52. 20 9
      paddlex/cv/transforms/seg_transforms.py
  53. 4 2
      paddlex/interpret/visualize.py
  54. 2 2
      setup.py
  55. 1 1
      tutorials/train/segmentation/fast_scnn.py

+ 1 - 0
README.md

@@ -6,6 +6,7 @@
 [![Version](https://img.shields.io/github/release/PaddlePaddle/PaddleX.svg)](https://github.com/PaddlePaddle/PaddleX/releases)
 ![python version](https://img.shields.io/badge/python-3.6+-orange.svg)
 ![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg)
+![QQGroup](https://img.shields.io/badge/QQ_Group-1045148026-52B6EF?style=social&logo=tencent-qq&logoColor=000&logoWidth=20)
 
 PaddleX is a full-pipeline deep learning development tool built on the PaddlePaddle core framework, development kits, and tool components. Its three key strengths are **full-pipeline coverage**, **integration of industry practice**, and **ease of use and integration**.
 

+ 4 - 4
docs/FAQ.md

@@ -13,7 +13,7 @@
 > You can use model pruning; see the [model pruning tutorial](slim/prune.md). By tuning the pruning parameters you can control the size of the pruned model. In practice, e.g. on VOC detection data with yolov3-mobilenet, the original model is XXM and the pruned model XX M, with essentially unchanged accuracy
 
 ## 4. How to configure the number of GPUs used during training
-> Export an environment variable in the terminal, or set it in Python code; see the doc [CPU/multi-GPU training](gpu_configure.md)
+> Export an environment variable in the terminal, or set it in Python code; see the doc [CPU/multi-GPU training](appendix/gpu_configure.md)
 
 ## 5. How to continue training from previously trained model weights
 > When calling the `train` interface, set `pretrain_weights` to the save path of the previous model
@@ -52,7 +52,7 @@
 > 1. If you are unsure how many epochs to train, set a high number and also set `save_interval_epochs`, so the model is evaluated on the validation set and saved every that many epochs; from the validation metrics at different epochs you can judge whether the model has converged, and end the training process yourself once it has
 >
 ## 9. How to speed up training with only a CPU and no GPU
-> Without a GPU, you can choose to train with multiple CPUs depending on your CPU configuration; see the doc [multi-CPU/GPU training](gpu_configure.md)
+> Without a GPU, you can choose to train with multiple CPUs depending on your CPU configuration; see the doc [multi-CPU/GPU training](appendix/gpu_configure.md)
 >
 ## 10. No internet access, so training fails when downloading the pretrained model; how to solve this
 > Prepare the pretrained model in advance by other means, then point `pretrain_weights` at it during training; see the doc [training without internet access](how_to_offline_run.md)
@@ -61,8 +61,8 @@
 > 1. This can be solved the same way as item 9  
 > 2. Set `paddlex.pretrain_dir` before each training run, e.g. `paddlex.pretrain_dir='/usrname/paddlex'`; downloaded pretrained models are then stored under `/usrname/paddlex`, and models already in that directory are not downloaded again

-## 12. The program shows "Failed to execute script PaddleX" at startup; how to solve this?
+## 12. PaddleX GUI shows "Failed to execute script PaddleX" at startup; how to solve this?
 > 1. Check whether the path to the PaddleX program on the target machine contains Chinese characters. Chinese paths are not supported yet; try moving the program to an English-only directory.
 > 2. If the system is Windows 7 or Windows Server 2012, the cause is missing DLLs that OpenCV depends on, such as MFPlat.DLL/MF.dll/MFReadWrite.dll. Install the Desktop Experience as follows: open Server Manager via "My Computer" --> "Properties" --> "Manage", click "Manage" in the top-right corner and choose "Add Roles and Features". Click "Server Selection" --> "Features", scroll to the bottom, expand "User Interfaces and Infrastructure", check "Desktop Experience" and click "Install"; after installation completes, try running PaddleX again.
 > 3. Check whether another PaddleX program or process is running on the target machine; if so, quit it or reboot the machine and see whether that solves it
-> 4. Check whether the user running the program has administrator rights; if not, try running as administrator
+> 4. Check whether the user running the program has administrator rights; if not, try running as administrator
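
For item 5 above, a minimal sketch of resuming training; the dataset paths, transforms, and model here are placeholders, not part of this commit:

```python
# Hypothetical sketch for FAQ item 5: continue training from a saved model.
import paddlex as pdx
from paddlex.cls import transforms

# Placeholder data pipeline; adjust paths and transforms to your dataset.
train_transforms = transforms.Compose(
    [transforms.RandomCrop(crop_size=224), transforms.Normalize()])
train_dataset = pdx.datasets.ImageNet(
    data_dir='my_dataset',
    file_list='my_dataset/train_list.txt',
    label_list='my_dataset/labels.txt',
    transforms=train_transforms)

model = pdx.cls.ResNet50(num_classes=len(train_dataset.labels))
model.train(
    num_epochs=10,
    train_dataset=train_dataset,
    # Point pretrain_weights at the previous run's save path to resume.
    pretrain_weights='output/resnet50/best_model',
    save_dir='output/resnet50_resume')
```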

+ 1 - 1
docs/apis/models/classification.md

@@ -80,7 +80,7 @@ predict(self, img_file, transforms=None, topk=5)
 
 ## Other classifier classes
 
-PaddleX provides 22 classifiers in total, all offering the same training `train`, evaluation `evaluate`, and prediction `predict` interfaces as `ResNet50`; for model performance see the [model zoo](../appendix/model_zoo.md).
+PaddleX provides 22 classifiers in total, all offering the same training `train`, evaluation `evaluate`, and prediction `predict` interfaces as `ResNet50`; for model performance see the [model zoo](https://paddlex.readthedocs.io/zh_CN/latest/appendix/model_zoo.html).
 
 ### ResNet18
 ```python

+ 2 - 2
docs/apis/models/semantic_segmentation.md

@@ -186,10 +186,10 @@ paddlex.seg.HRNet(num_classes=2, width=18, use_bce_loss=False, use_dice_loss=Fal
 > **Parameters**
 
 > > - **num_classes** (int): Number of classes.
-> > - **width** (int): Number of channels of the feature maps in the high-resolution branch. Default 18. Choices: [18, 30, 32, 40, 44, 48, 60, 64].
+> > - **width** (int|str): Number of channels of the feature maps in the high-resolution branch. Default 18. Choices: [18, 30, 32, 40, 44, 48, 60, 64, '18_small_v1']. '18_small_v1' is a lightweight version of 18.
 > > - **use_bce_loss** (bool): Whether to use bce loss as the network's loss function; only usable for two-class segmentation. Can be combined with dice loss. Default False.
 > > - **use_dice_loss** (bool): Whether to use dice loss as the network's loss function; only usable for two-class segmentation; can be combined with bce loss. When both use_bce_loss and use_dice_loss are False, cross-entropy loss is used. Default False.
-> > - **class_weight** (list/str): Per-class weights in the cross-entropy loss. When `class_weight` is a list, its length should be `num_classes`. When `class_weight` is a str, weight.lower() should be 'dynamic', in which case weights are recomputed each round from the per-class pixel proportions, each class weighted as: class proportion * num_classes. With the default None, every class has weight 1, i.e. the ordinary cross-entropy loss.
+> > - **class_weight** (list|str): Per-class weights in the cross-entropy loss. When `class_weight` is a list, its length should be `num_classes`. When `class_weight` is a str, weight.lower() should be 'dynamic', in which case weights are recomputed each round from the per-class pixel proportions, each class weighted as: class proportion * num_classes. With the default None, every class has weight 1, i.e. the ordinary cross-entropy loss.
 > > - **ignore_index** (int): Value to ignore in the label; pixels labeled `ignore_index` do not contribute to the loss. Default 255.
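
A minimal instantiation sketch for the constructor above, using the newly documented lightweight width option (values illustrative):

```python
import paddlex as pdx

# '18_small_v1' selects the lightweight HRNet variant described above.
model = pdx.seg.HRNet(num_classes=2, width='18_small_v1')
```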
 
 ### train API

+ 1 - 1
docs/apis/transforms/seg_transforms.md

@@ -200,7 +200,7 @@ ComposedSegTransforms.add_augmenters(augmenters)
 import paddlex as pdx
 from paddlex.seg import transforms
 train_transforms = transforms.ComposedSegTransforms(mode='train', train_crop_size=[512, 512])
-eval_transforms = transforms.ComposedYOLOTransforms(mode='eval')
+eval_transforms = transforms.ComposedSegTransforms(mode='eval')
 
 # Add data augmentation
 import imgaug.augmenters as iaa

+ 6 - 4
docs/apis/visualize.md

@@ -146,10 +146,11 @@ paddlex.interpret.normlime(img_file,
                            dataset=None,
                            num_samples=3000, 
                            batch_size=50,
-                           save_dir='./')
+                           save_dir='./',
+                           normlime_weights_file=None)
 ```
 Visualize the interpretability of model predictions with the NormLIME algorithm.
-NormLIME uses a number of samples to produce a global explanation. NormLIME precomputes LIME results for a number of test samples, then normalizes the weights of identical features, thereby obtaining a global input-output relationship
+NormLIME uses a number of samples to produce a global explanation. Since NormLIME is computationally expensive, a simplified approach is used here: a number of test samples (currently all test samples by default) are each feature-extracted and mapped into a common feature space; a linear regression is then fitted with these features as input and the model outputs as output, yielding a global input-output relationship. When explaining a test sample, this global NormLIME explanation is used to filter the LIME result, making the final visualization more stable
 
 **Note:** interpretability visualization currently supports classification models only.
 
@@ -159,9 +160,10 @@ NormLIME uses a number of samples to produce a global explanation. NormLIME
 >* **dataset** (paddlex.datasets): Dataset reader. Default None.
 >* **num_samples** (int): Number of samples LIME draws to learn the linear model. Default 3000.
 >* **batch_size** (int): Batch size for prediction. Default 50.
->* **save_dir** (str): Directory for the interpretability visualizations (saved as png files) and intermediate files.
+>* **save_dir** (str): Directory for the interpretability visualizations (saved as png files) and intermediate files.
+>* **normlime_weights_file** (str): NormLIME initialization file name; if it does not exist, it is computed once and saved to this path, otherwise it is loaded directly.
 
-**Note:** `dataset` reads a dataset; it should not be too large (or computation will take a long time), but it should contain data of all classes.
+**Note:** `dataset` reads a dataset; it should not be too large (or computation will take a long time), but it should contain data of all classes. NormLIME interpretability visualization currently supports classification models only.
 ### Example
 > See the [code](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/interpret/normlime.py) for how prediction interpretability is visualized.
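
A minimal sketch of the call documented above; the model path, test image, and dataset paths are placeholders:

```python
# Hedged normlime usage sketch; all paths are placeholders.
import paddlex as pdx

model = pdx.load_model('output/mobilenetv2/best_model')
eval_dataset = pdx.datasets.ImageNet(
    data_dir='my_dataset',
    file_list='my_dataset/val_list.txt',
    label_list='my_dataset/labels.txt')
pdx.interpret.normlime(
    'test.jpg',
    model,
    dataset=eval_dataset,
    num_samples=3000,
    batch_size=50,
    save_dir='./interpret_out',
    # Computed and saved on the first run, loaded directly afterwards.
    normlime_weights_file='normlime_weights.npy')
```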
 

+ 1 - 0
docs/appendix/index.rst

@@ -7,6 +7,7 @@
    :caption: Contents:
 
    model_zoo.md
+   slim_model_zoo.md
    metrics.md
    interpret.md
    parameters.md

+ 14 - 3
docs/appendix/interpret.md

@@ -20,9 +20,20 @@ LIME usage can be found in the [code example](https://github.com/PaddlePaddle/Paddl
 ## NormLIME
 NormLIME is an improvement on LIME. LIME's explanation is local, specific to the current sample, whereas NormLIME uses a number of samples to give a global explanation of the current sample, which also has a denoising effect. Its steps are as follows:  
 1. Download the Kmeans model parameters and the parameters of the first three layers of the ResNet50_vc network. (The ResNet50_vc parameters come from a network trained on ImageNet; using ImageNet images as the dataset, each image's average feature and centroid feature at every superpixel location are extracted from ResNet50_vc's third-layer output, and the Kmeans model here is trained on them)  
-2. Compute the LIME result for every image in the test set. (If there is no test set, the validation set can be used instead)  
-3. Cluster all pixels of all images with the Kmeans model.  
-4. Normalize the weights of superpixels in the same cluster (the same feature) to obtain each superpixel's weight, and use this to explain the model.  
+2. Compute the normlime weight information from the test set (if there is no test set, the validation set can be used instead):  
+    For each image:
+    (1) Get the image's superpixels.
+    (2) Extract third-layer features with ResNet50_vc and, for each superpixel location, combine the centroid feature and the mean feature into `F`.  
+    (3) Feed `F` into the Kmeans model to compute the cluster center of each superpixel location.  
+    (4) Predict the image's `label` with the trained classification model.  
+    Across all images:  
+    (1) Build a logistic regression function `regression_func` whose input is, per image, the vector of cluster-center indicators (1 if a cluster center appears in that image, 0 otherwise),
+        and whose output is the predicted `label`.  
+    (2) From `regression_func`, obtain each cluster center's weight for each class, and normalize the weights.  
+3. Use the Kmeans model to get the cluster center of each superpixel of the image to be visualized.  
+4. Randomly mask the superpixels of the image to be visualized to construct new images.   
+5. Predict a label for each constructed image with the prediction model.  
+6. Using the normlime weight information, each superpixel can receive several different weights; take the highest as its final weight, and use this to explain the model.   
 
 NormLIME usage can be found in the [code example](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/interpret/normlime.py) and the [api reference](../apis/visualize.html#normlime). The `num_samples` parameter matters most: it is the number of random samples drawn in step 2 above; if set too small the explanation becomes unstable, and if set too large step 3 takes a long time. `batch_size` is the prediction batch size in step 3; if set too small step 3 takes a long time, and its upper bound depends on the machine. `dataset` is the data built from the test or validation set.  
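
As a toy illustration of step 2 (not PaddleX internals), the global regression and weight normalization might look like the following, assuming scikit-learn and random stand-in data:

```python
# Toy sketch of step 2: fit predicted labels from cluster-center indicator
# vectors, then normalize each cluster center's per-class weights.
import numpy as np
from sklearn.linear_model import LogisticRegression

num_images, num_centers, num_classes = 200, 64, 10
X = np.random.randint(0, 2, size=(num_images, num_centers))  # 1 if a center appears in the image
y = np.random.randint(0, num_classes, size=num_images)       # labels predicted by the classifier
reg = LogisticRegression(max_iter=1000).fit(X, y)
w = np.abs(reg.coef_)                          # shape (num_classes, num_centers)
norm_w = w / w.sum(axis=1, keepdims=True)      # normalized per-class weights
```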
 

+ 2 - 2
docs/appendix/model_zoo.md

@@ -40,8 +40,8 @@
 |[FasterRCNN-ResNet101](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r101_1x.tar)| 212.5MB | 582.911 | 38.3 |
 |[FasterRCNN-ResNet50-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_fpn_1x.tar)| 167.7MB | 83.189 | 37.2 |
 |[FasterRCNN-ResNet50_vd-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r50_vd_fpn_2x.tar)|167.8MB | 128.277 | 38.9 |
-|[FasterRCNN-ResNet101-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r101_fpn_1x.tar)| 244.2MB | 156.097 | 38.7 |
-|[FasterRCNN-ResNet101_vd-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r101_vd_fpn_2x.tar) |244.3MB | 119.788 | 40.5 |
+|[FasterRCNN-ResNet101-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r101_fpn_1x.tar)| 244.2MB | 119.788 | 38.7 |
+|[FasterRCNN-ResNet101_vd-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_r101_vd_fpn_2x.tar) |244.3MB | 156.097 | 40.5 |
 |[FasterRCNN-HRNet_W18-FPN](https://paddlemodels.bj.bcebos.com/object_detection/faster_rcnn_hrnetv2p_w18_1x.tar) |115.5MB | 81.592 | 36 |
 |[YOLOv3-DarkNet53](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_darknet.tar)|249.2MB | 42.672 | 38.9 |
 |[YOLOv3-MobileNetV1](https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1.tar) |99.2MB | 15.442 | 29.3 |

+ 121 - 0
docs/appendix/slim_model_zoo.md

@@ -0,0 +1,121 @@
+# PaddleX Compressed Model Zoo
+
+## Image Classification
+
+Dataset: ImageNet-1000
+
+### Quantization
+
+| Model | Compression strategy | Top-1 accuracy | Model size | TensorRT latency (V100, ms) |
+|:--:|:---:|:--:|:--:|:--:|
+|MobileNetV1| None |70.99%| 17MB | -|
+|MobileNetV1| Quantized |70.18% (-0.81%)| 4.4MB | - |
+| MobileNetV2 | None |72.15%| 15MB | - |
+| MobileNetV2 | Quantized | 71.15% (-1%)| 4.0MB | - |
+|ResNet50| None |76.50%| 99MB | 2.71 |
+|ResNet50| Quantized |76.33% (-0.17%)| 25.1MB | 1.19 |
+
+Classification model Paddle Lite latency (ms)
+
+| Device | Model | Compression strategy | armv7 Thread 1 | armv7 Thread 2 | armv7 Thread 4 | armv8 Thread 1 | armv8 Thread 2 | armv8 Thread 4 |
+| ------- | ----------- | ------------- | -------------- | -------------- | -------------- | -------------- | -------------- | -------------- |
+| Snapdragon 835 | MobileNetV1 | None | 96.1942 | 53.2058 | 32.4468 | 88.4955 | 47.95 | 27.5189 |
+| Snapdragon 835 | MobileNetV1 | Quantized | 60.5615 | 32.4016 | 16.6596 | 56.5266 | 29.7178 | 15.1459 |
+| Snapdragon 835 | MobileNetV2 | None | 65.715 | 38.1346 | 25.155 | 61.3593 | 36.2038 | 22.849 |
+| Snapdragon 835 | MobileNetV2 | Quantized | 48.3495 | 30.3069 | 22.1506 | 45.8715 | 27.4105 | 18.2223 |
+| Snapdragon 835 | ResNet50 | None | 526.811 | 319.6486 | 205.8345 | 506.1138 | 335.1584 | 214.8936 |
+| Snapdragon 835 | ResNet50 | Quantized | 476.0507 | 256.5963 | 139.7266 | 461.9176 | 248.3795 | 149.353 |
+| Snapdragon 855 | MobileNetV1 | None | 33.5086 | 19.5773 | 11.7534 | 31.3474 | 18.5382 | 10.0811 |
+| Snapdragon 855 | MobileNetV1 | Quantized | 37.0498 | 21.7081 | 11.0779 | 14.0947 | 8.1926 | 4.2934 |
+| Snapdragon 855 | MobileNetV2 | None | 25.0396 | 15.2862 | 9.6609 | 22.909 | 14.1797 | 8.8325 |
+| Snapdragon 855 | MobileNetV2 | Quantized | 28.1631 | 18.3917 | 11.8333 | 16.9399 | 11.1772 | 7.4176 |
+| Snapdragon 855 | ResNet50 | None | 185.3705 | 113.0825 | 87.0741 | 177.7367 | 110.0433 | 74.4114 |
+| Snapdragon 855 | ResNet50 | Quantized | 328.2683 | 201.9937 | 106.744 | 242.6397 | 150.0338 | 79.8659 |
+| Kirin 970 | MobileNetV1 | None | 101.2455 | 56.4053 | 35.6484 | 94.8985 | 51.7251 | 31.9511 |
+| Kirin 970 | MobileNetV1 | Quantized | 62.4412 | 32.2585 | 16.6215 | 57.825 | 29.2573 | 15.1206 |
+| Kirin 970 | MobileNetV2 | None | 70.4176 | 42.0795 | 25.1939 | 68.9597 | 39.2145 | 22.6617 |
+| Kirin 970 | MobileNetV2 | Quantized | 53.0961 | 31.7987 | 21.8334 | 49.383 | 28.2358 | 18.3642 |
+| Kirin 970 | ResNet50 | None | 586.8943 | 344.0858 | 228.2293 | 573.3344 | 351.4332 | 225.8006 |
+| Kirin 970 | ResNet50 | Quantized | 489.6188 | 258.3279 | 142.6063 | 480.0064 | 249.5339 | 138.5284 |
+
+### Pruning
+
+Paddle Lite inference time notes:
+
+Environment: Qualcomm Snapdragon 845 + armv8
+
+Speed metric: Thread1/Thread2/Thread4 latency
+
+
+| Model | Compression strategy | Top-1 | Model size |Paddle Lite latency|TensorRT speed (FPS)|
+|:--:|:---:|:--:|:--:|:--:|:--:|
+| MobileNetV1 |    None    |         70.99%         |       17MB       | 66.052\35.8014\19.5762|-|
+| MobileNetV1 | Pruned -30% |  70.4% (-0.59%)  |       12MB       | 46.5958\25.3098\13.6982|-|
+| MobileNetV1 | Pruned -50% | 69.8% (-1.19%) |       9MB        | 37.9892\20.7882\11.3144|-|
+
+## Object Detection
+
+### Quantization
+
+Dataset: COCO2017
+
+|              Model              |  Compression strategy   | Dataset | Image/GPU | Box AP (608 input) | Model size |   TensorRT latency (V100, ms) |
+| :----------------------------: | :---------: | :----: | :-------: | :------------: | :------------: | :----------: |
+|      MobileNet-V1-YOLOv3       | None |  COCO  |     8     |      29.3      |        95MB       |  - |
+|      MobileNet-V1-YOLOv3       | Quantized  |  COCO  |     8     |     27.9 (-1.4)|        25MB       | -  |
+|      R34-YOLOv3                | None |  COCO  |     8     |      36.2      |        162MB       |  - |
+|      R34-YOLOv3                | Quantized  |  COCO  |     8     | 35.7 (-0.5)    |        42.7MB      |  - |
+
+### Pruning
+
+Dataset: Pascal VOC & COCO2017
+
+Paddle Lite inference time notes:
+
+Environment: Qualcomm Snapdragon 845 + armv8
+
+Speed metric: Thread1/Thread2/Thread4 latency
+
+|              Model              |     Compression strategy      |   Dataset   | Image/GPU | Box mmAP (608 input) | Model size | Paddle Lite latency (ms) (608*608) | TensorRT speed (FPS) (608*608) |
+| :----------------------------: | :---------------: | :--------: | :-------: | :------------: | :----------: | :--------------: | :--------------: |
+|      MobileNet-V1-YOLOv3       | None     | Pascal VOC |     8     |      76.2      |      94MB      | 1238\796.943\520.101|60.04|
+|      MobileNet-V1-YOLOv3       | Pruned -52.88% | Pascal VOC |     8     |  77.6 (+1.4)   |      31MB      | 602.497\353.759\222.427 |99.36|
+|      MobileNet-V1-YOLOv3       | None     |    COCO    |     8     |      29.3      |      95MB      |-|-|
+|      MobileNet-V1-YOLOv3       | Pruned -51.77% |    COCO    |     8     |  26.0 (-3.3)   |      32MB      |-|73.93|
+
+## Semantic Segmentation
+
+Dataset: Cityscapes
+
+
+### Quantization
+
+|          Model          |  Compression strategy   |     mIoU      | Model size |
+| :--------------------: | :---------: | :-----------: | :------------: |
+| DeepLabv3-MobileNetv2 | None |     69.81     |      7.4MB       |
+| DeepLabv3-MobileNetv2 | Quantized  | 67.59 (-2.22) |      2.1MB       |
+
+Segmentation model Paddle Lite latency (ms), input size 769 x 769
+
+| Device | Model | Compression strategy | armv7 Thread 1 | armv7 Thread 2 | armv7 Thread 4 | armv8 Thread 1 | armv8 Thread 2 | armv8 Thread 4 |
+| ------- | ---------------------- | ------------- | -------------- | -------------- | -------------- | -------------- | -------------- | -------------- |
+| Snapdragon 835 | Deeplabv3-MobileNetV2  | None | 1282.8126      | 793.2064       | 653.6538       | 1193.9908      | 737.1827       | 593.4522       |
+| Snapdragon 835 | Deeplabv3-MobileNetV2  | Quantized    | 981.44         | 658.4969       | 538.6166       | 885.3273       | 586.1284       | 484.0018       |
+| Snapdragon 855 | Deeplabv3-MobileNetV2  | None | 639.4425       | 390.1851       | 322.7014       | 477.7667       | 339.7411       | 262.2847       |
+| Snapdragon 855 | Deeplabv3-MobileNetV2  | Quantized    | 705.7589       | 474.4076       | 427.2951       | 394.8352       | 297.4035       | 264.6724       |
+| Kirin 970 | Deeplabv3-MobileNetV2  | None | 1771.1301      | 1746.0569      | 1222.4805      | 1448.9739      | 1192.4491      | 760.606        |
+| Kirin 970 | Deeplabv3-MobileNetV2  | Quantized    | 1320.386       | 918.5328       | 672.2481       | 1020.753       | 820.094        | 591.4114       |
+
+### Pruning
+
+Paddle Lite inference time notes:
+
+Environment: Qualcomm Snapdragon 845 + armv8
+
+Speed metric: Thread1/Thread2/Thread4 latency
+
+
+|   Model    |     Compression method      |     mIoU      | Model size | Paddle Lite latency | TensorRT speed (FPS) |
+| :-------: | :---------------: | :-----------: | :------: | :------------: | :----: |
+| FastSCNN | None     |     69.64     |       11MB       | 1226.36\682.96\415.664 |39.53|
+| FastSCNN | Pruned -47.60% | 66.68 (-2.96) |      5.7MB       | 866.693\494.467\291.748 |51.48|

+ 105 - 36
docs/cv_solutions.md

@@ -1,63 +1,132 @@
 # PaddleX Computer Vision Solutions  
 
-PaddleX currently provides solutions for 4 vision tasks: image classification, object detection, instance segmentation, and semantic segmentation. Users can pick what they need by task type
+For the 4 vision tasks of image classification, object detection, instance segmentation, and semantic segmentation, PaddleX provides solutions covering model selection, compression-strategy selection, and deployment-option selection. Users pick a suitable model for their needs, choose a compression strategy to shrink the model's computation and storage footprint and speed up inference, and finally choose a deployment option to deploy the model on mobile devices or servers
 
-## Image Classification
+## Model Selection
+
+### Image Classification
 An image classification task takes an image as input and the model predicts its category, e.g. scenery, animal, or car.
 
 ![](./images/image_classification.png)
 
-For image classification, PaddleX provides Baidu-improved models for different application scenarios, as shown below
+For image classification, PaddleX provides Baidu-improved models for different application scenarios, as shown below:
+> GPU inference speeds were measured with the PaddlePaddle Python inference API (GPU: Nvidia Tesla P40).
+> CPU inference speeds (test CPU model: TBD).
+> Snapdragon 855 speeds were measured on a phone with a Snapdragon 855 processor.
+> Model input size 224 x 224; Top-1 accuracy evaluated on ImageNet-1000.
 
-|    Model    | Model size | GPU speed | CPU speed | ARM speed | Accuracy | Notes |
-| :--------- | :------  | :---------- | :-----------| :-------------  | :----- | :--- |
-| MobileNetV3_small_ssld | 12M | - | - | - | 71.3% | For mobile scenarios |
-| MobileNetV3_large_ssld | 21M | - | - | - | 79.0% | For mobile/server scenarios |
-| ResNet50_vd_ssld | 102.8MB | - | - | - | 82.4% | For server scenarios |
-| ResNet101_vd_ssld | 179.2MB | - | - | - |83.7% | For server scenarios |
+|    Model    |  Characteristics | Model size | GPU latency (ms) | CPU (x86) latency (ms) | Snapdragon 855 (ARM) latency (ms) | Top-1 accuracy |
+| :--------- | :------  | :---------- | :-----------| :-------------  | :-------------  |:--- |
+| MobileNetV3_small_ssld | Lightweight and fast; for real-time mobile scenarios that prioritize speed | 12.5MB | 7.08837 | - | 6.546 | 71.3% |
+| ShuffleNetV2 | Lightweight with relatively lower accuracy; for real-time mobile scenarios that need an even smaller model | 10.2MB | 15.40 | - | 10.941 | 68.8% |
+| MobileNetV3_large_ssld | Lightweight; little storage advantage, but balanced speed and accuracy; for mobile scenarios | 22.8MB | 8.06651 | - | 19.803 | 79.0% |
+| MobileNetV2 | Lightweight; for mobile scenarios using GPU inference | 15.0MB | 5.92667 | - | 23.318| 72.2% |
+| ResNet50_vd_ssld | High-accuracy model with short inference time; for most server scenarios | 103.5MB | 7.79264 | - | - | 82.4% |
+| ResNet101_vd_ssld | Very high-accuracy model with relatively long inference time; for server scenarios with large data volumes | 180.5MB | 13.34580 | - | -| 83.7% |
+| Xception65 | Very high-accuracy model with longer inference time and higher accuracy on larger data volumes; for server scenarios | 161.6MB | 13.87017 | - | - | 80.3% |
 
-Beyond the models above, PaddleX supports nearly 20 image classification models; see the [PaddleX model zoo](../appendix/model_zoo.md) for the list
+Including the models above, PaddleX supports nearly 20 image classification models; see the [PaddleX model zoo](../appendix/model_zoo.md) for the rest
 
 
-## Object Detection
+### Object Detection
 An object detection task takes an image as input and the model identifies the positions of objects in it (marked with rectangular boxes, with box coordinates given) and their categories, e.g. detecting appearance defects in quality inspection of phone parts.
 
 ![](./images/object_detection.png)
 
 For object detection, PaddleX provides the mainstream YOLOv3 and Faster-RCNN models for different application scenarios, as shown below
-
-|   Model   | Model size  | GPU speed | CPU speed |ARM speed | BoxMAP | Notes |
-| :------- | :-------  | :---------  | :---------- | :-------------  | :----- | :--- |
-| YOLOv3-MobileNetV1 | 101.2M | - | - | - | 29.3 | |
-| YOLOv3-MobileNetV3 | 94.6M | - | - | - | 31.6 | |
-| YOLOv3-ResNet34 | 169.7M | - | - | - | 36.2 | |
-| YOLOv3-DarkNet53 | 252.4 | - | - | - | 38.9 | |
-
-Beyond YOLOv3, PaddleX also supports the FasterRCNN model with the FPN structure and 5 backbone networks; see the [PaddleX model zoo](../appendix/model_zoo.md) for details
-
-## Instance Segmentation
+> GPU inference speeds were measured with the PaddlePaddle Python inference API (GPU: Nvidia Tesla P40).
+> CPU inference speeds (test CPU model: TBD).
+> Snapdragon 855 speeds were measured on a phone with a Snapdragon 855 processor.
+> Input size 608 x 608 for YOLOv3 and 800 x 1333 for FasterRCNN; Box mmAP evaluated on COCO2017.
+
+|   Model   | Characteristics | Model size  | GPU latency | CPU (x86) latency (ms) | Snapdragon 855 (ARM) latency (ms) | Box mmAP |
+| :------- | :-------  | :---------  | :---------- | :-------------  | :-------------  |:--- |
+| YOLOv3-MobileNetV3_large | For mobile scenarios that prioritize fast inference | 100.7MB | 143.322 | - | - | 31.6 |
+| YOLOv3-MobileNetV1 | Relatively lower accuracy; for server scenarios that prioritize fast inference | 99.2MB| 15.422 | - | - | 29.3 |
+| YOLOv3-DarkNet53 | Good balance of inference speed and accuracy; for most server scenarios| 249.2MB | 42.672 | - | - | 38.9 |
+| FasterRCNN-ResNet50-FPN | Classic two-stage detector with relatively slow inference; for server scenarios that prioritize accuracy | 167.7MB | 83.189 | - | -| 37.2 |
+| FasterRCNN-HRNet_W18-FPN | For server scenarios that are sensitive to image resolution and need finer object detail | 115.5MB | 81.592 | - | - | 36 |
+| FasterRCNN-ResNet101_vd-FPN | Very high-accuracy model with longer inference time and higher accuracy on larger data volumes; for server scenarios | 244.3MB | 156.097 | - | - | 40.5 |
+
+Beyond the models above, YOLOv3 and Faster RCNN support other backbones; see the [PaddleX model zoo](../appendix/model_zoo.md) for details
+
+### Instance Segmentation
 In object detection, the model identifies object positions and categories. Instance segmentation goes further with pixel-level classification, identifying the pixels inside each box that belong to the target object.
 
 ![](./images/instance_segmentation.png)
 
 PaddleX currently provides the MaskRCNN instance segmentation model with 5 different backbone networks; see the [PaddleX model zoo](../appendix/model_zoo.md) for details
-
-|  Model | Model size | GPU speed | CPU speed | ARM speed | BoxMAP | SegMAP | Notes |
-| :---- | :------- | :---------- | :---------- | :-------------  | :----- | :----- | :--- |
-| MaskRCNN-ResNet50_vd-FPN | 185.5M | - | - | - | 39.8 | 35.4 | |
-| MaskRCNN-ResNet101_vd-FPN | 268.6M | - | - | - | 41.4 | 36.8 | |
-
-
-## Semantic Segmentation
+> GPU inference speeds were measured with the PaddlePaddle Python inference API (GPU: Nvidia Tesla P40).
+> CPU inference speeds (test CPU model: TBD).
+> Snapdragon 855 speeds were measured on a phone with a Snapdragon 855 processor.
+> Input size 800 x 1333 for MaskRCNN; Box mmAP and Seg mmAP evaluated on COCO2017.
+
+|  Model | Characteristics | Model size | GPU latency | CPU (x86) latency (ms) | Snapdragon 855 (ARM) latency (ms) | Box mmAP | Seg mmAP |
+| :---- | :------- | :---------- | :---------- | :----- | :----- | :--- |:--- |
+| MaskRCNN-HRNet_W18-FPN | For server scenarios that are sensitive to image resolution and need finer object detail | - | - | - | - | 37.0 | 33.4 |
+| MaskRCNN-ResNet50-FPN | High accuracy; for most server scenarios| 185.5M | - | - | - | 37.9 | 34.2 |
+| MaskRCNN-ResNet101_vd-FPN | Higher accuracy with longer inference time and higher accuracy on larger data volumes; for server scenarios | 268.6M | - | - | - | 41.4 | 36.8 |
+
+### Semantic Segmentation
 Semantic segmentation classifies an image at the pixel level, used in scenarios such as human segmentation and remote-sensing image recognition.  
 
 ![](./images/semantic_segmentation.png)
 
 For semantic segmentation, PaddleX also offers different models for different application scenarios, as shown below
+> GPU inference speeds were measured with the PaddlePaddle Python inference API (GPU: Nvidia Tesla P40).
+> CPU inference speeds (test CPU model: TBD).
+> Snapdragon 855 speeds were measured on a phone with a Snapdragon 855 processor.
+> Model input size 1024 x 2048; mIOU evaluated on Cityscapes.
+
+| Model | Characteristics | Model size | GPU latency | CPU (x86) latency (ms) | Snapdragon 855 (ARM) latency (ms) | mIOU |
+| :---- | :------- | :---------- | :---------- | :----- | :----- |:--- |
+| DeepLabv3p-MobileNetV2_x1.0 | Lightweight model for mobile scenarios| - | - | - | - | 69.8% |
+| HRNet_W18_Small_v1 | Lightweight and fast; for mobile scenarios | - | - | - | - | - |
+| FastSCNN | Lightweight and fast; for mobile or server scenarios that prioritize fast inference | - | - | - | - | 69.64 |
+| HRNet_W18 | High-accuracy model; for server scenarios that are sensitive to image resolution and need finer object detail| - | - | - | - | 79.36 |
+| DeepLabv3p-Xception65 | Higher accuracy with longer inference time and higher accuracy on larger data volumes; for server scenarios with complex backgrounds| - | - | - | - | 79.3% |
+
+## Compression Strategy Selection
+
+PaddleX provides model compression strategies, including model pruning and fixed-point quantization, to reduce a model's computation and storage footprint and speed up inference after deployment. The accuracy and speed of image classification, object detection, and semantic segmentation models under different compression strategies are detailed below; choose the strategy that fits your needs to further optimize model performance. A hedged quantization sketch follows the support note at the end of this section.
+
+| Compression strategy | Characteristics |
+| :---- | :------- |
+| Quantization  | Significantly reduces model storage size; suited to mobile deployment or server-side TensorRT deployment; gives a clear speedup for MobileNet-series models on mobile |
+| Pruning | Removes redundant parameters, markedly reducing computation and model size and improving inference performance; suited to CPU or mobile deployment (no obvious speedup on GPU) |
+| Pruning then quantization | Can further improve inference performance; suited to mobile or server-side TensorRT deployment |
+
+### Performance comparison
+
+* Each metric below is formatted as XXX/YYY, where XXX is the metric without compression and YYY the metric after compression
+* Classification accuracy is Top-1 accuracy on ImageNet-1000 (input 224x224); detection accuracy is mmAP on COCO2017 (input 608x608); segmentation accuracy is mIOU on Cityscapes (input 769x769)
+* For quantization, the Paddle Lite inference environment is Qualcomm Snapdragon 855 + armv8, and the speed metric is Thread4 latency
+* For pruning, the Paddle Lite inference environment is Qualcomm Snapdragon 845 + armv8, and the speed metric is Thread4 latency
+
+
+| Model | Compression strategy | Model size (MB) | Accuracy (%) | Paddle Lite latency (ms) |
+| :--: | :------: | :------: | :----: | :----------------: |
+| MobileNetV1 | Quantized | 17/4.4 | 70.99/70.18 | 10.0811/4.2934 |
+| MobileNetV1 | Pruned -30% | 17/12 | 70.99/70.4 | 19.5762/13.6982 |
+| YOLOv3-MobileNetV1 | Quantized | 95/25 | 29.3/27.9 | - |
+| YOLOv3-MobileNetV1 | Pruned -51.77% | 95/25 | 29.3/26 | - |
+| Deeplabv3-MobileNetV2 | Quantized | 7.4/1.8 | 63.26/62.03 | 593.4522/484.0018 |
+| FastSCNN | Pruned -47.60% | 11/5.7 | 69.64/66.68 | 415.664/291.748 |
+
+See the [PaddleX compressed model zoo](appendix/slim_model_zoo.md) for before/after metrics of more models on different devices
+
+See [model compression](tutorials/compress) for the detailed compression workflow
+
+**Note: all image classification and semantic segmentation models in PaddleX support quantization and pruning; among detection models, only YOLOv3 does.**
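
As referenced above, a hedged sketch of the offline quantization workflow, mirroring this commit's examples/human_segmentation/quant_offline.py; the `pdx.slim.export_quant_model` name and signature are assumptions and may differ across PaddleX versions:

```python
# Hedged sketch: offline quantization of a trained segmentation model.
# export_quant_model is an assumed API name; check your PaddleX version.
import paddlex as pdx
from paddlex.seg import transforms

model = pdx.load_model('output/best_model')
quant_transforms = transforms.Compose(
    [transforms.Resize([192, 192]), transforms.Normalize()])
quant_dataset = pdx.datasets.SegDataset(
    data_dir='data/mini_supervisely',
    file_list='data/mini_supervisely/val.txt',
    transforms=quant_transforms)
pdx.slim.export_quant_model(
    model, quant_dataset, batch_size=2, batch_num=10,
    save_dir='output/quant_offline')
```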
+
+## Model Deployment
+
+PaddleX provides 5 deployment options: server-side Python, server-side C++, server-side encrypted deployment, OpenVINO, and mobile. Choose the option that fits your needs and follow the links below for the detailed workflows.
 
-| Model | Model size | GPU speed | CPU speed | ARM speed | mIOU | Notes |
-| :---- | :------- | :---------- | :---------- | :-------------  | :----- | :----- |
-| DeepLabv3p-MobileNetV2_x0.25 | | - | - | - | - | - |
-| DeepLabv3p-MobileNetV2_x1.0 | | - | - | - | - | - |
-| DeepLabv3p-Xception65 | | - | - | - | - | - |
-| UNet | | - | - | - | - | - |
+| Deployment option | Workflow |
+| :------: | :------: |
+| Server-side Python deployment | [workflow](tutorials/deploy/deploy_server/deploy_python.html)|
+| Server-side C++ deployment | [workflow](tutorials/deploy/deploy_server/deploy_cpp/) |
+| Server-side encrypted deployment | [workflow](tutorials/deploy/deploy_server/encryption.html) |
+| OpenVINO deployment | [workflow](tutorials/deploy/deploy_openvino.html) |
+| Mobile deployment | [workflow](tutorials/deploy/deploy_lite.html) |

BIN
docs/images/lime.png


BIN
docs/images/normlime.png


+ 1 - 1
docs/tutorials/deploy/deploy_lite.md

@@ -21,7 +21,7 @@ step 2: Export the PaddleX model as an inference model
 step 3: Convert the inference model into a Paddle Lite model
 
 ```
-python /path/to/PaddleX/deploy/lite/export_lite.py --model_dir /path/to/inference_model --save_file /path/to/onnx_model --place place/to/run
+python /path/to/PaddleX/deploy/lite/export_lite.py --model_dir /path/to/inference_model --save_file /path/to/lite_model --place place/to/run
 
 ```
 

+ 2 - 2
docs/tutorials/deploy/deploy_server/deploy_cpp/deploy_cpp_linux.md

@@ -30,7 +30,7 @@ PaddlePaddle C++ inference libraries come in variants for different `CPU`, `CUDA`, and Tens
 | ubuntu14.04_cuda10.0_cudnn7_avx_mkl  | [fluid_inference.tgz](https://paddle-inference-lib.bj.bcebos.com/1.8.2-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz ) |
 | ubuntu14.04_cuda10.1_cudnn7.6_avx_mkl_trt6  | [fluid_inference.tgz](https://paddle-inference-lib.bj.bcebos.com/1.8.2-gpu-cuda10.1-cudnn7.6-avx-mkl-trt6%2Ffluid_inference.tgz) |
 
-For more and newer versions, download as appropriate:  [C++ inference library download list](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_guide/inference_deployment/inference/windows_cpp_inference.html#id1)
+For more and newer versions, download as appropriate:  [C++ inference library download list](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html)
 
 After downloading and extracting, the `/root/projects/fluid_inference` directory contains:
 ```
@@ -42,7 +42,7 @@ fluid_inference
 └── version.txt # version and build info
 ```
 
-**Note:** all prebuilt packages except `nv-jetson-cuda10-cudnn7.5-trt5` are built with `GCC 4.8.5`; using a newer `GCC` may cause `ABI` compatibility issues, so consider downgrading or [building the inference library yourself](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html#id12).
+**Note:** all prebuilt packages except `nv-jetson-cuda10-cudnn7.5-trt5` are built with `GCC 4.8.5`; using a newer `GCC` may cause `ABI` compatibility issues, so consider downgrading or [building the inference library yourself](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html#id12).
 
 
 ### Step4: Build

+ 181 - 0
examples/human_segmentation/README.md

@@ -0,0 +1,181 @@
+# HumanSeg Human Segmentation Models
+
+Built on PaddleX's core segmentation networks, this tutorial provides an end-to-end guide for human segmentation scenarios, covering pretrained models, fine-tuning, and video segmentation inference and deployment.
+
+## Installation
+
+**Prerequisites**
+* paddlepaddle >= 1.8.0
+* python >= 3.5
+
+```
+pip install paddlex -i https://mirror.baidu.com/pypi/simple
+```
+For installation issues see [PaddleX installation](https://paddlex.readthedocs.io/zh_CN/latest/install.html)
+
+## Pretrained Models
+HumanSeg releases two pretrained models trained on large-scale human image data, covering a range of use cases
+
+| Model type | Checkpoint Parameter | Inference Model | Quant Inference Model | Notes |
+| --- | --- | --- | ---| --- |
+| HumanSeg-server  | [humanseg_server_params](https://paddlex.bj.bcebos.com/humanseg/models/humanseg_server.pdparams) | [humanseg_server_inference](https://paddlex.bj.bcebos.com/humanseg/models/humanseg_server_inference.zip) | -- | High-accuracy model for server-side GPU and complex-background portrait scenes; structure Deeplabv3+/Xception65, input size (512, 512) |
+| HumanSeg-mobile | [humanseg_mobile_params](https://paddlex.bj.bcebos.com/humanseg/models/humanseg_mobile.pdparams) | [humanseg_mobile_inference](https://paddlex.bj.bcebos.com/humanseg/models/humanseg_mobile_inference.zip) | [humanseg_mobile_quant](https://paddlex.bj.bcebos.com/humanseg/models/humanseg_mobile_quant.zip) | Lightweight model for front-camera scenes on mobile or server-side CPU; structure HRNet_w18_small_v1, input size (192, 192)  |
+
+
+Model performance
+
+| Model | Model size | Latency |
+| --- | --- | --- |
+|humanseg_server_inference| 158M | - |
+|humanseg_mobile_inference | 5.8M | 42.35ms |
+|humanseg_mobile_quant | 1.6M | 24.93ms |
+
+Latency measurement environment: Xiaomi phone, CPU: Snapdragon 855, RAM: 6GB, image size: 192*192
+
+
+**NOTE:**
+* Checkpoint Parameter contains the model weights, used for fine-tuning.
+
+* Inference Model and Quant Inference Model are deployment models, containing the `__model__` computation graph, the `__params__` model parameters, and `model.yaml` basic model configuration.
+
+* Inference Model is for server-side CPU and GPU inference deployment; Quant Inference Model is the quantized version, for edge-side deployment (e.g. mobile) via Paddle Lite.
+
+Run the following script to download the HumanSeg pretrained models
+```bash
+python pretrain_weights/download_pretrain_weights.py
+```
+
+## Download Test Data
+We provide **Supervisely Persons**, a human segmentation dataset released by [supervise.ly](https://supervise.ly/); a small random subset has been converted into a format PaddleX can load directly. Run the code below for a quick download; it also includes `video_test.mp4`, a portrait test video shot with a phone's front camera.
+
+```bash
+python data/download_data.py
+```
+
+## Quick Start: Video Human Segmentation
+Fuses predictions from the DIS (Dense Inverse Search-based method) optical flow algorithm with the segmentation results to improve human segmentation on video streams
+```bash
+# Real-time segmentation from the computer's camera
+python video_infer.py --model_dir pretrain_weights/humanseg_mobile_inference
+
+# Segment a video of a person
+python video_infer.py --model_dir pretrain_weights/humanseg_mobile_inference --video_path data/video_test.mp4
+```
+
+Video segmentation result:
+
+<img src="https://paddleseg.bj.bcebos.com/humanseg/data/video_test.gif" width="20%" height="20%"><img src="https://paddleseg.bj.bcebos.com/humanseg/data/result.gif" width="20%" height="20%">
+
+Replace the background with one of your choice; the background can be an image or a video.
+```bash
+# Real-time background replacement from the computer's camera; a background video can also be passed via '--background_video_path'
+python bg_replace.py --model_dir pretrain_weights/humanseg_mobile_inference --background_image_path data/background.jpg
+
+# Background replacement on a video of a person; a background video can also be passed via '--background_video_path'
+python bg_replace.py --model_dir pretrain_weights/humanseg_mobile_inference --video_path data/video_test.mp4 --background_image_path data/background.jpg
+
+# Background replacement on a single image
+python bg_replace.py --model_dir pretrain_weights/humanseg_mobile_inference --image_path data/human_image.jpg --background_image_path data/background.jpg
+
+```
+
+Background replacement result:
+
+<img src="https://paddleseg.bj.bcebos.com/humanseg/data/video_test.gif" width="20%" height="20%"><img src="https://paddleseg.bj.bcebos.com/humanseg/data/bg_replace.gif" width="20%" height="20%">
+
+
+**NOTE**:
+
+Processing a video takes several minutes; please be patient.
+
+The provided models target portrait-orientation phone-camera footage; results on landscape footage will be somewhat worse.
+
+## Training
+Use the command below to fine-tune from the pretrained models; make sure the chosen model structure `model_type` matches the model weights `pretrain_weights`.
+```bash
+# Select the GPU card (card 0 as an example)
+export CUDA_VISIBLE_DEVICES=0
+# To run without a GPU, set CUDA_VISIBLE_DEVICES to empty
+# export CUDA_VISIBLE_DEVICES=
+python train.py --model_type HumanSegMobile \
+--save_dir output/ \
+--data_dir data/mini_supervisely \
+--train_list data/mini_supervisely/train.txt \
+--val_list data/mini_supervisely/val.txt \
+--pretrain_weights pretrain_weights/humanseg_mobile_params \
+--batch_size 8 \
+--learning_rate 0.001 \
+--num_epochs 10 \
+--image_shape 192 192
+```
+Parameter meanings:
+* `--model_type`: Model type; one of HumanSegServer and HumanSegMobile
+* `--save_dir`: Path to save the model
+* `--data_dir`: Dataset root directory
+* `--train_list`: Training list file of the dataset
+* `--val_list`: Validation list file of the dataset
+* `--pretrain_weights`: Pretrained model path
+* `--batch_size`: Batch size
+* `--learning_rate`: Initial learning rate
+* `--num_epochs`: Number of training epochs
+* `--image_shape`: Network input image size (w, h)
+
+For more command-line help run:
+```bash
+python train.py --help
+```
+**NOTE**
+Swap the `--model_type` value together with the matching `--pretrain_weights` to quickly try different models.
+
+## Evaluation
+Evaluate with:
+```bash
+python eval.py --model_dir output/best_model \
+--data_dir data/mini_supervisely \
+--val_list data/mini_supervisely/val.txt \
+--image_shape 192 192
+```
+Parameter meanings:
+* `--model_dir`: Model path
+* `--data_dir`: Dataset root directory
+* `--val_list`: Validation list file of the dataset
+* `--image_shape`: Network input image size (w, h)
+
+## Prediction
+Run prediction with the command below; results are saved to `./output/result/` by default.
+```bash
+python infer.py --model_dir output/best_model \
+--data_dir data/mini_supervisely \
+--test_list data/mini_supervisely/test.txt \
+--save_dir output/result \
+--image_shape 192 192
+```
+Parameter meanings:
+* `--model_dir`: Model path
+* `--data_dir`: Dataset root directory
+* `--test_list`: Test list file of the dataset
+* `--image_shape`: Network input image size (w, h)
+
+## Model Export
+```bash
+paddlex --export_inference --model_dir output/best_model \
+--save_dir output/export
+```
+Parameter meanings:
+* `--model_dir`: Model path
+* `--save_dir`: Path to save the exported model
+
+## Offline Quantization
+```bash
+python quant_offline.py --model_dir output/best_model \
+--data_dir data/mini_supervisely \
+--quant_list data/mini_supervisely/val.txt \
+--save_dir output/quant_offline \
+--image_shape 192 192
+```
+Parameter meanings:
+* `--model_dir`: Path of the model to quantize
+* `--data_dir`: Dataset root directory
+* `--quant_list`: Quantization list file; usually just the training or validation list
+* `--save_dir`: Path to save the quantized model
+* `--image_shape`: Network input image size (w, h)

+ 314 - 0
examples/human_segmentation/bg_replace.py

@@ -0,0 +1,314 @@
+# coding: utf8
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import os
+import os.path as osp
+import cv2
+import numpy as np
+
+from postprocess import postprocess, threshold_mask
+import paddlex as pdx
+import paddlex.utils.logging as logging
+from paddlex.seg import transforms
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(
+        description='HumanSeg background replacement')
+    parser.add_argument(
+        '--model_dir',
+        dest='model_dir',
+        help='Model path for inference',
+        type=str)
+    parser.add_argument(
+        '--image_path',
+        dest='image_path',
+        help='Image including human',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--background_image_path',
+        dest='background_image_path',
+        help='Background image for replacing',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--video_path',
+        dest='video_path',
+        help='Video path for inference',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--background_video_path',
+        dest='background_video_path',
+        help='Background video path for replacing',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--save_dir',
+        dest='save_dir',
+        help='The directory for saving the inference results',
+        type=str,
+        default='./output')
+    parser.add_argument(
+        "--image_shape",
+        dest="image_shape",
+        help="The image shape for net inputs.",
+        nargs=2,
+        default=[192, 192],
+        type=int)
+
+    return parser.parse_args()
+
+
+def bg_replace(label_map, img, bg):
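+    """Blend img over bg per pixel, using label_map as the person weight (1 = person)."""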
+    h, w, _ = img.shape
+    bg = cv2.resize(bg, (w, h))
+    label_map = np.repeat(label_map[:, :, np.newaxis], 3, axis=2)
+    comb = (label_map * img + (1 - label_map) * bg).astype(np.uint8)
+    return comb
+
+
+def recover(img, im_info):
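+    """Resize or crop the prediction back to the original frame size recorded in im_info."""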
+    if im_info[0] == 'resize':
+        w, h = im_info[1][1], im_info[1][0]
+        img = cv2.resize(img, (w, h), cv2.INTER_LINEAR)
+    elif im_info[0] == 'padding':
+        w, h = im_info[1][1], im_info[1][0]
+        img = img[0:h, 0:w, :]
+    return img
+
+
+def infer(args):
+    resize_h = args.image_shape[1]
+    resize_w = args.image_shape[0]
+
+    test_transforms = transforms.Compose([transforms.Normalize()])
+    model = pdx.load_model(args.model_dir)
+
+    if not osp.exists(args.save_dir):
+        os.makedirs(args.save_dir)
+
+    # Image background replacement
+    if args.image_path is not None:
+        if not osp.exists(args.image_path):
+            raise Exception('The --image_path is not existed: {}'.format(
+                args.image_path))
+        if args.background_image_path is None:
+            raise Exception(
+                'The --background_image_path is not set. Please set it')
+        else:
+            if not osp.exists(args.background_image_path):
+                raise Exception(
+                    'The --background_image_path is not existed: {}'.format(
+                        args.background_image_path))
+
+        img = cv2.imread(args.image_path)
+        im_shape = img.shape
+        im_scale_x = float(resize_w) / float(im_shape[1])
+        im_scale_y = float(resize_h) / float(im_shape[0])
+        im = cv2.resize(
+            img,
+            None,
+            None,
+            fx=im_scale_x,
+            fy=im_scale_y,
+            interpolation=cv2.INTER_LINEAR)
+        image = im.astype('float32')
+        im_info = ('resize', im_shape[0:2])
+        pred = model.predict(image, test_transforms)
+        label_map = pred['label_map']
+        label_map = recover(label_map, im_info)
+        bg = cv2.imread(args.background_image_path)
+        save_name = osp.basename(args.image_path)
+        save_path = osp.join(args.save_dir, save_name)
+        result = bg_replace(label_map, img, bg)
+        cv2.imwrite(save_path, result)
+
+    # Video background replacement: use the background video if provided, otherwise the background image
+    else:
+        is_video_bg = False
+        if args.background_video_path is not None:
+            if not osp.exists(args.background_video_path):
+                raise Exception(
+                    'The --background_video_path is not existed: {}'.format(
+                        args.background_video_path))
+            is_video_bg = True
+        elif args.background_image_path is not None:
+            if not osp.exists(args.background_image_path):
+                raise Exception(
+                    'The --background_image_path is not existed: {}'.format(
+                        args.background_image_path))
+        else:
+            raise Exception(
+                'Please provide a background image or video: set --background_image_path or --background_video_path'
+            )
+
+        disflow = cv2.DISOpticalFlow_create(
+            cv2.DISOPTICAL_FLOW_PRESET_ULTRAFAST)
+        prev_gray = np.zeros((resize_h, resize_w), np.uint8)
+        prev_cfd = np.zeros((resize_h, resize_w), np.float32)
+        is_init = True
+        if args.video_path is not None:
+            logging.info('Please wait. It is computing......')
+            if not osp.exists(args.video_path):
+                raise Exception('The --video_path is not existed: {}'.format(
+                    args.video_path))
+
+            cap_video = cv2.VideoCapture(args.video_path)
+            fps = cap_video.get(cv2.CAP_PROP_FPS)
+            width = int(cap_video.get(cv2.CAP_PROP_FRAME_WIDTH))
+            height = int(cap_video.get(cv2.CAP_PROP_FRAME_HEIGHT))
+            save_name = osp.basename(args.video_path)
+            save_name = save_name.split('.')[0]
+            save_path = osp.join(args.save_dir, save_name + '.avi')
+
+            cap_out = cv2.VideoWriter(
+                save_path,
+                cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'), fps,
+                (width, height))
+
+            if is_video_bg:
+                cap_bg = cv2.VideoCapture(args.background_video_path)
+                frames_bg = cap_bg.get(cv2.CAP_PROP_FRAME_COUNT)
+                current_frame_bg = 1
+            else:
+                img_bg = cv2.imread(args.background_image_path)
+            while cap_video.isOpened():
+                ret, frame = cap_video.read()
+                if ret:
+                    im_shape = frame.shape
+                    im_scale_x = float(resize_w) / float(im_shape[1])
+                    im_scale_y = float(resize_h) / float(im_shape[0])
+                    im = cv2.resize(
+                        frame,
+                        None,
+                        None,
+                        fx=im_scale_x,
+                        fy=im_scale_y,
+                        interpolation=cv2.INTER_LINEAR)
+                    image = im.astype('float32')
+                    im_info = ('resize', im_shape[0:2])
+                    pred = model.predict(image, test_transforms)
+                    score_map = pred['score_map']
+                    cur_gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
+                    cur_gray = cv2.resize(cur_gray, (resize_w, resize_h))
+                    score_map = 255 * score_map[:, :, 1]
+                    optflow_map = postprocess(cur_gray, score_map, prev_gray, prev_cfd, \
+                                              disflow, is_init)
+                    prev_gray = cur_gray.copy()
+                    prev_cfd = optflow_map.copy()
+                    is_init = False
+                    optflow_map = cv2.GaussianBlur(optflow_map, (3, 3), 0)
+                    optflow_map = threshold_mask(
+                        optflow_map, thresh_bg=0.2, thresh_fg=0.8)
+                    score_map = recover(optflow_map, im_info)
+
+                    # Loop over the background frames
+                    if is_video_bg:
+                        ret_bg, frame_bg = cap_bg.read()
+                        if ret_bg:
+                            if current_frame_bg == frames_bg:
+                                current_frame_bg = 1
+                                cap_bg.set(cv2.CAP_PROP_POS_FRAMES, 0)
+                        else:
+                            break
+                        current_frame_bg += 1
+                        comb = bg_replace(score_map, frame, frame_bg)
+                    else:
+                        comb = bg_replace(score_map, frame, img_bg)
+
+                    cap_out.write(comb)
+                else:
+                    break
+
+            if is_video_bg:
+                cap_bg.release()
+            cap_video.release()
+            cap_out.release()
+
+        # If neither an input image nor a video is given, open the camera
+        else:
+            cap_video = cv2.VideoCapture(0)
+            if not cap_video.isOpened():
+                raise IOError("Error opening video stream or file: check "
+                              "whether --video_path exists ({}) "
+                              "or the camera is working".format(
+                                  args.video_path))
+
+            if is_video_bg:
+                cap_bg = cv2.VideoCapture(args.background_video_path)
+                frames_bg = cap_bg.get(cv2.CAP_PROP_FRAME_COUNT)
+                current_frame_bg = 1
+            else:
+                img_bg = cv2.imread(args.background_image_path)
+            while cap_video.isOpened():
+                ret, frame = cap_video.read()
+                if ret:
+                    im_shape = frame.shape
+                    im_scale_x = float(resize_w) / float(im_shape[1])
+                    im_scale_y = float(resize_h) / float(im_shape[0])
+                    im = cv2.resize(
+                        frame,
+                        None,
+                        None,
+                        fx=im_scale_x,
+                        fy=im_scale_y,
+                        interpolation=cv2.INTER_LINEAR)
+                    image = im.astype('float32')
+                    im_info = ('resize', im_shape[0:2])
+                    pred = model.predict(image, test_transforms)
+                    score_map = pred['score_map']
+                    cur_gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
+                    cur_gray = cv2.resize(cur_gray, (resize_w, resize_h))
+                    score_map = 255 * score_map[:, :, 1]
+                    optflow_map = postprocess(cur_gray, score_map, prev_gray, prev_cfd, \
+                                              disflow, is_init)
+                    prev_gray = cur_gray.copy()
+                    prev_cfd = optflow_map.copy()
+                    is_init = False
+                    optflow_map = cv2.GaussianBlur(optflow_map, (3, 3), 0)
+                    optflow_map = threshold_mask(
+                        optflow_map, thresh_bg=0.2, thresh_fg=0.8)
+                    score_map = recover(optflow_map, im_info)
+
+                    # Loop over the background frames
+                    if is_video_bg:
+                        ret_bg, frame_bg = cap_bg.read()
+                        if ret_bg:
+                            if current_frame_bg == frames_bg:
+                                current_frame_bg = 1
+                                cap_bg.set(cv2.CAP_PROP_POS_FRAMES, 0)
+                        else:
+                            break
+                        current_frame_bg += 1
+                        comb = bg_replace(score_map, frame, frame_bg)
+                    else:
+                        comb = bg_replace(score_map, frame, img_bg)
+                    cv2.imshow('HumanSegmentation', comb)
+                    if cv2.waitKey(1) & 0xFF == ord('q'):
+                        break
+                else:
+                    break
+            if is_video_bg:
+                cap_bg.release()
+            cap_video.release()
+
+
+if __name__ == "__main__":
+    args = parse_args()
+    infer(args)

+ 33 - 0
examples/human_segmentation/data/download_data.py

@@ -0,0 +1,33 @@
+# Copyright (c) 2020  PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License"
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import sys
+import os
+
+LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
+
+import paddlex as pdx
+
+
+def download_data(savepath):
+    url = "https://paddleseg.bj.bcebos.com/humanseg/data/mini_supervisely.zip"
+    pdx.utils.download_and_decompress(url=url, path=savepath)
+
+    url = "https://paddleseg.bj.bcebos.com/humanseg/data/video_test.zip"
+    pdx.utils.download_and_decompress(url=url, path=savepath)
+
+
+if __name__ == "__main__":
+    download_data(LOCAL_PATH)
+    print("Data download finish!")

+ 85 - 0
examples/human_segmentation/eval.py

@@ -0,0 +1,85 @@
+# coding: utf8
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import paddlex as pdx
+import paddlex.utils.logging as logging
+from paddlex.seg import transforms
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(description='HumanSeg evaluation')
+    parser.add_argument(
+        '--model_dir',
+        dest='model_dir',
+        help='Model path for evaluating',
+        type=str,
+        default='output/best_model')
+    parser.add_argument(
+        '--data_dir',
+        dest='data_dir',
+        help='The root directory of dataset',
+        type=str)
+    parser.add_argument(
+        '--val_list',
+        dest='val_list',
+        help='Val list file of dataset',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--batch_size',
+        dest='batch_size',
+        help='Mini batch size',
+        type=int,
+        default=128)
+    parser.add_argument(
+        "--image_shape",
+        dest="image_shape",
+        help="The image shape for net inputs.",
+        nargs=2,
+        default=[192, 192],
+        type=int)
+    return parser.parse_args()
+
+
+def dict2str(dict_input):
+    out = ''
+    for k, v in dict_input.items():
+        try:
+            v = round(float(v), 6)
+        except:
+            pass
+        out = out + '{}={}, '.format(k, v)
+    return out.strip(', ')
+
+
+def evaluate(args):
+    eval_transforms = transforms.Compose(
+        [transforms.Resize(args.image_shape), transforms.Normalize()])
+
+    eval_dataset = pdx.datasets.SegDataset(
+        data_dir=args.data_dir,
+        file_list=args.val_list,
+        transforms=eval_transforms)
+
+    model = pdx.load_model(args.model_dir)
+    metrics = model.evaluate(eval_dataset, args.batch_size)
+    logging.info('[EVAL] Finished, {} .'.format(dict2str(metrics)))
+
+
+if __name__ == '__main__':
+    args = parse_args()
+
+    evaluate(args)

+ 109 - 0
examples/human_segmentation/infer.py

@@ -0,0 +1,109 @@
+# coding: utf8
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import os
+import os.path as osp
+import cv2
+import numpy as np
+import tqdm
+
+import paddlex as pdx
+from paddlex.seg import transforms
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(
+        description='HumanSeg prediction and visualization')
+    parser.add_argument(
+        '--model_dir',
+        dest='model_dir',
+        help='Model path for prediction',
+        type=str)
+    parser.add_argument(
+        '--data_dir',
+        dest='data_dir',
+        help='The root directory of dataset',
+        type=str)
+    parser.add_argument(
+        '--test_list',
+        dest='test_list',
+        help='Test list file of dataset',
+        type=str)
+    parser.add_argument(
+        '--save_dir',
+        dest='save_dir',
+        help='The directory for saving the inference results',
+        type=str,
+        default='./output/result')
+    parser.add_argument(
+        "--image_shape",
+        dest="image_shape",
+        help="The image shape for net inputs.",
+        nargs=2,
+        default=[192, 192],
+        type=int)
+    return parser.parse_args()
+
+
+def infer(args):
+    def makedir(path):
+        sub_dir = osp.dirname(path)
+        if not osp.exists(sub_dir):
+            os.makedirs(sub_dir)
+
+    test_transforms = transforms.Compose(
+        [transforms.Resize(args.image_shape), transforms.Normalize()])
+    model = pdx.load_model(args.model_dir)
+    added_saved_path = osp.join(args.save_dir, 'added')
+    mat_saved_path = osp.join(args.save_dir, 'mat')
+    scoremap_saved_path = osp.join(args.save_dir, 'scoremap')
+
+    with open(args.test_list, 'r') as f:
+        files = f.readlines()
+
+    for file in tqdm.tqdm(files):
+        file = file.strip()
+        im_file = osp.join(args.data_dir, file)
+        im = cv2.imread(im_file)
+        result = model.predict(im_file, transforms=test_transforms)
+
+        # save added image
+        added_image = pdx.seg.visualize(
+            im_file, result, weight=0.6, save_dir=None)
+        added_image_file = osp.join(added_saved_path, file)
+        makedir(added_image_file)
+        cv2.imwrite(added_image_file, added_image)
+
+        # save score map
+        score_map = result['score_map'][:, :, 1]
+        score_map = (score_map * 255).astype(np.uint8)
+        score_map_file = osp.join(scoremap_saved_path, file)
+        makedir(score_map_file)
+        cv2.imwrite(score_map_file, score_map)
+
+        # save mat image
+        score_map = np.expand_dims(score_map, axis=-1)
+        mat_image = np.concatenate([im, score_map], axis=2)
+        mat_file = osp.join(mat_saved_path, file)
+        ext = osp.splitext(mat_file)[-1]
+        mat_file = mat_file.replace(ext, '.png')
+        makedir(mat_file)
+        cv2.imwrite(mat_file, mat_image)
+
+
+if __name__ == '__main__':
+    args = parse_args()
+    infer(args)

+ 125 - 0
examples/human_segmentation/postprocess.py

@@ -0,0 +1,125 @@
+# coding: utf8
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import numpy as np
+
+
+def cal_optical_flow_tracking(pre_gray, cur_gray, prev_cfd, dl_weights,
+                              disflow):
+    """计算光流跟踪匹配点和光流图
+    输入参数:
+        pre_gray: 上一帧灰度图
+        cur_gray: 当前帧灰度图
+        prev_cfd: 上一帧光流图
+        dl_weights: 融合权重图
+        disflow: 光流数据结构
+    返回值:
+        is_track: 光流点跟踪二值图,即是否具有光流点匹配
+        track_cfd: 光流跟踪图
+    """
+    check_thres = 8
+    h, w = pre_gray.shape[:2]
+    track_cfd = np.zeros_like(prev_cfd)
+    is_track = np.zeros_like(pre_gray)
+    flow_fw = disflow.calc(pre_gray, cur_gray, None)
+    flow_bw = disflow.calc(cur_gray, pre_gray, None)
+    flow_fw = np.round(flow_fw).astype(np.int)
+    flow_bw = np.round(flow_bw).astype(np.int)
+    y_list = np.array(range(h))
+    x_list = np.array(range(w))
+    yv, xv = np.meshgrid(y_list, x_list)
+    yv, xv = yv.T, xv.T
+    cur_x = xv + flow_fw[:, :, 0]
+    cur_y = yv + flow_fw[:, :, 1]
+
+    # Do not track points that flow out of bounds
+    not_track = (cur_x < 0) + (cur_x >= w) + (cur_y < 0) + (cur_y >= h)
+    flow_bw[~not_track] = flow_bw[cur_y[~not_track], cur_x[~not_track]]
+    not_track += (np.square(flow_fw[:, :, 0] + flow_bw[:, :, 0]) +
+                  np.square(flow_fw[:, :, 1] + flow_bw[:, :, 1])
+                  ) >= check_thres
+    track_cfd[cur_y[~not_track], cur_x[~not_track]] = prev_cfd[~not_track]
+
+    is_track[cur_y[~not_track], cur_x[~not_track]] = 1
+
+    not_flow = np.all(np.abs(flow_fw) == 0,
+                      axis=-1) * np.all(np.abs(flow_bw) == 0, axis=-1)
+    dl_weights[cur_y[not_flow], cur_x[not_flow]] = 0.05
+    return track_cfd, is_track, dl_weights
+
+
+def fuse_optical_flow_tracking(track_cfd, dl_cfd, dl_weights, is_track):
+    """光流追踪图和人像分割结构融合
+    输入参数:
+        track_cfd: 光流追踪图
+        dl_cfd: 当前帧分割结果
+        dl_weights: 融合权重图
+        is_track: 光流点匹配二值图
+    返回
+        cur_cfd: 光流跟踪图和人像分割结果融合图
+    """
+    fusion_cfd = dl_cfd.copy()
+    is_track = is_track.astype(bool)
+    fusion_cfd[is_track] = dl_weights[is_track] * dl_cfd[is_track] + (
+        1 - dl_weights[is_track]) * track_cfd[is_track]
+    # pixels where the segmentation is confident (clearly foreground or background)
+    index_certain = ((dl_cfd > 0.9) + (dl_cfd < 0.1)) * is_track
+    index_less01 = (dl_weights < 0.1) * index_certain
+    fusion_cfd[index_less01] = 0.3 * dl_cfd[index_less01] + 0.7 * track_cfd[
+        index_less01]
+    index_larger09 = (dl_weights >= 0.1) * index_certain
+    fusion_cfd[index_larger09] = 0.4 * dl_cfd[
+        index_larger09] + 0.6 * track_cfd[index_larger09]
+    return fusion_cfd
+
+
+def threshold_mask(img, thresh_bg, thresh_fg):
+    """Linearly map gray values between the background and foreground thresholds to [0, 1]."""
+    dst = (img / 255.0 - thresh_bg) / (thresh_fg - thresh_bg)
+    dst[np.where(dst > 1)] = 1
+    dst[np.where(dst < 0)] = 0
+    return dst.astype(np.float32)
+
+
+def postprocess(cur_gray, scoremap, prev_gray, pre_cfd, disflow, is_init):
+    """光流优化
+    Args:
+        cur_gray : 当前帧灰度图
+        pre_gray : 前一帧灰度图
+        pre_cfd  :前一帧融合结果
+        scoremap : 当前帧分割结果
+        difflow  : 光流
+        is_init : 是否第一帧
+    Returns:
+        fusion_cfd : 光流追踪图和预测结果融合图
+    """
+    h, w = scoremap.shape
+    cur_cfd = scoremap.copy()
+
+    if is_init:
+        if h <= 64 or w <= 64:
+            disflow.setFinestScale(1)
+        elif h <= 160 or w <= 160:
+            disflow.setFinestScale(2)
+        else:
+            disflow.setFinestScale(3)
+        fusion_cfd = cur_cfd
+    else:
+        weights = np.ones((h, w), np.float32) * 0.3
+        track_cfd, is_track, weights = cal_optical_flow_tracking(
+            prev_gray, cur_gray, pre_cfd, weights, disflow)
+        fusion_cfd = fuse_optical_flow_tracking(track_cfd, cur_cfd, weights,
+                                                is_track)
+
+    return fusion_cfd

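Taken together, these helpers form a per-frame temporal smoothing loop: propagate the previous confidence map with DIS optical flow, fuse it with the current segmentation, then threshold into an alpha matte. A minimal sketch of the wiring (mirroring `video_infer.py` below; `frames` and `get_score_map` are hypothetical stand-ins for a frame source and the model's per-pixel foreground score):

```python
import cv2
import numpy as np
from postprocess import postprocess, threshold_mask

disflow = cv2.DISOpticalFlow_create(cv2.DISOPTICAL_FLOW_PRESET_ULTRAFAST)
prev_gray = np.zeros((192, 192), np.uint8)
prev_cfd = np.zeros((192, 192), np.float32)
is_init = True

for frame in frames:  # frames: iterable of 192x192 BGR images (placeholder)
    cur_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    score_map = 255 * get_score_map(frame)  # foreground scores in [0, 1] (placeholder)
    # fuse the tracked previous confidence map with the current prediction
    optflow_map = postprocess(cur_gray, score_map, prev_gray, prev_cfd,
                              disflow, is_init)
    prev_gray, prev_cfd, is_init = cur_gray.copy(), optflow_map.copy(), False
    optflow_map = cv2.GaussianBlur(optflow_map, (3, 3), 0)
    alpha = threshold_mask(optflow_map, thresh_bg=0.2, thresh_fg=0.8)
```
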
+ 40 - 0
examples/human_segmentation/pretrain_weights/download_pretrain_weights.py

@@ -0,0 +1,40 @@
+# coding: utf8
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+
+LOCAL_PATH = os.path.dirname(os.path.abspath(__file__))
+
+import paddlex as pdx
+
+model_urls = {
+    "PaddleX_HumanSeg_Server_Params":
+    "https://bj.bcebos.com/paddlex/models/humanseg/humanseg_server_params.tar",
+    "PaddleX_HumanSeg_Server_Inference":
+    "https://bj.bcebos.com/paddlex/models/humanseg/humanseg_server_inference.tar",
+    "PaddleX_HumanSeg_Mobile_Params":
+    "https://bj.bcebos.com/paddlex/models/humanseg/humanseg_mobile_params.tar",
+    "PaddleX_HumanSeg_Mobile_Inference":
+    "https://bj.bcebos.com/paddlex/models/humanseg/humanseg_mobile_inference.tar",
+    "PaddleX_HumanSeg_Mobile_Quant":
+    "https://bj.bcebos.com/paddlex/models/humanseg/humanseg_mobile_quant.tar"
+}
+
+if __name__ == "__main__":
+    for url in model_urls.values():
+        pdx.utils.download_and_decompress(url=url, path=LOCAL_PATH)
+    print("Pretrained Model download success!")

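If only one of these bundles is needed, the same utility can be called directly; a minimal sketch using one URL from the table above:

```python
import paddlex as pdx

# download and unpack a single pretrained-weights bundle into the current directory
pdx.utils.download_and_decompress(
    url="https://bj.bcebos.com/paddlex/models/humanseg/humanseg_mobile_params.tar",
    path="./")
```
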
+ 85 - 0
examples/human_segmentation/quant_offline.py

@@ -0,0 +1,85 @@
+# coding: utf8
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import paddlex as pdx
+from paddlex.seg import transforms
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(description='HumanSeg offline quantization')
+    parser.add_argument(
+        '--model_dir',
+        dest='model_dir',
+        help='Model path for quant',
+        type=str,
+        default='output/best_model')
+    parser.add_argument(
+        '--batch_size',
+        dest='batch_size',
+        help='Mini batch size',
+        type=int,
+        default=1)
+    parser.add_argument(
+        '--batch_nums',
+        dest='batch_nums',
+        help='Batch number for quant',
+        type=int,
+        default=10)
+    parser.add_argument(
+        '--data_dir',
+        dest='data_dir',
+        help='The root directory of the dataset',
+        type=str)
+    parser.add_argument(
+        '--quant_list',
+        dest='quant_list',
+        help='Image file list for model quantization, it can be val.txt or train.txt',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--save_dir',
+        dest='save_dir',
+        help='The directory for saving the quant model',
+        type=str,
+        default='./output/quant_offline')
+    parser.add_argument(
+        "--image_shape",
+        dest="image_shape",
+        help="The image shape for net inputs.",
+        nargs=2,
+        default=[192, 192],
+        type=int)
+    return parser.parse_args()
+
+
+def quantize(args):
+    eval_transforms = transforms.Compose(
+        [transforms.Resize(args.image_shape), transforms.Normalize()])
+
+    eval_dataset = pdx.datasets.SegDataset(
+        data_dir=args.data_dir,
+        file_list=args.quant_list,
+        transforms=eval_transforms)
+
+    model = pdx.load_model(args.model_dir)
+    pdx.slim.export_quant_model(model, eval_dataset, args.batch_size,
+                                args.batch_nums, args.save_dir)
+
+
+if __name__ == '__main__':
+    args = parse_args()
+
+    quantize(args)

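A typical invocation, assuming a trained model under `output/best_model` and a calibration image list (all paths illustrative):

```
python quant_offline.py --model_dir output/best_model --data_dir data --quant_list data/val.txt --save_dir output/quant_offline
```

Offline quantization only calibrates on `--batch_nums` batches of `--batch_size` images drawn from `--quant_list`; no retraining is performed.
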
+ 156 - 0
examples/human_segmentation/train.py

@@ -0,0 +1,156 @@
+# coding: utf8
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+
+import paddlex as pdx
+from paddlex.seg import transforms
+
+MODEL_TYPE = ['HumanSegMobile', 'HumanSegServer']
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(description='HumanSeg training')
+    parser.add_argument(
+        '--model_type',
+        dest='model_type',
+        help="Model type for traing, which is one of ('HumanSegMobile', 'HumanSegServer')",
+        type=str,
+        default='HumanSegMobile')
+    parser.add_argument(
+        '--data_dir',
+        dest='data_dir',
+        help='The root directory of the dataset',
+        type=str)
+    parser.add_argument(
+        '--train_list',
+        dest='train_list',
+        help='Train list file of dataset',
+        type=str)
+    parser.add_argument(
+        '--val_list',
+        dest='val_list',
+        help='Val list file of dataset',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--save_dir',
+        dest='save_dir',
+        help='The directory for saving the model snapshot',
+        type=str,
+        default='./output')
+    parser.add_argument(
+        '--num_classes',
+        dest='num_classes',
+        help='Number of classes',
+        type=int,
+        default=2)
+    parser.add_argument(
+        "--image_shape",
+        dest="image_shape",
+        help="The image shape for net inputs.",
+        nargs=2,
+        default=[192, 192],
+        type=int)
+    parser.add_argument(
+        '--num_epochs',
+        dest='num_epochs',
+        help='Number of epochs for training',
+        type=int,
+        default=100)
+    parser.add_argument(
+        '--batch_size',
+        dest='batch_size',
+        help='Mini batch size',
+        type=int,
+        default=128)
+    parser.add_argument(
+        '--learning_rate',
+        dest='learning_rate',
+        help='Learning rate',
+        type=float,
+        default=0.01)
+    parser.add_argument(
+        '--pretrain_weights',
+        dest='pretrain_weights',
+        help='The path of pretrained weights',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--resume_checkpoint',
+        dest='resume_checkpoint',
+        help='The path of resume checkpoint',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--use_vdl',
+        dest='use_vdl',
+        help='Whether to use visualdl',
+        action='store_true')
+    parser.add_argument(
+        '--save_interval_epochs',
+        dest='save_interval_epochs',
+        help='The epoch interval for saving a model snapshot',
+        type=int,
+        default=5)
+
+    return parser.parse_args()
+
+
+def train(args):
+    train_transforms = transforms.Compose([
+        transforms.Resize(args.image_shape), transforms.RandomHorizontalFlip(),
+        transforms.Normalize()
+    ])
+
+    eval_transforms = transforms.Compose(
+        [transforms.Resize(args.image_shape), transforms.Normalize()])
+
+    train_dataset = pdx.datasets.SegDataset(
+        data_dir=args.data_dir,
+        file_list=args.train_list,
+        transforms=train_transforms,
+        shuffle=True)
+    eval_dataset = pdx.datasets.SegDataset(
+        data_dir=args.data_dir,
+        file_list=args.val_list,
+        transforms=eval_transforms)
+
+    if args.model_type == 'HumanSegMobile':
+        model = pdx.seg.HRNet(
+            num_classes=args.num_classes, width='18_small_v1')
+    elif args.model_type == 'HumanSegServer':
+        model = pdx.seg.DeepLabv3p(
+            num_classes=args.num_classes, backbone='Xception65')
+    else:
+        raise ValueError(
+            "--model_type: {} is invalid, it should be one of ('HumanSegMobile', "
+            "'HumanSegServer')".format(args.model_type))
+    model.train(
+        num_epochs=args.num_epochs,
+        train_dataset=train_dataset,
+        train_batch_size=args.batch_size,
+        eval_dataset=eval_dataset,
+        save_interval_epochs=args.save_interval_epochs,
+        learning_rate=args.learning_rate,
+        pretrain_weights=args.pretrain_weights,
+        resume_checkpoint=args.resume_checkpoint,
+        save_dir=args.save_dir,
+        use_vdl=args.use_vdl)
+
+
+if __name__ == '__main__':
+    args = parse_args()
+    train(args)

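A typical training run might look like this (dataset paths are illustrative; `--model_type` chooses between the HRNet-based mobile model and the DeepLabv3p-based server model):

```
python train.py --model_type HumanSegMobile --data_dir data --train_list data/train.txt --val_list data/val.txt --image_shape 192 192 --save_dir output
```
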
+ 187 - 0
examples/human_segmentation/video_infer.py

@@ -0,0 +1,187 @@
+# coding: utf8
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import os
+import os.path as osp
+import cv2
+import numpy as np
+
+from postprocess import postprocess, threshold_mask
+import paddlex as pdx
+import paddlex.utils.logging as logging
+from paddlex.seg import transforms
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(
+        description='HumanSeg inference for video')
+    parser.add_argument(
+        '--model_dir',
+        dest='model_dir',
+        help='Model path for inference',
+        type=str)
+    parser.add_argument(
+        '--video_path',
+        dest='video_path',
+        help='Video path for inference; the camera is used if no path is given',
+        type=str,
+        default=None)
+    parser.add_argument(
+        '--save_dir',
+        dest='save_dir',
+        help='The directory for saving the inference results',
+        type=str,
+        default='./output')
+    parser.add_argument(
+        "--image_shape",
+        dest="image_shape",
+        help="The image shape for net inputs.",
+        nargs=2,
+        default=[192, 192],
+        type=int)
+
+    return parser.parse_args()
+
+
+def recover(img, im_info):
+    if im_info[0] == 'resize':
+        w, h = im_info[1][1], im_info[1][0]
+        img = cv2.resize(img, (w, h), cv2.INTER_LINEAR)
+    elif im_info[0] == 'padding':
+        w, h = im_info[1][1], im_info[1][0]
+        img = img[0:h, 0:w, :]
+    return img
+
+
+def video_infer(args):
+    resize_h = args.image_shape[1]
+    resize_w = args.image_shape[0]
+
+    model = pdx.load_model(args.model_dir)
+    test_transforms = transforms.Compose([transforms.Normalize()])
+    if not args.video_path:
+        cap = cv2.VideoCapture(0)
+    else:
+        cap = cv2.VideoCapture(args.video_path)
+    if not cap.isOpened():
+        raise IOError("Error opening video stream or file. Please check "
+                      "whether --video_path exists: {} and whether the "
+                      "camera is working".format(args.video_path))
+
+    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
+    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
+
+    disflow = cv2.DISOpticalFlow_create(cv2.DISOPTICAL_FLOW_PRESET_ULTRAFAST)
+    prev_gray = np.zeros((resize_h, resize_w), np.uint8)
+    prev_cfd = np.zeros((resize_h, resize_w), np.float32)
+    is_init = True
+
+    fps = cap.get(cv2.CAP_PROP_FPS)
+    if args.video_path:
+        logging.info("Please wait. It is computing......")
+        # video writer for saving the prediction results
+        if not osp.exists(args.save_dir):
+            os.makedirs(args.save_dir)
+        out = cv2.VideoWriter(
+            osp.join(args.save_dir, 'result.avi'),
+            cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'), fps, (width, height))
+        # start reading video frames
+        while cap.isOpened():
+            ret, frame = cap.read()
+            if ret:
+                im_shape = frame.shape
+                im_scale_x = float(resize_w) / float(im_shape[1])
+                im_scale_y = float(resize_h) / float(im_shape[0])
+                im = cv2.resize(
+                    frame,
+                    None,
+                    None,
+                    fx=im_scale_x,
+                    fy=im_scale_y,
+                    interpolation=cv2.INTER_LINEAR)
+                image = im.astype('float32')
+                im_info = ('resize', im_shape[0:2])
+                pred = model.predict(image, test_transforms)
+                score_map = pred['score_map']
+                cur_gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
+                score_map = 255 * score_map[:, :, 1]
+                optflow_map = postprocess(cur_gray, score_map, prev_gray, prev_cfd, \
+                        disflow, is_init)
+                prev_gray = cur_gray.copy()
+                prev_cfd = optflow_map.copy()
+                is_init = False
+                optflow_map = cv2.GaussianBlur(optflow_map, (3, 3), 0)
+                optflow_map = threshold_mask(
+                    optflow_map, thresh_bg=0.2, thresh_fg=0.8)
+                img_matting = np.repeat(
+                    optflow_map[:, :, np.newaxis], 3, axis=2)
+                img_matting = recover(img_matting, im_info)
+                bg_im = np.ones_like(img_matting) * 255
+                comb = (img_matting * frame +
+                        (1 - img_matting) * bg_im).astype(np.uint8)
+                out.write(comb)
+            else:
+                break
+        cap.release()
+        out.release()
+
+    else:
+        while cap.isOpened():
+            ret, frame = cap.read()
+            if ret:
+                im_shape = frame.shape
+                im_scale_x = float(resize_w) / float(im_shape[1])
+                im_scale_y = float(resize_h) / float(im_shape[0])
+                im = cv2.resize(
+                    frame,
+                    None,
+                    None,
+                    fx=im_scale_x,
+                    fy=im_scale_y,
+                    interpolation=cv2.INTER_LINEAR)
+                image = im.astype('float32')
+                im_info = ('resize', im_shape[0:2])
+                pred = model.predict(image, test_transforms)
+                score_map = pred['score_map']
+                cur_gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
+                cur_gray = cv2.resize(cur_gray, (resize_w, resize_h))
+                score_map = 255 * score_map[:, :, 1]
+                optflow_map = postprocess(cur_gray, score_map, prev_gray, prev_cfd, \
+                                          disflow, is_init)
+                prev_gray = cur_gray.copy()
+                prev_cfd = optflow_map.copy()
+                is_init = False
+                optflow_map = cv2.GaussianBlur(optflow_map, (3, 3), 0)
+                optflow_map = threshold_mask(
+                    optflow_map, thresh_bg=0.2, thresh_fg=0.8)
+                img_matting = np.repeat(
+                    optflow_map[:, :, np.newaxis], 3, axis=2)
+                img_matting = recover(img_matting, im_info)
+                bg_im = np.ones_like(img_matting) * 255
+                comb = (img_matting * frame +
+                        (1 - img_matting) * bg_im).astype(np.uint8)
+                cv2.imshow('HumanSegmentation', comb)
+                if cv2.waitKey(1) & 0xFF == ord('q'):
+                    break
+            else:
+                break
+        cap.release()
+
+
+if __name__ == "__main__":
+    args = parse_args()
+    video_infer(args)

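The script has two modes: with `--video_path` it writes the composited result to `result.avi` under `--save_dir`; without it, it reads from the camera and shows a live preview (press `q` to quit). An illustrative invocation:

```
python video_infer.py --model_dir output/best_model --video_path input.mp4 --save_dir output
```
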
+ 0 - 21
new_tutorials/train/README.md

@@ -1,21 +0,0 @@
-# Tutorials: Training Models
-
-This directory contains example code for training models with PaddleX. Every script downloads its sample dataset automatically and trains on a single GPU card.
-
-|Code | Task | Dataset |
-|------|--------|---------|
-|classification/mobilenetv2.py | Image classification, MobileNetV2 | vegetable classification |
-|classification/resnet50.py | Image classification, ResNet50 | vegetable classification |
-|detection/faster_rcnn_r50_fpn.py | Object detection, FasterRCNN | insect detection |
-|detection/mask_rcnn_r50_fpn.py | Instance segmentation, MaskRCNN | waste sorting |
-|segmentation/deeplabv3p.py | Semantic segmentation, DeepLabV3 | optic disc segmentation |
-|segmentation/unet.py | Semantic segmentation, UNet | optic disc segmentation |
-|segmentation/hrnet.py | Semantic segmentation, HRNet | optic disc segmentation |
-|segmentation/fast_scnn.py | Semantic segmentation, FastSCNN | optic disc segmentation |
-
-
-## Start training
-After installing PaddleX, start training with the following command:
-```
-python classification/mobilenetv2.py
-```

+ 0 - 47
new_tutorials/train/classification/mobilenetv2.py

@@ -1,47 +0,0 @@
-import os
-# use GPU card 0
-os.environ['CUDA_VISIBLE_DEVICES'] = '0'
-
-from paddlex.cls import transforms
-import paddlex as pdx
-
-# download and extract the vegetable classification dataset
-veg_dataset = 'https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz'
-pdx.utils.download_and_decompress(veg_dataset, path='./')
-
-# define the transforms for training and evaluation
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/cls_transforms.html#composedclstransforms
-train_transforms = transforms.ComposedClsTransforms(mode='train', crop_size=[224, 224])
-eval_transforms = transforms.ComposedClsTransforms(mode='eval', crop_size=[224, 224])
-
-# define the datasets used for training and evaluation
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/classification.html#imagenet
-train_dataset = pdx.datasets.ImageNet(
-    data_dir='vegetables_cls',
-    file_list='vegetables_cls/train_list.txt',
-    label_list='vegetables_cls/labels.txt',
-    transforms=train_transforms,
-    shuffle=True)
-eval_dataset = pdx.datasets.ImageNet(
-    data_dir='vegetables_cls',
-    file_list='vegetables_cls/val_list.txt',
-    label_list='vegetables_cls/labels.txt',
-    transforms=eval_transforms)
-
-# initialize the model and start training
-# training metrics can be inspected with VisualDL
-# launch VisualDL with: visualdl --logdir output/mobilenetv2/vdl_log --port 8001
-# then open https://0.0.0.0:8001 in a browser
-# 0.0.0.0 works for local access; for a remote server, use the machine's IP instead
-
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/models/classification.html#mobilenetv2
-model = pdx.cls.MobileNetV2(num_classes=len(train_dataset.labels))
-model.train(
-    num_epochs=10,
-    train_dataset=train_dataset,
-    train_batch_size=32,
-    eval_dataset=eval_dataset,
-    lr_decay_epochs=[4, 6, 8],
-    learning_rate=0.025,
-    save_dir='output/mobilenetv2',
-    use_vdl=True)

+ 0 - 56
new_tutorials/train/classification/resnet50.py

@@ -1,56 +0,0 @@
-import os
-# use GPU card 0
-os.environ['CUDA_VISIBLE_DEVICES'] = '0'
-
-import paddle.fluid as fluid
-from paddlex.cls import transforms
-import paddlex as pdx
-
-# download and extract the vegetable classification dataset
-veg_dataset = 'https://bj.bcebos.com/paddlex/datasets/vegetables_cls.tar.gz'
-pdx.utils.download_and_decompress(veg_dataset, path='./')
-
-# define the transforms for training and evaluation
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/cls_transforms.html#composedclstransforms
-train_transforms = transforms.ComposedClsTransforms(mode='train', crop_size=[224, 224])
-eval_transforms = transforms.ComposedClsTransforms(mode='eval', crop_size=[224, 224])
-
-# define the datasets used for training and evaluation
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/classification.html#imagenet
-train_dataset = pdx.datasets.ImageNet(
-    data_dir='vegetables_cls',
-    file_list='vegetables_cls/train_list.txt',
-    label_list='vegetables_cls/labels.txt',
-    transforms=train_transforms,
-    shuffle=True)
-eval_dataset = pdx.datasets.ImageNet(
-    data_dir='vegetables_cls',
-    file_list='vegetables_cls/val_list.txt',
-    label_list='vegetables_cls/labels.txt',
-    transforms=eval_transforms)
-
-# PaddleX supports building a custom optimizer
-step_each_epoch = train_dataset.num_samples // 32
-learning_rate = fluid.layers.cosine_decay(
-    learning_rate=0.025, step_each_epoch=step_each_epoch, epochs=10)
-optimizer = fluid.optimizer.Momentum(
-    learning_rate=learning_rate,
-    momentum=0.9,
-    regularization=fluid.regularizer.L2Decay(4e-5))
-
-# initialize the model and start training
-# training metrics can be inspected with VisualDL
-# launch VisualDL with: visualdl --logdir output/resnet50/vdl_log --port 8001
-# then open https://0.0.0.0:8001 in a browser
-# 0.0.0.0 works for local access; for a remote server, use the machine's IP instead
-
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/models/classification.html#resnet50
-model = pdx.cls.ResNet50(num_classes=len(train_dataset.labels))
-model.train(
-    num_epochs=10,
-    train_dataset=train_dataset,
-    train_batch_size=32,
-    eval_dataset=eval_dataset,
-    optimizer=optimizer,
-    save_dir='output/resnet50',
-    use_vdl=True)

+ 0 - 49
new_tutorials/train/detection/faster_rcnn_r50_fpn.py

@@ -1,49 +0,0 @@
-import os
-# use GPU card 0
-os.environ['CUDA_VISIBLE_DEVICES'] = '0'
-
-from paddlex.det import transforms
-import paddlex as pdx
-
-# download and extract the insect detection dataset
-insect_dataset = 'https://bj.bcebos.com/paddlex/datasets/insect_det.tar.gz'
-pdx.utils.download_and_decompress(insect_dataset, path='./')
-
-# define the transforms for training and evaluation
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/det_transforms.html#composedrcnntransforms
-train_transforms = transforms.ComposedRCNNTransforms(mode='train', min_max_size=[800, 1333])
-eval_transforms = transforms.ComposedRCNNTransforms(mode='eval', min_max_size=[800, 1333])
-
-# define the datasets used for training and evaluation
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/detection.html#vocdetection
-train_dataset = pdx.datasets.VOCDetection(
-    data_dir='insect_det',
-    file_list='insect_det/train_list.txt',
-    label_list='insect_det/labels.txt',
-    transforms=train_transforms,
-    shuffle=True)
-eval_dataset = pdx.datasets.VOCDetection(
-    data_dir='insect_det',
-    file_list='insect_det/val_list.txt',
-    label_list='insect_det/labels.txt',
-    transforms=eval_transforms)
-
-# initialize the model and start training
-# training metrics can be inspected with VisualDL
-# launch VisualDL with: visualdl --logdir output/faster_rcnn_r50_fpn/vdl_log --port 8001
-# then open https://0.0.0.0:8001 in a browser
-# 0.0.0.0 works for local access; for a remote server, use the machine's IP instead
-# num_classes must be the number of classes including the background class, i.e. number of object classes + 1
-
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/models/detection.html#fasterrcnn
-num_classes = len(train_dataset.labels) + 1
-model = pdx.det.FasterRCNN(num_classes=num_classes)
-model.train(
-    num_epochs=12,
-    train_dataset=train_dataset,
-    train_batch_size=2,
-    eval_dataset=eval_dataset,
-    learning_rate=0.0025,
-    lr_decay_epochs=[8, 11],
-    save_dir='output/faster_rcnn_r50_fpn',
-    use_vdl=True)

+ 0 - 48
new_tutorials/train/detection/mask_rcnn_r50_fpn.py

@@ -1,48 +0,0 @@
-import os
-# use GPU card 0
-os.environ['CUDA_VISIBLE_DEVICES'] = '0'
-
-from paddlex.det import transforms
-import paddlex as pdx
-
-# download and extract the Xiaoduxiong sorting dataset
-xiaoduxiong_dataset = 'https://bj.bcebos.com/paddlex/datasets/xiaoduxiong_ins_det.tar.gz'
-pdx.utils.download_and_decompress(xiaoduxiong_dataset, path='./')
-
-# define the transforms for training and evaluation
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/det_transforms.html#composedrcnntransforms
-train_transforms = transforms.ComposedRCNNTransforms(mode='train', min_max_size=[800, 1333])
-eval_transforms = transforms.ComposedRCNNTransforms(mode='eval', min_max_size=[800, 1333])
-
-# define the datasets used for training and evaluation
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/detection.html#cocodetection
-train_dataset = pdx.datasets.CocoDetection(
-    data_dir='xiaoduxiong_ins_det/JPEGImages',
-    ann_file='xiaoduxiong_ins_det/train.json',
-    transforms=train_transforms,
-    shuffle=True)
-eval_dataset = pdx.datasets.CocoDetection(
-    data_dir='xiaoduxiong_ins_det/JPEGImages',
-    ann_file='xiaoduxiong_ins_det/val.json',
-    transforms=eval_transforms)
-
-# initialize the model and start training
-# training metrics can be inspected with VisualDL
-# launch VisualDL with: visualdl --logdir output/mask_rcnn_r50_fpn/vdl_log --port 8001
-# then open https://0.0.0.0:8001 in a browser
-# 0.0.0.0 works for local access; for a remote server, use the machine's IP instead
-# num_classes must be the number of classes including the background class, i.e. number of object classes + 1
-
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/models/instance_segmentation.html#maskrcnn
-num_classes = len(train_dataset.labels) + 1
-model = pdx.det.MaskRCNN(num_classes=num_classes)
-model.train(
-    num_epochs=12,
-    train_dataset=train_dataset,
-    train_batch_size=1,
-    eval_dataset=eval_dataset,
-    learning_rate=0.00125,
-    warmup_steps=10,
-    lr_decay_epochs=[8, 11],
-    save_dir='output/mask_rcnn_r50_fpn',
-    use_vdl=True)

+ 0 - 48
new_tutorials/train/detection/yolov3_darknet53.py

@@ -1,48 +0,0 @@
-import os
-# use GPU card 0
-os.environ['CUDA_VISIBLE_DEVICES'] = '0'
-
-from paddlex.det import transforms
-import paddlex as pdx
-
-# download and extract the insect detection dataset
-insect_dataset = 'https://bj.bcebos.com/paddlex/datasets/insect_det.tar.gz'
-pdx.utils.download_and_decompress(insect_dataset, path='./')
-
-# define the transforms for training and evaluation
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/det_transforms.html#composedyolotransforms
-train_transforms = transforms.ComposedYOLOv3Transforms(mode='train', shape=[608, 608])
-eval_transforms = transforms.ComposedYOLOv3Transforms(mode='eval', shape=[608, 608])
-
-# define the datasets used for training and evaluation
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/detection.html#vocdetection
-train_dataset = pdx.datasets.VOCDetection(
-    data_dir='insect_det',
-    file_list='insect_det/train_list.txt',
-    label_list='insect_det/labels.txt',
-    transforms=train_transforms,
-    shuffle=True)
-eval_dataset = pdx.datasets.VOCDetection(
-    data_dir='insect_det',
-    file_list='insect_det/val_list.txt',
-    label_list='insect_det/labels.txt',
-    transforms=eval_transforms)
-
-# initialize the model and start training
-# training metrics can be inspected with VisualDL
-# launch VisualDL with: visualdl --logdir output/yolov3_darknet53/vdl_log --port 8001
-# then open https://0.0.0.0:8001 in a browser
-# 0.0.0.0 works for local access; for a remote server, use the machine's IP instead
-
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/models/detection.html#yolov3
-num_classes = len(train_dataset.labels)
-model = pdx.det.YOLOv3(num_classes=num_classes, backbone='DarkNet53')
-model.train(
-    num_epochs=270,
-    train_dataset=train_dataset,
-    train_batch_size=8,
-    eval_dataset=eval_dataset,
-    learning_rate=0.000125,
-    lr_decay_epochs=[210, 240],
-    save_dir='output/yolov3_darknet53',
-    use_vdl=True)

+ 0 - 51
new_tutorials/train/segmentation/deeplabv3p.py

@@ -1,51 +0,0 @@
-import os
-# use GPU card 0
-os.environ['CUDA_VISIBLE_DEVICES'] = '0'
-
-import paddlex as pdx
-from paddlex.seg import transforms
-
-# download and extract the optic disc segmentation dataset
-optic_dataset = 'https://bj.bcebos.com/paddlex/datasets/optic_disc_seg.tar.gz'
-pdx.utils.download_and_decompress(optic_dataset, path='./')
-
-# define the transforms for training and evaluation
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/seg_transforms.html#composedsegtransforms
-train_transforms = transforms.ComposedSegTransforms(mode='train', train_crop_size=[769, 769])
-eval_transforms = transforms.ComposedSegTransforms(mode='eval')
-
-train_transforms.add_augmenters([
-    transforms.RandomRotate()
-])
-
-# define the datasets used for training and evaluation
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/semantic_segmentation.html#segdataset
-train_dataset = pdx.datasets.SegDataset(
-    data_dir='optic_disc_seg',
-    file_list='optic_disc_seg/train_list.txt',
-    label_list='optic_disc_seg/labels.txt',
-    transforms=train_transforms,
-    shuffle=True)
-eval_dataset = pdx.datasets.SegDataset(
-    data_dir='optic_disc_seg',
-    file_list='optic_disc_seg/val_list.txt',
-    label_list='optic_disc_seg/labels.txt',
-    transforms=eval_transforms)
-
-# initialize the model and start training
-# training metrics can be inspected with VisualDL
-# launch VisualDL with: visualdl --logdir output/deeplab/vdl_log --port 8001
-# then open https://0.0.0.0:8001 in a browser
-# 0.0.0.0 works for local access; for a remote server, use the machine's IP instead
-
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/models/semantic_segmentation.html#deeplabv3p
-num_classes = len(train_dataset.labels)
-model = pdx.seg.DeepLabv3p(num_classes=num_classes)
-model.train(
-    num_epochs=40,
-    train_dataset=train_dataset,
-    train_batch_size=4,
-    eval_dataset=eval_dataset,
-    learning_rate=0.01,
-    save_dir='output/deeplab',
-    use_vdl=True)

+ 0 - 47
new_tutorials/train/segmentation/hrnet.py

@@ -1,47 +0,0 @@
-import os
-# use GPU card 0
-os.environ['CUDA_VISIBLE_DEVICES'] = '0'
-
-import paddlex as pdx
-from paddlex.seg import transforms
-
-# download and extract the optic disc segmentation dataset
-optic_dataset = 'https://bj.bcebos.com/paddlex/datasets/optic_disc_seg.tar.gz'
-pdx.utils.download_and_decompress(optic_dataset, path='./')
-
-# define the transforms for training and evaluation
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/seg_transforms.html#composedsegtransforms
-train_transforms = transforms.ComposedSegTransforms(mode='train', train_crop_size=[769, 769])
-eval_transforms = transforms.ComposedSegTransforms(mode='eval')
-
-# define the datasets used for training and evaluation
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/semantic_segmentation.html#segdataset
-train_dataset = pdx.datasets.SegDataset(
-    data_dir='optic_disc_seg',
-    file_list='optic_disc_seg/train_list.txt',
-    label_list='optic_disc_seg/labels.txt',
-    transforms=train_transforms,
-    shuffle=True)
-eval_dataset = pdx.datasets.SegDataset(
-    data_dir='optic_disc_seg',
-    file_list='optic_disc_seg/val_list.txt',
-    label_list='optic_disc_seg/labels.txt',
-    transforms=eval_transforms)
-
-# initialize the model and start training
-# training metrics can be inspected with VisualDL
-# launch VisualDL with: visualdl --logdir output/hrnet/vdl_log --port 8001
-# then open https://0.0.0.0:8001 in a browser
-# 0.0.0.0 works for local access; for a remote server, use the machine's IP instead
-
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/models/semantic_segmentation.html#hrnet
-num_classes = len(train_dataset.labels)
-model = pdx.seg.HRNet(num_classes=num_classes)
-model.train(
-    num_epochs=20,
-    train_dataset=train_dataset,
-    train_batch_size=4,
-    eval_dataset=eval_dataset,
-    learning_rate=0.01,
-    save_dir='output/hrnet',
-    use_vdl=True)

+ 0 - 47
new_tutorials/train/segmentation/unet.py

@@ -1,47 +0,0 @@
-import os
-# 选择使用0号卡
-os.environ['CUDA_VISIBLE_DEVICES'] = '0'
-
-import paddlex as pdx
-from paddlex.seg import transforms
-
-# download and extract the optic disc segmentation dataset
-optic_dataset = 'https://bj.bcebos.com/paddlex/datasets/optic_disc_seg.tar.gz'
-pdx.utils.download_and_decompress(optic_dataset, path='./')
-
-# define the transforms for training and evaluation
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/transforms/seg_transforms.html#composedsegtransforms
-train_transforms = transforms.ComposedSegTransforms(mode='train', train_crop_size=[769, 769])
-eval_transforms = transforms.ComposedSegTransforms(mode='eval')
-
-# define the datasets used for training and evaluation
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/datasets/semantic_segmentation.html#segdataset
-train_dataset = pdx.datasets.SegDataset(
-    data_dir='optic_disc_seg',
-    file_list='optic_disc_seg/train_list.txt',
-    label_list='optic_disc_seg/labels.txt',
-    transforms=train_transforms,
-    shuffle=True)
-eval_dataset = pdx.datasets.SegDataset(
-    data_dir='optic_disc_seg',
-    file_list='optic_disc_seg/val_list.txt',
-    label_list='optic_disc_seg/labels.txt',
-    transforms=eval_transforms)
-
-# initialize the model and start training
-# training metrics can be inspected with VisualDL
-# launch VisualDL with: visualdl --logdir output/unet/vdl_log --port 8001
-# then open https://0.0.0.0:8001 in a browser
-# 0.0.0.0 works for local access; for a remote server, use the machine's IP instead
-
-# API reference: https://paddlex.readthedocs.io/zh_CN/latest/apis/models/semantic_segmentation.html#unet
-num_classes = len(train_dataset.labels)
-model = pdx.seg.UNet(num_classes=num_classes)
-model.train(
-    num_epochs=20,
-    train_dataset=train_dataset,
-    train_batch_size=4,
-    eval_dataset=eval_dataset,
-    learning_rate=0.01,
-    save_dir='output/unet',
-    use_vdl=True)

+ 1 - 1
paddlex/__init__.py

@@ -53,4 +53,4 @@ log_level = 2
 
 from . import interpret
 
-__version__ = '1.0.6'
+__version__ = '1.0.7'

+ 12 - 11
paddlex/command.py

@@ -1,11 +1,11 @@
 # Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
-# 
+#
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
-# 
+#
 #     http://www.apache.org/licenses/LICENSE-2.0
-# 
+#
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -15,6 +15,7 @@
 from six import text_type as _text_type
 import argparse
 import sys
+import paddlex.utils.logging as logging
 
 
 def arg_parser():
@@ -94,15 +95,15 @@ def main():
     if args.export_onnx:
         assert args.model_dir is not None, "--model_dir should be defined while exporting onnx model"
         assert args.save_dir is not None, "--save_dir should be defined to create onnx model"
-        assert args.fixed_input_shape is not None, "--fixed_input_shape should be defined [w,h] to create onnx model, such as [224,224]"
 
-        fixed_input_shape = []
-        if args.fixed_input_shape is not None:
-            fixed_input_shape = eval(args.fixed_input_shape)
-            assert len(
-                fixed_input_shape
-            ) == 2, "len of fixed input shape must == 2, such as [224,224]"
-        model = pdx.load_model(args.model_dir, fixed_input_shape)
+        model = pdx.load_model(args.model_dir)
+        if model.status == "Normal" or model.status == "Prune":
+            logging.error(
+                "Only support inference model, try to export model first as below,",
+                exit=False)
+            logging.error(
+                "paddlex --export_inference --model_dir model_path --save_dir infer_model"
+            )
         pdx.convertor.export_onnx_model(model, args.save_dir)
 
 

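With this change, exporting to ONNX becomes a two-step flow, as the error messages above spell out: first export an inference model from the training snapshot, then convert it (directory names illustrative):

```
paddlex --export_inference --model_dir model_path --save_dir infer_model
paddlex --export_onnx --model_dir infer_model --save_dir onnx_model
```
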
+ 15 - 117
paddlex/convertor.py

@@ -1,11 +1,11 @@
 # Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
-# 
+#
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
-# 
+#
 #     http://www.apache.org/licenses/LICENSE-2.0
-# 
+#
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -30,119 +30,17 @@ def export_onnx(model_dir, save_dir, fixed_input_shape):
 
 
 def export_onnx_model(model, save_dir):
-    support_list = [
-        'ResNet18', 'ResNet34', 'ResNet50', 'ResNet101', 'ResNet50_vd',
-        'ResNet101_vd', 'ResNet50_vd_ssld', 'ResNet101_vd_ssld', 'DarkNet53',
-        'MobileNetV1', 'MobileNetV2', 'DenseNet121', 'DenseNet161',
-        'DenseNet201'
-    ]
-    if model.__class__.__name__ not in support_list:
-        raise Exception("Model: {} unsupport export to ONNX".format(
-            model.__class__.__name__))
-    try:
-        from fluid.utils import op_io_info, init_name_prefix
-        from onnx import helper, checker
-        import fluid_onnx.ops as ops
-        from fluid_onnx.variables import paddle_variable_to_onnx_tensor, paddle_onnx_weight
-        from debug.model_check import debug_model, Tracker
-    except Exception as e:
+    if model.model_type == "detector" or model.__class__.__name__ == "FastSCNN":
         logging.error(
-            "Import Module Failed! Please install paddle2onnx. Related requirements see https://github.com/PaddlePaddle/paddle2onnx."
+            "Only image classifier models and semantic segmentation models(except FastSCNN) are supported to export to ONNX"
         )
-        raise e
-    place = fluid.CPUPlace()
-    exe = fluid.Executor(place)
-    inference_scope = fluid.global_scope()
-    with fluid.scope_guard(inference_scope):
-        test_input_names = [
-            var.name for var in list(model.test_inputs.values())
-        ]
-        inputs_outputs_list = ["fetch", "feed"]
-        weights, weights_value_info = [], []
-        global_block = model.test_prog.global_block()
-        for var_name in global_block.vars:
-            var = global_block.var(var_name)
-            if var_name not in test_input_names\
-                and var.persistable:
-                weight, val_info = paddle_onnx_weight(
-                    var=var, scope=inference_scope)
-                weights.append(weight)
-                weights_value_info.append(val_info)
-
-        # Create inputs
-        inputs = [
-            paddle_variable_to_onnx_tensor(v, global_block)
-            for v in test_input_names
-        ]
-        logging.INFO("load the model parameter done.")
-        onnx_nodes = []
-        op_check_list = []
-        op_trackers = []
-        nms_first_index = -1
-        nms_outputs = []
-        for block in model.test_prog.blocks:
-            for op in block.ops:
-                if op.type in ops.node_maker:
-                    # TODO: deal with the corner case that vars in
-                    #     different blocks have the same name
-                    node_proto = ops.node_maker[str(op.type)](
-                        operator=op, block=block)
-                    op_outputs = []
-                    last_node = None
-                    if isinstance(node_proto, tuple):
-                        onnx_nodes.extend(list(node_proto))
-                        last_node = list(node_proto)
-                    else:
-                        onnx_nodes.append(node_proto)
-                        last_node = [node_proto]
-                    tracker = Tracker(str(op.type), last_node)
-                    op_trackers.append(tracker)
-                    op_check_list.append(str(op.type))
-                    if op.type == "multiclass_nms" and nms_first_index < 0:
-                        nms_first_index = 0
-                    if nms_first_index >= 0:
-                        _, _, output_op = op_io_info(op)
-                        for output in output_op:
-                            nms_outputs.extend(output_op[output])
-                else:
-                    if op.type not in ['feed', 'fetch']:
-                        op_check_list.append(op.type)
-        logging.info('The operator sets to run test case.')
-        logging.info(set(op_check_list))
-
-        # Create outputs
-        # Get the new names for outputs if they've been renamed in nodes' making
-        renamed_outputs = op_io_info.get_all_renamed_outputs()
-        test_outputs = list(model.test_outputs.values())
-        test_outputs_names = [var.name for var in model.test_outputs.values()]
-        test_outputs_names = [
-            name if name not in renamed_outputs else renamed_outputs[name]
-            for name in test_outputs_names
-        ]
-        outputs = [
-            paddle_variable_to_onnx_tensor(v, global_block)
-            for v in test_outputs_names
-        ]
-
-        # Make graph
-        onnx_name = 'paddlex.onnx'
-        onnx_graph = helper.make_graph(
-            nodes=onnx_nodes,
-            name=onnx_name,
-            initializer=weights,
-            inputs=inputs + weights_value_info,
-            outputs=outputs)
-
-        # Make model
-        onnx_model = helper.make_model(
-            onnx_graph, producer_name='PaddlePaddle')
-
-        # Model check
-        checker.check_model(onnx_model)
-        if onnx_model is not None:
-            onnx_model_file = os.path.join(save_dir, onnx_name)
-            if not os.path.exists(save_dir):
-                os.mkdir(save_dir)
-            with open(onnx_model_file, 'wb') as f:
-                f.write(onnx_model.SerializeToString())
-            logging.info("Saved converted model to path: %s" % onnx_model_file)
+    try:
+        import x2paddle
+        if x2paddle.__version__ < '0.7.4':
+            logging.error("You need to upgrade x2paddle >= 0.7.4")
+    except ImportError:
+        logging.error(
+            "You need to install x2paddle first: pip install x2paddle>=0.7.4")
+    from x2paddle.op_mapper.paddle_op_mapper import PaddleOpMapper
+    mapper = PaddleOpMapper()
+    mapper.convert(model.test_prog, save_dir)

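The rewritten converter delegates the whole graph translation to x2paddle's `PaddleOpMapper`, so the export can also be driven directly from Python; a sketch, assuming an inference model already exported to `infer_model/`:

```python
import paddlex as pdx

# load an exported inference model (not a training-time checkpoint)
model = pdx.load_model('infer_model')
# map the model's test program to ONNX and write it under onnx_model/
pdx.convertor.export_onnx_model(model, 'onnx_model')
```
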
+ 2 - 5
paddlex/cv/datasets/coco.py

@@ -100,7 +100,7 @@ class CocoDetection(VOCDetection):
             gt_score = np.ones((num_bbox, 1), dtype=np.float32)
             is_crowd = np.zeros((num_bbox, 1), dtype=np.int32)
             difficult = np.zeros((num_bbox, 1), dtype=np.int32)
-            gt_poly = None
+            gt_poly = [None] * num_bbox
 
             for i, box in enumerate(bboxes):
                 catid = box['category_id']
@@ -108,8 +108,6 @@ class CocoDetection(VOCDetection):
                 gt_bbox[i, :] = box['clean_bbox']
                 is_crowd[i][0] = box['iscrowd']
                 if 'segmentation' in box:
-                    if gt_poly is None:
-                        gt_poly = [None] * num_bbox
                     gt_poly[i] = box['segmentation']
 
             im_info = {
@@ -121,10 +119,9 @@ class CocoDetection(VOCDetection):
                 'gt_class': gt_class,
                 'gt_bbox': gt_bbox,
                 'gt_score': gt_score,
+                'gt_poly': gt_poly,
                 'difficult': difficult
             }
-            if gt_poly is not None:
-                label_info['gt_poly'] = gt_poly
 
             coco_rec = (im_info, label_info)
             self.file_list.append([im_fname, coco_rec])

+ 5 - 6
paddlex/cv/datasets/easydata_cls.py

@@ -39,14 +39,14 @@ class EasyDataCls(ImageNet):
             threads or 'process' processes. Defaults to 'process' (on Windows and Mac, thread is forced and this parameter has no effect).
         shuffle (bool): whether to shuffle the samples in the dataset. Defaults to False.
     """
-    
+
     def __init__(self,
                  data_dir,
                  file_list,
                  label_list,
                  transforms=None,
                  num_workers='auto',
-                 buffer_size=100,
+                 buffer_size=8,
                  parallel_method='process',
                  shuffle=False):
         super(ImageNet, self).__init__(
@@ -58,7 +58,7 @@ class EasyDataCls(ImageNet):
         self.file_list = list()
         self.labels = list()
         self._epoch = 0
-        
+
         with open(label_list, encoding=get_encoding(label_list)) as f:
             for line in f:
                 item = line.strip()
@@ -73,8 +73,8 @@ class EasyDataCls(ImageNet):
                 if not osp.isfile(json_file):
                     continue
                 if not osp.exists(img_file):
-                    raise IOError(
-                        'The image file {} is not exist!'.format(img_file))
+                    raise IOError('The image file {} does not exist!'.format(
+                        img_file))
                 with open(json_file, mode='r', \
                           encoding=get_encoding(json_file)) as j:
                     json_info = json.load(j)
@@ -83,4 +83,3 @@ class EasyDataCls(ImageNet):
         self.num_samples = len(self.file_list)
         logging.info("{} samples in file {}".format(
             len(self.file_list), file_list))
-    

+ 3 - 3
paddlex/cv/datasets/imagenet.py

@@ -45,7 +45,7 @@ class ImageNet(Dataset):
                  label_list,
                  transforms=None,
                  num_workers='auto',
-                 buffer_size=100,
+                 buffer_size=8,
                  parallel_method='process',
                  shuffle=False):
         super(ImageNet, self).__init__(
@@ -70,8 +70,8 @@ class ImageNet(Dataset):
                     continue
                 full_path = osp.join(data_dir, items[0])
                 if not osp.exists(full_path):
-                    raise IOError(
-                        'The image file {} is not exist!'.format(full_path))
+                    raise IOError('The image file {} does not exist!'.format(
+                        full_path))
                 self.file_list.append([full_path, int(items[1])])
         self.num_samples = len(self.file_list)
         logging.info("{} samples in file {}".format(

+ 10 - 9
paddlex/cv/datasets/seg_dataset.py

@@ -1,4 +1,4 @@
-# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -28,7 +28,7 @@ class SegDataset(Dataset):
     Args:
         data_dir (str): directory path where the dataset resides.
         file_list (str): path of the file listing the dataset's images and their annotation files (each line is a path relative to data_dir).
-        label_list (str): path of the file describing the classes contained in the dataset.
+        label_list (str): path of the file describing the classes contained in the dataset. Defaults to None.
         transforms (list): preprocessing/augmentation operators applied to each sample.
         num_workers (int): number of threads or processes used to preprocess samples. Defaults to 4.
         buffer_size (int): length of the preprocessing queue, in samples. Defaults to 100.
@@ -40,7 +40,7 @@ class SegDataset(Dataset):
     def __init__(self,
                  data_dir,
                  file_list,
-                 label_list,
+                 label_list=None,
                  transforms=None,
                  num_workers='auto',
                  buffer_size=100,
@@ -56,10 +56,11 @@ class SegDataset(Dataset):
         self.labels = list()
         self._epoch = 0
 
-        with open(label_list, encoding=get_encoding(label_list)) as f:
-            for line in f:
-                item = line.strip()
-                self.labels.append(item)
+        if label_list is not None:
+            with open(label_list, encoding=get_encoding(label_list)) as f:
+                for line in f:
+                    item = line.strip()
+                    self.labels.append(item)
 
         with open(file_list, encoding=get_encoding(file_list)) as f:
             for line in f:
@@ -69,8 +70,8 @@ class SegDataset(Dataset):
                 full_path_im = osp.join(data_dir, items[0])
                 full_path_label = osp.join(data_dir, items[1])
                 if not osp.exists(full_path_im):
-                    raise IOError(
-                        'The image file {} is not exist!'.format(full_path_im))
+                    raise IOError('The image file {} does not exist!'.format(
+                        full_path_im))
                 if not osp.exists(full_path_label):
                     raise IOError('The label file {} does not exist!'.format(
                         full_path_label))

+ 49 - 10
paddlex/cv/datasets/voc.py

@@ -17,6 +17,7 @@ import copy
 import os
 import os.path as osp
 import random
+import re
 import numpy as np
 from collections import OrderedDict
 import xml.etree.ElementTree as ET
@@ -104,23 +105,60 @@ class VOCDetection(Dataset):
                 else:
                     ct = int(tree.find('id').text)
                     im_id = np.array([int(tree.find('id').text)])
-
-                objs = tree.findall('object')
-                im_w = float(tree.find('size').find('width').text)
-                im_h = float(tree.find('size').find('height').text)
+                pattern = re.compile('<object>', re.IGNORECASE)
+                obj_tag = pattern.findall(
+                    str(ET.tostringlist(tree.getroot())))[0][1:-1]
+                objs = tree.findall(obj_tag)
+                pattern = re.compile('<size>', re.IGNORECASE)
+                size_tag = pattern.findall(
+                    str(ET.tostringlist(tree.getroot())))[0][1:-1]
+                size_element = tree.find(size_tag)
+                pattern = re.compile('<width>', re.IGNORECASE)
+                width_tag = pattern.findall(
+                    str(ET.tostringlist(size_element)))[0][1:-1]
+                im_w = float(size_element.find(width_tag).text)
+                pattern = re.compile('<height>', re.IGNORECASE)
+                height_tag = pattern.findall(
+                    str(ET.tostringlist(size_element)))[0][1:-1]
+                im_h = float(size_element.find(height_tag).text)
                 gt_bbox = np.zeros((len(objs), 4), dtype=np.float32)
                 gt_class = np.zeros((len(objs), 1), dtype=np.int32)
                 gt_score = np.ones((len(objs), 1), dtype=np.float32)
                 is_crowd = np.zeros((len(objs), 1), dtype=np.int32)
                 difficult = np.zeros((len(objs), 1), dtype=np.int32)
                 for i, obj in enumerate(objs):
-                    cname = obj.find('name').text.strip()
+                    pattern = re.compile('<name>', re.IGNORECASE)
+                    name_tag = pattern.findall(str(ET.tostringlist(obj)))[0][
+                        1:-1]
+                    cname = obj.find(name_tag).text.strip()
                     gt_class[i][0] = cname2cid[cname]
-                    _difficult = int(obj.find('difficult').text)
-                    x1 = float(obj.find('bndbox').find('xmin').text)
-                    y1 = float(obj.find('bndbox').find('ymin').text)
-                    x2 = float(obj.find('bndbox').find('xmax').text)
-                    y2 = float(obj.find('bndbox').find('ymax').text)
+                    pattern = re.compile('<difficult>', re.IGNORECASE)
+                    diff_tag = pattern.findall(str(ET.tostringlist(obj)))[0][
+                        1:-1]
+                    try:
+                        _difficult = int(obj.find(diff_tag).text)
+                    except Exception:
+                        _difficult = 0
+                    pattern = re.compile('<bndbox>', re.IGNORECASE)
+                    box_tag = pattern.findall(str(ET.tostringlist(obj)))[0][1:
+                                                                            -1]
+                    box_element = obj.find(box_tag)
+                    pattern = re.compile('<xmin>', re.IGNORECASE)
+                    xmin_tag = pattern.findall(
+                        str(ET.tostringlist(box_element)))[0][1:-1]
+                    x1 = float(box_element.find(xmin_tag).text)
+                    pattern = re.compile('<ymin>', re.IGNORECASE)
+                    ymin_tag = pattern.findall(
+                        str(ET.tostringlist(box_element)))[0][1:-1]
+                    y1 = float(box_element.find(ymin_tag).text)
+                    pattern = re.compile('<xmax>', re.IGNORECASE)
+                    xmax_tag = pattern.findall(
+                        str(ET.tostringlist(box_element)))[0][1:-1]
+                    x2 = float(box_element.find(xmax_tag).text)
+                    pattern = re.compile('<ymax>', re.IGNORECASE)
+                    ymax_tag = pattern.findall(
+                        str(ET.tostringlist(box_element)))[0][1:-1]
+                    y2 = float(box_element.find(ymax_tag).text)
                     x1 = max(0, x1)
                     y1 = max(0, y1)
                     if im_w > 0.5 and im_h > 0.5:
@@ -149,6 +187,7 @@ class VOCDetection(Dataset):
                     'gt_class': gt_class,
                     'gt_bbox': gt_bbox,
                     'gt_score': gt_score,
+                    'gt_poly': [],
                     'difficult': difficult
                 }
                 voc_rec = (im_info, label_info)

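The new parsing path locates XML tags case-insensitively by regex-matching the serialized element. A more direct way to express the same idea, shown here only as an illustrative sketch (this is not the code the patch uses), is to scan an element's children for a tag match:

```python
import xml.etree.ElementTree as ET

def find_ci(parent, tag):
    """Return the first child of `parent` whose tag matches `tag` case-insensitively."""
    for child in parent:
        if child.tag.lower() == tag.lower():
            return child
    return None

# usage against a hand-written annotation with unusual casing
root = ET.fromstring('<annotation><Object><Name>cat</Name></Object></annotation>')
obj = find_ci(root, 'object')
print(find_ci(obj, 'name').text)  # -> cat
```
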
+ 8 - 7
paddlex/cv/models/hrnet.py

@@ -1,11 +1,11 @@
 # copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
-# 
+#
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
-# 
+#
 #     http://www.apache.org/licenses/LICENSE-2.0
-# 
+#
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -24,11 +24,12 @@ class HRNet(DeepLabv3p):
 
     Args:
         num_classes (int): number of classes.
-        width (int): number of channels of the feature layers in the high-resolution branch. Defaults to 18. Valid values are [18, 30, 32, 40, 44, 48, 60, 64].
+        width (int|str): number of channels of the feature layers in the high-resolution branch. Defaults to 18. Valid values are [18, 30, 32, 40, 44, 48, 60, 64, '18_small_v1'].
+            '18_small_v1' is a lightweight version of 18.
         use_bce_loss (bool): whether to use BCE loss as the loss function; only applicable to two-class segmentation. Can be combined with dice loss. Defaults to False.
         use_dice_loss (bool): whether to use dice loss as the loss function; only applicable to two-class segmentation, and can be combined with BCE loss.
             When both use_bce_loss and use_dice_loss are False, cross-entropy loss is used. Defaults to False.
-        class_weight (list/str): per-class weights of the cross-entropy loss. When class_weight is a list, its length should be
+        class_weight (list|str): per-class weights of the cross-entropy loss. When class_weight is a list, its length should be
             num_classes. When class_weight is a str, weight.lower() should be 'dynamic'; the weights are then recomputed each round from the per-class pixel ratios,
             with each class weighted as: class ratio * num_classes. When class_weight keeps its default value None, every class has weight 1,
             i.e. the ordinary cross-entropy loss.
@@ -168,6 +169,6 @@ class HRNet(DeepLabv3p):
         return super(HRNet, self).train(
             num_epochs, train_dataset, train_batch_size, eval_dataset,
             save_interval_epochs, log_interval_steps, save_dir,
-            pretrain_weights, optimizer, learning_rate, lr_decay_power, use_vdl,
-            sensitivities_file, eval_metric_loss, early_stop,
+            pretrain_weights, optimizer, learning_rate, lr_decay_power,
+            use_vdl, sensitivities_file, eval_metric_loss, early_stop,
             early_stop_patience, resume_checkpoint)

+ 1 - 0
paddlex/cv/models/load_model.py

@@ -108,6 +108,7 @@ def load_model(model_dir, fixed_input_shape=None):
 
     logging.info("Model[{}] loaded.".format(info['Model']))
     model.trainable = False
+    model.status = status
     return model
 
 

+ 2 - 0
paddlex/cv/models/slim/prune.py

@@ -158,6 +158,7 @@ def prune_program(model, prune_params_ratios=None):
         prune_params_ratios (dict): dictionary mapping prunable parameter names to pruning ratios. When None,
             the default parameter names and ratios are used. Defaults to None.
     """
+    assert model.status == 'Normal', 'Only models saved during training are supported!'
     place = model.places[0]
     train_prog = model.train_prog
     eval_prog = model.test_prog
@@ -235,6 +236,7 @@ def cal_params_sensitivities(model, save_file, eval_dataset, batch_size=8):
 
             where ``weight_0`` is a convolution kernel name and ``sensitivities['weight_0']`` is a dict whose keys are pruning ratios and whose values are sensitivities.
     """
+    assert model.status == 'Normal', 'Only models saved during training are supported!'
     if os.path.exists(save_file):
         os.remove(save_file)
 

+ 32 - 1
paddlex/cv/models/slim/prune_config.py

@@ -19,6 +19,8 @@ import paddle.fluid as fluid
 import paddlex
 
 sensitivities_data = {
+    'AlexNet':
+    'https://bj.bcebos.com/paddlex/slim_prune/alexnet_sensitivities.data',
     'ResNet18':
     'https://bj.bcebos.com/paddlex/slim_prune/resnet18.sensitivities',
     'ResNet34':
@@ -41,6 +43,10 @@ sensitivities_data = {
     'https://bj.bcebos.com/paddlex/slim_prune/mobilenetv3_large.sensitivities',
     'MobileNetV3_small':
     'https://bj.bcebos.com/paddlex/slim_prune/mobilenetv3_small.sensitivities',
+    'MobileNetV3_large_ssld':
+    'https://bj.bcebos.com/paddlex/slim_prune/mobilenetv3_large_ssld_sensitivities.data',
+    'MobileNetV3_small_ssld':
+    'https://bj.bcebos.com/paddlex/slim_prune/mobilenetv3_small_ssld_sensitivities.data',
     'DenseNet121':
     'https://bj.bcebos.com/paddlex/slim_prune/densenet121.sensitivities',
     'DenseNet161':
@@ -51,6 +57,8 @@ sensitivities_data = {
     'https://bj.bcebos.com/paddlex/slim_prune/xception41.sensitivities',
     'Xception65':
     'https://bj.bcebos.com/paddlex/slim_prune/xception65.sensitivities',
+    'ShuffleNetV2':
+    'https://bj.bcebos.com/paddlex/slim_prune/shufflenetv2_sensitivities.data',
     'YOLOv3_MobileNetV1':
     'https://bj.bcebos.com/paddlex/slim_prune/yolov3_mobilenetv1.sensitivities',
     'YOLOv3_MobileNetV3_large':
@@ -143,7 +151,8 @@ def get_prune_params(model):
     if model_type.startswith('ResNet') or \
             model_type.startswith('DenseNet') or \
             model_type.startswith('DarkNet') or \
-            model_type.startswith('AlexNet'):
+            model_type.startswith('AlexNet') or \
+            model_type.startswith('ShuffleNetV2'):
         for block in program.blocks:
             for param in block.all_parameters():
                 pd_var = fluid.global_scope().find_var(param.name)
@@ -152,6 +161,28 @@ def get_prune_params(model):
                     prune_names.append(param.name)
         if model_type == 'AlexNet':
             prune_names.remove('conv5_weights')
+        if model_type == 'ShuffleNetV2':
+            not_prune_names = [
+                'stage_2_1_conv5_weights',
+                'stage_2_1_conv3_weights',
+                'stage_2_2_conv3_weights',
+                'stage_2_3_conv3_weights',
+                'stage_2_4_conv3_weights',
+                'stage_3_1_conv5_weights',
+                'stage_3_1_conv3_weights',
+                'stage_3_2_conv3_weights',
+                'stage_3_3_conv3_weights',
+                'stage_3_4_conv3_weights',
+                'stage_3_5_conv3_weights',
+                'stage_3_6_conv3_weights',
+                'stage_3_7_conv3_weights',
+                'stage_3_8_conv3_weights',
+                'stage_4_1_conv5_weights',
+                'stage_4_1_conv3_weights',
+                'stage_4_2_conv3_weights',
+                'stage_4_3_conv3_weights',
+                'stage_4_4_conv3_weights',
+            ]
+            for name in not_prune_names:
+                prune_names.remove(name)
     elif model_type == "MobileNetV1":
         prune_names.append("conv1_weights")
         for param in program.global_block().all_parameters():
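
For context, a hedged sketch of how this registry is consumed: passing sensitivities_file='DEFAULT' to a model's train() call downloads the file registered above for the architecture (the surrounding arguments here are illustrative):

    model.train(
        num_epochs=20,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        sensitivities_file='DEFAULT',  # resolved via sensitivities_data above
        eval_metric_loss=0.05,         # tolerated metric drop when pruning
        save_dir='output/pruned')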

+ 1 - 1
paddlex/cv/models/utils/pretrain_weights.py

@@ -81,7 +81,7 @@ coco_pretrain = {
     'YOLOv3_MobileNetV1_COCO':
     'https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v1.tar',
     'YOLOv3_MobileNetV3_large_COCO':
-    'https://paddlemodels.bj.bcebos.com/object_detection/yolov3_mobilenet_v3.pdparams',
+    'https://bj.bcebos.com/paddlex/models/yolov3_mobilenet_v3.tar',
     'YOLOv3_ResNet34_COCO':
     'https://paddlemodels.bj.bcebos.com/object_detection/yolov3_r34.tar',
     'YOLOv3_ResNet50_vd_COCO':

+ 55 - 22
paddlex/cv/nets/hrnet.py

@@ -51,15 +51,38 @@ class HRNet(object):
 
         self.width = width
         self.has_se = has_se
+        self.num_modules = {
+            '18_small_v1': [1, 1, 1, 1],
+            '18': [1, 1, 4, 3],
+            '30': [1, 1, 4, 3],
+            '32': [1, 1, 4, 3],
+            '40': [1, 1, 4, 3],
+            '44': [1, 1, 4, 3],
+            '48': [1, 1, 4, 3],
+            '60': [1, 1, 4, 3],
+            '64': [1, 1, 4, 3]
+        }
+        self.num_blocks = {
+            '18_small_v1': [[1], [2, 2], [2, 2, 2], [2, 2, 2, 2]],
+            '18': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]],
+            '30': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]],
+            '32': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]],
+            '40': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]],
+            '44': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]],
+            '48': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]],
+            '60': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]],
+            '64': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]]
+        }
         self.channels = {
-            18: [[18, 36], [18, 36, 72], [18, 36, 72, 144]],
-            30: [[30, 60], [30, 60, 120], [30, 60, 120, 240]],
-            32: [[32, 64], [32, 64, 128], [32, 64, 128, 256]],
-            40: [[40, 80], [40, 80, 160], [40, 80, 160, 320]],
-            44: [[44, 88], [44, 88, 176], [44, 88, 176, 352]],
-            48: [[48, 96], [48, 96, 192], [48, 96, 192, 384]],
-            60: [[60, 120], [60, 120, 240], [60, 120, 240, 480]],
-            64: [[64, 128], [64, 128, 256], [64, 128, 256, 512]],
+            '18_small_v1': [[32], [16, 32], [16, 32, 64], [16, 32, 64, 128]],
+            '18': [[64], [18, 36], [18, 36, 72], [18, 36, 72, 144]],
+            '30': [[64], [30, 60], [30, 60, 120], [30, 60, 120, 240]],
+            '32': [[64], [32, 64], [32, 64, 128], [32, 64, 128, 256]],
+            '40': [[64], [40, 80], [40, 80, 160], [40, 80, 160, 320]],
+            '44': [[64], [44, 88], [44, 88, 176], [44, 88, 176, 352]],
+            '48': [[64], [48, 96], [48, 96, 192], [48, 96, 192, 384]],
+            '60': [[64], [60, 120], [60, 120, 240], [60, 120, 240, 480]],
+            '64': [[64], [64, 128], [64, 128, 256], [64, 128, 256, 512]],
         }
 
         self.freeze_at = freeze_at
@@ -73,31 +96,38 @@ class HRNet(object):
 
     def net(self, input):
         width = self.width
-        channels_2, channels_3, channels_4 = self.channels[width]
-        num_modules_2, num_modules_3, num_modules_4 = 1, 4, 3
+        channels_1, channels_2, channels_3, channels_4 = self.channels[str(
+            width)]
+        num_modules_1, num_modules_2, num_modules_3, num_modules_4 = self.num_modules[
+            str(width)]
+        num_blocks_1, num_blocks_2, num_blocks_3, num_blocks_4 = self.num_blocks[
+            str(width)]
 
         x = self.conv_bn_layer(
             input=input,
             filter_size=3,
-            num_filters=64,
+            num_filters=channels_1[0],
             stride=2,
             if_act=True,
             name='layer1_1')
         x = self.conv_bn_layer(
             input=x,
             filter_size=3,
-            num_filters=64,
+            num_filters=channels_1[0],
             stride=2,
             if_act=True,
             name='layer1_2')
 
-        la1 = self.layer1(x, name='layer2')
+        la1 = self.layer1(x, num_blocks_1, channels_1, name='layer2')
         tr1 = self.transition_layer([la1], [256], channels_2, name='tr1')
-        st2 = self.stage(tr1, num_modules_2, channels_2, name='st2')
+        st2 = self.stage(
+            tr1, num_modules_2, num_blocks_2, channels_2, name='st2')
         tr2 = self.transition_layer(st2, channels_2, channels_3, name='tr2')
-        st3 = self.stage(tr2, num_modules_3, channels_3, name='st3')
+        st3 = self.stage(
+            tr2, num_modules_3, num_blocks_3, channels_3, name='st3')
         tr3 = self.transition_layer(st3, channels_3, channels_4, name='tr3')
-        st4 = self.stage(tr3, num_modules_4, channels_4, name='st4')
+        st4 = self.stage(
+            tr3, num_modules_4, num_blocks_4, channels_4, name='st4')
 
         # classification
         if self.num_classes:
@@ -139,12 +169,12 @@ class HRNet(object):
         self.end_points = st4
         return st4[-1]
 
-    def layer1(self, input, name=None):
+    def layer1(self, input, num_blocks, channels, name=None):
         conv = input
-        for i in range(4):
+        for i in range(num_blocks[0]):
             conv = self.bottleneck_block(
                 conv,
-                num_filters=64,
+                num_filters=channels[0],
                 downsample=True if i == 0 else False,
                 name=name + '_' + str(i + 1))
         return conv
@@ -178,7 +208,7 @@ class HRNet(object):
         out = []
         for i in range(len(channels)):
             residual = x[i]
-            for j in range(block_num):
+            for j in range(block_num[i]):
                 residual = self.basic_block(
                     residual,
                     channels[i],
@@ -240,10 +270,11 @@ class HRNet(object):
 
     def high_resolution_module(self,
                                x,
+                               num_blocks,
                                channels,
                                multi_scale_output=True,
                                name=None):
-        residual = self.branches(x, 4, channels, name=name)
+        residual = self.branches(x, num_blocks, channels, name=name)
         out = self.fuse_layers(
             residual,
             channels,
@@ -254,6 +285,7 @@ class HRNet(object):
     def stage(self,
               x,
               num_modules,
+              num_blocks,
               channels,
               multi_scale_output=True,
               name=None):
@@ -262,12 +294,13 @@ class HRNet(object):
             if i == num_modules - 1 and multi_scale_output == False:
                 out = self.high_resolution_module(
                     out,
+                    num_blocks,
                     channels,
                     multi_scale_output=False,
                     name=name + '_' + str(i + 1))
             else:
                 out = self.high_resolution_module(
-                    out, channels, name=name + '_' + str(i + 1))
+                    out, num_blocks, channels, name=name + '_' + str(i + 1))
 
         return out
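
An illustrative reading of the new tables (values copied from the dicts above): keys are str(width), so integer widths and the '18_small_v1' string resolve uniformly; stage i builds num_modules[i] modules, and branch j of a module stacks num_blocks[i][j] basic blocks at channels[i][j] channels.

    # Width 18 vs. the new lightweight variant, per the configuration above:
    num_modules = {'18': [1, 1, 4, 3], '18_small_v1': [1, 1, 1, 1]}
    num_blocks = {
        '18': [[4], [4, 4], [4, 4, 4], [4, 4, 4, 4]],
        '18_small_v1': [[1], [2, 2], [2, 2, 2], [2, 2, 2, 2]],
    }
    # '18_small_v1' runs a single module per stage with 1-2 blocks per branch,
    # instead of up to four modules of four blocks each.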
 

+ 2 - 1
paddlex/cv/nets/segmentation/hrnet.py

@@ -82,7 +82,8 @@ class HRNet(object):
         st4[3] = fluid.layers.resize_bilinear(st4[3], out_shape=shape)
 
         out = fluid.layers.concat(st4, axis=1)
-        last_channels = sum(self.backbone.channels[self.backbone.width][-1])
+        last_channels = sum(self.backbone.channels[str(self.backbone.width)][
+            -1])
 
         out = self._conv_bn_layer(
             input=out,
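
A worked example of the fixed lookup, using the width-18 row of the channel table in paddlex/cv/nets/hrnet.py above:

    channels = {'18': [[64], [18, 36], [18, 36, 72], [18, 36, 72, 144]]}
    last_channels = sum(channels[str(18)][-1])  # 18 + 36 + 72 + 144 == 270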

+ 15 - 12
paddlex/cv/transforms/cls_transforms.py

@@ -70,8 +70,8 @@ class Compose(ClsTransform):
         if isinstance(im, np.ndarray):
             if len(im.shape) != 3:
                 raise Exception(
-                    "im should be 3-dimension, but now is {}-dimensions".
-                    format(len(im.shape)))
+                    "im should be 3-dimension, but now is {}-dimensions".format(
+                        len(im.shape)))
         else:
             try:
                 im = cv2.imread(im).astype('float32')
@@ -100,7 +100,9 @@ class Compose(ClsTransform):
         transform_names = [type(x).__name__ for x in self.transforms]
         for aug in augmenters:
             if type(aug).__name__ in transform_names:
-                logging.error("{} is already in ComposedTransforms, need to remove it from add_augmenters().".format(type(aug).__name__))
+                logging.error(
+                    "{} is already in ComposedTransforms, need to remove it from add_augmenters().".
+                    format(type(aug).__name__))
         self.transforms = augmenters + self.transforms
 
 
@@ -139,8 +141,8 @@ class RandomCrop(ClsTransform):
             tuple: When label is None, the returned tuple is (im, ), the image np.ndarray data;
                    when label is not None, the returned tuple is (im, label), the image np.ndarray data and the image class id respectively.
         """
-        im = random_crop(im, self.crop_size, self.lower_scale,
-                         self.lower_ratio, self.upper_ratio)
+        im = random_crop(im, self.crop_size, self.lower_scale, self.lower_ratio,
+                         self.upper_ratio)
         if label is None:
             return (im, )
         else:
@@ -270,14 +272,12 @@ class ResizeByShort(ClsTransform):
         im_short_size = min(im.shape[0], im.shape[1])
         im_long_size = max(im.shape[0], im.shape[1])
         scale = float(self.short_size) / im_short_size
-        if self.max_size > 0 and np.round(scale *
-                                          im_long_size) > self.max_size:
+        if self.max_size > 0 and np.round(scale * im_long_size) > self.max_size:
             scale = float(self.max_size) / float(im_long_size)
         resized_width = int(round(im.shape[1] * scale))
         resized_height = int(round(im.shape[0] * scale))
         im = cv2.resize(
-            im, (resized_width, resized_height),
-            interpolation=cv2.INTER_LINEAR)
+            im, (resized_width, resized_height), interpolation=cv2.INTER_LINEAR)
 
         if label is None:
             return (im, )
@@ -490,13 +490,15 @@ class ComposedClsTransforms(Compose):
             crop_size(int|list): Size of the images fed into the model
             mean(list): Image mean
             std(list): Image standard deviation
+            random_horizontal_flip(bool): Whether to apply random horizontal flip augmentation with probability 0.5; effective only when mode is `train`. Defaults to True
     """
 
     def __init__(self,
                  mode,
                  crop_size=[224, 224],
                  mean=[0.485, 0.456, 0.406],
-                 std=[0.229, 0.224, 0.225]):
+                 std=[0.229, 0.224, 0.225],
+                 random_horizontal_flip=True):
         width = crop_size
         if isinstance(crop_size, list):
             if crop_size[0] != crop_size[1]:
@@ -512,10 +514,11 @@ class ComposedClsTransforms(Compose):
         if mode == 'train':
             # Transforms for training, including data augmentation
             transforms = [
-                RandomCrop(crop_size=width), RandomHorizontalFlip(prob=0.5),
-                Normalize(
+                RandomCrop(crop_size=width), Normalize(
                     mean=mean, std=std)
             ]
+            if random_horizontal_flip:
+                transforms.insert(0, RandomHorizontalFlip())
         else:
             # Transforms for evaluation/prediction
             transforms = [
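
A hedged usage sketch of the new switch (the other values are the defaults shown above):

    from paddlex.cls import transforms

    # Disable flipping for orientation-sensitive data, e.g. character images.
    train_transforms = transforms.ComposedClsTransforms(
        mode='train', crop_size=[224, 224], random_horizontal_flip=False)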

+ 36 - 21
paddlex/cv/transforms/det_transforms.py

@@ -160,7 +160,9 @@ class Compose(DetTransform):
         transform_names = [type(x).__name__ for x in self.transforms]
         for aug in augmenters:
             if type(aug).__name__ in transform_names:
-                logging.error("{} is already in ComposedTransforms, need to remove it from add_augmenters().".format(type(aug).__name__))
+                logging.error(
+                    "{} is already in ComposedTransforms, need to remove it from add_augmenters().".
+                    format(type(aug).__name__))
         self.transforms = augmenters + self.transforms
 
 
@@ -220,15 +222,13 @@ class ResizeByShort(DetTransform):
         im_short_size = min(im.shape[0], im.shape[1])
         im_long_size = max(im.shape[0], im.shape[1])
         scale = float(self.short_size) / im_short_size
-        if self.max_size > 0 and np.round(scale *
-                                          im_long_size) > self.max_size:
+        if self.max_size > 0 and np.round(scale * im_long_size) > self.max_size:
             scale = float(self.max_size) / float(im_long_size)
         resized_width = int(round(im.shape[1] * scale))
         resized_height = int(round(im.shape[0] * scale))
         im_resize_info = [resized_height, resized_width, scale]
         im = cv2.resize(
-            im, (resized_width, resized_height),
-            interpolation=cv2.INTER_LINEAR)
+            im, (resized_width, resized_height), interpolation=cv2.INTER_LINEAR)
         im_info['im_resize_info'] = np.array(im_resize_info).astype(np.float32)
         if label_info is None:
             return (im, im_info)
@@ -268,8 +268,7 @@ class Padding(DetTransform):
                 if not isinstance(target_size, tuple) and not isinstance(
                         target_size, list):
                     raise TypeError(
-                        "Padding: Type of target_size must in (int|list|tuple)."
-                    )
+                        "Padding: Type of target_size must in (int|list|tuple).")
                 elif len(target_size) != 2:
                     raise ValueError(
                         "Padding: Length of target_size must equal 2.")
@@ -454,8 +453,7 @@ class RandomHorizontalFlip(DetTransform):
             ValueError: Data lengths do not match.
         """
         if not isinstance(im, np.ndarray):
-            raise TypeError(
-                "RandomHorizontalFlip: image is not a numpy array.")
+            raise TypeError("RandomHorizontalFlip: image is not a numpy array.")
         if len(im.shape) != 3:
             raise ValueError(
                 "RandomHorizontalFlip: image is not 3-dimensional.")
@@ -736,7 +734,7 @@ class MixupImage(DetTransform):
             gt_poly2 = im_info['mixup'][2]['gt_poly']
         is_crowd1 = label_info['is_crowd']
         is_crowd2 = im_info['mixup'][2]['is_crowd']
-        
+
         if 0 not in gt_class1 and 0 not in gt_class2:
             gt_bbox = np.concatenate((gt_bbox1, gt_bbox2), axis=0)
             gt_class = np.concatenate((gt_class1, gt_class2), axis=0)
@@ -785,9 +783,7 @@ class RandomExpand(DetTransform):
         fill_value (list): Initial fill value (0-255) for the expanded image. Defaults to [123.675, 116.28, 103.53].
     """
 
-    def __init__(self,
-                 ratio=4.,
-                 prob=0.5,
+    def __init__(self, ratio=4., prob=0.5,
                  fill_value=[123.675, 116.28, 103.53]):
         super(RandomExpand, self).__init__()
         assert ratio > 1.01, "expand ratio must be larger than 1.01"
@@ -1281,21 +1277,25 @@ class ComposedRCNNTransforms(Compose):
             min_max_size(list): Constraints on the shortest and longest image sides when resizing
             mean(list): Image mean
             std(list): Image standard deviation
+            random_horizontal_flip(bool): Whether to apply random horizontal flip augmentation with probability 0.5; effective only when mode is `train`. Defaults to True
     """
 
     def __init__(self,
                  mode,
                  min_max_size=[800, 1333],
                  mean=[0.485, 0.456, 0.406],
-                 std=[0.229, 0.224, 0.225]):
+                 std=[0.229, 0.224, 0.225],
+                 random_horizontal_flip=True):
         if mode == 'train':
             # Transforms for training, including data augmentation
             transforms = [
-                RandomHorizontalFlip(prob=0.5), Normalize(
+                Normalize(
                     mean=mean, std=std), ResizeByShort(
                         short_size=min_max_size[0], max_size=min_max_size[1]),
                 Padding(coarsest_stride=32)
             ]
+            if random_horizontal_flip:
+                transforms.insert(0, RandomHorizontalFlip())
         else:
             # Transforms for evaluation/prediction
             transforms = [
@@ -1325,9 +1325,14 @@ class ComposedYOLOv3Transforms(Compose):
         Args:
             mode(str): Stage of the image-processing pipeline: training/evaluation/prediction, corresponding to 'train', 'eval', 'test' respectively
             shape(list): Size of the images fed into the model; input images are resized to this size
-            mixup_epoch(int): During training, mixup is applied for the first mixup_epoch epochs
+            mixup_epoch(int): During training, mixup is applied for the first mixup_epoch epochs; set to -1 to disable mixup
             mean(list): Image mean
             std(list): Image standard deviation
+            random_distort(bool): Augmentation switch, effective only when mode is `train`; whether to randomly distort images during training. Defaults to True
+            random_expand(bool): Augmentation switch, effective only when mode is `train`; whether to randomly expand images during training. Defaults to True
+            random_crop(bool): Augmentation switch, effective only when mode is `train`; whether to randomly crop images during training. Defaults to True
+            random_horizontal_flip(bool): Augmentation switch, effective only when mode is `train`; whether to randomly flip images horizontally during training. Defaults to True
+
     """
 
     def __init__(self,
@@ -1335,7 +1340,11 @@ class ComposedYOLOv3Transforms(Compose):
                  shape=[608, 608],
                  mixup_epoch=250,
                  mean=[0.485, 0.456, 0.406],
-                 std=[0.229, 0.224, 0.225]):
+                 std=[0.229, 0.224, 0.225],
+                 random_distort=True,
+                 random_expand=True,
+                 random_crop=True,
+                 random_horizontal_flip=True):
         width = shape
         if isinstance(shape, list):
             if shape[0] != shape[1]:
@@ -1350,12 +1359,18 @@ class ComposedYOLOv3Transforms(Compose):
         if mode == 'train':
             # Transforms for training, including data augmentation
             transforms = [
-                MixupImage(mixup_epoch=mixup_epoch), RandomDistort(),
-                RandomExpand(), RandomCrop(), Resize(
-                    target_size=width,
-                    interp='RANDOM'), RandomHorizontalFlip(), Normalize(
+                MixupImage(mixup_epoch=mixup_epoch), Resize(
+                    target_size=width, interp='RANDOM'), Normalize(
                         mean=mean, std=std)
             ]
+            if random_horizontal_flip:
+                transforms.insert(1, RandomHorizontalFlip())
+            if random_crop:
+                transforms.insert(1, RandomCrop())
+            if random_expand:
+                transforms.insert(1, RandomExpand())
+            if random_distort:
+                transforms.insert(1, RandomDistort())
         else:
             # Transforms for evaluation/prediction
             transforms = [
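
A hedged usage sketch of the new YOLOv3 switches (argument values are illustrative):

    from paddlex.det import transforms

    train_transforms = transforms.ComposedYOLOv3Transforms(
        mode='train',
        shape=[608, 608],
        mixup_epoch=-1,       # new semantics above: -1 disables mixup
        random_expand=False,  # drop individual geometric augmentations
        random_crop=False)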

+ 20 - 9
paddlex/cv/transforms/seg_transforms.py

@@ -116,7 +116,9 @@ class Compose(SegTransform):
         transform_names = [type(x).__name__ for x in self.transforms]
         for aug in augmenters:
             if type(aug).__name__ in transform_names:
-                logging.error("{} is already in ComposedTransforms, need to remove it from add_augmenters().".format(type(aug).__name__))
+                logging.error(
+                    "{} is already in ComposedTransforms, need to remove it from add_augmenters().".
+                    format(type(aug).__name__))
         self.transforms = augmenters + self.transforms
 
 
@@ -401,8 +403,7 @@ class ResizeByShort(SegTransform):
         im_short_size = min(im.shape[0], im.shape[1])
         im_long_size = max(im.shape[0], im.shape[1])
         scale = float(self.short_size) / im_short_size
-        if self.max_size > 0 and np.round(scale *
-                                          im_long_size) > self.max_size:
+        if self.max_size > 0 and np.round(scale * im_long_size) > self.max_size:
             scale = float(self.max_size) / float(im_long_size)
         resized_width = int(round(im.shape[1] * scale))
         resized_height = int(round(im.shape[0] * scale))
@@ -1113,25 +1114,35 @@ class ComposedSegTransforms(Compose):
 
         Args:
             mode(str): Stage of image processing: training/evaluation/prediction, corresponding to 'train', 'eval', 'test' respectively
-            train_crop_size(list): Size of the random crop taken from the original image during training
+            min_max_size(list): During training, the longest image side is randomly resized into this range (the short side is resized proportionally); at prediction time, the longest side is resized to the midpoint of the range, i.e. (min_size+max_size)/2. Defaults to [400, 600]
+            train_crop_size(list): Effective only when mode is `train`; during training, a sub-image of this size is randomly cropped from the image (images smaller than this size are padded up to it). Defaults to [512, 512]
             mean(list): Image mean
             std(list): Image standard deviation
+            random_horizontal_flip(bool): Augmentation switch, effective only when mode is `train`; whether to randomly flip images horizontally during training. Defaults to True
     """
 
     def __init__(self,
                  mode,
-                 train_crop_size=[769, 769],
+                 min_max_size=[400, 600],
+                 train_crop_size=[512, 512],
                  mean=[0.5, 0.5, 0.5],
-                 std=[0.5, 0.5, 0.5]):
+                 std=[0.5, 0.5, 0.5],
+                 random_horizontal_flip=True):
         if mode == 'train':
             # Transforms for training, including data augmentation
             transforms = [
-                RandomHorizontalFlip(prob=0.5), ResizeStepScaling(),
+                ResizeRangeScaling(
+                    min_value=min(min_max_size), max_value=max(min_max_size)),
                 RandomPaddingCrop(crop_size=train_crop_size), Normalize(
                     mean=mean, std=std)
             ]
+            if random_horizontal_flip:
+                transforms.insert(0, RandomHorizontalFlip())
         else:
             # Transforms for evaluation/prediction
-            transforms = [Normalize(mean=mean, std=std)]
-
+            long_size = (min(min_max_size) + max(min_max_size)) // 2
+            transforms = [
+                ResizeByLong(long_size=long_size), Normalize(
+                    mean=mean, std=std)
+            ]
         super(ComposedSegTransforms, self).__init__(transforms)
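
A hedged usage sketch of the reworked pipeline (the arguments are the new defaults):

    from paddlex.seg import transforms

    # Training: long side randomly resized into [400, 600], then a 512x512
    # random padding-crop; prediction: long side resized to (400 + 600) // 2.
    train_transforms = transforms.ComposedSegTransforms(
        mode='train', min_max_size=[400, 600], train_crop_size=[512, 512])
    eval_transforms = transforms.ComposedSegTransforms(mode='eval')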

+ 4 - 2
paddlex/interpret/visualize.py

@@ -70,8 +70,10 @@ def normlime(img_file,
              normlime_weights_file=None):
     """使用NormLIME算法将模型预测结果的可解释性可视化。
 
-    NormLIME是利用一定数量的样本来出一个全局的解释。NormLIME会提前计算一定数量的测
-    试样本的LIME结果,然后对相同的特征进行权重的归一化,这样来得到一个全局的输入和输出的关系。
+    NormLIME是利用一定数量的样本来出一个全局的解释。由于NormLIME计算量较大,此处采用一种简化的方式:
+    使用一定数量的测试样本(目前默认使用所有测试样本),对每个样本进行特征提取,映射到同一个特征空间;
+    然后以此特征做为输入,以模型输出做为输出,使用线性回归对其进行拟合,得到一个全局的输入和输出的关系。
+    之后,对一测试样本进行解释时,使用NormLIME全局的解释,来对LIME的结果进行滤波,使最终的可视化结果更加稳定。
 
     注意1:dataset读取的是一个数据集,该数据集不宜过大,否则计算时间会较长,但应包含所有类别的数据。
     注意2:NormLIME可解释性结果可视化目前只支持分类模型。
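
A hedged usage sketch (only img_file and normlime_weights_file appear in this hunk; the model path and dataset argument are assumptions):

    import paddlex as pdx

    model = pdx.load_model('output/mobilenetv2/best_model')
    # dataset: a small classification dataset that covers every class
    pdx.interpret.normlime('test.jpg', model, dataset=dataset)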

+ 2 - 2
setup.py

@@ -15,11 +15,11 @@
 import setuptools
 import sys
 
-long_description = "PaddleX. A end-to-end deeplearning model development toolkit base on PaddlePaddle\n\n"
+long_description = "PaddlePaddle Entire Process Development Toolkit"
 
 setuptools.setup(
     name="paddlex",
-    version='1.0.6',
+    version='1.0.7',
     author="paddlex",
     author_email="paddlex@baidu.com",
     description=long_description,

+ 1 - 1
new_tutorials/train/segmentation/fast_scnn.py → tutorials/train/segmentation/fast_scnn.py

@@ -35,7 +35,7 @@ eval_dataset = pdx.datasets.SegDataset(
 # Open https://0.0.0.0:8001 in a browser
 # 0.0.0.0 is for local access; for a remote service, replace it with the machine's IP
 
-# https://paddlex.readthedocs.io/zh_CN/latest/apis/models/semantic_segmentation.html#hrnet
+# https://paddlex.readthedocs.io/zh_CN/latest/apis/models/semantic_segmentation.html#fastscnn
 num_classes = len(train_dataset.labels)
 model = pdx.seg.FastSCNN(num_classes=num_classes)
 model.train(