Channingss 5 years ago
parent
commit
7041ed1fd8

+ 13 - 3
README.md

@@ -8,16 +8,26 @@
 
 PaddleX is a full-pipeline deep learning model development tool built on the PaddlePaddle ecosystem. It is easy to integrate, easy to use, and covers the whole workflow. Beyond open-source core code that users can use or integrate flexibly, PaddleX also ships a companion visual client suite that lets users develop models through a graphical interface with no code to write. Visit the [PaddleX website](https://www.paddlepaddle.org.cn/paddle/paddlex) for more details.
 ## Installation
-See the [PaddleX installation docs](docs/install.md)
+### pip installation (for training models with Python code)
+> **Dependencies**
+> - cython
+> - pycocotools
+> - python3
+```
+pip install paddlex -i https://mirror.baidu.com/pypi/simple
+```
+
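+To verify the installation, a quick check (a minimal sketch):
+```
+python -c "import paddlex; print(paddlex.__version__)"
+```
+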
+### PaddleX training client installation (for training models through a visual interface)
+> Visit the official site to [download and use it](https://www.paddlepaddle.org.cn/paddle/paddleX)
 
 ## Documentation
 We recommend the [PaddleX online documentation](https://paddlex.readthedocs.io/zh_CN/latest/index.html) for quick access to the tutorials and API reference.
 
-Frequently used docs
 - [Get started with PaddleX model training in 10 minutes](docs/quick_start.md)
 - [PaddleX tutorials](docs/tutorials)
 - [PaddleX model zoo](docs/model_zoo.md)
-- [Exporting and deploying models](docs/deploy/deploy.md)
+- [Exporting and deploying models](docs/deploy.md)
+- [Training models with the PaddleX client](docs/client_use.md)
 
 
 ## Feedback

+ 59 - 0
docs/anaconda_install.md

@@ -0,0 +1,59 @@
+# Installing and Using Anaconda
+Anaconda is an open-source Python distribution that bundles conda, Python, and more than 180 scientific packages with their dependencies. With Anaconda you can create multiple isolated Python environments, avoiding the conflicts caused by installing many different dependency versions into a single Python environment.
+
+## Installing Anaconda on Windows
+### Step 1: Download
+On the Anaconda website [(https://www.anaconda.com/products/individual)](https://www.anaconda.com/products/individual), download the Windows Python 3.7 64-Bit installer.
+
+### Step 2: Install
+Run the downloaded installer (the .exe file) and follow the wizard to finish the installation; you may change the installation directory as needed (see the figure below).
+![](./images/anaconda_windows.png)
+
+### Step 3: Use
+- Click the Windows icon at the bottom-left of the desktop and open: All Programs -> Anaconda3/2 (64-bit) -> Anaconda Prompt  
+- Run the following commands at the prompt
+```cmd
+# Create an environment named my_paddlex with Python 3.7
+conda create -n my_paddlex python=3.7
+# Activate the my_paddlex environment
+conda activate my_paddlex
+# Install git
+conda install git
+# Install pycocotools
+pip install cython
+pip install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI
+# Install paddlepaddle
+pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+# Install paddlex
+pip install paddlex -i https://mirror.baidu.com/pypi/simple
+```  
+With the setup above, PaddleX is ready to use in this environment: type `python` at the prompt, press Enter, and try `import paddlex`. The next time you need it, open 'All Programs -> Anaconda3/2 (64-bit) -> Anaconda Prompt' again and run `conda activate my_paddlex`; PaddleX is then available as before.
+
+## Installing on Linux/Mac
+
+### Step 1: Download
+On the Anaconda website [(https://www.anaconda.com/products/individual)](https://www.anaconda.com/products/individual), download the Python 3.7 installer for your operating system (on Mac, the Command Line Installer version is sufficient).
+
+### Step 2: Install
+Open a terminal and install Anaconda from it:
+```
+# ~/Downloads/Anaconda3-2019.07-Linux-x86_64.sh is the downloaded file
+bash ~/Downloads/Anaconda3-2019.07-Linux-x86_64.sh
+```
+During installation just keep pressing Enter; if prompted for an installation path, adjust it as needed, though the default is usually fine.
+
+### Step 3: Use
+```
+# Create an environment named my_paddlex with Python 3.7
+conda create -n my_paddlex python=3.7
+# Activate the my_paddlex environment
+conda activate my_paddlex
+# Install pycocotools
+pip install cython
+pip install pycocotools
+# Install paddlepaddle
+pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+# Install paddlex
+pip install paddlex -i https://mirror.baidu.com/pypi/simple
+```
+With the setup above, PaddleX is ready to use in this environment: type `python` in the terminal, press Enter, and try `import paddlex`. The next time you need it, just open a terminal and run `conda activate my_paddlex`; PaddleX is then available as before.

BIN
docs/apis/images/insect_bbox_pr_curve(iou-0.5).png


BIN
docs/apis/images/xiaoduxiong_bbox_pr_curve(iou-0.5).png


BIN
docs/apis/images/xiaoduxiong_segm_pr_curve(iou-0.5).png


+ 25 - 39
docs/apis/transforms/det_transforms.md

@@ -122,56 +122,42 @@ paddlex.det.transforms.MixupImage(alpha=1.5, beta=1.5, mixup_epoch=-1)
 
 ## RandomExpand class
 ```python
-paddlex.det.transforms.RandomExpand(max_ratio=4., prob=0.5, mean=[127.5, 127.5, 127.5])
+paddlex.det.transforms.RandomExpand(ratio=4., prob=0.5, fill_value=[123.675, 116.28, 103.53])
 ```
 
-Randomly expand the image, a data augmentation operation for model training, a data augmentation operation for model training  
-1. Randomly pick an expansion ratio (expansion happens only when the ratio is greater than 1).  
-2. Compute the size of the expanded image.  
-3. Initialize an image filled with the dataset mean values and paste the original image onto it at a random position.  
+Randomly expand the image, a data augmentation operation for model training.
+1. Randomly pick an expansion ratio (expansion happens only when the ratio is greater than 1).
+2. Compute the size of the expanded image.
+3. Initialize an image filled with the given fill value and paste the original image onto it at a random position.
 4. Convert the ground-truth box coordinates to the expanded image based on where the original image was pasted.
+5. Convert the ground-truth segmentation regions to the expanded image based on where the original image was pasted.
 
 ### Parameters
-* **max_ratio** (float): Maximum image expansion ratio. Defaults to 4.0.
+* **ratio** (float): Maximum image expansion ratio. Defaults to 4.0.
 * **prob** (float): Probability of applying random expansion. Defaults to 0.5.
-* **mean** (list): Mean values of the image dataset (0-255). Defaults to [127.5, 127.5, 127.5].
+* **fill_value** (list): Initial fill value of the expanded image (0-255). Defaults to [123.675, 116.28, 103.53].
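+
+A minimal usage sketch (the surrounding Compose/Normalize pipeline is illustrative; the parameter values are the documented defaults):
+```python
+from paddlex.det import transforms
+
+train_transforms = transforms.Compose([
+    transforms.RandomExpand(ratio=4., prob=0.5,
+                            fill_value=[123.675, 116.28, 103.53]),
+    transforms.Normalize(),
+])
+```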
 
 ## RandomCrop class
 ```python
-paddlex.det.transforms.RandomCrop(batch_sampler=None, satisfy_all=False, avoid_no_bbox=True)
+paddlex.det.transforms.RandomCrop(aspect_ratio=[.5, 2.], thresholds=[.0, .1, .3, .5, .7, .9], scaling=[.3, 1.], num_attempts=50, allow_no_crop=True, cover_all_box=False)
 ```
 
 Randomly crop the image, a data augmentation operation for model training.  
-1. Compute candidate crop regions according to batch_sampler.  
-    (1) Compute the crop height and width from min scale, max scale, min aspect ratio, and max aspect ratio.  
-    (2) Randomly pick the crop origin given the crop height and width.  
-    (3) Filter the candidate crop regions:  
-    * When satisfy_all is True, a candidate crop region is kept only if its overlap with every ground-truth box meets the requirement.  
-    * When satisfy_all is False, a candidate crop region is kept if its overlap with any one ground-truth box meets the requirement.  
-2. Iterate over the candidate crop regions:  
-    (1) Remove a ground-truth box if it does not overlap the candidate crop region or its center is not inside it.  
-    (2) Compute the box positions relative to the candidate crop region and select the corresponding classes and mixup scores.  
-    (3) If avoid_no_bbox is False, return the current cropped result; otherwise, return only after finding a crop region containing at least one ground-truth box
+1. If allow_no_crop is True, append 'no_crop' to thresholds.
+2. Randomly shuffle thresholds.
+3. Iterate over the elements of thresholds:
+    (1) If the current thresh is 'no_crop', return the original image and annotations.
+    (2) Randomly sample values from aspect_ratio and scaling and derive the candidate crop region's height, width, and origin.
+    (3) Compute the IoU between the ground-truth boxes and the candidate crop region; if every ground-truth box has IoU below thresh, continue with step 3.
+    (4) If cover_all_box is True and any ground-truth box has IoU below thresh, continue with step 3.
+    (5) Select the ground-truth boxes inside the candidate crop region; if there are no valid boxes, continue with step 3, otherwise go to step 4.
+4. Convert the valid ground-truth box coordinates to be relative to the crop region.
+5. Convert the valid segmentation regions to be relative to the crop region.
 
 ### Parameters
-* **batch_sampler** (list): Multiple combinations of random crop parameters; each combination contains 8 values:
-    - max sample (int): Upper limit on the number of crop regions satisfying the current combination.
-    - max trial (int): Number of attempts to find a region satisfying the current combination.
-    - min scale (float): Lower bound on the per-side shrink ratio of the crop area relative to the original area.
-    - max scale (float): Upper bound on the per-side shrink ratio of the crop area relative to the original area.
-    - min aspect ratio (float): Lower bound on the short-side scaling ratio after cropping.
-    - max aspect ratio (float): Upper bound on the short-side scaling ratio after cropping.
-    - min overlap (float): Lower bound on the overlap between a ground-truth box and the cropped image.
-    - max overlap (float): Upper bound on the overlap between a ground-truth box and the cropped image.
-
-    Defaults to None, in which case the following settings are used:
-
-    [[1, 1, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0],  
-     [1, 50, 0.3, 1.0, 0.5, 2.0, 0.1, 1.0],  
-     [1, 50, 0.3, 1.0, 0.5, 2.0, 0.3, 1.0],  
-     [1, 50, 0.3, 1.0, 0.5, 2.0, 0.5, 1.0],  
-     [1, 50, 0.3, 1.0, 0.5, 2.0, 0.7, 1.0],  
-     [1, 50, 0.3, 1.0, 0.5, 2.0, 0.9, 1.0],  
-     [1, 50, 0.3, 1.0, 0.5, 2.0, 0.0, 1.0]]
-* **satisfy_all** (bool): Whether every ground-truth box must satisfy the constraint for a candidate crop region to be kept. Defaults to False.
-* **avoid_no_bbox** (bool): Whether to avoid keeping cropped images that contain no ground-truth boxes. Defaults to True.
+* **aspect_ratio** (list): Value range of the short-side scaling ratio after cropping, as [min, max]. Defaults to [.5, 2.].
+* **thresholds** (list): List of IoU thresholds used to decide whether a candidate crop region is valid. Defaults to [.0, .1, .3, .5, .7, .9].
+* **scaling** (list): Value range of the crop area relative to the original area, as [min, max]. Defaults to [.3, 1.].
+* **num_attempts** (int): Number of attempts before giving up on finding a valid crop region. Defaults to 50.
+* **allow_no_crop** (bool): Whether returning the image uncropped is allowed. Defaults to True.
+* **cover_all_box** (bool): Whether every ground-truth box must be inside the crop region. Defaults to False.
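+
+A minimal usage sketch (pairing RandomExpand with RandomCrop here is illustrative; the RandomCrop arguments shown are the documented defaults):
+```python
+from paddlex.det import transforms
+
+train_transforms = transforms.Compose([
+    transforms.RandomExpand(),
+    transforms.RandomCrop(thresholds=[.0, .1, .3, .5, .7, .9],
+                          allow_no_crop=True, cover_all_box=False),
+    transforms.Normalize(),
+])
+```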

+ 69 - 5
docs/apis/visualize.md

@@ -3,7 +3,7 @@ PaddleX provides a set of visualization functions for model prediction and result analysis.
 
 ## Visualizing object detection / instance segmentation predictions
 ```
-paddlex.det.visualize(image, result, threshold=0.5, save_dir=None)
+paddlex.det.visualize(image, result, threshold=0.5, save_dir='./')
 ```
 Visualize the boxes and masks predicted by an object detection / instance segmentation model on the original image.
 
@@ -11,7 +11,7 @@ paddlex.det.visualize(image, result, threshold=0.5, save_dir=None)
 > * **image** (str): Path to the original image file.  
 > * **result** (str): Model prediction result.
 > * **threshold** (float): Score threshold; boxes with confidence below it are filtered out of the visualization. Defaults to 0.5.
-> * **save_dir** (str): Directory for saving the visualization. If None, nothing is saved and the function returns the result as an np.ndarray; if set to a directory path, the result is saved there.
+> * **save_dir** (str): Directory for saving the visualization. If None, nothing is saved and the function returns the result as an np.ndarray; if set to a directory path, the result is saved there. Defaults to './'.
 
 ### Example
 > Download the [model](https://bj.bcebos.com/paddlex/models/xiaoduxiong_epoch_12.tar.gz) and [test image](https://bj.bcebos.com/paddlex/datasets/xiaoduxiong.jpeg) used in the example below.
@@ -23,17 +23,81 @@ pdx.det.visualize('xiaoduxiong.jpeg', result, save_dir='./')
 # The result is saved as ./visualize_xiaoduxiong.jpeg
 ```
 
+## Visualizing precision-recall for object detection / instance segmentation
+```
+paddlex.det.draw_pr_curve(eval_details_file=None, gt=None, pred_bbox=None, pred_mask=None, iou_thresh=0.5, save_dir='./')
+```
+Visualize, for each class in the evaluation results of an object detection / instance segmentation model, the relationship between precision and recall, together with the relationship between recall and confidence threshold.
+
+### Parameters
+> * **eval_details_file** (str): Path to the saved model evaluation results, containing ground truth and predictions. Defaults to None.
+> * **gt** (list): Ground-truth annotations of the dataset. Defaults to None.
+> * **pred_bbox** (list): Boxes predicted by the model on the dataset. Defaults to None.
+> * **pred_mask** (list): Masks predicted by the model on the dataset. Defaults to None.
+> * **iou_thresh** (float): IoU threshold for counting a predicted box or mask as a true positive. Defaults to 0.5.
+> * **save_dir** (str): Directory for saving the visualization. Defaults to './'.
+
+**Note:** `eval_details_file` takes precedence: whenever it is not None, the ground truth and predictions are read from `eval_details_file`. Only when `eval_details_file` is None are `gt`, `pred_bbox`, and `pred_mask` used for the analysis.
+
+### Examples
+> Example 1:
+Download the [model](https://bj.bcebos.com/paddlex/models/xiaoduxiong_epoch_12.tar.gz) and [dataset](https://bj.bcebos.com/paddlex/datasets/xiaoduxiong_ins_det.tar.gz) used in this example.
+```
+import os
+# Use GPU card 0
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+
+from paddlex.det import transforms
+import paddlex as pdx
+
+eval_transforms = transforms.Compose([
+    transforms.Normalize(),
+    transforms.ResizeByShort(short_size=800, max_size=1333),
+    transforms.Padding(coarsest_stride=32)
+])
+
+eval_dataset = pdx.datasets.CocoDetection(
+    data_dir='xiaoduxiong_ins_det/JPEGImages',
+    ann_file='xiaoduxiong_ins_det/val.json',
+    transforms=eval_transforms)
+
+model = pdx.load_model('xiaoduxiong_epoch_12')
+metrics, evaluate_details = model.evaluate(eval_dataset, batch_size=1, return_details=True)
+gt = evaluate_details['gt']
+bbox = evaluate_details['bbox']
+mask = evaluate_details['mask']
+
+# Plot the precision-recall curves for bbox and mask separately
+pdx.det.draw_pr_curve(gt=gt, pred_bbox=bbox, pred_mask=mask, save_dir='./xiaoduxiong')
+```
+The per-class precision-recall curves and the recall-confidence curves of the predicted boxes are visualized below:
+![](./images/xiaoduxiong_bbox_pr_curve(iou-0.5).png)
+
+The per-class precision-recall curves and the recall-confidence curves of the predicted masks are visualized below:
+![](./images/xiaoduxiong_segm_pr_curve(iou-0.5).png)
+
+> Example 2:
+After training with the [yolov3_darknet53.py example](https://github.com/PaddlePaddle/PaddleX/blob/develop/tutorials/train/detection/yolov3_darknet53.py), load the saved evaluation results file for analysis:
+
+```
+import paddlex as pdx
+eval_details_file = 'output/yolov3_darknet53/best_model/eval_details.json'
+pdx.det.draw_pr_curve(eval_details_file, save_dir='./insect')
+```
+The per-class precision-recall curves and the recall-confidence curves of the predicted boxes are visualized below:
+![](./images/insect_bbox_pr_curve(iou-0.5).png)
+
 ## Visualizing semantic segmentation predictions
 ```
-paddlex.seg.visualize(image, result, weight=0.6, save_dir=None)
+paddlex.seg.visualize(image, result, weight=0.6, save_dir='./')
 ```
 Visualize the mask predicted by a semantic segmentation model on the original image.
 
 ### Parameters
 > * **image** (str): Path to the original image file.  
 > * **result** (str): Model prediction result.
-> * **weight** (float): Blending factor between the mask visualization and the original image, where weight is the weight of the original image. Defaults to 0.6.
-> * **save_dir** (str): Directory for saving the visualization. If None, nothing is saved and the function returns the result as an np.ndarray; if set to a directory path, the result is saved there.
+> * **weight** (float): Blending factor between the mask visualization and the original image, where weight is the weight of the original image. Defaults to 0.6.
+> * **save_dir** (str): Directory for saving the visualization. If None, nothing is saved and the function returns the result as an np.ndarray; if set to a directory path, the result is saved there. Defaults to './'.
 
 ### Example
 > Download the [model](https://bj.bcebos.com/paddlex/models/cityscape_deeplab.tar.gz) and [test image](https://bj.bcebos.com/paddlex/datasets/city.png) used in the example below.

+ 75 - 0
docs/client_use.md

@@ -0,0 +1,75 @@
+# Training Models with the PaddleX Client
+
+**Step 1: Download the PaddleX client**
+
+Go to the [official site](https://www.paddlepaddle.org.cn/paddle/paddlex), fill in some basic information, and download the PaddleX client trial.
+
+
+**Step 2: Prepare your data**
+
+Before training, annotate your data in the format required by your task type. PaddleX currently supports four task types: image classification, object detection, semantic segmentation, and instance segmentation. See the [data annotation guide](https://github.com/PaddlePaddle/PaddleX/tree/master/DataAnnotation/AnnotationNote) for how to prepare data for each task type.
+
+
+**Step 3: Import your dataset**
+
+① After annotation, rename the data and annotation files as prompted by the client and save them to the correct locations for your task.
+
+② Create a new dataset in the client, choose the task type matching the dataset, select the dataset's path, and import it.
+
+![](./images/00_loaddata.png)
+
+③ Once a dataset is selected for import, the client automatically checks that the data and annotation files are well-formed. After validation succeeds, you can split the dataset into training, validation, and test sets in whatever proportions you need.
+
+④ In the "Data Analysis" module you can preview the annotated dataset; double-click an image to zoom in.
+
+![](./images/01_datasplit.png)
+
+
+
+**Step 4: Create a project**
+
+① After importing data, click "New Project" to create a project.
+
+② Choose the project's task type according to your needs; note that datasets also carry a task-type attribute, and the two must match.
+
+![](./images/02_newproject.png)
+
+
+
+**Step 5: Develop the project**
+
+① **Data selection**: After the project is created, select a dataset that has been loaded into the client and validated, then click Next to open the parameter configuration page.
+
+![](./images/03_choosedata.png)
+
+② **Parameter configuration**: Split into three parts: **model parameters**, **training parameters**, and **optimization strategy**. Choose the model architecture and its training parameters and optimization strategy according to your needs for the best task performance.
+
+![](./images/04_parameter.png)
+
+Once parameters are configured, click Start Training; the model starts training and is evaluated as it goes.
+
+③ **Training visualization**:
+
+During training, use VisualDL to inspect parameter changes, detailed logs, and the current best metrics on the training and validation sets. You can stop training at any time by clicking "Terminate Training".
+
+![](./images/05_train.png)
+
+![](./images/06_VisualDL.png)
+
+After training finishes, click "Next" to open the model evaluation page.
+
+
+
+④ **Model evaluation**
+
+On the evaluation page, apply the trained model to the validation set held out during splitting to measure its performance there. Evaluation methods include the confusion matrix, precision, recall, and more. On this page you can also inspect the model's predictions on the test set directly.
+
+Based on the evaluation results, either proceed to the model release page or go back to earlier steps, adjust the parameter configuration, and retrain.
+
+![](./images/07_evaluate.png)
+
+⑤ **Model release**
+
+Once you are satisfied with the model, release it in the version your production environment requires.
+
+![](./images/08_deploy.png)

BIN
docs/images/00_loaddata.png


BIN
docs/images/01_datasplit.png


BIN
docs/images/02_newproject.png


BIN
docs/images/03_choosedata.png


BIN
docs/images/04_parameter.png


BIN
docs/images/05_train.png


BIN
docs/images/06_VisualDL.png


BIN
docs/images/07_evaluate.png


BIN
docs/images/08_deploy.png


BIN
docs/images/anaconda_windows.png


+ 2 - 1
docs/index.rst

@@ -19,9 +19,10 @@ PaddleX is a full-pipeline deep learning development tool built on the PaddlePaddle ecosystem, featuring
    tutorials/index.rst
    metrics.md
    deploy.md
+   client_use.md
    FAQ.md
 
-* PaddleX version: v0.1.0
+* PaddleX version: v0.1.6
 * Project site: http://www.paddlepaddle.org.cn/paddle/paddlex  
 * Project GitHub: https://github.com/PaddlePaddle/PaddleX/tree/develop  
 * Official QQ user group: 1045148026  

+ 3 - 1
docs/install.md

@@ -1,6 +1,8 @@
 # Installation
 
-> The installation steps below assume you already have **paddlepaddle-gpu or paddlepaddle (version 1.7.1 or later)** installed; see the [PaddlePaddle site](https://www.paddlepaddle.org.cn/install/quick) for how to install it.
+The installation steps below assume you already have **paddlepaddle-gpu or paddlepaddle (version 1.7.1 or later)** installed; see the [PaddlePaddle site](https://www.paddlepaddle.org.cn/install/quick) for how to install it.
+
+> We recommend an Anaconda Python environment; for installing PaddleX under Anaconda, see [Installing and Using Anaconda](./anaconda_install.md).
 
 ## Installing from GitHub code
 The GitHub code is continuously updated as development progresses.

+ 2 - 2
docs/quick_start.md

@@ -1,6 +1,6 @@
 # Get Started in 10 Minutes
 
-This document walks through training with PaddleX on a small dataset; read the PaddleX **tutorials** to learn how to train other model tasks. The example is also available on AIStudio, where you can [try model training online](https://aistudio.baidu.com/aistudio/projectdetail/423472).
+This document walks through training with PaddleX on a small dataset; read the PaddleX **tutorials** to learn how to train other model tasks. The example is also available on AIStudio, where you can [try model training online](https://aistudio.baidu.com/aistudio/projectdetail/439860).
 
 ## 1. Prepare the vegetable classification dataset
 ```
@@ -42,7 +42,7 @@ train_dataset = pdx.datasets.ImageNet(
     shuffle=True)
 eval_dataset = pdx.datasets.ImageNet(
     data_dir='vegetables_cls',
-    file_list='vegetables_cls/train_list.txt',
+    file_list='vegetables_cls/val_list.txt',
     label_list='vegetables_cls/labels.txt',
     transforms=eval_transforms)
 ```

+ 1 - 1
paddlex/__init__.py

@@ -25,4 +25,4 @@ load_model = cv.models.load_model
 datasets = cv.datasets
 
 log_level = 2
-__version__ = '0.1.5.github'
+__version__ = '0.1.6.github'

+ 134 - 2
paddlex/cv/models/utils/visualize.py

@@ -15,10 +15,14 @@
 import os
 import cv2
 import numpy as np
+import matplotlib.pyplot as plt
 from PIL import Image, ImageDraw
 
+import paddlex.utils.logging as logging
+from .detection_eval import fixed_linspace, backup_linspace, loadRes
 
-def visualize_detection(image, result, threshold=0.5, save_dir=None):
+
+def visualize_detection(image, result, threshold=0.5, save_dir='./'):
     """
         Visualize bbox and mask results
     """
@@ -31,11 +35,12 @@ def visualize_detection(image, result, threshold=0.5, save_dir=None):
             os.makedirs(save_dir)
         out_path = os.path.join(save_dir, 'visualize_{}'.format(image_name))
         image.save(out_path, quality=95)
+        logging.info('The visualized result is saved as {}'.format(out_path))
     else:
         return image
 
 
-def visualize_segmentation(image, result, weight=0.6, save_dir=None):
+def visualize_segmentation(image, result, weight=0.6, save_dir='./'):
     """
     Convert segment result to color image, and save added image.
     Args:
@@ -62,6 +67,7 @@ def visualize_segmentation(image, result, weight=0.6, save_dir=None):
         image_name = os.path.split(image)[-1]
         out_path = os.path.join(save_dir, 'visualize_{}'.format(image_name))
         cv2.imwrite(out_path, vis_result)
+        logging.info('The visualized result is saved as {}'.format(out_path))
     else:
         return vis_result
 
@@ -160,3 +166,129 @@ def draw_bbox_mask(image, results, threshold=0.5, alpha=0.7):
             img_array[idx[0], idx[1], :] += alpha * color_mask
             image = Image.fromarray(img_array.astype('uint8'))
     return image
+
+
+def draw_pr_curve(eval_details_file=None,
+                  gt=None,
+                  pred_bbox=None,
+                  pred_mask=None,
+                  iou_thresh=0.5,
+                  save_dir='./'):
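+    # Plot per-class precision-recall and score-recall curves at the given
+    # IoU threshold; ground truth and predictions come either from an
+    # eval_details JSON file or from the gt/pred_bbox/pred_mask arguments.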
+    if eval_details_file is not None:
+        import json
+        with open(eval_details_file, 'r') as f:
+            eval_details = json.load(f)
+            pred_bbox = eval_details['bbox']
+            if 'mask' in eval_details:
+                pred_mask = eval_details['mask']
+            gt = eval_details['gt']
+    if gt is None or pred_bbox is None:
+        raise Exception(
+            "gt/pred_bbox/pred_mask is None now, please set right eval_details_file or gt/pred_bbox/pred_mask."
+        )
+    if pred_bbox is not None and len(pred_bbox) == 0:
+        raise Exception("There is no predicted bbox.")
+    if pred_mask is not None and len(pred_mask) == 0:
+        raise Exception("There is no predicted mask.")
+    from pycocotools.coco import COCO
+    from pycocotools.cocoeval import COCOeval
+    coco = COCO()
+    coco.dataset = gt
+    coco.createIndex()
+
+    def _summarize(coco_gt, ap=1, iouThr=None, areaRng='all', maxDets=100):
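+        # Average precision (ap=1) or recall (ap=0) from the accumulated
+        # COCOeval results, restricted to the given IoU threshold, area
+        # range, and max detections; entries of -1 mark empty cells.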
+        p = coco_gt.params
+        aind = [i for i, aRng in enumerate(p.areaRngLbl) if aRng == areaRng]
+        mind = [i for i, mDet in enumerate(p.maxDets) if mDet == maxDets]
+        if ap == 1:
+            # dimension of precision: [TxRxKxAxM]
+            s = coco_gt.eval['precision']
+            # IoU
+            if iouThr is not None:
+                t = np.where(iouThr == p.iouThrs)[0]
+                s = s[t]
+            s = s[:, :, :, aind, mind]
+        else:
+            # dimension of recall: [TxKxAxM]
+            s = coco_gt.eval['recall']
+            if iouThr is not None:
+                t = np.where(iouThr == p.iouThrs)[0]
+                s = s[t]
+            s = s[:, :, aind, mind]
+        if len(s[s > -1]) == 0:
+            mean_s = -1
+        else:
+            mean_s = np.mean(s[s > -1])
+        return mean_s
+
+    def cal_pr(coco_gt, coco_dt, iou_thresh, save_dir, style='bbox'):
+        from pycocotools.cocoeval import COCOeval
+        coco_dt = loadRes(coco_gt, coco_dt)
+        np.linspace = fixed_linspace
+        coco_eval = COCOeval(coco_gt, coco_dt, style)
+        coco_eval.params.iouThrs = np.linspace(
+            iou_thresh, iou_thresh, 1, endpoint=True)
+        np.linspace = backup_linspace
+        coco_eval.evaluate()
+        coco_eval.accumulate()
+        stats = _summarize(coco_eval, iouThr=iou_thresh)
+        catIds = coco_gt.getCatIds()
+        if len(catIds) != coco_eval.eval['precision'].shape[2]:
+            raise Exception(
+                "The category number must be same as the third dimension of precisions."
+            )
+        x = np.arange(0.0, 1.01, 0.01)
+        color_map = get_color_map_list(256)[1:256]
+
+        plt.subplot(1, 2, 1)
+        plt.title(style + " precision-recall IoU={}".format(iou_thresh))
+        plt.xlabel("recall")
+        plt.ylabel("precision")
+        plt.xlim(0, 1.01)
+        plt.ylim(0, 1.01)
+        plt.grid(linestyle='--', linewidth=1)
+        plt.plot([0, 1], [0, 1], 'r--', linewidth=1)
+        my_x_ticks = np.arange(0, 1.01, 0.1)
+        my_y_ticks = np.arange(0, 1.01, 0.1)
+        plt.xticks(my_x_ticks, fontsize=5)
+        plt.yticks(my_y_ticks, fontsize=5)
+        for idx, catId in enumerate(catIds):
+            pr_array = coco_eval.eval['precision'][0, :, idx, 0, 2]
+            precision = pr_array[pr_array > -1]
+            ap = np.mean(precision) if precision.size else float('nan')
+            nm = coco_gt.loadCats(catId)[0]['name'] + ' AP={:0.2f}'.format(
+                float(ap * 100))
+            color = tuple(color_map[idx])
+            color = [float(c) / 255 for c in color]
+            color.append(0.75)
+            plt.plot(x, pr_array, color=color, label=nm, linewidth=1)
+        plt.legend(loc="lower left", fontsize=5)
+
+        plt.subplot(1, 2, 2)
+        plt.title(style + " score-recall IoU={}".format(iou_thresh))
+        plt.xlabel('recall')
+        plt.ylabel('score')
+        plt.xlim(0, 1.01)
+        plt.ylim(0, 1.01)
+        plt.grid(linestyle='--', linewidth=1)
+        plt.xticks(my_x_ticks, fontsize=5)
+        plt.yticks(my_y_ticks, fontsize=5)
+        for idx, catId in enumerate(catIds):
+            nm = coco_gt.loadCats(catId)[0]['name']
+            sr_array = coco_eval.eval['scores'][0, :, idx, 0, 2]
+            color = tuple(color_map[idx])
+            color = [float(c) / 255 for c in color]
+            color.append(0.75)
+            plt.plot(x, sr_array, color=color, label=nm, linewidth=1)
+        plt.legend(loc="lower left", fontsize=5)
+        plt.savefig(
+            os.path.join(save_dir, "./{}_pr_curve(iou-{}).png".format(
+                style, iou_thresh)),
+            dpi=800)
+        plt.close()
+
+    if not os.path.exists(save_dir):
+        os.makedirs(save_dir)
+    cal_pr(coco, pred_bbox, iou_thresh, save_dir, style='bbox')
+    if pred_mask is not None:
+        cal_pr(coco, pred_mask, iou_thresh, save_dir, style='segm')

+ 2 - 2
paddlex/cv/models/yolo_v3.py

@@ -97,9 +97,9 @@ class YOLOv3(BaseAPI):
         elif backbone_name == 'MobileNetV1':
             backbone = paddlex.cv.nets.MobileNetV1(norm_type='sync_bn')
         elif backbone_name.startswith('MobileNetV3'):
-            models_name = backbone_name.split('_')[1]
+            model_name = backbone_name.split('_')[1]
             backbone = paddlex.cv.nets.MobileNetV3(
-                norm_type='sync_bn', models_name=models_name)
+                norm_type='sync_bn', model_name=model_name)
         return backbone
 
     def build_net(self, mode='train'):

+ 134 - 341
paddlex/cv/transforms/box_utils.py

@@ -19,25 +19,6 @@ import cv2
 import scipy
 
 
-def meet_emit_constraint(src_bbox, sample_bbox):
-    center_x = (src_bbox[2] + src_bbox[0]) / 2
-    center_y = (src_bbox[3] + src_bbox[1]) / 2
-    if center_x >= sample_bbox[0] and \
-            center_x <= sample_bbox[2] and \
-            center_y >= sample_bbox[1] and \
-            center_y <= sample_bbox[3]:
-        return True
-    return False
-
-
-def clip_bbox(src_bbox):
-    src_bbox[0] = max(min(src_bbox[0], 1.0), 0.0)
-    src_bbox[1] = max(min(src_bbox[1], 1.0), 0.0)
-    src_bbox[2] = max(min(src_bbox[2], 1.0), 0.0)
-    src_bbox[3] = max(min(src_bbox[3], 1.0), 0.0)
-    return src_bbox
-
-
 def bbox_area(src_bbox):
     if src_bbox[2] < src_bbox[0] or src_bbox[3] < src_bbox[1]:
         return 0.
@@ -47,189 +28,6 @@ def bbox_area(src_bbox):
         return width * height
 
 
-def is_overlap(object_bbox, sample_bbox):
-    if object_bbox[0] >= sample_bbox[2] or \
-       object_bbox[2] <= sample_bbox[0] or \
-       object_bbox[1] >= sample_bbox[3] or \
-       object_bbox[3] <= sample_bbox[1]:
-        return False
-    else:
-        return True
-
-
-def filter_and_process(sample_bbox, bboxes, labels, scores=None):
-    new_bboxes = []
-    new_labels = []
-    new_scores = []
-    for i in range(len(bboxes)):
-        new_bbox = [0, 0, 0, 0]
-        obj_bbox = [bboxes[i][0], bboxes[i][1], bboxes[i][2], bboxes[i][3]]
-        if not meet_emit_constraint(obj_bbox, sample_bbox):
-            continue
-        if not is_overlap(obj_bbox, sample_bbox):
-            continue
-        sample_width = sample_bbox[2] - sample_bbox[0]
-        sample_height = sample_bbox[3] - sample_bbox[1]
-        new_bbox[0] = (obj_bbox[0] - sample_bbox[0]) / sample_width
-        new_bbox[1] = (obj_bbox[1] - sample_bbox[1]) / sample_height
-        new_bbox[2] = (obj_bbox[2] - sample_bbox[0]) / sample_width
-        new_bbox[3] = (obj_bbox[3] - sample_bbox[1]) / sample_height
-        new_bbox = clip_bbox(new_bbox)
-        if bbox_area(new_bbox) > 0:
-            new_bboxes.append(new_bbox)
-            new_labels.append([labels[i][0]])
-            if scores is not None:
-                new_scores.append([scores[i][0]])
-    bboxes = np.array(new_bboxes)
-    labels = np.array(new_labels)
-    scores = np.array(new_scores)
-    return bboxes, labels, scores
-
-
-def bbox_area_sampling(bboxes, labels, scores, target_size, min_size):
-    new_bboxes = []
-    new_labels = []
-    new_scores = []
-    for i, bbox in enumerate(bboxes):
-        w = float((bbox[2] - bbox[0]) * target_size)
-        h = float((bbox[3] - bbox[1]) * target_size)
-        if w * h < float(min_size * min_size):
-            continue
-        else:
-            new_bboxes.append(bbox)
-            new_labels.append(labels[i])
-            if scores is not None and scores.size != 0:
-                new_scores.append(scores[i])
-    bboxes = np.array(new_bboxes)
-    labels = np.array(new_labels)
-    scores = np.array(new_scores)
-    return bboxes, labels, scores
-
-
-def generate_sample_bbox(sampler):
-    scale = np.random.uniform(sampler[2], sampler[3])
-    aspect_ratio = np.random.uniform(sampler[4], sampler[5])
-    aspect_ratio = max(aspect_ratio, (scale**2.0))
-    aspect_ratio = min(aspect_ratio, 1 / (scale**2.0))
-    bbox_width = scale * (aspect_ratio**0.5)
-    bbox_height = scale / (aspect_ratio**0.5)
-    xmin_bound = 1 - bbox_width
-    ymin_bound = 1 - bbox_height
-    xmin = np.random.uniform(0, xmin_bound)
-    ymin = np.random.uniform(0, ymin_bound)
-    xmax = xmin + bbox_width
-    ymax = ymin + bbox_height
-    sampled_bbox = [xmin, ymin, xmax, ymax]
-    return sampled_bbox
-
-
-def generate_sample_bbox_square(sampler, image_width, image_height):
-    scale = np.random.uniform(sampler[2], sampler[3])
-    aspect_ratio = np.random.uniform(sampler[4], sampler[5])
-    aspect_ratio = max(aspect_ratio, (scale**2.0))
-    aspect_ratio = min(aspect_ratio, 1 / (scale**2.0))
-    bbox_width = scale * (aspect_ratio**0.5)
-    bbox_height = scale / (aspect_ratio**0.5)
-    if image_height < image_width:
-        bbox_width = bbox_height * image_height / image_width
-    else:
-        bbox_height = bbox_width * image_width / image_height
-    xmin_bound = 1 - bbox_width
-    ymin_bound = 1 - bbox_height
-    xmin = np.random.uniform(0, xmin_bound)
-    ymin = np.random.uniform(0, ymin_bound)
-    xmax = xmin + bbox_width
-    ymax = ymin + bbox_height
-    sampled_bbox = [xmin, ymin, xmax, ymax]
-    return sampled_bbox
-
-
-def data_anchor_sampling(bbox_labels, image_width, image_height, scale_array,
-                         resize_width):
-    num_gt = len(bbox_labels)
-    # np.random.randint range: [low, high)
-    rand_idx = np.random.randint(0, num_gt) if num_gt != 0 else 0
-
-    if num_gt != 0:
-        norm_xmin = bbox_labels[rand_idx][0]
-        norm_ymin = bbox_labels[rand_idx][1]
-        norm_xmax = bbox_labels[rand_idx][2]
-        norm_ymax = bbox_labels[rand_idx][3]
-
-        xmin = norm_xmin * image_width
-        ymin = norm_ymin * image_height
-        wid = image_width * (norm_xmax - norm_xmin)
-        hei = image_height * (norm_ymax - norm_ymin)
-        range_size = 0
-
-        area = wid * hei
-        for scale_ind in range(0, len(scale_array) - 1):
-            if area > scale_array[scale_ind] ** 2 and area < \
-                    scale_array[scale_ind + 1] ** 2:
-                range_size = scale_ind + 1
-                break
-
-        if area > scale_array[len(scale_array) - 2]**2:
-            range_size = len(scale_array) - 2
-
-        scale_choose = 0.0
-        if range_size == 0:
-            rand_idx_size = 0
-        else:
-            # np.random.randint range: [low, high)
-            rng_rand_size = np.random.randint(0, range_size + 1)
-            rand_idx_size = rng_rand_size % (range_size + 1)
-
-        if rand_idx_size == range_size:
-            min_resize_val = scale_array[rand_idx_size] / 2.0
-            max_resize_val = min(2.0 * scale_array[rand_idx_size],
-                                 2 * math.sqrt(wid * hei))
-            scale_choose = random.uniform(min_resize_val, max_resize_val)
-        else:
-            min_resize_val = scale_array[rand_idx_size] / 2.0
-            max_resize_val = 2.0 * scale_array[rand_idx_size]
-            scale_choose = random.uniform(min_resize_val, max_resize_val)
-
-        sample_bbox_size = wid * resize_width / scale_choose
-
-        w_off_orig = 0.0
-        h_off_orig = 0.0
-        if sample_bbox_size < max(image_height, image_width):
-            if wid <= sample_bbox_size:
-                w_off_orig = np.random.uniform(xmin + wid - sample_bbox_size,
-                                               xmin)
-            else:
-                w_off_orig = np.random.uniform(xmin,
-                                               xmin + wid - sample_bbox_size)
-
-            if hei <= sample_bbox_size:
-                h_off_orig = np.random.uniform(ymin + hei - sample_bbox_size,
-                                               ymin)
-            else:
-                h_off_orig = np.random.uniform(ymin,
-                                               ymin + hei - sample_bbox_size)
-
-        else:
-            w_off_orig = np.random.uniform(image_width - sample_bbox_size, 0.0)
-            h_off_orig = np.random.uniform(image_height - sample_bbox_size,
-                                           0.0)
-
-        w_off_orig = math.floor(w_off_orig)
-        h_off_orig = math.floor(h_off_orig)
-
-        # Figure out top left coordinates.
-        w_off = float(w_off_orig / image_width)
-        h_off = float(h_off_orig / image_height)
-
-        sampled_bbox = [
-            w_off, h_off, w_off + float(sample_bbox_size / image_width),
-            h_off + float(sample_bbox_size / image_height)
-        ]
-        return sampled_bbox
-    else:
-        return 0
-
-
 def jaccard_overlap(sample_bbox, object_bbox):
     if sample_bbox[0] >= object_bbox[2] or \
         sample_bbox[2] <= object_bbox[0] or \
@@ -249,143 +47,143 @@ def jaccard_overlap(sample_bbox, object_bbox):
     return overlap
 
 
-def intersect_bbox(bbox1, bbox2):
-    if bbox2[0] > bbox1[2] or bbox2[2] < bbox1[0] or \
-        bbox2[1] > bbox1[3] or bbox2[3] < bbox1[1]:
-        intersection_box = [0.0, 0.0, 0.0, 0.0]
-    else:
-        intersection_box = [
-            max(bbox1[0], bbox2[0]),
-            max(bbox1[1], bbox2[1]),
-            min(bbox1[2], bbox2[2]),
-            min(bbox1[3], bbox2[3])
-        ]
-    return intersection_box
-
-
-def bbox_coverage(bbox1, bbox2):
-    inter_box = intersect_bbox(bbox1, bbox2)
-    intersect_size = bbox_area(inter_box)
-
-    if intersect_size > 0:
-        bbox1_size = bbox_area(bbox1)
-        return intersect_size / bbox1_size
-    else:
-        return 0.
+def iou_matrix(a, b):
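+    # Pairwise IoU between boxes a (N, 4) and b (M, 4) in xyxy format;
+    # returns an (N, M) matrix, with a small epsilon guarding the division.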
+    tl_i = np.maximum(a[:, np.newaxis, :2], b[:, :2])
+    br_i = np.minimum(a[:, np.newaxis, 2:], b[:, 2:])
 
+    area_i = np.prod(br_i - tl_i, axis=2) * (tl_i < br_i).all(axis=2)
+    area_a = np.prod(a[:, 2:] - a[:, :2], axis=1)
+    area_b = np.prod(b[:, 2:] - b[:, :2], axis=1)
+    area_o = (area_a[:, np.newaxis] + area_b - area_i)
+    return area_i / (area_o + 1e-10)
 
-def satisfy_sample_constraint(sampler,
-                              sample_bbox,
-                              gt_bboxes,
-                              satisfy_all=False):
-    if sampler[6] == 0 and sampler[7] == 0:
-        return True
-    satisfied = []
-    for i in range(len(gt_bboxes)):
-        object_bbox = [
-            gt_bboxes[i][0], gt_bboxes[i][1], gt_bboxes[i][2], gt_bboxes[i][3]
-        ]
-        overlap = jaccard_overlap(sample_bbox, object_bbox)
-        if sampler[6] != 0 and \
-                overlap < sampler[6]:
-            satisfied.append(False)
-            continue
-        if sampler[7] != 0 and \
-                overlap > sampler[7]:
-            satisfied.append(False)
-            continue
-        satisfied.append(True)
-        if not satisfy_all:
-            return True
-
-    if satisfy_all:
-        return np.all(satisfied)
-    else:
-        return False
 
+def crop_box_with_center_constraint(box, crop):
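+    # Clip boxes to the crop window and shift them into crop-local
+    # coordinates; a box stays valid only if its center lies inside the
+    # crop and it keeps positive width and height after clipping.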
+    cropped_box = box.copy()
+
+    cropped_box[:, :2] = np.maximum(box[:, :2], crop[:2])
+    cropped_box[:, 2:] = np.minimum(box[:, 2:], crop[2:])
+    cropped_box[:, :2] -= crop[:2]
+    cropped_box[:, 2:] -= crop[:2]
+
+    centers = (box[:, :2] + box[:, 2:]) / 2
+    valid = np.logical_and(crop[:2] <= centers, centers < crop[2:]).all(axis=1)
+    valid = np.logical_and(
+        valid, (cropped_box[:, :2] < cropped_box[:, 2:]).all(axis=1))
+
+    return cropped_box, np.where(valid)[0]
+
+
+def is_poly(segm):
+    if not isinstance(segm, (list, dict)):
+        raise Exception("Invalid segm type: {}".format(type(segm)))
+    return isinstance(segm, list)
 
-def satisfy_sample_constraint_coverage(sampler, sample_bbox, gt_bboxes):
-    if sampler[6] == 0 and sampler[7] == 0:
-        has_jaccard_overlap = False
-    else:
-        has_jaccard_overlap = True
-    if sampler[8] == 0 and sampler[9] == 0:
-        has_object_coverage = False
-    else:
-        has_object_coverage = True
-
-    if not has_jaccard_overlap and not has_object_coverage:
-        return True
-    found = False
-    for i in range(len(gt_bboxes)):
-        object_bbox = [
-            gt_bboxes[i][0], gt_bboxes[i][1], gt_bboxes[i][2], gt_bboxes[i][3]
-        ]
-        if has_jaccard_overlap:
-            overlap = jaccard_overlap(sample_bbox, object_bbox)
-            if sampler[6] != 0 and \
-                    overlap < sampler[6]:
-                continue
-            if sampler[7] != 0 and \
-                    overlap > sampler[7]:
-                continue
-            found = True
-        if has_object_coverage:
-            object_coverage = bbox_coverage(object_bbox, sample_bbox)
-            if sampler[8] != 0 and \
-                    object_coverage < sampler[8]:
-                continue
-            if sampler[9] != 0 and \
-                    object_coverage > sampler[9]:
-                continue
-            found = True
-        if found:
-            return True
-    return found
-
-
-def crop_image_sampling(img, sample_bbox, image_width, image_height,
-                        target_size):
-    # no clipping here
-    xmin = int(sample_bbox[0] * image_width)
-    xmax = int(sample_bbox[2] * image_width)
-    ymin = int(sample_bbox[1] * image_height)
-    ymax = int(sample_bbox[3] * image_height)
-
-    w_off = xmin
-    h_off = ymin
-    width = xmax - xmin
-    height = ymax - ymin
-    cross_xmin = max(0.0, float(w_off))
-    cross_ymin = max(0.0, float(h_off))
-    cross_xmax = min(float(w_off + width - 1.0), float(image_width))
-    cross_ymax = min(float(h_off + height - 1.0), float(image_height))
-    cross_width = cross_xmax - cross_xmin
-    cross_height = cross_ymax - cross_ymin
-
-    roi_xmin = 0 if w_off >= 0 else abs(w_off)
-    roi_ymin = 0 if h_off >= 0 else abs(h_off)
-    roi_width = cross_width
-    roi_height = cross_height
-
-    roi_y1 = int(roi_ymin)
-    roi_y2 = int(roi_ymin + roi_height)
-    roi_x1 = int(roi_xmin)
-    roi_x2 = int(roi_xmin + roi_width)
-
-    cross_y1 = int(cross_ymin)
-    cross_y2 = int(cross_ymin + cross_height)
-    cross_x1 = int(cross_xmin)
-    cross_x2 = int(cross_xmin + cross_width)
-
-    sample_img = np.zeros((height, width, 3))
-    sample_img[roi_y1: roi_y2, roi_x1: roi_x2] = \
-        img[cross_y1: cross_y2, cross_x1: cross_x2]
-
-    sample_img = cv2.resize(
-        sample_img, (target_size, target_size), interpolation=cv2.INTER_AREA)
-
-    return sample_img
+
+def crop_image(img, crop):
+    x1, y1, x2, y2 = crop
+    return img[y1:y2, x1:x2, :]
+
+
+def crop_segms(segms, valid_ids, crop, height, width):
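+    # Crop the selected segmentations to the crop window: polygons are
+    # intersected with the window via shapely, RLE masks are sliced, and
+    # coordinates are shifted into the cropped image's frame.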
+    def _crop_poly(segm, crop):
+        xmin, ymin, xmax, ymax = crop
+        crop_coord = [xmin, ymin, xmin, ymax, xmax, ymax, xmax, ymin]
+        crop_p = np.array(crop_coord).reshape(4, 2)
+        crop_p = Polygon(crop_p)
+
+        crop_segm = list()
+        for poly in segm:
+            poly = np.array(poly).reshape(len(poly) // 2, 2)
+            polygon = Polygon(poly)
+            if not polygon.is_valid:
+                exterior = polygon.exterior
+                multi_lines = exterior.intersection(exterior)
+                polygons = shapely.ops.polygonize(multi_lines)
+                polygon = MultiPolygon(polygons)
+            multi_polygon = list()
+            if isinstance(polygon, MultiPolygon):
+                multi_polygon = copy.deepcopy(polygon)
+            else:
+                multi_polygon.append(copy.deepcopy(polygon))
+            for per_polygon in multi_polygon:
+                inter = per_polygon.intersection(crop_p)
+                if not inter:
+                    continue
+                if isinstance(inter, (MultiPolygon, GeometryCollection)):
+                    for part in inter:
+                        if not isinstance(part, Polygon):
+                            continue
+                        part = np.squeeze(
+                            np.array(part.exterior.coords[:-1]).reshape(1, -1))
+                        part[0::2] -= xmin
+                        part[1::2] -= ymin
+                        crop_segm.append(part.tolist())
+                elif isinstance(inter, Polygon):
+                    crop_poly = np.squeeze(
+                        np.array(inter.exterior.coords[:-1]).reshape(1, -1))
+                    crop_poly[0::2] -= xmin
+                    crop_poly[1::2] -= ymin
+                    crop_segm.append(crop_poly.tolist())
+                else:
+                    continue
+        return crop_segm
+
+    def _crop_rle(rle, crop, height, width):
+        if 'counts' in rle and type(rle['counts']) == list:
+            rle = mask_util.frPyObjects(rle, height, width)
+        mask = mask_util.decode(rle)
+        mask = mask[crop[1]:crop[3], crop[0]:crop[2]]
+        rle = mask_util.encode(np.array(mask, order='F', dtype=np.uint8))
+        return rle
+
+    crop_segms = []
+    for id in valid_ids:
+        segm = segms[id]
+        if is_poly(segm):
+            import copy
+            import shapely.ops
+            import logging
+            from shapely.geometry import Polygon, MultiPolygon, GeometryCollection
+            logging.getLogger("shapely").setLevel(logging.WARNING)
+            # Polygon format
+            crop_segms.append(_crop_poly(segm, crop))
+        else:
+            # RLE format
+            import pycocotools.mask as mask_util
+            crop_segms.append(_crop_rle(segm, crop, height, width))
+    return crop_segms
+
+
+def expand_segms(segms, x, y, height, width, ratio):
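+    # Re-embed segmentations into the expanded canvas: polygon coordinates
+    # are shifted by the paste offset (x, y), while RLE masks are decoded,
+    # placed into a zero-filled canvas scaled by ratio, then re-encoded.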
+    def _expand_poly(poly, x, y):
+        expanded_poly = np.array(poly)
+        expanded_poly[0::2] += x
+        expanded_poly[1::2] += y
+        return expanded_poly.tolist()
+
+    def _expand_rle(rle, x, y, height, width, ratio):
+        if 'counts' in rle and type(rle['counts']) == list:
+            rle = mask_util.frPyObjects(rle, height, width)
+        mask = mask_util.decode(rle)
+        expanded_mask = np.full((int(height * ratio), int(width * ratio)),
+                                0).astype(mask.dtype)
+        expanded_mask[y:y + height, x:x + width] = mask
+        rle = mask_util.encode(
+            np.array(expanded_mask, order='F', dtype=np.uint8))
+        return rle
+
+    expanded_segms = []
+    for segm in segms:
+        if is_poly(segm):
+            # Polygon format
+            expanded_segms.append([_expand_poly(poly, x, y) for poly in segm])
+        else:
+            # RLE format
+            import pycocotools.mask as mask_util
+            expanded_segms.append(
+                _expand_rle(segm, x, y, height, width, ratio))
+    return expanded_segms
 
 
 def box_horizontal_flip(bboxes, width):
@@ -409,15 +207,10 @@ def segms_horizontal_flip(segms, height, width):
         if 'counts' in rle and type(rle['counts']) == list:
             rle = mask_util.frPyObjects([rle], height, width)
         mask = mask_util.decode(rle)
-        mask = mask[:, ::-1, :]
+        mask = mask[:, ::-1]
         rle = mask_util.encode(np.array(mask, order='F', dtype=np.uint8))
         return rle
 
-    def is_poly(segm):
-        if not isinstance(segm, (list, dict)):
-            raise Exception("Invalid segm type: {}".format(type(segm)))
-        return isinstance(segm, list)
-
     flipped_segms = []
     for segm in segms:
         if is_poly(segm):

+ 0 - 3
paddlex/cv/transforms/cls_transforms.py

@@ -385,15 +385,12 @@ class RandomDistort:
             'saturation': self.saturation_prob,
             'hue': self.hue_prob,
         }
-        im = im.astype('uint8')
-        im = Image.fromarray(im)
         for id in range(len(ops)):
             params = params_dict[ops[id].__name__]
             prob = prob_dict[ops[id].__name__]
             params['im'] = im
             if np.random.uniform(0, 1) < prob:
                 im = ops[id](**params)
-        im = np.asarray(im).astype('float32')
         if label is None:
             return (im, )
         else:

+ 162 - 166
paddlex/cv/transforms/det_transforms.py

@@ -12,13 +12,20 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-from .ops import *
-from .box_utils import *
+try:
+    from collections.abc import Sequence
+except Exception:
+    from collections import Sequence
+
 import random
 import os.path as osp
 import numpy as np
-from PIL import Image, ImageEnhance
+
 import cv2
+from PIL import Image, ImageEnhance
+
+from .ops import *
+from .box_utils import *
 
 
 class Compose:
@@ -81,7 +88,7 @@ class Compose:
                 im = cv2.imread(im_file).astype('float32')
             except:
                 raise TypeError(
-                   'Can\'t read The image file {}!'.format(im_file))
+                    'Can\'t read The image file {}!'.format(im_file))
             im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
             # make default im_info with [h, w, 1]
             im_info['im_resize_info'] = np.array(
@@ -551,15 +558,13 @@ class RandomDistort:
             'saturation': self.saturation_prob,
             'hue': self.hue_prob
         }
-        im = im.astype('uint8')
-        im = Image.fromarray(im)
         for id in range(4):
             params = params_dict[ops[id].__name__]
             prob = prob_dict[ops[id].__name__]
             params['im'] = im
+
             if np.random.uniform(0, 1) < prob:
                 im = ops[id](**params)
-        im = np.asarray(im).astype('float32')
         if label_info is None:
             return (im, im_info)
         else:
@@ -608,7 +613,7 @@ class MixupImage:
             img1.astype('float32') * factor
         img[:img2.shape[0], :img2.shape[1], :] += \
             img2.astype('float32') * (1.0 - factor)
-        return img.astype('uint8')
+        return img.astype('float32')
 
     def __call__(self, im, im_info=None, label_info=None):
         """
@@ -675,9 +680,17 @@ class MixupImage:
         gt_score2 = im_info['mixup'][2]['gt_score']
         gt_score = np.concatenate(
             (gt_score1 * factor, gt_score2 * (1. - factor)), axis=0)
+        if 'gt_poly' in label_info:
+            gt_poly1 = label_info['gt_poly']
+            gt_poly2 = im_info['mixup'][2]['gt_poly']
+            label_info['gt_poly'] = gt_poly1 + gt_poly2
+        is_crowd1 = label_info['is_crowd']
+        is_crowd2 = im_info['mixup'][2]['is_crowd']
+        is_crowd = np.concatenate((is_crowd1, is_crowd2), axis=0)
         label_info['gt_bbox'] = gt_bbox
         label_info['gt_score'] = gt_score
         label_info['gt_class'] = gt_class
+        label_info['is_crowd'] = is_crowd
         im_info['augment_shape'] = np.array([im.shape[0],
                                              im.shape[1]]).astype('int32')
         im_info.pop('mixup')
@@ -689,23 +702,30 @@ class MixupImage:
 
 class RandomExpand:
     """随机扩张图像,模型训练时的数据增强操作。
-
     1. 随机选取扩张比例(扩张比例大于1时才进行扩张)。
     2. 计算扩张后图像大小。
-    3. 初始化像素值为数据集均值的图像,并将原图像随机粘贴于该图像上。
+    3. 初始化像素值为输入填充值的图像,并将原图像随机粘贴于该图像上。
     4. 根据原图像粘贴位置换算出扩张后真实标注框的位置坐标。
-
+    5. 根据原图像粘贴位置换算出扩张后真实分割区域的位置坐标。
     Args:
-        max_ratio (float): 图像扩张的最大比例。默认为4.0。
+        ratio (float): 图像扩张的最大比例。默认为4.0。
         prob (float): 随机扩张的概率。默认为0.5。
-        mean (list): 图像数据集的均值(0-255)。默认为[127.5, 127.5, 127.5]。
-
+        fill_value (list): 扩张图像的初始填充值(0-255)。默认为[123.675, 116.28, 103.53]。
     """
 
-    def __init__(self, max_ratio=4., prob=0.5, mean=[127.5, 127.5, 127.5]):
-        self.max_ratio = max_ratio
-        self.mean = mean
+    def __init__(self,
+                 ratio=4.,
+                 prob=0.5,
+                 fill_value=[123.675, 116.28, 103.53]):
+        super(RandomExpand, self).__init__()
+        assert ratio > 1.01, "expand ratio must be larger than 1.01"
+        self.ratio = ratio
         self.prob = prob
+        assert isinstance(fill_value, Sequence), \
+            "fill value must be sequence"
+        if not isinstance(fill_value, tuple):
+            fill_value = tuple(fill_value)
+        self.fill_value = fill_value
 
     def __call__(self, im, im_info=None, label_info=None):
         """
@@ -713,7 +733,6 @@ class RandomExpand:
             im (np.ndarray): Image data as np.ndarray.
             im_info (dict, optional): Stores image-related information.
             label_info (dict, optional): Stores annotation-related information.
-
         Returns:
             tuple: When label_info is None, the returned tuple is (im, im_info), i.e. the image np.ndarray and the dict of image-related information;
                    when label_info is not None, the returned tuple is (im, im_info, label_info), i.e. the image np.ndarray,
@@ -725,7 +744,6 @@ class RandomExpand:
                                           where n is the number of ground-truth boxes.
                        - gt_class (np.ndarray): Class index of each ground-truth box after random expansion, with shape (n, 1),
                                           where n is the number of ground-truth boxes.
-
         Raises:
             TypeError: Parameter types do not meet the requirements.
         """
@@ -740,108 +758,68 @@ class RandomExpand:
                 'gt_class' not in label_info:
             raise TypeError('Cannot do RandomExpand! ' + \
                             'Becasuse gt_bbox/gt_class is not in label_info!')
-        prob = np.random.uniform(0, 1)
+        if np.random.uniform(0., 1.) < self.prob:
+            return (im, im_info, label_info)
+
         augment_shape = im_info['augment_shape']
-        im_width = augment_shape[1]
-        im_height = augment_shape[0]
-        gt_bbox = label_info['gt_bbox']
-        gt_class = label_info['gt_class']
-
-        if prob < self.prob:
-            if self.max_ratio - 1 >= 0.01:
-                expand_ratio = np.random.uniform(1, self.max_ratio)
-                height = int(im_height * expand_ratio)
-                width = int(im_width * expand_ratio)
-                h_off = math.floor(np.random.uniform(0, height - im_height))
-                w_off = math.floor(np.random.uniform(0, width - im_width))
-                expand_bbox = [
-                    -w_off / im_width, -h_off / im_height,
-                    (width - w_off) / im_width, (height - h_off) / im_height
-                ]
-                expand_im = np.ones((height, width, 3))
-                expand_im = np.uint8(expand_im * np.squeeze(self.mean))
-                expand_im = Image.fromarray(expand_im)
-                im = im.astype('uint8')
-                im = Image.fromarray(im)
-                expand_im.paste(im, (int(w_off), int(h_off)))
-                expand_im = np.asarray(expand_im)
-                for i in range(gt_bbox.shape[0]):
-                    gt_bbox[i][0] = gt_bbox[i][0] / im_width
-                    gt_bbox[i][1] = gt_bbox[i][1] / im_height
-                    gt_bbox[i][2] = gt_bbox[i][2] / im_width
-                    gt_bbox[i][3] = gt_bbox[i][3] / im_height
-                gt_bbox, gt_class, _ = filter_and_process(
-                    expand_bbox, gt_bbox, gt_class)
-                for i in range(gt_bbox.shape[0]):
-                    gt_bbox[i][0] = gt_bbox[i][0] * width
-                    gt_bbox[i][1] = gt_bbox[i][1] * height
-                    gt_bbox[i][2] = gt_bbox[i][2] * width
-                    gt_bbox[i][3] = gt_bbox[i][3] * height
-                im = expand_im.astype('float32')
-                label_info['gt_bbox'] = gt_bbox
-                label_info['gt_class'] = gt_class
-                im_info['augment_shape'] = np.array([height,
-                                                     width]).astype('int32')
-        if label_info is None:
-            return (im, im_info)
-        else:
+        height = int(augment_shape[0])
+        width = int(augment_shape[1])
+
+        expand_ratio = np.random.uniform(1., self.ratio)
+        h = int(height * expand_ratio)
+        w = int(width * expand_ratio)
+        if not h > height or not w > width:
             return (im, im_info, label_info)
+        y = np.random.randint(0, h - height)
+        x = np.random.randint(0, w - width)
+        canvas = np.ones((h, w, 3), dtype=np.float32)
+        canvas *= np.array(self.fill_value, dtype=np.float32)
+        canvas[y:y + height, x:x + width, :] = im
+
+        im_info['augment_shape'] = np.array([h, w]).astype('int32')
+        if 'gt_bbox' in label_info and len(label_info['gt_bbox']) > 0:
+            label_info['gt_bbox'] += np.array([x, y] * 2, dtype=np.float32)
+        if 'gt_poly' in label_info and len(label_info['gt_poly']) > 0:
+            label_info['gt_poly'] = expand_segms(label_info['gt_poly'], x, y,
+                                                 height, width, expand_ratio)
+        return (canvas, im_info, label_info)
 
 
 class RandomCrop:
     """随机裁剪图像。
-
-    1. 根据batch_sampler计算获取裁剪候选区域的位置。
-        (1) 根据min scale、max scale、min aspect ratio、max aspect ratio计算随机剪裁的高、宽。
-        (2) 根据随机剪裁的高、宽随机选取剪裁的起始点。
-        (3) 筛选出裁剪候选区域:
-            - 当satisfy_all为True时,需所有真实标注框与裁剪候选区域的重叠度满足需求时,该裁剪候选区域才可保留。
-            - 当satisfy_all为False时,当有一个真实标注框与裁剪候选区域的重叠度满足需求时,该裁剪候选区域就可保留。
-    2. 遍历所有裁剪候选区域:
-        (1) 若真实标注框与候选裁剪区域不重叠,或其中心点不在候选裁剪区域,
-            则将该真实标注框去除。
-        (2) 计算相对于该候选裁剪区域,真实标注框的位置,并筛选出对应的类别、混合得分。
-        (3) 若avoid_no_bbox为False,返回当前裁剪后的信息即可;
-            反之,要找到一个裁剪区域中真实标注框个数不为0的区域,才返回裁剪后的信息。
+    1. 若allow_no_crop为True,则在thresholds加入’no_crop’。
+    2. 随机打乱thresholds。
+    3. 遍历thresholds中各元素:
+        (1) 如果当前thresh为’no_crop’,则返回原始图像和标注信息。
+        (2) 随机取出aspect_ratio和scaling中的值并由此计算出候选裁剪区域的高、宽、起始点。
+        (3) 计算真实标注框与候选裁剪区域IoU,若全部真实标注框的IoU都小于thresh,则继续第3步。
+        (4) 如果cover_all_box为True且存在真实标注框的IoU小于thresh,则继续第3步。
+        (5) 筛选出位于候选裁剪区域内的真实标注框,若有效框的个数为0,则继续第3步,否则进行第4步。
+    4. 换算有效真值标注框相对候选裁剪区域的位置坐标。
+    5. 换算有效分割区域相对候选裁剪区域的位置坐标。
 
     Args:
-        batch_sampler (list): Multiple combinations of random crop parameters; each combination contains 8 values:
-            - max sample (int): Upper limit on the number of crop regions satisfying the current combination.
-            - max trial (int): Number of attempts to find a region satisfying the current combination.
-            - min scale (float): Lower bound on the per-side shrink ratio of the crop area relative to the original area.
-            - max scale (float): Upper bound on the per-side shrink ratio of the crop area relative to the original area.
-            - min aspect ratio (float): Lower bound on the short-side scaling ratio after cropping.
-            - max aspect ratio (float): Upper bound on the short-side scaling ratio after cropping.
-            - min overlap (float): Lower bound on the overlap between a ground-truth box and the cropped image.
-            - max overlap (float): Upper bound on the overlap between a ground-truth box and the cropped image.
-            Defaults to None, in which case the following settings are used:
-                [[1, 1, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0],
-                 [1, 50, 0.3, 1.0, 0.5, 2.0, 0.1, 1.0],
-                 [1, 50, 0.3, 1.0, 0.5, 2.0, 0.3, 1.0],
-                 [1, 50, 0.3, 1.0, 0.5, 2.0, 0.5, 1.0],
-                 [1, 50, 0.3, 1.0, 0.5, 2.0, 0.7, 1.0],
-                 [1, 50, 0.3, 1.0, 0.5, 2.0, 0.9, 1.0],
-                 [1, 50, 0.3, 1.0, 0.5, 2.0, 0.0, 1.0]]
-        satisfy_all (bool): Whether every ground-truth box must satisfy the constraint for a candidate crop region to be kept. Defaults to False.
-        avoid_no_bbox (bool): Whether to avoid keeping cropped images that contain no ground-truth boxes. Defaults to True.
-
+        aspect_ratio (list): Value range of the short-side scaling ratio after cropping, as [min, max]. Defaults to [.5, 2.].
+        thresholds (list): List of IoU thresholds used to decide whether a candidate crop region is valid. Defaults to [.0, .1, .3, .5, .7, .9].
+        scaling (list): Value range of the crop area relative to the original area, as [min, max]. Defaults to [.3, 1.].
+        num_attempts (int): Number of attempts before giving up on finding a valid crop region. Defaults to 50.
+        allow_no_crop (bool): Whether returning the image uncropped is allowed. Defaults to True.
+        cover_all_box (bool): Whether every ground-truth box must be inside the crop region. Defaults to False.
     """
 
     def __init__(self,
-                 batch_sampler=None,
-                 satisfy_all=False,
-                 avoid_no_bbox=True):
-        if batch_sampler is None:
-            batch_sampler = [[1, 1, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0],
-                             [1, 50, 0.3, 1.0, 0.5, 2.0, 0.1, 1.0],
-                             [1, 50, 0.3, 1.0, 0.5, 2.0, 0.3, 1.0],
-                             [1, 50, 0.3, 1.0, 0.5, 2.0, 0.5, 1.0],
-                             [1, 50, 0.3, 1.0, 0.5, 2.0, 0.7, 1.0],
-                             [1, 50, 0.3, 1.0, 0.5, 2.0, 0.9, 1.0],
-                             [1, 50, 0.3, 1.0, 0.5, 2.0, 0.0, 1.0]]
-        self.batch_sampler = batch_sampler
-        self.satisfy_all = satisfy_all
-        self.avoid_no_bbox = avoid_no_bbox
+                 aspect_ratio=[.5, 2.],
+                 thresholds=[.0, .1, .3, .5, .7, .9],
+                 scaling=[.3, 1.],
+                 num_attempts=50,
+                 allow_no_crop=True,
+                 cover_all_box=False):
+        self.aspect_ratio = aspect_ratio
+        self.thresholds = thresholds
+        self.scaling = scaling
+        self.num_attempts = num_attempts
+        self.allow_no_crop = allow_no_crop
+        self.cover_all_box = cover_all_box
 
     def __call__(self, im, im_info=None, label_info=None):
         """
@@ -876,66 +854,84 @@ class RandomCrop:
                 'gt_class' not in label_info:
             raise TypeError('Cannot do RandomCrop! ' + \
                             'Becasuse gt_bbox/gt_class is not in label_info!')
+
+        if len(label_info['gt_bbox']) == 0:
+            return (im, im_info, label_info)
+
         augment_shape = im_info['augment_shape']
-        im_width = augment_shape[1]
-        im_height = augment_shape[0]
+        w = augment_shape[1]
+        h = augment_shape[0]
         gt_bbox = label_info['gt_bbox']
-        gt_bbox_tmp = gt_bbox.copy()
-        for i in range(gt_bbox_tmp.shape[0]):
-            gt_bbox_tmp[i][0] = gt_bbox[i][0] / im_width
-            gt_bbox_tmp[i][1] = gt_bbox[i][1] / im_height
-            gt_bbox_tmp[i][2] = gt_bbox[i][2] / im_width
-            gt_bbox_tmp[i][3] = gt_bbox[i][3] / im_height
-        gt_class = label_info['gt_class']
-
-        gt_score = None
-        if 'gt_score' in label_info:
-            gt_score = label_info['gt_score']
-        sampled_bbox = []
-        gt_bbox_tmp = gt_bbox_tmp.tolist()
-        for sampler in self.batch_sampler:
-            found = 0
-            for i in range(sampler[1]):
-                if found >= sampler[0]:
-                    break
-                sample_bbox = generate_sample_bbox(sampler)
-                if satisfy_sample_constraint(sampler, sample_bbox, gt_bbox_tmp,
-                                             self.satisfy_all):
-                    sampled_bbox.append(sample_bbox)
-                    found = found + 1
-        im = np.array(im)
-        while sampled_bbox:
-            idx = int(np.random.uniform(0, len(sampled_bbox)))
-            sample_bbox = sampled_bbox.pop(idx)
-            sample_bbox = clip_bbox(sample_bbox)
-            crop_bbox, crop_class, crop_score = \
-                filter_and_process(sample_bbox, gt_bbox_tmp, gt_class, gt_score)
-            if self.avoid_no_bbox:
-                if len(crop_bbox) < 1:
+        thresholds = list(self.thresholds)
+        if self.allow_no_crop:
+            thresholds.append('no_crop')
+        np.random.shuffle(thresholds)
+
+        for thresh in thresholds:
+            if thresh == 'no_crop':
+                return (im, im_info, label_info)
+
+            found = False
+            for i in range(self.num_attempts):
+                scale = np.random.uniform(*self.scaling)
+                min_ar, max_ar = self.aspect_ratio
+                aspect_ratio = np.random.uniform(
+                    max(min_ar, scale**2), min(max_ar, scale**-2))
+                crop_h = int(h * scale / np.sqrt(aspect_ratio))
+                crop_w = int(w * scale * np.sqrt(aspect_ratio))
+                crop_y = np.random.randint(0, h - crop_h)
+                crop_x = np.random.randint(0, w - crop_w)
+                crop_box = [crop_x, crop_y, crop_x + crop_w, crop_y + crop_h]
+                iou = iou_matrix(gt_bbox, np.array([crop_box],
+                                                   dtype=np.float32))
+                if iou.max() < thresh:
                     continue
-            xmin = int(sample_bbox[0] * im_width)
-            xmax = int(sample_bbox[2] * im_width)
-            ymin = int(sample_bbox[1] * im_height)
-            ymax = int(sample_bbox[3] * im_height)
-            im = im[ymin:ymax, xmin:xmax]
-            for i in range(crop_bbox.shape[0]):
-                crop_bbox[i][0] = crop_bbox[i][0] * (xmax - xmin)
-                crop_bbox[i][1] = crop_bbox[i][1] * (ymax - ymin)
-                crop_bbox[i][2] = crop_bbox[i][2] * (xmax - xmin)
-                crop_bbox[i][3] = crop_bbox[i][3] * (ymax - ymin)
-            label_info['gt_bbox'] = crop_bbox
-            label_info['gt_class'] = crop_class
-            label_info['gt_score'] = crop_score
-            im_info['augment_shape'] = np.array([ymax - ymin,
-                                                 xmax - xmin]).astype('int32')
-            if label_info is None:
-                return (im, im_info)
-            else:
+
+                if self.cover_all_box and iou.min() < thresh:
+                    continue
+
+                cropped_box, valid_ids = crop_box_with_center_constraint(
+                    gt_bbox, np.array(crop_box, dtype=np.float32))
+                if valid_ids.size > 0:
+                    found = True
+                    break
+
+            if found:
+                if 'gt_poly' in label_info and len(label_info['gt_poly']) > 0:
+                    crop_polys = crop_segms(label_info['gt_poly'], valid_ids,
+                                            np.array(crop_box, dtype=np.int64),
+                                            h, w)
+                    if [] in crop_polys:
+                        delete_id = list()
+                        valid_polys = list()
+                        for id, crop_poly in enumerate(crop_polys):
+                            if crop_poly == []:
+                                delete_id.append(id)
+                            else:
+                                valid_polys.append(crop_poly)
+                        valid_ids = np.delete(valid_ids, delete_id)
+                        if len(valid_polys) == 0:
+                            return (im, im_info, label_info)
+                        label_info['gt_poly'] = valid_polys
+                    else:
+                        label_info['gt_poly'] = crop_polys
+                im = crop_image(im, crop_box)
+                label_info['gt_bbox'] = np.take(cropped_box, valid_ids, axis=0)
+                label_info['gt_class'] = np.take(
+                    label_info['gt_class'], valid_ids, axis=0)
+                im_info['augment_shape'] = np.array(
+                    [crop_box[3] - crop_box[1],
+                     crop_box[2] - crop_box[0]]).astype('int32')
+                if 'gt_score' in label_info:
+                    label_info['gt_score'] = np.take(
+                        label_info['gt_score'], valid_ids, axis=0)
+
+                if 'is_crowd' in label_info:
+                    label_info['is_crowd'] = np.take(
+                        label_info['is_crowd'], valid_ids, axis=0)
                 return (im, im_info, label_info)
-        if label_info is None:
-            return (im, im_info)
-        else:
-            return (im, im_info, label_info)
+
+        return (im, im_info, label_info)
 
 
 class ArrangeFasterRCNN:

+ 24 - 15
paddlex/cv/transforms/ops.py

@@ -111,32 +111,41 @@ def bgr2rgb(im):
     return im[:, :, ::-1]
 
 
-def brightness(im, brightness_lower, brightness_upper):
-    brightness_delta = np.random.uniform(brightness_lower, brightness_upper)
-    im = ImageEnhance.Brightness(im).enhance(brightness_delta)
+def hue(im, hue_lower, hue_upper):
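+    # Shift the hue by a random angle: build a rotation of the I/Q chroma
+    # plane in YIQ space (tyiq/ityiq convert RGB <-> YIQ) and apply the
+    # combined matrix directly to the RGB values.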
+    delta = np.random.uniform(hue_lower, hue_upper)
+    u = np.cos(delta * np.pi)
+    w = np.sin(delta * np.pi)
+    bt = np.array([[1.0, 0.0, 0.0], [0.0, u, -w], [0.0, w, u]])
+    tyiq = np.array([[0.299, 0.587, 0.114], [0.596, -0.274, -0.321],
+                     [0.211, -0.523, 0.311]])
+    ityiq = np.array([[1.0, 0.956, 0.621], [1.0, -0.272, -0.647],
+                      [1.0, -1.107, 1.705]])
+    t = np.dot(np.dot(ityiq, bt), tyiq).T
+    im = np.dot(im, t)
     return im
 
 
-def contrast(im, contrast_lower, contrast_upper):
-    contrast_delta = np.random.uniform(contrast_lower, contrast_upper)
-    im = ImageEnhance.Contrast(im).enhance(contrast_delta)
+def saturation(im, saturation_lower, saturation_upper):
+    delta = np.random.uniform(saturation_lower, saturation_upper)
+    gray = im * np.array([[[0.299, 0.587, 0.114]]], dtype=np.float32)
+    gray = gray.sum(axis=2, keepdims=True)
+    gray *= (1.0 - delta)
+    im *= delta
+    im += gray
     return im
 
 
-def saturation(im, saturation_lower, saturation_upper):
-    saturation_delta = np.random.uniform(saturation_lower, saturation_upper)
-    im = ImageEnhance.Color(im).enhance(saturation_delta)
+def contrast(im, contrast_lower, contrast_upper):
+    delta = np.random.uniform(contrast_lower, contrast_upper)
+    im *= delta
     return im
 
 
-def hue(im, hue_lower, hue_upper):
-    hue_delta = np.random.uniform(hue_lower, hue_upper)
-    im = np.array(im.convert('HSV'))
-    im[:, :, 0] = im[:, :, 0] + hue_delta
-    im = Image.fromarray(im, mode='HSV').convert('RGB')
+def brightness(im, brightness_lower, brightness_upper):
+    delta = np.random.uniform(brightness_lower, brightness_upper)
+    im += delta
     return im
 
-
 def rotate(im, rotate_lower, rotate_upper):
     rotate_delta = np.random.uniform(rotate_lower, rotate_upper)
     im = im.rotate(int(rotate_delta))

+ 0 - 3
paddlex/cv/transforms/seg_transforms.py

@@ -959,15 +959,12 @@ class RandomDistort:
             'saturation': self.saturation_prob,
             'hue': self.hue_prob
         }
-        im = im.astype('uint8')
-        im = Image.fromarray(im)
         for id in range(4):
             params = params_dict[ops[id].__name__]
             prob = prob_dict[ops[id].__name__]
             params['im'] = im
             if np.random.uniform(0, 1) < prob:
                 im = ops[id](**params)
-        im = np.asarray(im).astype('float32')
         if label is None:
             return (im, im_info)
         else:

+ 1 - 0
paddlex/det.py

@@ -20,3 +20,4 @@ YOLOv3 = cv.models.YOLOv3
 MaskRCNN = cv.models.MaskRCNN
 transforms = cv.transforms.det_transforms
 visualize = cv.models.utils.visualize.visualize_detection
+draw_pr_curve = cv.models.utils.visualize.draw_pr_curve

+ 1 - 0
requirements.txt

@@ -6,3 +6,4 @@ cython
 pycocotools
 visualdl=1.3.0
 paddleslim=1.0.1
+shapely

+ 1 - 1
setup.py

@@ -19,7 +19,7 @@ long_description = "PaddleX. A end-to-end deeplearning model development toolkit
 
 setuptools.setup(
     name="paddlex",
-    version='0.1.5',
+    version='0.1.6',
     author="paddlex",
     author_email="paddlex@baidu.com",
     description=long_description,