
support keypoint detection pipeline (#2818)

* support keypoint pipeline

* support ema for pp-tinypose train
学卿 committed 10 months ago
parent
commit
193c661e87
40 changed files with 2930 additions and 10 deletions
  1. docs/data_annotations/cv_modules/keypoint_detection.en.md (+1, -0)
  2. docs/data_annotations/cv_modules/keypoint_detection.md (+1, -0)
  3. docs/module_usage/tutorials/cv_modules/human_detection.en.md (+8, -1)
  4. docs/module_usage/tutorials/cv_modules/human_detection.md (+9, -0)
  5. docs/module_usage/tutorials/cv_modules/keypoint_detection.en.md (+3, -0)
  6. docs/module_usage/tutorials/cv_modules/keypoint_detection.md (+283, -0)
  7. docs/pipeline_usage/tutorials/cv_pipelines/human_keypoint_detection.en.md (+3, -0)
  8. docs/pipeline_usage/tutorials/cv_pipelines/human_keypoint_detection.md (+812, -0)
  9. docs/support_list/models_list.en.md (+9, -0)
  10. docs/support_list/models_list.md (+9, -0)
  11. paddlex/configs/modules/keypoint_detection/PP-TinyPose_128x96.yaml (+40, -0)
  12. paddlex/configs/modules/keypoint_detection/PP-TinyPose_256x192.yaml (+40, -0)
  13. paddlex/configs/pipelines/human_keypoint_detection.yaml (+17, -0)
  14. paddlex/inference/models_new/__init__.py (+1, -0)
  15. paddlex/inference/models_new/keypoint_detection/__init__.py (+15, -0)
  16. paddlex/inference/models_new/keypoint_detection/predictor.py (+188, -0)
  17. paddlex/inference/models_new/keypoint_detection/processors.py (+359, -0)
  18. paddlex/inference/models_new/keypoint_detection/result.py (+177, -0)
  19. paddlex/inference/models_new/object_detection/predictor.py (+2, -2)
  20. paddlex/inference/models_new/object_detection/processors.py (+13, -1)
  21. paddlex/inference/pipelines_new/__init__.py (+1, -0)
  22. paddlex/inference/pipelines_new/keypoint_detection/__init__.py (+15, -0)
  23. paddlex/inference/pipelines_new/keypoint_detection/pipeline.py (+135, -0)
  24. paddlex/inference/utils/official_models.py (+2, -0)
  25. paddlex/modules/__init__.py (+7, -0)
  26. paddlex/modules/keypoint_detection/__init__.py (+18, -0)
  27. paddlex/modules/keypoint_detection/dataset_checker/__init__.py (+56, -0)
  28. paddlex/modules/keypoint_detection/dataset_checker/dataset_src/__init__.py (+15, -0)
  29. paddlex/modules/keypoint_detection/dataset_checker/dataset_src/check_dataset.py (+86, -0)
  30. paddlex/modules/keypoint_detection/dataset_checker/dataset_src/utils/__init__.py (+13, -0)
  31. paddlex/modules/keypoint_detection/dataset_checker/dataset_src/utils/visualizer.py (+119, -0)
  32. paddlex/modules/keypoint_detection/evaluator.py (+41, -0)
  33. paddlex/modules/keypoint_detection/exportor.py (+22, -0)
  34. paddlex/modules/keypoint_detection/model_list.py (+16, -0)
  35. paddlex/modules/keypoint_detection/trainer.py (+39, -0)
  36. paddlex/modules/object_detection/trainer.py (+10, -3)
  37. paddlex/repo_apis/PaddleDetection_api/configs/PP-TinyPose_128x96.yaml (+151, -0)
  38. paddlex/repo_apis/PaddleDetection_api/configs/PP-TinyPose_256x192.yaml (+148, -0)
  39. paddlex/repo_apis/PaddleDetection_api/object_det/config.py (+16, -0)
  40. paddlex/repo_apis/PaddleDetection_api/object_det/register.py (+30, -3)

+ 1 - 0
docs/data_annotations/cv_modules/keypoint_detection.en.md

@@ -0,0 +1 @@
+Coming soon...

+ 1 - 0
docs/data_annotations/cv_modules/keypoint_detection.md

@@ -0,0 +1 @@
+Coming soon...

+ 8 - 1
docs/module_usage/tutorials/cv_modules/human_detection.en.md

@@ -242,4 +242,11 @@ Similar to model training and evaluation, the following steps are required:
 Other related parameters can be set by modifying the fields under `Global` and `Predict` in the `.yaml` configuration file. For details, please refer to [PaddleX Common Model Configuration File Parameter Description](../../instructions/config_parameters_common.en.md).
 
 #### 4.4.2 Model Integration
-The weights you produce can be directly integrated into the human detection module. You can refer to the Python sample code in [Quick Integration](#iii-quick-integration) and simply replace the model with the path to your trained model.
+
+1. **Pipeline Integration**
+
+The pedestrian detection module can be integrated into PaddleX pipelines such as [**Human Keypoint Detection**](../../../pipeline_usage/tutorials/cv_pipelines/human_keypoint_detection.en.md). Simply replacing the model path updates the pedestrian detection model of that pipeline. In pipeline integration, you can use high-performance deployment and service deployment to deploy your models.
+
+2. **Module Integration**
+
+The weights you produce can be directly integrated into the pedestrian detection module. You can refer to the Python example code in [Quick Integration](#iii-quick-integration). Simply replace the model with the path to your trained model to complete the integration.
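As a concrete illustration of pipeline integration, here is a minimal sketch. It assumes you have exported the pipeline configuration with `paddlex --get_pipeline_config human_keypoint_detection` and pointed its human detection model at your trained weights; the `./my_path/...` location is only an example:

```python
from paddlex import create_pipeline

# Load the human keypoint detection pipeline from a locally edited config file
# in which the human detection model has been replaced with your own weights.
pipeline = create_pipeline(pipeline="./my_path/human_keypoint_detection.yaml")

for res in pipeline.predict("keypoint_detection_001.jpg"):
    res.print()                   # structured prediction output
    res.save_to_img("./output/")  # save the result visualization
```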

+ 9 - 0
docs/module_usage/tutorials/cv_modules/human_detection.md

@@ -241,4 +241,13 @@ python main.py -c paddlex/configs/human_detection/PP-YOLOE-S_human.yaml \
 其他相关参数均可通过修改`.yaml`配置文件中的`Global`和`Predict`下的字段来进行设置,详细请参考[PaddleX通用模型配置文件参数说明](../../instructions/config_parameters_common.md)。
 
 #### 4.4.2 模型集成
+
+模型可以直接集成到 PaddleX 产线中,也可以直接集成到您自己的项目中。
+
+1. **产线集成**
+
+行人检测模块可以集成的PaddleX产线有[**人体关键点检测**](../../../pipeline_usage/tutorials/cv_pipelines/human_keypoint_detection.md),只需要替换模型路径即可完成相关产线的行人检测模块的模型更新。在产线集成中,你可以使用高性能部署和服务化部署来部署你得到的模型。
+
+2. **模块集成**
+
 您产出的权重可以直接集成到行人检测模块中,可以参考[快速集成](#三快速集成)的 Python 示例代码,只需要将模型替换为你训练的到的模型路径即可。

+ 3 - 0
docs/module_usage/tutorials/cv_modules/keypoint_detection.en.md

@@ -0,0 +1,3 @@
+[简体中文](keypoint_detection.md) | English
+
+Coming soon...

+ 283 - 0
docs/module_usage/tutorials/cv_modules/keypoint_detection.md

@@ -0,0 +1,283 @@
+简体中文 | [English](keypoint_detection.en.md)
+
+# 关键点检测模块使用教程
+
+## 一、概述
+关键点检测是计算机视觉领域中的一个重要任务,旨在识别图像或视频中物体(如人体、面部、手势等)的特定关键点位置。通过检测这些关键点,可以实现对物体的姿态估计、动作识别、人机交互、动画生成等多种应用。关键点检测在增强现实、虚拟现实、运动捕捉等领域都有广泛的应用。
+
+关键点检测算法主要包括 Top-Down 和 Bottom-Up 两种方案。Top-Down 方案通常依赖一个目标检测算法识别出感兴趣物体的边界框,关键点检测模型的输入为经过裁剪的单个目标,输出为这个目标的关键点预测结果,模型的准确率会更高,但速度会随着对象数量的增加而变慢。不同的是,Bottom-Up 方法不依赖于先进行目标检测,而是直接对整个图像进行关键点检测,然后对这些点进行分组或连接以形成多个姿势实例,其速度是固定的,不会随着物体数量的增加而变慢,但精度会更低。
+
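As a rough illustration of the Top-Down scheme described above, the following sketch uses hypothetical `person_detector` and `keypoint_model` callables (not the actual PaddleX API) to show the detect-then-crop flow:

```python
import numpy as np

def top_down_pose_estimation(image, person_detector, keypoint_model):
    """Hypothetical Top-Down flow: detect people first, then predict keypoints per person."""
    all_keypoints = []
    for x1, y1, x2, y2 in person_detector(image):              # person bounding boxes
        crop = image[int(y1):int(y2), int(x1):int(x2)]          # crop a single person
        keypoints = keypoint_model(crop)                         # (num_joints, 2) in crop coords
        all_keypoints.append(keypoints + np.array([x1, y1]))     # map back to image coords
    return all_keypoints
```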
+## 二、支持模型列表
+
+<table>
+  <tr>
+    <th >模型</th>
+    <th >方案</th>
+    <th >输入尺寸</th>
+    <th >AP(0.5:0.95)</th>
+    <th >GPU推理耗时(ms)</th>
+    <th >CPU推理耗时 (ms)</th>
+    <th >模型存储大小(M)</th>
+    <th >介绍</th>
+  </tr>
+  <tr>
+    <td>PP-TinyPose_128x96</td>
+    <td>Top-Down</td>
+    <td>128*96</td>
+    <td>58.4</td>
+    <td></td>
+    <td></td>
+    <td>4.9</td>
+    <td rowspan="2">PP-TinyPose 是百度飞桨视觉团队自研的针对移动端设备优化的实时关键点检测模型,可流畅地在移动端设备上执行多人姿态估计任务</td>
+  </tr>
+  <tr>
+    <td>PP-TinyPose_256x192</td>
+    <td>Top-Down</td>
+    <td>256*192</td>
+    <td>68.3</td>
+    <td></td>
+    <td></td>
+    <td>4.9</td>
+  </tr>
+</table>
+
+**注:以上精度指标为COCO数据集 AP(0.5:0.95),所依赖的检测框为ground truth标注得到。所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为8,精度类型为 FP32。**
+
+
+## 三、快速集成
+> ❗ 在快速集成前,请先安装 PaddleX 的 wheel 包,详细请参考 [PaddleX本地安装教程](../../../installation/installation.md)
+
+完成wheel包的安装后,几行代码即可完成关键点检测模块的推理,可以任意切换该模块下的模型,您也可以将关键点检测模块中的模型推理集成到您的项目中。运行以下代码前,请您下载[示例图片](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/keypoint_detection_002.jpg)到本地。
+
+```python
+from paddlex import create_model
+
+model_name = "PP-TinyPose_128x96"
+
+model = create_model(model_name)
+output = model.predict("keypoint_detection_002.jpg", batch_size=1)
+
+for res in output:
+    res.print(json_format=False)
+    res.save_to_img("./output/")
+    res.save_to_json("./output/res.json")
+
+```
+
+关于更多 PaddleX 的单模型推理的 API 的使用方法,可以参考[PaddleX单模型Python脚本使用说明](../../instructions/model_python_API.md)。
+
+
+## 四、二次开发
+如果你追求更高精度的现有模型,可以使用PaddleX的二次开发能力,开发更好的关键点检测模型。在使用PaddleX开发关键点检测模型之前,请务必安装PaddleX的PaddleDetection插件,安装过程可以参考 [PaddleX本地安装教程](../../../installation/installation.md)。
+
+### 4.1 数据准备
+在进行模型训练前,需要准备相应任务模块的数据集。PaddleX 针对每一个模块提供了数据校验功能,**只有通过数据校验的数据才可以进行模型训练**。此外,PaddleX为每一个模块都提供了demo数据集,您可以基于官方提供的 Demo 数据完成后续的开发。若您希望用私有数据集进行后续的模型训练,可以参考[PaddleX关键点检测任务模块数据标注教程](../../../data_annotations/cv_modules/keypoint_detection.md)。
+
+#### 4.1.1 Demo 数据下载
+您可以参考下面的命令将 Demo 数据集下载到指定文件夹:
+
+```bash
+cd /path/to/paddlex
+wget https://paddle-model-ecology.bj.bcebos.com/paddlex/data/keypoint_coco_examples.tar -P ./dataset
+tar -xf ./dataset/keypoint_coco_examples.tar -C ./dataset/
+```
+#### 4.1.2 数据校验
+一行命令即可完成数据校验:
+
+```bash
+python main.py -c paddlex/configs/keypoint_detection/PP-TinyPose_128x96.yaml \
+    -o Global.mode=check_dataset \
+    -o Global.dataset_dir=./dataset/keypoint_coco_examples
+```
+执行上述命令后,PaddleX 会对数据集进行校验,并统计数据集的基本信息,命令运行成功后会在log中打印出`Check dataset passed !`信息。校验结果文件保存在`./output/check_dataset_result.json`,同时相关产出会保存在当前目录的`./output/check_dataset`目录下,产出目录中包括可视化的示例样本图片和样本分布直方图。
+
+<details>
+  <summary>👉 <b>校验结果详情(点击展开)</b></summary>
+
+
+校验结果文件具体内容为:
+
+```bash
+{
+  "done_flag": true,
+  "check_pass": true,
+  "attributes": {
+    "num_classes": 1,
+    "train_samples": 500,
+    "train_sample_paths": [
+      "check_dataset/demo_img/000000560108.jpg",
+      "check_dataset/demo_img/000000434662.jpg",
+      "check_dataset/demo_img/000000540556.jpg",
+      ...
+    ],
+    "val_samples": 100,
+    "val_sample_paths": [
+      "check_dataset/demo_img/000000463730.jpg",
+      "check_dataset/demo_img/000000085329.jpg",
+      "check_dataset/demo_img/000000459153.jpg",
+      ...
+    ]
+  },
+  "analysis": {
+    "histogram": "check_dataset/histogram.png"
+  },
+  "dataset_path": "keypoint_coco_examples",
+  "show_type": "image",
+  "dataset_type": "KeypointTopDownCocoDetDataset"
+}
+```
+上述校验结果中,`check_pass` 为 `True` 表示数据集格式符合要求,其他部分指标的说明如下:
+
+* `attributes.num_classes`:该数据集类别数为 1;
+* `attributes.train_samples`:该数据集训练集样本数量为500;
+* `attributes.val_samples`:该数据集验证集样本数量为 100;
+* `attributes.train_sample_paths`:该数据集训练集样本可视化图片相对路径列表;
+* `attributes.val_sample_paths`:该数据集验证集样本可视化图片相对路径列表;
+
+
+数据集校验还对数据集中所有类别的样本数量分布情况进行了分析,并绘制了分布直方图(histogram.png):
+
+![](https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/modules/keypoint_det/01.png)
+</details>
+
+#### 4.1.3 数据集格式转换/数据集划分(可选)
+在您完成数据校验之后,可以通过**修改配置文件**或是**追加超参数**的方式对数据集的格式进行转换,也可以对数据集的训练/验证比例进行重新划分。
+
+<details>
+  <summary>👉 <b>格式转换/数据集划分详情(点击展开)</b></summary>
+
+
+
+**(1)数据集格式转换**
+
+关键点检测不支持数据格式转换。
+
+**(2)数据集划分**
+
+数据集划分的参数可以通过修改配置文件中 `CheckDataset` 下的字段进行设置,配置文件中部分参数的示例说明如下:
+
+* `CheckDataset`:
+  * `split`:
+    * `enable`: 是否进行重新划分数据集,为 `True` 时进行数据集重新划分,默认为 `False`;
+    * `train_percent`: 如果重新划分数据集,则需要设置训练集的百分比,类型为0-100之间的任意整数,需要保证与 `val_percent` 的值之和为100;
+
+
+例如,您想重新划分数据集为 训练集占比90%、验证集占比10%,则需将配置文件修改为:
+
+```bash
+......
+CheckDataset:
+  ......
+  split:
+    enable: True
+    train_percent: 90
+    val_percent: 10
+  ......
+```
+随后执行命令:
+
+```bash
+python main.py -c paddlex/configs/keypoint_detection/PP-TinyPose_128x96.yaml \
+    -o Global.mode=check_dataset \
+    -o Global.dataset_dir=./dataset/keypoint_coco_examples
+```
+数据划分执行之后,原有标注文件会被在原路径下重命名为 `xxx.bak`。
+
+以上参数同样支持通过追加命令行参数的方式进行设置:
+
+```bash
+python main.py -c paddlex/configs/keypoint_detection/PP-TinyPose_128x96.yaml  \
+    -o Global.mode=check_dataset \
+    -o Global.dataset_dir=./dataset/keypoint_coco_examples \
+    -o CheckDataset.split.enable=True \
+    -o CheckDataset.split.train_percent=90 \
+    -o CheckDataset.split.val_percent=10
+```
+</details>
+
+
+
+### 4.2 模型训练
+一条命令即可完成模型的训练,以此处`PP-TinyPose_128x96`的训练为例:
+
+```bash
+python main.py -c paddlex/configs/keypoint_detection/PP-TinyPose_128x96.yaml \
+    -o Global.mode=train \
+    -o Global.dataset_dir=./dataset/keypoint_coco_examples
+```
+需要如下几步:
+
+* 指定模型的`.yaml` 配置文件路径(此处为`PP-TinyPose_128x96.yaml`,训练其他模型时,需要指定相应的配置文件,模型和配置文件的对应关系,可以查阅[PaddleX模型列表(CPU/GPU)](../../../support_list/models_list.md))
+* 指定模式为模型训练:`-o Global.mode=train`
+* 指定训练数据集路径:`-o Global.dataset_dir`
+其他相关参数均可通过修改`.yaml`配置文件中的`Global`和`Train`下的字段来进行设置,也可以通过在命令行中追加参数来进行调整。如指定前 2 卡 gpu 训练:`-o Global.device=gpu:0,1`;设置训练轮次数为 10:`-o Train.epochs_iters=10`。更多可修改的参数及其详细解释,可以查阅模型对应任务模块的配置文件说明[PaddleX通用模型配置文件参数说明](../../instructions/config_parameters_common.md)。
+
+<details>
+  <summary>👉 <b>更多说明(点击展开)</b></summary>
+
+
+* 模型训练过程中,PaddleX 会自动保存模型权重文件,默认为`output`,如需指定保存路径,可通过配置文件中 `-o Global.output` 字段进行设置。
+* PaddleX 对您屏蔽了动态图权重和静态图权重的概念。在模型训练的过程中,会同时产出动态图和静态图的权重,在模型推理时,默认选择静态图权重推理。
+* 在完成模型训练后,所有产出保存在指定的输出目录(默认为`./output/`)下,通常有以下产出:
+
+* `train_result.json`:训练结果记录文件,记录了训练任务是否正常完成,以及产出的权重指标、相关文件路径等;
+* `train.log`:训练日志文件,记录了训练过程中的模型指标变化、loss 变化等;
+* `config.yaml`:训练配置文件,记录了本次训练的超参数的配置;
+* `.pdparams`、`.pdema`、`.pdopt.pdstate`、`.pdiparams`、`.pdmodel`:模型权重相关文件,包括网络参数、优化器、EMA、静态图网络参数、静态图网络结构等;
+</details>
+
+## **4.3 模型评估**
+在完成模型训练后,可以对指定的模型权重文件在验证集上进行评估,验证模型精度。使用 PaddleX 进行模型评估,一条命令即可完成模型的评估:
+
+```bash
+python main.py -c paddlex/configs/keypoint_detection/PP-TinyPose_128x96.yaml \
+    -o Global.mode=evaluate \
+    -o Global.dataset_dir=./dataset/keypoint_coco_examples
+```
+与模型训练类似,需要如下几步:
+
+* 指定模型的`.yaml` 配置文件路径(此处为`PP-TinyPose_128x96.yaml`)
+* 指定模式为模型评估:`-o Global.mode=evaluate`
+* 指定验证数据集路径:`-o Global.dataset_dir`
+其他相关参数均可通过修改`.yaml`配置文件中的`Global`和`Evaluate`下的字段来进行设置,详细请参考[PaddleX通用模型配置文件参数说明](../../instructions/config_parameters_common.md)。
+
+<details>
+  <summary>👉 <b>更多说明(点击展开)</b></summary>
+
+
+在模型评估时,需要指定模型权重文件路径,每个配置文件中都内置了默认的权重保存路径,如需要改变,只需要通过追加命令行参数的形式进行设置即可,如`-o Evaluate.weight_path=./output/best_model/best_model/model.pdparams`。
+
+在完成模型评估后,会产出`evaluate_result.json`,其记录了评估的结果,具体来说,记录了评估任务是否正常完成,以及模型的评估指标,包含 AP;
+
+</details>
+
+### **4.4 模型推理**
+在完成模型的训练和评估后,即可使用训练好的模型权重进行推理预测。在PaddleX中实现模型推理预测可以通过两种方式:命令行和wheel 包。
+
+#### 4.4.1 模型推理
+* 通过命令行的方式进行推理预测,只需如下一条命令。运行以下代码前,请您下载[示例图片](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/keypoint_detection_002.jpg)到本地。
+```bash
+python main.py -c paddlex/configs/keypoint_detection/PP-TinyPose_128x96.yaml \
+    -o Global.mode=predict \
+    -o Predict.model_dir="./output/best_model/inference" \
+    -o Predict.input="keypoint_detection_002.jpg"
+```
+与模型训练和评估类似,需要如下几步:
+
+* 指定模型的`.yaml` 配置文件路径(此处为`PP-TinyPose_128x96.yaml`)
+* 指定模式为模型推理预测:`-o Global.mode=predict`
+* 指定模型权重路径:`-o Predict.model_dir="./output/best_model/inference"`
+* 指定输入数据路径:`-o Predict.input="..."`
+其他相关参数均可通过修改`.yaml`配置文件中的`Global`和`Predict`下的字段来进行设置,详细请参考[PaddleX通用模型配置文件参数说明](../../instructions/config_parameters_common.md)。
+
+#### 4.4.2 模型集成
+
+模型可以直接集成到 PaddleX 产线中,也可以直接集成到您自己的项目中。
+
+1. **产线集成**
+
+关键点检测模块可以集成的PaddleX产线有[**人体关键点检测**](../../../pipeline_usage/tutorials/cv_pipelines/human_keypoint_detection.md),只需要替换模型路径即可完成相关产线的关键点检测模块的模型更新。在产线集成中,你可以使用高性能部署和服务化部署来部署你得到的模型。
+
+2. **模块集成**
+
+您产出的权重可以直接集成到关键点检测模块中,可以参考[快速集成](#三快速集成)的 Python 示例代码,只需要将模型替换为你训练得到的模型路径即可。
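A minimal sketch of module integration, assuming (as stated above) that the model name passed to `create_model` can be replaced by the path of your trained inference model; the local directory below is the default export location used earlier in this tutorial:

```python
from paddlex import create_model

# Use locally trained weights instead of the built-in model name.
model = create_model("./output/best_model/inference")

for res in model.predict("keypoint_detection_002.jpg", batch_size=1):
    res.save_to_img("./output/")
    res.save_to_json("./output/res.json")
```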

+ 3 - 0
docs/pipeline_usage/tutorials/cv_pipelines/human_keypoint_detection.en.md

@@ -0,0 +1,3 @@
+简体中文 | English
+
+Coming soon...

+ 812 - 0
docs/pipeline_usage/tutorials/cv_pipelines/human_keypoint_detection.md

@@ -0,0 +1,812 @@
+简体中文 | [English](human_keypoint_detection.en.md)
+
+# 人体关键点检测产线使用教程
+
+## 1. 人体关键点检测产线介绍
+
+人体关键点检测旨在通过识别和定位人体的特定关节和部位,来实现对人体姿态和动作的分析。该任务不仅需要在图像中检测出人体,还需要精确获取人体的关键点位置,如肩膀、肘部、膝盖等,从而进行姿态估计和行为识别。人体关键点检测广泛应用于运动分析、健康监测、动画制作和人机交互等场景。
+
+PaddleX 的人体关键点检测产线是一个 Top-Down 方案,由行人检测和关键点检测两个模块组成,针对移动端设备优化,可精确流畅地在移动端设备上执行多人姿态估计任务。
+
+<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/pipelines/human_keypoint_detection/01.jpg">
+
+<b>人体关键点检测产线中包含了行人检测模块和关键点检测模块</b>,有若干模型可供选择,您可以根据下边的 benchmark 数据来选择使用的模型。<b>如您更考虑模型精度,请选择精度较高的模型,如您更考虑模型推理速度,请选择推理速度较快的模型,如您更考虑模型存储大小,请选择存储大小较小的模型</b>。
+
+<summary> 👉模型列表详情</summary>
+
+<b>行人检测模块:</b>
+
+<table>
+  <tr>
+    <th >模型</th>
+    <th >mAP(0.5:0.95)</th>
+    <th >mAP(0.5)</th>
+    <th >GPU推理耗时(ms)</th>
+    <th >CPU推理耗时 (ms)</th>
+    <th >模型存储大小(M)</th>
+    <th >介绍</th>
+  </tr>
+  <tr>
+    <td>PP-YOLOE-L_human</td>
+    <td>48.0</td>
+    <td>81.9</td>
+    <td>32.8</td>
+    <td>777.7</td>
+    <td>196.02</td>
+    <td rowspan="2">基于PP-YOLOE的行人检测模型</td>
+  </tr>
+  <tr>
+    <td>PP-YOLOE-S_human</td>
+    <td>42.5</td>
+    <td>77.9</td>
+    <td>15.0</td>
+    <td>179.3</td>
+    <td>28.79</td>
+  </tr>
+</table>
+
+<b>注:以上精度指标为CrowdHuman数据集 mAP(0.5:0.95)。所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为8,精度类型为 FP32。</b>
+
+<b>关键点检测模块:</b>
+
+<table>
+  <tr>
+    <th >模型</th>
+    <th >方案</th>
+    <th >输入尺寸</th>
+    <th >AP(0.5:0.95)</th>
+    <th >GPU推理耗时(ms)</th>
+    <th >CPU推理耗时 (ms)</th>
+    <th >模型存储大小(M)</th>
+    <th >介绍</th>
+  </tr>
+  <tr>
+    <td>PP-TinyPose_128x96</td>
+    <td>Top-Down</td>
+    <td>128*96</td>
+    <td>58.4</td>
+    <td></td>
+    <td></td>
+    <td>4.9</td>
+    <td rowspan="2">PP-TinyPose 是百度飞桨视觉团队自研的针对移动端设备优化的实时关键点检测模型,可流畅地在移动端设备上执行多人姿态估计任务</td>
+  </tr>
+  <tr>
+    <td>PP-TinyPose_256x192</td>
+    <td>Top-Down</td>
+    <td>256*192</td>
+    <td>68.3</td>
+    <td></td>
+    <td></td>
+    <td>4.9</td>
+  </tr>
+</table>
+
+<b>注:以上精度指标为COCO数据集 AP(0.5:0.95),所依赖的检测框为ground truth标注得到。所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为8,精度类型为 FP32。</b>
+
+## 2. 快速开始
+
+PaddleX 所提供的预训练的模型产线均可以快速体验效果,你可以在本地使用 Python 体验人体关键点检测产线的效果。
+
+### 2.1 在线体验
+
+暂不支持在线体验。
+
+### 2.2 本地体验
+
+> ❗ 在本地使用人体关键点检测产线前,请确保您已经按照[PaddleX安装教程](../../../installation/installation.md)完成了PaddleX的wheel包安装。
+
+#### 2.2.1 命令行方式体验
+
+一行命令即可快速体验人体关键点检测产线效果,使用 [测试文件](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/keypoint_detection_001.jpg),并将 `--input` 替换为本地路径,进行预测
+
+```bash
+paddlex --pipeline human_keypoint_detection --input keypoint_detection_001.jpg --device gpu:0
+```
+参数说明:
+
+```
+--pipeline:产线名称,此处为人体关键点检测产线
+--input:待处理的输入图片的本地路径或URL
+--device:使用的GPU序号(例如gpu:0表示使用第0块GPU,gpu:1,2表示使用第1、2块GPU),也可选择使用CPU(--device cpu)
+```
+
+在执行上述命令时,加载的是默认的人体关键点检测产线配置文件,若您需要自定义配置文件,可执行如下命令获取:
+
+<details><summary> 👉点击展开</summary>
+
+<pre><code class="language-bash">paddlex --get_pipeline_config human_keypoint_detection
+</code></pre>
+<p>执行后,人体关键点检测产线配置文件将被保存在当前路径。若您希望自定义保存位置,可执行如下命令(假设自定义保存位置为<code>./my_path</code>):</p>
+<pre><code class="language-bash">paddlex --get_pipeline_config human_keypoint_detection --save_path ./my_path
+</code></pre></details>
+
+#### 2.2.2 Python脚本方式集成
+几行代码即可完成人体关键点检测产线的快速推理。
+
+```python
+from paddlex import create_pipeline
+
+pipeline = create_pipeline(pipeline="human_keypoint_detection")
+
+output = pipeline.predict("keypoint_detection_001.jpg")
+for res in output:
+    res.print()
+    res.save_to_img("./output/")
+```
+
+在上述 Python 脚本中,执行了如下几个步骤:
+
+(1)实例化 `create_pipeline` 实例化产线对象:具体参数说明如下:
+
+|参数|参数说明|参数类型|默认值|
+|-|-|-|-|
+|`pipeline`|产线名称或是产线配置文件路径。如为产线名称,则必须为 PaddleX 所支持的产线。|`str`|无|
+|`device`|产线模型推理设备。支持:“gpu”,“cpu”。|`str`|`gpu`|
+|`enable_hpi`|是否启用高性能推理,仅当该产线支持高性能推理时可用。|`bool`|`False`|
+
+(2)调用产线对象的 `predict` 方法进行推理预测:`predict` 方法参数为`x`,用于输入待预测数据,支持多种输入方式,具体示例如下:
+
+| 参数类型      | 参数说明                                                                                                  |
+|---------------|-----------------------------------------------------------------------------------------------------------|
+| Python Var    | 支持直接传入Python变量,如numpy.ndarray表示的图像数据。                                               |
+| str         | 支持传入待预测数据文件路径,如图像文件的本地路径:`/root/data/img.jpg`。                                   |
+| str           | 支持传入待预测数据文件URL,如图像文件的网络URL:[示例](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_object_detection_002.png)。|
+| str           | 支持传入本地目录,该目录下需包含待预测数据文件,如本地路径:`/root/data/`。                               |
+| dict          | 支持传入字典类型,字典的key需与具体任务对应,如图像分类任务对应\"img\",字典的val支持上述类型数据,例如:`{\"img\": \"/root/data1\"}`。|
+| list          | 支持传入列表,列表元素需为上述类型数据,如`[numpy.ndarray, numpy.ndarray],[\"/root/data/img1.jpg\", \"/root/data/img2.jpg\"]`,`[\"/root/data1\", \"/root/data2\"]`,`[{\"img\": \"/root/data1\"}, {\"img\": \"/root/data2/img.jpg\"}]`。|
+
+(3)调用`predict`方法获取预测结果:`predict` 方法为`generator`,因此需要通过迭代来获取预测结果,`predict`方法以batch为单位对数据进行预测,因此预测结果为list形式表示的一组预测结果(输入形式与结果处理的示例见下文)。
+
+(4)对预测结果进行处理:每个样本的预测结果均为`dict`类型,且支持打印,或保存为文件,支持保存的类型与具体产线相关,如:
+
+| 方法         | 说明                        | 方法参数                                                                                               |
+|--------------|-----------------------------|--------------------------------------------------------------------------------------------------------|
+| print        | 打印结果到终端              | `- format_json`:bool类型,是否对输出内容进行使用json缩进格式化,默认为True;<br>`- indent`:int类型,json格式化设置,仅当format_json为True时有效,默认为4;<br>`- ensure_ascii`:bool类型,json格式化设置,仅当format_json为True时有效,默认为False; |
+| save_to_json | 将结果保存为json格式的文件   | `- save_path`:str类型,保存的文件路径,当为目录时,保存文件命名与输入文件类型命名一致;<br>`- indent`:int类型,json格式化设置,默认为4;<br>`- ensure_ascii`:bool类型,json格式化设置,默认为False; |
+| save_to_img  | 将结果保存为图像格式的文件  | `- save_path`:str类型,保存的文件路径,当为目录时,保存文件命名与输入文件类型命名一致; |
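For example, the input forms and result-handling methods listed above can be exercised as follows (a sketch; the image paths are placeholders):

```python
import cv2
from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="human_keypoint_detection")

# numpy.ndarray input
img = cv2.imread("keypoint_detection_001.jpg")
for res in pipeline.predict(img):
    res.print(format_json=True, indent=2)

# list input: several file paths predicted in one call
for res in pipeline.predict(["img1.jpg", "img2.jpg"]):
    res.save_to_img("./output/")   # saved file name follows the input file name
    res.save_to_json("./output/")  # one JSON per input when a directory is given
```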
+
+若您获取了配置文件,即可对人体关键点检测产线各项配置进行自定义,只需要修改 `create_pipeline` 方法中的 `pipeline` 参数值为产线配置文件路径即可。
+
+例如,若您的配置文件保存在 `./my_path/human_keypoint_detection.yaml` ,则只需执行:
+
+```python
+from paddlex import create_pipeline
+pipeline = create_pipeline(pipeline="./my_path/human_keypoint_detection.yaml")
+output = pipeline.predict("keypoint_detection_001.jpg")
+for res in output:
+    res.print() ## 打印预测的结构化输出
+    res.save_to_img("./output/") ## 保存结果可视化图像
+    res.save_to_json("./output/") ## 保存预测的结构化输出
+```
+
+## 3. 开发集成/部署
+
+如果人体关键点检测产线可以达到您对产线推理速度和精度的要求,您可以直接进行开发集成/部署。
+
+若您需要将人体关键点检测产线直接应用在您的Python项目中,可以参考 [2.2.2 Python脚本方式](#222-python脚本方式集成)中的示例代码。
+
+此外,PaddleX 也提供了其他三种部署方式,详细说明如下:
+
+🚀 <b>高性能推理</b>:在实际生产环境中,许多应用对部署策略的性能指标(尤其是响应速度)有着较严苛的标准,以确保系统的高效运行与用户体验的流畅性。为此,PaddleX 提供高性能推理插件,旨在对模型推理及前后处理进行深度性能优化,实现端到端流程的显著提速,详细的高性能推理流程请参考[PaddleX高性能推理指南](../../../pipeline_deploy/high_performance_inference.md)。
+
+☁️ <b>服务化部署</b>:服务化部署是实际生产环境中常见的一种部署形式。通过将推理功能封装为服务,客户端可以通过网络请求来访问这些服务,以获取推理结果。PaddleX 支持用户以低成本实现产线的服务化部署,详细的服务化部署流程请参考[PaddleX服务化部署指南](../../../pipeline_deploy/service_deploy.md)。
+
+下面是API参考和多语言服务调用示例:
+
+<details><summary>API参考</summary>
+
+<p>对于服务提供的所有操作:</p>
+<ul>
+<li>响应体以及POST请求的请求体均为JSON数据(JSON对象)。</li>
+<li>当请求处理成功时,响应状态码为<code>200</code>,响应体的属性如下:</li>
+</ul>
+<table>
+<thead>
+<tr>
+<th>名称</th>
+<th>类型</th>
+<th>含义</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>errorCode</code></td>
+<td><code>integer</code></td>
+<td>错误码。固定为<code>0</code>。</td>
+</tr>
+<tr>
+<td><code>errorMsg</code></td>
+<td><code>string</code></td>
+<td>错误说明。固定为<code>"Success"</code>。</td>
+</tr>
+</tbody>
+</table>
+<p>响应体还可能有<code>result</code>属性,类型为<code>object</code>,其中存储操作结果信息。</p>
+<ul>
+<li>当请求处理未成功时,响应体的属性如下:</li>
+</ul>
+<table>
+<thead>
+<tr>
+<th>名称</th>
+<th>类型</th>
+<th>含义</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>errorCode</code></td>
+<td><code>integer</code></td>
+<td>错误码。与响应状态码相同。</td>
+</tr>
+<tr>
+<td><code>errorMsg</code></td>
+<td><code>string</code></td>
+<td>错误说明。</td>
+</tr>
+</tbody>
+</table>
+<p>服务提供的操作如下:</p>
+<ul>
+<li><b><code>infer</code></b></li>
+</ul>
+<p>获取图像OCR结果。</p>
+<p><code>POST /ocr</code></p>
+<ul>
+<li>请求体的属性如下:</li>
+</ul>
+<table>
+<thead>
+<tr>
+<th>名称</th>
+<th>类型</th>
+<th>含义</th>
+<th>是否必填</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>image</code></td>
+<td><code>string</code></td>
+<td>服务可访问的图像文件的URL或图像文件内容的Base64编码结果。</td>
+<td>是</td>
+</tr>
+<tr>
+<td><code>inferenceParams</code></td>
+<td><code>object</code></td>
+<td>推理参数。</td>
+<td>否</td>
+</tr>
+</tbody>
+</table>
+<p><code>inferenceParams</code>的属性如下:</p>
+<table>
+<thead>
+<tr>
+<th>名称</th>
+<th>类型</th>
+<th>含义</th>
+<th>是否必填</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>maxLongSide</code></td>
+<td><code>integer</code></td>
+<td>推理时,若文本检测模型的输入图像较长边的长度大于<code>maxLongSide</code>,则将对图像进行缩放,使其较长边的长度等于<code>maxLongSide</code>。</td>
+<td>否</td>
+</tr>
+</tbody>
+</table>
+<ul>
+<li>请求处理成功时,响应体的<code>result</code>具有如下属性:</li>
+</ul>
+<table>
+<thead>
+<tr>
+<th>名称</th>
+<th>类型</th>
+<th>含义</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>texts</code></td>
+<td><code>array</code></td>
+<td>文本位置、内容和得分。</td>
+</tr>
+<tr>
+<td><code>image</code></td>
+<td><code>string</code></td>
+<td>OCR结果图,其中标注检测到的文本位置。图像为JPEG格式,使用Base64编码。</td>
+</tr>
+</tbody>
+</table>
+<p><code>texts</code>中的每个元素为一个<code>object</code>,具有如下属性:</p>
+<table>
+<thead>
+<tr>
+<th>名称</th>
+<th>类型</th>
+<th>含义</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><code>poly</code></td>
+<td><code>array</code></td>
+<td>文本位置。数组中元素依次为包围文本的多边形的顶点坐标。</td>
+</tr>
+<tr>
+<td><code>text</code></td>
+<td><code>string</code></td>
+<td>文本内容。</td>
+</tr>
+<tr>
+<td><code>score</code></td>
+<td><code>number</code></td>
+<td>文本识别得分。</td>
+</tr>
+</tbody>
+</table>
+<p><code>result</code>示例如下:</p>
+<pre><code class="language-json">{
+&quot;texts&quot;: [
+{
+&quot;poly&quot;: [
+[
+444,
+244
+],
+[
+705,
+244
+],
+[
+705,
+311
+],
+[
+444,
+311
+]
+],
+&quot;text&quot;: &quot;北京南站&quot;,
+&quot;score&quot;: 0.9
+},
+{
+&quot;poly&quot;: [
+[
+992,
+248
+],
+[
+1263,
+251
+],
+[
+1263,
+318
+],
+[
+992,
+315
+]
+],
+&quot;text&quot;: &quot;天津站&quot;,
+&quot;score&quot;: 0.5
+}
+],
+&quot;image&quot;: &quot;xxxxxx&quot;
+}
+</code></pre></details>
+
+<details><summary>多语言调用服务示例</summary>
+
+<details>
+<summary>Python</summary>
+
+
+<pre><code class="language-python">import base64
+import requests
+
+API_URL = &quot;http://localhost:8080/ocr&quot; # 服务URL
+image_path = &quot;./demo.jpg&quot;
+output_image_path = &quot;./out.jpg&quot;
+
+# 对本地图像进行Base64编码
+with open(image_path, &quot;rb&quot;) as file:
+    image_bytes = file.read()
+    image_data = base64.b64encode(image_bytes).decode(&quot;ascii&quot;)
+
+payload = {&quot;image&quot;: image_data}  # Base64编码的文件内容或者图像URL
+
+# 调用API
+response = requests.post(API_URL, json=payload)
+
+# 处理接口返回数据
+assert response.status_code == 200
+result = response.json()[&quot;result&quot;]
+with open(output_image_path, &quot;wb&quot;) as file:
+    file.write(base64.b64decode(result[&quot;image&quot;]))
+print(f&quot;Output image saved at {output_image_path}&quot;)
+print(&quot;\nDetected texts:&quot;)
+print(result[&quot;texts&quot;])
+</code></pre></details>
+
+<details><summary>C++</summary>
+
+<pre><code class="language-cpp">#include &lt;iostream&gt;
+#include &quot;cpp-httplib/httplib.h&quot; // https://github.com/Huiyicc/cpp-httplib
+#include &quot;nlohmann/json.hpp&quot; // https://github.com/nlohmann/json
+#include &quot;base64.hpp&quot; // https://github.com/tobiaslocker/base64
+
+int main() {
+    httplib::Client client(&quot;localhost:8080&quot;);
+    const std::string imagePath = &quot;./demo.jpg&quot;;
+    const std::string outputImagePath = &quot;./out.jpg&quot;;
+
+    httplib::Headers headers = {
+        {&quot;Content-Type&quot;, &quot;application/json&quot;}
+    };
+
+    // 对本地图像进行Base64编码
+    std::ifstream file(imagePath, std::ios::binary | std::ios::ate);
+    std::streamsize size = file.tellg();
+    file.seekg(0, std::ios::beg);
+
+    std::vector&lt;char&gt; buffer(size);
+    if (!file.read(buffer.data(), size)) {
+        std::cerr &lt;&lt; &quot;Error reading file.&quot; &lt;&lt; std::endl;
+        return 1;
+    }
+    std::string bufferStr(reinterpret_cast&lt;const char*&gt;(buffer.data()), buffer.size());
+    std::string encodedImage = base64::to_base64(bufferStr);
+
+    nlohmann::json jsonObj;
+    jsonObj[&quot;image&quot;] = encodedImage;
+    std::string body = jsonObj.dump();
+
+    // 调用API
+    auto response = client.Post(&quot;/ocr&quot;, headers, body, &quot;application/json&quot;);
+    // 处理接口返回数据
+    if (response &amp;&amp; response-&gt;status == 200) {
+        nlohmann::json jsonResponse = nlohmann::json::parse(response-&gt;body);
+        auto result = jsonResponse[&quot;result&quot;];
+
+        encodedImage = result[&quot;image&quot;];
+        std::string decodedString = base64::from_base64(encodedImage);
+        std::vector&lt;unsigned char&gt; decodedImage(decodedString.begin(), decodedString.end());
+        std::ofstream outputImage(outputImagePath, std::ios::binary | std::ios::out);
+        if (outputImage.is_open()) {
+            outputImage.write(reinterpret_cast&lt;char*&gt;(decodedImage.data()), decodedImage.size());
+            outputImage.close();
+            std::cout &lt;&lt; &quot;Output image saved at &quot; &lt;&lt; outputImagePath &lt;&lt; std::endl;
+        } else {
+            std::cerr &lt;&lt; &quot;Unable to open file for writing: &quot; &lt;&lt; outputImagePath &lt;&lt; std::endl;
+        }
+
+        auto texts = result[&quot;texts&quot;];
+        std::cout &lt;&lt; &quot;\nDetected texts:&quot; &lt;&lt; std::endl;
+        for (const auto&amp; text : texts) {
+            std::cout &lt;&lt; text &lt;&lt; std::endl;
+        }
+    } else {
+        std::cout &lt;&lt; &quot;Failed to send HTTP request.&quot; &lt;&lt; std::endl;
+        return 1;
+    }
+
+    return 0;
+}
+</code></pre></details>
+
+<details><summary>Java</summary>
+
+<pre><code class="language-java">import okhttp3.*;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.JsonNode;
+import com.fasterxml.jackson.databind.node.ObjectNode;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.Base64;
+
+public class Main {
+    public static void main(String[] args) throws IOException {
+        String API_URL = &quot;http://localhost:8080/ocr&quot;; // 服务URL
+        String imagePath = &quot;./demo.jpg&quot;; // 本地图像
+        String outputImagePath = &quot;./out.jpg&quot;; // 输出图像
+
+        // 对本地图像进行Base64编码
+        File file = new File(imagePath);
+        byte[] fileContent = java.nio.file.Files.readAllBytes(file.toPath());
+        String imageData = Base64.getEncoder().encodeToString(fileContent);
+
+        ObjectMapper objectMapper = new ObjectMapper();
+        ObjectNode params = objectMapper.createObjectNode();
+        params.put(&quot;image&quot;, imageData); // Base64编码的文件内容或者图像URL
+
+        // 创建 OkHttpClient 实例
+        OkHttpClient client = new OkHttpClient();
+        MediaType JSON = MediaType.Companion.get(&quot;application/json; charset=utf-8&quot;);
+        RequestBody body = RequestBody.Companion.create(params.toString(), JSON);
+        Request request = new Request.Builder()
+                .url(API_URL)
+                .post(body)
+                .build();
+
+        // 调用API并处理接口返回数据
+        try (Response response = client.newCall(request).execute()) {
+            if (response.isSuccessful()) {
+                String responseBody = response.body().string();
+                JsonNode resultNode = objectMapper.readTree(responseBody);
+                JsonNode result = resultNode.get(&quot;result&quot;);
+                String base64Image = result.get(&quot;image&quot;).asText();
+                JsonNode texts = result.get(&quot;texts&quot;);
+
+                byte[] imageBytes = Base64.getDecoder().decode(base64Image);
+                try (FileOutputStream fos = new FileOutputStream(outputImagePath)) {
+                    fos.write(imageBytes);
+                }
+                System.out.println(&quot;Output image saved at &quot; + outputImagePath);
+                System.out.println(&quot;\nDetected texts: &quot; + texts.toString());
+            } else {
+                System.err.println(&quot;Request failed with code: &quot; + response.code());
+            }
+        }
+    }
+}
+</code></pre></details>
+
+<details><summary>Go</summary>
+
+<pre><code class="language-go">package main
+
+import (
+    &quot;bytes&quot;
+    &quot;encoding/base64&quot;
+    &quot;encoding/json&quot;
+    &quot;fmt&quot;
+    &quot;io/ioutil&quot;
+    &quot;net/http&quot;
+)
+
+func main() {
+    API_URL := &quot;http://localhost:8080/ocr&quot;
+    imagePath := &quot;./demo.jpg&quot;
+    outputImagePath := &quot;./out.jpg&quot;
+
+    // 对本地图像进行Base64编码
+    imageBytes, err := ioutil.ReadFile(imagePath)
+    if err != nil {
+        fmt.Println(&quot;Error reading image file:&quot;, err)
+        return
+    }
+    imageData := base64.StdEncoding.EncodeToString(imageBytes)
+
+    payload := map[string]string{&quot;image&quot;: imageData} // Base64编码的文件内容或者图像URL
+    payloadBytes, err := json.Marshal(payload)
+    if err != nil {
+        fmt.Println(&quot;Error marshaling payload:&quot;, err)
+        return
+    }
+
+    // 调用API
+    client := &amp;http.Client{}
+    req, err := http.NewRequest(&quot;POST&quot;, API_URL, bytes.NewBuffer(payloadBytes))
+    if err != nil {
+        fmt.Println(&quot;Error creating request:&quot;, err)
+        return
+    }
+
+    res, err := client.Do(req)
+    if err != nil {
+        fmt.Println(&quot;Error sending request:&quot;, err)
+        return
+    }
+    defer res.Body.Close()
+
+    // 处理接口返回数据
+    body, err := ioutil.ReadAll(res.Body)
+    if err != nil {
+        fmt.Println(&quot;Error reading response body:&quot;, err)
+        return
+    }
+    type Response struct {
+        Result struct {
+            Image      string   `json:&quot;image&quot;`
+            Texts []map[string]interface{} `json:&quot;texts&quot;`
+        } `json:&quot;result&quot;`
+    }
+    var respData Response
+    err = json.Unmarshal([]byte(string(body)), &amp;respData)
+    if err != nil {
+        fmt.Println(&quot;Error unmarshaling response body:&quot;, err)
+        return
+    }
+
+    outputImageData, err := base64.StdEncoding.DecodeString(respData.Result.Image)
+    if err != nil {
+        fmt.Println(&quot;Error decoding base64 image data:&quot;, err)
+        return
+    }
+    err = ioutil.WriteFile(outputImagePath, outputImageData, 0644)
+    if err != nil {
+        fmt.Println(&quot;Error writing image to file:&quot;, err)
+        return
+    }
+    fmt.Printf(&quot;Output image saved at %s\n&quot;, outputImagePath)
+    fmt.Println(&quot;\nDetected texts:&quot;)
+    for _, text := range respData.Result.Texts {
+        fmt.Println(text)
+    }
+}
+</code></pre></details>
+
+<details><summary>C#</summary>
+
+<pre><code class="language-csharp">using System;
+using System.IO;
+using System.Net.Http;
+using System.Net.Http.Headers;
+using System.Text;
+using System.Threading.Tasks;
+using Newtonsoft.Json.Linq;
+
+class Program
+{
+    static readonly string API_URL = &quot;http://localhost:8080/ocr&quot;;
+    static readonly string imagePath = &quot;./demo.jpg&quot;;
+    static readonly string outputImagePath = &quot;./out.jpg&quot;;
+
+    static async Task Main(string[] args)
+    {
+        var httpClient = new HttpClient();
+
+        // 对本地图像进行Base64编码
+        byte[] imageBytes = File.ReadAllBytes(imagePath);
+        string image_data = Convert.ToBase64String(imageBytes);
+
+        var payload = new JObject{ { &quot;image&quot;, image_data } }; // Base64编码的文件内容或者图像URL
+        var content = new StringContent(payload.ToString(), Encoding.UTF8, &quot;application/json&quot;);
+
+        // 调用API
+        HttpResponseMessage response = await httpClient.PostAsync(API_URL, content);
+        response.EnsureSuccessStatusCode();
+
+        // 处理接口返回数据
+        string responseBody = await response.Content.ReadAsStringAsync();
+        JObject jsonResponse = JObject.Parse(responseBody);
+
+        string base64Image = jsonResponse[&quot;result&quot;][&quot;image&quot;].ToString();
+        byte[] outputImageBytes = Convert.FromBase64String(base64Image);
+
+        File.WriteAllBytes(outputImagePath, outputImageBytes);
+        Console.WriteLine($&quot;Output image saved at {outputImagePath}&quot;);
+        Console.WriteLine(&quot;\nDetected texts:&quot;);
+        Console.WriteLine(jsonResponse[&quot;result&quot;][&quot;texts&quot;].ToString());
+    }
+}
+</code></pre></details>
+
+<details><summary>Node.js</summary>
+
+<pre><code class="language-js">const axios = require('axios');
+const fs = require('fs');
+
+const API_URL = 'http://localhost:8080/ocr'
+const imagePath = './demo.jpg'
+const outputImagePath = &quot;./out.jpg&quot;;
+
+let config = {
+   method: 'POST',
+   maxBodyLength: Infinity,
+   url: API_URL,
+   data: JSON.stringify({
+    'image': encodeImageToBase64(imagePath)  // Base64编码的文件内容或者图像URL
+  })
+};
+
+// 对本地图像进行Base64编码
+function encodeImageToBase64(filePath) {
+  const bitmap = fs.readFileSync(filePath);
+  return Buffer.from(bitmap).toString('base64');
+}
+
+// 调用API
+axios.request(config)
+.then((response) =&gt; {
+    // 处理接口返回数据
+    const result = response.data[&quot;result&quot;];
+    const imageBuffer = Buffer.from(result[&quot;image&quot;], 'base64');
+    fs.writeFile(outputImagePath, imageBuffer, (err) =&gt; {
+      if (err) throw err;
+      console.log(`Output image saved at ${outputImagePath}`);
+    });
+    console.log(&quot;\nDetected texts:&quot;);
+    console.log(result[&quot;texts&quot;]);
+})
+.catch((error) =&gt; {
+  console.log(error);
+});
+</code></pre></details>
+
+<details><summary>PHP</summary>
+
+<pre><code class="language-php">&lt;?php
+
+$API_URL = &quot;http://localhost:8080/ocr&quot;; // 服务URL
+$image_path = &quot;./demo.jpg&quot;;
+$output_image_path = &quot;./out.jpg&quot;;
+
+// 对本地图像进行Base64编码
+$image_data = base64_encode(file_get_contents($image_path));
+$payload = array(&quot;image&quot; =&gt; $image_data); // Base64编码的文件内容或者图像URL
+
+// 调用API
+$ch = curl_init($API_URL);
+curl_setopt($ch, CURLOPT_POST, true);
+curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($payload));
+curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
+$response = curl_exec($ch);
+curl_close($ch);
+
+// 处理接口返回数据
+$result = json_decode($response, true)[&quot;result&quot;];
+file_put_contents($output_image_path, base64_decode($result[&quot;image&quot;]));
+echo &quot;Output image saved at &quot; . $output_image_path . &quot;\n&quot;;
+echo &quot;\nDetected texts:\n&quot;;
+print_r($result[&quot;texts&quot;]);
+
+?&gt;
+</code></pre></details>
+</details>
+<br/>
+
+📱 <b>端侧部署</b>:端侧部署是一种将计算和数据处理功能放在用户设备本身上的方式,设备可以直接处理数据,而不需要依赖远程的服务器。PaddleX 支持将模型部署在 Android 等端侧设备上,详细的端侧部署流程请参考[PaddleX端侧部署指南](../../../pipeline_deploy/edge_deploy.md)。
+您可以根据需要选择合适的方式部署模型产线,进而进行后续的 AI 应用集成。
+
+
+## 4. 二次开发
+
+如果人体关键点检测产线提供的默认模型权重在您的场景中精度或速度不满意,您可以尝试利用<b>您自己拥有的特定领域或应用场景的数据</b>对现有模型进行进一步的<b>微调</b>,以提升该产线在您的场景中的识别效果。
+
+### 4.1 模型微调
+
+由于人体关键点检测产线包含两个模块(行人检测模块和关键点检测模块),模型产线的效果不及预期可能来自于其中任何一个模块。
+
+您可以对识别效果差的图片进行分析,如果在分析过程中发现有较多的行人目标未被检测出来,那么可能是行人检测模型存在不足,您需要参考[行人检测模块开发教程](../../../module_usage/tutorials/cv_modules/human_detection.md)中的[二次开发](../../../module_usage/tutorials/cv_modules/human_detection.md#四二次开发)章节,使用您的私有数据集对行人检测模型进行微调;如果已检测到的行人出现关键点检测错误,这表明关键点检测模型需要进一步改进,您需要参考[关键点检测模块开发教程](../../../module_usage/tutorials/cv_modules/keypoint_detection.md)中的[二次开发](../../../module_usage/tutorials/cv_modules/keypoint_detection.md#四二次开发)章节,对关键点检测模型进行微调。
+
+### 4.2 模型应用
+
+当您使用私有数据集完成微调训练后,可获得本地模型权重文件。
+
+若您需要使用微调后的模型权重,只需对产线配置文件做修改,将微调后模型权重的本地路径替换至产线配置文件中的对应位置即可:
+
+```yaml
+Pipeline:
+  human_det_model: PP-YOLOE-S_human       #可修改为微调后行人检测模型的本地路径
+  keypoint_det_model: PP-TinyPose_128x96  #可修改为微调后关键点检测模型的本地路径
+  human_det_batch_size: 1
+  keypoint_det_batch_size: 1
+  device: gpu
+```
+随后, 参考[2.2 本地体验](#22-本地体验)中的命令行方式或Python脚本方式,加载修改后的产线配置文件即可。
+
+## 5. 多硬件支持
+
+PaddleX 支持英伟达 GPU、昆仑芯 XPU、昇腾 NPU和寒武纪 MLU 等多种主流硬件设备,<b>仅需修改 `--device`参数</b>即可完成不同硬件之间的无缝切换。
+
+例如,使用Python运行人体关键点检测产线时,将运行设备从英伟达 GPU 更改为昇腾 NPU,仅需将脚本中的 `device` 修改为 npu 即可:
+
+```python
+from paddlex import create_pipeline
+
+pipeline = create_pipeline(
+    pipeline="human_keypoint_detection",
+    device="npu:0" # gpu:0 --> npu:0
+    )
+```
+
+若您想在更多种类的硬件上使用人体关键点检测产线,请参考[PaddleX多硬件使用指南](../../../other_devices_support/multi_devices_use_guide.md)。

+ 9 - 0
docs/support_list/models_list.en.md

@@ -1402,6 +1402,15 @@ PaddleX incorporates multiple pipelines, each containing several modules, and ea
 </table>
 <b>Note: The above accuracy metrics are evaluated on the </b>[MVTec AD](https://www.mvtec.com/company/research/datasets/mvtec-ad)<b> dataset using the average anomaly score.</b>
 
+## [Keypoint Detection Module](../module_usage/tutorials/cv_modules/keypoint_detection.en.md)
+
+| Model Name|Method|Input Size|AP(0.5:0.95)|GPU Inference Time (ms)|CPU Inference Time (ms)|Model Size|YAML File|
+|-|-|-|-|-|-|-|-|
+| PP-TinyPose_128x96    | Top-Down| 128*96   | 58.4         |                   |                  | 4.9 M        | [PP-TinyPose_128x96.yaml](../../paddlex/configs/modules/keypoint_detection/PP-TinyPose_128x96.yaml) |
+| PP-TinyPose_256x192   | Top-Down| 256*192  | 68.3         |                   |                  | 4.9 M        | [PP-TinyPose_256x192.yaml](../../paddlex/configs/modules/keypoint_detection/PP-TinyPose_256x192.yaml) |
+
+**Note: The above accuracy metrics are based on the COCO dataset with AP(0.5:0.95), and the detection boxes are obtained from ground truth annotations. All model GPU inference times are measured on machines with NVIDIA Tesla T4 GPUs, using FP32 precision. The CPU inference speeds are based on an Intel® Xeon® Gold 5117 CPU @ 2.00GHz, with 8 threads and FP32 precision.**
+
 ## [Semantic Segmentation Module](../module_usage/tutorials/cv_modules/semantic_segmentation.en.md)
 <table>
 <thead>

+ 9 - 0
docs/support_list/models_list.md

@@ -1400,6 +1400,15 @@ PaddleX 内置了多条产线,每条产线都包含了若干模块,每个模
 </table>
 <b>注:以上精度指标为 </b>[MVTec AD](https://www.mvtec.com/company/research/datasets/mvtec-ad)<b> 验证集 平均异常分数。</b>
 
+## [关键点检测模块](../module_usage/tutorials/cv_modules/keypoint_detection.md)
+
+| 模型|方案|输入尺寸|AP(0.5:0.95)|GPU推理耗时(ms)|CPU推理耗时 (ms)|模型存储大小|yaml文件|
+|-|-|-|-|-|-|-|-|
+| PP-TinyPose_128x96    | Top-Down| 128*96   | 58.4         |                   |                  | 4.9 M        | [PP-TinyPose_128x96.yaml](../../paddlex/configs/modules/keypoint_detection/PP-TinyPose_128x96.yaml) |
+| PP-TinyPose_256x192   | Top-Down| 256*192  | 68.3         |                   |                  | 4.9 M        | [PP-TinyPose_256x192.yaml](../../paddlex/configs/modules/keypoint_detection/PP-TinyPose_256x192.yaml) |
+
+**注:以上精度指标为COCO数据集 AP(0.5:0.95),所依赖的检测框为ground truth标注得到。所有模型 GPU 推理耗时基于 NVIDIA Tesla T4 机器,精度类型为 FP32, CPU 推理速度基于 Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz,线程数为8,精度类型为 FP32。**
+
 ## [语义分割模块](../module_usage/tutorials/cv_modules/semantic_segmentation.md)
 <table>
 <thead>

+ 40 - 0
paddlex/configs/modules/keypoint_detection/PP-TinyPose_128x96.yaml

@@ -0,0 +1,40 @@
+Global:
+  model: PP-TinyPose_128x96
+  mode: check_dataset # check_dataset/train/evaluate/predict
+  dataset_dir: "/paddle/dataset/paddlex/det/keypoint_coco_examples"
+  device: gpu:0,1,2,3
+  output: "output"
+
+CheckDataset:
+  convert:
+    enable: False
+    src_dataset_type: null
+  split:
+    enable: False
+    train_percent: null
+    val_percent: null
+
+Train:
+  num_classes: 1
+  epochs_iters: 50
+  batch_size: 16
+  learning_rate: 0.001
+  pretrain_weight_path: "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-TinyPose_128x96_pretrained.pdparams"
+  warmup_steps: 100
+  resume_path: null
+  log_interval: 10
+  eval_interval: 1
+
+Evaluate:
+  weight_path: "output/best_model/best_model.pdparams"
+  log_interval: 10
+
+Predict:
+  batch_size: 1
+  model_dir: "output/best_model/inference"
+  input: "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/keypoint_detection_002.jpg"
+  kernel_option:
+    run_mode: paddle
+
+Export:
+  weight_path: "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-TinyPose_128x96_pretrained.pdparams"

+ 40 - 0
paddlex/configs/modules/keypoint_detection/PP-TinyPose_256x192.yaml

@@ -0,0 +1,40 @@
+Global:
+  model: PP-TinyPose_256x192
+  mode: check_dataset # check_dataset/train/evaluate/predict
+  dataset_dir: "/paddle/dataset/paddlex/det/keypoint_coco_examples"
+  device: gpu:0,1,2,3
+  output: "output"
+
+CheckDataset:
+  convert:
+    enable: False
+    src_dataset_type: null
+  split:
+    enable: False
+    train_percent: null
+    val_percent: null
+
+Train:
+  num_classes: 1
+  epochs_iters: 50
+  batch_size: 16
+  learning_rate: 0.001
+  pretrain_weight_path: "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-TinyPose_256x192_pretrained.pdparams"
+  warmup_steps: 100
+  resume_path: null
+  log_interval: 10
+  eval_interval: 1
+
+Evaluate:
+  weight_path: "output/best_model/best_model.pdparams"
+  log_interval: 10
+
+Predict:
+  batch_size: 1
+  model_dir: "output/best_model/inference"
+  input: "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/keypoint_detection_002.jpg"
+  kernel_option:
+    run_mode: paddle
+
+Export:
+  weight_path: "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/PP-TinyPose_256x192_pretrained.pdparams"

+ 17 - 0
paddlex/configs/pipelines/human_keypoint_detection.yaml

@@ -0,0 +1,17 @@
+pipeline_name: human_keypoint_detection
+
+SubModules:
+  ObjectDetection:
+    module_name: object_detection
+    model_name: PP-YOLOE-S_human
+    model_dir: null
+    batch_size: 1
+    threshold: null
+    imgsz: null
+  KeypointDetection:
+    module_name: keypoint_detection
+    model_name: PP-TinyPose_128x96
+    model_dir: null
+    batch_size: 1
+    flip: False
+    use_udp: null

+ 1 - 0
paddlex/inference/models_new/__init__.py

@@ -22,6 +22,7 @@ from .base import BasePredictor, BasicPredictor
 
 from .image_classification import ClasPredictor
 from .object_detection import DetPredictor
+from .keypoint_detection import KptPredictor
 from .text_detection import TextDetPredictor
 from .text_recognition import TextRecPredictor
 from .table_structure_recognition import TablePredictor

+ 15 - 0
paddlex/inference/models_new/keypoint_detection/__init__.py

@@ -0,0 +1,15 @@
+# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .predictor import KptPredictor

+ 188 - 0
paddlex/inference/models_new/keypoint_detection/predictor.py

@@ -0,0 +1,188 @@
+# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import Any, List, Optional, Sequence
+
+import numpy as np
+
+from ....modules.keypoint_detection.model_list import MODELS
+from ....utils import logging
+from ...common.batch_sampler import ImageBatchSampler
+
+from ..common import ToBatch
+from ..object_detection import DetPredictor
+from .processors import TopDownAffine, KptPostProcess
+from .result import KptResult
+
+
+class KptBatchSampler(ImageBatchSampler):
+    def sample(self, inputs):
+        if not isinstance(inputs, list):
+            inputs = [inputs]
+
+        batch = []
+        for input in inputs:
+            if isinstance(input, (np.ndarray, dict)):
+                batch.append(input)
+                if len(batch) == self.batch_size:
+                    yield batch
+                    batch = []
+            elif isinstance(input, str):
+                file_path = (
+                    self._download_from_url(input)
+                    if input.startswith("http")
+                    else input
+                )
+                file_list = self._get_files_list(file_path)
+                for file_path in file_list:
+                    batch.append(file_path)
+                    if len(batch) == self.batch_size:
+                        yield batch
+                        batch = []
+            else:
+                logging.warning(
+                    f"Not supported input data type! Only `numpy.ndarray`, `dict` and `str` are supported! So it has been ignored: {input}."
+                )
+        if len(batch) > 0:
+            yield batch
+
+
+class KptPredictor(DetPredictor):
+
+    entities = MODELS
+
+    flip_perm = [  # The left-right joints exchange order list
+        [1, 2],
+        [3, 4],
+        [5, 6],
+        [7, 8],
+        [9, 10],
+        [11, 12],
+        [13, 14],
+        [15, 16],
+    ]
+
+    def __init__(
+        self,
+        *args,
+        flip: bool = False,
+        use_udp: Optional[bool] = None,
+        **kwargs,
+    ):
+        """Keypoint Predictor
+
+        Args:
+            flip (bool): Whether to do flipping test. Default value is ``False``.
+            use_udp (Optional[bool]): Whether to use unbiased data processing. Default value is ``None``.
+
+        """
+        self.flip = flip
+        self.use_udp = use_udp
+        super().__init__(*args, **kwargs)
+        for op in self.pre_ops:
+            if isinstance(op, TopDownAffine):
+                self.input_size = op.input_size
+                break
+        if any([name in self.model_name for name in ["PP-TinyPose"]]):
+            self.shift_heatmap = True
+        else:
+            self.shift_heatmap = False
+
+    def _build_batch_sampler(self):
+        return KptBatchSampler()
+
+    def _get_result_class(self):
+        return KptResult
+
+    def _format_output(self, pred: Sequence[Any]) -> List[dict]:
+        """Transform batch outputs into a list of single image output."""
+
+        return [
+            {
+                "heatmap": res[0],
+                "masks": res[1],
+            }
+            for res in zip(*pred)
+        ]
+
+    def flip_back(self, output_flipped, matched_parts):
+        assert (
+            output_flipped.ndim == 4
+        ), "output_flipped should be [batch_size, num_joints, height, width]"
+
+        output_flipped = output_flipped[:, :, :, ::-1]
+
+        for pair in matched_parts:
+            tmp = output_flipped[:, pair[0], :, :].copy()
+            output_flipped[:, pair[0], :, :] = output_flipped[:, pair[1], :, :]
+            output_flipped[:, pair[1], :, :] = tmp
+
+        return output_flipped
+
+    def process(self, batch_data: List[dict]):
+        """
+        Process a batch of data through the preprocessing, inference, and postprocessing.
+
+        Args:
+            batch_data (List[Union[str, np.ndarray, dict]]): A batch of input data (e.g., image file paths or decoded images).
+
+        Returns:
+            dict: A dictionary containing the input path, raw image, and predicted keypoints
+                for every instance of the batch. Keys include 'input_path', 'input_img', and 'kpts'.
+        """
+        datas = batch_data
+        # preprocess
+        for pre_op in self.pre_ops[:-1]:
+            datas = pre_op(datas)
+
+        # use `ToBatch` to format batch inputs
+        batch_inputs = self.pre_ops[-1]([data["img"] for data in datas])
+
+        # do infer
+        batch_preds = self.infer(batch_inputs)
+
+        if self.flip:
+            # horizontally flip the input along the width axis for the flip test
+            batch_inputs[0] = np.flip(batch_inputs[0], axis=3)
+            preds_flipped = self.infer(batch_inputs)
+
+            output_flipped = self.flip_back(preds_flipped[0], self.flip_perm)
+            if self.shift_heatmap:
+                output_flipped[:, :, :, 1:] = output_flipped.copy()[:, :, :, 0:-1]
+            batch_preds[0] = (batch_preds[0] + output_flipped) * 0.5
+
+        # process a batch of predictions into a list of single image result
+        preds_list = self._format_output(batch_preds)
+
+        # postprocess
+        keypoints = self.post_op(preds_list, datas)
+
+        return {
+            "input_path": [data.get("img_path", None) for data in datas],
+            "input_img": [data["ori_img"] for data in datas],
+            "kpts": keypoints,
+        }
+
+    @DetPredictor.register("TopDownEvalAffine")
+    def build_topdown_affine(self, trainsize, use_udp=False):
+        return TopDownAffine(
+            input_size=trainsize,
+            use_udp=use_udp if self.use_udp is None else self.use_udp,
+        )
+
+    def build_to_batch(self):
+        return ToBatch()
+
+    def build_postprocess(self):
+        return KptPostProcess(use_dark=True)
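A quick standalone check of the `flip_back` logic above, using a dummy heatmap (this only exercises the array manipulation shown in the diff, not the full predictor):

```python
import numpy as np

FLIP_PERM = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]]

def flip_back(output_flipped, matched_parts):
    # Undo the horizontal flip, then swap left/right joint channels.
    assert output_flipped.ndim == 4  # [batch, num_joints, height, width]
    output_flipped = output_flipped[:, :, :, ::-1]
    for left, right in matched_parts:
        tmp = output_flipped[:, left, :, :].copy()
        output_flipped[:, left, :, :] = output_flipped[:, right, :, :]
        output_flipped[:, right, :, :] = tmp
    return output_flipped

heatmap = np.random.rand(1, 17, 32, 24).astype(np.float32)     # COCO: 17 joints
flipped_pred = np.ascontiguousarray(heatmap[:, :, :, ::-1])    # pretend prediction on a flipped input
restored = flip_back(flipped_pred, FLIP_PERM)
# After flipping back, left/right channels are exchanged, so joint 1 of the
# restored map matches joint 2 of the original (and vice versa).
assert np.allclose(restored[:, 1], heatmap[:, 2])
assert np.allclose(restored[:, 2], heatmap[:, 1])
```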

+ 359 - 0
paddlex/inference/models_new/keypoint_detection/processors.py

@@ -0,0 +1,359 @@
+# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import math
+from typing import List, Sequence, Tuple, Union, Optional
+
+import cv2
+import numpy as np
+from numpy import ndarray
+
+from ..object_detection.processors import get_affine_transform
+
+Number = Union[int, float]
+Kpts = List[dict]
+
+
+def get_warp_matrix(
+    theta: float, size_input: ndarray, size_dst: ndarray, size_target: ndarray
+) -> ndarray:
+    """This code is based on
+        https://github.com/open-mmlab/mmpose/blob/master/mmpose/core/post_processing/post_transforms.py
+        Calculate the transformation matrix under the constraint of unbiased.
+    Paper ref: Huang et al. The Devil is in the Details: Delving into Unbiased
+    Data Processing for Human Pose Estimation (CVPR 2020).
+    Args:
+        theta (float): Rotation angle in degrees.
+        size_input (np.ndarray): Size of input image [w, h].
+        size_dst (np.ndarray): Size of output image [w, h].
+        size_target (np.ndarray): Size of ROI in input plane [w, h].
+    Returns:
+        matrix (np.ndarray): A matrix for transformation.
+    """
+    theta = np.deg2rad(theta)
+    matrix = np.zeros((2, 3), dtype=np.float32)
+
+    scale_x = size_dst[0] / size_target[0]
+    scale_y = size_dst[1] / size_target[1]
+
+    matrix[0, 0] = np.cos(theta) * scale_x
+    matrix[0, 1] = -np.sin(theta) * scale_x
+    matrix[0, 2] = scale_x * (
+        -0.5 * size_input[0] * np.cos(theta)
+        + 0.5 * size_input[1] * np.sin(theta)
+        + 0.5 * size_target[0]
+    )
+    matrix[1, 0] = np.sin(theta) * scale_y
+    matrix[1, 1] = np.cos(theta) * scale_y
+    matrix[1, 2] = scale_y * (
+        -0.5 * size_input[0] * np.sin(theta)
+        - 0.5 * size_input[1] * np.cos(theta)
+        + 0.5 * size_target[1]
+    )
+
+    return matrix
+
+
+class TopDownAffine:
+    """refer to https://github.com/open-mmlab/mmpose/blob/71ec36ebd63c475ab589afc817868e749a61491f/mmpose/datasets/transforms/topdown_transforms.py#L13
+    Get the bbox image as the model input by affine transform.
+
+    Args:
+        input_size (Tuple[int, int]): The input image size of the model in
+            [w, h]. The bbox region will be cropped and resize to `input_size`
+        use_udp (bool): Whether use unbiased data processing. See
+            `UDP (CVPR 2020)`_ for details. Defaults to ``False``
+
+    .. _`UDP (CVPR 2020)`: https://arxiv.org/abs/1911.07524
+    """
+
+    def __init__(self, input_size: Tuple[int, int], use_udp: bool = False):
+        assert (
+            all([isinstance(i, int) for i in input_size]) and len(input_size) == 2
+        ), f"Invalid input_size {input_size}"
+        self.input_size = input_size
+        self.use_udp = use_udp
+
+    def apply(
+        self,
+        img: ndarray,
+        center: Optional[Union[Tuple[Number, Number], ndarray]] = None,
+        scale: Optional[Union[Tuple[Number, Number], ndarray]] = None,
+    ) -> Tuple[ndarray, ndarray, ndarray]:
+        """Applies an affine warp to the input image based on the specified center and scale.
+
+        Args:
+            img (ndarray): The input image as a NumPy ndarray.
+            center (Optional[Union[Tuple[Number, Number], ndarray]], optional): Center of the bounding box (x, y)
+            scale (Optional[Union[Tuple[Number, Number], ndarray]], optional): Scale of the bounding box
+            wrt [width, height].
+
+        Returns:
+            Tuple[ndarray, ndarray, ndarray]: The transformed image,
+            the center used for the transformation, and the scale used for the transformation.
+        """
+        rot = 0
+        imshape = np.array(img.shape[:2][::-1])
+        if isinstance(center, Sequence):
+            center = np.array(center)
+        if isinstance(scale, Sequence):
+            scale = np.array(scale)
+
+        center = center if center is not None else imshape / 2.0
+        scale = scale if scale is not None else imshape
+        if self.use_udp:
+            trans = get_warp_matrix(
+                rot,
+                center * 2.0,
+                [self.input_size[0] - 1.0, self.input_size[1] - 1.0],
+                scale,
+            )
+            img = cv2.warpAffine(
+                img,
+                trans,
+                (int(self.input_size[0]), int(self.input_size[1])),
+                flags=cv2.INTER_LINEAR,
+            )
+        else:
+            trans = get_affine_transform(center, scale, rot, self.input_size)
+            img = cv2.warpAffine(
+                img,
+                trans,
+                (int(self.input_size[0]), int(self.input_size[1])),
+                flags=cv2.INTER_LINEAR,
+            )
+
+        return img, center, scale
+
+    def __call__(self, datas: List[dict]) -> List[dict]:
+        for data in datas:
+            ori_img = data["img"]
+            if "ori_img" not in data:
+                data["ori_img"] = ori_img
+            if "ori_img_size" not in data:
+                data["ori_img_size"] = [ori_img.shape[1], ori_img.shape[0]]
+
+            img, center, scale = self.apply(
+                ori_img, data.get("center", None), data.get("scale", None)
+            )
+            data["img"] = img
+            data["center"] = center
+            data["scale"] = scale
+
+            img_size = [img.shape[1], img.shape[0]]
+            data["img_size"] = img_size  # [size_w, size_h]
+
+        return datas
+
+
+def affine_transform(pt: ndarray, t: ndarray):
+    """Apply an affine transformation to a 2D point.
+
+    Args:
+        pt (numpy.ndarray): A 2D point represented as a 2-element array.
+        t (numpy.ndarray): A 3x3 affine transformation matrix.
+
+    Returns:
+        numpy.ndarray: The transformed 2D point.
+    """
+    new_pt = np.array([pt[0], pt[1], 1.0]).T
+    new_pt = np.dot(t, new_pt)
+    return new_pt[:2]
+
+
+def transform_preds(
+    coords: ndarray,
+    center: Tuple[float, float],
+    scale: Tuple[float, float],
+    output_size: Tuple[int, int],
+) -> ndarray:
+    """Transform coordinates to the target space using an affine transformation.
+
+    Args:
+        coords (numpy.ndarray): Original coordinates, shape (N, 2).
+        center (tuple): Center point for the transformation.
+        scale (tuple): Scale factor for the transformation.
+        output_size (tuple): Size of the output space.
+
+    Returns:
+        numpy.ndarray: Transformed coordinates, shape (N, 2).
+    """
+    target_coords = np.zeros(coords.shape)
+    trans = get_affine_transform(center, scale, 0, output_size, inv=1)
+    for p in range(coords.shape[0]):
+        target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans)
+    return target_coords
+
+
+class KptPostProcess:
+    """Save Result Transform"""
+
+    def __init__(self, use_dark=True):
+        self.use_dark = use_dark
+
+    def apply(self, heatmap: ndarray, center: ndarray, scale: ndarray) -> Kpts:
+        """apply"""
+        # TODO: add batch support
+        heatmap, center, scale = heatmap[None, ...], center[None, ...], scale[None, ...]
+        preds, maxvals = self.get_final_preds(heatmap, center, scale)
+        keypoints, scores = np.concatenate((preds, maxvals), axis=-1), np.mean(
+            maxvals.squeeze(-1), axis=1
+        )
+
+        return [
+            {"keypoints": kpt, "kpt_score": score}
+            for kpt, score in zip(keypoints, scores)
+        ]
+
+    def __call__(self, batch_outputs: List[dict], datas: List[dict]) -> List[Kpts]:
+        """Apply the post-processing to a batch of outputs.
+
+        Args:
+            batch_outputs (List[dict]): The list of model outputs, each containing a "heatmap".
+            datas (List[dict]): The list of input data with "center" and "scale" entries.
+
+        Returns:
+            List[Kpts]: The list of post-processed keypoints, one entry per sample.
+        """
+        return [
+            self.apply(output["heatmap"], data["center"], data["scale"])
+            for data, output in zip(datas, batch_outputs)
+        ]
+
+    def get_final_preds(
+        self, heatmaps: ndarray, center: ndarray, scale: ndarray, kernelsize: int = 3
+    ):
+        """the highest heatvalue location with a quarter offset in the
+        direction from the highest response to the second highest response.
+        Args:
+            heatmaps (numpy.ndarray): The predicted heatmaps
+            center (numpy.ndarray): The boxes center
+            scale (numpy.ndarray): The scale factor
+        Returns:
+            preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords
+            maxvals: numpy.ndarray([batch_size, num_joints, 1]), the maximum confidence of the keypoints
+        """
+        coords, maxvals = self.get_max_preds(heatmaps)
+        heatmap_height = heatmaps.shape[2]
+        heatmap_width = heatmaps.shape[3]
+
+        if self.use_dark:
+            coords = self.dark_postprocess(heatmaps, coords, kernelsize)
+        else:
+            for n in range(coords.shape[0]):
+                for p in range(coords.shape[1]):
+                    hm = heatmaps[n][p]
+                    px = int(math.floor(coords[n][p][0] + 0.5))
+                    py = int(math.floor(coords[n][p][1] + 0.5))
+                    if 1 < px < heatmap_width - 1 and 1 < py < heatmap_height - 1:
+                        diff = np.array(
+                            [
+                                hm[py][px + 1] - hm[py][px - 1],
+                                hm[py + 1][px] - hm[py - 1][px],
+                            ]
+                        )
+                        coords[n][p] += np.sign(diff) * 0.25
+        preds = coords.copy()
+        # Transform back
+        for i in range(coords.shape[0]):
+            preds[i] = transform_preds(
+                coords[i], center[i], scale[i], [heatmap_width, heatmap_height]
+            )
+
+        return preds, maxvals
+
+    def get_max_preds(self, heatmaps: ndarray) -> Tuple[ndarray, ndarray]:
+        """get predictions from score maps
+        Args:
+            heatmaps: numpy.ndarray([batch_size, num_joints, height, width])
+        Returns:
+            preds: numpy.ndarray([batch_size, num_joints, 2]), keypoints coords
+            maxvals: numpy.ndarray([batch_size, num_joints, 1]), the maximum confidence of the keypoints
+        """
+        assert isinstance(heatmaps, np.ndarray), "heatmaps should be numpy.ndarray"
+        assert heatmaps.ndim == 4, "batch_images should be 4-ndim"
+        batch_size = heatmaps.shape[0]
+        num_joints = heatmaps.shape[1]
+        width = heatmaps.shape[3]
+        heatmaps_reshaped = heatmaps.reshape((batch_size, num_joints, -1))
+        idx = np.argmax(heatmaps_reshaped, 2)
+        maxvals = np.amax(heatmaps_reshaped, 2)
+        maxvals = maxvals.reshape((batch_size, num_joints, 1))
+        idx = idx.reshape((batch_size, num_joints, 1))
+        preds = np.tile(idx, (1, 1, 2)).astype(np.float32)
+        preds[:, :, 0] = (preds[:, :, 0]) % width
+        preds[:, :, 1] = np.floor((preds[:, :, 1]) / width)
+        pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2))
+        pred_mask = pred_mask.astype(np.float32)
+        preds *= pred_mask
+
+        return preds, maxvals
+
+    def gaussian_blur(self, heatmap: ndarray, kernel: int) -> ndarray:
+        """Blur each joint heatmap with a border-padded Gaussian kernel, rescaled to keep the original peak value."""
+        border = (kernel - 1) // 2
+        batch_size = heatmap.shape[0]
+        num_joints = heatmap.shape[1]
+        height = heatmap.shape[2]
+        width = heatmap.shape[3]
+        for i in range(batch_size):
+            for j in range(num_joints):
+                origin_max = np.max(heatmap[i, j])
+                dr = np.zeros((height + 2 * border, width + 2 * border))
+                dr[border:-border, border:-border] = heatmap[i, j].copy()
+                dr = cv2.GaussianBlur(dr, (kernel, kernel), 0)
+                heatmap[i, j] = dr[border:-border, border:-border].copy()
+                heatmap[i, j] *= origin_max / np.max(heatmap[i, j])
+
+        return heatmap
+
+    def dark_parse(self, hm: ndarray, coord: ndarray):
+        """Refine a single coordinate with the DARK second-order Taylor-expansion offset on the (log-space) heatmap."""
+        heatmap_height = hm.shape[0]
+        heatmap_width = hm.shape[1]
+        px = int(coord[0])
+        py = int(coord[1])
+        if 1 < px < heatmap_width - 2 and 1 < py < heatmap_height - 2:
+            dx = 0.5 * (hm[py][px + 1] - hm[py][px - 1])
+            dy = 0.5 * (hm[py + 1][px] - hm[py - 1][px])
+            dxx = 0.25 * (hm[py][px + 2] - 2 * hm[py][px] + hm[py][px - 2])
+            dxy = 0.25 * (
+                hm[py + 1][px + 1]
+                - hm[py - 1][px + 1]
+                - hm[py + 1][px - 1]
+                + hm[py - 1][px - 1]
+            )
+            dyy = 0.25 * (hm[py + 2 * 1][px] - 2 * hm[py][px] + hm[py - 2 * 1][px])
+            derivative = np.matrix([[dx], [dy]])
+            hessian = np.matrix([[dxx, dxy], [dxy, dyy]])
+            if dxx * dyy - dxy**2 != 0:
+                hessianinv = hessian.I
+                offset = -hessianinv * derivative
+                offset = np.squeeze(np.array(offset.T), axis=0)
+                coord += offset
+
+        return coord
+
+    def dark_postprocess(
+        self, hm: ndarray, coords: ndarray, kernelsize: int
+    ) -> ndarray:
+        """
+        refer to https://github.com/ilovepose/DarkPose/lib/core/inference.py
+        """
+        hm = self.gaussian_blur(hm, kernelsize)
+        hm = np.maximum(hm, 1e-10)
+        hm = np.log(hm)
+        for n in range(coords.shape[0]):
+            for p in range(coords.shape[1]):
+                coords[n, p] = self.dark_parse(hm[n][p], coords[n][p])
+
+        return coords
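
A minimal end-to-end sketch of the two transforms added above, assuming this branch of PaddleX is installed so that `paddlex.inference.models_new.keypoint_detection.processors` is importable; the box center/scale values and the random heatmap are illustrative placeholders, not real model output.

```python
import numpy as np
from paddlex.inference.models_new.keypoint_detection.processors import (
    KptPostProcess,
    TopDownAffine,
)

# Dummy frame plus a person-box center/scale pair (the pipeline normally derives
# these from a detector box via its _box_xyxy2cs helper).
img = np.zeros((480, 640, 3), dtype=np.uint8)
center = np.array([320.0, 240.0])
scale = np.array([120.0, 160.0])  # padded box size as [w, h]

# UDP-unbiased crop to the PP-TinyPose_128x96 input resolution (w=96, h=128).
affine = TopDownAffine(input_size=(96, 128), use_udp=True)
(sample,) = affine([{"img": img, "center": center, "scale": scale}])
print(sample["img"].shape)  # (128, 96, 3)

# Decode a fake heatmap back to image-space keypoints with DARK refinement.
post = KptPostProcess(use_dark=True)
heatmap = np.random.rand(17, 32, 24).astype(np.float32)  # (num_joints, H, W)
kpts = post.apply(heatmap, sample["center"], sample["scale"])
print(kpts[0]["keypoints"].shape, kpts[0]["kpt_score"])  # (17, 3), mean confidence
```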

+ 177 - 0
paddlex/inference/models_new/keypoint_detection/result.py

@@ -0,0 +1,177 @@
+# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import cv2
+import math
+
+import matplotlib.pyplot as plt
+import numpy as np
+
+from ...common.result import BaseCVResult
+
+
+def get_color(idx):
+    idx = idx * 3
+    color = ((37 * idx) % 255, (17 * idx) % 255, (29 * idx) % 255)
+    return color
+
+
+def draw_keypoints(img, results, visual_thresh=0.1, ids=None):
+    plt.switch_backend("agg")
+    skeletons = results["keypoints"]
+    skeletons = np.array(skeletons)
+    kpt_nums = 17
+    if len(skeletons) > 0:
+        kpt_nums = skeletons.shape[1]
+    if kpt_nums == 17:  # plot coco keypoint
+        EDGES = [
+            (0, 1),
+            (0, 2),
+            (1, 3),
+            (2, 4),
+            (3, 5),
+            (4, 6),
+            (5, 7),
+            (6, 8),
+            (7, 9),
+            (8, 10),
+            (5, 11),
+            (6, 12),
+            (11, 13),
+            (12, 14),
+            (13, 15),
+            (14, 16),
+            (11, 12),
+        ]
+    else:  # plot mpii keypoint
+        EDGES = [
+            (0, 1),
+            (1, 2),
+            (3, 4),
+            (4, 5),
+            (2, 6),
+            (3, 6),
+            (6, 7),
+            (7, 8),
+            (8, 9),
+            (10, 11),
+            (11, 12),
+            (13, 14),
+            (14, 15),
+            (8, 12),
+            (8, 13),
+        ]
+    NUM_EDGES = len(EDGES)
+
+    colors = [
+        [255, 0, 0],
+        [255, 85, 0],
+        [255, 170, 0],
+        [255, 255, 0],
+        [170, 255, 0],
+        [85, 255, 0],
+        [0, 255, 0],
+        [0, 255, 85],
+        [0, 255, 170],
+        [0, 255, 255],
+        [0, 170, 255],
+        [0, 85, 255],
+        [0, 0, 255],
+        [85, 0, 255],
+        [170, 0, 255],
+        [255, 0, 255],
+        [255, 0, 170],
+        [255, 0, 85],
+    ]
+    plt.figure()
+    color_set = results["colors"] if "colors" in results else None
+
+    if "bbox" in results and ids is None:
+        bboxs = results["bbox"]
+        for j, rect in enumerate(bboxs):
+            xmin, ymin, xmax, ymax = rect
+            color = (
+                colors[0] if color_set is None else colors[color_set[j] % len(colors)]
+            )
+            cv2.rectangle(img, (xmin, ymin), (xmax, ymax), color, 1)
+
+    canvas = img.copy()
+    for i in range(kpt_nums):
+        for j in range(len(skeletons)):
+            if skeletons[j][i, 2] < visual_thresh:
+                continue
+            if ids is None:
+                color = (
+                    colors[i]
+                    if color_set is None
+                    else colors[color_set[j] % len(colors)]
+                )
+            else:
+                color = get_color(ids[j])
+
+            cv2.circle(
+                canvas,
+                tuple(skeletons[j][i, 0:2].astype("int32")),
+                2,
+                color,
+                thickness=-1,
+            )
+
+    stickwidth = 1
+
+    for i in range(NUM_EDGES):
+        for j in range(len(skeletons)):
+            edge = EDGES[i]
+            if (
+                skeletons[j][edge[0], 2] < visual_thresh
+                or skeletons[j][edge[1], 2] < visual_thresh
+            ):
+                continue
+
+            cur_canvas = canvas.copy()
+            X = [skeletons[j][edge[0], 1], skeletons[j][edge[1], 1]]
+            Y = [skeletons[j][edge[0], 0], skeletons[j][edge[1], 0]]
+            mX = np.mean(X)
+            mY = np.mean(Y)
+            length = ((X[0] - X[1]) ** 2 + (Y[0] - Y[1]) ** 2) ** 0.5
+            angle = math.degrees(math.atan2(X[0] - X[1], Y[0] - Y[1]))
+            polygon = cv2.ellipse2Poly(
+                (int(mY), int(mX)), (int(length / 2), stickwidth), int(angle), 0, 360, 1
+            )
+            if ids is None:
+                color = (
+                    colors[i]
+                    if color_set is None
+                    else colors[color_set[j] % len(colors)]
+                )
+            else:
+                color = get_color(ids[j])
+            cv2.fillConvexPoly(cur_canvas, polygon, color)
+            canvas = cv2.addWeighted(canvas, 0.4, cur_canvas, 0.6, 0)
+    plt.close()
+    return canvas
+
+
+class KptResult(BaseCVResult):
+    """Save Result Transform"""
+
+    def _to_img(self):
+        """apply"""
+        if "kpts" in self:  # for single module result
+            keypoints = [kpt["keypoints"] for kpt in self["kpts"]]
+        else:
+            keypoints = [
+                obj["keypoints"] for obj in self["boxes"]
+            ]  # for top-down pipeline result
+        image = draw_keypoints(self["input_img"], dict(keypoints=np.stack(keypoints)))
+        return {"res": image}

+ 2 - 2
paddlex/inference/models_new/object_detection/predictor.py

@@ -260,7 +260,7 @@ class DetPredictor(BasicPredictor):
         return WarpAffine(input_h=input_h, input_w=input_w, keep_res=keep_res)
 
     def build_to_batch(self):
-        model_names_required_imgsize = [
+        models_required_imgsize = [
             "DETR",
             "DINO",
             "RCNN",
@@ -269,7 +269,7 @@ class DetPredictor(BasicPredictor):
             "BlazeFace",
             "BlazeFace-FPN-SSH",
         ]
-        if any(name in self.model_name for name in model_names_required_imgsize):
+        if any(name in self.model_name for name in models_required_imgsize):
             ordered_required_keys = (
                 "img_size",
                 "img",

+ 13 - 1
paddlex/inference/models_new/object_detection/processors.py

@@ -29,7 +29,7 @@ Number = Union[int, float]
 class ReadImage(CommonReadImage):
     """Reads images from a list of raw image data or file paths."""
 
-    def __call__(self, raw_imgs: List[Union[ndarray, str]]) -> List[dict]:
+    def __call__(self, raw_imgs: List[Union[ndarray, str, dict]]) -> List[dict]:
         """Processes the input list of raw image data or file paths and returns a list of dictionaries containing image information.
 
         Args:
@@ -43,6 +43,18 @@ class ReadImage(CommonReadImage):
             data = dict()
             if isinstance(raw_img, str):
                 data["img_path"] = raw_img
+            if isinstance(raw_img, dict):
+                if "img" in raw_img:
+                    src_img = raw_img["img"]
+                elif "img_path" in raw_img:
+                    src_img = raw_img["img_path"]
+                    data["img_path"] = src_img
+                else:
+                    raise ValueError(
+                        "When raw_img is dict, must have one of keys ['img', 'img_path']."
+                    )
+                data.update(raw_img)
+                raw_img = src_img
             img = self.read(raw_img)
             data["img"] = img
             data["ori_img"] = img

+ 1 - 0
paddlex/inference/pipelines_new/__init__.py

@@ -44,6 +44,7 @@ from .semantic_segmentation import SemanticSegmentationPipeline
 from .instance_segmentation import InstanceSegmentationPipeline
 from .small_object_detection import SmallObjectDetectionPipeline
 from .rotated_object_detection import RotatedObjectDetectionPipeline
+from .keypoint_detection import KeypointDetectionPipeline
 
 
 def get_pipeline_path(pipeline_name: str) -> str:

+ 15 - 0
paddlex/inference/pipelines_new/keypoint_detection/__init__.py

@@ -0,0 +1,15 @@
+# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .pipeline import KeypointDetectionPipeline

+ 135 - 0
paddlex/inference/pipelines_new/keypoint_detection/pipeline.py

@@ -0,0 +1,135 @@
+# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import Any, Dict, List, Optional, Tuple, Union
+import numpy as np
+from ...utils.pp_option import PaddlePredictorOption
+from ..base import BasePipeline
+
+# [TODO] To be updated: models_new -> models
+from ...models_new.keypoint_detection.result import KptResult
+
+Number = Union[int, float]
+
+
+class KeypointDetectionPipeline(BasePipeline):
+    """Keypoint Detection pipeline"""
+
+    entities = "human_keypoint_detection"
+
+    def __init__(
+        self,
+        config: Dict,
+        device: str = None,
+        pp_option: PaddlePredictorOption = None,
+        use_hpip: bool = False,
+        hpi_params: Optional[Dict[str, Any]] = None,
+    ) -> None:
+        """
+        Initializes the class with given configurations and options.
+
+        Args:
+            config (Dict): Configuration dictionary containing model and other parameters.
+            device (str): The device to run the prediction on. Default is None.
+            pp_option (PaddlePredictorOption): Options for PaddlePaddle predictor. Default is None.
+            use_hpip (bool): Whether to use high-performance inference (hpip) for prediction. Defaults to False.
+            hpi_params (Optional[Dict[str, Any]]): HPIP specific parameters. Default is None.
+        """
+        super().__init__(
+            device=device, pp_option=pp_option, use_hpip=use_hpip, hpi_params=hpi_params
+        )
+
+        # create object detection model
+        model_cfg = config["SubModules"]["ObjectDetection"]
+        model_kwargs = {}
+        if "threshold" in model_cfg:
+            model_kwargs["threshold"] = model_cfg["threshold"]
+        if "imgsz" in model_cfg:
+            model_kwargs["imgsz"] = model_cfg["imgsz"]
+        self.det_model = self.create_model(model_cfg, **model_kwargs)
+
+        # create keypoint detection model
+        model_cfg = config["SubModules"]["KeypointDetection"]
+        model_kwargs = {}
+        if "flip" in model_cfg:
+            model_kwargs["flip"] = model_cfg["flip"]
+        if "use_udp" in model_cfg:
+            model_kwargs["use_udp"] = model_cfg["use_udp"]
+        self.kpt_model = self.create_model(model_cfg, **model_kwargs)
+
+        self.kpt_input_size = self.kpt_model.input_size
+
+    def _box_xyxy2cs(
+        self, bbox: Union[Number, np.ndarray], padding: float = 1.25
+    ) -> Tuple[np.ndarray, np.ndarray]:
+        """
+        Convert bounding box from (x1, y1, x2, y2) to center and scale.
+
+        Args:
+            bbox (Union[Number, np.ndarray]): The bounding box coordinates (x1, y1, x2, y2).
+            padding (float): The padding factor to adjust the scale of the bounding box.
+
+        Returns:
+            Tuple[np.ndarray, np.ndarray]: The center and scale of the bounding box.
+        """
+        x1, y1, x2, y2 = bbox[:4]
+        center = np.array([x1 + x2, y1 + y2]) * 0.5
+
+        # reshape bbox to fixed aspect ratio
+        aspect_ratio = self.kpt_input_size[0] / self.kpt_input_size[1]
+        w, h = x2 - x1, y2 - y1
+        if w > aspect_ratio * h:
+            h = w / aspect_ratio
+        elif w < aspect_ratio * h:
+            w = h * aspect_ratio
+
+        scale = np.array([w, h]) * padding
+
+        return center, scale
+
+    def predict(
+        self, input: Union[str, List[str], np.ndarray, List[np.ndarray]], **kwargs
+    ) -> KptResult:
+        """Predicts image classification results for the given input.
+
+        Args:
+            input (str | list[str] | np.ndarray | list[np.ndarray]): The input image(s) or path(s) to the images.
+            **kwargs: Additional keyword arguments that can be passed to the function.
+
+        Returns:
+            KptResult: The predicted keypoint detection results, yielded per input image.
+        """
+
+        for det_res in self.det_model(input):
+            ori_img, img_path = det_res["input_img"], det_res["input_path"]
+            single_img_res = {"input_path": img_path, "input_img": ori_img, "boxes": []}
+            for box in det_res["boxes"]:
+                center, scale = self._box_xyxy2cs(box["coordinate"])
+                kpt_res = next(
+                    self.kpt_model(
+                        {
+                            "img": ori_img,
+                            "center": center,
+                            "scale": scale,
+                        }
+                    )
+                )
+                single_img_res["boxes"].append(
+                    {
+                        "coordinate": box["coordinate"],
+                        "det_score": box["score"],
+                        "keypoints": kpt_res["kpts"][0]["keypoints"],
+                    }
+                )
+            yield KptResult(single_img_res)
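
A hedged usage sketch of the new top-down pipeline, assuming the standard `create_pipeline` entry point and the `human_keypoint_detection` pipeline config added in this commit; the image path and output directory are placeholders, and the result methods follow the existing PaddleX result API.

```python
from paddlex import create_pipeline

# Top-down flow: the detector finds person boxes, each box is converted to a
# center/scale crop for PP-TinyPose, and the decoded keypoints are merged back.
pipeline = create_pipeline(pipeline="human_keypoint_detection")
for res in pipeline.predict("demo_person.jpg"):
    res.print()                   # per-box coordinates, det_score and 17x3 keypoints
    res.save_to_img("./output/")  # rendered skeleton visualization
```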

+ 2 - 0
paddlex/inference/utils/official_models.py

@@ -318,6 +318,8 @@ PP-LCNet_x1_0_vehicle_attribute_infer.tar",
     "PP-TSMv2-LCNetV2_16frames_uniform": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-TSMv2-LCNetV2_16frames_uniform_infer.tar",
     "MaskFormer_tiny": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/MaskFormer_tiny_infer.tar",
     "MaskFormer_small": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/MaskFormer_small_infer.tar",
+    "PP-TinyPose_128x96": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-TinyPose_128x96_infer.tar",
+    "PP-TinyPose_256x192": "https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0rc0/PP-TinyPose_256x192_infer.tar",
 }
 
 

+ 7 - 0
paddlex/modules/__init__.py

@@ -104,6 +104,13 @@ from .face_recognition import (
 
 from .ts_forecast import TSFCDatasetChecker, TSFCTrainer, TSFCEvaluator
 
+from .keypoint_detection import (
+    KeypointDatasetChecker,
+    KeypointTrainer,
+    KeypointEvaluator,
+    KeypointExportor,
+)
+
 from .video_classification import (
     VideoClsDatasetChecker,
     VideoClsTrainer,

+ 18 - 0
paddlex/modules/keypoint_detection/__init__.py

@@ -0,0 +1,18 @@
+# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .trainer import KeypointTrainer
+from .dataset_checker import KeypointDatasetChecker
+from .evaluator import KeypointEvaluator
+from .exportor import KeypointExportor

+ 56 - 0
paddlex/modules/keypoint_detection/dataset_checker/__init__.py

@@ -0,0 +1,56 @@
+# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+from ...object_detection.dataset_checker import COCODatasetChecker
+from .dataset_src import check
+from ..model_list import MODELS
+
+
+class KeypointDatasetChecker(COCODatasetChecker):
+    """Dataset Checker for Object Detection Model"""
+
+    entities = MODELS
+    sample_num = 10
+
+    def get_dataset_type(self) -> str:
+        """return the dataset type
+
+        Returns:
+            str: dataset type
+        """
+        return "KeypointTopDownCocoDetDataset"
+
+    def check_dataset(self, dataset_dir: str, sample_num: int = sample_num) -> dict:
+        """check if the dataset meets the specifications and get dataset summary
+
+        Args:
+            dataset_dir (str): the root directory of dataset.
+            sample_num (int): the number to be sampled.
+        Returns:
+            dict: dataset summary.
+        """
+        return check(dataset_dir, self.output)
+
+    def convert_dataset(self, src_dataset_dir: str) -> str:
+        """convert the dataset from other type to specified type
+
+        Args:
+            src_dataset_dir (str): the root directory of dataset.
+
+        Returns:
+            str: the root directory of converted dataset.
+        """
+        dst_dataset_dir = src_dataset_dir
+        return dst_dataset_dir

+ 15 - 0
paddlex/modules/keypoint_detection/dataset_checker/dataset_src/__init__.py

@@ -0,0 +1,15 @@
+# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from .check_dataset import check

+ 86 - 0
paddlex/modules/keypoint_detection/dataset_checker/dataset_src/check_dataset.py

@@ -0,0 +1,86 @@
+# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+import os
+import os.path as osp
+from collections import defaultdict, Counter
+from pathlib import Path
+from PIL import Image, ImageOps
+import json
+from pycocotools.coco import COCO
+
+from .....utils.errors import DatasetFileNotFoundError
+from .utils.visualizer import draw_keypoint
+
+
+def check(dataset_dir, output, sample_num=10):
+    """check dataset"""
+    dataset_dir = osp.abspath(dataset_dir)
+    if not osp.exists(dataset_dir) or not osp.isdir(dataset_dir):
+        raise DatasetFileNotFoundError(file_path=dataset_dir)
+
+    sample_cnts = dict()
+    sample_paths = defaultdict(list)
+    im_sizes = defaultdict(Counter)
+    tags = ["instance_train", "instance_val"]
+    for _, tag in enumerate(tags):
+        file_list = osp.join(dataset_dir, f"annotations/{tag}.json")
+        if not osp.exists(file_list):
+            if tag in ("instance_train", "instance_val"):
+                # train and val file lists must exist
+                raise DatasetFileNotFoundError(
+                    file_path=file_list,
+                    solution=f"Ensure that both `instance_train.json` and `instance_val.json` exist in \
+{dataset_dir}/annotations",
+                )
+            else:
+                continue
+        else:
+            with open(file_list, "r", encoding="utf-8") as f:
+                jsondata = json.load(f)
+
+            coco = COCO(file_list)
+            num_class = len(coco.getCatIds())
+
+            vis_save_dir = osp.join(output, "demo_img")
+
+            image_info = jsondata["images"]
+            sample_cnts[tag] = len(image_info)
+            sample_num = min(sample_num, len(image_info))
+            for i in range(sample_num):
+                file_name = image_info[i]["file_name"]
+                img_id = image_info[i]["id"]
+                img_path = osp.join(dataset_dir, "images", file_name)
+                if not osp.exists(img_path):
+                    raise DatasetFileNotFoundError(file_path=img_path)
+                img = Image.open(img_path)
+                img = ImageOps.exif_transpose(img)
+                vis_im = draw_keypoint(img, coco, img_id)
+                vis_path = osp.join(vis_save_dir, file_name)
+                Path(vis_path).parent.mkdir(parents=True, exist_ok=True)
+                vis_im.save(vis_path)
+                sample_path = osp.join(
+                    "check_dataset", os.path.relpath(vis_path, output)
+                )
+                sample_paths[tag].append(sample_path)
+
+    attrs = {}
+    attrs["num_classes"] = num_class
+    attrs["train_samples"] = sample_cnts["instance_train"]
+    attrs["train_sample_paths"] = sample_paths["instance_train"]
+
+    attrs["val_samples"] = sample_cnts["instance_val"]
+    attrs["val_sample_paths"] = sample_paths["instance_val"]
+    return attrs
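
As implied by `check()` above, the checker expects a COCO-style layout rooted at `dataset_dir`. A small sketch of the assumed directory structure with a stdlib probe; the names are inferred from the code, not an official contract.

```python
import os.path as osp

# Expected layout (inferred from check()):
#   <dataset_dir>/
#     annotations/instance_train.json   # COCO keypoint annotations
#     annotations/instance_val.json
#     images/<file_name referenced by the JSON>
def looks_like_keypoint_dataset(dataset_dir: str) -> bool:
    has_annotations = all(
        osp.exists(osp.join(dataset_dir, "annotations", f"{tag}.json"))
        for tag in ("instance_train", "instance_val")
    )
    return has_annotations and osp.isdir(osp.join(dataset_dir, "images"))
```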

+ 13 - 0
paddlex/modules/keypoint_detection/dataset_checker/dataset_src/utils/__init__.py

@@ -0,0 +1,13 @@
+# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

+ 119 - 0
paddlex/modules/keypoint_detection/dataset_checker/dataset_src/utils/visualizer.py

@@ -0,0 +1,119 @@
+# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import numpy as np
+from PIL import ImageDraw
+from pycocotools.coco import COCO
+
+
+def draw_keypoint(image, coco_info: COCO, img_id):
+    """
+    Draw keypoints on image for the COCO human keypoint dataset with 17 keypoints.
+
+    Args:
+        image: PIL.Image object
+        coco_info: COCO object (from pycocotools.coco)
+        img_id: Image ID
+
+    Returns:
+        image: PIL.Image object with keypoints and skeleton drawn
+    """
+
+    # Initialize the drawing context
+    image = image.convert("RGB")
+    draw = ImageDraw.Draw(image)
+    image_size = image.size
+    width = int(max(image_size) * 0.005)  # Line thickness for drawing
+
+    # Define the skeleton connections based on COCO keypoint indexes
+    skeleton = [
+        [15, 13],
+        [13, 11],
+        [16, 14],
+        [14, 12],
+        [11, 12],
+        [5, 11],
+        [6, 12],
+        [5, 6],
+        [5, 7],
+        [6, 8],
+        [7, 9],
+        [8, 10],
+        [1, 2],
+        [0, 1],
+        [0, 2],
+        [1, 3],
+        [2, 4],
+        [3, 5],
+        [4, 6],
+    ]
+
+    # Define colors for each keypoint (you can customize these colors)
+    keypoint_colors = [
+        (255, 0, 0),  # Nose
+        (255, 85, 0),  # Left Eye
+        (255, 170, 0),  # Right Eye
+        (255, 255, 0),  # Left Ear
+        (170, 255, 0),  # Right Ear
+        (85, 255, 0),  # Left Shoulder
+        (0, 255, 0),  # Right Shoulder
+        (0, 255, 85),  # Left Elbow
+        (0, 255, 170),  # Right Elbow
+        (0, 255, 255),  # Left Wrist
+        (0, 170, 255),  # Right Wrist
+        (0, 85, 255),  # Left Hip
+        (0, 0, 255),  # Right Hip
+        (85, 0, 255),  # Left Knee
+        (170, 0, 255),  # Right Knee
+        (255, 0, 255),  # Left Ankle
+        (255, 0, 170),  # Right Ankle
+    ]
+
+    # Get annotations for the image
+    annotations = coco_info.loadAnns(coco_info.getAnnIds(imgIds=img_id))
+
+    # Loop over each person annotation
+    for ann in annotations:
+        keypoints = ann.get("keypoints", [])
+        if not keypoints:
+            continue  # Skip if no keypoints are present
+
+        # Reshape keypoints into (num_keypoints, 3)
+        keypoints = np.array(keypoints).reshape(-1, 3)
+
+        # Draw keypoints
+        for idx, (x, y, v) in enumerate(keypoints):
+            if v == 2:  # v=2 means the keypoint is labeled and visible
+                radius = max(1, int(width / 2))
+                x, y = float(x), float(y)
+                color = keypoint_colors[idx % len(keypoint_colors)]
+                draw.ellipse(
+                    (x - radius, y - radius, x + radius, y + radius), fill=color
+                )
+
+        # Draw skeleton by connecting keypoints
+        for sk in skeleton:
+            kp1_idx, kp2_idx = sk[0], sk[1]
+            x1, y1, v1 = keypoints[kp1_idx]
+            x2, y2, v2 = keypoints[kp2_idx]
+            if v1 == 2 and v2 == 2:
+                # Both keypoints are visible
+                x1, y1 = float(x1), float(y1)
+                x2, y2 = float(x2), float(y2)
+                draw.line(
+                    (x1, y1, x2, y2),
+                    fill=(0, 255, 0),  # Line color (you can customize)
+                    width=width,
+                )
+
+    return image
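
An illustrative call of `draw_keypoint` against a real COCO-format annotation file; the paths are placeholders and assume the images referenced by the JSON are on disk.

```python
from PIL import Image
from pycocotools.coco import COCO

from paddlex.modules.keypoint_detection.dataset_checker.dataset_src.utils.visualizer import (
    draw_keypoint,
)

# Visualize the first annotated image of the validation split.
coco = COCO("dataset/annotations/instance_val.json")
img_id = coco.getImgIds()[0]
file_name = coco.loadImgs(img_id)[0]["file_name"]
vis = draw_keypoint(Image.open(f"dataset/images/{file_name}"), coco, img_id)
vis.save("keypoint_vis.jpg")
```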

+ 41 - 0
paddlex/modules/keypoint_detection/evaluator.py

@@ -0,0 +1,41 @@
+# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+from ..object_detection import DetEvaluator
+from .model_list import MODELS
+
+
+class KeypointEvaluator(DetEvaluator):
+    """Object Detection Model Evaluator"""
+
+    entities = MODELS
+
+    def update_config(self):
+        """update evalution config"""
+        if self.eval_config.log_interval:
+            self.pdx_config.update_log_interval(self.eval_config.log_interval)
+        metric = self.pdx_config.metric
+        data_fields = (
+            self.pdx_config.TrainDataset["data_fields"]
+            if "data_fields" in self.pdx_config.TrainDataset
+            else None
+        )
+        self.pdx_config.update_dataset(
+            self.global_config.dataset_dir,
+            "KeypointTopDownCocoDataset",
+            data_fields=data_fields,
+            metric=metric,
+        )
+        self.pdx_config.update_weights(self.eval_config.weight_path)

+ 22 - 0
paddlex/modules/keypoint_detection/exportor.py

@@ -0,0 +1,22 @@
+# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from ..base import BaseExportor
+from .model_list import MODELS
+
+
+class KeypointExportor(BaseExportor):
+    """Object Detection Model Exportor"""
+
+    entities = MODELS

+ 16 - 0
paddlex/modules/keypoint_detection/model_list.py

@@ -0,0 +1,16 @@
+# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+MODELS = ["PP-TinyPose_128x96", "PP-TinyPose_256x192"]

+ 39 - 0
paddlex/modules/keypoint_detection/trainer.py

@@ -0,0 +1,39 @@
+# copyright (c) 2024 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+from .model_list import MODELS
+from ..object_detection import DetTrainer
+
+
+class KeypointTrainer(DetTrainer):
+    """Human Pose Estimation Model Trainer"""
+
+    entities = MODELS
+
+    def _update_dataset(self):
+        """update dataset settings"""
+        metric = self.pdx_config.metric
+        data_fields = (
+            self.pdx_config.TrainDataset["data_fields"]
+            if "data_fields" in self.pdx_config.TrainDataset
+            else None
+        )
+
+        self.pdx_config.update_dataset(
+            self.global_config.dataset_dir,
+            "KeypointTopDownCocoDataset",
+            data_fields=data_fields,
+            metric=metric,
+        )

+ 10 - 3
paddlex/modules/object_detection/trainer.py

@@ -28,11 +28,16 @@ class DetTrainer(BaseTrainer):
 
     def _update_dataset(self):
         """update dataset settings"""
-        metric = self.pdx_config.metric if 'metric' in self.pdx_config else 'COCO'
-        data_fields = self.pdx_config.TrainDataset['data_fields'] if 'data_fields' in self.pdx_config.TrainDataset else None
+        metric = self.pdx_config.metric if "metric" in self.pdx_config else "COCO"
+        data_fields = (
+            self.pdx_config.TrainDataset["data_fields"]
+            if "data_fields" in self.pdx_config.TrainDataset
+            else None
+        )
 
         self.pdx_config.update_dataset(
-            self.global_config.dataset_dir, "COCODetDataset",
+            self.global_config.dataset_dir,
+            "COCODetDataset",
             data_fields=data_fields,
             metric=metric,
         )
@@ -64,6 +69,8 @@ class DetTrainer(BaseTrainer):
             epochs_iters = self.train_config.epochs_iters
         else:
             epochs_iters = self.pdx_config.get_epochs_iters()
+        if self.train_config.warmup_steps is not None:
+            self.pdx_config.update_warmup_steps(self.train_config.warmup_steps)
         if self.global_config.output is not None:
             self.pdx_config.update_save_dir(self.global_config.output)
 

+ 151 - 0
paddlex/repo_apis/PaddleDetection_api/configs/PP-TinyPose_128x96.yaml

@@ -0,0 +1,151 @@
+use_gpu: true
+log_iter: 5
+save_dir: output
+snapshot_epoch: 10
+weights: output/tinypose_128x96/model_final
+epoch: 420
+num_joints: &num_joints 17
+pixel_std: &pixel_std 200
+metric: KeyPointTopDownCOCOEval
+num_classes: 1
+train_height: &train_height 128
+train_width: &train_width 96
+trainsize: &trainsize [*train_width, *train_height]
+hmsize: &hmsize [24, 32]
+flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]]
+use_ema: true
+
+# AMP training
+init_loss_scaling: 32752
+master_grad: true
+
+#####model
+architecture: TopDownHRNet
+
+TopDownHRNet:
+  backbone: LiteHRNet
+  post_process: HRNetPostProcess
+  flip_perm: *flip_perm
+  num_joints: *num_joints
+  width: &width 40
+  loss: KeyPointMSELoss
+  use_dark: true
+
+LiteHRNet:
+  network_type: wider_naive
+  freeze_at: -1
+  freeze_norm: false
+  return_idx: [0]
+
+KeyPointMSELoss:
+  use_target_weight: true
+  loss_scale: 1.0
+
+#####optimizer
+LearningRate:
+  base_lr: 0.008
+  schedulers:
+  - !PiecewiseDecay
+    milestones: [380, 410]
+    gamma: 0.1
+  - !LinearWarmup
+    start_factor: 0.001
+    steps: 500
+
+OptimizerBuilder:
+  optimizer:
+    type: Adam
+  regularizer:
+    factor: 0.0
+    type: L2
+
+
+#####data
+TrainDataset:
+  !KeypointTopDownCocoDataset
+    image_dir: ""
+    anno_path: aic_coco_train_cocoformat.json
+    dataset_dir: dataset
+    num_joints: *num_joints
+    trainsize: *trainsize
+    pixel_std: *pixel_std
+    use_gt_bbox: True
+
+
+EvalDataset:
+  !KeypointTopDownCocoDataset
+    image_dir: val2017
+    anno_path: annotations/person_keypoints_val2017.json
+    dataset_dir: dataset/coco
+    num_joints: *num_joints
+    trainsize: *trainsize
+    pixel_std: *pixel_std
+    use_gt_bbox: True
+    image_thre: 0.5
+
+TestDataset:
+  !ImageFolder
+    anno_path: dataset/coco/keypoint_imagelist.txt
+
+worker_num: 2
+global_mean: &global_mean [0.485, 0.456, 0.406]
+global_std: &global_std [0.229, 0.224, 0.225]
+TrainReader:
+  sample_transforms:
+    - RandomFlipHalfBodyTransform:
+        scale: 0.25
+        rot: 30
+        num_joints_half_body: 8
+        prob_half_body: 0.3
+        pixel_std: *pixel_std
+        trainsize: *trainsize
+        upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+        flip_pairs: *flip_perm
+    - AugmentationbyInformantionDropping:
+        prob_cutout: 0.5
+        offset_factor: 0.05
+        num_patch: 1
+        trainsize: *trainsize
+    - TopDownAffine:
+        trainsize: *trainsize
+        use_udp: true
+    - ToHeatmapsTopDown_DARK:
+        hmsize: *hmsize
+        sigma: 1
+  batch_transforms:
+    - NormalizeImage:
+        mean: *global_mean
+        std: *global_std
+        is_scale: true
+    - Permute: {}
+  batch_size: 512
+  shuffle: true
+  drop_last: false
+
+EvalReader:
+  sample_transforms:
+    - TopDownAffine:
+        trainsize: *trainsize
+        use_udp: true
+  batch_transforms:
+    - NormalizeImage:
+        mean: *global_mean
+        std: *global_std
+        is_scale: true
+    - Permute: {}
+  batch_size: 16
+
+TestReader:
+  inputs_def:
+    image_shape: [3, *train_height, *train_width]
+  sample_transforms:
+    - Decode: {}
+    - TopDownEvalAffine:
+        trainsize: *trainsize
+    - NormalizeImage:
+        mean: *global_mean
+        std: *global_std
+        is_scale: true
+    - Permute: {}
+  batch_size: 1
+  fuse_normalize: false

+ 148 - 0
paddlex/repo_apis/PaddleDetection_api/configs/PP-TinyPose_256x192.yaml

@@ -0,0 +1,148 @@
+use_gpu: true
+log_iter: 5
+save_dir: output
+snapshot_epoch: 10
+weights: output/tinypose_256x192/model_final
+epoch: 420
+num_joints: &num_joints 17
+pixel_std: &pixel_std 200
+metric: KeyPointTopDownCOCOEval
+num_classes: 1
+train_height: &train_height 256
+train_width: &train_width 192
+trainsize: &trainsize [*train_width, *train_height]
+hmsize: &hmsize [48, 64]
+flip_perm: &flip_perm [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]]
+use_ema: true
+
+
+#####model
+architecture: TopDownHRNet
+
+TopDownHRNet:
+  backbone: LiteHRNet
+  post_process: HRNetPostProcess
+  flip_perm: *flip_perm
+  num_joints: *num_joints
+  width: &width 40
+  loss: KeyPointMSELoss
+  use_dark: true
+
+LiteHRNet:
+  network_type: wider_naive
+  freeze_at: -1
+  freeze_norm: false
+  return_idx: [0]
+
+KeyPointMSELoss:
+  use_target_weight: true
+  loss_scale: 1.0
+
+#####optimizer
+LearningRate:
+  base_lr: 0.002
+  schedulers:
+  - !PiecewiseDecay
+    milestones: [380, 410]
+    gamma: 0.1
+  - !LinearWarmup
+    start_factor: 0.001
+    steps: 500
+
+OptimizerBuilder:
+  optimizer:
+    type: Adam
+  regularizer:
+    factor: 0.0
+    type: L2
+
+
+#####data
+TrainDataset:
+  !KeypointTopDownCocoDataset
+    image_dir: ""
+    anno_path: aic_coco_train_cocoformat.json
+    dataset_dir: dataset
+    num_joints: *num_joints
+    trainsize: *trainsize
+    pixel_std: *pixel_std
+    use_gt_bbox: True
+
+
+EvalDataset:
+  !KeypointTopDownCocoDataset
+    image_dir: val2017
+    anno_path: annotations/person_keypoints_val2017.json
+    dataset_dir: dataset/coco
+    num_joints: *num_joints
+    trainsize: *trainsize
+    pixel_std: *pixel_std
+    use_gt_bbox: True
+    image_thre: 0.5
+
+TestDataset:
+  !ImageFolder
+    anno_path: dataset/coco/keypoint_imagelist.txt
+
+worker_num: 2
+global_mean: &global_mean [0.485, 0.456, 0.406]
+global_std: &global_std [0.229, 0.224, 0.225]
+TrainReader:
+  sample_transforms:
+    - RandomFlipHalfBodyTransform:
+        scale: 0.25
+        rot: 30
+        num_joints_half_body: 8
+        prob_half_body: 0.3
+        pixel_std: *pixel_std
+        trainsize: *trainsize
+        upper_body_ids: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+        flip_pairs: *flip_perm
+    - AugmentationbyInformantionDropping:
+        prob_cutout: 0.5
+        offset_factor: 0.05
+        num_patch: 1
+        trainsize: *trainsize
+    - TopDownAffine:
+        trainsize: *trainsize
+        use_udp: true
+    - ToHeatmapsTopDown_DARK:
+        hmsize: *hmsize
+        sigma: 2
+  batch_transforms:
+    - NormalizeImage:
+        mean: *global_mean
+        std: *global_std
+        is_scale: true
+    - Permute: {}
+  batch_size: 128
+  shuffle: true
+  drop_last: false
+
+EvalReader:
+  sample_transforms:
+    - TopDownAffine:
+        trainsize: *trainsize
+        use_udp: true
+  batch_transforms:
+    - NormalizeImage:
+        mean: *global_mean
+        std: *global_std
+        is_scale: true
+    - Permute: {}
+  batch_size: 16
+
+TestReader:
+  inputs_def:
+    image_shape: [3, *train_height, *train_width]
+  sample_transforms:
+    - Decode: {}
+    - TopDownEvalAffine:
+        trainsize: *trainsize
+    - NormalizeImage:
+        mean: *global_mean
+        std: *global_std
+        is_scale: true
+    - Permute: {}
+  batch_size: 1
+  fuse_normalize: false

+ 16 - 0
paddlex/repo_apis/PaddleDetection_api/object_det/config.py

@@ -89,6 +89,22 @@ class DetConfig(BaseConfig, PPDetConfigMixin):
                 val_anno_path,
                 test_anno_path,
             )
+        elif dataset_type == "KeypointTopDownCocoDataset":
+            ds_cfg = {
+                "TrainDataset": {
+                    "image_dir": image_dir,
+                    "anno_path": train_anno_path,
+                    "dataset_dir": dataset_path,
+                },
+                "EvalDataset": {
+                    "image_dir": image_dir,
+                    "anno_path": val_anno_path,
+                    "dataset_dir": dataset_path,
+                },
+                "TestDataset": {
+                    "anno_path": test_anno_path,
+                },
+            }
         else:
             raise ValueError(f"{repr(dataset_type)} is not supported.")
         self.update(ds_cfg)

+ 30 - 3
paddlex/repo_apis/PaddleDetection_api/object_det/register.py

@@ -863,7 +863,6 @@ register_model_info(
     }
 )
 
-
 register_model_info(
     {
         "model_name": "PicoDet_LCNet_x2_5_face",
@@ -879,7 +878,6 @@ register_model_info(
     }
 )
 
-
 register_model_info(
     {
         "model_name": "BlazeFace",
@@ -895,7 +893,6 @@ register_model_info(
     }
 )
 
-
 register_model_info(
     {
         "model_name": "BlazeFace-FPN-SSH",
@@ -1032,3 +1029,33 @@ register_model_info(
         },
     }
 )
+
+register_model_info(
+    {
+        "model_name": "PP-TinyPose_128x96",
+        "suite": "Det",
+        "config_path": osp.join(PDX_CONFIG_DIR, "PP-TinyPose_128x96.yaml"),
+        "supported_apis": ["train", "evaluate", "predict", "export", "infer"],
+        "supported_dataset_types": ["KeypointTopDownCocoDetDataset"],
+        "supported_train_opts": {
+            "device": ["cpu", "gpu_nxcx", "xpu", "npu", "mlu"],
+            "dy2st": False,
+            "amp": ["OFF"],
+        },
+    }
+)
+
+register_model_info(
+    {
+        "model_name": "PP-TinyPose_256x192",
+        "suite": "Det",
+        "config_path": osp.join(PDX_CONFIG_DIR, "PP-TinyPose_256x192.yaml"),
+        "supported_apis": ["train", "evaluate", "predict", "export", "infer"],
+        "supported_dataset_types": ["KeypointTopDownCocoDetDataset"],
+        "supported_train_opts": {
+            "device": ["cpu", "gpu_nxcx", "xpu", "npu", "mlu"],
+            "dy2st": False,
+            "amp": ["OFF"],
+        },
+    }
+)