
yowo docs for paddlex (#2863)

* yowo docs for paddlex

* fix image ad
Sunflower7788 10 months ago
parent
commit
e018555fdb

+ 70 - 0
docs/data_annotations/video_modules/video_detection.en.md

@@ -0,0 +1,70 @@
+# PaddleX Video Detection Task Module Data Annotation Tutorial
+
+This document will guide you through the process of annotating data for video detection tasks.
+
+## 1. Annotation
+
+### 1.1 Data Preparation
+
+Collect the raw video data and place each video in a folder according to its category. Supported video formats are `.mp4`, `.avi`, `.mov`, and `.mkv`. For example:
+
+```bash
+dataset_dir  # Root directory of the dataset; the directory name can be changed
+├── class1    # Directory for storing videos of each category; the directory name can be changed, containing multiple video files
+├── class2
+├── ...
+
+```
+
+### 1.2 Video Data Conversion
+
+The raw video data needs to be converted into image frames. You can use the [convert_video_to_images.py](https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/doc_images/applications/video_det_dataset_prepare/convert_video_to_images.py) script and run the following command:
+
+```bash
+python convert_video_to_images.py dataset_dir dataset_dir/rgb-images
+```
+Here, `dataset_dir` is the directory with videos to be annotated from section 1.1. `rgb-images` is the directory where the converted image frames will be saved, with each video's images stored in a separate subdirectory.
+
+### 1.3 Image Frame Data Annotation
+
+#### Annotation Process
+
+* For annotation, you can refer to the [Object Detection Annotation Documentation](../cv_modules/object_detection.md). Use `Labelme` or `PaddleLabel` to annotate the bounding boxes for each image, and save them in the coco.json format.
+
+* After completing the annotation, you need to organize the annotation files into the format required in section 2. Data Format. Use the [convert_coco_to_txt.py](https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/doc_images/applications/video_det_dataset_prepare/convert_coco_to_txt.py) script to convert the coco.json annotation files into txt format. Place these files in the `labels` directory, with each annotation file corresponding to the image frames of a single video.
+
+
+```bash
+python convert_coco_to_txt.py coco.json dataset_dir
+```
+ `dataset_dir` is the directory where you want to save the txt files.
+ `coco.json` is the annotation file in coco format.
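To make the conversion concrete, here is a rough sketch of turning COCO annotations into label lines in the `classid x1 y1 x2 y2` form described in section 2. It is not the code of `convert_coco_to_txt.py`; the function name is hypothetical, and writing category ids unchanged and rounding coordinates to integers are assumptions. The only fixed fact it relies on is that COCO stores boxes as `[x, y, width, height]`.

```python
from collections import defaultdict


def coco_to_label_lines(coco: dict) -> dict:
    """Group COCO annotations by image file and emit 'classid x1 y1 x2 y2' lines."""
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    lines = defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        x1, y1, x2, y2 = x, y, x + w, y + h  # convert to corner coordinates
        # Assumption: class ids are written unchanged, coordinates rounded to ints.
        lines[file_names[ann["image_id"]]].append(
            f"{ann['category_id']} {round(x1)} {round(y1)} {round(x2)} {round(y2)}"
        )
    return dict(lines)
```

Each key of the returned dictionary is an image file name, and each value is the list of lines that would go into the matching `.txt` file for that frame.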
+
+##  2. Data Format
+
+* The directory structure for the final dataset should be as follows:
+
+```bash
+dataset_dir    # Root directory of the dataset; the directory name can be changed
+├── rgb-images     # Directory for saving video images; the directory name cannot be changed
+│   ├── class1    # Directory for storing videos of each category; the directory name can be changed
+│   │   ├── video1_images  # Directory for storing images from each video in the category; the directory name can be changed
+│   │   │   ├── 00001.jpg  # Image naming rule: frame number.jpg, e.g., 00001.jpg
+│   │   │   ├── 00002.jpg
+│   │   ├── video2_images
+│   ├── class2
+│   ├── ...
+├── labels       # Directory for saving labels of the video images; the directory name cannot be changed and corresponds to the subdirectory names in rgb-images
+│   ├── class1    # Directory for storing video labels of each category; the directory name can be changed
+│   │   ├── video1_images  # Directory for storing image labels from each video in the category; the directory name can be changed
+│   │   │   ├── 00001.txt  # Each txt file corresponds to the label of an image, with each line formatted as: classid x1 y1 x2 y2
+│   │   │   ├── 00002.txt
+│   │   ├── video2_images
+│   ├── class2
+│   ├── ...
+├── label_map.txt # File mapping label ids to category names; the file name cannot be changed. Each line provides a category id and name, e.g., 1 Basketball
+├── train.txt     # Training set annotation file, with each line providing the path to a video image label, e.g., labels/Biking/v_Biking_g02_c01/00228.txt
+└── val.txt       # Validation set annotation file, with each line providing the path to a video image label
+```
+
+Annotations are stored per image frame, with one txt file for each frame. Please prepare your data according to the specifications above. You can also refer to the [example dataset](https://paddle-model-ecology.bj.bcebos.com/paddlex/data/video_det_examples.tar) for guidance.

+ 75 - 0
docs/data_annotations/video_modules/video_detection.md

@@ -0,0 +1,75 @@
+---
+comments: true
+---
+
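+# PaddleX Video Detection Task Module Data Annotation Tutorial
+
+This document describes how to annotate data for video detection tasks.
+
+## 1. Annotation
+
+### 1.1 Data Preparation
+
+Collect the raw video data and place each video in a folder according to its category. Videos in '.mp4', '.avi', '.mov', and '.mkv' formats are supported. For example: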
+
+```bash
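+dataset_dir  # Root directory of the dataset; the directory name can be changed
+├── class1    # Directory holding the videos of each category; the directory name can be changed, and it contains multiple video files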
+├── class2
+├── ...
+
+```
+
+### 1.2 Video Data Conversion
+
+The raw video data needs to be converted into image frames. You can use the [convert_video_to_images.py](https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/doc_images/applications/video_det_dataset_prepare/convert_video_to_images.py) script and run the following command:
+
+```bash
+python convert_video_to_images.py dataset_dir dataset_dir/rgb-images
+```
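+ `dataset_dir` is the directory from section 1.1 containing the videos to be annotated.
+ `rgb-images` is the directory where the converted image frames are saved; the frames of each video are stored together in one subdirectory.
+
+### 1.3 Image Frame Data Annotation
+
+#### Annotation Process
+
+* For annotation, you can refer to the [Object Detection Annotation Documentation](../cv_modules/object_detection.md). Use `Labelme` or `PaddleLabel` to annotate the bounding boxes in each image and save them in the coco.json format.
+
+* After annotation is complete, organize the annotation files into the format required in section 2, Data Format, below. Use the [convert_coco_to_txt.py](https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/doc_images/applications/video_det_dataset_prepare/convert_coco_to_txt.py) script to convert the coco.json annotation file into txt format and place the results in the `labels` directory, with each annotation file corresponding to the image frames of one video.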
+
+
+```bash
+python convert_coco_to_txt.py coco.json dataset_dir
+```
+ `dataset_dir` is the directory from section 1.1 containing the videos to be annotated.
+ `coco.json` is the coco.json file saved after annotation.
+
+
+##  2. Data Format
+* The dataset that PaddleX defines for video detection tasks has the following directory structure and annotation format:
+
+```bash
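+dataset_dir    # Root directory of the dataset; the directory name can be changed
+├── rgb-images     # Directory for saving video images; the directory name cannot be changed
+│   ├── class1    # Directory for storing the video images of each category; the directory name can be changed
+│   │   ├── video1_images  # Directory for storing the images of each video in the category; the directory name can be changed
+│   │   │   ├── 00001.jpg  # Image naming rule: frame number.jpg, e.g., 00001.jpg
+│   │   │   ├── 00002.jpg
+│   │   ├── video2_images
+│   ├── class2
+│   ├── ...
+├── labels       # Directory for saving the labels of the video images; the directory name cannot be changed, and its subdirectory names correspond to those in rgb-images
+│   ├── class1    # Directory for storing the video image labels of each category; the directory name can be changed
+│   │   ├── video1_images  # Directory for storing the image labels of each video in the category; the directory name can be changed
+│   │   │   ├── 00001.txt  # Each txt file holds the labels of one image; example line: classid x1 y1 x2 y2
+│   │   │   ├── 00002.txt
+│   │   ├── video2_images
+│   ├── class2
+│   ├── ...
+├── label_map.txt # Mapping between label ids and category names; the file name cannot be changed. Each line gives a category id and name, e.g., 1 Basketball
+├── train.txt     # Training set annotation file; each line gives the path to a video image label, e.g., labels/Biking/v_Biking_g02_c01/00228.txt
+└── val.txt       # Validation set annotation file; each line gives the path to a video image label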
+
+```
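+Annotations are stored per image frame, with one txt file for each frame. Please prepare your data according to the specifications above. You can also refer to the [example dataset](https://paddle-model-ecology.bj.bcebos.com/paddlex/data/video_det_examples.tar).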

+ 218 - 0
docs/module_usage/tutorials/video_modules/video_detection.en.md

@@ -0,0 +1,218 @@
+---
+comments: true
+---
+
+# Video Detection Module Development Tutorial
+
+## I. Overview
+Video detection tasks are a critical component of computer vision systems, focusing on identifying and locating objects or events within video sequences. Video detection involves decomposing the video into individual frame sequences and then analyzing these frames to recognize detected objects or actions, such as detecting pedestrians in surveillance videos or identifying specific activities like "running," "jumping," or "playing guitar" in sports or entertainment videos.
+
+The output of the video detection module includes bounding boxes and class labels for each detected object or event. This information can be used by other modules or systems for further analysis, such as tracking the movement of detected objects, generating alerts, or compiling statistics for decision-making processes. Therefore, video detection plays an important role in various applications ranging from security surveillance and autonomous driving to sports analytics and content moderation.
+
+## II. List of Supported Models
+
+
+<table>
+<tr>
+<th>Model</th><th>Model Download Link</th>
+<th>Frame-mAP (@ IoU 0.5)</th>
+<th>Model Storage Size (M)</th>
+<th>Description</th>
+</tr>
+<tr>
+<td>YOWO</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/YOWO_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/YOWO_pretrained.pdparams">Trained Model</a></td>
+<td>80.94</td>
+<td>462.891M</td>
+<td rowspan="1">
+YOWO is a single-stage network with two branches. One branch extracts spatial features of key frames (i.e., the current frame) through a 2D-CNN, while the other branch captures spatiotemporal features of a clip composed of previous frames using a 3D-CNN. To accurately aggregate these features, YOWO employs a channel fusion and attention mechanism to maximize the utilization of inter-channel dependencies. Finally, the fused features are used for frame-level detection.
+</td>
+</tr>
+
+</table>
+
+<p><b>Note: The accuracy metric above is Frame-mAP (@ IoU 0.5) on the <a href="http://www.thumos.info/download.html">UCF101-24</a> test set. </b><b>All model GPU inference times are based on NVIDIA Tesla T4 machines, with precision type FP32. CPU inference speeds are based on Intel® Xeon® Gold 5117 CPU @ 2.00GHz, with 8 threads and precision type FP32.</b></p>
+
+## <span id="lable">III. Quick Integration</span>
+> ❗ Before quick integration, please install the PaddleX wheel package. For detailed instructions, refer to the [PaddleX Local Installation Guide](../../../installation/installation.en.md).
+
+After installing the wheel package, you can run inference with the video detection module in just a few lines of code. You can switch freely between the models in this module and integrate the module's inference into your own project. Before running the following code, please download the [demo video](https://paddle-model-ecology.bj.bcebos.com/paddlex/videos/demo_video/HorseRiding.avi) to your local machine.
+
+```python
+from paddlex import create_model
+model = create_model("YOWO")
+output = model.predict("HorseRiding.avi", batch_size=1)
+for res in output:
+    res.print(json_format=False)
+    res.save_to_video("./output/")
+    res.save_to_json("./output/res.json")
+```
+For more information on using PaddleX's single-model inference APIs, please refer to the [PaddleX Single-Model Python Script Usage Instructions](../../instructions/model_python_API.en.md).
+
+## IV. Custom Development
+If the existing models are not accurate enough for your use case, you can use PaddleX's custom development capabilities to develop a better video detection model. Before doing so, make sure you have installed the PaddleX model training plugins for video detection. The installation process is described in the custom development section of the [PaddleX Local Installation Guide](../../../installation/installation.en.md).
+
+### 4.1 Data Preparation
+Before model training, you need to prepare the dataset for the corresponding task module. PaddleX provides data validation functionality for each module, and <b>only data that passes data validation can be used for model training</b>. Additionally, PaddleX provides demo datasets for each module, which you can use to complete subsequent development. If you wish to use your own private dataset for subsequent model training, please refer to the [PaddleX Video Detection Task Module Data Annotation Guide](../../../data_annotations/video_modules/video_detection.en.md).
+
+#### 4.1.1 Demo Data Download
+You can use the following command to download the demo dataset to a specified folder:
+```bash
+cd /path/to/paddlex
+wget https://paddle-model-ecology.bj.bcebos.com/paddlex/data/video_det_examples.tar -P ./dataset
+tar -xf ./dataset/video_det_examples.tar -C ./dataset/
+```
+#### 4.1.2 Data Validation
+One command is all you need to complete data validation:
+
+```bash
+python main.py -c paddlex/configs/video_detection/YOWO.yaml \
+    -o Global.mode=check_dataset \
+    -o Global.dataset_dir=./dataset/video_det_examples
+```
+After executing the above command, PaddleX will validate the dataset and summarize its basic information. If the command runs successfully, it will print `Check dataset passed !` in the log. The validation results file is saved in `./output/check_dataset_result.json`, and related outputs are saved in the `./output/check_dataset` directory in the current directory, including visual examples of sample images and sample distribution histograms.
+
+<details><summary>👉 <b>Validation Results Details (Click to Expand)</b></summary>
+
+<pre><code class="language-bash">
+{
+  "done_flag": true,
+  "check_pass": true,
+  "attributes": {
+    "label_file": "..\/..\/dataset\/video_det_examples\/label_map.txt",
+    "num_classes": 24,
+    "train_samples": 6878,
+    "train_sample_paths": [
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/SoccerJuggling\/v_SoccerJuggling_g19_c06\/00296.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/SkateBoarding\/v_SkateBoarding_g17_c04\/00026.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/RopeClimbing\/v_RopeClimbing_g01_c03\/00055.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/HorseRiding\/v_HorseRiding_g11_c05\/00132.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/PoleVault\/v_PoleVault_g13_c03\/00089.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/Basketball\/v_Basketball_g13_c04\/00050.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/PoleVault\/v_PoleVault_g01_c05\/00024.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/RopeClimbing\/v_RopeClimbing_g03_c04\/00118.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/GolfSwing\/v_GolfSwing_g01_c06\/00231.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/TrampolineJumping\/v_TrampolineJumping_g02_c02\/00134.jpg"
+    ],
+    "val_samples": 3916,
+    "val_sample_paths": [
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/IceDancing\/v_IceDancing_g22_c02\/00017.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/TennisSwing\/v_TennisSwing_g04_c02\/00046.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/SoccerJuggling\/v_SoccerJuggling_g08_c03\/00169.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/Fencing\/v_Fencing_g24_c02\/00009.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/Diving\/v_Diving_g16_c02\/00110.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/HorseRiding\/v_HorseRiding_g08_c02\/00079.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/PoleVault\/v_PoleVault_g17_c07\/00008.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/Skiing\/v_Skiing_g20_c06\/00221.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/PoleVault\/v_PoleVault_g17_c07\/00137.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/GolfSwing\/v_GolfSwing_g24_c01\/00093.jpg"
+    ]
+  },
+  "analysis": {
+    "histogram": "check_dataset\/histogram.png"
+  },
+  "dataset_path": "video_det_examples",
+  "show_type": "video",
+  "dataset_type": "VideoDetDataset"
+}
+</code></pre>
+<p>The above validation results, with check_pass being True, indicate that the dataset format meets the requirements. Explanations for other indicators are as follows:</p>
+<ul>
+<li><code>attributes.num_classes</code>: The number of classes in this dataset is 24;</li>
+<li><code>attributes.train_samples</code>: The number of training set samples in this dataset is 6878;</li>
+<li><code>attributes.val_samples</code>: The number of validation set samples in this dataset is 3916;</li>
+<li><code>attributes.train_sample_paths</code>: A list of relative paths to the visual samples in the training set of this dataset;</li>
+<li><code>attributes.val_sample_paths</code>: A list of relative paths to the visual samples in the validation set of this dataset;</li>
+</ul>
+<p>Additionally, the dataset validation analyzes the sample number distribution across all classes in the dataset and generates a distribution histogram (histogram.png):</p>
+<p><img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/modules/video_detection/01.png"></p></details>
+
+#### 4.1.3 Dataset Format Conversion/Dataset Splitting (Optional)
+After completing data validation, you can convert the dataset format or re-split the training/validation ratio of the dataset by <b>modifying the configuration file</b> or <b>appending hyperparameters</b>.
+
+<details><summary>👉 <b>Dataset Format Conversion/Dataset Splitting Details (Click to Expand)</b></summary>
+
+<p><b>(1) Dataset Format Conversion</b></p>
+<p>Video detection does not currently support data format conversion.</p>
+<p><b>(2) Dataset Splitting</b></p>
+<p>Video detection does not currently support dataset splitting.</p></details>
+
+### 4.2 Model Training
+A single command completes model training. Taking the video detection model YOWO as an example:
+```
+python main.py -c paddlex/configs/video_detection/YOWO.yaml  \
+    -o Global.mode=train \
+    -o Global.dataset_dir=./dataset/video_det_examples
+```
+
+The following steps are required:
+
+* Specify the path of the model's `.yaml` configuration file (here it is `YOWO.yaml`. When training other models, you need to specify the corresponding configuration files. The relationship between the model and configuration files can be found in the [PaddleX Model List (CPU/GPU)](../../../support_list/models_list.en.md))
+* Specify the mode as model training: `-o Global.mode=train`
+* Specify the path of the training dataset: `-o Global.dataset_dir`
+
+Other related parameters can be set by modifying the fields under `Global` and `Train` in the `.yaml` configuration file, or adjusted by appending parameters on the command line. For example, to train on GPU card 2: `-o Global.device=gpu:2` (video detection supports single-GPU training only); to set the number of training epochs to 10: `-o Train.epochs_iters=10`. For more modifiable parameters and their detailed explanations, refer to the configuration file documentation for the model's task module: [PaddleX Common Model Configuration File Parameters](../../instructions/config_parameters_common.en.md).
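When the same command is launched from a script with varying overrides, it can help to assemble the `-o key=value` pairs from a dictionary. The helper below is a hypothetical convenience wrapper, not part of PaddleX; it only uses the `-c` and `-o` flags and the option names shown in this section, and the config path is copied from the data validation command in section 4.1.2.

```python
import shlex


def build_command(config: str, overrides: dict) -> list:
    """Assemble the argv for main.py from a config path and -o overrides."""
    argv = ["python", "main.py", "-c", config]
    for key, value in overrides.items():
        argv += ["-o", f"{key}={value}"]  # one -o flag per override
    return argv


cmd = build_command(
    "paddlex/configs/video_detection/YOWO.yaml",
    {
        "Global.mode": "train",
        "Global.dataset_dir": "./dataset/video_det_examples",
        "Train.epochs_iters": 10,
    },
)
print(shlex.join(cmd))  # shell-ready form of the command
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)` from the PaddleX root directory, which avoids shell quoting issues.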
+
+
+<details><summary>👉 <b>More Details (Click to Expand)</b></summary>
+
+<ul>
+<li>During model training, PaddleX automatically saves the model weight files, to the <code>output</code> directory by default. To specify a different save path, set the <code>-o Global.output</code> field in the configuration file.</li>
+<li>PaddleX shields you from the concepts of dynamic graph weights and static graph weights. During model training, both dynamic and static graph weights are produced, and static graph weights are selected by default for model inference.</li>
+<li>
+<p>After completing the model training, all outputs are saved in the specified output directory (default is <code>./output/</code>), typically including:</p>
+</li>
+<li>
+<p><code>train_result.json</code>: Training result record file, recording whether the training task was completed normally, as well as the output weight metrics, related file paths, etc.;</p>
+</li>
+<li><code>train.log</code>: Training log file, recording changes in model metrics and loss during training;</li>
+<li><code>config.yaml</code>: Training configuration file, recording the hyperparameter configuration for this training session;</li>
+<li><code>.pdparams</code>, <code>.pdema</code>, <code>.pdopt.pdstate</code>, <code>.pdiparams</code>, <code>.pdmodel</code>: Model weight-related files, including network parameters, optimizer, EMA, static graph network parameters, static graph network structure, etc.;</li>
+</ul></details>
+
+### <b>4.3 Model Evaluation</b>
+After completing model training, you can evaluate the specified model weight file on the validation set to verify the model accuracy. Using PaddleX for model evaluation, a single command can complete the model evaluation:
+```bash
+python main.py -c  paddlex/configs/video_detection/YOWO.yaml  \
+    -o Global.mode=evaluate \
+    -o Global.dataset_dir=./dataset/video_det_examples
+```
+Similar to model training, the following steps are required:
+
+* Specify the path of the model's `.yaml` configuration file (here it is `YOWO.yaml`)
+* Specify the mode as model evaluation: `-o Global.mode=evaluate`
+* Specify the path of the validation dataset: `-o Global.dataset_dir`. Other related parameters can be set by modifying the fields under `Global` and `Evaluate` in the `.yaml` configuration file. For details, please refer to [PaddleX Common Model Configuration File Parameter Description](../../instructions/config_parameters_common.en.md).
+
+<details><summary>👉 <b>More Details (Click to Expand)</b></summary>
+
+<p>When evaluating the model, you need to specify the model weight file path. Each configuration file has a default weight save path built-in. If you need to change it, simply set it by appending a command line parameter, such as <code>-o Evaluate.weight_path=./output/best_model/best_model.pdparams</code>.</p>
+<p>After completing the model evaluation, an <code>evaluate_result.json</code> file will be generated, which records the evaluation results. Specifically, it records whether the evaluation task was completed successfully and the model's evaluation metrics, including mAP;</p></details>
+
+### <b>4.4 Model Inference and Model Integration</b>
+After completing model training and evaluation, you can use the trained model weights for inference predictions or Python integration.
+
+#### 4.4.1 Model Inference
+To perform inference prediction through the command line, simply use the following command. Before running the following code, please download the [demo video](https://paddle-model-ecology.bj.bcebos.com/paddlex/videos/demo_video/HorseRiding.avi) to your local machine.
+
+```bash
+python main.py -c paddlex/configs/video_detection/YOWO.yaml \
+    -o Global.mode=predict \
+    -o Predict.model_dir="./output/best_model/inference" \
+    -o Predict.input="HorseRiding.avi"
+```
+Similar to model training and evaluation, the following steps are required:
+
+* Specify the `.yaml` configuration file path for the model (here it is `YOWO.yaml`)
+* Specify the mode as model inference prediction: `-o Global.mode=predict`
+* Specify the model weight path: `-o Predict.model_dir="./output/best_model/inference"`
+* Specify the input data path: `-o Predict.input="..."`
+Other related parameters can be set by modifying the fields under `Global` and `Predict` in the `.yaml` configuration file. For details, please refer to [PaddleX Common Model Configuration File Parameter Description](../../instructions/config_parameters_common.en.md).
+
+#### 4.4.2 Model Integration
+The model can be directly integrated into the PaddleX pipelines or directly into your own project.
+
+1. <b>Pipeline Integration</b>
+
+The video detection module can be integrated into the [General Video Detection Pipeline](../../../pipeline_usage/tutorials/video_pipelines/video_detection.en.md) of PaddleX. Simply replace the model path to update the video detection module of the relevant pipeline. In pipeline integration, you can use high-performance inference and service-oriented deployment to deploy your trained model.
+
+2. <b>Module Integration</b>
+
+The weights you produce can be directly integrated into the video detection module. Refer to the Python example code in <a href="#lable">Quick Integration</a> and simply replace the model name with the path to your trained model.

+ 222 - 0
docs/module_usage/tutorials/video_modules/video_detection.md

@@ -0,0 +1,222 @@
+---
+comments: true
+---
+
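+# Video Detection Module Usage Tutorial
+
+## I. Overview
+Video detection is a key component of computer vision systems, focused on identifying and locating objects or events in video sequences. It decomposes a video into a sequence of individual frames and then analyzes those frames to recognize objects or actions, for example detecting pedestrians in surveillance footage or identifying specific activities such as "running", "jumping", or "playing guitar" in sports or entertainment videos.
+The output of the video detection module includes a bounding box and a class label for each detected object or event. Other modules or systems can use this information for further analysis, such as tracking the movement of detected objects, generating alerts, or compiling statistics for decision-making. Video detection therefore plays an important role in applications ranging from security surveillance and autonomous driving to sports analytics and content moderation.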
+
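+## II. List of Supported Models
+
+
+<table>
+<tr>
+<th>Model</th><th>Model Download Link</th>
+<th>Frame-mAP (@ IoU 0.5)</th>
+<th>Model Storage Size (M)</th>
+<th>Description</th>
+</tr>
+<tr>
+<td>YOWO</td><td><a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_inference_model/paddle3.0b2/YOWO_infer.tar">Inference Model</a>/<a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/YOWO_pretrained.pdparams">Trained Model</a></td>
+<td>80.94</td>
+<td>462.891M</td>
+<td rowspan="1">
+YOWO is a single-stage network with two branches. One branch extracts spatial features of the key frame (i.e., the current frame) through a 2D-CNN, while the other branch captures spatiotemporal features of a clip composed of previous frames using a 3D-CNN. To aggregate these features accurately, YOWO employs a channel fusion and attention mechanism that makes maximal use of inter-channel dependencies. Finally, the fused features are used for frame-level detection.
+</td>
+</tr>
+
+</table>
+
+
+<p><b>Note: The accuracy metric above is Frame-mAP (@ IoU 0.5) on the <a href="http://www.thumos.info/download.html">UCF101-24</a> test set. All model GPU inference times are based on NVIDIA Tesla T4 machines, with precision type FP32. CPU inference speeds are based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, with 8 threads and precision type FP32.</b></p>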
+
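+## III. Quick Integration
+> ❗ Before quick integration, please install the PaddleX wheel package. For details, refer to the [PaddleX Local Installation Guide](../../../installation/installation.md).
+
+After installing the wheel package, you can run inference with the video detection module in just a few lines of code. You can switch freely between the models in this module and integrate the module's inference into your own project. Before running the following code, please download the [demo video](https://paddle-model-ecology.bj.bcebos.com/paddlex/videos/demo_video/HorseRiding.avi) to your local machine.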
+
+```python
+from paddlex import create_model
+model = create_model("YOWO")
+output = model.predict("HorseRiding.avi", batch_size=1)
+for res in output:
+    res.print(json_format=False)
+    res.save_to_video("./output/")
+    res.save_to_json("./output/res.json")
+```
+
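+For more information on using PaddleX's single-model inference API, refer to the [PaddleX Single-Model Python Script Usage Instructions](../../instructions/model_python_API.md).
+
+## IV. Custom Development
+If the existing models are not accurate enough for your use case, you can use PaddleX's custom development capabilities to develop a better video detection model. Before doing so, make sure you have installed the PaddleX model training plugins for video detection. The installation process is described in the custom development section of the [PaddleX Local Installation Guide](../../../installation/installation.md).
+
+### 4.1 Data Preparation
+Before model training, you need to prepare the dataset for the corresponding task module. PaddleX provides data validation for each module, and <b>only data that passes data validation can be used for model training</b>. PaddleX also provides a demo dataset for each module, which you can use to complete the subsequent development steps. If you wish to train on your own private dataset, refer to the [PaddleX Video Detection Task Module Data Annotation Tutorial](../../../data_annotations/video_modules/video_detection.md).
+
+#### 4.1.1 Demo Data Download
+You can use the following commands to download the demo dataset to a specified folder: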
+
+```bash
+cd /path/to/paddlex
+wget https://paddle-model-ecology.bj.bcebos.com/paddlex/data/video_det_examples.tar -P ./dataset
+tar -xf ./dataset/video_det_examples.tar -C ./dataset/
+```
+#### 4.1.2 Data Validation
+A single command completes data validation:
+
+```bash
+python main.py -c paddlex/configs/modules/video_detection/YOWO.yaml \
+    -o Global.mode=check_dataset \
+    -o Global.dataset_dir=./dataset/video_det_examples
+```
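+After executing the above command, PaddleX validates the dataset and summarizes its basic information. If the command runs successfully, `Check dataset passed !` is printed in the log. The validation results file is saved to `./output/check_dataset_result.json`, and related outputs are saved in the `./output/check_dataset` directory under the current directory, including visualized sample images and a sample distribution histogram.
+
+<details><summary>👉 <b>Validation Results Details (Click to Expand)</b></summary>
+<p>The validation results file contains:</p>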
+<pre><code class="language-bash">
+{
+  "done_flag": true,
+  "check_pass": true,
+  "attributes": {
+    "label_file": "..\/..\/dataset\/video_det_examples\/label_map.txt",
+    "num_classes": 24,
+    "train_samples": 6878,
+    "train_sample_paths": [
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/SoccerJuggling\/v_SoccerJuggling_g19_c06\/00296.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/SkateBoarding\/v_SkateBoarding_g17_c04\/00026.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/RopeClimbing\/v_RopeClimbing_g01_c03\/00055.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/HorseRiding\/v_HorseRiding_g11_c05\/00132.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/PoleVault\/v_PoleVault_g13_c03\/00089.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/Basketball\/v_Basketball_g13_c04\/00050.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/PoleVault\/v_PoleVault_g01_c05\/00024.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/RopeClimbing\/v_RopeClimbing_g03_c04\/00118.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/GolfSwing\/v_GolfSwing_g01_c06\/00231.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/TrampolineJumping\/v_TrampolineJumping_g02_c02\/00134.jpg"
+    ],
+    "val_samples": 3916,
+    "val_sample_paths": [
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/IceDancing\/v_IceDancing_g22_c02\/00017.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/TennisSwing\/v_TennisSwing_g04_c02\/00046.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/SoccerJuggling\/v_SoccerJuggling_g08_c03\/00169.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/Fencing\/v_Fencing_g24_c02\/00009.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/Diving\/v_Diving_g16_c02\/00110.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/HorseRiding\/v_HorseRiding_g08_c02\/00079.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/PoleVault\/v_PoleVault_g17_c07\/00008.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/Skiing\/v_Skiing_g20_c06\/00221.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/PoleVault\/v_PoleVault_g17_c07\/00137.jpg",
+      "check_dataset\/..\/..\/dataset\/video_det_examples\/rgb-images\/GolfSwing\/v_GolfSwing_g24_c01\/00093.jpg"
+    ]
+  },
+  "analysis": {
+    "histogram": "check_dataset\/histogram.png"
+  },
+  "dataset_path": "video_det_examples",
+  "show_type": "video",
+  "dataset_type": "VideoDetDataset"
+}
+</code></pre>
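+<p>In the validation results above, check_pass being True indicates that the dataset format meets the requirements. The other indicators are explained as follows:</p>
+<ul>
+<li><code>attributes.num_classes</code>: The number of classes in this dataset is 24;</li>
+<li><code>attributes.train_samples</code>: The number of training set samples in this dataset is 6878;</li>
+<li><code>attributes.val_samples</code>: The number of validation set samples in this dataset is 3916;</li>
+<li><code>attributes.train_sample_paths</code>: A list of relative paths to the visualized training set samples of this dataset;</li>
+<li><code>attributes.val_sample_paths</code>: A list of relative paths to the visualized validation set samples of this dataset;</li>
+</ul>
+<p>In addition, dataset validation analyzes the distribution of sample counts across all classes in the dataset and plots a distribution histogram (histogram.png):</p>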
+<p><img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/modules/video_detection/01.png"></p></details>
+
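+#### 4.1.3 Dataset Format Conversion/Dataset Splitting (Optional)
+After completing data validation, you can convert the dataset format or re-split the training/validation ratio of the dataset by <b>modifying the configuration file</b> or <b>appending hyperparameters</b>.
+
+<details><summary>👉 <b>Dataset Format Conversion/Dataset Splitting Details (Click to Expand)</b></summary>
+
+<p><b>(1) Dataset Format Conversion</b></p>
+<p>Video detection does not currently support data format conversion.</p>
+<p><b>(2) Dataset Splitting</b></p>
+<p>Video detection does not currently support dataset splitting.</p></details>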
+
+### 4.2 Model Training
+A single command completes model training. Taking the video detection model YOWO as an example:
+
+```
+python main.py -c paddlex/configs/modules/video_detection/YOWO.yaml  \
+    -o Global.mode=train \
+    -o Global.dataset_dir=./dataset/video_det_examples
+```
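+The following steps are required:
+
+* Specify the path of the model's `.yaml` configuration file (here it is `YOWO.yaml`. When training other models, specify the corresponding configuration file; the mapping between models and configuration files can be found in the [PaddleX Model List (CPU/GPU)](../../../support_list/models_list.md))
+* Specify the mode as model training: `-o Global.mode=train`
+* Specify the path of the training dataset: `-o Global.dataset_dir`
+
+Other related parameters can be set by modifying the fields under `Global` and `Train` in the `.yaml` configuration file, or adjusted by appending parameters on the command line. For example, to train on GPU card 2: `-o Global.device=gpu:2` (video detection supports single-GPU training only); to set the number of training epochs to 10: `-o Train.epochs_iters=10`. For more modifiable parameters and their detailed explanations, refer to the configuration file documentation for the model's task module: [PaddleX Common Model Configuration File Parameters](../../instructions/config_parameters_common.md).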
+
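+<details><summary>👉 <b>More Details (Click to Expand)</b></summary>
+
+<ul>
+<li>During model training, PaddleX automatically saves the model weight files, to the <code>output</code> directory by default. To specify a different save path, set the <code>-o Global.output</code> field in the configuration file.</li>
+<li>PaddleX shields you from the concepts of dynamic graph weights and static graph weights. Model training produces both dynamic and static graph weights, and static graph weights are used by default for model inference.</li>
+<li>
+<p>After model training completes, all outputs are saved in the specified output directory (default is <code>./output/</code>), typically including:</p>
+</li>
+<li>
+<p><code>train_result.json</code>: Training result record file, recording whether the training task completed normally, as well as the metrics of the output weights, related file paths, etc.;</p>
+</li>
+<li><code>train.log</code>: Training log file, recording changes in model metrics and loss during training;</li>
+<li><code>config.yaml</code>: Training configuration file, recording the hyperparameter configuration for this training run;</li>
+<li><code>.pdparams</code>, <code>.pdema</code>, <code>.pdopt.pdstate</code>, <code>.pdiparams</code>, <code>.pdmodel</code>: Model weight-related files, including network parameters, optimizer, EMA, static graph network parameters, static graph network structure, etc.;</li>
+</ul></details>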
+
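+### <b>4.3 Model Evaluation</b>
+After completing model training, you can evaluate a specified model weight file on the validation set to verify model accuracy. With PaddleX, model evaluation takes a single command: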
+
+```bash
+python main.py -c  paddlex/configs/modules/video_detection/YOWO.yaml  \
+    -o Global.mode=evaluate \
+    -o Global.dataset_dir=./dataset/video_det_examples
+```
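+As with model training, this requires the following steps:
+
+* Specify the path to the model's `.yaml` configuration file (here `YOWO.yaml`)
+* Set the mode to model evaluation: `-o Global.mode=evaluate`
+* Specify the validation dataset path: `-o Global.dataset_dir`
+
+Other related parameters can be set by modifying the fields under `Global` and `Evaluate` in the `.yaml` configuration file. For details, refer to [PaddleX Common Model Configuration File Parameters](../../instructions/config_parameters_common.md).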
+
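+<details><summary>👉 <b>More Details (Click to Expand)</b></summary>
+
+<p>Model evaluation requires the path to the model weight file. Each configuration file has a built-in default weight save path. To change it, append a command-line parameter, for example <code>-o Evaluate.weight_path=./output/best_model/best_model.pdparams</code>.</p>
+<p>After model evaluation completes, an <code>evaluate_result.json</code> file is produced. It records the evaluation result: whether the evaluation task completed normally, and the model's evaluation metrics, including mAP.</p></details>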
+
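+### <b>4.4 Model Inference and Model Integration</b>
+
+After completing model training and evaluation, you can use the trained model weights for inference or for Python integration.
+
+#### 4.4.1 Model Inference
+Running inference from the command line takes a single command. Before running the code below, download the [example video](https://paddle-model-ecology.bj.bcebos.com/paddlex/videos/demo_video/HorseRiding.avi) to your local machine.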
+
+```bash
+python main.py -c paddlex/configs/modules/video_detection/YOWO.yaml \
+    -o Global.mode=predict \
+    -o Predict.model_dir="./output/best_model/inference" \
+    -o Predict.input="HorseRiding.avi"
+```
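+As with model training and evaluation, this requires the following steps:
+
+* Specify the path to the model's `.yaml` configuration file (here `YOWO.yaml`)
+* Set the mode to model inference: `-o Global.mode=predict`
+* Specify the model weight path: `-o Predict.model_dir="./output/best_model/inference"`
+* Specify the input data path: `-o Predict.input="..."`
+
+Other related parameters can be set by modifying the fields under `Global` and `Predict` in the `.yaml` configuration file. For details, refer to [PaddleX Common Model Configuration File Parameters](../../instructions/config_parameters_common.md).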
+
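+#### 4.4.2 Model Integration
+The model can be integrated directly into a PaddleX pipeline or into your own project.
+
+1. <b>Pipeline Integration</b>
+
+The video detection module can be integrated into the PaddleX [General Video Detection Pipeline](../../../pipeline_usage/tutorials/video_pipelines/video_detection.md). Replacing the model path is all that is needed to update the video detection module of that pipeline. In pipeline integration, you can deploy the resulting model with high-performance deployment or service deployment.
+
+2. <b>Module Integration</b>
+
+The weights you produce can be integrated directly into the video detection module. Refer to the Python example code in [Quick Integration](#三快速集成) and replace the model with the path to the model you trained.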
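
The train, evaluate, and predict commands documented in this file differ only in their `-o` overrides. A minimal Python sketch of assembling them, assuming the commands are launched from the directory that contains `main.py`; the helper name `build_paddlex_command` is hypothetical and not part of PaddleX:

```python
import shlex

# Config path taken from the commands in the documentation diff.
CONFIG = "paddlex/configs/modules/video_detection/YOWO.yaml"


def build_paddlex_command(mode, overrides=None, config=CONFIG):
    """Hypothetical helper: build `python main.py -c <config> -o Global.mode=<mode> ...`."""
    cmd = ["python", "main.py", "-c", config, "-o", f"Global.mode={mode}"]
    # Each extra setting becomes its own `-o Key=value` pair, in insertion order.
    for key, value in (overrides or {}).items():
        cmd += ["-o", f"{key}={value}"]
    return cmd


train_cmd = build_paddlex_command(
    "train", {"Global.dataset_dir": "./dataset/video_det_examples"}
)
evaluate_cmd = build_paddlex_command(
    "evaluate", {"Global.dataset_dir": "./dataset/video_det_examples"}
)
predict_cmd = build_paddlex_command(
    "predict",
    {
        "Predict.model_dir": "./output/best_model/inference",
        "Predict.input": "HorseRiding.avi",
    },
)

if __name__ == "__main__":
    # Print shell-ready command lines; pass a list to subprocess.run to execute one.
    for cmd in (train_cmd, evaluate_cmd, predict_cmd):
        print(shlex.join(cmd))
```

Keeping each command as an argument list avoids shell quoting issues if it is later handed to `subprocess.run`.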

File diff suppressed because the file is too large
+ 72 - 0
docs/pipeline_usage/tutorials/video_pipelines/video_detection.en.md


File diff suppressed because the file is too large
+ 71 - 0
docs/pipeline_usage/tutorials/video_pipelines/video_detection.md


+ 4 - 5
paddlex/inference/models_new/anomaly_detection/result.py

@@ -16,7 +16,7 @@ import copy
 import numpy as np
 from PIL import Image
 
-from ...common.result import BaseCVResult
+from ...common.result import BaseCVResult, StrMixin, JsonMixin
 
 
 class UadResult(BaseCVResult):
@@ -60,12 +60,11 @@ class UadResult(BaseCVResult):
 
     def _to_str(self, *args, **kwargs):
         data = copy.deepcopy(self)
-        data["pred"] = "..."
         data.pop("input_img")
-        return data._to_str(*args, **kwargs)
+        data["pred"] = "..."
+        return StrMixin._to_str(data, *args, **kwargs)
 
     def _to_json(self, *args, **kwargs):
         data = copy.deepcopy(self)
-        data["pred"] = "..."
         data.pop("input_img")
-        return data._to_json(*args, **kwargs)
+        return JsonMixin._to_json(data, *args, **kwargs)
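
One reading of this hunk: `data` is a deep copy of `self`, so `data._to_str(...)` dispatches back to the same override, whereas naming the mixin class calls the base implementation directly. A minimal sketch of that pattern with toy stand-ins; `ToyStrMixin` and `ToyResult` are hypothetical and do not reproduce PaddleX's `StrMixin` or `UadResult`:

```python
import copy


class ToyStrMixin:
    """Toy stand-in for a mixin that renders a mapping as text."""

    def _to_str(self):
        return str(dict(self))


class ToyResult(dict, ToyStrMixin):
    def _to_str(self):
        data = copy.deepcopy(self)   # work on a copy so the result itself is untouched
        data.pop("input_img")        # drop the bulky image payload
        data["pred"] = "..."         # abbreviate the prediction array
        # data._to_str() would re-enter this override; in this toy the second
        # pop("input_img") would then raise KeyError. Call the mixin directly.
        return ToyStrMixin._to_str(data)


result = ToyResult(input_img=[0, 1], pred=[[0.2]], path="a.png")
text = result._to_str()
print(text)
```

The original `result` still holds `input_img` and the full `pred` afterwards, since only the copy is modified.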

+ 1 - 1
paddlex/inference/models_new/video_detection/processors.py

@@ -425,7 +425,7 @@ class DetVideoPostProcess:
             for out in outputs:
                 preds = []
                 out = paddle.to_tensor(out)
-                all_boxes = get_region_boxes(out, 0.3, len(self.labels))
+                all_boxes = get_region_boxes(out, num_classes=len(self.labels))
                 for i in range(out.shape[0]):
                     boxes = all_boxes[i]
                     boxes = nms(boxes, nms_thresh)

Some files are not shown because too many files were changed in this diff