
fix reviewed docs (#3244)

Zhang Zelun, 9 months ago
Parent commit: 4bf542790f

+ 2 - 1
docs/module_usage/tutorials/cv_modules/human_detection.en.md

@@ -62,11 +62,12 @@ for res in output:
 
 After running, the result obtained is:
 ```bash
-{'res': "{'input_path': 'human_detection.jpg', 'boxes': [{'cls_id': 0, 'label': 'pedestrian', 'score': 0.9085694551467896, 'coordinate': [259.53326, 342.86493, 307.43408, 464.22394]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.8818504810333252, 'coordinate': [170.22249, 317.11432, 260.24777, 470.12704]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.8622929453849792, 'coordinate': [402.17957, 345.1815, 458.4271, 479.91724]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.8577917218208313, 'coordinate': [522.5973, 360.11118, 614.3201, 480]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.8485967516899109, 'coordinate': [25.010237, 338.83722, 57.340042, 426.11932]}, ... ]}"}
+{'res': "{'input_path': 'human_detection.jpg', 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'pedestrian', 'score': 0.9085694551467896, 'coordinate': [259.53326, 342.86493, 307.43408, 464.22394]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.8818504810333252, 'coordinate': [170.22249, 317.11432, 260.24777, 470.12704]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.8622929453849792, 'coordinate': [402.17957, 345.1815, 458.4271, 479.91724]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.8577917218208313, 'coordinate': [522.5973, 360.11118, 614.3201, 480]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.8485967516899109, 'coordinate': [25.010237, 338.83722, 57.340042, 426.11932]}, ... ]}"}
 ```
 
 The meanings of the parameters in the running results are as follows (a short parsing sketch follows this list):
 - `input_path`: The path of the input image to be predicted.
+- `page_index`: If the input is a PDF file, it represents the current page number of the PDF; otherwise, it is `None`.
 - `boxes`: Information of each detected object.
   - `cls_id`: Class ID.
   - `label`: Class name.
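
Below is a minimal parsing sketch, assuming the result was also saved with `res.save_to_json()` and that the saved file exposes the same keys as the inner dict shown above; the output path is illustrative.

```python
import json

# Hedged sketch: load a detection result saved via res.save_to_json()
# (path illustrative) and keep only high-confidence boxes.
with open("./output/res.json", "r", encoding="utf-8") as f:
    res = json.load(f)

print(res["input_path"])   # path of the predicted image
print(res["page_index"])   # None unless the input was a PDF page

for box in res["boxes"]:
    if box["score"] >= 0.9:
        xmin, ymin, xmax, ymax = box["coordinate"]
        print(box["cls_id"], box["label"], round(box["score"], 3), (xmin, ymin, xmax, ymax))
```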

+ 2 - 1
docs/module_usage/tutorials/cv_modules/human_detection.md

@@ -58,10 +58,11 @@ for res in output:
 
 After running, the result obtained is:
 ```bash
-{'res': "{'input_path': 'human_detection.jpg', 'boxes': [{'cls_id': 0, 'label': 'pedestrian', 'score': 0.9085694551467896, 'coordinate': [259.53326, 342.86493, 307.43408, 464.22394]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.8818504810333252, 'coordinate': [170.22249, 317.11432, 260.24777, 470.12704]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.8622929453849792, 'coordinate': [402.17957, 345.1815, 458.4271, 479.91724]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.8577917218208313, 'coordinate': [522.5973, 360.11118, 614.3201, 480]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.8485967516899109, 'coordinate': [25.010237, 338.83722, 57.340042, 426.11932]}, ... ]}"}
+{'res': "{'input_path': 'human_detection.jpg', 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'pedestrian', 'score': 0.9085694551467896, 'coordinate': [259.53326, 342.86493, 307.43408, 464.22394]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.8818504810333252, 'coordinate': [170.22249, 317.11432, 260.24777, 470.12704]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.8622929453849792, 'coordinate': [402.17957, 345.1815, 458.4271, 479.91724]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.8577917218208313, 'coordinate': [522.5973, 360.11118, 614.3201, 480]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.8485967516899109, 'coordinate': [25.010237, 338.83722, 57.340042, 426.11932]}, ... ]}"}
 ```
 The meanings of the parameters in the running results are as follows:
 - `input_path`: The path of the input image to be predicted.
+- `page_index`: If the input is a PDF file, it represents the current page number of the PDF; otherwise, it is `None`.
 - `boxes`: Information of each detected object.
   - `cls_id`: Class ID.
   - `label`: Class name.

+ 2 - 1
docs/module_usage/tutorials/cv_modules/instance_segmentation.en.md

@@ -181,11 +181,12 @@ for res in output:
 After running, the result obtained is:
 
 ```bash
-{'res': "{'input_path': 'general_instance_segmentation_004.png', 'boxes': [{'cls_id': 0, 'label': 'person', 'score': 0.8723232746124268, 'coordinate': [88.34339, 109.87673, 401.85236, 575.59576]}, {'cls_id': 0, 'label': 'person', 'score': 0.8711188435554504, 'coordinate': [325.114, 1.1152496, 644.10266, 575.359]}, {'cls_id': 0, 'label': 'person', 'score': 0.842758297920227, 'coordinate': [514.18964, 21.760618, 768, 576]}, {'cls_id': 0, 'label': 'person', 'score': 0.8332827091217041, 'coordinate': [0.105075076, 0, 189.23515, 575.9612]}], 'masks': '...'}"}
+{'res': "{'input_path': 'general_instance_segmentation_004.png', 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'person', 'score': 0.8723232746124268, 'coordinate': [88.34339, 109.87673, 401.85236, 575.59576]}, {'cls_id': 0, 'label': 'person', 'score': 0.8711188435554504, 'coordinate': [325.114, 1.1152496, 644.10266, 575.359]}, {'cls_id': 0, 'label': 'person', 'score': 0.842758297920227, 'coordinate': [514.18964, 21.760618, 768, 576]}, {'cls_id': 0, 'label': 'person', 'score': 0.8332827091217041, 'coordinate': [0.105075076, 0, 189.23515, 575.9612]}], 'masks': '...'}"}
 ```
 
 The meanings of the running results parameters are as follows:
 - `input_path`: Indicates the path of the input image to be predicted.
+- `page_index`: If the input is a PDF file, it represents the current page number of the PDF; otherwise, it is `None`.
 - `boxes`: Information of each detected object.
   - `cls_id`: Class ID.
   - `label`: Class name.

+ 2 - 1
docs/module_usage/tutorials/cv_modules/instance_segmentation.md

@@ -180,10 +180,11 @@ for res in output:
 
 After running, the result obtained is:
 ```bash
-{'res': {'input_path': 'general_instance_segmentation_004.png', 'boxes': [{'cls_id': 0, 'label': 'person', 'score': 0.897335946559906, 'coordinate': [0, 0.46382904052734375, 195.22256469726562, 572.8294067382812]}, {'cls_id': 0, 'label': 'person', 'score': 0.8606418967247009, 'coordinate': [341.30389404296875, 0, 640.4802856445312, 575.7348022460938]}, {'cls_id': 0, 'label': 'person', 'score': 0.6397128105163574, 'coordinate': [520.0907592773438, 23.334789276123047, 767.5140380859375, 574.5650634765625]}, {'cls_id': 0, 'label': 'person', 'score': 0.6008261442184448, 'coordinate': [91.02522277832031, 112.34088897705078, 405.4962158203125, 574.1039428710938]}, {'cls_id': 0, 'label': 'person', 'score': 0.5031726360321045, 'coordinate': [200.81265258789062, 58.161617279052734, 272.8892517089844, 140.88356018066406]}], 'masks': '...'}}
+{'res': {'input_path': 'general_instance_segmentation_004.png', 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'person', 'score': 0.897335946559906, 'coordinate': [0, 0.46382904052734375, 195.22256469726562, 572.8294067382812]}, {'cls_id': 0, 'label': 'person', 'score': 0.8606418967247009, 'coordinate': [341.30389404296875, 0, 640.4802856445312, 575.7348022460938]}, {'cls_id': 0, 'label': 'person', 'score': 0.6397128105163574, 'coordinate': [520.0907592773438, 23.334789276123047, 767.5140380859375, 574.5650634765625]}, {'cls_id': 0, 'label': 'person', 'score': 0.6008261442184448, 'coordinate': [91.02522277832031, 112.34088897705078, 405.4962158203125, 574.1039428710938]}, {'cls_id': 0, 'label': 'person', 'score': 0.5031726360321045, 'coordinate': [200.81265258789062, 58.161617279052734, 272.8892517089844, 140.88356018066406]}], 'masks': '...'}}
 ```
 The meanings of the parameters in the running results are as follows:
 - `input_path`: The path of the input image to be predicted.
+- `page_index`: If the input is a PDF file, it represents the current page number of the PDF; otherwise, it is `None`.
 - `boxes`: Information of each detected object.
   - `cls_id`: Class ID.
   - `label`: Class name.

+ 2 - 1
docs/module_usage/tutorials/cv_modules/mainbody_detection.en.md

@@ -52,11 +52,12 @@ for res in output:
 After running, the result obtained is:
 
 ```bash
-{'res': "{'input_path': 'general_object_detection_002.png', 'boxes': [{'cls_id': 0, 'label': 'mainbody', 'score': 0.8161919713020325, 'coordinate': [76.07117, 272.83017, 329.5627, 519.48236]}, {'cls_id': 0, 'label': 'mainbody', 'score': 0.8071584701538086, 'coordinate': [662.7539, 92.804276, 874.7139, 308.21216]}, {'cls_id': 0, 'label': 'mainbody', 'score': 0.754974365234375, 'coordinate': [284.4833, 93.76895, 476.6789, 297.27588]}, {'cls_id': 0, 'label': 'mainbody', 'score': 0.6657832860946655, 'coordinate': [732.1591, 0, 1035.9547, 168.45923]}, {'cls_id': 0, 'label': 'mainbody', 'score': 0.614763081073761, 'coordinate': [763.9127, 280.74258, 925.48065, 439.444]}, ... ]}"}
+{'res': "{'input_path': 'general_object_detection_002.png', 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'mainbody', 'score': 0.8161919713020325, 'coordinate': [76.07117, 272.83017, 329.5627, 519.48236]}, {'cls_id': 0, 'label': 'mainbody', 'score': 0.8071584701538086, 'coordinate': [662.7539, 92.804276, 874.7139, 308.21216]}, {'cls_id': 0, 'label': 'mainbody', 'score': 0.754974365234375, 'coordinate': [284.4833, 93.76895, 476.6789, 297.27588]}, {'cls_id': 0, 'label': 'mainbody', 'score': 0.6657832860946655, 'coordinate': [732.1591, 0, 1035.9547, 168.45923]}, {'cls_id': 0, 'label': 'mainbody', 'score': 0.614763081073761, 'coordinate': [763.9127, 280.74258, 925.48065, 439.444]}, ... ]}"}
 ```
 
 The meanings of the running results parameters are as follows:
 - `input_path`: Indicates the path of the input image to be predicted.
+- `page_index`: If the input is a PDF file, it represents the current page number of the PDF; otherwise, it is `None`.
 - `boxes`: Information of each detected object.
   - `cls_id`: Class ID.
   - `label`: Class name.

+ 2 - 1
docs/module_usage/tutorials/cv_modules/mainbody_detection.md

@@ -50,10 +50,11 @@ for res in output:
 
 After running, the result obtained is:
 ```bash
-{'res': "{'input_path': 'general_object_detection_002.png', 'boxes': [{'cls_id': 0, 'label': 'mainbody', 'score': 0.8161919713020325, 'coordinate': [76.07117, 272.83017, 329.5627, 519.48236]}, {'cls_id': 0, 'label': 'mainbody', 'score': 0.8071584701538086, 'coordinate': [662.7539, 92.804276, 874.7139, 308.21216]}, {'cls_id': 0, 'label': 'mainbody', 'score': 0.754974365234375, 'coordinate': [284.4833, 93.76895, 476.6789, 297.27588]}, {'cls_id': 0, 'label': 'mainbody', 'score': 0.6657832860946655, 'coordinate': [732.1591, 0, 1035.9547, 168.45923]}, {'cls_id': 0, 'label': 'mainbody', 'score': 0.614763081073761, 'coordinate': [763.9127, 280.74258, 925.48065, 439.444]}, ... ]}"}
+{'res': "{'input_path': 'general_object_detection_002.png', 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'mainbody', 'score': 0.8161919713020325, 'coordinate': [76.07117, 272.83017, 329.5627, 519.48236]}, {'cls_id': 0, 'label': 'mainbody', 'score': 0.8071584701538086, 'coordinate': [662.7539, 92.804276, 874.7139, 308.21216]}, {'cls_id': 0, 'label': 'mainbody', 'score': 0.754974365234375, 'coordinate': [284.4833, 93.76895, 476.6789, 297.27588]}, {'cls_id': 0, 'label': 'mainbody', 'score': 0.6657832860946655, 'coordinate': [732.1591, 0, 1035.9547, 168.45923]}, {'cls_id': 0, 'label': 'mainbody', 'score': 0.614763081073761, 'coordinate': [763.9127, 280.74258, 925.48065, 439.444]}, ... ]}"}
 ```
 The meanings of the parameters in the running results are as follows:
 - `input_path`: The path of the input image to be predicted.
+- `page_index`: If the input is a PDF file, it represents the current page number of the PDF; otherwise, it is `None`.
 - `boxes`: Information of each detected object.
   - `cls_id`: Class ID.
   - `label`: Class name.

+ 2 - 1
docs/module_usage/tutorials/cv_modules/semantic_segmentation.en.md

@@ -227,11 +227,12 @@ for res in output:
 After running, the result obtained is:
 
 ```bash
-{'res': "{'input_path': 'general_semantic_segmentation_002.png', 'pred': '...'}"}
+{'res': "{'input_path': 'general_semantic_segmentation_002.png', 'page_index': None, 'pred': '...'}"}
 ```
 
 The meanings of the runtime parameters are as follows:
 - `input_path`: Indicates the path of the input image to be predicted.
+- `page_index`: If the input is a PDF file, it represents the current page number of the PDF; otherwise, it is `None`.
 - `pred`: The actual mask predicted by the semantic segmentation model. Since the data is too large to be printed directly, it is replaced with `...` here. The prediction result can be saved as an image through `res.save_to_img()` and as a JSON file through `res.save_to_json()` (see the sketch below).
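
A minimal sketch of the two save calls, assuming `output` is the iterator returned by the `predict()` call earlier on this page; the save paths are illustrative.

```python
# Hedged sketch: persist each semantic segmentation result as described above.
for res in output:
    res.save_to_img(save_path="./output/")            # colored mask visualization
    res.save_to_json(save_path="./output/res.json")   # input_path, page_index, pred
```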
 
 The visualization image is as follows:

+ 3 - 2
docs/module_usage/tutorials/cv_modules/semantic_segmentation.md

@@ -227,10 +227,11 @@ for res in output:
 
 After running, the result obtained is:
 ```bash
-{'res': "{'input_path': 'general_semantic_segmentation_002.png', 'pred': '...'}"}
+{'res': "{'input_path': 'general_semantic_segmentation_002.png', 'page_index': None, 'pred': '...'}"}
 ```
 The meanings of the parameters in the running results are as follows:
 - `input_path`: The path of the input image to be predicted.
+- `page_index`: If the input is a PDF file, it represents the current page number of the PDF; otherwise, it is `None`.
 - `pred`: The actual mask predicted by the semantic segmentation model. Since the data is too large to print directly, it is replaced with `...` here. The prediction result can be saved as an image via `res.save_to_img()` and as a JSON file via `res.save_to_json()`.
 
 The visualization image is as follows:
@@ -609,7 +610,7 @@ python main.py -c paddlex/configs/modules/semantic_segmentation/PP-LiteSeg-T.yam
 ```bash
 python main.py -c paddlex/configs/modules/semantic_segmentation/PP-LiteSeg-T.yaml \
     -o Global.mode=predict \
-    -o Predict.model_dir="./output/best_model" \
+    -o Predict.model_dir="./output/best_model/inference" \
     -o Predict.input="general_semantic_segmentation_002.png"
 ```
 Similar to model training and evaluation, the following steps are required:

+ 108 - 1
docs/module_usage/tutorials/cv_modules/small_object_detection.en.md

@@ -68,9 +68,116 @@ for res in output:
 After running, the result obtained is:
 
 ```bash
-{'res': "{'input_path': 'small_object_detection.jpg', 'boxes': [{'cls_id': 0, 'label': 'pedestrian', 'score': 0.8025697469711304, 'coordinate': [184.14276, 709.97455, 203.60669, 745.6286]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.7245782017707825, 'coordinate': [203.48488, 700.377, 223.07726, 742.5181]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.7014670968055725, 'coordinate': [851.23553, 435.81937, 862.94385, 466.81384]}, ... ]}"}
+{'res': "{'input_path': 'small_object_detection.jpg', 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'pedestrian', 'score': 0.8025697469711304, 'coordinate': [184.14276, 709.97455, 203.60669, 745.6286]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.7245782017707825, 'coordinate': [203.48488, 700.377, 223.07726, 742.5181]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.7014670968055725, 'coordinate': [851.23553, 435.81937, 862.94385, 466.81384]}, ... ]}"}
 ```
 
+Parameter meanings are as follows:
+- `input_path`: The path of the input image to be predicted.
+- `page_index`: If the input is a PDF file, it represents the current page number of the PDF; otherwise, it is `None`.
+- `boxes`: Information of the predicted bounding boxes, a list of dictionaries. Each dictionary contains the following information:
+  - `cls_id`: Class ID, an integer.
+  - `label`: Class label, a string.
+  - `score`: Confidence score of the bounding box, a float.
+  - `coordinate`: Coordinates of the bounding box, a list [xmin, ymin, xmax, ymax].
+
+</details>
+
+The visualization image is as follows:
+
+<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/modules/smallobj_det/small_object_detection_res.jpg">
+
+Related methods, parameters, and explanations are as follows:
+
+* `create_model` instantiates a small object detection model (here, `PP-YOLOE_plus_SOD-S` is used as an example); the specific parameter descriptions are as follows:
+<table>
+<thead>
+<tr>
+<th>Parameter</th>
+<th>Parameter Description</th>
+<th>Parameter Type</th>
+<th>Options</th>
+<th>Default Value</th>
+</tr>
+</thead>
+<tr>
+<td><code>model_name</code></td>
+<td>Name of the model</td>
+<td><code>str</code></td>
+<td>None</td>
+<td><code>None</code></td>
+</tr>
+<tr>
+<td><code>model_dir</code></td>
+<td>Path to store the model</td>
+<td><code>str</code></td>
+<td>None</td>
+<td>None</td>
+</tr>
+<tr>
+<td><code>threshold</code></td>
+<td>Threshold for filtering low-confidence objects</td>
+<td><code>float/None/dict</code></td>
+<td>None</td>
+<td>None</td>
+</tr>
+</table>
+
+* The `model_name` must be specified. After specifying `model_name`, the default model parameters built into PaddleX are used. If `model_dir` is specified, the user-defined model is used.
+* `threshold` is the threshold for filtering low-confidence objects. The default is `None`, which means using the settings from the previous layer. The priority of parameter settings from highest to lowest is: `predict parameter > create_model initialization > yaml configuration file`. Currently, two types of threshold settings are supported:
+  * `float`, using the same threshold for all classes.
+  * `dict`, where the key is the class ID and the value is the threshold, allowing different thresholds for different classes (see the sketch after this list).
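
As a hedged illustration of the two supported threshold forms (the model name is taken from the example above; `create_model` is assumed to be imported from `paddlex` as elsewhere in these docs):

```python
from paddlex import create_model

# Same confidence threshold (0.5) for every class.
model_a = create_model(model_name="PP-YOLOE_plus_SOD-S", threshold=0.5)

# Per-class thresholds: class 0 is filtered at 0.5, class 1 at 0.35.
model_b = create_model(model_name="PP-YOLOE_plus_SOD-S", threshold={0: 0.5, 1: 0.35})
```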
+
+* The `predict()` method of the small object detection model is called for inference prediction. The `predict()` method has parameters `input`, `batch_size`, and `threshold`, which are explained as follows (a usage sketch follows the table):
+
+<table>
+<thead>
+<tr>
+<th>Parameter</th>
+<th>Parameter Description</th>
+<th>Parameter Type</th>
+<th>Options</th>
+<th>Default Value</th>
+</tr>
+</thead>
+<tr>
+<td><code>input</code></td>
+<td>Data to be predicted, supporting multiple input types</td>
+<td><code>Python Var</code>/<code>str</code>/<code>dict</code>/<code>list</code></td>
+<td>
+<ul>
+  <li><b>Python variable</b>, such as image data represented by <code>numpy.ndarray</code></li>
+  <li><b>File path</b>, such as the local path of an image file: <code>/root/data/img.jpg</code></li>
+  <li><b>URL link</b>, such as the network URL of an image file: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/small_object_detection.jpg">Example</a></li>
+  <li><b>Local directory</b>, the directory should contain data files to be predicted, such as the local path: <code>/root/data/</code></li>
+  <li><b>List</b>, elements of the list must be of the above types of data, such as <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code></li>
+</ul>
+</td>
+<td>None</td>
+</tr>
+<tr>
+<td><code>batch_size</code></td>
+<td>Batch size</td>
+<td><code>int</code></td>
+<td>Any integer</td>
+<td>1</td>
+</tr>
+<tr>
+<td><code>threshold</code></td>
+<td>Threshold for filtering low-confidence objects</td>
+<td><code>float</code>/<code>dict</code>/<code>None</code></td>
+<td>
+<ul>
+  <li><b>None</b>, indicating the use of settings from the previous layer. The priority of parameter settings from highest to lowest is: <code>predict parameter > create_model initialization > yaml configuration file</code></li>
+  <li><b>float</b>, such as 0.5, indicating the use of <code>0.5</code> as the threshold for filtering low-confidence objects during inference</li>
+  <li><b>dict</b>, such as <code>{0: 0.5, 1: 0.35}</code>, indicating the use of 0.5 as the threshold for class 0 and 0.35 for class 1 during inference.</li>
+</ul>
+</td>
+<td>None</td>
+</tr>
+</table>
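
A hedged usage sketch of `predict()` with the parameters above; the image path and the `model_b` object from the earlier sketch are illustrative.

```python
# Single image path with a per-call, per-class threshold override
# (highest priority per the table above).
output = model_b.predict(
    input="small_object_detection.jpg",
    batch_size=1,
    threshold={0: 0.5, 1: 0.35},
)
for res in output:
    res.print()

# A list of inputs is also accepted, e.g.:
# model_b.predict(input=["img1.jpg", "img2.jpg"], batch_size=2)
```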
+
 * The prediction results are processed as `dict` type for each sample, and support operations such as printing, saving as an image, and saving as a `json` file:
 
 <table>

+ 2 - 1
docs/module_usage/tutorials/cv_modules/small_object_detection.md

@@ -67,10 +67,11 @@ for res in output:
 
 After running, the result obtained is:
 ```bash
-{'res': "{'input_path': 'small_object_detection.jpg', 'boxes': [{'cls_id': 0, 'label': 'pedestrian', 'score': 0.8025697469711304, 'coordinate': [184.14276, 709.97455, 203.60669, 745.6286]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.7245782017707825, 'coordinate': [203.48488, 700.377, 223.07726, 742.5181]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.7014670968055725, 'coordinate': [851.23553, 435.81937, 862.94385, 466.81384]}, ... ]}"}
+{'res': "{'input_path': 'small_object_detection.jpg', 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'pedestrian', 'score': 0.8025697469711304, 'coordinate': [184.14276, 709.97455, 203.60669, 745.6286]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.7245782017707825, 'coordinate': [203.48488, 700.377, 223.07726, 742.5181]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.7014670968055725, 'coordinate': [851.23553, 435.81937, 862.94385, 466.81384]}, ... ]}"}
 ```
 The meanings of the parameters in the running results are as follows:
 - `input_path`: The path of the input image to be predicted.
+- `page_index`: If the input is a PDF file, it represents the current page number of the PDF; otherwise, it is `None`.
 - `boxes`: Information of each detected object.
   - `cls_id`: Class ID.
   - `label`: Class name.

+ 2 - 1
docs/module_usage/tutorials/cv_modules/vehicle_detection.en.md

@@ -57,11 +57,12 @@ for res in output:
 After running, the result obtained is:
 
 ```bash
-{'res': "{'input_path': 'vehicle_detection.jpg', 'boxes': [{'cls_id': 0, 'label': 'vehicle', 'score': 0.9574093222618103, 'coordinate': [0.10725308, 323.01917, 272.72037, 472.75375]}, {'cls_id': 0, 'label': 'vehicle', 'score': 0.9449281096458435, 'coordinate': [270.3387, 310.36923, 489.8854, 398.07562]}, {'cls_id': 0, 'label': 'vehicle', 'score': 0.939127504825592, 'coordinate': [896.4249, 292.2338, 1051.9075, 370.41345]}, {'cls_id': 0, 'label': 'vehicle', 'score': 0.9388730525970459, 'coordinate': [1057.6327, 274.0139, 1639.8386, 535.54926]}, {'cls_id': 0, 'label': 'vehicle', 'score': 0.9239683747291565, 'coordinate': [482.28885, 307.33447, 574.6905, 357.82965]}, ... ]}"}
+{'res': "{'input_path': 'vehicle_detection.jpg', 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'vehicle', 'score': 0.9574093222618103, 'coordinate': [0.10725308, 323.01917, 272.72037, 472.75375]}, {'cls_id': 0, 'label': 'vehicle', 'score': 0.9449281096458435, 'coordinate': [270.3387, 310.36923, 489.8854, 398.07562]}, {'cls_id': 0, 'label': 'vehicle', 'score': 0.939127504825592, 'coordinate': [896.4249, 292.2338, 1051.9075, 370.41345]}, {'cls_id': 0, 'label': 'vehicle', 'score': 0.9388730525970459, 'coordinate': [1057.6327, 274.0139, 1639.8386, 535.54926]}, {'cls_id': 0, 'label': 'vehicle', 'score': 0.9239683747291565, 'coordinate': [482.28885, 307.33447, 574.6905, 357.82965]}, ... ]}"}
 ```
 
 The meanings of the runtime parameters are as follows:
 - `input_path`: Indicates the path of the input image to be predicted.
+- `page_index`: If the input is a PDF file, it represents the current page number of the PDF; otherwise, it is `None`.
 - `boxes`: Information of each predicted object.
   - `cls_id`: Class ID.
   - `label`: Class name.

+ 2 - 1
docs/module_usage/tutorials/cv_modules/vehicle_detection.md

@@ -55,10 +55,11 @@ for res in output:
 
 After running, the result obtained is:
 ```bash
-{'res': "{'input_path': 'vehicle_detection.jpg', 'boxes': [{'cls_id': 0, 'label': 'vehicle', 'score': 0.9574093222618103, 'coordinate': [0.10725308, 323.01917, 272.72037, 472.75375]}, {'cls_id': 0, 'label': 'vehicle', 'score': 0.9449281096458435, 'coordinate': [270.3387, 310.36923, 489.8854, 398.07562]}, {'cls_id': 0, 'label': 'vehicle', 'score': 0.939127504825592, 'coordinate': [896.4249, 292.2338, 1051.9075, 370.41345]}, {'cls_id': 0, 'label': 'vehicle', 'score': 0.9388730525970459, 'coordinate': [1057.6327, 274.0139, 1639.8386, 535.54926]}, {'cls_id': 0, 'label': 'vehicle', 'score': 0.9239683747291565, 'coordinate': [482.28885, 307.33447, 574.6905, 357.82965]}, ... ]}"}
+{'res': "{'input_path': 'vehicle_detection.jpg', 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'vehicle', 'score': 0.9574093222618103, 'coordinate': [0.10725308, 323.01917, 272.72037, 472.75375]}, {'cls_id': 0, 'label': 'vehicle', 'score': 0.9449281096458435, 'coordinate': [270.3387, 310.36923, 489.8854, 398.07562]}, {'cls_id': 0, 'label': 'vehicle', 'score': 0.939127504825592, 'coordinate': [896.4249, 292.2338, 1051.9075, 370.41345]}, {'cls_id': 0, 'label': 'vehicle', 'score': 0.9388730525970459, 'coordinate': [1057.6327, 274.0139, 1639.8386, 535.54926]}, {'cls_id': 0, 'label': 'vehicle', 'score': 0.9239683747291565, 'coordinate': [482.28885, 307.33447, 574.6905, 357.82965]}, ... ]}"}
 ```
 The meanings of the parameters in the running results are as follows:
 - `input_path`: The path of the input image to be predicted.
+- `page_index`: If the input is a PDF file, it represents the current page number of the PDF; otherwise, it is `None`.
 - `boxes`: Information of each detected object.
   - `cls_id`: Class ID.
   - `label`: Class name.

+ 1 - 1
docs/pipeline_usage/tutorials/cv_pipelines/semantic_segmentation.en.md

@@ -238,7 +238,7 @@ The visualization results are saved under `save_path`, and the visualization res
 ```python
 from paddlex import create_pipeline
 pipeline = create_pipeline(pipeline="semantic_segmentation")
-output = pipeline.predict(input="general_semantic_segmentation_002.png", target_size = -1)
+output = pipeline.predict(input="makassaridn-road_demo.png", target_size = -1)
 for res in output:
     res.print()
     res.save_to_img(save_path="./output/")

+ 2 - 2
docs/pipeline_usage/tutorials/cv_pipelines/semantic_segmentation.md

@@ -5,7 +5,7 @@ comments: true
 # General Semantic Segmentation Pipeline Usage Tutorial
 
 ## 1. Introduction to the General Semantic Segmentation Pipeline
-Semantic segmentation is a computer vision technique that assigns every pixel in an image to a specific category, enabling a fine-grained understanding of image content. It not only identifies the types of objects in an image but also classifies each pixel, so that regions of the same category are completely labeled. For example, in a street-scene image, semantic segmentation can separate pedestrians, cars, sky, and road pixel by pixel, producing a detailed label map. This technique is widely used in autonomous driving, medical image analysis, and human-computer interaction, and typically relies on deep learning models (such as SegFormer) that extract features with convolutional neural networks (CNNs) or vision Transformers to achieve high-precision pixel-level classification, providing a foundation for further intelligent analysis. This pipeline also offers flexible serving deployment, supporting invocation in multiple programming languages on a variety of hardware. In addition, it supports secondary development: you can train and fine-tune models on your own dataset based on this pipeline, and the trained models can be seamlessly integrated.
+Semantic segmentation is a computer vision technique that assigns every pixel in an image to a specific category, enabling a fine-grained understanding of image content. It not only identifies the types of objects in an image but also classifies each pixel, so that regions of the same category are completely labeled. For example, in a street-scene image, semantic segmentation can separate pedestrians, cars, sky, and road pixel by pixel, producing a detailed label map. The general semantic segmentation pipeline is designed to solve pixel-level image understanding problems. This technique is widely used in autonomous driving, medical image analysis, and human-computer interaction, and typically relies on deep learning models (such as SegFormer) that extract features with convolutional neural networks (CNNs) or vision Transformers to achieve high-precision pixel-level classification, providing a foundation for further intelligent analysis. This pipeline also offers flexible serving deployment, supporting invocation in multiple programming languages on a variety of hardware. In addition, it supports secondary development: you can train and fine-tune models on your own dataset based on this pipeline, and the trained models can be seamlessly integrated.
 
 <img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/pipelines/semantic_segmentation/01.png">
 
@@ -242,7 +242,7 @@ paddlex --pipeline semantic_segmentation \
 ```python
 from paddlex import create_pipeline
 pipeline = create_pipeline(pipeline="semantic_segmentation")
-output = pipeline.predict(input="general_semantic_segmentation_002.png", target_size = -1)
+output = pipeline.predict(input="makassaridn-road_demo.png", target_size = -1)
 for res in output:
     res.print()
     res.save_to_img(save_path="./output/")

+ 3 - 3
docs/pipeline_usage/tutorials/cv_pipelines/small_object_detection.en.md

@@ -63,7 +63,7 @@ paddlex --pipeline small_object_detection \
         --input small_object_detection.jpg \
         --threshold 0.5 \
         --save_path ./output \
-        --device gpu:0 \
+        --device gpu:0
 ```
 
 For descriptions of the relevant parameters, refer to the parameter explanations in [2.2.2 Python Script Integration](#222-python-script-integration).
@@ -321,7 +321,7 @@ for res in output:
 ## 3. Development Integration/Deployment
 If the pipeline meets your requirements for inference speed and accuracy, you can proceed with development integration/deployment.
 
-If you need to apply the pipeline directly in your Python project, you can refer to the example code in [2.2 Python Script Integration](#22-python-script-integration).
+If you need to apply the pipeline directly in your Python project, you can refer to the example code in [2.1.2 Python Script Integration](#212-python-script-integration).
 
 In addition, PaddleX also provides three other deployment methods, detailed as follows:
 
@@ -912,7 +912,7 @@ SubModules:
     threshold: 0.5
 ```
 
-Subsequently, refer to the command-line method or Python script method in the local experience to load the modified pipeline configuration file.
+Subsequently, refer to the command-line method or Python script method in the [local experience](#21-local-experience) to load the modified pipeline configuration file.
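
For the Python route, a hedged sketch (the YAML path is illustrative; `create_pipeline` accepting a configuration-file path follows the pattern used elsewhere in these docs):

```python
from paddlex import create_pipeline

# Load the modified pipeline configuration file instead of the built-in name.
pipeline = create_pipeline(pipeline="./my_path/small_object_detection.yaml")
output = pipeline.predict(input="small_object_detection.jpg")
for res in output:
    res.print()
    res.save_to_img(save_path="./output/")
```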
 
 ## 5. Multi-Hardware Support
 PaddleX supports a variety of mainstream hardware devices, including NVIDIA GPU, Kunlunxin XPU, Ascend NPU, and Cambricon MLU. <b>Simply modify the `--device` parameter</b> to seamlessly switch between different hardware devices.

+ 3 - 3
docs/pipeline_usage/tutorials/cv_pipelines/small_object_detection.md

@@ -64,7 +64,7 @@ paddlex --pipeline small_object_detection \
         --input small_object_detection.jpg \
         --threshold 0.5 \
         --save_path ./output \
-        --device gpu:0 \
+        --device gpu:0
 ```
 For descriptions of the relevant parameters, refer to the parameter explanations in [2.2.2 Python Script Integration](#222-python脚本方式集成).
 
@@ -319,7 +319,7 @@ for res in output:
 ## 3. Development Integration/Deployment
 If the pipeline meets your requirements for inference speed and accuracy, you can proceed directly with development integration/deployment.

-If you need to apply the pipeline directly in your Python project, you can refer to the example code in [2.2 Python Script Integration](#22-python脚本方式集成).
+If you need to apply the pipeline directly in your Python project, you can refer to the example code in [2.1.2 Python Script Integration](#212-python脚本方式集成).

 In addition, PaddleX also provides three other deployment methods, detailed as follows:
 
@@ -909,7 +909,7 @@ SubModules:
     batch_size: 1
     threshold: 0.5
 ```
-Subsequently, refer to the command-line or Python script method in the local experience to load the modified pipeline configuration file.
+Subsequently, refer to the command-line or Python script method in [Local Experience](#21-本地体验) to load the modified pipeline configuration file.

 ## 5. Multi-Hardware Support
 PaddleX supports a variety of mainstream hardware devices, including NVIDIA GPU, Kunlunxin XPU, Ascend NPU, and Cambricon MLU. <b>Simply modify the `--device` parameter</b> to seamlessly switch between different hardware devices.