|
|
@@ -68,9 +68,116 @@ for res in output:
|
|
|
After running, the result obtained is:
|
|
|
|
|
|
```bash
|
|
|
-{'res': "{'input_path': 'small_object_detection.jpg', 'boxes': [{'cls_id': 0, 'label': 'pedestrian', 'score': 0.8025697469711304, 'coordinate': [184.14276, 709.97455, 203.60669, 745.6286]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.7245782017707825, 'coordinate': [203.48488, 700.377, 223.07726, 742.5181]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.7014670968055725, 'coordinate': [851.23553, 435.81937, 862.94385, 466.81384]}, ... ]}"}
|
|
|
+{'res': "{'input_path': 'small_object_detection.jpg', 'page_index': None, 'boxes': [{'cls_id': 0, 'label': 'pedestrian', 'score': 0.8025697469711304, 'coordinate': [184.14276, 709.97455, 203.60669, 745.6286]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.7245782017707825, 'coordinate': [203.48488, 700.377, 223.07726, 742.5181]}, {'cls_id': 0, 'label': 'pedestrian', 'score': 0.7014670968055725, 'coordinate': [851.23553, 435.81937, 862.94385, 466.81384]}, ... ]}"}
|
|
|
```
|
|
|
|
|
|
+The fields in the result have the following meanings:
|
|
|
+- `input_path`: The path of the input image to be predicted.
|
|
|
+- `page_index`: If the input is a PDF file, it represents the current page number of the PDF; otherwise, it is `None`.
|
|
|
+- `boxes`: The predicted bounding boxes, as a list of dictionaries. Each dictionary contains the following fields:
|
|
|
+ - `cls_id`: Class ID, an integer.
|
|
|
+ - `label`: Class label, a string.
|
|
|
+ - `score`: Confidence score of the bounding box, a float.
|
|
|
+ - `coordinate`: Coordinates of the bounding box, a list `[xmin, ymin, xmax, ymax]`.
|
|
|
+
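
As a minimal sketch of how these fields can be consumed, the snippet below walks over `boxes` and derives each box's width and height from `coordinate`. The `result` literal is copied from the sample output above and limited to the three boxes shown; `box_sizes` is a hypothetical helper name used for illustration, not part of PaddleX.

```python
# Sample result copied from the output above, limited to the three boxes shown there.
result = {
    "input_path": "small_object_detection.jpg",
    "page_index": None,
    "boxes": [
        {"cls_id": 0, "label": "pedestrian", "score": 0.8025697469711304,
         "coordinate": [184.14276, 709.97455, 203.60669, 745.6286]},
        {"cls_id": 0, "label": "pedestrian", "score": 0.7245782017707825,
         "coordinate": [203.48488, 700.377, 223.07726, 742.5181]},
        {"cls_id": 0, "label": "pedestrian", "score": 0.7014670968055725,
         "coordinate": [851.23553, 435.81937, 862.94385, 466.81384]},
    ],
}


def box_sizes(result):
    """Return (label, width, height) for every box in a result dict.

    Hypothetical helper: it relies only on the documented field layout,
    where `coordinate` is [xmin, ymin, xmax, ymax].
    """
    sizes = []
    for box in result["boxes"]:
        xmin, ymin, xmax, ymax = box["coordinate"]
        sizes.append((box["label"], xmax - xmin, ymax - ymin))
    return sizes


for label, width, height in box_sizes(result):
    print(f"{label}: {width:.1f} x {height:.1f} px")
```
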
|
|
|
+</details>
|
|
|
+
|
|
|
+The visualization image is as follows:
|
|
|
+
|
|
|
+<img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/refs/heads/main/images/modules/smallobj_det/small_object_detection_res.jpg">
|
|
|
+
|
|
|
|
|
|
+
|
|
|
+Related methods, parameters, and explanations are as follows:
|
|
|
+
|
|
|
+* `create_model` instantiates a small object detection model (here, `PP-YOLOE_plus_SOD-S` is used as an example). Its parameters are described below:
|
|
|
+<table>
|
|
|
+<thead>
|
|
|
+<tr>
|
|
|
+<th>Parameter</th>
|
|
|
+<th>Parameter Description</th>
|
|
|
+<th>Parameter Type</th>
|
|
|
+<th>Options</th>
|
|
|
+<th>Default Value</th>
|
|
|
+</tr>
|
|
|
+</thead>
|
|
|
+<tr>
|
|
|
+<td><code>model_name</code></td>
|
|
|
+<td>Name of the model</td>
|
|
|
+<td><code>str</code></td>
|
|
|
+<td>None</td>
|
|
|
+<td><code>None</code></td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td><code>model_dir</code></td>
|
|
|
+<td>Path to the directory where the model is stored</td>
|
|
|
+<td><code>str</code></td>
|
|
|
+<td>None</td>
|
|
|
+<td>None</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td><code>threshold</code></td>
|
|
|
+<td>Threshold for filtering low-confidence objects</td>
|
|
|
+<td><code>float</code>/<code>dict</code>/<code>None</code></td>
|
|
|
+<td>None</td>
|
|
|
+<td>None</td>
|
|
|
+</tr>
|
|
|
+</table>
|
|
|
+
|
|
|
+* `model_name` must be specified. When only `model_name` is given, the default model parameters built into PaddleX are used. If `model_dir` is also specified, the user-defined model is used instead.
|
|
|
+* `threshold` is the threshold for filtering low-confidence objects. The default is `None`, which means the value falls back to the next lower-priority setting. The priority of parameter settings from highest to lowest is: `predict parameter > create_model initialization > yaml configuration file`. Currently, two types of threshold settings are supported:
|
|
|
+ * `float`, using the same threshold for all classes.
|
|
|
+ * `dict`, where the key is the class ID and the value is the threshold, allowing different thresholds for different classes.
|
|
|
+
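
The fallback order and the two threshold forms described above can be sketched in plain Python. This is an illustration of the documented behavior, not PaddleX's internal implementation: `resolve_threshold` and `filter_boxes` are hypothetical names, and two details are assumptions made here, namely that a box is kept when its score is greater than or equal to the threshold, and that a class missing from a `dict` threshold is not filtered.

```python
def resolve_threshold(predict_value=None, init_value=None, yaml_value=None):
    """Pick the first non-None threshold in priority order:
    predict parameter > create_model initialization > yaml configuration file.
    """
    for value in (predict_value, init_value, yaml_value):
        if value is not None:
            return value
    return None


def filter_boxes(boxes, threshold):
    """Drop low-confidence boxes using a float or a per-class dict threshold."""
    if threshold is None:
        return list(boxes)
    kept = []
    for box in boxes:
        if isinstance(threshold, dict):
            # Assumption: classes without an entry are not filtered.
            limit = threshold.get(box["cls_id"], 0.0)
        else:
            limit = threshold
        # Assumption: the comparison is inclusive.
        if box["score"] >= limit:
            kept.append(box)
    return kept


boxes = [
    {"cls_id": 0, "score": 0.8},
    {"cls_id": 1, "score": 0.4},
    {"cls_id": 1, "score": 0.3},
]
threshold = resolve_threshold(predict_value=None, init_value={0: 0.5, 1: 0.35}, yaml_value=0.5)
print(filter_boxes(boxes, threshold))
```

Here the `predict` argument is `None`, so the `dict` passed at initialization takes effect and the yaml value is ignored.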
|
|
|
+* The `predict()` method of the small object detection model is called to run inference. It takes the parameters `input`, `batch_size`, and `threshold`, which are explained as follows:
|
|
|
+
|
|
|
+<table>
|
|
|
+<thead>
|
|
|
+<tr>
|
|
|
+<th>Parameter</th>
|
|
|
+<th>Parameter Description</th>
|
|
|
+<th>Parameter Type</th>
|
|
|
+<th>Options</th>
|
|
|
+<th>Default Value</th>
|
|
|
+</tr>
|
|
|
+</thead>
|
|
|
+<tr>
|
|
|
+<td><code>input</code></td>
|
|
|
+<td>Data to be predicted, supporting multiple input types</td>
|
|
|
+<td><code>Python Var</code>/<code>str</code>/<code>dict</code>/<code>list</code></td>
|
|
|
+<td>
|
|
|
+<ul>
|
|
|
+ <li><b>Python variable</b>, such as image data represented by <code>numpy.ndarray</code></li>
|
|
|
+ <li><b>File path</b>, such as the local path of an image file: <code>/root/data/img.jpg</code></li>
|
|
|
+ <li><b>URL link</b>, such as the network URL of an image file: <a href="https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/small_object_detection.jpg">Example</a></li>
|
|
|
+ <li><b>Local directory</b>, the directory should contain data files to be predicted, such as the local path: <code>/root/data/</code></li>
|
|
|
+ <li><b>List</b>, elements of the list must be of the above types of data, such as <code>[numpy.ndarray, numpy.ndarray]</code>, <code>["/root/data/img1.jpg", "/root/data/img2.jpg"]</code>, <code>["/root/data1", "/root/data2"]</code></li>
|
|
|
+</ul>
|
|
|
+</td>
|
|
|
+<td>None</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td><code>batch_size</code></td>
|
|
|
+<td>Batch size</td>
|
|
|
+<td><code>int</code></td>
|
|
|
+<td>Any positive integer</td>
|
|
|
+<td>1</td>
|
|
|
+</tr>
|
|
|
+<tr>
|
|
|
+<td><code>threshold</code></td>
|
|
|
+<td>Threshold for filtering low-confidence objects</td>
|
|
|
+<td><code>float</code>/<code>dict</code>/<code>None</code></td>
|
|
|
+<td>
|
|
|
+<ul>
|
|
|
+ <li><b>None</b>, indicating that the value falls back to the next lower-priority setting. The priority of parameter settings from highest to lowest is: <code>predict parameter > create_model initialization > yaml configuration file</code></li>
|
|
|
+ <li><b>float</b>, such as <code>0.5</code>, indicating that 0.5 is used as the threshold for filtering low-confidence objects during inference</li>
|
|
|
+ <li><b>dict</b>, such as <code>{0: 0.5, 1: 0.35}</code>, indicating that 0.5 is used as the threshold for class 0 and 0.35 for class 1 during inference</li>
|
|
|
+</ul>
|
|
|
+</td>
|
|
|
+<td>None</td>
|
|
|
+</tr>
|
|
|
+</table>
|
|
|
+
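
A call that passes all three `predict()` parameters might look like the sketch below. It assumes PaddleX is installed and that `small_object_detection.jpg` is available locally; `detect_small_objects` is a hypothetical wrapper name, and the threshold values are only examples.

```python
def detect_small_objects(image, batch_size=1, threshold=None):
    """Run small object detection on `image` and return the results as a list.

    Hypothetical wrapper around the documented `create_model` / `predict` calls.
    The import is placed inside the function so that defining the wrapper does
    not require PaddleX to be installed.
    """
    from paddlex import create_model

    model = create_model(model_name="PP-YOLOE_plus_SOD-S")
    # A threshold passed here takes priority over the create_model and yaml settings.
    output = model.predict(image, batch_size=batch_size, threshold=threshold)
    return list(output)


# Example call (requires PaddleX and the model weights):
# results = detect_small_objects(
#     "small_object_detection.jpg",
#     batch_size=1,
#     threshold={0: 0.5, 1: 0.35},  # per-class thresholds keyed by class ID
# )
```
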
|
|
|
* The prediction result for each sample is of `dict` type and supports operations such as printing, saving as an image, and saving as a `json` file:
|
|
|
|
|
|
<table>
|