---
comments: true
---

# General Image Recognition Pipeline Usage Tutorial

## 1. Introduction to the General Image Recognition Pipeline

The General Image Recognition Pipeline aims to solve the problem of open-domain object localization and recognition. Currently, PaddleX's General Image Recognition Pipeline supports PP-ShiTuV2.

PP-ShiTuV2 is a practical general image recognition system mainly composed of three modules: mainbody detection module, image feature module, and vector retrieval module. The system integrates and improves various strategies in multiple aspects, including backbone network, loss function, data augmentation, learning rate scheduling, regularization, pre-trained model, and model pruning and quantization. It optimizes each module and ultimately achieves better performance in multiple application scenarios.

The General Image Recognition Pipeline includes the mainbody detection module and the image feature module, with several models to choose from. You can select the model to use based on the benchmark data below. If you prioritize model precision, choose a model with higher precision. If you prioritize inference speed, choose a model with faster inference. If you prioritize model storage size, choose a model with a smaller storage size.

Object Detection Module:

| Model | mAP(0.5:0.95) | mAP(0.5) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size (M) | Description |
|---|---|---|---|---|---|---|
| PP-ShiTuV2_det | 41.5 | 62.0 | 33.7 | 537.0 | 27.54 | A mainbody detection model based on PicoDet_LCNet_x2_5, which may detect multiple common objects simultaneously. |

Note: The above accuracy metrics are based on the private mainbody detection dataset.

Image Feature Module:

| Model | Recall@1 (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size |
|---|---|---|---|---|
| PP-ShiTuV2_rec | 84.2 | 5.23428 | 19.6005 | 16.3 M |
| PP-ShiTuV2_rec_CLIP_vit_base | 88.69 | 13.1957 | 285.493 | 306.6 M |
| PP-ShiTuV2_rec_CLIP_vit_large | 91.03 | 51.1284 | 1131.28 | 1.05 G |

Description: PP-ShiTuV2 is a general image feature system consisting of three modules: mainbody detection, feature extraction, and vector retrieval. These models are part of the feature extraction module, and different models can be selected based on system requirements.

Note: The above accuracy metrics are based on AliProducts Recall@1. All GPU inference times are based on NVIDIA Tesla T4 machines with FP32 precision. CPU inference speeds are based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

## 2. Quick Start

PaddleX provides pre-trained model pipelines that you can try out quickly. This pipeline can be tried locally using Python.

### 2.1 Online Experience

Not supported yet.

### 2.2 Local Experience

> ❗ Before using the General Image Recognition Pipeline locally, please ensure you have installed the PaddleX wheel package according to the [PaddleX Installation Tutorial](../../../installation/installation.en.md).

#### 2.2.1 Command Line Experience

The pipeline does not support command line experience at this time.

By default, the built-in General Image Recognition Pipeline configuration file is used. If you want to change it, you can run the following command to obtain a copy of the configuration file:
paddlex --get_pipeline_config PP-ShiTuV2

After execution, the General Image Recognition Pipeline configuration file will be saved in the current directory. If you want to customize the save location, you can run the following command (assuming the custom save location is ./my_path):

paddlex --get_pipeline_config PP-ShiTuV2 --save_path ./my_path

#### 2.2.2 Python Script Integration

* In the example of using this pipeline, a feature vector library needs to be built beforehand. You can download the officially provided drink recognition test dataset [drink_dataset_v2.0](https://paddle-model-ecology.bj.bcebos.com/paddlex/data/drink_dataset_v2.0.tar) to build the feature vector library. If you want to use a private dataset, you can refer to [Section 2.3 Data Organization for Building the Feature Library](#23-data-organization-for-building-the-feature-library). After that, you can quickly build the feature vector library and predict using the General Image Recognition Pipeline with just a few lines of code.

```python
from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="PP-ShiTuV2")

pipeline.build_index(data_root="drink_dataset_v2.0/", index_dir="index_dir")

output = pipeline.predict("./drink_dataset_v2.0/test_images/", index_dir="index_dir")
for res in output:
    res.print()
    res.save_to_img("./output/")
```

In the above Python script, the following steps are executed:

(1) Call the `create_pipeline` function to create a general image recognition pipeline object. The specific parameter descriptions are as follows:

| Parameter | Description | Type | Default |
|---|---|---|---|
| `pipeline` | The name of the pipeline or the path to the pipeline configuration file. If it is the name of the pipeline, it must be a pipeline supported by PaddleX. | `str` | `None` |
| `index_dir` | The directory where the retrieval database files used for pipeline inference are located. If this parameter is not passed, `index_dir` needs to be specified in `predict()`. | `str` | `None` |
| `device` | The inference device for the pipeline model. Supports: "gpu", "cpu". | `str` | `gpu` |
| `use_hpip` | Whether to enable high-performance inference, which is only available when the pipeline supports it. | `bool` | `False` |

(2) Call the `build_index` function of the general image recognition pipeline object to build the feature vector library. The specific parameters are described as follows:

| Parameter | Description | Type | Default |
|---|---|---|---|
| `data_root` | The root directory of the dataset. For the data organization, refer to [Section 2.3 Data Organization for Building the Feature Library](#23-data-organization-for-building-the-feature-library). | `str` | `None` |
| `index_dir` | The save path for the feature library. After successfully calling the `build_index` function, two files will be generated in this path: `"id_map.pkl"` saves the mapping relationship between image IDs and image feature labels; `"vector.index"` stores the feature vectors of each image. | `str` | `None` |

(3) Call the `predict` function of the general image recognition pipeline object for inference prediction: The `predict` function parameter is `input`, which is used to input the data to be predicted, supporting multiple input methods. Specific examples are as follows:

| Parameter Type | Description |
|---|---|
| Python Var | Supports directly passing in Python variables, such as `numpy.ndarray` representing image data. |
| `str` | Supports passing in the file path of the data to be predicted, such as the local path of an image file: `/root/data/img.jpg`. |
| `str` | Supports passing in the URL of the data file to be predicted, such as the network URL of an image file: Example. |
| `str` | Supports passing in a local directory that contains the data files to be predicted, such as the local path: `/root/data/`. |
| `dict` | Supports passing in a dictionary type, where the key needs to correspond to the specific task, such as "img" for image classification tasks. The value of the dictionary supports the above types of data, for example: `{"img": "/root/data1"}`. |
| `list` | Supports passing in a list, where the elements of the list need to be the above types of data, such as `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]`, `[{"img": "/root/data1"}, {"img": "/root/data2/img.jpg"}]`. |

Additionally, the `predict` method supports the `index_dir` parameter for setting the retrieval database:

| Parameter | Description |
|---|---|
| `index_dir` | The directory where the retrieval database files used for pipeline inference are located. If this parameter is not passed, the default retrieval database specified through the `index_dir` parameter in `create_pipeline()` will be used. |

(4) Obtain the prediction results by calling the `predict` method: The `predict` method is a `generator`, so prediction results need to be obtained by iteration. The `predict` method predicts data in batches.

(5) Process the prediction results: The prediction result for each sample is of `dict` type and supports printing or saving to a file. The supported save types are related to the specific pipeline, such as:

| Method | Description | Parameter | Type | Parameter Description | Default |
|---|---|---|---|---|---|
| `print` | Print the results to the terminal | `format_json` | `bool` | Whether to format the output with JSON indentation | `True` |
| | | `indent` | `int` | JSON formatting setting, only effective when `format_json` is `True` | `4` |
| | | `ensure_ascii` | `bool` | JSON formatting setting, only effective when `format_json` is `True` | `False` |
| `save_to_json` | Save the results as a JSON file | `save_path` | `str` | The save file path. When it is a directory, the saved file name follows the name of the input file | |
| | | `indent` | `int` | JSON formatting setting | `4` |
| | | `ensure_ascii` | `bool` | JSON formatting setting | `False` |
| `save_to_img` | Save the results as an image file | `save_path` | `str` | The save file path. When it is a directory, the saved file name follows the name of the input file | |

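
To make the `indent` and `ensure_ascii` settings above more concrete, the following sketch applies the same-named options of Python's standard `json.dumps` to a made-up result dictionary. It assumes the result methods format JSON in a comparable way; the dictionary keys and values are hypothetical and do not represent the pipeline's actual output schema.

```python
import json

# Hypothetical result; the real pipeline output schema may differ.
sample_result = {"label": "café", "score": 0.91}

# indent=4 pretty-prints one key per line; ensure_ascii=False keeps
# non-ASCII characters readable in the output.
readable = json.dumps(sample_result, indent=4, ensure_ascii=False)

# ensure_ascii=True escapes every non-ASCII character as a \uXXXX sequence.
escaped = json.dumps(sample_result, ensure_ascii=True)

print(readable)
print(escaped)
```

Both strings decode back to the same dictionary; the options only change how the JSON text looks.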
If you have a configuration file, you can customize the configurations for the general image recognition pipeline by modifying the `pipeline` parameter value in the `create_pipeline` method to the path of the pipeline configuration file.

For example, if your configuration file is saved at `./my_path/PP-ShiTuV2.yaml`, you only need to execute:

```python
from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="./my_path/PP-ShiTuV2.yaml", index_dir="index_dir")

output = pipeline.predict("./drink_dataset_v2.0/test_images/")
for res in output:
    res.print()
    res.save_to_img("./output/")
```

#### 2.2.3 Add or Remove Features from the Feature Library

If you want to add more images to the feature library, you can call the `append_index` function; to remove image features, you can call the `remove_index` function.

```python
from paddlex import create_pipeline

pipeline = create_pipeline("PP-ShiTuV2")
pipeline.build_index(data_root="drink_dataset_v2.0/", index_dir="index_dir", index_type="IVF")
pipeline.append_index(data_root="drink_dataset_v2.0/", index_dir="index_dir", index_type="IVF")
pipeline.remove_index(data_root="drink_dataset_v2.0/", index_dir="index_dir", index_type="IVF")
```

The parameter descriptions for the above methods are as follows:

| Parameter | Description | Type | Default |
|---|---|---|---|
| `data_root` | The root directory of the dataset to be added. The data organization should be the same as when building the feature library; refer to [Section 2.3 Data Organization for Building the Feature Library](#23-data-organization-for-building-the-feature-library). | `str` | `None` |
| `index_dir` | The storage directory for the feature library. In `append_index` and `remove_index`, it is also the path of the feature library to be modified (or deleted). | `str` | `None` |
| `index_type` | Supports `HNSW32`, `IVF`, `Flat`. Among them, `HNSW32` has faster retrieval speed and higher accuracy but does not support the `remove_index()` operation; `IVF` has faster retrieval speed but relatively lower accuracy, and supports `append_index()` and `remove_index()` operations; `Flat` has lower retrieval speed but higher accuracy, and supports `append_index()` and `remove_index()` operations. | `str` | `HNSW32` |
| `metric_type` | Supports: `IP`, Inner Product; `L2`, Euclidean Distance. | `str` | `IP` |

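
To illustrate how the two `metric_type` options rank candidates, here is a minimal pure-Python sketch of exhaustive (Flat-style) retrieval over a toy gallery. The vectors, labels, and the `search` helper are hypothetical; the sketch only shows the scoring rules and does not reflect how PaddleX or its underlying vector library implements retrieval.

```python
from typing import List, Sequence, Tuple


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(query: Sequence[float],
           gallery: List[Tuple[str, List[float]]],
           metric: str = "IP") -> str:
    """Return the label of the best match by comparing against every vector."""
    if metric == "IP":
        # Inner product: a larger value means a better match.
        return max(gallery, key=lambda item: inner_product(query, item[1]))[0]
    if metric == "L2":
        # Euclidean distance: a smaller value means a better match.
        return min(gallery, key=lambda item: squared_l2(query, item[1]))[0]
    raise ValueError(f"unsupported metric: {metric}")


# Toy gallery; "big_tea" points the same way as "tea" but has a larger norm.
gallery = [("cola", [1.0, 0.0]), ("tea", [0.0, 1.0]), ("big_tea", [0.0, 3.0])]
query = [0.1, 0.9]

print(search(query, gallery, "IP"))
print(search(query, gallery, "L2"))
```

With unnormalized vectors the two metrics can disagree, because the inner product rewards large norms. For unit-length feature vectors they produce the same ranking, since the squared distance then equals 2 minus twice the inner product.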

### 2.3 Data Organization for Building the Feature Library

The PaddleX general image recognition pipeline requires a pre-built feature library for feature retrieval. If you want to build a feature vector library with private data, you need to organize the data as follows:

```bash
data_root             # Root directory of the dataset, the directory name can be changed
├── images            # Directory for saving images, the directory name can be changed
│   │   ...
└── gallery.txt       # Annotation file for the feature library dataset, the file name cannot be changed. Each line gives the path of the image to be retrieved and the image label, separated by a space, for example: “0/0.jpg label”
```

## 3. Development Integration/Deployment

If the pipeline meets your requirements for inference speed and accuracy, you can proceed directly with development integration/deployment.

If you need to apply the pipeline directly in your Python project, refer to the example code in [2.2.2 Python Script Integration](#222-python-script-integration).

Additionally, PaddleX provides three other deployment methods, detailed as follows:

🚀 High-Performance Inference: In actual production environments, many applications have stringent standards for the performance metrics of deployment strategies (especially response speed) to ensure efficient system operation and smooth user experience. To this end, PaddleX provides high-performance inference plugins aimed at deeply optimizing model inference and pre/post-processing for significant end-to-end speedups. For detailed high-performance inference procedures, refer to the [PaddleX High-Performance Inference Guide](../../../pipeline_deploy/high_performance_inference.en.md).

☁️ Service-Oriented Deployment: Service-oriented deployment is a common deployment form in actual production environments. By encapsulating inference functions as services, clients can access these services through network requests to obtain inference results.
PaddleX supports users in achieving low-cost service-oriented deployment of pipelines. For detailed service-oriented deployment procedures, refer to the [PaddleX Service-Oriented Deployment Guide](../../../pipeline_deploy/service_deploy.en.md). Below are the API references and multi-language service invocation examples:
API Reference

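For the main operations provided by the service, a successfully processed request returns a response body with the following properties:

| Name | Type | Meaning |
|---|---|---|
| `errorCode` | `integer` | Error code. Fixed to `0`. |
| `errorMsg` | `string` | Error description. Fixed to `"Success"`. |

The response body may also have a `result` property, which is an object type that stores operation result information.

When a request is not processed successfully, the response body has the following properties:

| Name | Type | Meaning |
|---|---|---|
| `errorCode` | `integer` | Error code. Same as the response status code. |
| `errorMsg` | `string` | Error description. |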
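
A client can use these properties to tell successful responses from failed ones. The helper below is a sketch along those lines; `unwrap_result` and `ServiceError` are hypothetical names introduced for illustration and are not part of PaddleX.

```python
class ServiceError(RuntimeError):
    """Raised when the service reports a non-zero errorCode (hypothetical helper)."""


def unwrap_result(body: dict) -> dict:
    """Return the `result` object of a decoded response body, or raise on error."""
    code = body.get("errorCode")
    if code != 0:
        raise ServiceError(f"errorCode={code}: {body.get('errorMsg')}")
    # `result` may be absent, so fall back to an empty dict.
    return body.get("result", {})


# Made-up response body that follows the property tables above.
ok = unwrap_result(
    {"errorCode": 0, "errorMsg": "Success", "result": {"indexKey": "demo"}}
)
print(ok["indexKey"])
```

In the client example later in this section, such a helper could take the place of the repeated status-code checks, for instance `unwrap_result(resp.json())`.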

The main operations provided by the service are as follows:

Build feature vector index.

`POST /shitu-index-build`

The request body has the following properties:

| Name | Type | Meaning | Required |
|---|---|---|---|
| `imageLabelPairs` | `array` | Image-label pairs for building the index. | Yes |

Each element in `imageLabelPairs` is an object with the following properties:

| Name | Type | Meaning |
|---|---|---|
| `image` | `string` | The URL of an image file accessible by the service, or the Base64 encoding result of the image file content. |
| `label` | `string` | Label. |

On success, the `result` of the response body has the following properties:

| Name | Type | Meaning |
|---|---|---|
| `indexKey` | `string` | The key corresponding to the index, used to identify the established index. Can be used as input for other operations. |
| `idMap` | `object` | Mapping from vector ID to label. |

Add images (corresponding feature vectors) to the index.

`POST /shitu-index-add`

The request body has the following properties:

| Name | Type | Meaning | Required |
|---|---|---|---|
| `imageLabelPairs` | `array` | Image-label pairs for building the index. | Yes |
| `indexKey` | `string` | The key corresponding to the index. Provided by the `buildIndex` operation. | Yes |

Each element in `imageLabelPairs` is an object with the following properties:

| Name | Type | Meaning |
|---|---|---|
| `image` | `string` | The URL of an image file accessible by the service, or the Base64 encoding result of the image file content. |
| `label` | `string` | Label. |

On success, the `result` of the response body has the following properties:

| Name | Type | Meaning |
|---|---|---|
| `idMap` | `object` | Mapping from vector ID to label. |

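Remove images (corresponding feature vectors) from the index.

`POST /shitu-index-remove`

The request body has the following properties:

| Name | Type | Meaning | Required |
|---|---|---|---|
| `ids` | `array` | IDs of the vectors to be removed from the index. | Yes |
| `indexKey` | `string` | The key corresponding to the index. Provided by the `buildIndex` operation. | Yes |

On success, the `result` of the response body has the following properties:

| Name | Type | Meaning |
|---|---|---|
| `idMap` | `object` | Mapping from vector ID to label. |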

Perform image recognition.

`POST /shitu-infer`

The request body has the following properties:

| Name | Type | Meaning | Required |
|---|---|---|---|
| `image` | `string` | The URL of an image file accessible by the service, or the Base64 encoding result of the image file content. | Yes |
| `indexKey` | `string` | The key corresponding to the index. Provided by the `buildIndex` operation. | No |

On success, the `result` of the response body has the following properties:

| Name | Type | Meaning |
|---|---|---|
| `detectedObjects` | `array` | Information of the detected targets. |
| `image` | `string` | Recognition result image. The image is in JPEG format, encoded with Base64. |

Each element in `detectedObjects` is an object with the following properties:

| Name | Type | Meaning |
|---|---|---|
| `bbox` | `array` | Target location. The elements in the array are the x-coordinate of the upper-left corner, the y-coordinate of the upper-left corner, the x-coordinate of the lower-right corner, and the y-coordinate of the lower-right corner, respectively. |
| `recResults` | `array` | Recognition results. |
| `score` | `number` | Detection score. |

Each element in `recResults` is an object with the following properties:

| Name | Type | Meaning |
|---|---|---|
| `label` | `string` | Label. |
| `score` | `number` | Recognition score. |
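
As a sketch of consuming this response, the helper below keeps the highest-scoring recognition result for each detected object and derives the box size from `bbox`. `top_labels` and its `min_det_score` threshold are hypothetical, and the sample data is made up to match the property tables above.

```python
def top_labels(detected_objects, min_det_score=0.0):
    """Summarize each detected object by its best recognition result."""
    summary = []
    for obj in detected_objects:
        # Skip weak detections and objects without any recognition result.
        if obj["score"] < min_det_score or not obj["recResults"]:
            continue
        best = max(obj["recResults"], key=lambda rec: rec["score"])
        # bbox is [x_min, y_min, x_max, y_max].
        x1, y1, x2, y2 = obj["bbox"]
        summary.append(
            {
                "label": best["label"],
                "score": best["score"],
                "width": x2 - x1,
                "height": y2 - y1,
            }
        )
    return summary


# Made-up detections shaped like the `detectedObjects` array.
sample = [
    {
        "bbox": [10, 20, 110, 220],
        "score": 0.8,
        "recResults": [
            {"label": "cola", "score": 0.6},
            {"label": "tea", "score": 0.9},
        ],
    },
    {"bbox": [0, 0, 5, 5], "score": 0.2, "recResults": []},
]

print(top_labels(sample, min_det_score=0.5))
```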
Multi-Language Service Invocation Examples
Python
import base64
import pprint
import sys

import requests

API_BASE_URL = "http://0.0.0.0:8080"

base_image_label_pairs = [
    {"image": "./demo0.jpg", "label": "rabbit"},
    {"image": "./demo1.jpg", "label": "rabbit"},
    {"image": "./demo2.jpg", "label": "puppy"},
]
image_label_pairs_to_add = [
    {"image": "./demo3.jpg", "label": "puppy"},
]
ids_to_remove = [1]
infer_image_path = "./demo4.jpg"
output_image_path = "./out.jpg"

for pair in base_image_label_pairs:
    with open(pair["image"], "rb") as file:
        image_bytes = file.read()
        image_data = base64.b64encode(image_bytes).decode("ascii")
    pair["image"] = image_data

payload = {"imageLabelPairs": base_image_label_pairs}
resp_index_build = requests.post(f"{API_BASE_URL}/shitu-index-build", json=payload)
if resp_index_build.status_code != 200:
    print(f"Request to shitu-index-build failed with status code {resp_index_build.status_code}.")
    pprint.pp(resp_index_build.json())
    sys.exit(1)
result_index_build = resp_index_build.json()["result"]
print(f"Number of images indexed: {len(result_index_build['idMap'])}")

for pair in image_label_pairs_to_add:
    with open(pair["image"], "rb") as file:
        image_bytes = file.read()
        image_data = base64.b64encode(image_bytes).decode("ascii")
    pair["image"] = image_data

payload = {"imageLabelPairs": image_label_pairs_to_add, "indexKey": result_index_build["indexKey"]}
resp_index_add = requests.post(f"{API_BASE_URL}/shitu-index-add", json=payload)
if resp_index_add.status_code != 200:
    print(f"Request to shitu-index-add failed with status code {resp_index_add.status_code}.")
    pprint.pp(resp_index_add.json())
    sys.exit(1)
result_index_add = resp_index_add.json()["result"]
print(f"Number of images indexed: {len(result_index_add['idMap'])}")

payload = {"ids": ids_to_remove, "indexKey": result_index_build["indexKey"]}
resp_index_remove = requests.post(f"{API_BASE_URL}/shitu-index-remove", json=payload)
if resp_index_remove.status_code != 200:
    print(f"Request to shitu-index-remove failed with status code {resp_index_remove.status_code}.")
    pprint.pp(resp_index_remove.json())
    sys.exit(1)
result_index_remove = resp_index_remove.json()["result"]
print(f"Number of images indexed: {len(result_index_remove['idMap'])}")

with open(infer_image_path, "rb") as file:
    image_bytes = file.read()
    image_data = base64.b64encode(image_bytes).decode("ascii")

payload = {"image": image_data, "indexKey": result_index_build["indexKey"]}
resp_infer = requests.post(f"{API_BASE_URL}/shitu-infer", json=payload)
if resp_infer.status_code != 200:
    print(f"Request to shitu-infer failed with status code {resp_infer.status_code}.")
    pprint.pp(resp_infer.json())
    sys.exit(1)
result_infer = resp_infer.json()["result"]

with open(output_image_path, "wb") as file:
    file.write(base64.b64decode(result_infer["image"]))
print(f"Output image saved at {output_image_path}")
print("\nDetected objects:")
pprint.pp(result_infer["detectedObjects"])

📱 Edge Deployment: Edge deployment is a method that places computing and data processing functions on user devices themselves, allowing devices to process data directly without relying on remote servers. PaddleX supports deploying models on edge devices such as Android. For detailed edge deployment procedures, refer to the [PaddleX Edge Deployment Guide](../../../pipeline_deploy/edge_deploy.en.md).

You can choose the appropriate deployment method for your model pipeline based on your needs and proceed with subsequent AI application integration.

## 4. Custom Development

If the default model weights provided by the General Image Recognition Pipeline do not meet your expectations in terms of precision or speed, you can further **fine-tune** the existing models using **your own data from specific domains or application scenarios** to enhance the recognition performance of the pipeline in your context.

### 4.1 Model Fine-Tuning

Since the General Image Recognition Pipeline consists of two modules (the mainbody detection module and the image feature module), the suboptimal performance of the pipeline may stem from either module.

You can analyze images with poor recognition results to find out which module is responsible. If the analysis shows that many mainbody objects are not detected, the mainbody detection model is likely deficient. In that case, refer to the [Custom Development](../../../module_usage/tutorials/cv_modules/mainbody_detection.en.md#custom-development) section in the [Object Detection Module Development Tutorial](../../../module_usage/tutorials/cv_modules/mainbody_detection.en.md) and use your private dataset to fine-tune the mainbody detection model. If the mainbody objects are detected but matched to the wrong labels, the image feature model requires further improvement.
You should refer to the [Custom Development](../../../module_usage/tutorials/cv_modules/image_feature.md#custom-development) section in the [Image Feature Module Development Tutorial](../../../module_usage/tutorials/cv_modules/image_feature.en.md) and fine-tune the image feature model.

### 4.2 Model Application

After you complete the fine-tuning training with your private dataset, you will obtain local model files.

To use the fine-tuned models, you only need to modify the pipeline configuration file, replacing the default model paths with the local paths of your fine-tuned models:

```yaml
Pipeline:
  device: "gpu:0"
  det_model: "./PP-ShiTuV2_det_infer/"  # Can be modified to the local path of the fine-tuned mainbody detection model
  rec_model: "./PP-ShiTuV2_rec_infer/"  # Can be modified to the local path of the fine-tuned image feature model
  det_batch_size: 1
  rec_batch_size: 1
```

Subsequently, refer to the command-line method or Python script method in [2.2 Local Experience](#22-local-experience) to load the modified pipeline configuration file.

## 5. Multi-Hardware Support

PaddleX supports various mainstream hardware devices such as NVIDIA GPUs, Kunlun XPU, Ascend NPU, and Cambricon MLU. **Simply by modifying the `--device` parameter**, seamless switching between different hardware can be achieved.

For example, when running the General Image Recognition Pipeline using Python and changing the running device from an NVIDIA GPU to an Ascend NPU, you only need to modify the `device` in the script to `npu`:

```python
from paddlex import create_pipeline

pipeline = create_pipeline(
    pipeline="PP-ShiTuV2",
    device="npu:0"  # gpu:0 --> npu:0
)
```

If you want to use the General Image Recognition Pipeline on more types of hardware, please refer to the [PaddleX Multi-Device Usage Guide](../../../other_devices_support/multi_devices_use_guide.en.md).
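
One way to switch hardware without editing the script is to read the device string from the environment. The sketch below assumes a hypothetical environment variable `PIPELINE_DEVICE` and a hypothetical helper `resolve_device`; neither is defined by PaddleX. It only accepts the device types that appear as strings in this tutorial (`cpu`, `gpu`, `npu`), so extend the set to match the hardware you actually use.

```python
import os

# Device types that appear as strings in this tutorial; extend as needed.
KNOWN_DEVICE_TYPES = {"cpu", "gpu", "npu"}


def resolve_device(default: str = "gpu:0") -> str:
    """Read a device string such as "npu:0" from PIPELINE_DEVICE (hypothetical)."""
    device = os.environ.get("PIPELINE_DEVICE", default).strip().lower()
    device_type, _, device_id = device.partition(":")
    if device_type not in KNOWN_DEVICE_TYPES:
        raise ValueError(f"unknown device type: {device_type!r}")
    if device_id and not device_id.isdigit():
        raise ValueError(f"invalid device id: {device_id!r}")
    return device


# Simulate an environment that selects the Ascend NPU.
os.environ["PIPELINE_DEVICE"] = "npu:0"
device = resolve_device()
print(device)

# The value can then be passed on, for example:
# pipeline = create_pipeline(pipeline="PP-ShiTuV2", device=device)
```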