---
comments: true
---

General Image Recognition Pipeline Usage Tutorial

1. Introduction to the General Image Recognition Pipeline

The General Image Recognition Pipeline aims to solve the problem of open-domain object localization and recognition. Currently, PaddleX's General Image Recognition Pipeline supports PP-ShiTuV2.

PP-ShiTuV2 is a practical general image recognition system mainly composed of three modules: mainbody detection module, image feature module, and vector retrieval module. The system integrates and improves various strategies in multiple aspects, including backbone network, loss function, data augmentation, learning rate scheduling, regularization, pre-trained model, and model pruning and quantization. It optimizes each module and ultimately achieves better performance in multiple application scenarios.

The General Image Recognition Pipeline includes the mainbody detection module and the image feature module, each with several models to choose from. You can select a model based on the benchmark data below. If you prioritize model accuracy, choose a model with higher accuracy; if you prioritize inference speed, choose a model with faster inference; if you prioritize model storage size, choose a model with a smaller storage size.

Mainbody Detection Module:

| Model | mAP(0.5:0.95) | mAP(0.5) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size (M) | Description |
|---|---|---|---|---|---|---|
| PP-ShiTuV2_det | 41.5 | 62.0 | 33.7 | 537.0 | 27.54 | A mainbody detection model based on PicoDet_LCNet_x2_5 that can detect multiple common objects simultaneously. |

Note: The above accuracy metrics are based on the private mainbody detection dataset.

Image Feature Module:

| Model | Recall@1 (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Size | Description |
|---|---|---|---|---|---|
| PP-ShiTuV2_rec | 84.2 | 5.23428 | 19.6005 | 16.3 M | PP-ShiTuV2 is a general image recognition system consisting of three modules: mainbody detection, feature extraction, and vector retrieval. These models belong to the feature extraction module, and different models can be selected based on system requirements. |
| PP-ShiTuV2_rec_CLIP_vit_base | 88.69 | 13.1957 | 285.493 | 306.6 M | |
| PP-ShiTuV2_rec_CLIP_vit_large | 91.03 | 51.1284 | 1131.28 | 1.05 G | |

Note: The above accuracy metrics are based on AliProducts Recall@1. All GPU inference times are based on NVIDIA Tesla T4 machines with FP32 precision. CPU inference speeds are based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

2. Quick Start

The pre-trained pipelines provided by PaddleX can be experienced quickly. You can experience the General Image Recognition Pipeline locally using Python.

2.1 Online Experience

Not supported yet.

2.2 Local Experience

❗ Before using the general image recognition pipeline locally, please ensure that you have completed the installation of the PaddleX wheel package according to the PaddleX Installation Guide.

2.2.1 Command Line Experience

The pipeline currently does not support command line experience.

2.2.2 Python Script Integration

  • To run the pipeline, you need to build an index library in advance. You can download the official beverage recognition test dataset drink_dataset_v2.0 to build the index library. If you wish to use your private dataset, please refer to Section 2.3 Data Organization for Building the Index Library. After that, you can quickly build the index library and perform fast inference with the general image recognition pipeline using just a few lines of code.

    from paddlex import create_pipeline

    # Instantiate the PP-ShiTuV2 pipeline
    pipeline = create_pipeline(pipeline="PP-ShiTuV2")

    # Build the index library from the gallery images and their annotation file
    index_data = pipeline.build_index(gallery_imgs="drink_dataset_v2.0/", gallery_label="drink_dataset_v2.0/gallery.txt")
    index_data.save("drink_index")

    # Run inference on a test image using the index built above
    output = pipeline.predict("./drink_dataset_v2.0/test_images/001.jpeg", index=index_data)
    for res in output:
        res.print()
        res.save_to_img("./output/")
        res.save_to_json("./output/")


In the above Python script, the following steps are executed:

(1) Call create_pipeline to instantiate the general image recognition pipeline object. The specific parameter descriptions are as follows:

| Parameter | Description | Type | Default Value |
|---|---|---|---|
| pipeline | The name of the pipeline or the path to the pipeline configuration file. If it is a pipeline name, it must be a pipeline supported by PaddleX. | str | None |
| device | The inference device for the pipeline. Supports specifying a specific GPU card number, such as "gpu:0", a specific card number for other hardware, such as "npu:0", or CPU, such as "cpu". | str | gpu:0 |
| use_hpip | Whether to enable high-performance inference. Only available if the pipeline supports high-performance inference. | bool | False |

(2) Call the build_index method of the general image recognition pipeline object to build the index library. The specific parameter descriptions are as follows:

| Parameter | Description | Type | Options | Default Value |
|---|---|---|---|---|
| gallery_imgs | The gallery images to be added. Required parameter. | str\|list | <ul><li><b>str</b>: The root directory of the dataset; the data organization is described in Section 2.3 Data Organization for Building the Index Library</li><li><b>List[numpy.ndarray]</b>: Gallery image data as a list of numpy arrays</li></ul> | None |
| gallery_label | The annotation information of the gallery images. Required parameter. | str\|list | <ul><li><b>str</b>: The path to the annotation file; the data organization is described in Section 2.3 Data Organization for Building the Index Library</li><li><b>List[str]</b>: Gallery image annotations as a list of strings</li></ul> | None |
| metric_type | The feature similarity metric. Optional parameter. | str | <ul><li><b>"IP"</b>: Inner Product</li><li><b>"L2"</b>: Euclidean Distance</li></ul> | "IP" |
| index_type | The type of index. Optional parameter. | str | <ul><li><b>"HNSW32"</b>: Fast search speed and high accuracy, but does not support the remove_index() operation</li><li><b>"IVF"</b>: Fast search speed but relatively lower accuracy; supports append_index() and remove_index()</li><li><b>"Flat"</b>: Slower search speed but higher accuracy; supports append_index() and remove_index()</li></ul> | "HNSW32" |

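As a concrete sketch of these options (reusing the drink_dataset_v2.0 paths from above), the index can be built with the metric and index type spelled out explicitly; "IVF" is chosen here only so that append_index() and remove_index() remain available later:

    from paddlex import create_pipeline

    pipeline = create_pipeline(pipeline="PP-ShiTuV2")

    # Explicitly choose the similarity metric and index structure; the default
    # "HNSW32" index does not support remove_index(), while "IVF" does.
    index_data = pipeline.build_index(
        gallery_imgs="drink_dataset_v2.0/",
        gallery_label="drink_dataset_v2.0/gallery.txt",
        metric_type="IP",
        index_type="IVF",
    )

    # Persist the index so later predict() calls can load it from disk.
    index_data.save("drink_index")
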
The index library object index supports the save method, which saves the index library to disk:

| Parameter | Description | Type | Default Value |
|---|---|---|---|
| save_path | The directory where the index library files are saved, such as drink_index. | str | None |

(3) Call the predict method of the general image recognition pipeline object for inference. The predict method takes an input parameter, which provides the data to be predicted and supports multiple input types. Specific examples are as follows:

| Parameter | Description | Type | Options | Default Value |
|---|---|---|---|---|
| input | Data to be predicted. Required parameter. | Python Var\|str\|list | <ul><li><b>Python Var</b>: Image data represented as a numpy.ndarray</li><li><b>str</b>: Local path of an image file, such as /root/data/img.jpg; a URL of an image file; or a local directory containing the images to be predicted, such as /root/data/</li><li><b>List</b>: Elements must be of the above types, such as [numpy.ndarray, numpy.ndarray], ["/root/data/img1.jpg", "/root/data/img2.jpg"], ["/root/data1", "/root/data2"]</li></ul> | None |
| index | The index library used for inference. Optional parameter. If this parameter is not provided, the index library specified in the pipeline configuration file is used by default. | str\|paddlex.inference.components.retrieval.faiss.IndexData\|None | <ul><li><b>str</b>: A directory containing the index library files vector.index and index_info.yaml</li><li><b>IndexData</b>: An object created by the build_index method</li></ul> | None |
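For example, a minimal sketch that runs prediction over a whole directory of images against the index saved above:

    from paddlex import create_pipeline

    pipeline = create_pipeline(pipeline="PP-ShiTuV2")

    # `input` may be a single path, a directory, or a list of paths/arrays;
    # `index` may point at a directory produced by index_data.save("drink_index").
    output = pipeline.predict("./drink_dataset_v2.0/test_images/", index="drink_index")
    for res in output:
        res.print()
        res.save_to_img("./output/")
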

(4) Process the prediction results: The prediction result of each sample is of dict type, and it supports printing or saving as a file. The supported file types depend on the specific pipeline, such as:

| Method | Description | Parameter | Type | Parameter Description | Default |
|---|---|---|---|---|---|
| print() | Print the result to the terminal | format_json | bool | Whether to format the output content using JSON indentation | True |
| | | indent | int | Indentation level used to beautify the output JSON data, making it more readable. Effective only when format_json is True | 4 |
| | | ensure_ascii | bool | Whether to escape non-ASCII characters to Unicode. When True, all non-ASCII characters are escaped; when False, the original characters are retained. Effective only when format_json is True | False |
| save_to_json() | Save the result as a JSON file | save_path | str | Path to save the file. If it is a directory, the saved file is named consistently with the input file | None |
| | | indent | int | Indentation level used to beautify the output JSON data, making it more readable. Effective only when format_json is True | 4 |
| | | ensure_ascii | bool | Whether to escape non-ASCII characters to Unicode. When True, all non-ASCII characters are escaped; when False, the original characters are retained. Effective only when format_json is True | False |
| save_to_img() | Save the result as an image file | save_path | str | Path to save the file; supports a directory or a file path | None |
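For example, inside the prediction loop above, these methods can be called with non-default options (the output file names below are hypothetical):

    res.print(format_json=True, indent=2, ensure_ascii=False)   # ensure_ascii=False keeps labels such as "红牛-强化型" readable
    res.save_to_json("./output/001.json", indent=2, ensure_ascii=False)
    res.save_to_img("./output/001.jpg")
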
  • Calling the print() method will print the following result to the terminal:

    {'res': {'input_path': './drink_dataset_v2.0/test_images/001.jpeg', 'boxes': [{'labels': ['红牛-强化型', '红牛-强化型', '红牛-强化型', '红牛-强化型', '红牛-强化型'], 'rec_scores': [0.720183789730072, 0.7044230699539185, 0.6812724471092224, 0.6583285927772522, 0.6578206419944763], 'det_score': 0.6135568618774414, 'coordinate': [343.8184, 98.96374, 528.0366, 593.3813]}]}}
    
  • The meanings of the output parameters are as follows:

    • input_path: Indicates the path of the input image
    • boxes: Information of detected objects, a list of dictionaries, each dictionary contains the following information:
      • labels: List of recognized labels, sorted by score from high to low
      • rec_scores: List of recognition scores, where elements correspond to labels one by one
      • det_score: Detection score
      • coordinate: Coordinates of the target box, in the format [xmin, ymin, xmax, ymax]
  • Calling the save_to_json() method will save the above content to the specified save_path. If a directory is specified, the saved path will be save_path/{your_img_basename}.json. If a file is specified, it will be saved directly to that file.

  • Calling the save_to_img() method will save the visualization result to the specified save_path. If a directory is specified, the saved path will be save_path/{your_img_basename}_res.{your_img_extension}. If a file is specified, the result will be saved directly to that file. (Since the pipeline usually produces multiple result images, specifying a single file path is not recommended; otherwise, earlier images will be overwritten and only the last one will remain.)

  • Additionally, the visualized image and the prediction results can also be obtained through attributes, as follows:

| Attribute | Description |
|---|---|
| json | Get the prediction result in json format |
| img | Get the visualized image in dict format |

  • The prediction result obtained through the json attribute is of dict type, and its content is consistent with what is saved by calling the save_to_json() method.
  • The prediction result returned by the img attribute is of dict type. The key is res, and the corresponding value is an Image.Image object used to visualize the general image recognition result.
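A minimal sketch of reading these attributes inside the prediction loop (the output path is hypothetical):

    for res in output:
        result_dict = res.json   # dict with the same content that save_to_json() writes
        print(result_dict)

        vis_dict = res.img       # dict whose "res" key holds a PIL Image.Image visualization
        vis_dict["res"].save("./output/visualization.jpg")
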

The above Python script integration method uses the parameter settings in the PaddleX official configuration file by default. If you need to customize the configuration file, you can first execute the following command to obtain the official configuration file and save it in my_path:

paddlex --get_pipeline_config PP-ShiTuV2 --save_path ./my_path

Once you have obtained the configuration file, you can customize the settings of the general image recognition pipeline. Simply set the pipeline parameter of the create_pipeline method to the path of your custom pipeline configuration file.

For example, if your custom configuration file is saved in ./my_path/PP-ShiTuV2.yaml, you just need to execute:

from paddlex import create_pipeline
pipeline = create_pipeline(pipeline="./my_path/PP-ShiTuV2.yaml")

output = pipeline.predict("./drink_dataset_v2.0/test_images/001.jpeg", index="drink_index")
for res in output:
    res.print()
    res.save_to_json("./output/")
    res.save_to_img("./output/")

Note: The parameters in the configuration file are the initialization parameters of the pipeline. If you wish to change the initialization parameters of the general image recognition pipeline, you can directly modify the parameters in the configuration file and load the configuration file for prediction.

2.2.3 Adding and Deleting Operations in the Index Library

If you wish to add more images to the index library, you can call the append_index method; to delete image features, you can call the remove_index method.

from paddlex import create_pipeline

pipeline = create_pipeline("PP-ShiTuV2")
# Build with "IVF" so that append_index()/remove_index() are supported
index_data = pipeline.build_index(gallery_imgs="drink_dataset_v2.0/", gallery_label="drink_dataset_v2.0/gallery.txt", index_type="IVF", metric_type="IP")
# Add more gallery images to the existing index
index_data = pipeline.append_index(gallery_imgs="drink_dataset_v2.0/", gallery_label="drink_dataset_v2.0/gallery.txt", index=index_data)
# Remove entries by ID
index_data = pipeline.remove_index(remove_ids="drink_dataset_v2.0/remove_ids.txt", index=index_data)
index_data.save("drink_index")

The parameters of the above method are described as follows:

| Parameter | Description | Type | Options | Default Value |
|---|---|---|---|---|
| gallery_imgs | Gallery images to be added. Required parameter. | str\|list | <ul><li><b>str</b>: Root directory of the images; the data organization is described in Section 2.3 Data Organization for Building the Index Library</li><li><b>List[numpy.ndarray]</b>: Gallery image data as a list of numpy arrays</li></ul> | None |
| gallery_label | Labels of the gallery images. Required parameter. | str\|list | <ul><li><b>str</b>: Path to the label file; the data organization is the same as when building the index library, see Section 2.3 Data Organization for Building the Index Library</li><li><b>List[str]</b>: Gallery image labels as a list of strings</li></ul> | None |
| metric_type | Feature similarity metric. Optional parameter. | str | <ul><li><b>"IP"</b>: Inner Product</li><li><b>"L2"</b>: Euclidean Distance</li></ul> | "IP" |
| index_type | Type of index. Optional parameter. | str | <ul><li><b>"HNSW32"</b>: Fast search speed and high accuracy, but does not support the remove_index() operation</li><li><b>"IVF"</b>: Fast search speed but relatively lower accuracy; supports append_index() and remove_index()</li><li><b>"Flat"</b>: Slower search speed but higher accuracy; supports append_index() and remove_index()</li></ul> | "HNSW32" |
| remove_ids | Indices to be removed. Only valid for remove_index. | str\|list | <ul><li><b>str</b>: Path to a txt file containing the IDs to be removed, one per line</li><li><b>List[int]</b>: List of IDs to be removed</li></ul> | None |
| index | Index library used for pipeline inference. | str\|paddlex.inference.components.retrieval.faiss.IndexData | <ul><li><b>str</b>: A directory containing the index library files vector.index and index_info.yaml</li><li><b>IndexData</b>: An object created by the build_index method</li></ul> | None |

Note: HNSW32 has compatibility issues on the Windows platform, which may prevent the index library from being built or loaded.
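If this affects you, a minimal sketch of a workaround (reusing the paths from above) is to build the index with one of the other index types and reference the saved index by its directory path afterwards:

    from paddlex import create_pipeline

    pipeline = create_pipeline("PP-ShiTuV2")

    # "Flat" (or "IVF") avoids the HNSW32 issue; both also support remove_index().
    index_data = pipeline.build_index(
        gallery_imgs="drink_dataset_v2.0/",
        gallery_label="drink_dataset_v2.0/gallery.txt",
        index_type="Flat",
    )
    index_data.save("drink_index_flat")

    # The saved index directory can be passed to predict() in place of the IndexData object.
    output = pipeline.predict("./drink_dataset_v2.0/test_images/001.jpeg", index="drink_index_flat")
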

2.3 Data Organization for Building the Index Library

The general image recognition pipeline of PaddleX requires a pre-built index library for feature retrieval. If you wish to build an index library from your private data, organize the data as follows:

    data_root             # Root directory of the dataset; the directory name can be changed
    ├── images            # Directory where the images are stored; the directory name can be changed
    │   │   ...
    └── gallery.txt       # Annotation file of the index library dataset; the file name can be changed. Each line gives the path of a gallery image and its label, separated by a space, e.g.: "0/0.jpg 脉动"
    
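For reference, a hypothetical gallery.txt with a few entries might look like this (the paths follow the format of the example above; the labels are arbitrary):

    0/0.jpg 脉动
    0/1.jpg 脉动
    1/0.jpg 红牛-强化型
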

3. Development Integration/Deployment

If the general image recognition pipeline meets your requirements for inference speed and accuracy, you can proceed directly with development integration/deployment.

If you need to apply the general image recognition pipeline directly in your Python project, you can refer to the example code in 2.2.2 Python Script Integration.

Additionally, PaddleX provides three other deployment methods, detailed as follows:

🚀 High-Performance Inference: In actual production environments, many applications have stringent performance requirements for deployment strategies (especially response speed) to ensure efficient system operation and a smooth user experience. To this end, PaddleX offers a high-performance inference plugin that deeply optimizes model inference and pre/post-processing to significantly speed up the end-to-end process. For detailed high-performance inference procedures, please refer to the PaddleX High-Performance Inference Guide.

☁️ Service Deployment: Service deployment is a common form of deployment in production environments. By encapsulating inference as a service, clients can access the service via network requests to obtain inference results. PaddleX supports multiple pipeline service deployment schemes. For detailed pipeline service deployment procedures, please refer to the PaddleX Service Deployment Guide.

Below are the API reference for basic service deployment and multi-language service call examples:

API Reference

For the main operations provided by the service:

• The HTTP request method is POST.
• Both the request body and the response body are JSON data (JSON objects).
• When the request is processed successfully, the response status code is 200, and the properties of the response body are as follows:

| Name | Type | Meaning |
|---|---|---|
| logId | string | The UUID of the request. |
| errorCode | integer | Error code. Fixed at 0. |
| errorMsg | string | Error description. Fixed at "Success". |
| result | object | Operation result. |

• When the request is not processed successfully, the properties of the response body are as follows:

| Name | Type | Meaning |
|---|---|---|
| logId | string | The UUID of the request. |
| errorCode | integer | Error code. Same as the response status code. |
| errorMsg | string | Error description. |
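For illustration, a successful response body might therefore look like the following (the field values are hypothetical, and result holds the operation-specific payload described below):

    {
        "logId": "123e4567-e89b-12d3-a456-426614174000",
        "errorCode": 0,
        "errorMsg": "Success",
        "result": {}
    }
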

The main operations provided by the service are as follows:

• buildIndex

Build a feature vector index.

POST /shitu-index-build

• The properties of the request body are as follows:

| Name | Type | Meaning | Required |
|---|---|---|---|
| imageLabelPairs | array | Image-label pairs used to build the index. | Yes |

Each element in imageLabelPairs is an object with the following properties:

| Name | Type | Meaning |
|---|---|---|
| image | string | The URL of an image file accessible by the server, or the Base64-encoded content of the image file. |
| label | string | The label. |

• When the request is processed successfully, the result in the response body has the following properties:

| Name | Type | Description |
|---|---|---|
| indexKey | string | The key corresponding to the index, used to identify the created index. It can be used as input for other operations. |
| idMap | object | Mapping from vector IDs to labels. |
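For example, a buildIndex request body might look like this (the URLs are hypothetical; image may also carry Base64-encoded file content):

    {
        "imageLabelPairs": [
            {"image": "https://example.com/demo0.jpg", "label": "Rabbit"},
            {"image": "https://example.com/demo1.jpg", "label": "Dog"}
        ]
    }

and a corresponding result (key and IDs are hypothetical):

    {
        "indexKey": "index-0001",
        "idMap": {"0": "Rabbit", "1": "Dog"}
    }
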
• addImagesToIndex

Add images (corresponding feature vectors) to the index.

POST /shitu-index-add

• The properties of the request body are as follows:

| Name | Type | Description | Required |
|---|---|---|---|
| imageLabelPairs | array | Image-label pairs used to build the index. | Yes |
| indexKey | string | The key corresponding to the index. Provided by the buildIndex operation. | No |

Each element in imageLabelPairs is an object with the following properties:

| Name | Type | Description |
|---|---|---|
| image | string | The URL of an image file accessible by the server, or the Base64-encoded content of the image file. |
| label | string | The label. |

• When the request is processed successfully, the result in the response body has the following properties:

| Name | Type | Description |
|---|---|---|
| idMap | object | Mapping from vector IDs to labels. |
• removeImagesFromIndex

Remove images (corresponding feature vectors) from the index.

POST /shitu-index-remove

• The properties of the request body are as follows:

| Name | Type | Description | Required |
|---|---|---|---|
| ids | array | The IDs of the vectors to be removed from the index. | Yes |
| indexKey | string | The key corresponding to the index. Provided by the buildIndex operation. | No |

• When the request is processed successfully, the result in the response body has the following properties:

| Name | Type | Description |
|---|---|---|
| idMap | object | Mapping from vector IDs to labels. |
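For example, a removeImagesFromIndex request body might look like this (the ID and key are hypothetical):

    {
        "ids": [1],
        "indexKey": "index-0001"
    }
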
• infer

Perform image recognition.

POST /shitu-infer

• The properties of the request body are as follows:

| Name | Type | Description | Required |
|---|---|---|---|
| image | string | The URL of an image file accessible by the server, or the Base64-encoded content of the image file. | Yes |
| indexKey | string | The key corresponding to the index. Provided by the buildIndex operation. | No |

• When the request is processed successfully, the result in the response body has the following properties:

| Name | Type | Description |
|---|---|---|
| detectedObjects | array | Information about detected objects. |
| image | string | The recognition result image, in JPEG format and Base64-encoded. |

Each element in detectedObjects is an object with the following properties:

| Name | Type | Description |
|---|---|---|
| bbox | array | The location of the object. The elements of the array are, in order, the x-coordinate of the top-left corner, the y-coordinate of the top-left corner, the x-coordinate of the bottom-right corner, and the y-coordinate of the bottom-right corner. |
| recResults | array | Recognition results. |
| score | number | The detection score. |

Each element in recResults is an object with the following properties:

| Name | Type | Description |
|---|---|---|
| label | string | The label. |
| score | number | The recognition score. |
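Putting these fields together, a hypothetical, abbreviated infer result might look like this (the Base64 image string is truncated here for readability):

    {
        "detectedObjects": [
            {
                "bbox": [343.8, 99.0, 528.0, 593.4],
                "recResults": [
                    {"label": "红牛-强化型", "score": 0.72},
                    {"label": "红牛-强化型", "score": 0.70}
                ],
                "score": 0.61
            }
        ],
        "image": "/9j/4AAQSkZJRg..."
    }
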

Multi-language Service Call Example

Python

    import base64
    import pprint
    import sys

    import requests

    API_BASE_URL = "http://0.0.0.0:8080"

    base_image_label_pairs = [
        {"image": "./demo0.jpg", "label": "Rabbit"},
        {"image": "./demo1.jpg", "label": "Rabbit"},
        {"image": "./demo2.jpg", "label": "Dog"},
    ]
    image_label_pairs_to_add = [
        {"image": "./demo3.jpg", "label": "Dog"},
    ]
    ids_to_remove = [1]
    infer_image_path = "./demo4.jpg"
    output_image_path = "./out.jpg"

    # Encode the gallery images as Base64 strings
    for pair in base_image_label_pairs:
        with open(pair["image"], "rb") as file:
            image_bytes = file.read()
            image_data = base64.b64encode(image_bytes).decode("ascii")
        pair["image"] = image_data

    # Build the index
    payload = {"imageLabelPairs": base_image_label_pairs}
    resp_index_build = requests.post(f"{API_BASE_URL}/shitu-index-build", json=payload)
    if resp_index_build.status_code != 200:
        print(f"Request to shitu-index-build failed with status code {resp_index_build.status_code}.")
        pprint.pp(resp_index_build.json())
        sys.exit(1)
    result_index_build = resp_index_build.json()["result"]
    print(f"Number of images indexed: {len(result_index_build['idMap'])}")

    # Add more images to the index
    for pair in image_label_pairs_to_add:
        with open(pair["image"], "rb") as file:
            image_bytes = file.read()
            image_data = base64.b64encode(image_bytes).decode("ascii")
        pair["image"] = image_data

    payload = {"imageLabelPairs": image_label_pairs_to_add, "indexKey": result_index_build["indexKey"]}
    resp_index_add = requests.post(f"{API_BASE_URL}/shitu-index-add", json=payload)
    if resp_index_add.status_code != 200:
        print(f"Request to shitu-index-add failed with status code {resp_index_add.status_code}.")
        pprint.pp(resp_index_add.json())
        sys.exit(1)
    result_index_add = resp_index_add.json()["result"]
    print(f"Number of images indexed: {len(result_index_add['idMap'])}")

    # Remove images from the index
    payload = {"ids": ids_to_remove, "indexKey": result_index_build["indexKey"]}
    resp_index_remove = requests.post(f"{API_BASE_URL}/shitu-index-remove", json=payload)
    if resp_index_remove.status_code != 200:
        print(f"Request to shitu-index-remove failed with status code {resp_index_remove.status_code}.")
        pprint.pp(resp_index_remove.json())
        sys.exit(1)
    result_index_remove = resp_index_remove.json()["result"]
    print(f"Number of images indexed: {len(result_index_remove['idMap'])}")

    # Run recognition against the index
    with open(infer_image_path, "rb") as file:
        image_bytes = file.read()
        image_data = base64.b64encode(image_bytes).decode("ascii")

    payload = {"image": image_data, "indexKey": result_index_build["indexKey"]}
    resp_infer = requests.post(f"{API_BASE_URL}/shitu-infer", json=payload)
    if resp_infer.status_code != 200:
        print(f"Request to shitu-infer failed with status code {resp_infer.status_code}.")
        pprint.pp(resp_infer.json())
        sys.exit(1)
    result_infer = resp_infer.json()["result"]

    # Save the visualized result image and print the detected objects
    with open(output_image_path, "wb") as file:
        file.write(base64.b64decode(result_infer["image"]))

    print(f"Output image saved at {output_image_path}")
    print("\nDetected objects:")
    pprint.pp(result_infer["detectedObjects"])

📱 Edge Deployment: Edge deployment is a method where computation and data processing are performed on the user's device itself, allowing the device to process data directly without relying on remote servers. PaddleX supports deploying models on edge devices such as Android. For detailed edge deployment procedures, please refer to the PaddleX Edge Deployment Guide.

You can choose the appropriate method to deploy the model pipeline based on your needs for subsequent AI application integration.

4. Secondary Development

If the default model weights provided by the general image recognition pipeline do not meet your accuracy or speed requirements, you can try to further fine-tune the existing models using data from your own domain or application scenario to improve the recognition performance of the pipeline in your scenario.

4.1 Model Fine-Tuning

Since the general image recognition pipeline includes two modules (the mainbody detection module and the image feature module), suboptimal performance may come from either module.

You can analyze the images with poor recognition results. If you find that many mainbody targets are not detected, the mainbody detection model may be inadequate; in that case, refer to the Secondary Development section of the Mainbody Detection Module Development Tutorial and fine-tune the mainbody detection model using your private dataset. If the detected mainbodies are matched incorrectly, the image feature model needs further improvement; in that case, refer to the Secondary Development section of the Image Feature Module Development Tutorial and fine-tune the image feature model.

4.2 Model Application

After completing fine-tuning with your private dataset, you will obtain a local model weight file.

To use the fine-tuned model weights, simply modify the pipeline configuration file by replacing the corresponding model directory with the local path of the fine-tuned model weights:

    
    ...

    SubModules:
      Detection:
        module_name: text_detection
        model_name: PP-ShiTuV2_det
        model_dir: null # Can be modified to the local path of the fine-tuned mainbody detection model
        batch_size: 1
      Recognition:
        module_name: text_recognition
        model_name: PP-ShiTuV2_rec
        model_dir: null # Can be modified to the local path of the fine-tuned image feature model
        batch_size: 1
    

Subsequently, refer to the Python script method in 2.2 Local Experience to load the modified pipeline configuration file.

5. Multi-Hardware Support

PaddleX supports a variety of mainstream hardware devices, such as NVIDIA GPU, Kunlunxin XPU, Ascend NPU, and Cambricon MLU. Simply modify the --device parameter to switch seamlessly between different hardware.

For example, when running the general image recognition pipeline with Python, to change the running device from an NVIDIA GPU to an Ascend NPU, you only need to set device in the script to npu:

    from paddlex import create_pipeline

    pipeline = create_pipeline(
        pipeline="PP-ShiTuV2",
        device="npu:0",  # gpu:0 --> npu:0
    )
    

If you want to use the general image recognition pipeline on more types of hardware, please refer to the PaddleX Multi-Hardware Usage Guide.