| Pipeline Name | Pipeline Modules | Baidu AIStudio Community Experience Link | Pipeline Introduction | Applicable Scenarios |
|-|-|-|-|-|
| General Image Classification | Image Classification | Online Experience | Image classification is a technique that assigns images to predefined categories. It is widely used in object recognition, scene understanding, and automatic annotation. Image classification can identify various objects, such as animals, plants, and traffic signs, and categorize them based on their features. By leveraging deep learning models, it can automatically extract image features and classify images accurately. The General Image Classification Pipeline is designed to solve image classification tasks for given images. | Automatic classification and recognition of product images, real-time monitoring of defective products on production lines, personnel recognition in security surveillance |
| General Object Detection | Object Detection | Online Experience | Object detection aims to identify the categories and locations of multiple objects in images or videos, marking each object with a bounding box. Unlike image classification, object detection not only recognizes which objects are in the image, such as people, cars, and animals, but also determines the location of each object, usually represented by a rectangular box. This technology is widely used in autonomous driving, surveillance systems, and smart photo albums. It relies on deep learning models (e.g., YOLO, Faster R-CNN) that efficiently extract features and perform real-time detection, significantly enhancing a computer's ability to understand image content. | Tracking moving objects in video surveillance, vehicle detection in autonomous driving, defect detection in industrial manufacturing, shelf product detection in retail |
| General Semantic Segmentation | Semantic Segmentation | Online Experience | Semantic segmentation is a computer vision technique that assigns each pixel in an image to a specific category, enabling a detailed understanding of image content. It not only identifies the types of objects in an image but also classifies each pixel, so that entire regions of the same category are marked. For example, in a street scene, semantic segmentation can distinguish pedestrians, cars, sky, and roads at the pixel level, producing a detailed label map. This technology is widely used in autonomous driving, medical image analysis, and human-computer interaction. It often relies on deep learning models (e.g., FCN, U-Net) that use Convolutional Neural Networks (CNNs) to extract features and achieve high-precision pixel-level classification, providing a foundation for further intelligent analysis. | Analysis of satellite images in Geographic Information Systems, segmentation of obstacles and passable areas in robot vision, separation of foreground and background in film production |
| General Instance Segmentation | Instance Segmentation | Online Experience | Instance segmentation is a computer vision task that identifies object categories in images and distinguishes the pixels of different instances within the same category, enabling precise segmentation of each object. It can mark each car, person, or animal in an image separately, so that each one is processed independently at the pixel level. For example, in a street scene with multiple cars and pedestrians, instance segmentation can separate the contour of each car and each person, producing multiple independent region labels. This technology is widely used in autonomous driving, video surveillance, and robot vision. It often relies on deep learning models (e.g., Mask R-CNN) that use CNNs for efficient pixel classification and instance differentiation, providing strong support for understanding complex scenes. | Crowd counting in malls, counting crops or fruits in agricultural intelligence, selecting and segmenting specific objects in image editing |
| General OCR | Text Detection, Text Recognition | Online Experience | OCR (Optical Character Recognition) is a technology that converts text in images into editable text. It is widely used in document digitization, information extraction, and data processing. OCR can recognize printed text, handwritten text, and even certain types of fonts and symbols. The General OCR Pipeline is designed to solve text recognition tasks, extracting text information from images and outputting it in text form. PP-OCRv4 is an end-to-end OCR system that achieves millisecond-level text prediction on CPUs and state-of-the-art (SOTA) performance in general scenarios. Based on this project, developers from academia, industry, and research have quickly implemented various OCR applications covering general, manufacturing, finance, transportation,
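
The semantic and instance segmentation rows in the table above differ mainly in what they output. The following minimal pure-Python sketch illustrates that difference on hypothetical toy data: semantic segmentation reduces per-class scores to a per-pixel label map (an argmax over classes), while instance segmentation keeps one mask per object, which is what makes per-object counting (as in the crowd-counting scenario) possible. The data layout and the function names `semantic_label_map` and `count_instances` are illustrative assumptions, not the PaddleX API or its output format.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical toy layout (not PaddleX's output format):
# Scores[c][y][x] is a model's score for class c at pixel (y, x).
Scores = List[List[List[float]]]
# One instance = (class_id, boolean mask with the same height/width as the image).
Instance = Tuple[int, List[List[bool]]]


def semantic_label_map(scores: Scores) -> List[List[int]]:
    """Semantic segmentation: give every pixel the class with the highest score."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


def count_instances(instances: List[Instance]) -> Counter:
    """Instance segmentation keeps one mask per object, so objects of the same
    class stay separate and can be counted; empty masks are ignored."""
    return Counter(cls for cls, mask in instances if any(any(row) for row in mask))


if __name__ == "__main__":
    # Two classes (0 = background, 1 = person) on a 2x2 image.
    scores = [
        [[0.9, 0.2], [0.8, 0.1]],  # class 0 scores
        [[0.1, 0.8], [0.2, 0.9]],  # class 1 scores
    ]
    # Both people fall into one connected class-1 region in the label map.
    print(semantic_label_map(scores))  # [[0, 1], [0, 1]]

    # The same two people represented as separate instances.
    instances = [
        (1, [[False, True], [False, False]]),
        (1, [[False, False], [False, True]]),
    ]
    print(count_instances(instances))  # Counter({1: 2})
```

In the label map, the two adjacent people are indistinguishable class-1 pixels; only the per-instance masks preserve the fact that there are two of them.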
| Pipeline Name | Pipeline Modules | Baidu AIStudio Community Experience Link | Pipeline Introduction | Applicable Scenarios |
|-|-|-|-|-|
| Semi-supervised Learning for Large Models - Image Classification | Semi-supervised Learning for Large Models - Image Classification | Online Experience | Image classification is a technique that assigns images to predefined categories. It is widely used in object recognition, scene understanding, and automatic annotation. Image classification can identify various objects, such as animals, plants, and traffic signs, and categorize them based on their features. By leveraging deep learning models, it can automatically extract image features and classify images accurately. The general image classification pipeline is designed to solve image classification tasks for given images. | When training data is insufficient: product image classification, artwork style classification, crop disease and pest identification, animal species recognition, and classification of land, water bodies, and buildings in satellite remote sensing images |
| Semi-supervised Learning for Large Models - Object Detection | Semi-supervised Learning for Large Models - Object Detection | Online Experience | The semi-supervised learning for large models - object detection pipeline is a unique offering from PaddlePaddle. It jointly trains a large model and a small model, using a small amount of labeled data and a large amount of unlabeled data to improve model accuracy, which significantly reduces the cost of manual model iteration and data annotation. The figure below shows the performance of this pipeline on the COCO dataset with 10% labeled data. When trained with this pipeline on COCO with 10% labeled and 90% unlabeled data, the large model (RT-DETR-H) improves accuracy by 8.4 percentage points (47.7% -> 56.1%), setting a new state of the art (SOTA) for this setting. The small model (PicoDet-S) improves by more than 10 percentage points (18.3% -> 28.8%) compared with direct training. | When training data is insufficient: pedestrian, vehicle, and traffic sign detection in autonomous driving, enemy facility and equipment detection in military reconnaissance, and seabed organism detection in deep-sea exploration |
| Semi-supervised Learning for Large Models - OCR | Text Detection & Large Model Semi-supervised Learning - Text Recognition | Online Experience | The semi-supervised learning for large models - OCR pipeline is a unique OCR training pipeline from PaddlePaddle. It chains a text detection model and a text recognition model in series: the text detection model first locates and rectifies all text line bounding boxes in the input image, and these are then fed into the text recognition model to produce the OCR text. For text recognition, a large model and a small model are trained jointly on a small amount of labeled data and a large amount of unlabeled data to improve accuracy, which significantly reduces the cost of manual model iteration and data annotation. The figure below shows the effect of this pipeline in two OCR application scenarios, with significant improvements for both the large and the small model in each. | When training data is insufficient: digitizing paper documents, reading and verifying personal information on ID cards, passports, and driver's licenses, and recognizing product information in retail |
| General Scene Information Extraction v2 | Text Detection & Text Recognition | Online Experience | The General Scene Information Extraction Pipeline (PP-ChatOCRv2-common) is a unique intelligent analysis solution for complex documents from PaddlePaddle. It combines Large Language Models (LLMs) with OCR technology and leverages the Wenxin Large Model, which integrates massive data and knowledge, to achieve high accuracy and wide applicability. The PP-ChatOCRv2-common workflow is as follows: the input image is sent to the general OCR system, where the text detection and text recognition models predict its text; vector retrieval between the predicted text and the user's query then selects the relevant text; finally, a prompt generator assembles this text into a prompt for the Wenxin Large Model, which generates the prediction result. | Key information extraction in scenarios such as ID cards, bank cards, household registration books, train tickets, and paper invoices |
| Document Scene Information Extraction v2 | Layout Analysis, Text Detection, Text Recognition, Table Recognition | [Online Experience](https://aistudio.baidu.com/community