update npu pipeline list (#3904)

a31413510 6 months ago
parent
commit
735f380d78

+ 254 - 36
docs/support_list/pipelines_list_npu.en.md

@@ -10,12 +10,12 @@ comments: true
   <tr>
     <th width="10%">Pipeline Name</th>
     <th width="10%">Pipeline Modules</th>
-    <th width="10%">Baidu AIStudio Community Experience URL</th>
+    <th width="10%">Baidu AI Studio Community Experience URL</th>
     <th width="50%">Pipeline Introduction</th>
     <th width="20%">Applicable Scenarios</th>
   </tr>
   <tr>
-    <td>General Image Classification</td>
+    <td>Image Classification</td>
     <td>Image Classification</td>
     <td><a href="https://aistudio.baidu.com/community/app/100061/webUI">Online Experience</a></td>
     <td>Image classification is a technique that assigns images to predefined categories. It is widely used in object recognition, scene understanding, and automatic annotation. Image classification can identify various objects such as animals, plants, traffic signs, etc., and categorize them based on their features. By leveraging deep learning models, image classification can automatically extract image features and perform accurate classification. The General Image Classification Pipeline is designed to solve image classification tasks for given images.</td>
@@ -28,7 +28,7 @@ comments: true
     </td>
   </tr>
   <tr>
-    <td>General Object Detection</td>
+    <td>Object Detection</td>
     <td>Object Detection</td>
     <td><a href="https://aistudio.baidu.com/community/app/70230/webUI">Online Experience</a></td>
     <td>Object detection aims to identify the categories and locations of multiple objects in images or videos by generating bounding boxes to mark these objects. Unlike simple image classification, object detection not only recognizes what objects are in the image, such as people, cars, and animals, but also accurately determines the specific location of each object, usually represented by a rectangular box. This technology is widely used in autonomous driving, surveillance systems, and smart photo albums, relying on deep learning models (e.g., YOLO, Faster R-CNN) that efficiently extract features and perform real-time detection, significantly enhancing the computer's ability to understand image content.</td>
@@ -42,7 +42,7 @@ comments: true
     </td>
   </tr>
   <tr>
-    <td>General Semantic Segmentation</td>
+    <td>Semantic Segmentation</td>
     <td>Semantic Segmentation</td>
     <td><a href="https://aistudio.baidu.com/community/app/100062/webUI?source=appCenter">Online Experience</a></td>
     <td>Semantic segmentation is a computer vision technique that assigns each pixel in an image to a specific category, enabling detailed understanding of image content. Semantic segmentation not only identifies the types of objects in an image but also classifies each pixel, allowing entire regions of the same category to be marked. For example, in a street scene image, semantic segmentation can distinguish pedestrians, cars, sky, and roads at the pixel level, forming a detailed label map. This technology is widely used in autonomous driving, medical image analysis, and human-computer interaction, often relying on deep learning models (e.g., FCN, U-Net) that use Convolutional Neural Networks (CNNs) to extract features and achieve high-precision pixel-level classification, providing a foundation for further intelligent analysis.</td>
@@ -55,7 +55,7 @@ comments: true
     </td>
   </tr>
   <tr>
-    <td>General Instance Segmentation</td>
+    <td>Instance Segmentation</td>
     <td>Instance Segmentation</td>
     <td><a href="https://aistudio.baidu.com/community/app/100063/webUI">Online Experience</a></td>
     <td>Instance segmentation is a computer vision task that identifies object categories in images and distinguishes the pixels of different instances within the same category, enabling precise segmentation of each object. Instance segmentation can separately mark each car, person, or animal in an image, ensuring they are processed independently at the pixel level. For example, in a street scene image with multiple cars and pedestrians, instance segmentation can clearly separate the contours of each car and person, forming multiple independent region labels. This technology is widely used in autonomous driving, video surveillance, and robot vision, often relying on deep learning models (e.g., Mask R-CNN) that use CNNs for efficient pixel classification and instance differentiation, providing powerful support for understanding complex scenes.</td>
@@ -133,43 +133,59 @@ comments: true
       <td>Document-based Vision-Language Model</td>
   </tr>
   <tr>
-    <td rowspan = 2>General OCR</td>
-    <td >Text Detection</td>
-    <td rowspan = 2><a href="https://aistudio.baidu.com/community/app/91660/webUI?source=appMineRecent">Online Experience</a></td>
-    <td rowspan = 2>OCR (Optical Character Recognition) is a technology that converts text in images into editable text. It is widely used in document digitization, information extraction, and data processing. OCR can recognize printed text, handwritten text, and even certain types of fonts and symbols. The General OCR Pipeline is designed to solve text recognition tasks, extracting text information from images and outputting it in text form. PP-OCRv4 is an end-to-end OCR system that achieves millisecond-level text content prediction on CPUs, achieving state-of-the-art (SOTA) performance in general scenarios. Based on this project, developers from academia, industry, and research have quickly implemented various OCR applications covering general, manufacturing, finance, transportation.</td>
-    <td rowspan = 2>
-      <ul>
-        <li>Document digitization</li>
-        <li>Information extraction</li>
-        <li>Data processing</li>
-      </ul>
+    <td rowspan="5">General OCR</td>
+    <td>Text Detection</td>
+    <td rowspan="5"><a href="https://aistudio.baidu.com/community/app/91660/webUI?source=appMineRecent">Online Experience</a></td>
+    <td rowspan="5">OCR (Optical Character Recognition) is a technology that converts text in images into editable text. It is widely used in document digitization, information extraction, and data processing. OCR can recognize printed text, handwritten text, and even certain types of fonts and symbols. General OCR is used to solve text recognition tasks, extracting text information from images and outputting it in text form. PP-OCRv4 is an end-to-end OCR system that can achieve millisecond-level accurate text prediction on CPUs, reaching open-source SOTA in general scenarios. Based on this project, many developers from academia, industry, and research have quickly implemented multiple OCR applications, covering various fields such as general, manufacturing, finance, and transportation.</td>
+    <td rowspan="5">
+    <ul>
+        <li>License plate recognition in intelligent security</li>
+        <li>Recognition of house numbers and other information</li>
+        <li>Digitization of paper documents</li>
+        <li>Recognition of ancient characters in cultural heritage</li>
+    </ul>
     </td>
-  </tr>
-    <tr>
+</tr>
+<tr>
     <td>Text Recognition</td>
-  </tr>
-  <tr>
-        <td rowspan = 4>General Table Recognition</td>
-        <td>Layout Detection</td>
-        <td rowspan = 4><a href="https://aistudio.baidu.com/community/app/91661/webUI">Online Experience</a></td>
-        <td rowspan = 4>Table recognition is a technology that automatically identifies and extracts table content and its structure from documents or images. It is widely used in data entry, information retrieval, and document analysis. By leveraging computer vision and machine learning algorithms, table recognition can convert complex table information into editable formats, facilitating further data processing and analysis by users</td>
-<td rowspan = 4>
+</tr>
+<tr>
+    <td>Document Image Orientation Classification </td>
+</tr>
+<tr>
+    <td>Text Image Unwarping </td>
+</tr>
+<tr>
+    <td>Text Line Orientation Classification </td>
+</tr>
+<tr>
+    <td rowspan="6">General Table Recognition</td>
+    <td>Table Structure Recognition</td>
+    <td rowspan="6"><a href="https://aistudio.baidu.com/community/app/91661/webUI">Online Experience</a></td>
+    <td rowspan="6">Table recognition is a technology that automatically identifies and extracts table content and structure from documents or images. It is widely used in data entry, information retrieval, and document analysis. By using computer vision and machine learning algorithms, table recognition can convert complex table information into an editable format, facilitating further processing and analysis by users.</td>
+    <td rowspan="6">
     <ul>
         <li>Processing of bank statements</li>
-        <li>recognition and extraction of various indicators in medical reports</li>
-        <li>extraction of tabular information from contracts</li>
-      </ul>
-      </td>
-   </tr>
-  <tr>
-    <td>Table Structure Recognition </td>
-  </tr>
-  <tr>
+        <li>Recognition and extraction of indicators in medical reports</li>
+        <li>Extraction of table information in contracts</li>
+    </ul>
+    </td>
+</tr>
+<tr>
     <td>Text Detection</td>
-  </tr>
-  <tr>
+</tr>
+<tr>
     <td>Text Recognition</td>
-  </tr>
+</tr>
+<tr>
+    <td>Layout Detection </td>
+</tr>
+<tr>
+    <td>Document Image Orientation Classification</td>
+</tr>
+<tr>
+    <td>Text Image Unwarping</td>
+</tr>
     <tr>
         <td>Time Series Forecasting</td>
         <td>Time Series Forecasting Module</td>
@@ -219,6 +235,208 @@ comments: true
         <li>Equipment Operating Condition Classification</li>
       </ul>
       </td>
+  </tr>
+<tr>
+    <td>Multi-label Image Classification</td>
+    <td>Multi-label Image Classification</td>
+    <td><a href="https://aistudio.baidu.com/community/app/387974/webUI?source=appCenter">Online Experience</a></td>
+    <td>Multi-label image classification is a technology that assigns an image to multiple related categories simultaneously. It is widely used in image tagging, content recommendation, and social media analysis. It can identify multiple objects or features present in an image, such as assigning both the "dog" and "outdoor" labels to a single picture. By using deep learning models, multi-label image classification can automatically extract image features and classify them accurately, giving users more comprehensive information. This technology is significant in applications such as intelligent search engines and automatic content generation.</td>
+    <td>
+    <ul>
+        <li>Medical image diagnosis</li>
+        <li>Complex scene recognition</li>
+        <li>Multi-target monitoring</li>
+        <li>Product attribute recognition</li>
+        <li>Ecological environment monitoring</li>
+        <li>Security monitoring</li>
+        <li>Disaster warning</li>
+      </ul>
+      </td>
+  </tr>
+  <tr>
+    <td>Small Object Detection</td>
+    <td>Small Object Detection</td>
+    <td><a href="https://aistudio.baidu.com/community/app/387975/webUI?source=appCenter">Online Experience</a></td>
+    <td>Small object detection is a technology specifically for identifying small objects in images. It is widely used in surveillance, autonomous driving, and satellite image analysis. It can accurately find and classify small-sized objects like pedestrians, traffic signs, or small animals in complex scenes. By using deep learning algorithms and optimized convolutional neural networks, small object detection can effectively enhance the recognition ability of small objects, ensuring that important information is not missed in practical applications. This technology plays an important role in improving safety and automation levels.</td>
+    <td>
+  <ul>
+    <li>Pedestrian detection in autonomous vehicles</li>
+    <li>Identification of small buildings in satellite images</li>
+    <li>Detection of small traffic signs in intelligent transportation systems</li>
+    <li>Identification of small intruding objects in security surveillance</li>
+    <li>Detection of small defects in industrial inspection</li>
+    <li>Monitoring of small animals in drone images</li>
+  </ul>
+</td>
+  </tr>
+  <tr>
+    <td>Image Anomaly Detection</td>
+    <td>Image Anomaly Detection</td>
+    <td>None</td>
+    <td>Image anomaly detection is a technology that identifies images that deviate from or do not conform to normal patterns by analyzing their content. It is widely used in industrial quality inspection, medical image analysis, and security surveillance. By using machine learning and deep learning algorithms, image anomaly detection can automatically identify potential defects, anomalies, or abnormal behavior in images, helping us detect problems and take appropriate measures promptly. Image anomaly detection systems are designed to automatically detect and label abnormal situations in images to improve work efficiency and accuracy.</td>
+    <td>
+    <ul>
+    <li>Industrial quality control</li>
+    <li>Medical image analysis</li>
+    <li>Anomaly detection in surveillance videos</li>
+    <li>Identification of violations in traffic monitoring</li>
+    <li>Obstacle detection in autonomous driving</li>
+    <li>Agricultural pest and disease monitoring</li>
+    <li>Pollutant identification in environmental monitoring</li>
+  </ul></td>
+  </tr>
+  <tr>
+    <td rowspan="10">General Layout Parsing</td>
+    <td>Layout Detection</td>
+    <td rowspan="10">None</td>
+    <td rowspan="10">Layout parsing is a technology that extracts structured information from document images, primarily used to convert complex document layouts into machine-readable data formats. This technology is widely applied in document management, information extraction, and data digitization. By combining Optical Character Recognition (OCR), image processing, and machine learning algorithms, layout parsing can identify and extract text blocks, headings, paragraphs, images, tables, and other layout elements from documents. The process typically includes three main steps: layout analysis, element analysis, and data formatting, ultimately generating structured document data to enhance the efficiency and accuracy of data processing.</td>
+    <td rowspan="10">
+        <ul>
+            <li>Analysis of financial and legal documents</li>
+            <li>Digitization of historical documents and archives</li>
+            <li>Automated form filling</li>
+            <li>Page structure parsing</li>
+        </ul>
+    </td>
+</tr>
+<tr>
+    <td>Layout Detection Module</td>
+</tr>
+<tr>
+    <td>Text Detection Module</td>
+</tr>
+<tr>
+    <td>Text Recognition Module</td>
+</tr>
+<tr>
+    <td>Document Image Orientation Classification</td>
+</tr>
+<tr>
+    <td>Text Image Unwarping</td>
+</tr>
+<tr>
+    <td>Table Structure Recognition</td>
+</tr>
+<tr>
+    <td>Text Line Orientation Classification</td>
+</tr>
+<tr>
+    <td>Formula Recognition</td>
+</tr>
+<tr>
+    <td>Seal Text Detection</td>
+</tr>
+<tr>
+    <td rowspan="4">Formula Recognition</td>
+    <td>Formula Recognition</td>
+    <td rowspan="4"><a href="https://aistudio.baidu.com/community/app/387976/webUI?source=appCenter">Online Experience</a></td>
+    <td rowspan="4">Formula recognition is a technology that automatically identifies and extracts LaTeX formula content and structure from documents or images. It is widely used in document editing and data analysis in fields such as mathematics, physics, and computer science. By using computer vision and machine learning algorithms, formula recognition can convert complex mathematical formula information into editable LaTeX format, facilitating further processing and analysis by users.</td>
+    <td rowspan="4">
+        <ul>
+            <li>Document digitization and retrieval</li>
+            <li>Formula search engine</li>
+            <li>Formula editor</li>
+            <li>Automated typesetting</li>
+        </ul>
+    </td>
+</tr>
+<tr>
+    <td>Layout Detection Module </td>
+</tr>
+<tr>
+    <td>Document Image Orientation Classification</td>
+</tr>
+<tr>
+    <td>Text Image Unwarping</td>
+</tr>
+<tr>
+    <td rowspan="5">Seal Text Recognition</td>
+    <td>Seal Text Detection</td>
+    <td rowspan="5"><a href="https://aistudio.baidu.com/community/app/387977/webUI?source=appCenter">Online Experience</a></td>
+    <td rowspan="5">Seal text recognition is a technology that automatically extracts and identifies seal content from documents or images. It is part of document processing and is useful in scenarios such as contract comparison, inventory audits, and invoice reimbursement review.</td>
+    <td rowspan="5">
+        <ul>
+            <li>Contract and agreement verification</li>
+            <li>Check processing</li>
+            <li>Loan approval</li>
+            <li>Legal document management</li>
+        </ul>
+    </td>
+</tr>
+<tr>
+    <td>Text Recognition</td>
+</tr>
+<tr>
+    <td>Layout Detection </td>
+</tr>
+<tr>
+    <td>Document Image Orientation Classification</td>
+</tr>
+<tr>
+    <td>Text Image Unwarping</td>
+</tr>
+<tr>
+    <td rowspan = 2>General Image Recognition</td>
+    <td>Mainbody Detection</td>
+    <td rowspan = 2>None</td>
+    <td rowspan = 2>The general image recognition pipeline is designed to address open-domain target localization and recognition issues. It can effectively identify and differentiate various target objects in different environments and conditions, making it widely applicable in autonomous driving, intelligent security, medical image analysis, and industrial automation, among other fields.</td>
+    <td rowspan = 2>
+    <ul>
+        <li>Automated Identity Verification</li>
+        <li>Unmanned Retail</li>
+        <li>Autonomous Driving</li>
+      </ul>
+      </td>
+  </tr>
+  <tr>
+    <td>Image Features</td>
+  </tr>
+  <tr>
+    <td rowspan = 2>Pedestrian Attribute Recognition</td>
+    <td>Pedestrian Detection</td>
+    <td rowspan = 2>None</td>
+    <td rowspan = 2>Pedestrian attribute recognition is a key function in computer vision systems used to locate and tag specific features of pedestrians in images or videos, such as gender, age, clothing color, and style.</td>
+    <td rowspan = 2>
+    <ul>
+        <li>Smart City</li>
+        <li>Security Monitoring</li>
+      </ul>
+      </td>
+  </tr>
+  <tr>
+    <td>Pedestrian Attribute Recognition</td>
+  </tr>
+  <tr>
+    <td rowspan = 2>Vehicle Attribute Recognition</td>
+    <td>Vehicle Detection</td>
+    <td rowspan = 2>None</td>
+    <td rowspan = 2>Vehicle attribute recognition is an important component of computer vision systems. Its main task is to locate and tag specific attributes of vehicles in images or videos, such as vehicle type, color, and license plate number. This task not only requires accurate detection of vehicles but also the recognition of detailed attribute information for each vehicle.</td>
+    <td rowspan = 2>
+    <ul>
+        <li>Intelligent Parking</li>
+        <li>Traffic Management</li>
+        <li>Autonomous Driving</li>
+      </ul>
+      </td>
+  </tr>
+  <tr>
+    <td>Vehicle Attribute Recognition</td>
+  </tr>
+<tr>
+    <td rowspan="2">Document Image Preprocessing</td>
+    <td>Document Image Orientation Classification</td>
+    <td rowspan="2">None</td>
+    <td rowspan="2">Document image preprocessing is a key step in document analysis and recognition, aiming to optimize document images through a series of technical means to improve the accuracy and efficiency of subsequent processing. Document image preprocessing includes operations such as orientation classification, text rectification, noise removal, and binarization, which can effectively improve image quality, correct document orientation, and remove interference factors. This technology is widely used in document scanning, OCR text recognition, and electronic document generation.</td>
+    <td rowspan="2">
+    <ul>
+        <li>Automatic orientation correction in document scanners</li>
+        <li>Text image optimization in OCR systems</li>
+        <li>Image restoration in historical document digitization</li>
+    </ul>
+    </td>
+</tr>
+<tr>
+    <td>Text Image Unwarping</td>
+</tr>
 </table>
 
 ## 2. Featured Pipelines

+ 228 - 12
docs/support_list/pipelines_list_npu.md

@@ -134,11 +134,11 @@ comments: true
     <td>文档类视觉语言模型</td>
   </tr>
   <tr>
-    <td rowspan = 2>通用OCR</td>
+    <td rowspan = 5>通用OCR</td>
     <td>文本检测</td>
-    <td rowspan = 2><a href="https://aistudio.baidu.com/community/app/91660/webUI?source=appMineRecent">在线体验</a></td>
-    <td rowspan = 2>OCR(光学字符识别,Optical Character Recognition)是一种将图像中的文字转换为可编辑文本的技术。它广泛应用于文档数字化、信息提取和数据处理等领域。OCR 可以识别印刷文本、手写文本,甚至某些类型的字体和符号。 通用 OCR 产线用于解决文字识别任务,提取图片中的文字信息以文本形式输出,PP-OCRv4 是一个端到端 OCR 串联系统,可实现 CPU 上毫秒级的文本内容精准预测,在通用场景上达到开源SOTA。基于该项目,产学研界多方开发者已快速落地多个 OCR 应用,使用场景覆盖通用、制造、金融、交通等各个领域。</td>
-    <td rowspan = 2>
+    <td rowspan = 5><a href="https://aistudio.baidu.com/community/app/91660/webUI?source=appMineRecent">在线体验</a></td>
+    <td rowspan = 5>OCR(光学字符识别,Optical Character Recognition)是一种将图像中的文字转换为可编辑文本的技术。它广泛应用于文档数字化、信息提取和数据处理等领域。OCR 可以识别印刷文本、手写文本,甚至某些类型的字体和符号。 通用 OCR 产线用于解决文字识别任务,提取图片中的文字信息以文本形式输出,PP-OCRv4 是一个端到端 OCR 串联系统,可实现 CPU 上毫秒级的文本内容精准预测,在通用场景上达到开源SOTA。基于该项目,产学研界多方开发者已快速落地多个 OCR 应用,使用场景覆盖通用、制造、金融、交通等各个领域。</td>
+    <td rowspan = 5>
     <ul>
         <li>智能安防中车牌号</li>
         <li>门牌号等信息的识别</li>
@@ -151,11 +151,19 @@ comments: true
     <td>文本识别</td>
   </tr>
   <tr>
-  <td rowspan = 4>通用表格识别</td>
-    <td>版面区域检测</td>
-    <td rowspan = 4><a href="https://aistudio.baidu.com/community/app/91661/webUI">在线体验</a></td>
-    <td rowspan = 4>表格识别是一种自动从文档或图像中识别和提取表格内容及其结构的技术,广泛应用于数据录入、信息检索和文档分析等领域。通过使用计算机视觉和机器学习算法,表格识别能够将复杂的表格信息转换为可编辑的格式,方便用户进一步处理和分析数据。</td>
-    <td rowspan = 4>
+    <td>文档图像方向分类</td>
+  </tr>
+  <tr>
+    <td>文本图像矫正</td>
+  </tr>
+  <tr>
+    <td>文本行方向分类</td>
+  </tr>
+  <tr>
+    <td rowspan = 6>通用表格识别</td>
+    <td>表格结构识别</td>
+    <td rowspan = 6><a href="https://aistudio.baidu.com/community/app/91661/webUI">在线体验</a></td>
+    <td rowspan = 6>表格识别是一种自动从文档或图像中识别和提取表格内容及其结构的技术,广泛应用于数据录入、信息检索和文档分析等领域。通过使用计算机视觉和机器学习算法,表格识别能够将复杂的表格信息转换为可编辑的格式,方便用户进一步处理和分析数据。</td>
+    <td rowspan = 6>
     <ul>
         <li>银行账单的处理</li>
         <li>医疗报告中各项指标的识别和提取</li>
@@ -164,15 +172,21 @@ comments: true
       </td>
    </tr>
   <tr>
-    <td>表格结构识别</td>
-  </tr>
-  <tr>
     <td>文本检测</td>
   </tr>
   <tr>
     <td>文本识别</td>
   </tr>
   <tr>
+    <td>版面区域检测</td>
+  </tr>
+  <tr>
+    <td>文档图像方向分类</td>
+  </tr>
+  <tr>
+    <td>文本图像矫正</td>
+  </tr>
+  <tr>
     <td>时序预测</td>
     <td>时序预测</td>
     <td><a href="https://aistudio.baidu.com/community/app/105706/webUI?source=appCenter">在线体验</a></td>
@@ -222,7 +236,209 @@ comments: true
       </ul>
       </td>
   </tr>
+   <tr>
+    <td>图像多标签分类</td>
+    <td>图像多标签分类</td>
+    <td><a href="https://aistudio.baidu.com/community/app/387974/webUI?source=appCenter">在线体验</a></td>
+    <td>图像多标签分类是一种将一张图像同时分配到多个相关类别的技术,广泛应用于图像标注、内容推荐和社交媒体分析等领域。它能够识别图像中存在的多个物体或特征,例如一张图片中同时包含“狗”和“户外”这两个标签。通过使用深度学习模型,图像多标签分类能够自动提取图像特征并进行准确分类,以便为用户提供更加全面的信息。这项技术在智能搜索引擎和自动内容生成等应用中具有重要意义。</td>
+    <td>
+    <ul>
+        <li>医学影像诊断</li>
+        <li>复杂场景识别</li>
+        <li>多目标监控</li>
+        <li>商品属性识别</li>
+        <li>生态环境监测</li>
+        <li>安全监控</li>
+        <li>灾害预警</li>
+      </ul>
+      </td>
+  </tr>
+  <tr>
+    <td>小目标检测</td>
+    <td>小目标检测</td>
+    <td><a href="https://aistudio.baidu.com/community/app/387975/webUI?source=appCenter">在线体验</a></td>
+    <td>小目标检测是一种专门识别图像中体积较小物体的技术,广泛应用于监控、无人驾驶和卫星图像分析等领域。它能够从复杂场景中准确找到并分类像行人、交通标志或小动物等小尺寸物体。通过使用深度学习算法和优化的卷积神经网络,小目标检测可以有效提升对小物体的识别能力,确保在实际应用中不遗漏重要信息。这项技术在提高安全性和自动化水平方面发挥着重要作用。</td>
+    <td>
+  <ul>
+    <li>无人驾驶汽车中的行人检测</li>
+    <li>卫星图像中的小型建筑物识别</li>
+    <li>智能交通系统中的小型交通标志检测</li>
+    <li>安防监控中的小型入侵物体识别</li>
+    <li>工业检测中的微小瑕疵检测</li>
+    <li>无人机图像中的小型动物监测</li>
+  </ul>
+</td>
+  </tr>
+  <tr>
+    <td>图像异常检测</td>
+    <td>图像异常检测</td>
+    <td>暂无</td>
+    <td>图像异常检测是一种通过分析图像中的内容,来识别与众不同或不符合正常模式的图像处理技术。它广泛应用于工业质量检测、医疗影像分析和安全监控等领域。通过使用机器学习和深度学习算法,图像异常检测能够自动识别出图像中潜在的缺陷、异常或异常行为,从而帮助我们及时发现问题并采取相应措施。图像异常检测系统被设计用于自动检测和标记图像中的异常情况,以提高工作效率和准确性。</td>
+    <td>
+    <ul>
+    <li>工业质量控制</li>
+    <li>医疗影像分析</li>
+    <li>监控视频异常检测</li>
+    <li>交通监控中的违规行为识别</li>
+    <li>自动驾驶中的障碍物检测</li>
+    <li>农业病虫害监测</li>
+    <li>环境监测中的污染物识别</li>
+  </ul></td>
+  </tr>
+  <tr>
+    <td rowspan = 10>通用版面解析</td>
+    <td>版面区域检测</td>
+    <td rowspan = 10>暂无</td>
+    <td rowspan = 10>版面解析是一种从文档图像中提取结构化信息的技术,主要用于将复杂的文档版面转换为机器可读的数据格式。这项技术在文档管理、信息提取和数据数字化等领域具有广泛的应用。版面解析通过结合光学字符识别(OCR)、图像处理和机器学习算法,能够识别和提取文档中的文本块、标题、段落、图片、表格以及其他版面元素。此过程通常包括版面分析、元素分析和数据格式化三个主要步骤,最终生成结构化的文档数据,提升数据处理的效率和准确性。</td>
+    <td rowspan="10">
+  <ul>
+    <li>金融与法律文档分析</li>
+    <li>历史文献和档案数字化</li>
+    <li>自动化表单填写</li>
+    <li>页面结构解析</li>
+  </ul>
+</td>
+  </tr>
+  <tr>
+    <td>版面区域检测模块</td>
+  </tr>
+  <tr>
+    <td>文本检测模块</td>
+  </tr>
+  <tr>
+    <td>文本识别模块</td>
+  </tr>
+  <tr>
+    <td>文档图像方向分类模块</td>
+  </tr>
+  <tr>
+    <td>文本图像矫正模块</td>
+  </tr>
+  <tr>
+    <td>表格结构识别模块</td>
+  </tr>
+  <tr>
+    <td>文本行方向分类模块</td>
+  </tr>
+  <tr>
+    <td>公式识别模块</td>
+  </tr>
+  <tr>
+    <td>印章文本检测模块</td>
+  </tr>
+  <tr>
+    <td rowspan = 4>公式识别</td>
+    <td>公式识别模块</td>
+    <td rowspan = 4><a href="https://aistudio.baidu.com/community/app/387976/webUI?source=appCenter">在线体验</a></td>
+    <td rowspan = 4>公式识别是一种自动从文档或图像中识别和提取LaTeX公式内容及其结构的技术,广泛应用于数学、物理、计算机科学等领域的文档编辑和数据分析。通过使用计算机视觉和机器学习算法,公式识别能够将复杂的数学公式信息转换为可编辑的LaTeX格式,方便用户进一步处理和分析数据。</td>
+    <td rowspan = 4>
+    <ul>
+        <li>文档数字化与检索</li>
+        <li>公式搜索引擎</li>
+        <li>公式编辑器</li>
+        <li>自动化排版</li>
+      </ul>
+      </td>
+  </tr>
+  <tr>
+    <td>版面区域检测模块</td>
+  </tr>
+  <tr>
+    <td>文档图像方向分类模块</td>
+  </tr>
+  <tr>
+    <td>文本图像矫正模块</td>
+  </tr>
+  <tr>
+    <td rowspan = 5>印章文本识别</td>
+    <td>印章文本检测</td>
+    <td rowspan = 5><a href="https://aistudio.baidu.com/community/app/387977/webUI?source=appCenter">在线体验</a></td>
+    <td rowspan = 5>印章文本识别是一种自动从文档或图像中提取和识别印章内容的技术,印章文本的识别是文档处理的一部分,在很多场景都有用途,例如合同比对,出入库审核以及发票报销审核等场景。</td>
+    <td rowspan = 5>
+    <ul>
+        <li>合同和协议验证</li>
+        <li>支票处理</li>
+        <li>贷款审批</li>
+        <li>法律文书管理</li>
+      </ul>
+      </td>
+  </tr>
+  <tr>
+    <td>文本识别</td>
+  </tr>
+  <tr>
+    <td>版面区域检测</td>
+  </tr>
+  <tr>
+    <td>文档图像方向分类</td>
+  </tr>
+  <tr>
+    <td>文本图像矫正</td>
+  </tr>
+  <tr>
+    <td rowspan = 2>通用图像识别</td>
+    <td>主体检测</td>
+    <td rowspan = 2>暂无</td>
+    <td rowspan = 2>通用图像识别产线旨在解决开放域目标定位及识别问题,通用图像识别产线能够在不同的环境和条件下有效识别和区分各种目标物体,从而广泛应用于自动驾驶、智能安防、医疗影像分析以及工业自动化等多个领域。</td>
+    <td rowspan = 2>
+    <ul>
+        <li>自动化身份核验</li>
+        <li>无人零售</li>
+        <li>自动驾驶</li>
+      </ul>
+      </td>
+  </tr>
+  <tr>
+    <td>图像特征</td>
+  </tr>
+  <tr>
+    <td rowspan = 2>行人属性识别</td>
+    <td>行人检测</td>
+    <td rowspan = 2><a href="https://aistudio.baidu.com/community/app/387978/webUI?source=appCenter">在线体验</a></td>
+    <td rowspan = 2>行人属性识别是计算机视觉系统中的关键功能,用于在图像或视频中定位并标记行人的特定特征,如性别、年龄、衣物颜色和款式等。</td>
+    <td rowspan = 2>
+    <ul>
+        <li>智慧城市</li>
+        <li>安防监控</li>
+      </ul>
+      </td>
+  </tr>
+  <tr>
+    <td>行人属性识别</td>
+  </tr>
+  <tr>
+    <td rowspan = 2>车辆属性识别</td>
+    <td>车辆检测</td>
+    <td rowspan = 2><a href="https://aistudio.baidu.com/community/app/387979/webUI?source=appCenter">在线体验</a></td>
+    <td rowspan = 2>车辆属性识别是计算机视觉系统中的重要组成部分,其主要任务是在图像或视频中定位并标记出车辆的特定属性,如车辆类型、颜色、车牌号等。该任务不仅要求准确检测出车辆,还需识别每辆车的详细属性信息。</td>
+    <td rowspan = 2>
+    <ul>
+        <li>智能停车</li>
+        <li>交通管理</li>
+        <li>自动驾驶</li>
+      </ul>
+      </td>
+  </tr>
+  <tr>
+    <td>车辆属性识别</td>
+  </tr>
+<tr>
+    <td rowspan="2">文档图像预处理</td>
+    <td>文档图像方向分类</td>
+    <td rowspan="2">暂无</td>
+    <td rowspan="2">文档图像预处理是文档分析与识别中的关键步骤,旨在通过一系列技术手段对文档图像进行优化,以提高后续处理的准确性和效率。文档图像预处理包括方向分类、文本矫正、去噪、二值化等操作,能够有效改善图像质量,纠正文档方向,并去除干扰因素。该技术广泛应用于文档扫描、OCR文字识别、电子文档生成等领域。</td>
+    <td rowspan="2">
+    <ul>
+        <li>文档扫描仪中的自动方向校正</li>
+        <li>OCR系统中的文本图像优化</li>
+        <li>历史文档数字化中的图像修复</li>
+    </ul>
+    </td>
+</tr>
+<tr>
+    <td>文本图像矫正</td>
+</tr>
 </table>
 
+
 ## 2、特色产线
+
 暂不支持,敬请期待!
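
Review note: most of this change consists of hand-edited `rowspan` values and added `<tr>` rows, which are easy to get out of sync. The following is a minimal sketch of a consistency check, assuming only Python's standard-library `html.parser`. The names `RowspanChecker` and `effective_widths` are hypothetical helpers written for this note and are not part of the repository. The check computes, for every row, the number of cells it contains plus the cells still spanning down from earlier rows. In a well-formed table every row should have the same width and no span should run past the last row.

```python
from html.parser import HTMLParser


class RowspanChecker(HTMLParser):
    """Collects the rowspan value of every cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one list of rowspan ints per <tr>
        self._current = None  # cells of the most recently opened <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current = []
            self.rows.append(self._current)
        elif tag in ("td", "th") and self._current is not None:
            # Cells are attributed to the most recently opened row.
            span = dict(attrs).get("rowspan") or "1"
            self._current.append(int(span))


def effective_widths(html):
    """Return (widths, leftover).

    widths[i] is the number of columns occupied in row i, counting cells
    that span down from earlier rows. leftover is the number of cells
    whose rowspan extends past the final row.
    """
    parser = RowspanChecker()
    parser.feed(html)
    carried = []  # remaining rows still covered by earlier spanning cells
    widths = []
    for row in parser.rows:
        widths.append(len(row) + len(carried))
        carried = [n - 1 for n in carried if n - 1 > 0]
        carried += [span - 1 for span in row if span > 1]
    return widths, len(carried)


sample = """
<table>
  <tr><th>Pipeline Name</th><th>Pipeline Modules</th></tr>
  <tr><td rowspan="2">General OCR</td><td>Text Detection</td></tr>
  <tr><td>Text Recognition</td></tr>
</table>
"""
print(effective_widths(sample))  # prints ([2, 2, 2], 0)
```

A row whose width differs from the header row, or a non-zero leftover count, points at a missing or duplicated `<tr>` or a `rowspan` that does not match the number of module rows that follow it.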