@@ -68,7 +68,7 @@ comments: true
 </td>
 </tr>
 <tr>
- <td rowspan = 7>Document Scene Information Extraction v3</td>
+ <td rowspan = 7>PP-ChatOCRv3</td>
 <td>Table Structure Recognition</td>
 <td rowspan = 7><a href="https://aistudio.baidu.com/community/app/182491/webUI?source=appCenter">Online Experience</a></td>
 <td rowspan = 7>Document Image Scene Information Extraction v3 (PP-ChatOCRv3-doc) is a PaddlePaddle-specific intelligent document and image analysis solution that integrates LLM and OCR technologies to solve common complex document information extraction challenges such as layout analysis, rare characters, multi-page PDFs, tables, and seal recognition. By integrating the Wenxin large model, it combines vast data and knowledge, providing high accuracy and wide applicability. The open-source version supports local experience and deployment, and fine-tuning training for each module.</td>
@@ -93,12 +93,46 @@ comments: true
 <td>Seal Text Detection</td>
 </tr>
 <tr>
- <td>Text Image Un ra p ping</td>
+ <td>Text Image Unrapping</td>
 </tr>
 <tr>
 <td>Document Image Orientation Classification</td>
 </tr>
 <tr>
+ <td rowspan="8">PP-ChatOCRv4</td>
+ <td>Table Structure Recognition</td>
+ <td rowspan="8">Coming Soon</td>
+ <td rowspan="8">Document Scene Information Extraction v4 (PP-ChatOCRv4) is a PaddlePaddle-featured intelligent analysis solution for documents and images, combining LLM, MLLM, and OCR technologies. Based on PP-ChatOCRv3, it optimizes common complex document information extraction challenges such as layout analysis, rare characters, multi-page PDFs, tables, and seal recognition. It integrates massive data and knowledge with the Ernie model, achieving high accuracy and wide applicability. This pipeline also provides flexible service deployment methods, supporting deployment on various hardware. Furthermore, it offers secondary development capabilities, allowing you to train and optimize on your own dataset, and the trained model can be seamlessly integrated.</td>
+ <td rowspan="8">
+ <ul>
+ <li>Knowledge Graph Construction</li>
+ <li>Detection of Information Related to Specific Events in Online News and Social Media</li>
+ <li>Extraction and Analysis of Key Information in Academic Literature (especially scenarios requiring recognition of seals, distorted images, and more complex tables)</li>
+ </ul>
+ </td>
+ </tr>
+ <tr>
+ <td>Layout Detection</td>
+ </tr>
+ <tr>
+ <td>Text Detection</td>
+ </tr>
+ <tr>
+ <td>Text Recognition</td>
+ </tr>
+ <tr>
+ <td>Seal Text Detection</td>
+ </tr>
+ <tr>
+ <td>Text Image Unrapping</td>
+ </tr>
+ <tr>
+ <td>Document Image Orientation Classification</td>
+ </tr>
+ <tr>
+ <td>Document-based Vision-Language Model</td>
+ </tr>
+ <tr>
 <td rowspan="5">General OCR</td>
 <td>Text Detection</td>
 <td rowspan="5"><a href="https://aistudio.baidu.com/community/app/91660/webUI?source=appMineRecent">Online Experience</a></td>
@@ -292,6 +326,59 @@ comments: true
 <td>Seal Text Detection</td>
 </tr>
 <tr>
+ <td rowspan="13">General Layout Parsing v3</td>
+ <td>Layout Detection Module</td>
+ <td rowspan="13">Coming Soon</td>
+ <td rowspan="13">Based on the General Layout Parsing v1 pipeline, the General Layout Parsing v3 pipeline enhances the capabilities of layout detection, table recognition, and formula recognition. It adds the ability to restore multi-column reading order and convert results into Markdown files. It performs exceptionally well in various document data and can handle more complex document data. This pipeline also provides flexible service deployment methods, supporting multiple programming languages on various hardware. Furthermore, it offers secondary development capabilities, allowing you to train and optimize on your own dataset, and the trained model can be seamlessly integrated.</td>
+ <td rowspan="13">
+ <ul>
+ <li>Intelligent Document Analysis</li>
+ <li>Document Digitization</li>
+ <li>Page Structure Parsing</li>
+ <li>Complex Table Recognition</li>
+ <li>Large Model Data Construction</li>
+ <li>RAG</li>
+ </ul>
+ </td>
+</tr>
+<tr>
+ <td>Text Detection Module</td>
+</tr>
+<tr>
+ <td>Text Recognition Module</td>
+</tr>
+<tr>
+ <td>Doc Img Orientation Classification</td>
+</tr>
+<tr>
+ <td>Text Image Unrapping Module</td>
+</tr>
+<tr>
+ <td>Wired Table Structure Recognition Module</td>
+</tr>
+<tr>
+ <td>Wireless Table Structure Recognition Module</td>
+</tr>
+<tr>
+ <td>Table Classification Module</td>
+</tr>
+<tr>
+ <td>Wired Table Cell Detection Module</td>
+</tr>
+<tr>
+ <td>Wireless Table Cell Detection Module</td>
+</tr>
+<tr>
+ <td>Text Line Orientation Classification Module</td>
+</tr>
+<tr>
+ <td>Formula Recognition Module</td>
+</tr>
+<tr>
+ <td>Seal Text Detection Module</td>
+</tr>
+
+<tr>
 <td rowspan="4">Formula Recognition</td>
 <td>Formula Recognition</td>
 <td rowspan="4"><a href="https://aistudio.baidu.com/community/app/387976/webUI?source=appCenter">Online Experience</a></td>