---
comments: true
---

# Seal Text Recognition Pipeline Tutorial

## 1. Introduction to Seal Text Recognition Pipeline

Seal text recognition is a technology that automatically extracts and recognizes seal content from documents or images. It is part of document processing and is used in many scenarios, such as contract comparison, warehouse entry and exit review, and invoice reimbursement review.

The seal text recognition pipeline extracts the text information from seal images and outputs it in text form. It integrates the industry-renowned end-to-end OCR system PP-OCRv4, supporting the detection and recognition of curved seal text. It also integrates an optional layout region localization module, which can accurately locate the seal within the entire document, as well as optional document image orientation correction and distortion correction functions. Based on this pipeline, millisecond-level accurate text content prediction can be achieved on a CPU. The pipeline also provides flexible service deployment methods, supporting multiple programming languages on various hardware, and offers secondary development capabilities: you can train and fine-tune models on your own dataset, and the trained models can be seamlessly integrated.

The seal text recognition pipeline includes a seal text detection module and a text recognition module, as well as an optional layout detection module, document image orientation classification module, and text image correction module. If you prioritize model accuracy, choose a model with higher accuracy; if you prioritize inference speed, choose a model with faster inference speed; if you prioritize storage size, choose a model with a smaller storage size.

Layout Region Detection Module (Optional):

* Layout detection model, including 23 common categories: document title, paragraph title, text, page number, abstract, table of contents, references, footnotes, header, footer, algorithm, formula, formula number, image, figure title, table, table title, seal, chart title, chart, header image, footer image, sidebar text
| Model | Model Download Link | mAP(0.5) (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | Description |
|---|---|---|---|---|---|---|
| PicoDet_layout_1x | Inference Model/Trained Model | 86.8 | 13.0 | 91.3 | 7.4 | An efficient layout area localization model trained on the PubLayNet dataset based on PicoDet-1x; it can locate five types of regions: text, titles, tables, images, and lists. |
| PicoDet_layout_1x_table | Inference Model/Trained Model | 95.7 | 12.623 | 90.8934 | 7.4 | An efficient layout area localization model trained on the PubLayNet dataset based on PicoDet-1x; it can locate one type of region: tables. |
| PicoDet-S_layout_3cls | Inference Model/Trained Model | 87.1 | 13.5 | 45.8 | 4.8 | A high-efficiency layout area localization model trained on a self-built dataset based on PicoDet-S for scenarios such as Chinese and English papers, magazines, and research reports; it includes three categories: tables, images, and seals. |
| PicoDet-S_layout_17cls | Inference Model/Trained Model | 70.3 | 13.6 | 46.2 | 4.8 | A high-efficiency layout area localization model trained on a self-built dataset based on PicoDet-S for scenarios such as Chinese and English papers, magazines, and research reports; it includes 17 common layout categories: paragraph titles, images, text, numbers, abstracts, contents, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
| PicoDet-L_layout_3cls | Inference Model/Trained Model | 89.3 | 15.7 | 159.8 | 22.6 | An efficient layout area localization model trained on a self-built dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports; it includes three categories: tables, images, and seals. |
| PicoDet-L_layout_17cls | Inference Model/Trained Model | 79.9 | 17.2 | 160.2 | 22.6 | An efficient layout area localization model trained on a self-built dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports; it includes 17 common layout categories: paragraph titles, images, text, numbers, abstracts, contents, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
| RT-DETR-H_layout_3cls | Inference Model/Trained Model | 95.9 | 114.6 | 3832.6 | 470.1 | A high-precision layout area localization model trained on a self-built dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports; it includes three categories: tables, images, and seals. |
| RT-DETR-H_layout_17cls | Inference Model/Trained Model | 92.6 | 115.1 | 3827.2 | 470.2 | A high-precision layout area localization model trained on a self-built dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports; it includes 17 common layout categories: paragraph titles, images, text, numbers, abstracts, contents, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
Note: The evaluation dataset for the above accuracy metrics is PaddleOCR's self-built layout region detection dataset, which includes 500 common document images such as Chinese and English papers, magazines, contracts, books, test papers, and research reports. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision; CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

> ❗ The above listed are the 3 core models that the layout detection module mainly supports. This module supports a total of 11 models, including multiple models predefined with different categories; 9 of these models include the seal category. In addition to the core models above, the remaining models are listed as follows:
👉Model List Details

* 3-category Layout Detection Models, including table, image, and seal
| Model | Model Download Link | mAP(0.5) (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | Introduction |
|---|---|---|---|---|---|---|
| PicoDet-S_layout_3cls | Inference Model/Training Model | 88.2 | 13.5 | 45.8 | 4.8 | A high-efficiency layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports based on the lightweight PicoDet-S model |
| PicoDet-L_layout_3cls | Inference Model/Training Model | 89.0 | 15.7 | 159.8 | 22.6 | A layout area localization model with balanced efficiency and accuracy, trained on a self-built dataset of Chinese and English papers, magazines, and research reports based on PicoDet-L |
| RT-DETR-H_layout_3cls | Inference Model/Training Model | 95.8 | 114.6 | 3832.6 | 470.1 | A high-precision layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports based on RT-DETR-H |
Note: The evaluation set for the above accuracy metrics is the layout area detection dataset self-built by PaddleOCR, which includes 1,154 images of common document types such as Chinese and English papers, magazines, and research reports. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

* 17-category Layout Detection Models, including 17 common layout categories: paragraph title, image, text, number, abstract, content, figure title, formula, table, table title, reference, document title, footnote, header, algorithm, footer, seal
| Model | Model Download Link | mAP(0.5) (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | Introduction |
|---|---|---|---|---|---|---|
| PicoDet-S_layout_17cls | Inference Model/Training Model | 87.4 | 13.6 | 46.2 | 4.8 | A high-efficiency layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports based on the lightweight PicoDet-S model |
| PicoDet-L_layout_17cls | Inference Model/Training Model | 89.0 | 17.2 | 160.2 | 22.6 | A layout area localization model with balanced efficiency and accuracy, trained on a self-built dataset of Chinese and English papers, magazines, and research reports based on PicoDet-L |
| RT-DETR-H_layout_17cls | Inference Model/Training Model | 98.3 | 115.1 | 3827.2 | 470.2 | A high-precision layout area localization model trained on a self-built dataset of Chinese and English papers, magazines, and research reports based on RT-DETR-H |
Note: The evaluation set for the above accuracy metrics is the layout area detection dataset self-built by PaddleOCR, which includes 892 images of common document types such as Chinese and English papers, magazines, and research reports. The GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

Document Image Orientation Classification Module (Optional):

| Model | Model Download Link | Top-1 Acc (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | Description |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_doc_ori | Inference Model/Training Model | 99.06 | 3.84845 | 9.23735 | 7 | A document image classification model based on PP-LCNet_x1_0, containing four categories: 0 degrees, 90 degrees, 180 degrees, and 270 degrees |

Note: The above accuracy metrics are evaluated on a self-built dataset covering multiple scenarios such as certificates and documents, containing 1,000 images. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision; CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

Text Image Correction Module (Optional):

| Model | Model Download Link | CER | Model Storage Size (M) | Description |
|---|---|---|---|---|
| UVDoc | Inference Model/Training Model | 0.179 | 30.3 | High-precision text image correction model |

Note: The model's accuracy metrics are measured on the DocUNet benchmark.

Text Detection Module:

| Model | Model Download Link | Detection Hmean (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | Description |
|---|---|---|---|---|---|---|
| PP-OCRv4_server_seal_det | Inference Model/Trained Model | 98.21 | 84.341 | 2425.06 | 109 | The PP-OCRv4 server-side seal text detection model, with higher accuracy, suitable for deployment on well-equipped servers |
| PP-OCRv4_mobile_seal_det | Inference Model/Trained Model | 96.47 | 10.5878 | 131.813 | 4.6 | The PP-OCRv4 mobile-side seal text detection model, with higher efficiency, suitable for edge deployment |
Note: The above accuracy metrics are evaluated on a self-built dataset containing 500 circular seal images. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision; CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

Text Recognition Module:

| Model | Model Download Link | Recognition Avg Accuracy (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | Description |
|---|---|---|---|---|---|---|
| PP-OCRv4_mobile_rec | Inference Model/Trained Model | 78.20 | 7.95018 | 46.7868 | 10.6 | The PP-OCRv4 recognition model is an upgrade of PP-OCRv3. At comparable speed, accuracy in Chinese and English scenarios is further improved, and the average recognition accuracy of the 80 multilingual models is increased by more than 8%. |
| PP-OCRv4_server_rec | Inference Model/Trained Model | 79.20 | 7.19439 | 140.179 | 71.2 | A high-precision server-side text recognition model, featuring high accuracy, fast speed, and multilingual support. It is suitable for text recognition tasks in various scenarios. |
| PP-OCRv3_mobile_rec | Inference Model/Training Model | | | | | An ultra-lightweight OCR model suitable for mobile applications. It adopts an encoder-decoder structure based on Transformer and enhances recognition accuracy and efficiency through techniques such as data augmentation and mixed-precision training. The model size is 10.6M, making it suitable for deployment on resource-constrained devices. It can be used in scenarios such as mobile photo translation and business card recognition. |

Note: The evaluation set for the above accuracy metrics is a Chinese dataset built by PaddleOCR, covering multiple scenarios such as street view, web images, documents, and handwriting; the text recognition set includes 11,000 images. The GPU inference time for all models is based on NVIDIA Tesla T4 machines with FP32 precision. The CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.

| Model | Model Download Link | Recognition Avg Accuracy (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | Introduction |
|---|---|---|---|---|---|---|
| ch_SVTRv2_rec | Inference Model/Training Model | 68.81 | 8.36801 | 165.706 | 73.9 | SVTRv2 is a server-side text recognition model developed by the OpenOCR team of Fudan University's Vision and Learning Laboratory (FVL). It won first prize in the PaddleOCR Algorithm Model Challenge - Task One: OCR End-to-End Recognition. Its end-to-end recognition accuracy on the A list is 6% higher than that of PP-OCRv4. |

Note: The evaluation set for the above accuracy indicators is the PaddleOCR Algorithm Model Challenge - Task One: OCR End-to-End Recognition Task A list. The GPU inference time for all models is based on NVIDIA Tesla T4 machines with FP32 precision type. The CPU inference speed is based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision type.

| Model | Model Download Link | Recognition Avg Accuracy (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | Introduction |
|---|---|---|---|---|---|---|
| ch_RepSVTR_rec | Inference Model/Training Model | 65.07 | 10.5047 | 51.5647 | 22.1 | The RepSVTR text recognition model is a mobile-side text recognition model based on SVTRv2. It won first prize in the PaddleOCR Algorithm Model Challenge - Task One: OCR End-to-End Recognition. Its end-to-end recognition accuracy on the B list is 2.5% higher than that of PP-OCRv4, with the same inference speed. |

Note: The evaluation set for the above accuracy indicators is the PaddleOCR Algorithm Model Challenge - Task One: OCR End-to-End Recognition Task B list. The GPU inference time for all models is based on NVIDIA Tesla T4 machines with FP32 precision type. The CPU inference speed is based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision type.

* English Recognition Model

| Model | Model Download Link | Recognition Avg Accuracy (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | Introduction |
|---|---|---|---|---|---|---|
| en_PP-OCRv4_mobile_rec | Inference Model/Training Model | | | | | [Latest] Further upgraded based on PP-OCRv3, with improved accuracy at comparable speed. |
| en_PP-OCRv3_mobile_rec | Inference Model/Training Model | | | | | Ultra-lightweight model, supporting English and numeric recognition. |
* Multilingual Recognition Model

| Model | Model Download Link | Recognition Avg Accuracy (%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | Introduction |
|---|---|---|---|---|---|---|
| korean_PP-OCRv3_mobile_rec | Inference Model/Training Model | | | | | Korean Recognition |
| japan_PP-OCRv3_mobile_rec | Inference Model/Training Model | | | | | Japanese Recognition |
| chinese_cht_PP-OCRv3_mobile_rec | Inference Model/Training Model | | | | | Traditional Chinese Recognition |
| te_PP-OCRv3_mobile_rec | Inference Model/Training Model | | | | | Telugu Recognition |
| ka_PP-OCRv3_mobile_rec | Inference Model/Training Model | | | | | Kannada Recognition |
| ta_PP-OCRv3_mobile_rec | Inference Model/Training Model | | | | | Tamil Recognition |
| latin_PP-OCRv3_mobile_rec | Inference Model/Training Model | | | | | Latin Recognition |
| arabic_PP-OCRv3_mobile_rec | Inference Model/Training Model | | | | | Arabic Script Recognition |
| cyrillic_PP-OCRv3_mobile_rec | Inference Model/Training Model | | | | | Cyrillic Script Recognition |
| devanagari_PP-OCRv3_mobile_rec | Inference Model/Training Model | | | | | Devanagari Script Recognition |
## 2. Quick Start

The pre-trained pipelines provided by PaddleX can be experienced quickly. You can try the seal text recognition pipeline locally using the command line or Python. Before using it locally, please ensure that you have installed the PaddleX wheel package according to the [PaddleX Local Installation Tutorial](../../../installation/installation.en.md).

### 2.1 Command Line Experience

You can quickly experience the seal text recognition pipeline with a single command. Use the [test file](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/seal_text_det.png), and replace `--input` with your local path for prediction:

```bash
paddlex --pipeline seal_recognition \
    --input seal_text_det.png \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False \
    --device gpu:0 \
    --save_path ./output
```

For descriptions of the relevant parameters, refer to the parameter explanations in [2.1.2 Python Script Integration](#212-python-script-integration). After running, the results will be printed to the terminal, as follows:
👉Click to Expand

```bash
{'res': {'input_path': 'seal_text_det.png', 'model_settings': {'use_doc_preprocessor': False, 'use_layout_detection': True}, 'layout_det_res': {'input_path': None, 'page_index': None, 'boxes': [{'cls_id': 16, 'label': 'seal', 'score': 0.975529670715332, 'coordinate': [6.191284, 0.16680908, 634.39325, 628.85345]}]}, 'seal_res_list': [{'input_path': None, 'page_index': None, 'model_settings': {'use_doc_preprocessor': False, 'use_textline_orientation': False}, 'dt_polys': [array([[320, 38], [479, 92], [483, 94], [486, 97], [579, 226], [582, 230], [582, 235], [584, 383], [584, 388], [582, 392], [578, 396], [573, 398], [566, 398], [502, 380], [497, 377], [494, 374], [491, 369], [491, 366], [488, 259], [424, 172], [318, 136], [251, 154], [200, 174], [137, 260], [133, 366], [132, 370], [130, 375], [126, 378], [123, 380], [ 60, 398], [ 55, 398], [ 49, 397], [ 45, 394], [ 43, 390], [ 41, 383], [ 43, 236], [ 44, 230], [ 45, 227], [141, 96], [144, 93], [148, 90], [311, 38], [315, 38]]), array([[461, 347], [465, 350], [468, 354], [470, 360], [470, 425], [469, 429], [467, 433], [462, 437], [456, 439], [169, 439], [165, 439], [160, 436], [157, 432], [155, 426], [154, 360], [155, 356], [158, 352], [161, 348], [168, 346], [456, 346]]), array([[439, 445], [441, 447], [443, 451], [444, 453], [444, 497], [443, 502], [440, 504], [437, 506], [434, 507], [189, 505], [184, 504], [182, 502], [180, 498], [179, 496], [181, 453], [182, 449], [184, 446], [188, 444], [434, 444]]), array([[158, 468], [199, 502], [242, 522], [299, 534], [339, 532], [373, 526], [417, 508], [459, 475], [462, 474], [467, 474], [472, 476], [502, 507], [503, 510], [504, 515], [503, 518], [501, 521], [452, 559], [450, 560], [391, 584], [390, 584], [372, 590], [370, 590], [305, 596], [302, 596], [224, 581], [221, 580], [164, 553], [162, 551], [114, 509], [112, 507], [111, 503], [112, 498], [114, 496], [146, 468], [149, 466], [154, 466]])], 'text_det_params': {'limit_side_len': 736, 'limit_type': 'min', 'thresh': 0.2, 'box_thresh': 0.6, 'unclip_ratio': 0.5}, 'text_type': 'seal', 'textline_orientation_angles': [-1, -1, -1, -1], 'text_rec_score_thresh': 0, 'rec_texts': ['天津君和缘商贸有限公司', '发票专用章', '吗繁物', '5263647368706'], 'rec_scores': [0.9934046268463135, 0.9999403953552246, 0.998250424861908, 0.9913849234580994], 'rec_polys': [array([[320, 38], [479, 92], [483, 94], [486, 97], [579, 226], [582, 230], [582, 235], [584, 383], [584, 388], [582, 392], [578, 396], [573, 398], [566, 398], [502, 380], [497, 377], [494, 374], [491, 369], [491, 366], [488, 259], [424, 172], [318, 136], [251, 154], [200, 174], [137, 260], [133, 366], [132, 370], [130, 375], [126, 378], [123, 380], [ 60, 398], [ 55, 398], [ 49, 397], [ 45, 394], [ 43, 390], [ 41, 383], [ 43, 236], [ 44, 230], [ 45, 227], [141, 96], [144, 93], [148, 90], [311, 38], [315, 38]]), array([[461, 347], [465, 350], [468, 354], [470, 360], [470, 425], [469, 429], [467, 433], [462, 437], [456, 439], [169, 439], [165, 439], [160, 436], [157, 432], [155, 426], [154, 360], [155, 356], [158, 352], [161, 348], [168, 346], [456, 346]]), array([[439, 445], [441, 447], [443, 451], [444, 453], [444, 497], [443, 502], [440, 504], [437, 506], [434, 507], [189, 505], [184, 504], [182, 502], [180, 498], [179, 496], [181, 453], [182, 449], [184, 446], [188, 444], [434, 444]]), array([[158, 468], [199, 502], [242, 522], [299, 534], [339, 532], [373, 526], [417, 508], [459, 475], [462, 474], [467, 474], [472, 476], [502, 507], [503, 510], [504, 515], [503, 518], [501, 521], [452, 559], [450, 560], [391, 584], [390, 584], [372, 590], [370, 590], [305, 596], [302, 596], [224, 581], [221, 580], [164, 553], [162, 551], [114, 509], [112, 507], [111, 503], [112, 498], [114, 496], [146, 468], [149, 466], [154, 466]])], 'rec_boxes': array([], dtype=float64)}]}}
```
The explanation of the result parameters can be found in [2.1.2 Python Script Integration](#212-python-script-integration). The visualized results are saved under `save_path`; the visualized seal OCR result is as follows:

### 2.1.2 Python Script Integration

The command line above is for quickly experiencing and viewing the effect. In a project, you generally need to integrate through code; you can complete quick pipeline inference with just a few lines:

```python
from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="seal_recognition")

output = pipeline.predict(
    "seal_text_det.png",
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
)
for res in output:
    res.print()                    # Print the structured prediction output
    res.save_to_img("./output/")   # Save the visualized results
    res.save_to_json("./output/")  # Save the prediction results as JSON
```

In the above Python script, the following steps are executed:

(1) The seal text recognition pipeline object is instantiated via `create_pipeline()`, with the parameters described as follows:
| Parameter | Description | Type | Default Value |
|---|---|---|---|
| `pipeline` | The pipeline name or the path to a pipeline configuration file. If it is a pipeline name, it must be a pipeline supported by PaddleX. | `str` | `None` |
| `device` | The device used for pipeline inference. Supports specifying a specific GPU card number, e.g., "gpu:0", other hardware card numbers, e.g., "npu:0", or the CPU, e.g., "cpu". | `str` | `gpu:0` |
| `use_hpip` | Whether to enable high-performance inference. Available only if the pipeline supports high-performance inference. | `bool` | `False` |
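For example, a minimal sketch of instantiating the pipeline with these parameters (the values shown are illustrative, mirroring the defaults in the table above):

```python
from paddlex import create_pipeline

# Illustrative values: "seal_recognition" is the pipeline name used throughout
# this tutorial; device and use_hpip mirror the documented defaults.
pipeline = create_pipeline(
    pipeline="seal_recognition",  # pipeline name or path to a config file
    device="gpu:0",               # e.g., "cpu", "gpu:0", "npu:0"
    use_hpip=False,               # high-performance inference plugin
)
```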
(2) Call the `predict()` method of the seal text recognition pipeline object for inference. This method returns a `generator`. The parameters of the `predict()` method and their descriptions are as follows:
| Parameter | Description | Type | Options | Default Value |
|---|---|---|---|---|
| `input` | Data to be predicted; supports multiple input types (required) | `Python Var\|str\|list` | • **Python Var**: image data represented by `numpy.ndarray`<br>• **str**: the local path of an image or PDF file, e.g., `/root/data/img.jpg`; a URL, e.g., the network URL of an image or PDF file: Example; a local directory containing images to be predicted, e.g., `/root/data/` (prediction of PDF files inside a directory is currently not supported; a PDF file must be specified by its exact file path)<br>• **list**: list elements must be of the above types, e.g., `[numpy.ndarray, numpy.ndarray]`, `["/root/data/img1.jpg", "/root/data/img2.jpg"]`, `["/root/data1", "/root/data2"]` | `None` |
| `device` | Inference device for the pipeline | `str\|None` | • **CPU**: e.g., `cpu` for CPU inference;<br>• **GPU**: e.g., `gpu:0` for inference on the first GPU;<br>• **NPU**: e.g., `npu:0` for inference on the first NPU;<br>• **XPU**: e.g., `xpu:0` for inference on the first XPU;<br>• **MLU**: e.g., `mlu:0` for inference on the first MLU;<br>• **DCU**: e.g., `dcu:0` for inference on the first DCU;<br>• **None**: if set to `None`, the default from pipeline initialization is used; during initialization, local GPU device 0 is prioritized, and the CPU is used if it is unavailable | `None` |
| `use_doc_orientation_classify` | Whether to use the document orientation classification module | `bool\|None` | • **bool**: `True` or `False`;<br>• **None**: if set to `None`, the default from pipeline initialization is used, initialized as `True` | `None` |
| `use_doc_unwarping` | Whether to use the document unwarping module | `bool\|None` | • **bool**: `True` or `False`;<br>• **None**: if set to `None`, the default from pipeline initialization is used, initialized as `True` | `None` |
| `use_layout_detection` | Whether to use the layout detection module | `bool\|None` | • **bool**: `True` or `False`;<br>• **None**: if set to `None`, the default from pipeline initialization is used, initialized as `True` | `None` |
| `layout_threshold` | Confidence threshold for layout detection; only scores above this threshold are output | `float\|dict\|None` | • **float**: any float greater than 0;<br>• **dict**: keys are `int` category IDs, values are any float greater than 0;<br>• **None**: if set to `None`, the default from pipeline initialization is used, initialized as `0.5` | `None` |
| `layout_nms` | Whether to use Non-Maximum Suppression (NMS) as layout detection post-processing | `bool\|None` | • **bool**: `True` or `False`;<br>• **None**: if set to `None`, the default from pipeline initialization is used, initialized as `True` | `None` |
| `layout_unclip_ratio` | Expansion ratio of detection box edges; if not specified, the default from the PaddleX official model configuration is used | `float\|list\|None` | • **float**: any float greater than 0, e.g., `1.1`, meaning the width and height of the detection box are both expanded by 1.1 times while keeping the center unchanged;<br>• **list**: e.g., `[1.2, 1.5]`, meaning the width is expanded by 1.2 times and the height by 1.5 times while keeping the center unchanged;<br>• **None**: if set to `None`, the default from pipeline initialization is used, initialized as `1.0` | `None` |
| `layout_merge_bboxes_mode` | Merging mode for detection boxes in the layout detection output; if not specified, the default from the PaddleX official model configuration is used | `string\|None` | • **large**: only the largest outer box is retained for overlapping detection boxes, and inner overlapping boxes are removed;<br>• **small**: only the smallest inner box is retained for overlapping detection boxes, and outer overlapping boxes are removed;<br>• **union**: no box filtering; both inner and outer boxes are retained;<br>• **None**: if set to `None`, the default from pipeline initialization is used, initialized as `large` | `None` |
| `seal_det_limit_side_len` | Side length limit for seal text detection | `int\|None` | • **int**: any integer greater than 0;<br>• **None**: if set to `None`, the default from pipeline initialization is used, initialized as `736` | `None` |
| `seal_rec_score_thresh` | Text recognition threshold; text results with scores above this threshold are retained | `float\|None` | • **float**: any float greater than 0;<br>• **None**: if set to `None`, the default from pipeline initialization is used, initialized as `0.0` (i.e., no threshold) | `None` |
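As an illustration, here is a sketch of a `predict()` call that sets several of the parameters above explicitly. The values simply mirror the documented defaults, apart from the two preprocessing switches used earlier in this tutorial:

```python
output = pipeline.predict(
    "seal_text_det.png",
    use_doc_orientation_classify=False,  # skip document orientation classification
    use_doc_unwarping=False,             # skip document unwarping
    use_layout_detection=True,           # locate the seal region first
    layout_threshold=0.5,                # documented default confidence threshold
    seal_det_limit_side_len=736,         # documented default side-length limit
    seal_rec_score_thresh=0.0,           # keep all recognized texts
)
```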
(3) Process the prediction results. The prediction result for each sample is of `dict` type and supports operations such as printing, saving as an image, and saving as a `json` file:
| Method | Description | Parameter | Parameter Type | Parameter Description | Default Value |
|---|---|---|---|---|---|
| `print()` | Print results to the terminal | `format_json` | `bool` | Whether to format the output content with JSON indentation | `True` |
| | | `indent` | `int` | The indentation level, to beautify the output JSON data for better readability; effective only when `format_json` is `True` | `4` |
| | | `ensure_ascii` | `bool` | Whether to escape non-ASCII characters to Unicode. When `True`, all non-ASCII characters are escaped; when `False`, the original characters are retained. Effective only when `format_json` is `True` | `False` |
| `save_to_json()` | Save results as a JSON file | `save_path` | `str` | The file path for saving. When it is a directory, the saved file name matches the input file name | `None` |
| | | `indent` | `int` | The indentation level, to beautify the output JSON data for better readability; effective only when `format_json` is `True` | `4` |
| | | `ensure_ascii` | `bool` | Whether to escape non-ASCII characters to Unicode. When `True`, all non-ASCII characters are escaped; when `False`, the original characters are retained. Effective only when `format_json` is `True` | `False` |
| `save_to_img()` | Save results as an image file | `save_path` | `str` | The file path for saving; supports a directory or file path | `None` |
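For example, a minimal sketch combining these methods on the `output` generator from the `predict()` call above (parameter values mirror the documented defaults, except `ensure_ascii`, set to `False` here to keep Chinese seal text readable):

```python
for res in output:
    res.print(format_json=True, indent=4, ensure_ascii=False)  # readable terminal output
    res.save_to_json(save_path="./output/")                    # one JSON file per input
    res.save_to_img(save_path="./output/")                     # visualization image(s)
```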
- Calling the `print()` method will print the results to the terminal. The printed content is explained as follows:
    - `input_path`: `(str)` The input path of the image to be predicted.
    - `model_settings`: `(Dict[str, bool])` The model parameters required for pipeline configuration.
        - `use_doc_preprocessor`: `(bool)` Controls whether to enable the document preprocessing sub-pipeline.
        - `use_layout_detection`: `(bool)` Controls whether to enable the layout detection sub-module.
    - `layout_det_res`: `(Dict[str, Union[List[numpy.ndarray], List[float]]])` The output of the layout detection sub-module. Present only when `use_layout_detection=True`.
        - `input_path`: `(Union[str, None])` The image path accepted by the layout detection module; `None` when the input is a `numpy.ndarray`.
        - `page_index`: `(Union[int, None])` The current page number if the input is a PDF file; otherwise `None`.
        - `boxes`: `(List[Dict])` A list of detected layout seal regions, with each element containing the following fields:
            - `cls_id`: `(int)` The class ID of the detected seal region.
            - `score`: `(float)` The confidence score of the detected region.
            - `coordinate`: `(List[float])` The coordinates of the detection box, in the order [x1, y1, x2, y2], i.e., the top-left x, top-left y, bottom-right x, and bottom-right y.
    - `seal_res_list`: `(List[Dict])` A list of seal text recognition results, with each element containing the following fields:
        - `input_path`: `(Union[str, None])` The image path accepted by the seal text recognition pipeline; `None` when the input is a `numpy.ndarray`.
        - `page_index`: `(Union[int, None])` The current page number if the input is a PDF file; otherwise `None`.
        - `model_settings`: `(Dict[str, bool])` The model configuration parameters for the seal text recognition pipeline.
            - `use_doc_preprocessor`: `(bool)` Controls whether to enable the document preprocessing sub-pipeline.
            - `use_textline_orientation`: `(bool)` Controls whether to enable the text line orientation classification sub-module.
        - `doc_preprocessor_res`: `(Dict[str, Union[str, Dict[str, bool], int]])` The output of the document preprocessing sub-pipeline. Present only when `use_doc_preprocessor=True`.
            - `input_path`: `(Union[str, None])` The image path accepted by the document preprocessing sub-pipeline; `None` when the input is a `numpy.ndarray`.
            - `model_settings`: `(Dict)` The model configuration parameters for the preprocessing sub-pipeline.
                - `use_doc_orientation_classify`: `(bool)` Controls whether to enable document orientation classification.
                - `use_doc_unwarping`: `(bool)` Controls whether to enable document unwarping.
            - `angle`: `(int)` The predicted document orientation. When enabled, one of [0, 1, 2, 3], corresponding to [0°, 90°, 180°, 270°]; when disabled, -1.
        - `dt_polys`: `(List[numpy.ndarray])` A list of polygon boxes from seal text detection. Each box is a numpy array of vertex coordinates with shape (n, 2).
        - `dt_scores`: `(List[float])` A list of confidence scores for the text detection boxes.
        - `text_det_params`: `(Dict[str, Dict[str, int, float]])` Configuration parameters for the text detection module.
            - `limit_side_len`: `(int)` The side length limit during image preprocessing.
            - `limit_type`: `(str)` How the side length limit is applied.
            - `thresh`: `(float)` The confidence threshold for text pixel classification.
            - `box_thresh`: `(float)` The confidence threshold for text detection boxes.
            - `unclip_ratio`: `(float)` The expansion ratio for text detection boxes.
        - `text_type`: `(str)` The type of seal text detection, currently fixed to "seal".
        - `text_rec_score_thresh`: `(float)` The filtering threshold for text recognition results.
        - `rec_texts`: `(List[str])` A list of recognized texts, containing only those with confidence scores above `text_rec_score_thresh`.
        - `rec_scores`: `(List[float])` A list of recognition confidence scores, filtered by `text_rec_score_thresh`.
        - `rec_polys`: `(List[numpy.ndarray])` A list of text detection boxes filtered by confidence score, in the same format as `dt_polys`.
        - `rec_boxes`: `(numpy.ndarray)` An array of rectangular bounding boxes for the detection boxes; the seal recognition pipeline returns an empty array.
- Calling the `save_to_json()` method will save the above content to the specified `save_path`. If a directory is specified, the result is saved as `save_path/{your_img_basename}_res.json`; if a file path is specified, it is saved directly to that file. Since JSON files do not support saving numpy arrays, `numpy.array` values are converted to lists.
- Calling the `save_to_img()` method will save the visualization results to the specified `save_path`. If a directory is specified, the result is saved as `save_path/{your_img_basename}_seal_res_region1.{your_img_extension}`; if a file path is specified, it is saved directly to that file. (Since the pipeline usually produces multiple result images, specifying a concrete file path is not recommended: the images would overwrite one another and only the last one would be retained.)

* Additionally, you can obtain the visualized images with results and the prediction results through attributes, as follows:
| Attribute | Description |
|---|---|
| `json` | Get the prediction results in `json` format |
| `img` | Get the visualization results in `dict` format |
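For instance, a short sketch of reading these attributes (the key names follow the description below; `preprocessed_img` exists only when the preprocessing sub-module is enabled):

```python
for res in output:
    data = res.json  # dict; same content as what save_to_json() writes
    imgs = res.img   # dict of PIL Image.Image objects
    if "seal_res_region1" in imgs:
        imgs["seal_res_region1"].save("./output/seal_vis.png")
```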
- The prediction result obtained through the `json` attribute is of `dict` type, with content consistent with what is saved by the `save_to_json()` method.
- The prediction result returned by the `img` attribute is of `dict` type. The keys are `layout_det_res`, `seal_res_region1`, and `preprocessed_img`, corresponding to three `Image.Image` objects: the visualized layout detection result, the visualized seal text recognition result, and the visualized image preprocessing result. If the image preprocessing sub-module is not used, `preprocessed_img` is not included in the dictionary; if the layout region detection module is not used, `layout_det_res` is not included.

Additionally, you can obtain the configuration file of the seal text recognition pipeline and load it for prediction. Execute the following command to save the configuration under `my_path`:

```bash
paddlex --get_pipeline_config seal_recognition --save_path ./my_path
```

Once you have the configuration file, you can customize the seal text recognition pipeline by setting the `pipeline` parameter of `create_pipeline` to the path of the configuration file. For example:

```python
from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="./my_path/seal_recognition.yaml")

output = pipeline.predict("seal_text_det.png")
for res in output:
    res.print()                    # Print the structured prediction output
    res.save_to_img("./output/")   # Save the visualized results
    res.save_to_json("./output/")  # Save the prediction results as JSON
```

Note: The parameters in the configuration file are the pipeline initialization parameters. If you wish to change the initialization parameters of the seal text recognition pipeline, modify them in the configuration file and then load it for prediction. CLI prediction also supports passing a configuration file: simply specify its path with `--pipeline`.

## 3. Development Integration/Deployment

If the pipeline meets your requirements for inference speed and accuracy, you can proceed directly with development integration/deployment. To integrate the pipeline into your Python project, refer to the example code in [2.1.2 Python Script Integration](#212-python-script-integration). In addition, PaddleX provides three other deployment methods, detailed as follows:

🚀 High-Performance Deployment: In real production environments, many applications have strict performance requirements (especially response speed) to ensure efficient system operation and a smooth user experience. To this end, PaddleX provides a high-performance inference plugin that deeply optimizes model inference and pre/post-processing, significantly speeding up the end-to-end process. For detailed high-performance deployment procedures, please refer to the [PaddleX High-Performance Deployment Guide](../../../pipeline_deploy/high_performance_inference.en.md).

☁️ Service-Oriented Deployment: Service-oriented deployment is a common deployment form in production environments. By encapsulating inference capabilities as services, clients can access them via network requests to obtain inference results. PaddleX supports multiple service-oriented deployment solutions for pipelines.
For detailed pipeline service-oriented deployment procedures, please refer to the [PaddleX Service-Oriented Deployment Guide](../../../pipeline_deploy/serving.en.md). Below are the API references for basic service-oriented deployment and multi-language service invocation examples:
API Reference

For the main operations provided by the service:

When the request is processed successfully, the response body has the following properties:

| Name | Type | Description |
|---|---|---|
| `logId` | `string` | The UUID of the request. |
| `errorCode` | `integer` | Error code. Fixed at `0`. |
| `errorMsg` | `string` | Error message. Fixed at `"Success"`. |
| `result` | `object` | The operation result. |

When the request is not processed successfully, the response body has the following properties:

| Name | Type | Description |
|---|---|---|
| `logId` | `string` | The UUID of the request. |
| `errorCode` | `integer` | Error code. Same as the HTTP response status code. |
| `errorMsg` | `string` | Error message. |

The main operations provided by the service are as follows:

Get seal text recognition results.

`POST /seal-recognition`

The request body properties are as follows:

| Name | Type | Description | Required |
|---|---|---|---|
| `file` | `string` | The URL of an image or PDF file accessible to the server, or the Base64-encoded content of such a file. For PDF files exceeding 10 pages, only the content of the first 10 pages is used. | Yes |
| `fileType` | `integer` | File type. `0` indicates a PDF file; `1` indicates an image file. If this property is absent from the request body, the file type is inferred from the URL. | No |
When the request is processed successfully, the `result` of the response body has the following properties:

| Name | Type | Description |
|---|---|---|
| `sealRecResults` | `array` | Seal text recognition results. The array length is 1 (for image input) or the smaller of the document page count and 10 (for PDF input). For PDF input, each element in the array represents the processing result of one page of the PDF file, in order. |
| `dataInfo` | `object` | Input data information. |

Each element in `sealRecResults` is an object with the following properties:

| Name | Type | Description |
|---|---|---|
| `texts` | `array` | Text positions, contents, and scores. |
| `inputImage` | `string` | The input image, in JPEG format, encoded using Base64. |
| `layoutImage` | `string` | The layout area detection result image, in JPEG format, encoded using Base64. |
| `ocrImage` | `string` | The OCR result image, in JPEG format, encoded using Base64. |

Each element in `texts` is an object with the following properties:

| Name | Type | Description |
|---|---|---|
| `poly` | `array` | Text position. The elements in the array are the vertex coordinates of the polygon surrounding the text. |
| `text` | `string` | Text content. |
| `score` | `number` | Text recognition score. |
Multi-language Service Call Examples

Python

```python
import base64
import requests

API_URL = "http://localhost:8080/seal-recognition"
file_path = "./demo.jpg"

with open(file_path, "rb") as file:
    file_bytes = file.read()
    file_data = base64.b64encode(file_bytes).decode("ascii")

payload = {"file": file_data, "fileType": 1}

response = requests.post(API_URL, json=payload)

assert response.status_code == 200
result = response.json()["result"]
for i, res in enumerate(result["sealRecResults"]):
    print("Detected texts:")
    print(res["texts"])
    layout_img_path = f"layout_{i}.jpg"
    with open(layout_img_path, "wb") as f:
        f.write(base64.b64decode(res["layoutImage"]))
    ocr_img_path = f"ocr_{i}.jpg"
    with open(ocr_img_path, "wb") as f:
        f.write(base64.b64decode(res["ocrImage"]))
    print(f"Output images saved at {layout_img_path} and {ocr_img_path}")

📱 Edge Deployment: Edge deployment is a method of placing computing and data processing capabilities directly on user devices, allowing devices to process data without relying on remote servers. PaddleX supports deploying models on edge devices such as Android. For detailed edge deployment procedures, please refer to the [PaddleX Edge Deployment Guide](../../../pipeline_deploy/edge_deploy.en.md).

You can choose the appropriate deployment method based on your needs and integrate the model pipeline into your subsequent AI applications.

## 4. Custom Development

If the default model weights provided by the seal text recognition pipeline do not meet your requirements for accuracy or speed, you can try to fine-tune the existing models using your own domain-specific or application data to improve the recognition performance of the pipeline in your scenario.

### 4.1 Model Fine-Tuning

Since the seal text recognition pipeline consists of several modules, unsatisfactory performance may stem from any one of them. You can analyze images with poor recognition results to identify which module is problematic, and then refer to the corresponding fine-tuning tutorial links in the table below.
| Scenario | Fine-Tuning Module | Fine-Tuning Reference Link |
|---|---|---|
| Inaccurate or missing seal position detection | Layout Detection Module | Link |
| Missing text detection | Text Detection Module | Link |
| Inaccurate text content | Text Recognition Module | Link |
| Inaccurate full-image rotation correction | Document Image Orientation Classification Module | Link |
| Inaccurate image distortion correction | Text Image Correction Module | Fine-tuning not supported |
### 4.2 Model Application

After fine-tuning with your private dataset, you will obtain local model weight files. To use the fine-tuned model weights, simply modify the pipeline configuration file, replacing the corresponding model directory with the local path of the fine-tuned weights:

```yaml
......
SubModules:
  TextDetection:
    module_name: seal_text_detection
    model_name: PP-OCRv4_server_seal_det
    model_dir: null # Replace this with the local path of the fine-tuned model weights
    limit_side_len: 736
    limit_type: min
    thresh: 0.2
    box_thresh: 0.6
    unclip_ratio: 0.5
......
```

Then, refer to the command-line or Python script methods in [2. Quick Start](#2-quick-start) to load the modified pipeline configuration file.

## 5. Multi-Hardware Support

PaddleX supports a variety of mainstream hardware devices, including NVIDIA GPU, Kunlunxin XPU, Ascend NPU, and Cambricon MLU. Simply modify the `--device` parameter to switch seamlessly between hardware devices. For example, to run seal text recognition inference on an Ascend NPU, the command would be:

```bash
paddlex --pipeline seal_recognition \
    --input seal_text_det.png \
    --use_doc_orientation_classify False \
    --use_doc_unwarping False \
    --device npu:0 \
    --save_path ./output
```

If you wish to use the seal text recognition pipeline on a wider variety of hardware, please refer to the [PaddleX Multi-Device Usage Guide](../../../other_devices_support/multi_devices_use_guide.en.md).