---
comments: true
---
# Seal Text Recognition Pipeline Tutorial
## 1. Introduction to Seal Text Recognition Pipeline
Seal text recognition is a technology that automatically extracts and recognizes seal content from documents or images. It is part of document processing and has applications in many scenarios, such as contract comparison, warehouse entry and exit review, and invoice reimbursement review.
The seal text recognition pipeline recognizes the text content of seals, extracting the text from seal images and outputting it in text form. The pipeline integrates the industry-renowned end-to-end OCR system PP-OCRv4 and supports the detection and recognition of curved seal text. It also integrates an optional layout region detection module, which can accurately locate the seal within the document, as well as optional document image orientation classification and distortion correction modules. Based on this pipeline, accurate text prediction at the millisecond level can be achieved on a CPU. The pipeline also provides flexible service deployment methods, supporting multiple programming languages on various hardware, and offers secondary development capabilities: you can train and fine-tune on your own dataset, and the trained model can be seamlessly integrated.
The seal text recognition pipeline includes a seal text detection module and a text recognition module, as well as optional layout detection module, document image orientation classification module, and text image correction module.
Choose a model according to your priorities: higher accuracy, faster inference speed, or smaller storage size.
Layout Region Detection Module (Optional):

* Layout detection model, including 23 common categories: document title, paragraph title, text, page number, abstract, table of contents, references, footnotes, header, footer, algorithm, formula, formula number, image, chart title, table, table title, seal, figure title, figure, header image, footer image, sidebar text

| Model | Model Download Link | mAP(0.5) (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Description |
|---|---|---|---|---|---|---|
| PicoDet_layout_1x | Inference Model/Trained Model | 86.8 | 9.03 / 3.10 | 25.82 / 20.70 | 7.4 | An efficient layout region localization model trained on the PubLayNet dataset based on PicoDet-1x; it can locate five types of regions: text, titles, tables, images, and lists. |
| PicoDet_layout_1x_table | Inference Model/Trained Model | 95.7 | 8.02 / 3.09 | 23.70 / 20.41 | 7.4 | An efficient layout region localization model trained on the PubLayNet dataset based on PicoDet-1x; it can locate one category: tables. |
| PicoDet-S_layout_3cls | Inference Model/Trained Model | 87.1 | 8.99 / 2.22 | 16.11 / 8.73 | 4.8 | A high-efficiency layout region localization model trained on a self-built dataset based on PicoDet-S for scenarios such as Chinese and English papers, magazines, and research reports; it includes three categories: tables, images, and seals. |
| PicoDet-S_layout_17cls | Inference Model/Trained Model | 70.3 | 9.11 / 2.12 | 15.42 / 9.12 | 4.8 | A high-efficiency layout region localization model trained on a self-built dataset based on PicoDet-S for scenarios such as Chinese and English papers, magazines, and research reports; it includes 17 common layout categories: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
| PicoDet-L_layout_3cls | Inference Model/Trained Model | 89.3 | 13.05 / 4.50 | 41.30 / 41.30 | 22.6 | An efficient layout region localization model trained on a self-built dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports; it includes three categories: tables, images, and seals. |
| PicoDet-L_layout_17cls | Inference Model/Trained Model | 79.9 | 13.50 / 4.69 | 43.32 / 43.32 | 22.6 | An efficient layout region localization model trained on a self-built dataset based on PicoDet-L for scenarios such as Chinese and English papers, magazines, and research reports; it includes 17 common layout categories: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
| RT-DETR-H_layout_3cls | Inference Model/Trained Model | 95.9 | 114.93 / 27.71 | 947.56 / 947.56 | 470.1 | A high-precision layout region localization model trained on a self-built dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports; it includes three categories: tables, images, and seals. |
| RT-DETR-H_layout_17cls | Inference Model/Trained Model | 92.6 | 115.29 / 104.09 | 995.27 / 995.27 | 470.2 | A high-precision layout region localization model trained on a self-built dataset based on RT-DETR-H for scenarios such as Chinese and English papers, magazines, and research reports; it includes 17 common layout categories: paragraph titles, images, text, numbers, abstracts, content, chart titles, formulas, tables, table titles, references, document titles, footnotes, headers, algorithms, footers, and seals. |
* Layout detection model, including 3 categories: tables, images, and seals

| Model | Model Download Link | mAP(0.5) (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Introduction |
|---|---|---|---|---|---|---|
| PicoDet-S_layout_3cls | Inference Model/Training Model | 88.2 | 8.99 / 2.22 | 16.11 / 8.73 | 4.8 | A high-efficiency layout area localization model trained on a self-built dataset for Chinese and English papers, magazines, and research reports based on the lightweight PicoDet-S model |
| PicoDet-L_layout_3cls | Inference Model/Training Model | 89.0 | 13.05 / 4.50 | 41.30 / 41.30 | 22.6 | A layout area localization model with balanced efficiency and accuracy, trained on a self-built dataset for Chinese and English papers, magazines, and research reports based on PicoDet-L |
| RT-DETR-H_layout_3cls | Inference Model/Training Model | 95.8 | 114.93 / 27.71 | 947.56 / 947.56 | 470.1 | A high-precision layout area localization model trained on a self-built dataset for Chinese and English papers, magazines, and research reports based on RT-DETR-H |
* Layout detection model, including 17 common layout categories

| Model | Model Download Link | mAP(0.5) (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Introduction |
|---|---|---|---|---|---|---|
| PicoDet-S_layout_17cls | Inference Model/Training Model | 87.4 | 9.11 / 2.12 | 15.42 / 9.12 | 4.8 | A high-efficiency layout area localization model trained on a self-built dataset for Chinese and English papers, magazines, and research reports based on the lightweight PicoDet-S model |
| PicoDet-L_layout_17cls | Inference Model/Training Model | 89.0 | 13.50 / 4.69 | 43.32 / 43.32 | 22.6 | A layout area localization model with balanced efficiency and accuracy, trained on a self-built dataset for Chinese and English papers, magazines, and research reports based on PicoDet-L |
| RT-DETR-H_layout_17cls | Inference Model/Training Model | 98.3 | 115.29 / 104.09 | 995.27 / 995.27 | 470.2 | A high-precision layout area localization model trained on a self-built dataset for Chinese and English papers, magazines, and research reports based on RT-DETR-H |
Document Image Orientation Classification Module (Optional):
| Model | Model Download Link | Top-1 Acc (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Description |
|---|---|---|---|---|---|---|
| PP-LCNet_x1_0_doc_ori | Inference Model/Training Model | 99.06 | 2.31 / 0.43 | 3.37 / 1.27 | 7 | A document image classification model based on PP-LCNet_x1_0, containing four categories: 0 degrees, 90 degrees, 180 degrees, and 270 degrees |
Note: The above accuracy metrics are evaluated on a self-built dataset covering multiple scenarios such as certificates and documents, containing 1,000 images. GPU inference time is based on an NVIDIA Tesla T4 machine with FP32 precision; CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.
Text Image Correction Module (Optional):
| Model | Model Download Link | CER | Model Storage Size (M) | Description |
|---|---|---|---|---|
| UVDoc | Inference Model/Training Model | 0.179 | 30.3 | High-precision text image correction model |
Text Detection Module:
| Model | Model Download Link | Detection Hmean (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Description |
|---|---|---|---|---|---|---|
| PP-OCRv4_server_seal_det | Inference Model/Trained Model | 98.21 | 74.75 / 67.72 | 382.55 / 382.55 | 109 | PP-OCRv4 server-side seal text detection model with higher accuracy, suitable for deployment on more capable servers |
| PP-OCRv4_mobile_seal_det | Inference Model/Trained Model | 96.47 | 7.82 / 3.09 | 48.28 / 23.97 | 4.6 | PP-OCRv4 mobile-side seal text detection model with higher efficiency, suitable for edge deployment |
Text Recognition Module:
| Model | Model Download Link | Recognition Avg Accuracy(%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (M) | Description |
|---|---|---|---|---|---|---|
| PP-OCRv4_mobile_rec | Inference Model/Trained Model | 78.20 | 4.82 / 4.82 | 16.74 / 4.64 | 10.6 | The PP-OCRv4 recognition model is an upgrade of PP-OCRv3. At comparable speed, accuracy in Chinese and English scenarios is further improved, and the average recognition accuracy of the 80 multilingual models is increased by more than 8%. |
| PP-OCRv4_server_rec | Inference Model/Trained Model | 79.20 | 6.58 / 6.58 | 33.17 / 33.17 | 71.2 | A high-precision server-side text recognition model featuring high accuracy, fast speed, and multilingual support, suitable for text recognition tasks in various scenarios. |
| PP-OCRv3_mobile_rec | Inference Model/Training Model | - | 5.87 / 5.87 | 9.07 / 4.28 | 10.6 | An ultra-lightweight OCR model suitable for mobile applications. It enhances recognition accuracy and efficiency through techniques such as data augmentation and mixed-precision training. With a model size of 10.6 M, it is suitable for deployment on resource-constrained devices and can be used in scenarios such as mobile photo translation and business card recognition. |
Note: The evaluation set for the above accuracy metrics is the Chinese dataset built by PaddleOCR, covering multiple scenarios such as street view, web images, documents, and handwriting; the text recognition portion contains 11,000 images. GPU inference time for all models is based on an NVIDIA Tesla T4 machine with FP32 precision; CPU inference speed is based on an Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision.
| Model | Model Download Link | Recognition Avg Accuracy(%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | Introduction |
|---|---|---|---|---|---|---|
| ch_SVTRv2_rec | Inference Model/Training Model | 68.81 | 8.36801 | 165.706 | 73.9 M | SVTRv2 is a server text recognition model developed by the OpenOCR team of Fudan University's Visual and Learning Laboratory (FVL). It won the first prize in the PaddleOCR Algorithm Model Challenge - Task One: OCR End-to-End Recognition Task. The end-to-end recognition accuracy on the A list is 6% higher than that of PP-OCRv4. |
Note: The evaluation set for the above accuracy indicators is the PaddleOCR Algorithm Model Challenge - Task One: OCR End-to-End Recognition Task A list. The GPU inference time for all models is based on NVIDIA Tesla T4 machines with FP32 precision type. The CPU inference speed is based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision type.
| Model | Model Download Link | Recognition Avg Accuracy(%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | Introduction |
|---|---|---|---|---|---|---|
| ch_RepSVTR_rec | Inference Model/Training Model | 65.07 | 10.5047 | 51.5647 | 22.1 M | The RepSVTR text recognition model is a mobile text recognition model based on SVTRv2. It won the first prize in the PaddleOCR Algorithm Model Challenge - Task One: OCR End-to-End Recognition Task. The end-to-end recognition accuracy on the B list is 2.5% higher than that of PP-OCRv4, with the same inference speed. |
Note: The evaluation set for the above accuracy indicators is the PaddleOCR Algorithm Model Challenge - Task One: OCR End-to-End Recognition Task B list. The GPU inference time for all models is based on NVIDIA Tesla T4 machines with FP32 precision type. The CPU inference speed is based on Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz with 8 threads and FP32 precision type.
* English Recognition Model

| Model | Model Download Link | Recognition Avg Accuracy(%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | Introduction |
|---|---|---|---|---|---|---|
| en_PP-OCRv4_mobile_rec | Inference Model/Training Model | | | | | [Latest] Further upgraded based on PP-OCRv3, with improved accuracy at comparable speed. |
| en_PP-OCRv3_mobile_rec | Inference Model/Training Model | | | | | Ultra-lightweight model, supporting English and numeric recognition. |
* Multilingual Recognition Models

| Model | Model Download Link | Recognition Avg Accuracy(%) | GPU Inference Time (ms) | CPU Inference Time (ms) | Model Storage Size (M) | Introduction |
|---|---|---|---|---|---|---|
| korean_PP-OCRv3_mobile_rec | Inference Model/Training Model | | | | | Korean Recognition |
| japan_PP-OCRv3_mobile_rec | Inference Model/Training Model | | | | | Japanese Recognition |
| chinese_cht_PP-OCRv3_mobile_rec | Inference Model/Training Model | | | | | Traditional Chinese Recognition |
| te_PP-OCRv3_mobile_rec | Inference Model/Training Model | | | | | Telugu Recognition |
| ka_PP-OCRv3_mobile_rec | Inference Model/Training Model | | | | | Kannada Recognition |
| ta_PP-OCRv3_mobile_rec | Inference Model/Training Model | | | | | Tamil Recognition |
| latin_PP-OCRv3_mobile_rec | Inference Model/Training Model | | | | | Latin Recognition |
| arabic_PP-OCRv3_mobile_rec | Inference Model/Training Model | | | | | Arabic Script Recognition |
| cyrillic_PP-OCRv3_mobile_rec | Inference Model/Training Model | | | | | Cyrillic Script Recognition |
| devanagari_PP-OCRv3_mobile_rec | Inference Model/Training Model | | | | | Devanagari Script Recognition |
If you are satisfied with the pipeline's performance, you can directly integrate and deploy it. You can download the deployment package from the cloud, or refer to the methods in [Section 2.2 Local Experience](#22-local-experience) for local deployment. If you are not satisfied with the results, you can fine-tune the models in the pipeline using your private data. If you have local hardware resources for training, you can start training directly on your local machine; if not, the Star River zero-code platform provides a one-click training service: you don't need to write any code, just upload your data and start the training task with one click.
### 2.2 Local Experience
> ❗ Before using the seal text recognition pipeline locally, please ensure that you have completed the installation of the PaddleX wheel package according to the [PaddleX Installation Guide](../../../installation/installation.en.md).
#### 2.2.1 Command Line Experience
You can quickly experience the seal text recognition pipeline with a single command. Use the [test file](https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/seal_text_det.png), and replace `--input` with the local path for prediction.
```bash
paddlex --pipeline seal_recognition \
--input seal_text_det.png \
--use_doc_orientation_classify False \
--use_doc_unwarping False \
--device gpu:0 \
--save_path ./output
```
The relevant parameter descriptions can be found in [2.2.2 Python Script Integration](#222-python-script-integration).
After running, the result will be printed to the terminal.
#### 2.2.2 Python Script Integration
* The command line above is for a quick experience and viewing of results. In a project, you generally need to integrate via code. You can complete quick pipeline inference with just a few lines of code, as follows:
```python
from paddlex import create_pipeline
pipeline = create_pipeline(pipeline="seal_recognition")
output = pipeline.predict(
"seal_text_det.png",
use_doc_orientation_classify=False,
use_doc_unwarping=False,
)
for res in output:
res.print()
res.save_to_img("./output/")
res.save_to_json("./output/")
```
In the above Python script, the following steps are executed:

(1) Instantiate the seal text recognition pipeline object via `create_pipeline()`. The specific parameters are described as follows:
| Parameter | Description | Type | Default Value |
|---|---|---|---|
| `pipeline` | The name of the pipeline or the path to the pipeline configuration file. If it is a pipeline name, it must be a pipeline supported by PaddleX. | `str` | `None` |
| `config` | Specific configuration information for the pipeline (if set simultaneously with `pipeline`, it takes priority over `pipeline`, and the pipeline name in it must be consistent with `pipeline`). | `dict[str, Any]` | `None` |
| `device` | The device used for pipeline inference. It supports specifying a specific GPU card number, such as "gpu:0", card numbers of other hardware, such as "npu:0", or CPU, such as "cpu". | `str` | `gpu:0` |
| `use_hpip` | Whether to enable high-performance inference. Available only if the pipeline supports high-performance inference. | `bool` | `False` |
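For example, a minimal sketch (assuming a CPU-only machine; the arguments simply restate the documented options) of creating the pipeline with these parameters:

```python
from paddlex import create_pipeline

# Create the seal text recognition pipeline on CPU, with
# high-performance inference left at its documented default (off).
pipeline = create_pipeline(
    pipeline="seal_recognition",
    device="cpu",
    use_hpip=False,
)
```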
(2) Call the `predict()` method of the seal text recognition pipeline object to run inference; it returns a generator of prediction results. The parameters of the `predict()` method are as follows:

| Parameter | Description | Type | Options | Default Value |
|---|---|---|---|---|
| `input` | Data to be predicted; supports multiple input types (required) | `Python Var\|str\|list` | | `None` |
| `device` | Inference device for the pipeline | `str\|None` | | `None` |
| `use_doc_orientation_classify` | Whether to use the document orientation classification module | `bool\|None` | | `None` |
| `use_doc_unwarping` | Whether to use the document unwarping module | `bool\|None` | | `None` |
| `use_layout_detection` | Whether to use the layout detection module | `bool\|None` | | `None` |
| `layout_threshold` | Confidence threshold for layout detection; only results scoring above this threshold are output | `float\|dict\|None` | | `None` |
| `layout_nms` | Whether to use Non-Maximum Suppression (NMS) in layout detection post-processing | `bool\|None` | | `None` |
| `layout_unclip_ratio` | Expansion ratio for detection box edges; if not specified, the default from the official PaddleX model configuration is used | `float\|list\|None` | | `None` |
| `layout_merge_bboxes_mode` | Merging mode for detection boxes in layout detection output; if not specified, the default from the official PaddleX model configuration is used | `string\|None` | | `None` |
| `seal_det_limit_side_len` | Image side length limit for seal text detection | `int\|None` | | `None` |
| `seal_rec_score_thresh` | Text recognition threshold; text results with scores above this threshold are retained | `float\|None` | | `None` |
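As an illustration, a hedged sketch of a `predict()` call combining several of these parameters (the threshold values are arbitrary examples, not recommended settings):

```python
# Run prediction with a few of the documented optional parameters.
output = pipeline.predict(
    "seal_text_det.png",
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_layout_detection=True,    # locate the seal region before detection
    layout_threshold=0.5,         # keep layout boxes scoring above 0.5
    seal_rec_score_thresh=0.8,    # keep recognized texts scoring above 0.8
)
```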
(3) Process the prediction results. The result for each sample supports printing and saving, with the following methods:

| Method | Description | Parameter | Parameter Type | Parameter Description | Default Value |
|---|---|---|---|---|---|
| `print()` | Print results to the terminal | `format_json` | `bool` | Whether to format the output content with JSON indentation | `True` |
| | | `indent` | `int` | Indentation level to beautify the output JSON data for better readability; effective only when `format_json` is `True` | `4` |
| | | `ensure_ascii` | `bool` | Whether to escape non-ASCII characters to Unicode. When `True`, all non-ASCII characters are escaped; `False` retains the original characters; effective only when `format_json` is `True` | `False` |
| `save_to_json()` | Save results as a JSON file | `save_path` | `str` | The file path to save the results. When it is a directory, the saved file name is consistent with the input file type | `None` |
| | | `indent` | `int` | Indentation level to beautify the output JSON data for better readability; effective only when `format_json` is `True` | `4` |
| | | `ensure_ascii` | `bool` | Whether to escape non-ASCII characters to Unicode. When `True`, all non-ASCII characters are escaped; `False` retains the original characters; effective only when `format_json` is `True` | `False` |
| `save_to_img()` | Save results as an image file | `save_path` | `str` | The file path to save the results; supports a directory or file path | `None` |
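For instance, a short sketch of these methods in use (reusing the `output` generator from the script above; paths are illustrative):

```python
for res in output:
    # Pretty-print to the terminal, keeping non-ASCII characters readable.
    res.print(format_json=True, indent=4, ensure_ascii=False)
    # Directory paths are allowed; file names then follow the input file.
    res.save_to_json("./output/")
    res.save_to_img("./output/")
```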
In addition, the result object provides the following attributes:

| Attribute | Description |
|---|---|
| `json` | Get the prediction results in `json` format. |
| `img` | Get the visualization results in `dict` format. |
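A brief sketch of accessing these attributes (again assuming the `output` generator from the script above):

```python
for res in output:
    json_data = res.json   # prediction results in json form
    vis_images = res.img   # visualization results as a dict
    print(json_data)
```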
For the main operations provided by the service: when a request is processed successfully, the response status code is 200 and the response body has the following properties:

| Name | Type | Description |
|---|---|---|
| `logId` | `string` | The UUID of the request. |
| `errorCode` | `integer` | Error code. Fixed at 0. |
| `errorMsg` | `string` | Error message. Fixed at "Success". |
| `result` | `object` | Operation result. |
When a request is not processed successfully, the response body has the following properties:

| Name | Type | Description |
|---|---|---|
| `logId` | `string` | The UUID of the request. |
| `errorCode` | `integer` | Error code. Same as the response status code. |
| `errorMsg` | `string` | Error message. |
The main operations provided by the service are as follows:

* `infer`: Get seal text recognition results.

`POST /seal-recognition`

The request body has the following properties:

| Name | Type | Description | Required |
|---|---|---|---|
| `file` | `string` | The URL of an image or PDF file accessible to the server, or the Base64-encoded content of such a file. For PDF files exceeding 10 pages, only the first 10 pages are used. | Yes |
| `fileType` | `integer` | File type. `0` indicates a PDF file; `1` indicates an image file. If this property is absent from the request body, the file type is inferred from the URL. | No |
When the request is processed successfully, the `result` property of the response body has the following properties:

| Name | Type | Description |
|---|---|---|
| `sealRecResults` | `array` | Seal text recognition results. The array length is 1 (for image input) or the smaller of the document page count and 10 (for PDF input). For PDF input, each element represents the processing result of one page of the PDF file, in order. |
| `dataInfo` | `object` | Input data information. |
Each element in `sealRecResults` is an object with the following properties:

| Name | Type | Description |
|---|---|---|
| `texts` | `array` | Text positions, contents, and scores. |
| `inputImage` | `string` | Input image. The image is in JPEG format and encoded in Base64. |
| `layoutImage` | `string` | Layout region detection result image. The image is in JPEG format and encoded in Base64. |
| `ocrImage` | `string` | OCR result image. The image is in JPEG format and encoded in Base64. |
Each element in `texts` is an object with the following properties:

| Name | Type | Description |
|---|---|---|
| `poly` | `array` | Text position. The elements of the array are the vertex coordinates of the polygon enclosing the text. |
| `text` | `string` | Text content. |
| `score` | `number` | Text recognition score. |
Below is a Python example of calling the service:

```python
import base64
import requests

API_URL = "http://localhost:8080/seal-recognition"
file_path = "./demo.jpg"

# Read the local file and Base64-encode it for the request body.
with open(file_path, "rb") as file:
    file_bytes = file.read()
    file_data = base64.b64encode(file_bytes).decode("ascii")

# fileType=1 indicates an image file; use 0 for a PDF file.
payload = {"file": file_data, "fileType": 1}

response = requests.post(API_URL, json=payload)
assert response.status_code == 200
result = response.json()["result"]

# Print the recognized texts and save the visualization images per page.
for i, res in enumerate(result["sealRecResults"]):
    print("Detected texts:")
    print(res["texts"])
    layout_img_path = f"layout_{i}.jpg"
    with open(layout_img_path, "wb") as f:
        f.write(base64.b64decode(res["layoutImage"]))
    ocr_img_path = f"ocr_{i}.jpg"
    with open(ocr_img_path, "wb") as f:
        f.write(base64.b64decode(res["ocrImage"]))
    print(f"Output images saved at {layout_img_path} and {ocr_img_path}")
```
If the default models do not meet your requirements in your scenario, you can fine-tune the corresponding module according to the table below:

| Scenario | Fine-Tuning Module | Fine-Tuning Reference Link |
|---|---|---|
| Inaccurate or missing seal position detection | Layout Detection Module | Link |
| Missing text detection | Text Detection Module | Link |
| Inaccurate text content | Text Recognition Module | Link |
| Inaccurate full-image rotation correction | Document Image Orientation Classification Module | Link |
| Inaccurate image distortion correction | Text Image Correction Module | Not supported for fine-tuning |