---
comments: true
---

# Table Structure Recognition Module Development Tutorial

## I. Overview

Table structure recognition is a crucial component in table recognition systems, converting non-editable table images into editable table formats (e.g., HTML). The goal of table structure recognition is to identify the rows, columns, and cell positions of tables. The performance of this module directly impacts the accuracy and efficiency of the entire table recognition system. The module typically outputs HTML or LaTeX code for the table area, which is then passed to the table content recognition module for further processing.

## II. Supported Model List

| Model | Model Download Link | Accuracy (%) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Size (M) | Description |
|---|---|---|---|---|---|---|
| SLANet | Inference Model/Trained Model | 59.52 | 103.08 / 103.08 | 197.99 / 197.99 | 6.9 M | SLANet is a table structure recognition model developed by Baidu PaddlePaddle Vision Team. The model significantly improves the accuracy and inference speed of table structure recognition by adopting a CPU-friendly lightweight backbone network PP-LCNet, a high-low-level feature fusion module CSP-PAN, and a feature decoding module SLA Head that aligns structural and positional information. |
| SLANet_plus | Inference Model/Trained Model | 63.69 | 140.29 / 140.29 | 195.39 / 195.39 | 6.9 M | SLANet_plus is an enhanced version of SLANet, the table structure recognition model developed by Baidu PaddlePaddle Vision Team. Compared to SLANet, SLANet_plus significantly improves recognition of wireless (borderless) and complex tables, and is less sensitive to the accuracy of table localization: even when the table localization is offset, it can still recognize the structure relatively accurately. |

* input_path: The path of the input image to be predicted
* boxes: Predicted table cell information, a list composed of several predicted table cell coordinates. Note that the table cell predictions from the SLANeXt series models are invalid
* structure: Predicted table structure in HTML expressions, a list composed of several predicted HTML keywords in order
* structure_score: Confidence score of the predicted table structure

* create_model instantiates a table structure recognition model (here, SLANet is used as an example), with specific details as follows:

| Parameter | Parameter Description | Parameter Type | Options | Default Value |
|---|---|---|---|---|
| model_name | Name of the model | str | All model names supported by PaddleX | None |
| model_dir | Path to store the model | str | None | None |

model_name must be specified. After specifying model_name, the default model parameters from PaddleX will be used. If model_dir is specified, the user-defined model will be used.
* The predict() method of the table structure recognition model is called for inference and prediction. The predict() method has parameters input and batch_size, with specific details as follows:

| Parameter | Parameter Description | Parameter Type | Options | Default Value |
|---|---|---|---|---|
| input | Data to be predicted, supporting multiple input types | Python Var/str/dict/list | | None |
| batch_size | Batch size | int | Any integer | 1 |

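
To make the `batch_size` parameter concrete, here is a minimal sketch of how a list of inputs can be grouped into batches. It is an illustration only, not PaddleX's internal implementation; the helper name `iter_batches` and the image paths are hypothetical.

```python
from typing import Iterator, List, Sequence


def iter_batches(inputs: Sequence[str], batch_size: int = 1) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `batch_size` inputs (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(inputs), batch_size):
        yield list(inputs[start:start + batch_size])


# Hypothetical image paths, used purely for illustration.
image_paths = ["table_0.jpg", "table_1.jpg", "table_2.jpg"]
batches = list(iter_batches(image_paths, batch_size=2))
print(batches)
```

With three inputs and a batch size of 2, the last batch simply holds the single remaining input.
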
| Method | Method Description | Parameter | Parameter Type | Parameter Description | Default Value |
|---|---|---|---|---|---|
| print() | Print the result to the terminal | format_json | bool | Whether to format the output content using JSON indentation | True |
| | | indent | int | Specify the indentation level to beautify the output JSON data, making it more readable. It is only effective when format_json is True | 4 |
| | | ensure_ascii | bool | Control whether to escape non-ASCII characters to Unicode. If set to True, all non-ASCII characters will be escaped; False will retain the original characters. It is only effective when format_json is True | False |
| save_to_json() | Save the result as a JSON-formatted file | save_path | str | The path to save the file. If it is a directory, the saved file will have the same name as the input file | None |
| | | indent | int | Specify the indentation level to beautify the output JSON data, making it more readable. It is only effective when format_json is True | 4 |
| | | ensure_ascii | bool | Control whether to escape non-ASCII characters to Unicode. If set to True, all non-ASCII characters will be escaped; False will retain the original characters. It is only effective when format_json is True | False |

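
The `indent` and `ensure_ascii` parameters are described in the same terms as the arguments of the same name in Python's standard `json.dumps`. The sketch below uses only the standard library to show their effect. Whether PaddleX forwards these values to `json.dumps` internally is an assumption, and the sample dictionary is made up for illustration.

```python
import json

# Made-up result fragment containing a non-ASCII file name.
sample = {"input_path": "表格.jpg", "structure_score": 0.99}

# ensure_ascii=True escapes every non-ASCII character as \uXXXX.
escaped = json.dumps(sample, indent=4, ensure_ascii=True)

# ensure_ascii=False keeps the original characters.
readable = json.dumps(sample, indent=4, ensure_ascii=False)

# Without indent, the JSON is emitted on a single line.
compact = json.dumps(sample, ensure_ascii=False)

print(escaped)
print(readable)
print(compact)
```
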
| Attribute | Attribute Description |
|---|---|
| json | Get the prediction result in json format |

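
Because `structure` is an ordered list of HTML keywords, one simple way to post-process a result is to concatenate the tokens into a single HTML string. The sketch below assumes a result dictionary with the fields described earlier. The sample values and the helper names `structure_to_html` and `count_cells` are hypothetical, and real outputs may nest these fields differently.

```python
from typing import List


def structure_to_html(structure: List[str]) -> str:
    """Join the predicted HTML keywords, in order, into one HTML string."""
    return "".join(structure)


def count_cells(structure: List[str]) -> int:
    """Count cell openings; assumes every cell token starts with '<td'."""
    return sum(1 for token in structure if token.startswith("<td"))


# Hypothetical prediction result for a table with one row and two cells.
result = {
    "input_path": "table.jpg",
    "structure": [
        "<html>", "<body>", "<table>",
        "<tr>", "<td></td>", "<td></td>", "</tr>",
        "</table>", "</body>", "</html>",
    ],
    "structure_score": 0.99,
}

html = structure_to_html(result["structure"])
num_rows = result["structure"].count("<tr>")
num_cells = count_cells(result["structure"])
print(html, num_rows, num_cells)
```
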
The specific content of the validation result file is:
{
"done_flag": true,
"check_pass": true,
"attributes": {
"train_samples": 2000,
"train_sample_paths": [
"../dataset/table_rec_dataset_examples/images/border_right_7384_X9UFEPKVMLALY7DDB11A.jpg",
"../dataset/table_rec_dataset_examples/images/border_top_13708_VE2DGBD4DCQU2ITLBTEA.jpg",
"../dataset/table_rec_dataset_examples/images/border_top_6490_14Z6ZN6G52GG4XA0K4XU.jpg",
"../dataset/table_rec_dataset_examples/images/border_top_14236_DG96EX0EDKIIDK8P6ENG.jpg",
"../dataset/table_rec_dataset_examples/images/border_19648_SV8B7X34RTYRAT2T5CPI.jpg",
"../dataset/table_rec_dataset_examples/images/border_bottom_7186_HODBC25HISMCSVKY0HJ9.jpg",
"../dataset/table_rec_dataset_examples/images/head_border_bottom_5773_4K4H9OVK9X9YVHE4Y1BQ.jpg",
"../dataset/table_rec_dataset_examples/images/border_7760_8C62CCH5T57QUGE0NTHZ.jpg",
"../dataset/table_rec_dataset_examples/images/border_bottom_15707_B1YVOU3X4NHHB6TL269O.jpg",
"../dataset/table_rec_dataset_examples/images/no_border_5223_HLG406UK35UD5EUYC2AV.jpg"
],
"val_samples": 100,
"val_sample_paths": [
"../dataset/table_rec_dataset_examples/images/border_2945_L7MSRHBZRW6Y347G39O6.jpg",
"../dataset/table_rec_dataset_examples/images/head_border_bottom_4825_LH9WI6X104CP3VFXPSON.jpg",
"../dataset/table_rec_dataset_examples/images/head_border_bottom_16837_79KHWU9WDM9ZQHNBGQAL.jpg",
"../dataset/table_rec_dataset_examples/images/border_bottom_10107_9ENLLC29SQ6XI8WZY53E.jpg",
"../dataset/table_rec_dataset_examples/images/border_top_16668_JIS0YFDZKTKETZIEKCKX.jpg",
"../dataset/table_rec_dataset_examples/images/border_18653_J9SSKHLFTRJD4J8W17OW.jpg",
"../dataset/table_rec_dataset_examples/images/border_bottom_8396_VJ3QJ3I0DP63P4JR77FE.jpg",
"../dataset/table_rec_dataset_examples/images/border_9017_K2V7QBWSU2BA4R3AJSO7.jpg",
"../dataset/table_rec_dataset_examples/images/border_top_19494_SDFMWP92NOB2OT7109FI.jpg",
"../dataset/table_rec_dataset_examples/images/no_border_288_6LK683JUCMOQ38V5BV29.jpg"
]
},
"analysis": {},
"dataset_path": "./dataset/table_rec_dataset_examples",
"show_type": "image",
"dataset_type": "PubTabTableRecDataset"
}
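
A minimal sketch of inspecting such a result programmatically follows. It parses an abbreviated copy of the JSON above from a string. The helper name `summarize_check_result` is hypothetical, and in practice you would load the file from wherever your run wrote it.

```python
import json

# Abbreviated copy of the validation result shown above (sample paths omitted).
raw = """
{
  "done_flag": true,
  "check_pass": true,
  "attributes": {
    "train_samples": 2000,
    "train_sample_paths": [],
    "val_samples": 100,
    "val_sample_paths": []
  },
  "dataset_type": "PubTabTableRecDataset"
}
"""


def summarize_check_result(result: dict) -> str:
    """Return a one-line summary of a dataset check result (hypothetical helper)."""
    if not result.get("check_pass", False):
        return "dataset check failed"
    attrs = result["attributes"]
    train, val = attrs["train_samples"], attrs["val_samples"]
    val_share = 100 * val / (train + val)
    return f"{result['dataset_type']}: {train} train / {val} val ({val_share:.1f}% validation)"


summary = summarize_check_result(json.loads(raw))
print(summary)
```
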
In the above validation results, check_pass being True indicates that the dataset format meets the requirements. Explanations for other indicators are as follows:

* attributes.train_samples: The number of samples in the training set of this dataset is 2000;
* attributes.val_samples: The number of samples in the validation set of this dataset is 100;
* attributes.train_sample_paths: A list of relative paths to the visualization images of samples in the training set of this dataset;
* attributes.val_sample_paths: A list of relative paths to the visualization images of samples in the validation set of this dataset.

(1) Dataset Format Conversion

Table structure recognition does not support data format conversion.
(2) Dataset Splitting
The dataset splitting parameters can be set by modifying the fields under CheckDataset in the configuration file. An example of part of the configuration file is shown below:

* CheckDataset:
    * split:
        * enable: Whether to re-split the dataset. Set to True to enable dataset splitting, default is False;
        * train_percent: If re-splitting the dataset, set the percentage of the training set. The type is any integer between 0-100, ensuring the sum with val_percent equals 100;

For example, if you want to re-split the dataset with a 90% training set and a 10% validation set, modify the configuration file as follows:

......
CheckDataset:
  ......
  split:
    enable: True
    train_percent: 90
    val_percent: 10
  ......
Then execute the command:
python main.py -c paddlex/configs/modules/table_recognition/SLANet.yaml \
-o Global.mode=check_dataset \
-o Global.dataset_dir=./dataset/table_rec_dataset_examples
After the data splitting is executed, the original annotation files will be renamed to xxx.bak in their original paths.
The above parameters can also be set by appending command line arguments:
python main.py -c paddlex/configs/modules/table_recognition/SLANet.yaml \
-o Global.mode=check_dataset \
-o Global.dataset_dir=./dataset/table_rec_dataset_examples \
-o CheckDataset.split.enable=True \
-o CheckDataset.split.train_percent=90 \
-o CheckDataset.split.val_percent=10
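
The splitting command above can also be assembled from a script. The sketch below checks that the two percentages sum to 100 before building the argument list. It uses only the flags and paths shown above. The helper name `build_split_command` is hypothetical and not part of PaddleX.

```python
from typing import List


def build_split_command(config: str, dataset_dir: str,
                        train_percent: int, val_percent: int) -> List[str]:
    """Build the check_dataset command with split overrides (hypothetical helper)."""
    if not (0 <= train_percent <= 100 and 0 <= val_percent <= 100):
        raise ValueError("percentages must be integers between 0 and 100")
    if train_percent + val_percent != 100:
        raise ValueError("train_percent and val_percent must sum to 100")
    overrides = {
        "Global.mode": "check_dataset",
        "Global.dataset_dir": dataset_dir,
        "CheckDataset.split.enable": "True",
        "CheckDataset.split.train_percent": str(train_percent),
        "CheckDataset.split.val_percent": str(val_percent),
    }
    command = ["python", "main.py", "-c", config]
    # Each override becomes a separate "-o key=value" pair.
    for key, value in overrides.items():
        command += ["-o", f"{key}={value}"]
    return command


command = build_split_command(
    "paddlex/configs/modules/table_recognition/SLANet.yaml",
    "./dataset/table_rec_dataset_examples",
    train_percent=90,
    val_percent=10,
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the directory that contains `main.py`.
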
output. If you need to specify a save path, you can set it through the -o Global.output field in the configuration file.

After completing the model training, all outputs are saved in the specified output directory (default is ./output/), typically including:

* train_result.json: Training result record file, recording whether the training task was completed normally, as well as the output weight metrics, related file paths, etc.;
* train.log: Training log file, recording changes in model metrics and loss during training;
* config.yaml: Training configuration file, recording the hyperparameter configuration for this training session;
* .pdparams, .pdema, .pdopt.pdstate, .pdiparams, .pdmodel: Model weight-related files, including network parameters, optimizer, EMA, static graph network parameters, static graph network structure, etc.;

When evaluating the model, you need to specify the model weights file path. Each configuration file has a default weight save path. If you need to change it, simply append the command line parameter to set it, such as -o Evaluate.weight_path=./output/best_accuracy/best_accuracy.pdparams.
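
Before launching an evaluation, it can help to confirm that the training outputs and the weights file exist. The sketch below is one way to do that, assuming the three record files sit directly in the output directory (an assumption; the source only lists the file names). The helper names are hypothetical, and the demonstration runs against a temporary directory rather than a real training run.

```python
import tempfile
from pathlib import Path
from typing import List

# File names taken from the list above; their location is an assumption.
EXPECTED_FILES = ("train_result.json", "train.log", "config.yaml")


def missing_training_outputs(output_dir: Path) -> List[str]:
    """Return the expected record files that are absent from output_dir."""
    return [name for name in EXPECTED_FILES if not (output_dir / name).is_file()]


def weight_file_exists(weight_path: Path) -> bool:
    """Check that weight_path points to an existing .pdparams file."""
    return weight_path.is_file() and weight_path.suffix == ".pdparams"


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    output_dir = Path(tmp) / "output"
    weight_path = output_dir / "best_accuracy" / "best_accuracy.pdparams"
    weight_path.parent.mkdir(parents=True)
    weight_path.write_bytes(b"")
    (output_dir / "train.log").write_text("")

    missing = missing_training_outputs(output_dir)
    has_weights = weight_file_exists(weight_path)

print(missing, has_weights)
```
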
After completing the model evaluation, an evaluate_result.json file will be produced. It records the evaluation results: whether the evaluation task was completed successfully, and the model's evaluation metrics, including acc.