To annotate table data, use the PPOCRLabelv2 tool. Detailed steps are shown in the 【Video Demonstration】.
Table annotation targets the structured extraction of table data: converting tables in images into Excel format. Annotation therefore requires an external application that can open Excel files alongside PPOCRLabel. In PPOCRLabel, annotate the text information within the table (text content and position); in the Excel file, annotate the table structure. The recommended steps are:
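1. Table recognition: open the table image and click the `Table Recognition` button in the upper right corner of the software. The software calls the table recognition model in PP-Structure to label the table automatically and opens an Excel file at the same time.
2. Modify the annotation results: add annotation boxes with one box per cell (that is, all text within a cell is marked as a single box). Right-click an annotation box and select `Cell Re-recognition` to have the model automatically recognize the text within the cell.
3. Adjust the cell order: click `View - Show Box Number` to display the annotation box numbers. Drag the results under the `Recognition Results` column on the right side of the software interface so that the box numbers run from left to right and top to bottom, annotating row by row.
4. Annotate the table structure: in the external Excel application, mark each cell that contains text with any identifier (for example, `1`), ensuring that the cell merging in Excel matches the original image (that is, the text in Excel cells does not need to be identical to the text in the image).
5. Export the annotation: close all Excel files corresponding to the table, click `File - Export Table Annotation`, and generate the `gt.txt` annotation file.

The dataset structure and annotation format defined by PaddleX for table recognition tasks are as follows: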
```ruby
dataset_dir    # Root directory of the dataset, the directory name can be changed
├── images     # Directory for saving images, the directory name can be changed, but note the correspondence with the content of train.txt and val.txt
├── train.txt  # Training set annotation file, the file name cannot be changed. Example content: {"filename": "images/border.jpg", "html": {"structure": {"tokens": ["
| 、自我 | ||
| Aghas | 失吴 | 月, |
| lonwyCau | 9 |