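After running the mineru command, several auxiliary files are generated alongside the main Markdown output for debugging, quality inspection, and further processing. These files include:
- {original_filename}_layout.pdf
- {original_filename}_spans.pdf
- {original_filename}_model.json
- {original_filename}_model_output.txt
- {original_filename}_middle.json
- {original_filename}_content_list.json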
The following sections provide detailed descriptions of each file's purpose and format.
File naming format: {original_filename}_layout.pdf
Functionality:
Use cases:
[!NOTE] Only applicable to pipeline backend
File naming format: {original_filename}_spans.pdf
Functionality:
Use cases:
[!NOTE] Only applicable to pipeline backend
File naming format: {original_filename}_model.json
from enum import IntEnum

from pydantic import BaseModel, Field


class CategoryType(IntEnum):
    """Content category enumeration"""
    title = 0            # Title
    plain_text = 1       # Text
    abandon = 2          # Including headers, footers, page numbers, and page annotations
    figure = 3           # Image
    figure_caption = 4   # Image caption
    table = 5            # Table
    table_caption = 6    # Table caption
    table_footnote = 7   # Table footnote
    isolate_formula = 8  # Interline formula
    formula_caption = 9  # Interline formula number
    embedding = 13       # Inline formula
    isolated = 14        # Interline formula
    text = 15            # OCR recognition result


class PageInfo(BaseModel):
    """Page information"""
    page_no: int = Field(description="Page number, first page is 0", ge=0)
    height: int = Field(description="Page height", gt=0)
    width: int = Field(description="Page width", ge=0)


class ObjectInferenceResult(BaseModel):
    """Object recognition result"""
    category_id: CategoryType = Field(description="Category", ge=0)
    poly: list[float] = Field(description="Quadrilateral coordinates, format: [x0,y0,x1,y1,x2,y2,x3,y3]")
    score: float = Field(description="Confidence score of inference result")
    latex: str | None = Field(description="LaTeX parsing result", default=None)
    html: str | None = Field(description="HTML parsing result", default=None)


class PageInferenceResults(BaseModel):
    """Page inference results"""
    layout_dets: list[ObjectInferenceResult] = Field(description="Page recognition results")
    page_info: PageInfo = Field(description="Page metadata")


# Complete inference results
inference_result: list[PageInferenceResults] = []
poly coordinate format: [x0, y0, x1, y1, x2, y2, x3, y3], listing the top-left, top-right, bottom-right, and bottom-left corners in that order.
[
{
"layout_dets": [
{
"category_id": 2,
"poly": [
99.1906967163086,
100.3119125366211,
730.3707885742188,
100.3119125366211,
730.3707885742188,
245.81326293945312,
99.1906967163086,
245.81326293945312
],
"score": 0.9999997615814209
}
],
"page_info": {
"page_no": 0,
"height": 2339,
"width": 1654
}
},
{
"layout_dets": [
{
"category_id": 5,
"poly": [
99.13092803955078,
2210.680419921875,
497.3183898925781,
2210.680419921875,
497.3183898925781,
2264.78076171875,
99.13092803955078,
2264.78076171875
],
"score": 0.9999997019767761
}
],
"page_info": {
"page_no": 1,
"height": 2339,
"width": 1654
}
}
]
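As a rough sketch of how this structure might be consumed, the snippet below collapses each poly into an axis-aligned box and keeps only detections above a confidence threshold. The helper names poly_to_bbox and filter_detections, and the 0.5 threshold, are illustrative assumptions and not part of MinerU.

```python
import json


def poly_to_bbox(poly):
    """Collapse [x0, y0, ..., x3, y3] into (x_min, y_min, x_max, y_max)."""
    xs, ys = poly[0::2], poly[1::2]
    return min(xs), min(ys), max(xs), max(ys)


def filter_detections(pages, min_score=0.5):
    """Yield (page_no, category_id, bbox) for detections at or above min_score."""
    for page in pages:
        page_no = page["page_info"]["page_no"]
        for det in page["layout_dets"]:
            if det["score"] >= min_score:
                yield page_no, det["category_id"], poly_to_bbox(det["poly"])


# Inline sample shaped like the example above (values simplified).
# In practice the list would come from json.load() on the model.json file.
sample = json.loads("""
[
  {
    "layout_dets": [
      {"category_id": 2,
       "poly": [99.0, 100.0, 730.0, 100.0, 730.0, 245.0, 99.0, 245.0],
       "score": 0.99},
      {"category_id": 1,
       "poly": [10.0, 10.0, 20.0, 10.0, 20.0, 20.0, 10.0, 20.0],
       "score": 0.2}
    ],
    "page_info": {"page_no": 0, "height": 2339, "width": 1654}
  }
]
""")

results = list(filter_detections(sample))
```

Working with plain dictionaries keeps the sketch dependency-free; the pydantic models above could be used instead when validation is wanted.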
[!NOTE] Only applicable to VLM backend
File naming format: {original_filename}_model_output.txt
- ---- is used to separate the output results for each page
- Each page contains multiple content blocks, each starting with <|box_start|> and ending with <|md_end|>

| Tag | Format | Description |
|---|---|---|
| Bounding box | <\|box_start\|>x0 y0 x1 y1<\|box_end\|> | Rectangle coordinates (top-left and bottom-right points), with values scaled to a 1000×1000 page |
| Type tag | <\|ref_start\|>type<\|ref_end\|> | Content block type identifier |
| Content | <\|md_start\|>markdown content<\|md_end\|> | Markdown content of the block |
{
"text": "Text",
"title": "Title",
"image": "Image",
"image_caption": "Image caption",
"image_footnote": "Image footnote",
"table": "Table",
"table_caption": "Table caption",
"table_footnote": "Table footnote",
"equation": "Interline formula"
}
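The tag layout described above could be parsed along the following lines. This is only a sketch under several assumptions that the description does not confirm: that the three tags always appear in the order box, type, content; that coordinates are integers; and that the string ---- never occurs inside block content. The names parse_model_output and scale_bbox are hypothetical.

```python
import re

# One block: bounding box, then type tag, then markdown content.
# Optional whitespace between tags is tolerated (an assumption).
BLOCK_RE = re.compile(
    r"<\|box_start\|>(\d+) (\d+) (\d+) (\d+)<\|box_end\|>\s*"
    r"<\|ref_start\|>(\w+)<\|ref_end\|>\s*"
    r"<\|md_start\|>(.*?)<\|md_end\|>",
    re.DOTALL,
)


def parse_model_output(raw):
    """Split raw text into pages on '----' and extract the blocks of each page."""
    pages = []
    for page_text in raw.split("----"):
        blocks = [
            {
                "bbox": tuple(int(v) for v in m.group(1, 2, 3, 4)),
                "type": m.group(5),
                "content": m.group(6),
            }
            for m in BLOCK_RE.finditer(page_text)
        ]
        pages.append(blocks)
    return pages


def scale_bbox(bbox, page_width, page_height):
    """Map a box from the 1000x1000 grid back to page coordinates."""
    x0, y0, x1, y1 = bbox
    return (
        x0 * page_width / 1000,
        y0 * page_height / 1000,
        x1 * page_width / 1000,
        y1 * page_height / 1000,
    )


raw = (
    "<|box_start|>100 200 500 250<|box_end|>"
    "<|ref_start|>title<|ref_end|><|md_start|># Heading<|md_end|>"
    "----"
    "<|box_start|>0 0 1000 500<|box_end|>"
    "<|ref_start|>text<|ref_end|><|md_start|>Body<|md_end|>"
)
pages = parse_model_output(raw)
```

If the separator turns out to sit on a line of its own, splitting on that line with a multiline regular expression would be a safer choice than a plain substring split.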
- <|txt_contd|>: Appears at the end of text, indicating that this text block can be joined with the subsequent text block
- Table content is in otsl format and needs to be converted to HTML for rendering in Markdown

File naming format: {original_filename}_middle.json
| Field Name | Type | Description |
|---|---|---|
| pdf_info | list[dict] | Array of parsing results for each page |
| _backend | string | Parsing mode: pipeline or vlm |
| _version_name | string | MinerU version number |
| Field Name | Description |
|---|---|
| preproc_blocks | Unsegmented intermediate results after PDF preprocessing |
| layout_bboxes | Layout segmentation results, including layout direction and bounding boxes, sorted by reading order |
| page_idx | Page number, starting from 0 |
| page_size | Page width and height [width, height] |
| _layout_tree | Layout tree structure |
| images | Image block information list |
| tables | Table block information list |
| interline_equations | Interline formula block information list |
| discarded_blocks | Block information to be discarded |
| para_blocks | Content block results after segmentation |
Level 1 blocks (table | image)
└── Level 2 blocks
    └── Lines
        └── Spans
| Field Name | Description |
|---|---|
| type | Block type: table or image |
| bbox | Rectangular box coordinates of the block [x0, y0, x1, y1] |
| blocks | List of contained level 2 blocks |
| Field Name | Description |
|---|---|
| type | Block type (see table below) |
| bbox | Rectangular box coordinates of the block |
| lines | List of contained line information |
| Type | Description |
|---|---|
| image_body | Image body |
| image_caption | Image caption text |
| image_footnote | Image footnote |
| table_body | Table body |
| table_caption | Table caption text |
| table_footnote | Table footnote |
| text | Text block |
| title | Title block |
| index | Index block |
| list | List block |
| interline_equation | Interline formula block |
Line fields:
- bbox: Rectangular box coordinates of the line
- spans: List of contained spans

Span fields:
- bbox: Rectangular box coordinates of the span
- type: Span type (image, table, text, inline_equation, interline_equation)
- content | img_path: Text content or image path

{
"pdf_info": [
{
"preproc_blocks": [
{
"type": "text",
"bbox": [
52,
61.956024169921875,
294,
82.99800872802734
],
"lines": [
{
"bbox": [
52,
61.956024169921875,
294,
72.0000228881836
],
"spans": [
{
"bbox": [
54.0,
61.956024169921875,
296.2261657714844,
72.0000228881836
],
"content": "dependent on the service headway and the reliability of the departure ",
"type": "text",
"score": 1.0
}
]
}
]
}
],
"layout_bboxes": [
{
"layout_bbox": [
52,
61,
294,
731
],
"layout_label": "V",
"sub_layout": []
}
],
"page_idx": 0,
"page_size": [
612.0,
792.0
],
"_layout_tree": [],
"images": [],
"tables": [],
"interline_equations": [],
"discarded_blocks": [],
"para_blocks": [
{
"type": "text",
"bbox": [
52,
61.956024169921875,
294,
82.99800872802734
],
"lines": [
{
"bbox": [
52,
61.956024169921875,
294,
72.0000228881836
],
"spans": [
{
"bbox": [
54.0,
61.956024169921875,
296.2261657714844,
72.0000228881836
],
"content": "dependent on the service headway and the reliability of the departure ",
"type": "text",
"score": 1.0
}
]
}
]
}
]
}
],
"_backend": "pipeline",
"_version_name": "0.6.1"
}
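The block, line, and span nesting described above can be walked with a small recursive helper. The following is a minimal sketch, assuming that level 1 blocks carry a blocks list while level 2 blocks carry lines, as the field tables state; iter_spans and extract_page_texts are hypothetical names, and the sample dictionary is a trimmed stand-in for a real middle.json.

```python
def iter_spans(block):
    """Yield every span in a block, descending into nested level 2 blocks."""
    for child in block.get("blocks", []):  # level 1 blocks (table | image)
        yield from iter_spans(child)
    for line in block.get("lines", []):    # level 2 blocks and plain text blocks
        yield from line.get("spans", [])


def extract_page_texts(middle):
    """Return one list of span text strings per page, taken from para_blocks."""
    pages = []
    for page in middle["pdf_info"]:
        texts = [
            span["content"]
            for block in page.get("para_blocks", [])
            for span in iter_spans(block)
            if "content" in span  # image and table spans may carry img_path instead
        ]
        pages.append(texts)
    return pages


# Trimmed sample: one text block and one image block with a caption.
middle = {
    "pdf_info": [
        {
            "page_idx": 0,
            "para_blocks": [
                {
                    "type": "text",
                    "bbox": [52, 61, 294, 83],
                    "lines": [
                        {
                            "bbox": [52, 61, 294, 72],
                            "spans": [
                                {"bbox": [54, 61, 296, 72], "type": "text",
                                 "content": "dependent on the service headway"}
                            ],
                        }
                    ],
                },
                {
                    "type": "image",
                    "bbox": [52, 100, 294, 300],
                    "blocks": [
                        {
                            "type": "image_caption",
                            "bbox": [52, 280, 294, 300],
                            "lines": [
                                {
                                    "bbox": [52, 280, 294, 300],
                                    "spans": [
                                        {"bbox": [52, 280, 294, 300], "type": "text",
                                         "content": "Fig. 1"}
                                    ],
                                }
                            ],
                        }
                    ],
                },
            ],
        }
    ],
    "_backend": "pipeline",
    "_version_name": "0.6.1",
}

page_texts = extract_page_texts(middle)
```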
File naming format: {original_filename}_content_list.json
This is a simplified version of middle.json that stores all readable content blocks in reading order as a flat structure, removing complex layout information for easier subsequent processing.
| Type | Description |
|---|---|
| image | Image |
| table | Table |
| text | Text/Title |
| equation | Interline formula |
Text levels are distinguished through the text_level field:
- No text_level, or text_level: 0: Body text
- text_level: 1: Level 1 heading
- text_level: 2: Level 2 heading

Each content block contains a page_idx field indicating the page number (starting from 0).
Each content block contains a bbox field representing its bounding box coordinates [x0, y0, x1, y1], mapped to a range of 0-1000.

[
{
"type": "text",
"text": "The response of flow duration curves to afforestation ",
"text_level": 1,
"bbox": [
62,
480,
946,
904
],
"page_idx": 0
},
{
"type": "image",
"img_path": "images/a8ecda1c69b27e4f79fce1589175a9d721cbdc1cf78b4cc06a015f3746f6b9d8.jpg",
"img_caption": [
"Fig. 1. Annual flow duration curves of daily flows from Pine Creek, Australia, 1989–2000. "
],
"img_footnote": [],
"bbox": [
62,
480,
946,
904
],
"page_idx": 1
},
{
"type": "equation",
"img_path": "images/181ea56ef185060d04bf4e274685f3e072e922e7b839f093d482c29bf89b71e8.jpg",
"text": "$$\nQ _ { \\% } = f ( P ) + g ( T )\n$$",
"text_format": "latex",
"bbox": [
62,
480,
946,
904
],
"page_idx": 2
},
{
"type": "table",
"img_path": "images/e3cb413394a475e555807ffdad913435940ec637873d673ee1b039e3bc3496d0.jpg",
"table_caption": [
"Table 2 Significance of the rainfall and time terms "
],
"table_footnote": [
"indicates that the rainfall term was significant at the $5 \\%$ level, $T$ indicates that the time term was significant at the $5 \\%$ level, \\* represents significance at the $10 \\%$ level, and na denotes too few data points for meaningful analysis. "
],
"table_body": "<html><body><table><tr><td rowspan=\"2\">Site</td><td colspan=\"10\">Percentile</td></tr><tr><td>10</td><td>20</td><td>30</td><td>40</td><td>50</td><td>60</td><td>70</td><td>80</td><td>90</td><td>100</td></tr><tr><td>Traralgon Ck</td><td>P</td><td>P,*</td><td>P</td><td>P</td><td>P,</td><td>P,</td><td>P,</td><td>P,</td><td>P</td><td>P</td></tr><tr><td>Redhill</td><td>P,T</td><td>P,T</td><td>,*</td><td>**</td><td>P.T</td><td>P,*</td><td>P*</td><td>P*</td><td>*</td><td>,*</td></tr><tr><td>Pine Ck</td><td></td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>T</td><td>T</td><td>T</td><td>na</td><td>na</td></tr><tr><td>Stewarts Ck 5</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P.T</td><td>P.T</td><td>P,T</td><td>na</td><td>na</td><td>na</td></tr><tr><td>Glendhu 2</td><td>P</td><td>P,T</td><td>P,*</td><td>P,T</td><td>P.T</td><td>P,ns</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td></tr><tr><td>Cathedral Peak 2</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>*,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>T</td></tr><tr><td>Cathedral Peak 3</td><td>P.T</td><td>P.T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>T</td></tr><tr><td>Lambrechtsbos A</td><td>P,T</td><td>P</td><td>P</td><td>P,T</td><td>*,T</td><td>*,T</td><td>*,T</td><td>*,T</td><td>*,T</td><td>T</td></tr><tr><td>Lambrechtsbos B</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>T</td><td>T</td></tr><tr><td>Biesievlei</td><td>P,T</td><td>P.T</td><td>P,T</td><td>P,T</td><td>*,T</td><td>*,T</td><td>T</td><td>T</td><td>P,T</td><td>P,T</td></tr></table></body></html>",
"bbox": [
62,
480,
946,
904
],
"page_idx": 5
}
]
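Because the content list is flat and already in reading order, turning it back into Markdown is mostly a matter of dispatching on type. The sketch below illustrates one possible rendering, with headings derived from text_level; the to_markdown helper and its formatting choices are assumptions for illustration, not MinerU's own renderer.

```python
def to_markdown(items):
    """Render a content list as Markdown, one paragraph per content block."""
    parts = []
    for item in items:
        kind = item["type"]
        if kind == "text":
            level = item.get("text_level", 0)  # missing or 0 means body text
            prefix = "#" * level + " " if level else ""
            parts.append(prefix + item["text"].strip())
        elif kind == "equation":
            parts.append(item["text"])
        elif kind == "image":
            caption = " ".join(c.strip() for c in item.get("img_caption", []))
            parts.append(f"![{caption}]({item['img_path']})")
        elif kind == "table":
            parts.extend(c.strip() for c in item.get("table_caption", []))
            parts.append(item.get("table_body", ""))
    return "\n\n".join(parts)


# Small sample shaped like the example above.
content_list = [
    {"type": "text", "text": "Intro ", "text_level": 1, "page_idx": 0},
    {"type": "text", "text": "Body.", "page_idx": 0},
    {"type": "image", "img_path": "images/a.jpg",
     "img_caption": ["Fig. 1. "], "img_footnote": [], "page_idx": 1},
]

markdown = to_markdown(content_list)
```

Footnotes and the bbox field are ignored here; a fuller renderer would likely append img_footnote and table_footnote entries after their parent block.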
The above files constitute MinerU's complete output results. Users can choose the appropriate files for subsequent processing based on their needs.