After running the mineru command, several auxiliary files are generated alongside the main Markdown output. They support debugging, quality inspection, and further processing.
The following sections provide detailed descriptions of each file's purpose and format.
File naming format: {original_filename}_layout.pdf
Functionality:
Use cases:
[!NOTE] Only applicable to pipeline backend
File naming format: {original_filename}_spans.pdf
Functionality:
Use cases:
[!NOTE] Only applicable to pipeline backend
File naming format: {original_filename}_model.json
from enum import IntEnum

from pydantic import BaseModel, Field


class CategoryType(IntEnum):
    """Content category enumeration"""

    title = 0  # Title
    plain_text = 1  # Text
    abandon = 2  # Including headers, footers, page numbers, and page annotations
    figure = 3  # Image
    figure_caption = 4  # Image caption
    table = 5  # Table
    table_caption = 6  # Table caption
    table_footnote = 7  # Table footnote
    isolate_formula = 8  # Interline formula
    formula_caption = 9  # Interline formula number
    embedding = 13  # Inline formula
    isolated = 14  # Interline formula
    text = 15  # OCR recognition result


class PageInfo(BaseModel):
    """Page information"""

    page_no: int = Field(description="Page number, first page is 0", ge=0)
    height: int = Field(description="Page height", gt=0)
    width: int = Field(description="Page width", ge=0)


class ObjectInferenceResult(BaseModel):
    """Object recognition result"""

    category_id: CategoryType = Field(description="Category", ge=0)
    poly: list[float] = Field(description="Quadrilateral coordinates, format: [x0,y0,x1,y1,x2,y2,x3,y3]")
    score: float = Field(description="Confidence score of inference result")
    latex: str | None = Field(description="LaTeX parsing result", default=None)
    html: str | None = Field(description="HTML parsing result", default=None)


class PageInferenceResults(BaseModel):
    """Page inference results"""

    layout_dets: list[ObjectInferenceResult] = Field(description="Page recognition results")
    page_info: PageInfo = Field(description="Page metadata")


# Complete inference results
inference_result: list[PageInferenceResults] = []
poly coordinate format: [x0, y0, x1, y1, x2, y2, x3, y3], listing the top-left, top-right, bottom-right, and bottom-left corners in that order.
[
{
"layout_dets": [
{
"category_id": 2,
"poly": [
99.1906967163086,
100.3119125366211,
730.3707885742188,
100.3119125366211,
730.3707885742188,
245.81326293945312,
99.1906967163086,
245.81326293945312
],
"score": 0.9999997615814209
}
],
"page_info": {
"page_no": 0,
"height": 2339,
"width": 1654
}
},
{
"layout_dets": [
{
"category_id": 5,
"poly": [
99.13092803955078,
2210.680419921875,
497.3183898925781,
2210.680419921875,
497.3183898925781,
2264.78076171875,
99.13092803955078,
2264.78076171875
],
"score": 0.9999997019767761
}
],
"page_info": {
"page_no": 1,
"height": 2339,
"width": 1654
}
}
]
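As a rough illustration of how this structure might be consumed, the sketch below collapses each `poly` quadrilateral into an axis-aligned `[x_min, y_min, x_max, y_max]` box and maps `category_id` to a readable name. The helpers `poly_to_bbox` and `summarize_page` are hypothetical names chosen for this example and are not part of MinerU. The category table is copied from the enum above, and the input is assumed to have been loaded already, for example with `json.load`.

```python
# Category names copied from the CategoryType enum above.
CATEGORY_NAMES = {
    0: "title", 1: "plain_text", 2: "abandon", 3: "figure",
    4: "figure_caption", 5: "table", 6: "table_caption",
    7: "table_footnote", 8: "isolate_formula", 9: "formula_caption",
    13: "embedding", 14: "isolated", 15: "text",
}


def poly_to_bbox(poly):
    """Collapse [x0, y0, ..., x3, y3] into [x_min, y_min, x_max, y_max]."""
    xs, ys = poly[0::2], poly[1::2]
    return [min(xs), min(ys), max(xs), max(ys)]


def summarize_page(page):
    """Return (category name, bbox, score) for every detection on one page."""
    return [
        (
            CATEGORY_NAMES.get(det["category_id"], "unknown"),
            poly_to_bbox(det["poly"]),
            det["score"],
        )
        for det in page["layout_dets"]
    ]


# Hypothetical, shortened page in the same shape as the sample data above.
sample_page = {
    "layout_dets": [
        {
            "category_id": 2,
            "poly": [99.0, 100.0, 730.0, 100.0, 730.0, 245.0, 99.0, 245.0],
            "score": 0.99,
        },
    ],
    "page_info": {"page_no": 0, "height": 2339, "width": 1654},
}

for name, bbox, score in summarize_page(sample_page):
    print(name, bbox, score)
```

Taking the minimum and maximum of the coordinates, rather than reading fixed indices, keeps the helper correct even if a quadrilateral is not listed in the usual corner order.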
[!NOTE] Only applicable to VLM backend
File naming format: {original_filename}_model.json
Each content block has type, bbox, angle, and content fields. The type field takes one of the following values:
{
"text",
"title",
"equation",
"image",
"image_caption",
"image_footnote",
"table",
"table_caption",
"table_footnote",
"phonetic",
"code",
"code_caption",
"ref_text",
"algorithm",
"list",
"header",
"footer",
"page_number",
"aside_text",
"page_footnote",
}
[!NOTE] Only applicable to pipeline backend
File naming format: {original_filename}_middle.json
| Field Name | Type | Description |
|---|---|---|
| pdf_info | list[dict] | Array of parsing results for each page |
| _backend | string | Parsing mode: pipeline or vlm |
| _version_name | string | MinerU version number |
| Field Name | Description |
|---|---|
| preproc_blocks | Unsegmented intermediate results after PDF preprocessing |
| layout_bboxes | Layout segmentation results, including layout direction and bounding boxes, sorted by reading order |
| page_idx | Page number, starting from 0 |
| page_size | Page width and height [width, height] |
| _layout_tree | Layout tree structure |
| images | Image block information list |
| tables | Table block information list |
| interline_equations | Interline formula block information list |
| discarded_blocks | Block information to be discarded |
| para_blocks | Content block results after segmentation |
Level 1 blocks (table | image)
└── Level 2 blocks
    └── Lines
        └── Spans
| Field Name | Description |
|---|---|
| type | Block type: table or image |
| bbox | Rectangular box coordinates of the block [x0, y0, x1, y1] |
| blocks | List of contained level 2 blocks |
| Field Name | Description |
|---|---|
| type | Block type (see table below) |
| bbox | Rectangular box coordinates of the block |
| lines | List of contained line information |
| Type | Description |
|---|---|
| image_body | Image body |
| image_caption | Image caption text |
| image_footnote | Image footnote |
| table_body | Table body |
| table_caption | Table caption text |
| table_footnote | Table footnote |
| text | Text block |
| title | Title block |
| index | Index block |
| list | List block |
| interline_equation | Interline formula block |
Line fields:
- bbox: Rectangular box coordinates of the line
- spans: List of contained spans

Span fields:
- bbox: Rectangular box coordinates of the span
- type: Span type (image, table, text, inline_equation, interline_equation)
- content | img_path: Text content or image path

{
"pdf_info": [
{
"preproc_blocks": [
{
"type": "text",
"bbox": [
52,
61.956024169921875,
294,
82.99800872802734
],
"lines": [
{
"bbox": [
52,
61.956024169921875,
294,
72.0000228881836
],
"spans": [
{
"bbox": [
54.0,
61.956024169921875,
296.2261657714844,
72.0000228881836
],
"content": "dependent on the service headway and the reliability of the departure ",
"type": "text",
"score": 1.0
}
]
}
]
}
],
"layout_bboxes": [
{
"layout_bbox": [
52,
61,
294,
731
],
"layout_label": "V",
"sub_layout": []
}
],
"page_idx": 0,
"page_size": [
612.0,
792.0
],
"_layout_tree": [],
"images": [],
"tables": [],
"interline_equations": [],
"discarded_blocks": [],
"para_blocks": [
{
"type": "text",
"bbox": [
52,
61.956024169921875,
294,
82.99800872802734
],
"lines": [
{
"bbox": [
52,
61.956024169921875,
294,
72.0000228881836
],
"spans": [
{
"bbox": [
54.0,
61.956024169921875,
296.2261657714844,
72.0000228881836
],
"content": "dependent on the service headway and the reliability of the departure ",
"type": "text",
"score": 1.0
}
]
}
]
}
]
}
],
"_backend": "pipeline",
"_version_name": "0.6.1"
}
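The nesting described above (blocks, then lines, then spans) can be walked with a small recursive helper. The sketch below is only an illustration: `iter_spans` and `page_text` are hypothetical names, not MinerU functions. It assumes that level 1 table and image blocks keep their children under `blocks`, while other blocks hold `lines` directly. The sample dictionary is a trimmed, made-up document in the same shape as the example.

```python
def iter_spans(block):
    """Yield every span under a block, descending into nested level 2 blocks."""
    # Level 1 table/image blocks keep their content in "blocks";
    # other blocks hold "lines" directly.
    for child in block.get("blocks", []):
        yield from iter_spans(child)
    for line in block.get("lines", []):
        yield from line.get("spans", [])


def page_text(page, key="para_blocks"):
    """Join the text content of one page, one output line per top-level block."""
    out = []
    for block in page.get(key, []):
        # Spans that carry img_path instead of content are skipped.
        parts = [span["content"] for span in iter_spans(block) if "content" in span]
        if parts:
            out.append("".join(parts).strip())
    return "\n".join(out)


# Hypothetical, trimmed document in the same shape as the example above.
middle = {
    "pdf_info": [
        {
            "page_idx": 0,
            "para_blocks": [
                {
                    "type": "text",
                    "lines": [
                        {"spans": [{"type": "text", "content": "dependent on the service headway "}]},
                        {"spans": [{"type": "text", "content": "and the reliability of the departure "}]},
                    ],
                },
                {
                    "type": "image",
                    "blocks": [
                        {
                            "type": "image_caption",
                            "lines": [{"spans": [{"type": "text", "content": "Fig. 1. Example caption"}]}],
                        },
                    ],
                },
            ],
        }
    ],
    "_backend": "pipeline",
}

for page in middle["pdf_info"]:
    print(page["page_idx"], repr(page_text(page)))
```

Passing `key="preproc_blocks"` would read the unsegmented blocks instead, since both lists share the same block layout in the example.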
[!NOTE] Only applicable to pipeline backend
File naming format: {original_filename}_content_list.json
This is a simplified version of middle.json that stores all readable content blocks in reading order as a flat structure, removing complex layout information for easier subsequent processing.
| Type | Description |
|---|---|
| image | Image |
| table | Table |
| text | Text/Title |
| equation | Interline formula |
Text levels are distinguished through the text_level field:
- No text_level field, or text_level: 0: Body text
- text_level: 1: Level 1 heading
- text_level: 2: Level 2 heading

Each content block includes a page_idx field indicating the page number (starting from 0).

Each content block includes a bbox field representing the bounding box coordinates of the content block [x0, y0, x1, y1], mapped to a range of 0-1000.

[
{
"type": "text",
"text": "The response of flow duration curves to afforestation ",
"text_level": 1,
"bbox": [
62,
480,
946,
904
],
"page_idx": 0
},
{
"type": "image",
"img_path": "images/a8ecda1c69b27e4f79fce1589175a9d721cbdc1cf78b4cc06a015f3746f6b9d8.jpg",
"img_caption": [
"Fig. 1. Annual flow duration curves of daily flows from Pine Creek, Australia, 1989–2000. "
],
"img_footnote": [],
"bbox": [
62,
480,
946,
904
],
"page_idx": 1
},
{
"type": "equation",
"img_path": "images/181ea56ef185060d04bf4e274685f3e072e922e7b839f093d482c29bf89b71e8.jpg",
"text": "$$\nQ _ { \\% } = f ( P ) + g ( T )\n$$",
"text_format": "latex",
"bbox": [
62,
480,
946,
904
],
"page_idx": 2
},
{
"type": "table",
"img_path": "images/e3cb413394a475e555807ffdad913435940ec637873d673ee1b039e3bc3496d0.jpg",
"table_caption": [
"Table 2 Significance of the rainfall and time terms "
],
"table_footnote": [
"indicates that the rainfall term was significant at the $5 \\%$ level, $T$ indicates that the time term was significant at the $5 \\%$ level, \\* represents significance at the $10 \\%$ level, and na denotes too few data points for meaningful analysis. "
],
"table_body": "<html><body><table><tr><td rowspan=\"2\">Site</td><td colspan=\"10\">Percentile</td></tr><tr><td>10</td><td>20</td><td>30</td><td>40</td><td>50</td><td>60</td><td>70</td><td>80</td><td>90</td><td>100</td></tr><tr><td>Traralgon Ck</td><td>P</td><td>P,*</td><td>P</td><td>P</td><td>P,</td><td>P,</td><td>P,</td><td>P,</td><td>P</td><td>P</td></tr><tr><td>Redhill</td><td>P,T</td><td>P,T</td><td>,*</td><td>**</td><td>P.T</td><td>P,*</td><td>P*</td><td>P*</td><td>*</td><td>,*</td></tr><tr><td>Pine Ck</td><td></td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>T</td><td>T</td><td>T</td><td>na</td><td>na</td></tr><tr><td>Stewarts Ck 5</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P.T</td><td>P.T</td><td>P,T</td><td>na</td><td>na</td><td>na</td></tr><tr><td>Glendhu 2</td><td>P</td><td>P,T</td><td>P,*</td><td>P,T</td><td>P.T</td><td>P,ns</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td></tr><tr><td>Cathedral Peak 2</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>*,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>T</td></tr><tr><td>Cathedral Peak 3</td><td>P.T</td><td>P.T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>T</td></tr><tr><td>Lambrechtsbos A</td><td>P,T</td><td>P</td><td>P</td><td>P,T</td><td>*,T</td><td>*,T</td><td>*,T</td><td>*,T</td><td>*,T</td><td>T</td></tr><tr><td>Lambrechtsbos B</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>P,T</td><td>T</td><td>T</td></tr><tr><td>Biesievlei</td><td>P,T</td><td>P.T</td><td>P,T</td><td>P,T</td><td>*,T</td><td>*,T</td><td>T</td><td>T</td><td>P,T</td><td>P,T</td></tr></table></body></html>",
"bbox": [
62,
480,
946,
904
],
"page_idx": 5
}
]
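Because the content list is flat, simple post-processing is possible with a single loop. The sketch below shows two hedged examples: rendering text blocks as Markdown with `text_level` as the heading depth, and mapping a 0-1000 normalized `bbox` back to page units. Both `to_markdown` and `scale_bbox` are hypothetical helpers, not part of MinerU. The scaling assumes that x values are normalized by page width and y values by page height, which the description above does not state explicitly.

```python
def to_markdown(content_list):
    """Render text blocks, using text_level as the heading depth."""
    lines = []
    for item in content_list:
        if item["type"] != "text":
            continue  # images, tables, and equations are skipped in this sketch
        level = item.get("text_level", 0)  # missing or 0 means body text
        text = item["text"].strip()
        lines.append(f"{'#' * level} {text}" if level > 0 else text)
    return "\n\n".join(lines)


def scale_bbox(bbox, page_width, page_height):
    """Map a 0-1000 normalized [x0, y0, x1, y1] box to page units.

    Assumption: x is normalized by page width and y by page height.
    """
    x0, y0, x1, y1 = bbox
    return [
        x0 * page_width / 1000,
        y0 * page_height / 1000,
        x1 * page_width / 1000,
        y1 * page_height / 1000,
    ]


# Hypothetical, shortened content list in the same shape as the example above.
sample = [
    {
        "type": "text",
        "text": "The response of flow duration curves to afforestation ",
        "text_level": 1,
        "bbox": [62, 480, 946, 904],
        "page_idx": 0,
    },
    {"type": "text", "text": "Body paragraph.", "bbox": [0, 500, 1000, 1000], "page_idx": 0},
    {"type": "image", "img_path": "images/example.jpg", "bbox": [62, 480, 946, 904], "page_idx": 1},
]

print(to_markdown(sample))
print(scale_bbox(sample[1]["bbox"], page_width=612, page_height=792))
```

The page size used here (612 by 792) matches the page_size value in the middle.json example; in practice it would be read from the matching page entry.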
The files above make up MinerU's complete output. Users can choose whichever files suit their subsequent processing needs.