2 months ago · 39eaf31fb9
--- a/docs/en/reference/output_files.md
+++ b/docs/en/reference/output_files.md
@@ -51,14 +51,16 @@ The following sections provide detailed descriptions of each file's purpose and
 
				 
			
 
				 ## Structured Data Files
			
 
				 
			
 
				-### Model Inference Results (model.json)
			
 
				+> [!IMPORTANT]
			
 
				+> The VLM backend output changed significantly in version 2.5 and is not compatible with the pipeline backend's output. If you build on the structured outputs, read this document carefully.
			
 
				 
			
 
				-> [!NOTE]
			
 
				-> Only applicable to pipeline backend
			
 
				+### Pipeline Backend Output Results
			
 
				+
			
 
				+#### Model Inference Results (model.json)
			
 
				 
			
 
				 **File naming format**: `{original_filename}_model.json`
			
 
				 
			
 
				-#### Data Structure Definition
			
 
				+##### Data Structure Definition
			
 
				 
			
 
				 ```python
			
 
				 from pydantic import BaseModel, Field
			
@@ -103,7 +105,7 @@ class PageInferenceResults(BaseModel):
 
				 inference_result: list[PageInferenceResults] = []
			
 
				 ```
			
 
				 
			
 
				-#### Coordinate System Description
			
 
				+##### Coordinate System Description
			
 
				 
			
 
				 `poly` coordinate format: `[x0, y0, x1, y1, x2, y2, x3, y3]`
			
 
				 
			
@@ -112,7 +114,7 @@ inference_result: list[PageInferenceResults] = []
 
				 
			
 
				 ![poly coordinate diagram](../images/poly.png)
			
 
				 
			
 
				-#### Sample Data
			
 
				+##### Sample Data
			
 
				 
			
 
				 ```json
			
 
				 [
			
@@ -165,54 +167,11 @@ inference_result: list[PageInferenceResults] = []
 
				 ]
			
 
				 ```
			
 
				 
			
 
				-### VLM Output Results (model.json)
			
 
				-
			
 
				-> [!NOTE]
			
 
				-> Only applicable to VLM backend
			
 
				-
			
 
				-**File naming format**: `{original_filename}_model.json`
			
 
				-
			
 
				-#### File Format Description
			
 
				-
			
 
				-- This file contains the raw output results from the VLM model, with two nested list layers: the outer layer represents pages, and the inner layer represents content blocks for each page
			
 
				-- Each content block is a dict containing `type`, `bbox`, `angle`, and `content` fields
			
 
				-
			
 
				-
			
 
				-#### Supported Content Types
			
 
				-
			
 
				-```json
			
 
				-{
			
 
				-    "text",
			
 
				-    "title", 
			
 
				-    "equation",
			
 
				-    "image",
			
 
				-    "image_caption",
			
 
				-    "image_footnote",
			
 
				-    "table",
			
 
				-    "table_caption",
			
 
				-    "table_footnote",
			
 
				-    "phonetic",
			
 
				-    "code",
			
 
				-    "code_caption",
			
 
				-    "ref_text",
			
 
				-    "algorithm",
			
 
				-    "list",
			
 
				-    "header",
			
 
				-    "footer",
			
 
				-    "page_number",
			
 
				-    "aside_text", 
			
 
				-    "page_footnote", 
			
 
				-}
			
 
				-```
			
 
				-
			
 
				-### Intermediate Processing Results (middle.json)
			
 
				-
			
 
				-> [!NOTE]
			
 
				-> Only applicable to pipeline backend
			
 
				+#### Intermediate Processing Results (middle.json)
			
 
				 
			
 
				 **File naming format**: `{original_filename}_middle.json`
			
 
				 
			
 
				-#### Top-level Structure
			
 
				+##### Top-level Structure
			
 
				 
			
 
				 | Field Name | Type | Description |
			
 
				 |------------|------|-------------|
			
@@ -220,22 +179,20 @@ inference_result: list[PageInferenceResults] = []
 
				 | `_backend` | `string` | Parsing mode: `pipeline` or `vlm` |
			
 
				 | `_version_name` | `string` | MinerU version number |
			
 
				 
			
 
				-#### Page Information Structure (pdf_info)
			
 
				+##### Page Information Structure (pdf_info)
			
 
				 
			
 
				 | Field Name | Description |
			
 
				 |------------|-------------|
			
 
				 | `preproc_blocks` | Unsegmented intermediate results after PDF preprocessing |
			
 
				-| `layout_bboxes` | Layout segmentation results, including layout direction and bounding boxes, sorted by reading order |
			
 
				 | `page_idx` | Page number, starting from 0 |
			
 
				 | `page_size` | Page width and height `[width, height]` |
			
 
				-| `_layout_tree` | Layout tree structure |
			
 
				 | `images` | Image block information list |
			
 
				 | `tables` | Table block information list |
			
 
				 | `interline_equations` | Interline formula block information list |
			
 
				 | `discarded_blocks` | Block information to be discarded |
			
 
				 | `para_blocks` | Content block results after segmentation |
			
 
				 
			
 
				-#### Block Structure Hierarchy
			
 
				+##### Block Structure Hierarchy
			
 
				 
			
 
				 ```
			
 
				 Level 1 blocks (table | image)
			
@@ -244,7 +201,7 @@ Level 1 blocks (table | image)
 
				         └── Spans
			
 
				 ```
			
 
				 
			
 
				-#### Level 1 Block Fields
			
 
				+##### Level 1 Block Fields
			
 
				 
			
 
				 | Field Name | Description |
			
 
				 |------------|-------------|
			
@@ -252,7 +209,7 @@ Level 1 blocks (table | image)
 
				 | `bbox` | Rectangular box coordinates of the block `[x0, y0, x1, y1]` |
			
 
				 | `blocks` | List of contained level 2 blocks |
			
 
				 
			
 
				-#### Level 2 Block Fields
			
 
				+##### Level 2 Block Fields
			
 
				 
			
 
				 | Field Name | Description |
			
 
				 |------------|-------------|
			
@@ -260,7 +217,7 @@ Level 1 blocks (table | image)
 
				 | `bbox` | Rectangular box coordinates of the block |
			
 
				 | `lines` | List of contained line information |
			
 
				 
			
 
				-#### Level 2 Block Types
			
 
				+##### Level 2 Block Types
			
 
				 
			
 
				 | Type | Description |
			
 
				 |------|-------------|
			
@@ -276,7 +233,7 @@ Level 1 blocks (table | image)
 
				 | `list` | List block |
			
 
				 | `interline_equation` | Interline formula block |
			
 
				 
			
 
				-#### Line and Span Structure
			
 
				+##### Line and Span Structure
			
 
				 
			
 
				 **Line fields**:
			
 
				 - `bbox`: Rectangular box coordinates of the line
			
@@ -287,7 +244,7 @@ Level 1 blocks (table | image)
 
				 - `type`: Span type (`image`, `table`, `text`, `inline_equation`, `interline_equation`)
			
 
				 - `content` | `img_path`: Text content or image path
			
 
				 
			
 
				-#### Sample Data
			
 
				+##### Sample Data
			
 
				 
			
 
				 ```json
			
 
				 {
			
@@ -390,18 +347,15 @@ Level 1 blocks (table | image)
 
				 }
			
 
				 ```
			
 
				 
			
 
				-### Content List (content_list.json)
			
 
				-
			
 
				-> [!NOTE]
			
 
				-> Only applicable to pipeline backend
			
 
				+#### Content List (content_list.json)
			
 
				 
			
 
				 **File naming format**: `{original_filename}_content_list.json`
			
 
				 
			
 
				-#### Functionality
			
 
				+##### Functionality
			
 
				 
			
 
				 This is a simplified version of `middle.json` that stores all readable content blocks in reading order as a flat structure, removing complex layout information for easier subsequent processing.
			
 
				 
			
 
				-#### Content Types
			
 
				+##### Content Types
			
 
				 
			
 
				 | Type | Description |
			
 
				 |------|-------------|
			
@@ -410,7 +364,7 @@ This is a simplified version of `middle.json` that stores all readable content b
 
				 | `text` | Text/Title |
			
 
				 | `equation` | Interline formula |
			
 
				 
			
 
				-#### Text Level Identification
			
 
				+##### Text Level Identification
			
 
				 
			
 
				 Text levels are distinguished through the `text_level` field:
			
 
				 
			
@@ -419,12 +373,12 @@ Text levels are distinguished through the `text_level` field:
 
				 - `text_level: 2`: Level 2 heading
			
 
				 - And so on...
			
 
				 
			
 
				-#### Common Fields
			
 
				+##### Common Fields
			
 
				 
			
 
				 - All content blocks include a `page_idx` field indicating the page number (starting from 0).
			
 
				 - All content blocks include a `bbox` field representing the bounding box coordinates of the content block `[x0, y0, x1, y1]`, mapped to a range of 0-1000.
			
 
				 
			
 
				-#### Sample Data
			
 
				+##### Sample Data
			
 
				 
			
 
				 ```json
			
 
				 [
			
@@ -489,6 +443,252 @@ Text levels are distinguished through the `text_level` field:
 
				 ]
			
 
				 ```
			
 
				 
			
 
				+### VLM Backend Output Results
			
 
				+
			
 
				+#### Model Inference Results (model.json)
			
 
				+
			
 
				+**File naming format**: `{original_filename}_model.json`
			
 
				+
			
 
				+##### File Format Description
			
 
				+- Two-level nested list: outer list = pages; inner list = content blocks of that page
			
 
				+- Each block is a dict with at least: `type`, `bbox`, `angle`, `content` (some types add extra fields like `score`, `block_tags`, `content_tags`, `format`)
			
 
				+- Designed for direct, raw model inspection
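The two-level layout described above can be walked with two nested loops. The following is a minimal sketch, assuming the file has already been read into a string; `count_block_types` is a hypothetical helper name, not part of MinerU, and the sample reuses values from the sample data in this section.

```python
import json
from collections import Counter


def count_block_types(model_json_text: str) -> list[Counter]:
    """Return one Counter of block `type` values per page."""
    pages = json.loads(model_json_text)  # outer list = pages
    # inner list = content blocks of that page, each a dict with a `type` key
    return [Counter(block["type"] for block in page) for page in pages]


# Hypothetical one-page input shaped like the VLM model.json sample
sample = json.dumps([
    [
        {"type": "header", "bbox": [0.077, 0.095, 0.18, 0.181],
         "angle": 0, "content": "ELSEVIER"},
        {"type": "title", "bbox": [0.157, 0.228, 0.833, 0.253],
         "angle": 0,
         "content": "The response of flow duration curves to afforestation"},
    ]
])

per_page = count_block_types(sample)
print(per_page[0])
```

Only the `type` key is read here, so the sketch does not depend on the optional fields such as `score` or `format`.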
			
 
				+
			
 
				+##### Supported Content Types (type field values)
			
 
				+```json
			
 
				+{
			
 
				+  "text": "Plain text",
			
 
				+  "title": "Title",
			
 
				+  "equation": "Display (interline) formula",
			
 
				+  "image": "Image",
			
 
				+  "image_caption": "Image caption",
			
 
				+  "image_footnote": "Image footnote",
			
 
				+  "table": "Table",
			
 
				+  "table_caption": "Table caption",
			
 
				+  "table_footnote": "Table footnote",
			
 
				+  "phonetic": "Phonetic annotation",
			
 
				+  "code": "Code block",
			
 
				+  "code_caption": "Code caption",
			
 
				+  "ref_text": "Reference / citation entry",
			
 
				+  "algorithm": "Algorithm block (treated as code subtype)",
			
 
				+  "list": "List container",
			
 
				+  "header": "Page header",
			
 
				+  "footer": "Page footer",
			
 
				+  "page_number": "Page number",
			
 
				+  "aside_text": "Side / margin note",
			
 
				+  "page_footnote": "Page footnote"
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+##### Coordinate System Description
			
 
				+- `bbox` = `[x0, y0, x1, y1]` (top-left, bottom-right)
			
 
				+- Origin at top-left of the page
			
 
				+- All coordinates are fractions of the original page size, in the range `[0, 1]`
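Because these `bbox` values are fractions of the page size, they need to be scaled before they can be drawn on a rendered page. A minimal sketch, assuming the caller already knows the rendered page size in pixels; `bbox_to_pixels`, `page_width`, and `page_height` are hypothetical names introduced for illustration.

```python
def bbox_to_pixels(bbox, page_width, page_height):
    """Scale a [x0, y0, x1, y1] bbox given in [0, 1] to pixel coordinates.

    The origin stays at the top-left corner of the page, so x grows to the
    right and y grows downward, matching the description above.
    """
    x0, y0, x1, y1 = bbox
    return (
        round(x0 * page_width),
        round(y0 * page_height),
        round(x1 * page_width),
        round(y1 * page_height),
    )


# Hypothetical page rendered at 1000 x 2000 pixels
pixel_box = bbox_to_pixels([0.25, 0.5, 0.75, 1.0], 1000, 2000)
print(pixel_box)
```

Rounding to integers is a choice made for drawing; keep the floating-point values if sub-pixel precision matters.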
			
 
				+
			
 
				+##### Sample Data
			
 
				+```json
			
 
				+[
			
 
				+  [
			
 
				+    {
			
 
				+      "type": "header",
			
 
				+      "bbox": [0.077, 0.095, 0.18, 0.181],
			
 
				+      "angle": 0,
			
 
				+      "score": null,
			
 
				+      "block_tags": null,
			
 
				+      "content": "ELSEVIER",
			
 
				+      "format": null,
			
 
				+      "content_tags": null
			
 
				+    },
			
 
				+    {
			
 
				+      "type": "title",
			
 
				+      "bbox": [0.157, 0.228, 0.833, 0.253],
			
 
				+      "angle": 0,
			
 
				+      "score": null,
			
 
				+      "block_tags": null,
			
 
				+      "content": "The response of flow duration curves to afforestation",
			
 
				+      "format": null,
			
 
				+      "content_tags": null
			
 
				+    }
			
 
				+  ]
			
 
				+]
			
 
				+```
			
 
				+
			
 
				+#### Intermediate Processing Results (middle.json)
			
 
				+
			
 
				+**File naming format**: `{original_filename}_middle.json`
			
 
				+
			
 
				+Structure is broadly similar to the pipeline backend, but with these differences:
			
 
				+
			
 
				+1. `list` becomes a level 2 block; a new field `sub_type` distinguishes list categories:
			
 
				+   - `text`: ordinary list
			
 
				+   - `ref_text`: reference / bibliography style list
			
 
				+2. New `code` block type with `sub_type`:
			
 
				+   - `code`
			
 
				+   - `algorithm`
			
 
				+   A code block always contains a `code_body` and may also contain a `code_caption`.
			
 
				+3. `discarded_blocks` may contain additional types: `header`, `footer`, `page_number`, `aside_text`, `page_footnote`.
			
 
				+4. All blocks include an `angle` field indicating rotation (one of `0, 90, 180, 270`).
			
 
				+
			
 
				+##### Examples
			
 
				+- Example: list block
			
 
				+    ```json
			
 
				+    {
			
 
				+      "bbox": [174,155,818,333],
			
 
				+      "type": "list",
			
 
				+      "angle": 0,
			
 
				+      "index": 11,
			
 
				+      "blocks": [
			
 
				+        {
			
 
				+          "bbox": [174,157,311,175],
			
 
				+          "type": "text",
			
 
				+          "angle": 0,
			
 
				+          "lines": [
			
 
				+            {
			
 
				+              "bbox": [174,157,311,175],
			
 
				+              "spans": [
				+                {
				+                  "bbox": [174,157,311,175],
				+                  "type": "text",
				+                  "content": "H.1 Introduction"
				+                }
				+              ]
			
 
				+            }
			
 
				+          ],
			
 
				+          "index": 3
			
 
				+        },
			
 
				+        {
			
 
				+          "bbox": [175,182,464,229],
			
 
				+          "type": "text",
			
 
				+          "angle": 0,
			
 
				+          "lines": [
			
 
				+            {
			
 
				+              "bbox": [175,182,464,229],
			
 
				+              "spans": [
			
 
				+                {
			
 
				+                  "bbox": [175,182,464,229],
			
 
				+                  "type": "text",
			
 
				+                  "content": "H.2 Example: Divide by Zero without Exception Handling"
			
 
				+                }
			
 
				+              ]
			
 
				+            }
			
 
				+          ],
			
 
				+          "index": 4
			
 
				+        }
			
 
				+      ],
			
 
				+      "sub_type": "text"
			
 
				+    }
			
 
				+    ```
			
 
				+
			
 
				+- Example: code block with optional caption
			
 
				+    ```json
			
 
				+    {
			
 
				+      "type": "code",
			
 
				+      "bbox": [114,780,885,1231],
			
 
				+      "blocks": [
			
 
				+        {
			
 
				+          "bbox": [114,780,885,1231],
			
 
				+          "lines": [
			
 
				+            {
			
 
				+              "bbox": [114,780,885,1231],
			
 
				+              "spans": [
			
 
				+                {
			
 
				+                  "bbox": [114,780,885,1231],
			
 
				+                  "type": "text",
			
 
				+                  "content": "1 // Fig. H.1: DivideByZeroNoExceptionHandling.java  \n2 // Integer division without exception handling.  \n3 import java.util.Scanner;  \n4  \n5 public class DivideByZeroNoExceptionHandling  \n6 {  \n7 // demonstrates throwing an exception when a divide-by-zero occurs  \n8 public static int quotient( int numerator, int denominator )  \n9 {  \n10 return numerator / denominator; // possible division by zero  \n11 } // end method quotient  \n12  \n13 public static void main(String[] args)  \n14 {  \n15 Scanner scanner = new Scanner(System.in); // scanner for input  \n16  \n17 System.out.print(\"Please enter an integer numerator: \");  \n18 int numerator = scanner.nextInt();  \n19 System.out.print(\"Please enter an integer denominator: \");  \n20 int denominator = scanner.nextInt();  \n21"
			
 
				+                }
			
 
				+              ]
			
 
				+            }
			
 
				+          ],
			
 
				+          "index": 17,
			
 
				+          "angle": 0,
			
 
				+          "type": "code_body"
			
 
				+        },
			
 
				+        {
			
 
				+          "bbox": [867,160,1280,189],
			
 
				+          "lines": [
			
 
				+            {
			
 
				+              "bbox": [867,160,1280,189],
			
 
				+              "spans": [
			
 
				+                {
			
 
				+                  "bbox": [867,160,1280,189],
			
 
				+                  "type": "text",
			
 
				+                  "content": "Algorithm 1 Modules for MCTSteg"
			
 
				+                }
			
 
				+              ]
			
 
				+            }
			
 
				+          ],
			
 
				+          "index": 19,
			
 
				+          "angle": 0,
			
 
				+          "type": "code_caption"
			
 
				+        }
			
 
				+      ],
			
 
				+      "index": 17,
			
 
				+      "sub_type": "code"
			
 
				+    }
			
 
				+    ```
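The `code` block layout shown in the examples (child blocks of type `code_body` and `code_caption`, each holding `lines` and `spans`) can be flattened into plain strings. This is a sketch under stated assumptions: `block_text` and `split_code_block` are hypothetical helpers, and joining span contents without a separator is an assumption rather than documented behavior.

```python
def block_text(block: dict) -> str:
    """Concatenate the `content` of every span in a block (lines -> spans)."""
    return "".join(
        span.get("content", "")
        for line in block.get("lines", [])
        for span in line.get("spans", [])
    )


def split_code_block(code_block: dict) -> tuple[str, list[str]]:
    """Return (code body text, list of caption texts) for a `code` block."""
    body, captions = "", []
    for child in code_block.get("blocks", []):
        if child["type"] == "code_body":
            body = block_text(child)
        elif child["type"] == "code_caption":
            captions.append(block_text(child))
    return body, captions


# Hypothetical, shortened block shaped like the example above
example = {
    "type": "code",
    "sub_type": "code",
    "blocks": [
        {"type": "code_body",
         "lines": [{"spans": [{"type": "text", "content": "return a / b;"}]}]},
        {"type": "code_caption",
         "lines": [{"spans": [{"type": "text", "content": "Fig. H.1"}]}]},
    ],
}

body, captions = split_code_block(example)
print(body, captions)
```

A block without a caption simply yields an empty caption list, which matches the statement that `code_caption` is optional.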
			
 
				+
			
 
				+#### Content List (content_list.json)
			
 
				+
			
 
				+**File naming format**: `{original_filename}_content_list.json`
			
 
				+
			
 
				+Based on the pipeline format, with these VLM-specific extensions:
			
 
				+
			
 
				+1. New `code` type with `sub_type` (`code` | `algorithm`):
			
 
				+   - Fields: `code_body` (string), optional `code_caption` (list of strings)
			
 
				+2. New `list` type with `sub_type` (`text` | `ref_text`):
			
 
				+   - Field: `list_items` (list of strings)
			
 
				+3. All `discarded_blocks` entries are also output (e.g., headers, footers, page numbers, margin notes, page footnotes).
			
 
				+4. Existing types (`image`, `table`, `text`, `equation`) remain unchanged.
			
 
				+5. `bbox` still uses the 0–1000 normalized coordinate mapping.
			
 
				+
			
 
				+
			
 
				+##### Examples
			
 
				+Example: code (algorithm) entry
			
 
				+```json
			
 
				+{
			
 
				+  "type": "code",
			
 
				+  "sub_type": "algorithm",
			
 
				+  "code_caption": ["Algorithm 1 Modules for MCTSteg"],
			
 
				+  "code_body": "1: function GETCOORDINATE(d)  \n2:  $x \\gets d / l$ ,  $y \\gets d$  mod  $l$   \n3: return  $(x, y)$   \n4: end function  \n5: function BESTCHILD(v)  \n6:  $C \\gets$  child set of  $v$   \n7:  $v' \\gets \\arg \\max_{c \\in C} \\mathrm{UCTScore}(c)$   \n8:  $v'.n \\gets v'.n + 1$   \n9: return  $v'$   \n10: end function  \n11: function BACK PROPAGATE(v)  \n12: Calculate  $R$  using Equation 11  \n13: while  $v$  is not a root node do  \n14:  $v.r \\gets v.r + R$ ,  $v \\gets v.p$   \n15: end while  \n16: end function  \n17: function RANDOMSEARCH(v)  \n18: while  $v$  is not a leaf node do  \n19: Randomly select an untried action  $a \\in A(v)$   \n20: Create a new node  $v'$   \n21:  $(x, y) \\gets \\mathrm{GETCOORDINATE}(v'.d)$   \n22:  $v'.p \\gets v$ ,  $v'.d \\gets v.d + 1$ ,  $v'.\\Gamma \\gets v.\\Gamma$   \n23:  $v'.\\gamma_{x,y} \\gets a$   \n24: if  $a = -1$  then  \n25:  $v.lc \\gets v'$   \n26: else if  $a = 0$  then  \n27:  $v.mc \\gets v'$   \n28: else  \n29:  $v.rc \\gets v'$   \n30: end if  \n31:  $v \\gets v'$   \n32: end while  \n33: return  $v$   \n34: end function  \n35: function SEARCH(v)  \n36: while  $v$  is fully expanded do  \n37:  $v \\gets$  BESTCHILD(v)  \n38: end while  \n39: if  $v$  is not a leaf node then  \n40:  $v \\gets$  RANDOMSEARCH(v)  \n41: end if  \n42: return  $v$   \n43: end function",
			
 
				+  "bbox": [510,87,881,740],
			
 
				+  "page_idx": 0
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+Example: list (text) entry
			
 
				+```json
			
 
				+{
			
 
				+  "type": "list",
			
 
				+  "sub_type": "text",
			
 
				+  "list_items": [
			
 
				+    "H.1 Introduction",
			
 
				+    "H.2 Example: Divide by Zero without Exception Handling",
			
 
				+    "H.3 Example: Divide by Zero with Exception Handling",
			
 
				+    "H.4 Summary"
			
 
				+  ],
			
 
				+  "bbox": [174,155,818,333],
			
 
				+  "page_idx": 0
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+Example: discarded blocks output
			
 
				+```json
			
 
				+[
			
 
				+  {
			
 
				+    "type": "header",
			
 
				+    "text": "Journal of Hydrology 310 (2005) 253-265",
			
 
				+    "bbox": [363,164,623,177],
			
 
				+    "page_idx": 0
			
 
				+  },
			
 
				+  {
			
 
				+    "type": "page_footnote",
			
 
				+    "text": "* Corresponding author. Address: Forest Science Centre, Department of Sustainability and Environment, P.O. Box 137, Heidelberg, Vic. 3084, Australia. Tel.: +61 3 9450 8719; fax: +61 3 9450 8644.",
			
 
				+    "bbox": [71,815,915,841],
			
 
				+    "page_idx": 0
			
 
				+  }
			
 
				+]
			
 
				+```
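The three entry shapes in the examples above (`code`, `list`, and discarded blocks carrying a `text` field) can be handled with one small dispatcher. A minimal sketch, assuming entries look like those examples; `entry_to_text` is a hypothetical helper, and other types such as `image` or `table` are deliberately not handled here.

```python
def entry_to_text(entry: dict) -> str:
    """Render one content_list entry as plain text."""
    kind = entry["type"]
    if kind == "code":
        # `code_caption` is optional and is a list of strings
        caption = " ".join(entry.get("code_caption", []))
        body = entry["code_body"]
        return f"{caption}\n{body}" if caption else body
    if kind == "list":
        return "\n".join(entry["list_items"])
    # text-like entries, including discarded blocks such as `header`
    return entry.get("text", "")


# Hypothetical entries, shortened from the examples above
entries = [
    {"type": "code", "sub_type": "algorithm",
     "code_caption": ["Algorithm 1 Modules for MCTSteg"],
     "code_body": "1: function GETCOORDINATE(d)", "page_idx": 0},
    {"type": "list", "sub_type": "text",
     "list_items": ["H.1 Introduction", "H.4 Summary"], "page_idx": 0},
    {"type": "header",
     "text": "Journal of Hydrology 310 (2005) 253-265", "page_idx": 0},
]

rendered = [entry_to_text(e) for e in entries]
print("\n\n".join(rendered))
```

Consumers that want to drop page furniture can filter on `type` before rendering, since discarded blocks keep their original type names.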
			
 
				+
			
 
				 ## Summary
			
 
				 
			
 
				 The above files constitute MinerU's complete output results. Users can choose appropriate files for subsequent processing based on their needs:
			
--- a/docs/zh/reference/output_files.md
+++ b/docs/zh/reference/output_files.md
@@ -51,14 +51,16 @@
 
				 
			
 
				 ## 结构化数据文件
			
 
				 
			
 
				-### 模型推理结果 (model.json)
			
 
				+> [!IMPORTANT]
			
 
				+> 2.5版本vlm后端的输出存在较大变化，与pipeline版本存在不兼容情况，如需基于结构化输出进行二次开发，请仔细阅读本文档内容。
			
 
				 
			
 
				-> [!NOTE]
			
 
				-> 仅适用于 pipeline 后端
			
 
				+### pipeline 后端 输出结果
			
 
				+
			
 
				+#### 模型推理结果 (model.json)
			
 
				 
			
 
				 **文件命名格式**：`{原文件名}_model.json`
			
 
				 
			
 
				-#### 数据结构定义
			
 
				+##### 数据结构定义
			
 
				 
			
 
				 ```python
			
 
				 from pydantic import BaseModel, Field
			
@@ -103,7 +105,7 @@ class PageInferenceResults(BaseModel):
 
				 inference_result: list[PageInferenceResults] = []
			
 
				 ```
			
 
				 
			
 
				-#### 坐标系统说明
			
 
				+##### 坐标系统说明
			
 
				 
			
 
				 `poly` 坐标格式：`[x0, y0, x1, y1, x2, y2, x3, y3]`
			
 
				 
			
@@ -112,7 +114,7 @@ inference_result: list[PageInferenceResults] = []
 
				 
			
 
				 ![poly 坐标示意图](../images/poly.png)
			
 
				 
			
 
				-#### 示例数据
			
 
				+##### 示例数据
			
 
				 
			
 
				 ```json
			
 
				 [
			
@@ -165,55 +167,11 @@ inference_result: list[PageInferenceResults] = []
 
				 ]
			
 
				 ```
			
 
				 
			
 
				-### VLM 输出结果 (model.json)
			
 
				-
			
 
				-> [!NOTE]
			
 
				-> 仅适用于 VLM 后端
			
 
				-
			
 
				-**文件命名格式**：`{原文件名}_model.json`
			
 
				-
			
 
				-#### 文件格式说明
			
 
				-
			
 
				-- 该文件为 VLM 模型的原始输出结果，包含两层嵌套list，外层表示页面，内层表示该页的内容块
			
 
				-- 每个内容块都是一个dict，包含 `type`、`bbox`、`angle`、`content` 字段
			
 
				-
			
 
				-
			
 
				-#### 支持的内容类型
			
 
				-
			
 
				-```json
			
 
				-{
			
 
				-    "text",
			
 
				-    "title", 
			
 
				-    "equation",
			
 
				-    "image",
			
 
				-    "image_caption",
			
 
				-    "image_footnote",
			
 
				-    "table",
			
 
				-    "table_caption",
			
 
				-    "table_footnote",
			
 
				-    "phonetic",
			
 
				-    "code",
			
 
				-    "code_caption",
			
 
				-    "ref_text",
			
 
				-    "algorithm",
			
 
				-    "list",
			
 
				-    "header",
			
 
				-    "footer",
			
 
				-    "page_number",
			
 
				-    "aside_text", 
			
 
				-    "page_footnote", 
			
 
				-}
			
 
				-```
			
 
				-
			
 
				-
			
 
				-### 中间处理结果 (middle.json)
			
 
				-
			
 
				-> [!NOTE]
			
 
				-> 仅适用于 pipeline 后端
			
 
				+#### 中间处理结果 (middle.json)
			
 
				 
			
 
				 **文件命名格式**：`{原文件名}_middle.json`
			
 
				 
			
 
				-#### 顶层结构
			
 
				+##### 顶层结构
			
 
				 
			
 
				 | 字段名 | 类型 | 说明 |
			
 
				 |--------|------|------|
			
@@ -221,22 +179,20 @@ inference_result: list[PageInferenceResults] = []
 
				 | `_backend` | `string` | 解析模式：`pipeline` 或 `vlm` |
			
 
				 | `_version_name` | `string` | MinerU 版本号 |
			
 
				 
			
 
				-#### 页面信息结构 (pdf_info)
			
 
				+##### 页面信息结构 (pdf_info)
			
 
				 
			
 
				 | 字段名 | 说明 |
			
 
				 |--------|------|
			
 
				 | `preproc_blocks` | PDF 预处理后的未分段中间结果 |
			
 
				-| `layout_bboxes` | 布局分割结果，包含布局方向和边界框，按阅读顺序排序 |
			
 
				 | `page_idx` | 页码，从 0 开始 |
			
 
				 | `page_size` | 页面的宽度和高度 `[width, height]` |
			
 
				-| `_layout_tree` | 布局树状结构 |
			
 
				 | `images` | 图片块信息列表 |
			
 
				 | `tables` | 表格块信息列表 |
			
 
				 | `interline_equations` | 行间公式块信息列表 |
			
 
				 | `discarded_blocks` | 需要丢弃的块信息 |
			
 
				 | `para_blocks` | 分段后的内容块结果 |
			
 
				 
			
 
				-#### 块结构层次
			
 
				+##### 块结构层次
			
 
				 
			
 
				 ```
			
 
				 一级块 (table | image)
			
@@ -245,7 +201,7 @@ inference_result: list[PageInferenceResults] = []
 
				         └── 片段 (span)
			
 
				 ```
			
 
				 
			
 
				-#### 一级块字段
			
 
				+##### 一级块字段
			
 
				 
			
 
				 | 字段名 | 说明 |
			
 
				 |--------|------|
			
@@ -253,7 +209,7 @@ inference_result: list[PageInferenceResults] = []
 
				 | `bbox` | 块的矩形框坐标 `[x0, y0, x1, y1]` |
			
 
				 | `blocks` | 包含的二级块列表 |
			
 
				 
			
 
				-#### 二级块字段
			
 
				+##### 二级块字段
			
 
				 
			
 
				 | 字段名 | 说明 |
			
 
				 |--------|------|
			
@@ -261,7 +217,7 @@ inference_result: list[PageInferenceResults] = []
 
				 | `bbox` | 块的矩形框坐标 |
			
 
				 | `lines` | 包含的行信息列表 |
			
 
				 
			
 
				-#### 二级块类型
			
 
				+##### 二级块类型
			
 
				 
			
 
				 | 类型 | 说明 |
			
 
				 |------|------|
			
@@ -277,7 +233,7 @@ inference_result: list[PageInferenceResults] = []
 
				 | `list` | 列表块 |
			
 
				 | `interline_equation` | 行间公式块 |
			
 
				 
			
 
				-#### 行和片段结构
			
 
				+##### 行和片段结构
			
 
				 
			
 
				 **行 (line) 字段**：
			
 
				 - `bbox`：行的矩形框坐标
			
@@ -288,7 +244,7 @@ inference_result: list[PageInferenceResults] = []
 
				 - `type`：片段类型（`image`、`table`、`text`、`inline_equation`、`interline_equation`）
			
 
				 - `content` | `img_path`：文本内容或图片路径
			
 
				 
			
 
				-#### 示例数据
			
 
				+##### 示例数据
			
 
				 
			
 
				 ```json
			
 
				 {
			
@@ -391,18 +347,15 @@ inference_result: list[PageInferenceResults] = []
 
				 }
			
 
				 ```
			
 
				 
			
 
				-### 内容列表 (content_list.json)
			
 
				-
			
 
				-> [!NOTE]
			
 
				-> 仅适用于 pipeline 后端
			
 
				+#### 内容列表 (content_list.json)
			
 
				 
			
 
				 **文件命名格式**：`{原文件名}_content_list.json`
			
 
				 
			
 
				-#### 功能说明
			
 
				+##### 功能说明
			
 
				 
			
 
				 这是一个简化版的 `middle.json`，按阅读顺序平铺存储所有可读内容块，去除了复杂的布局信息，便于后续处理。
			
 
				 
			
 
				-#### 内容类型
			
 
				+##### 内容类型
			
 
				 
			
 
				 | 类型 | 说明 |
			
 
				 |------|------|
			
@@ -411,7 +364,7 @@ inference_result: list[PageInferenceResults] = []
 
				 | `text` | 文本/标题 |
			
 
				 | `equation` | 行间公式 |
			
 
				 
			
 
				-#### 文本层级标识
			
 
				+##### 文本层级标识
			
 
				 
			
 
				 通过 `text_level` 字段区分文本层级：
			
 
				 
			
@@ -420,12 +373,12 @@ inference_result: list[PageInferenceResults] = []
 
				 - `text_level: 2`：二级标题
			
 
				 - 以此类推...
			
 
				 
			
 
				-#### 通用字段
			
 
				+##### 通用字段
			
 
				 
			
 
				 - 所有内容块都包含 `page_idx` 字段，表示所在页码（从 0 开始）。
			
 
				 - 所有内容块都包含 `bbox` 字段，表示内容块的边界框坐标 `[x0, y0, x1, y1]` 映射在0-1000范围内的结果。
			
 
				 
			
 
				-#### 示例数据
			
 
				+##### 示例数据
			
 
				 
			
 
				 ```json
			
 
				 [
			
@@ -490,6 +443,341 @@ inference_result: list[PageInferenceResults] = []
 
				 ]
			
 
				 ```
			
 
				 
			
 
				+### VLM 后端 输出结果
			
 
				+
			
 
				+#### 模型推理结果 (model.json)
			
 
				+
			
 
				+**文件命名格式**：`{原文件名}_model.json`
			
 
				+
			
 
				+##### 文件格式说明
			
 
				+
			
 
				+- 该文件为 VLM 模型的原始输出结果，包含两层嵌套list，外层表示页面，内层表示该页的内容块
			
 
				+- 每个内容块都是一个dict，包含 `type`、`bbox`、`angle`、`content` 字段
			
 
				+
			
 
				+
			
 
				+##### 支持的内容类型
			
 
				+
			
 
				+```json
			
 
				+{
			
 
				+    "text": "文本",
			
 
				+    "title": "标题", 
			
 
				+    "equation": "行间公式",
			
 
				+    "image": "图片",
			
 
				+    "image_caption": "图片描述",
			
 
				+    "image_footnote": "图片脚注",
			
 
				+    "table": "表格",
			
 
				+    "table_caption": "表格描述",
			
 
				+    "table_footnote": "表格脚注",
			
 
				+    "phonetic": "拼音",
			
 
				+    "code": "代码块",
			
 
				+    "code_caption": "代码描述",
			
 
				+    "ref_text": "参考文献",
			
 
				+    "algorithm": "算法块",
			
 
				+    "list": "列表",
			
 
				+    "header": "页眉",
			
 
				+    "footer": "页脚",
			
 
				+    "page_number": "页码",
			
 
				+    "aside_text": "装订线旁注", 
			
 
				+    "page_footnote": "页面脚注"
			
 
				+}
			
 
				+```
			
 
				+
			
 
				+##### 坐标系统说明
			
 
				+
			
 
				+`bbox` 坐标格式：`[x0, y0, x1, y1]`
			
 
				+
			
 
				+- 分别表示左上、右下两点的坐标
			
 
				+- 坐标原点在页面左上角
			
 
				+- 坐标为相对于原始页面尺寸的百分比，范围在0-1之间
			
 
				+
			
 
				+##### 示例数据
			
 
				+
			
 
				+```json
			
 
				+[
			
 
				+    [
			
 
				+        {
			
 
				+            "type": "header",
			
 
				+            "bbox": [
			
 
				+                0.077,
			
 
				+                0.095,
			
 
				+                0.18,
			
 
				+                0.181
			
 
				+            ],
			
 
				+            "angle": 0,
			
 
				+            "score": null,
			
 
				+            "block_tags": null,
			
 
				+            "content": "ELSEVIER",
			
 
				+            "format": null,
			
 
				+            "content_tags": null
			
 
				+        },
			
 
				+        {
			
 
				+            "type": "title",
			
 
				+            "bbox": [
			
 
				+                0.157,
			
 
				+                0.228,
			
 
				+                0.833,
			
 
				+                0.253
			
 
				+            ],
			
 
				+            "angle": 0,
			
 
				+            "score": null,
			
 
				+            "block_tags": null,
			
 
				+            "content": "The response of flow duration curves to afforestation",
			
 
				+            "format": null,
			
 
				+            "content_tags": null
			
 
				+        }
			
 
				+    ]
			
 
				+]
			
 
				+```
			
 
				+
			
 
				+#### 中间处理结果 (middle.json)
			
 
				+
			
 
				+**文件命名格式**：`{原文件名}_middle.json`
			
 
				+
			
 
				+vlm 后端的 middle.json 文件结构与 pipeline 后端类似，但存在以下差异： 
			
 
				+- list变成二级block，增加"sub_type"字段区分list类型，"sub_type"可选"text"（文本类型），"ref_text"（引用类型）
			
 
				+  - 示例数据
			
 
				+    ```json
			
 
				+    {
			
 
				+        "bbox": [
			
 
				+            174,
			
 
				+            155,
			
 
				+            818,
			
 
				+            333
			
 
				+        ],
			
 
				+        "type": "list",
			
 
				+        "angle": 0,
			
 
				+        "index": 11,
			
 
				+        "blocks": [
			
 
				+            {
			
 
				+                "bbox": [
			
 
				+                    174,
			
 
				+                    157,
			
 
				+                    311,
			
 
				+                    175
			
 
				+                ],
			
 
				+                "type": "text",
			
 
				+                "angle": 0,
			
 
				+                "lines": [
			
 
				+                    {
			
 
				+                        "bbox": [
			
 
				+                            174,
			
 
				+                            157,
			
 
				+                            311,
			
 
				+                            175
			
 
				+                        ],
			
 
				+                        "spans": [
			
 
				+                            {
			
 
				+                                "bbox": [
			
 
				+                                    174,
			
 
				+                                    157,
			
 
				+                                    311,
			
 
				+                                    175
			
 
				+                                ],
			
 
				+                                "type": "text",
			
 
				+                                "content": "H.1 Introduction"
			
 
				+                            }
			
 
				+                        ]
			
 
				+                    }
			
 
				+                ],
			
 
				+                "index": 3
			
 
				+            },
			
 
				+            {
			
 
				+                "bbox": [
			
 
				+                    175,
			
 
				+                    182,
			
 
				+                    464,
			
 
				+                    229
			
 
				+                ],
			
 
				+                "type": "text",
			
 
				+                "angle": 0,
			
 
				+                "lines": [
			
 
				+                    {
			
 
				+                        "bbox": [
			
 
				+                            175,
			
 
				+                            182,
			
 
				+                            464,
			
 
				+                            229
			
 
				+                        ],
			
 
				+                        "spans": [
			
 
				+                            {
			
 
				+                                "bbox": [
			
 
				+                                    175,
			
 
				+                                    182,
			
 
				+                                    464,
			
 
				+                                    229
			
 
				+                                ],
			
 
				+                                "type": "text",
			
 
				+                                "content": "H.2 Example: Divide by Zero without Exception Handling"
			
 
				+                            }
			
 
				+                        ]
			
 
				+                    }
			
 
				+                ],
			
 
				+                "index": 4
			
 
				+            }
			
 
				+        ],
			
 
				+        "sub_type": "text"
			
 
				+    }
			
 
				+    ```
			
 
				+- Added a `code` block type. It has two "sub_type" values, "code" and "algorithm", and contains at least a code_body, with an optional code_caption.
			
 
				+  - Sample data
			
 
				+    ```json
			
 
				+    {
			
 
				+        "type": "code",
			
 
				+        "bbox": [
			
 
				+            114,
			
 
				+            780,
			
 
				+            885,
			
 
				+            1231
			
 
				+        ],
			
 
				+        "blocks": [
			
 
				+            {
			
 
				+                "bbox": [
			
 
				+                    114,
			
 
				+                    780,
			
 
				+                    885,
			
 
				+                    1231
			
 
				+                ],
			
 
				+                "lines": [
			
 
				+                    {
			
 
				+                        "bbox": [
			
 
				+                            114,
			
 
				+                            780,
			
 
				+                            885,
			
 
				+                            1231
			
 
				+                        ],
			
 
				+                        "spans": [
			
 
				+                            {
			
 
				+                                "bbox": [
			
 
				+                                    114,
			
 
				+                                    780,
			
 
				+                                    885,
			
 
				+                                    1231
			
 
				+                                ],
			
 
				+                                "type": "text",
			
 
				+                                "content": "1 // Fig. H.1: DivideByZeroNoExceptionHandling.java  \n2 // Integer division without exception handling.  \n3 import java.util.Scanner;  \n4  \n5 public class DivideByZeroNoExceptionHandling  \n6 {  \n7 // demonstrates throwing an exception when a divide-by-zero occurs  \n8 public static int quotient( int numerator, int denominator )  \n9 {  \n10 return numerator / denominator; // possible division by zero  \n11 } // end method quotient  \n12  \n13 public static void main(String[] args)  \n14 {  \n15 Scanner scanner = new Scanner(System.in); // scanner for input  \n16  \n17 System.out.print(\"Please enter an integer numerator: \");  \n18 int numerator = scanner.nextInt();  \n19 System.out.print(\"Please enter an integer denominator: \");  \n20 int denominator = scanner.nextInt();  \n21"
			
 
				+                            }
			
 
				+                        ]
			
 
				+                    }
			
 
				+                ],
			
 
				+                "index": 17,
			
 
				+                "angle": 0,
			
 
				+                "type": "code_body"
			
 
				+            },
			
 
				+            {
			
 
				+                "bbox": [
			
 
				+                    867,
			
 
				+                    160,
			
 
				+                    1280,
			
 
				+                    189
			
 
				+                ],
			
 
				+                "lines": [
			
 
				+                    {
			
 
				+                        "bbox": [
			
 
				+                            867,
			
 
				+                            160,
			
 
				+                            1280,
			
 
				+                            189
			
 
				+                        ],
			
 
				+                        "spans": [
			
 
				+                            {
			
 
				+                                "bbox": [
			
 
				+                                    867,
			
 
				+                                    160,
			
 
				+                                    1280,
			
 
				+                                    189
			
 
				+                                ],
			
 
				+                                "type": "text",
			
 
				+                                "content": "Algorithm 1 Modules for MCTSteg"
			
 
				+                            }
			
 
				+                        ]
			
 
				+                    }
			
 
				+                ],
			
 
				+                "index": 19,
			
 
				+                "angle": 0,
			
 
				+                "type": "code_caption"
			
 
				+            }
			
 
				+        ],
			
 
				+        "index": 17,
			
 
				+        "sub_type": "code"
			
 
				+    }
			
 
				+    ```
			
 
				+- Elements in `discarded_blocks` gain the types "header", "footer", "page_number", "aside_text", and "page_footnote".
			
 
				+- All blocks gain an `angle` field that gives the rotation angle: 0, 90, 180, or 270.
			
 
				+
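The nesting described above (a `code` block holding `code_body` and optional `code_caption` children, each with `lines` and `spans`) can be walked as in the following minimal sketch. The helper names `span_text` and `split_code_block` are hypothetical and not part of MinerU; the field names are taken from the sample data above.

```python
def span_text(block: dict) -> str:
    """Join the `content` of every span in a leaf block (lines -> spans)."""
    return "\n".join(
        span.get("content", "")
        for line in block.get("lines", [])
        for span in line.get("spans", [])
    )


def split_code_block(block: dict) -> dict:
    """Collect the body and optional caption text of a `code` block.

    Hypothetical helper: assumes the layout shown in the sample data,
    where children live under `blocks` and carry a `type` of
    `code_body` or `code_caption`.
    """
    result = {
        "sub_type": block.get("sub_type"),
        "angle": block.get("angle"),  # may be absent on some blocks
        "code_body": None,
        "code_caption": None,
    }
    for child in block.get("blocks", []):
        if child.get("type") in ("code_body", "code_caption"):
            result[child["type"]] = span_text(child)
    return result


# A trimmed-down block in the same shape as the sample data.
sample = {
    "type": "code",
    "sub_type": "algorithm",
    "blocks": [
        {
            "type": "code_body",
            "angle": 0,
            "lines": [{"spans": [{"type": "text", "content": "1: return x"}]}],
        },
        {
            "type": "code_caption",
            "angle": 0,
            "lines": [{"spans": [{"type": "text", "content": "Algorithm 1"}]}],
        },
    ],
}

parts = split_code_block(sample)
print(parts["code_caption"], "|", parts["code_body"])
```

Because `code_caption` is optional, callers should expect `None` for that key.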
			
 
				+#### Content List (content_list.json)
			
 
				+
			
 
				+**File naming format**: `{original_filename}_content_list.json`
			
 
				+
			
 
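				+The content_list.json file of the vlm backend has a structure similar to that of the pipeline backend. Following the middle.json changes in this release, the following adjustments were made: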
			
 
				+- Added a `code` type. It has two "sub_type" values, "code" and "algorithm", and contains at least a code_body, with an optional code_caption.
			
 
				+  - Sample data
			
 
				+    ```json
			
 
				+    {
			
 
				+        "type": "code",
			
 
				+        "sub_type": "algorithm",
			
 
				+        "code_caption": [
			
 
				+            "Algorithm 1 Modules for MCTSteg"
			
 
				+        ],
			
 
				+        "code_body": "1: function GETCOORDINATE(d)  \n2:  $x \\gets d / l$ ,  $y \\gets d$  mod  $l$   \n3: return  $(x, y)$   \n4: end function  \n5: function BESTCHILD(v)  \n6:  $C \\gets$  child set of  $v$   \n7:  $v' \\gets \\arg \\max_{c \\in C} \\mathrm{UCTScore}(c)$   \n8:  $v'.n \\gets v'.n + 1$   \n9: return  $v'$   \n10: end function  \n11: function BACK PROPAGATE(v)  \n12: Calculate  $R$  using Equation 11  \n13: while  $v$  is not a root node do  \n14:  $v.r \\gets v.r + R$ ,  $v \\gets v.p$   \n15: end while  \n16: end function  \n17: function RANDOMSEARCH(v)  \n18: while  $v$  is not a leaf node do  \n19: Randomly select an untried action  $a \\in A(v)$   \n20: Create a new node  $v'$   \n21:  $(x, y) \\gets \\mathrm{GETCOORDINATE}(v'.d)$   \n22:  $v'.p \\gets v$ ,  $v'.d \\gets v.d + 1$ ,  $v'.\\Gamma \\gets v.\\Gamma$   \n23:  $v'.\\gamma_{x,y} \\gets a$   \n24: if  $a = -1$  then  \n25:  $v.lc \\gets v'$   \n26: else if  $a = 0$  then  \n27:  $v.mc \\gets v'$   \n28: else  \n29:  $v.rc \\gets v'$   \n30: end if  \n31:  $v \\gets v'$   \n32: end while  \n33: return  $v$   \n34: end function  \n35: function SEARCH(v)  \n36: while  $v$  is fully expanded do  \n37:  $v \\gets$  BESTCHILD(v)  \n38: end while  \n39: if  $v$  is not a leaf node then  \n40:  $v \\gets$  RANDOMSEARCH(v)  \n41: end if  \n42: return  $v$   \n43: end function",
			
 
				+        "bbox": [
			
 
				+            510,
			
 
				+            87,
			
 
				+            881,
			
 
				+            740
			
 
				+        ],
			
 
				+        "page_idx": 0
			
 
				+    }
			
 
				+    ```
			
 
				+- Added a `list` type with two "sub_type" values: "text" and "ref_text".
			
 
				+  - Sample data
			
 
				+    ```json
			
 
				+    {
			
 
				+        "type": "list",
			
 
				+        "sub_type": "text",
			
 
				+        "list_items": [
			
 
				+            "H.1 Introduction",
			
 
				+            "H.2 Example: Divide by Zero without Exception Handling",
			
 
				+            "H.3 Example: Divide by Zero with Exception Handling",
			
 
				+            "H.4 Summary"
			
 
				+        ],
			
 
				+        "bbox": [
			
 
				+            174,
			
 
				+            155,
			
 
				+            818,
			
 
				+            333
			
 
				+        ],
			
 
				+        "page_idx": 0
			
 
				+    }
			
 
				+    ```
			
 
				+- Added the content of all `discarded_blocks` to the output.
			
 
				+  - Sample data
			
 
				+    ```json
			
 
				+    [{
			
 
				+        "type": "header",
			
 
				+        "text": "Journal of Hydrology 310 (2005) 253-265",
			
 
				+        "bbox": [
			
 
				+            363,
			
 
				+            164,
			
 
				+            623,
			
 
				+            177
			
 
				+        ],
			
 
				+        "page_idx": 0
			
 
				+    },
			
 
				+    {
			
 
				+        "type": "page_footnote",
			
 
				+        "text": "* Corresponding author. Address: Forest Science Centre, Department of Sustainability and Environment, P.O. Box 137, Heidelberg, Vic. 3084, Australia. Tel.: +61 3 9450 8719; fax: +61 3 9450 8644.",
			
 
				+        "bbox": [
			
 
				+            71,
			
 
				+            815,
			
 
				+            915,
			
 
				+            841
			
 
				+        ],
			
 
				+        "page_idx": 0
			
 
				+    }]
			
 
				+    ```
			
 
				+
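For downstream processing, the new entry types listed above can be handled with a small dispatcher. The sketch below is illustrative only: `split_content_list` and `item_text` are hypothetical helpers, and it assumes that discarded entries in content_list.json use the same five type names that middle.json lists for `discarded_blocks`.

```python
# Assumption: discarded entries reuse the type names listed for middle.json.
DISCARDED_TYPES = {"header", "footer", "page_number", "aside_text", "page_footnote"}


def split_content_list(items: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate body entries from discarded ones (headers, footnotes, ...)."""
    main, discarded = [], []
    for item in items:
        target = discarded if item.get("type") in DISCARDED_TYPES else main
        target.append(item)
    return main, discarded


def item_text(item: dict) -> str:
    """Flatten one entry to plain text, covering the `code` and `list` types."""
    kind = item.get("type")
    if kind == "code":
        # code_caption is optional; code_body is always expected.
        caption = item.get("code_caption", [])
        return "\n".join([*caption, item.get("code_body", "")])
    if kind == "list":
        return "\n".join(item.get("list_items", []))
    return item.get("text", "")


# Entries shaped like the sample data above, shortened for brevity.
entries = [
    {"type": "header", "text": "Journal of Hydrology 310 (2005) 253-265", "page_idx": 0},
    {"type": "list", "sub_type": "text",
     "list_items": ["H.1 Introduction", "H.4 Summary"], "page_idx": 0},
    {"type": "code", "sub_type": "algorithm",
     "code_caption": ["Algorithm 1 Modules for MCTSteg"],
     "code_body": "1: function GETCOORDINATE(d)", "page_idx": 0},
]

main, discarded = split_content_list(entries)
for entry in main:
    print(item_text(entry))
```

Keeping the discarded types in a single set makes it easy to adjust if a given MinerU version emits a different set of names.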
			
 
				+
			
 
				+
			
 
				 ## Summary
			
 
				 
			
 
				 The files above make up MinerU's complete output. Users can choose the appropriate files for further processing as needed: