feat: update output_files.md to include new block types and fields for code and list structures

myhloli 2 months ago
parent
commit
c2208d84cb
2 changed files with 23 additions and 23 deletions
  1. docs/en/reference/output_files.md (16 additions, 16 deletions)
  2. docs/zh/reference/output_files.md (7 additions, 7 deletions)

+ 16 - 16
docs/en/reference/output_files.md

@@ -519,15 +519,15 @@ Text levels are distinguished through the `text_level` field:
 
 Structure is broadly similar to the pipeline backend, but with these differences:
 
-1. `list` becomes a second‑level block; a new field `sub_type` distinguishes list categories:
-   - `text`: ordinary list
-   - `ref_text`: reference / bibliography style list
-2. New `code` block type with `sub_type`:
-   - `code`
-   - `algorithm`
-   A code block always has at least a `code_body`; it may optionally have a `code_caption`.
-3. `discarded_blocks` may contain additional types: `header`, `footer`, `page_number`, `aside_text`, `page_footnote`.
-4. All blocks include an `angle` field indicating rotation (one of `0, 90, 180, 270`).
+- 1. `list` becomes a second‑level block; a new field `sub_type` distinguishes list categories:
+  - `text`: ordinary list
+  - `ref_text`: reference / bibliography style list
+- 2. New `code` block type with `sub_type`:
+  - `code`
+  - `algorithm`
+  A code block always has at least a `code_body`; it may optionally have a `code_caption`.
+- 3. `discarded_blocks` may contain additional types: `header`, `footer`, `page_number`, `aside_text`, `page_footnote`.
+- 4. All blocks include an `angle` field indicating rotation (one of `0, 90, 180, 270`).
 
 ##### Examples
 - Example: list block
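
The four differences listed in this hunk can be sketched as a small consistency check. This is an illustration only and not part of the commit: `check_block` is a hypothetical helper, and the assumption that a `code` block keeps its children (`code_body`, `code_caption`) in a nested `blocks` list with their own `type` fields is not confirmed by the text above.

```python
# Hypothetical checker for the VLM-specific fields described in this hunk.
ALLOWED_SUB_TYPES = {
    "list": {"text", "ref_text"},
    "code": {"code", "algorithm"},
}
ALLOWED_ANGLES = {0, 90, 180, 270}


def check_block(block: dict) -> list[str]:
    """Return human-readable problems for one block (empty list if none)."""
    problems = []

    # Every block is expected to carry a rotation angle.
    angle = block.get("angle")
    if angle not in ALLOWED_ANGLES:
        problems.append(f"unexpected angle: {angle!r}")

    # Only list and code blocks have a constrained sub_type.
    block_type = block.get("type")
    allowed = ALLOWED_SUB_TYPES.get(block_type)
    if allowed is not None and block.get("sub_type") not in allowed:
        problems.append(
            f"unexpected sub_type for {block_type}: {block.get('sub_type')!r}"
        )

    if block_type == "code":
        # Assumption: child blocks live under "blocks", each with its own "type".
        child_types = {child.get("type") for child in block.get("blocks", [])}
        if "code_body" not in child_types:
            problems.append("code block has no code_body")

    return problems
```

The `discarded_blocks` types are deliberately not checked here, since the text only says those types may appear, not that the set is exhaustive.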
@@ -633,13 +633,13 @@ Structure is broadly similar to the pipeline backend, but with these differences
 
 Based on the pipeline format, with these VLM-specific extensions:
 
-1. New `code` type with `sub_type` (`code` | `algorithm`):
-   - Fields: `code_body` (string), optional `code_caption` (list of strings)
-2. New `list` type with `sub_type` (`text` | `ref_text`):
-   - Field: `list_items` (array of strings)
-3. All `discarded_blocks` entries are also output (e.g., headers, footers, page numbers, margin notes, page footnotes).
-4. Existing types (`image`, `table`, `text`, `equation`) remain unchanged.
-5. `bbox` still uses the 0–1000 normalized coordinate mapping.
+- 1. New `code` type with `sub_type` (`code` | `algorithm`):
+  - Fields: `code_body` (string), optional `code_caption` (list of strings)
+- 2. New `list` type with `sub_type` (`text` | `ref_text`):
+  - Field: `list_items` (array of strings)
+- 3. All `discarded_blocks` entries are also output (e.g., headers, footers, page numbers, margin notes, page footnotes).
+- 4. Existing types (`image`, `table`, `text`, `equation`) remain unchanged.
+- 5. `bbox` still uses the 0–1000 normalized coordinate mapping.
 
 
 ##### Examples
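
A consumer of these extensions might handle the new `code` and `list` entries roughly as follows. This is a minimal sketch, not part of the commit: `entry_to_text` is a hypothetical helper, and the fallback to a `text` field for all other entry types is an assumption about the existing pipeline format.

```python
def entry_to_text(entry: dict) -> str:
    """Flatten one content_list entry into plain text (hypothetical helper)."""
    kind = entry.get("type")

    if kind == "code":
        # code_caption is optional and documented as a list of strings.
        lines = list(entry.get("code_caption", []))
        lines.append(entry["code_body"])  # code_body is documented as a string
        return "\n".join(lines)

    if kind == "list":
        # list_items is documented as an array of strings.
        return "\n".join(f"- {item}" for item in entry.get("list_items", []))

    # Assumption: remaining entry types expose their content under "text".
    return entry.get("text", "")
```

For example, `entry_to_text({"type": "list", "sub_type": "text", "list_items": ["a", "b"]})` yields two lines, each prefixed with `- `.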

+ 7 - 7
docs/zh/reference/output_files.md

@@ -535,10 +535,10 @@ inference_result: list[PageInferenceResults] = []
 
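 ##### File format description
 The vlm backend's middle.json file structure is similar to the pipeline backend's, with the following differences:
-1. list becomes a second-level block, with a new "sub_type" field to distinguish list types; "sub_type" is either "text" (text type) or "ref_text" (reference type)
-2. Added a code block type with two "sub_type" values, "code" and "algorithm"; it has at least a code_body and optionally a code_caption
-3. Elements in `discarded_blocks` gain the types "header", "footer", "page_number", "aside_text", and "page_footnote"
-4. All blocks gain an `angle` field indicating the rotation angle: 0, 90, 180, or 270
+- 1. list becomes a second-level block, with a new "sub_type" field to distinguish list types; "sub_type" is either "text" (text type) or "ref_text" (reference type)
+- 2. Added a code block type with two "sub_type" values, "code" and "algorithm"; it has at least a code_body and optionally a code_caption
+- 3. Elements in `discarded_blocks` gain the types "header", "footer", "page_number", "aside_text", and "page_footnote"
+- 4. All blocks gain an `angle` field indicating the rotation angle: 0, 90, 180, or 270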
 
 
 ##### Example data
@@ -714,9 +714,9 @@ vlm 后端的 middle.json 文件结构与 pipeline 后端类似,但存在以
 
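 ##### File format description
 The vlm backend's content_list.json file structure is similar to the pipeline backend's; following the middle.json changes in this update, these adjustments were made:
-1. Added the `code` type, with two "sub_type" values, "code" and "algorithm"; it has at least a code_body and optionally a code_caption
-2. Added the `list` type, with two "sub_type" values, "text" and "ref_text"
-3. Added output for all `discarded_blocks` content
+- 1. Added the `code` type, with two "sub_type" values, "code" and "algorithm"; it has at least a code_body and optionally a code_caption
+- 2. Added the `list` type, with two "sub_type" values, "text" and "ref_text"
+- 3. Added output for all `discarded_blocks` content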
 
 ##### Example data
 - code type content