| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144 |
- Inference Result
- ==================
- .. admonition:: Tip
- :class: tip
- Please first navigate to :doc:`tutorial/pipeline` to get an initial understanding of how the pipeline works; this will help in understanding the content of this section.
- The **InferenceResult** class is a container for storing model inference results and implements a series of methods related to these results, such as draw_model, dump_model.
- Checkout :doc:`../api/model_operators` for more details about **InferenceResult**
- Model Inference Result
- -----------------------
- Structure Definition
- ^^^^^^^^^^^^^^^^^^^^^^^^
- .. code:: python
- from pydantic import BaseModel, Field
- from enum import IntEnum
- class CategoryType(IntEnum):
- title = 0 # Title
- plain_text = 1 # Text
- abandon = 2 # Includes headers, footers, page numbers, and page annotations
- figure = 3 # Image
- figure_caption = 4 # Image description
- table = 5 # Table
- table_caption = 6 # Table description
- table_footnote = 7 # Table footnote
- isolate_formula = 8 # Block formula
- formula_caption = 9 # Formula label
- embedding = 13 # Inline formula
- isolated = 14 # Block formula
- text = 15 # OCR recognition result
- class PageInfo(BaseModel):
- page_no: int = Field(description="Page number, the first page is 0", ge=0)
- height: int = Field(description="Page height", gt=0)
- width: int = Field(description="Page width", ge=0)
- class ObjectInferenceResult(BaseModel):
- category_id: CategoryType = Field(description="Category", ge=0)
- poly: list[float] = Field(description="Quadrilateral coordinates, representing the coordinates of the top-left, top-right, bottom-right, and bottom-left points respectively")
- score: float = Field(description="Confidence of the inference result")
- latex: str | None = Field(description="LaTeX parsing result", default=None)
- html: str | None = Field(description="HTML parsing result", default=None)
- class PageInferenceResults(BaseModel):
- layout_dets: list[ObjectInferenceResult] = Field(description="Page recognition results", ge=0)
- page_info: PageInfo = Field(description="Page metadata")
- Example
- ^^^^^^^^^^^
- .. code:: json
- [
- {
- "layout_dets": [
- {
- "category_id": 2,
- "poly": [
- 99.1906967163086,
- 100.3119125366211,
- 730.3707885742188,
- 100.3119125366211,
- 730.3707885742188,
- 245.81326293945312,
- 99.1906967163086,
- 245.81326293945312
- ],
- "score": 0.9999997615814209
- }
- ],
- "page_info": {
- "page_no": 0,
- "height": 2339,
- "width": 1654
- }
- },
- {
- "layout_dets": [
- {
- "category_id": 5,
- "poly": [
- 99.13092803955078,
- 2210.680419921875,
- 497.3183898925781,
- 2210.680419921875,
- 497.3183898925781,
- 2264.78076171875,
- 99.13092803955078,
- 2264.78076171875
- ],
- "score": 0.9999997019767761
- }
- ],
- "page_info": {
- "page_no": 1,
- "height": 2339,
- "width": 1654
- }
- }
- ]
- The format of the poly coordinates is [x0, y0, x1, y1, x2, y2, x3, y3],
- representing the coordinates of the top-left, top-right, bottom-right,
- and bottom-left points respectively. |Poly Coordinate Diagram|
- Inference Result
- -------------------------
- .. code:: python
- from magic_pdf.operators.models import InferenceResult
- from magic_pdf.data.dataset import Dataset
- dataset : Dataset = some_data_set # not real dataset
- # The inference results of all pages, ordered by page number, are stored in a list as the inference results of MinerU
- model_inference_result: list[PageInferenceResults] = []
- Inference_result = InferenceResult(model_inference_result, dataset)
- some_model.pdf
- ^^^^^^^^^^^^^^^^^^^^
- .. figure:: ../_static/image/inference_result.png
- .. |Poly Coordinate Diagram| image:: ../_static/image/poly.png
|