Known Issues
============

-  Reading order is determined by the model’s sorting of text blocks by
   their spatial distribution, so it may come out wrong for extremely
   complex layouts.
-  Vertical text is not supported.
-  Tables of contents and lists are recognized through rules; a few
   uncommon list formats may not be identified.
-  Only one level of headings is supported; hierarchical heading levels
   are currently not supported.
-  Code blocks are not yet supported in the layout model.
-  Comic books, art books, elementary school textbooks, and exercise
   books are not yet parsed well.
-  For PDFs with a high density of formulas, enabling the OCR function
   is strongly recommended and may produce better results. When PyMuPDF
   is used to extract text, text lines can overlap, which leads to
   inaccurate formula insertion positions.
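
The overlapping-line problem in the last item can be illustrated with a small check on line bounding boxes. This is only a sketch and not how MinerU decides anything: the helper names ``overlap_ratio`` and ``count_overlapping_lines`` and the ``0.3`` threshold are hypothetical, and the boxes are assumed to be ``(x0, y0, x1, y1)`` tuples such as the ``bbox`` values of lines returned by PyMuPDF's ``page.get_text("dict")``.

```python
from typing import List, Tuple

# Assumed box layout: (x0, y0, x1, y1) in page coordinates.
BBox = Tuple[float, float, float, float]


def overlap_ratio(a: BBox, b: BBox) -> float:
    """Return the intersection area divided by the smaller box's area."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not intersect
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    smaller = min(area_a, area_b)
    return (inter_w * inter_h) / smaller if smaller > 0 else 0.0


def count_overlapping_lines(lines: List[BBox], threshold: float = 0.3) -> int:
    """Count pairs of line boxes whose overlap ratio exceeds the threshold.

    The threshold is an illustrative assumption, not a MinerU setting.
    """
    count = 0
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            if overlap_ratio(lines[i], lines[j]) > threshold:
                count += 1
    return count


if __name__ == "__main__":
    # Two lines that overlap vertically, plus one clean line below them.
    sample = [(0, 0, 100, 10), (0, 5, 100, 15), (0, 20, 100, 30)]
    print(count_overlapping_lines(sample))
```

A page that reports many overlapping pairs under a check like this is a plausible candidate for the OCR function rather than direct text extraction.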