Commit History

Author SHA1 Message Date
  myhloli 87b9eeee59 fix(ocr): handle inline equations consistently with text content 1 year ago
  myhloli 7c03014c2a fix(ocr_mkcontent): improve content handling for different languages and equation types - Adjust content formatting for Chinese, Japanese, Korean, and Western languages 1 year ago
  myhloli faf8c286fb fix(magic_pdf): handle missing image_path in spans 1 year ago
  myhloli 0e8d5893eb feat(draw_bbox): update bounding box drawing for tables and images 1 year ago
  myhloli c34c9d21ef refactor(ocr): improve image and table block handling 1 year ago
  myhloli 644085760b fix(ocr_mkcontent): expand para_to_standard_format_v2 to handle list and index blocks 1 year ago
  myhloli fc49f5c446 refactor(magic_pdf): remove unused parameters and simplify functions 1 year ago
  myhloli 011a1b973b refactor(ocr): increase the dilation factor in OCR to address the issue of word concatenation 1 year ago
  myhloli 1f1dd3538d feat(list&index block): detect and merge list and index blocks 1 year ago
  Xiaomeng Zhao 98313d4a25 Merge branch 'dev' into content-list-not-drop 1 year ago
  myhloli 16699a9a70 fix(ocr_mkcontent): streamline drop reason handling 1 year ago
  myhloli 196de029a3 fix(ocr_mkcontent): correct drop mode handling for pages with drop reasons 1 year ago
  myhloli 37fbe998ac feat(ocr_mkcontent): support drop reason in none_with_reason mode. Enable the `NONE_WITH_REASON` drop mode in `para_to_standard_format_v2` by updating the 1 year ago
  myhloli 6062862c96 feat(pipeline): pass language parameter for parsing and markdown conversion 1 year ago
  icecraft 03469909bb Feat/support footnote in figure (#532) 1 year ago
  yyy d714ac8b76 Release: Release 0.7.1 version, update dev (#527) 1 year ago
  drunkpig 18e65be489 fix: delete hyphen at end of line 1 year ago
  drunkpig 83e0d55a34 fix: replace \u0002, \u0003 in common text (#521) 1 year ago
  Xiaomeng Zhao dd19f59eb6 fix(ocr_mkcontent): revise table caption output (#397) 1 year ago
  Xiaomeng Zhao 66e3ce9c4a fix(ocr_mkcontent): improve language detection and content formatting (#458) 1 year ago
  liukaiwen ec7271faee fix table recognition bug #321 1 year ago
  myhloli 0998d22a32 fix(ocr_mkcontent): add spaces around inline equation in content 1 year ago
  Kaiwen Liu 37925f36d9 feat(model inference): add table recognition and conversion to LaTeX (#284) 1 year ago
  myhloli a5c35165ee feat(dict2md): add page index to para content for standard format v2 1 year ago
  myhloli ff13c8e115 fix(mkmarkdown): add 2 spaces after image and table URLs 1 year ago
  赵小蒙 5de013e6d5 fix: use line_lang instead of content_lang to concatenate para 1 year ago
  赵小蒙 6199e608d4 add union_make logic 1 year ago
  liukaiwen 503b9fad3e Fix missing space after titles 1 year ago
  赵小蒙 f01cb89f01 fix lost image or table bug 1 year ago
  赵小蒙 e980d2efa0 fix UNIPipe and spans space with language 1 year ago