Commit History

Author SHA1 Message Date
  myhloli c660fdc8f0 feat(llm): add LLM-aided formula and text correction 11 months ago
  myhloli c638fc5d1f fix(pdf): improve ligature handling and text extraction 11 months ago
  myhloli 74ee428bbb fix(dict2md): add space for inline equations in CJK contexts 11 months ago
  myhloli b80befe9cf refactor(mkcontent): optimize paragraph text merging and language detection 1 year ago
  myhloli c8cabb3cf6 feat(ocr_mkcontent): add language detection for line spacing 1 year ago
  myhloli 782e6571bc fix(ocr_mkcontent): handle empty paragraphs on pages 1 year ago
  myhloli 88c0854a65 refactor(ocr): improve text processing and span handling 1 year ago
  Xiaomeng Zhao 23c8436ef9 Merge pull request #1047 from myhloli/dev 1 year ago
  myhloli a07007e5e1 fix(ocr_mkcontent): improve hyphen handling at line ends 1 year ago
  icecraft b492c19c4c refactor: move some constants or enums defs to config folder 1 year ago
  myhloli 2de1d0ef05 fix(ocr_mkcontent): improve handling of single-character content 1 year ago
  myhloli bd75596219 fix(merge_text): add ligature replacement functionality 1 year ago
  myhloli 99cf160d1c fix(dict2md): improve text concatenation logic 1 year ago
  myhloli 87b9eeee59 fix(ocr): handle inline equations consistently with text content 1 year ago
  myhloli 7c03014c2a fix(ocr_mkcontent): improve content handling for different languages and equation types - Adjust content formatting for Chinese, Japanese, Korean, and Western languages 1 year ago
  myhloli faf8c286fb fix(magic_pdf): handle missing image_path in spans 1 year ago
  myhloli 0e8d5893eb feat(draw_bbox): update bounding box drawing for tables and images 1 year ago
  myhloli c34c9d21ef refactor(ocr): improve image and table block handling 1 year ago
  myhloli 644085760b fix(ocr_mkcontent): expand para_to_standard_format_v2 to handle list and index blocks 1 year ago
  myhloli fc49f5c446 refactor(magic_pdf): remove unused parameters and simplify functions 1 year ago
  myhloli 011a1b973b refactor(ocr):Increase the dilation factor in OCR to address the issue of word concatenation. 1 year ago
  myhloli 1f1dd3538d feat(list&index block): detect and merge list and index blocks 1 year ago
  Xiaomeng Zhao 98313d4a25 Merge branch 'dev' into content-list-not-drop 1 year ago
  myhloli 16699a9a70 fix(ocr_mkcontent): streamline drop reason handling 1 year ago
  myhloli 196de029a3 fix(ocr_mkcontent): correct drop mode handling for pages with drop reasons 1 year ago
  myhloli 37fbe998ac feat(ocr_mkcontent): support drop reason in none_with_reason mode. Enable the `NONE_WITH_REASON` drop mode in `para_to_standard_format_v2` by updating the 1 year ago
  myhloli 6062862c96 feat(pipeline): pass language parameter for parsing and markdown conversion 1 year ago
  icecraft 03469909bb Feat/support footnote in figure (#532) 1 year ago
  yyy d714ac8b76 Release: Release 0.7.1 verison, update dev (#527) 1 year ago
  drunkpig 18e65be489 fix: delete hyphen at end of line 1 year ago