Commit History

Author SHA1 Message Date
  myhloli c46d3373de refactor(ocr_mkcontent): improve title level handling and formatting 8 months ago
  myhloli df1b8f598f refactor(ocr_mkcontent): optimize full-width character handling 9 months ago
  myhloli 315adbce38 feat(ocr_mkcontent): add full-width to half-width character conversion 9 months ago
  myhloli 0a468eca6e feat(llm_aided): add title optimization feature 11 months ago
  myhloli c660fdc8f0 feat(llm): add LLM-aided formula and text correction 11 months ago
  myhloli c638fc5d1f fix(pdf): improve ligature handling and text extraction 11 months ago
  myhloli 74ee428bbb fix(dict2md): add space for inline equations in CJK contexts 11 months ago
  myhloli b80befe9cf refactor(mkcontent): optimize paragraph text merging and language detection 1 year ago
  myhloli c8cabb3cf6 feat(ocr_mkcontent): add language detection for line spacing 1 year ago
  myhloli 782e6571bc fix(ocr_mkcontent): handle empty paragraphs on pages 1 year ago
  myhloli 88c0854a65 refactor(ocr): improve text processing and span handling 1 year ago
  Xiaomeng Zhao 23c8436ef9 Merge pull request #1047 from myhloli/dev 1 year ago
  myhloli a07007e5e1 fix(ocr_mkcontent): improve hyphen handling at line ends 1 year ago
  icecraft b492c19c4c refactor: move some constants or enums defs to config folder 1 year ago
  myhloli 2de1d0ef05 fix(ocr_mkcontent): improve handling of single-character content 1 year ago
  myhloli bd75596219 fix(merge_text): add ligature replacement functionality 1 year ago
  myhloli 99cf160d1c fix(dict2md): improve text concatenation logic 1 year ago
  myhloli 87b9eeee59 fix(ocr): handle inline equations consistently with text content 1 year ago
  myhloli 7c03014c2a fix(ocr_mkcontent): improve content handling for different languages and equation types - Adjust content formatting for Chinese, Japanese, Korean, and Western languages 1 year ago
  myhloli faf8c286fb fix(magic_pdf): handle missing image_path in spans 1 year ago
  myhloli 0e8d5893eb feat(draw_bbox): update bounding box drawing for tables and images 1 year ago
  myhloli c34c9d21ef refactor(ocr): improve image and table block handling 1 year ago
  myhloli 644085760b fix(ocr_mkcontent): expand para_to_standard_format_v2 to handle list and index blocks 1 year ago
  myhloli fc49f5c446 refactor(magic_pdf): remove unused parameters and simplify functions 1 year ago
  myhloli 011a1b973b refactor(ocr): Increase the dilation factor in OCR to address the issue of word concatenation. 1 year ago
  myhloli 1f1dd3538d feat(list&index block): detect and merge list and index blocks 1 year ago
  Xiaomeng Zhao 98313d4a25 Merge branch 'dev' into content-list-not-drop 1 year ago
  myhloli 16699a9a70 fix(ocr_mkcontent): streamline drop reason handling 1 year ago
  myhloli 196de029a3 fix(ocr_mkcontent): correct drop mode handling for pages with drop reasons 1 year ago
  myhloli 37fbe998ac feat(ocr_mkcontent): support drop reason in none_with_reason mode - Enable the `NONE_WITH_REASON` drop mode in `para_to_standard_format_v2` by updating the ... 1 year ago