fix(pdf-extract): adjust box threshold for OCR detection (#447)

Set the detection box threshold (det_db_box_thresh) explicitly in the OCR model initialization to
improve text extraction from images. The threshold is lowered from the previously implicit 0.6 to
0.3, so detection boxes with lower confidence scores are kept instead of being discarded. This is
expected to recover text regions that were previously missed, at the risk of admitting more noisy
or false-positive boxes.
Xiaomeng Zhao, 1 year ago
Parent
Current commit
041b9465b9
1 file changed, 1 insertion and 1 deletion
+ 1 - 1
magic_pdf/model/pdf_extract_kit.py

@@ -139,7 +139,7 @@ class CustomPEKModel:
         )
         # Initialize OCR
         if self.apply_ocr:
-            self.ocr_model = ModifiedPaddleOCR(show_log=show_log)
+            self.ocr_model = ModifiedPaddleOCR(show_log=show_log, det_db_box_thresh=0.3)
 
         # init structeqtable
         if self.apply_table:
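
As a rough illustration of what `det_db_box_thresh` controls: in DB-style text detection, each candidate box receives a confidence score, and boxes scoring below the threshold are dropped. The sketch below is a simplified stand-in, not PaddleOCR's actual post-processing; `filter_boxes`, the candidate tuples, and the scores are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical candidate: (label, confidence score). Real detectors derive the
# score from the predicted probability map; these values are made up.
Candidate = Tuple[str, float]


def filter_boxes(candidates: List[Candidate], box_thresh: float) -> List[Candidate]:
    """Keep only candidates whose score is at least box_thresh."""
    return [c for c in candidates if c[1] >= box_thresh]


candidates = [("title", 0.92), ("faint caption", 0.45), ("smudge", 0.20)]

# Stricter threshold (0.6): only high-confidence boxes survive.
kept_strict = filter_boxes(candidates, box_thresh=0.6)
# Looser threshold (0.3): lower-confidence boxes are kept as well.
kept_loose = filter_boxes(candidates, box_thresh=0.3)

print([name for name, _ in kept_strict])
print([name for name, _ in kept_loose])
```

With these made-up scores, lowering the threshold from 0.6 to 0.3 retains the faint caption while still rejecting the lowest-scoring candidate, which is the recall versus noise trade-off this change makes.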