fix(pdf-extract): adjust box threshold for OCR detection (#447)

Lowered the detection box threshold (`det_db_box_thresh`) passed to the OCR model at
initialization from 0.6 to 0.3 to improve text extraction from images. Detection boxes
scoring below the threshold are discarded, so a lower value keeps more low-confidence boxes.
This is expected to recover text regions that were previously dropped, at the cost of
possibly admitting more false positives.
Xiaomeng Zhao, 1 year ago
parent
commit
041b9465b9
1 changed file with 1 addition and 1 deletion

+ 1 - 1
magic_pdf/model/pdf_extract_kit.py

@@ -139,7 +139,7 @@ class CustomPEKModel:
         )
         # 初始化ocr
         if self.apply_ocr:
-            self.ocr_model = ModifiedPaddleOCR(show_log=show_log)
+            self.ocr_model = ModifiedPaddleOCR(show_log=show_log, det_db_box_thresh=0.3)
 
         # init structeqtable
         if self.apply_table:
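
To illustrate what changing `det_db_box_thresh` does, here is a minimal sketch. It assumes the threshold acts as a minimum-score cutoff on candidate detection boxes. The `filter_boxes` helper, the labels, and the scores are hypothetical and are not taken from `ModifiedPaddleOCR` or PaddleOCR itself.

```python
from typing import List, Tuple

# (hypothetical region label, detection confidence score)
Box = Tuple[str, float]


def filter_boxes(boxes: List[Box], box_thresh: float) -> List[Box]:
    """Keep only boxes whose score is at least box_thresh (assumed cutoff rule)."""
    return [box for box in boxes if box[1] >= box_thresh]


# Made-up candidate boxes, for illustration only.
candidates: List[Box] = [
    ("title", 0.92),
    ("caption", 0.45),
    ("footnote", 0.35),
    ("speckle", 0.12),
]

kept_old = filter_boxes(candidates, 0.6)  # previous value
kept_new = filter_boxes(candidates, 0.3)  # value set in this commit

print([label for label, _ in kept_old])
print([label for label, _ in kept_new])
```

Under this assumption, the lower threshold passes more candidate regions on to recognition, so the change trades some precision for recall.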