소스 검색

fix(pdf-extract): adjust box threshold for OCR detection (#447)

Tuned the detection box threshold parameter in the OCR model initialization to improve the
accuracy of text extraction from images. The threshold was modified from 0.6 to
0.3 to filter out smaller detection boxes, which is expected to enhance the quality of the extracted
text by reducing noise and false positives in the OCR process.
Xiaomeng Zhao 1 년 전
부모
커밋
041b9465b9
1개의 변경된 파일1개의 추가작업 그리고 1개의 파일을 삭제
  1. 1 1
      magic_pdf/model/pdf_extract_kit.py

+ 1 - 1
magic_pdf/model/pdf_extract_kit.py

@@ -139,7 +139,7 @@ class CustomPEKModel:
         )
         # 初始化ocr
         if self.apply_ocr:
-            self.ocr_model = ModifiedPaddleOCR(show_log=show_log)
+            self.ocr_model = ModifiedPaddleOCR(show_log=show_log, det_db_box_thresh=0.3)
 
         # init structeqtable
         if self.apply_table: