瀏覽代碼

perf(pdf_extract_kit): conditional memory cleanup based on GPU capacity

- Introduce a conditional memory cleanup step in the PDF extraction process
- Assess available GPU memory before deciding to perform memory cleanup- Log the time taken for garbage collection when it occurs
- This optimization helps to balance performance and resource utilization
myhloli 1 年之前
父節點
當前提交
fb9949c44f
共有 1 個文件被更改,包括 10 次插入2 次删除
  1. 10 2
      magic_pdf/model/pdf_extract_kit.py

+ 10 - 2
magic_pdf/model/pdf_extract_kit.py

@@ -314,7 +314,8 @@ class CustomPEKModel:
             mfr_res = []
             for mf_img in dataloader:
                 mf_img = mf_img.to(self.device)
-                output = self.mfr_model.generate({'image': mf_img})
+                with torch.no_grad():
+                    output = self.mfr_model.generate({'image': mf_img})
                 mfr_res.extend(output['pred_str'])
             for res, latex in zip(latex_filling_list, mfr_res):
                 res['latex'] = latex_rm_whitespace(latex)
@@ -336,7 +337,14 @@ class CustomPEKModel:
             elif int(res['category_id']) in [5]:
                 table_res_list.append(res)
 
-        clean_memory()
+        if torch.cuda.is_available():
+            properties = torch.cuda.get_device_properties(self.device)
+            total_memory = properties.total_memory / (1024 ** 3)  # 将字节转换为 GB
+            if total_memory <= 8:
+                gc_start = time.time()
+                clean_memory()
+                gc_time = round(time.time() - gc_start, 2)
+                logger.info(f"gc time: {gc_time}")
 
         # ocr识别
         if self.apply_ocr: