
refactor(pdf_extract_kit): update model config and weight paths for UniMERNet-0.2.0

Update the paths to model weights and configuration files for the UniMERNet architecture
in both the demo.yaml and model_configs.yaml files. Adjust the mfr_model_init function to
reflect the new weight and configuration paths. The changes include specifying more detailed
paths to the unimernet_base directory and changing the weight file extension to .pth.
myhloli 1 year ago
parent
commit
4f340c4429

+ 1 - 1
magic_pdf/model/pdf_extract_kit.py

@@ -58,7 +58,7 @@ def mfd_model_init(weight):
 def mfr_model_init(weight_dir, cfg_path, _device_='cpu'):
     args = argparse.Namespace(cfg_path=cfg_path, options=None)
     cfg = Config(args)
-    cfg.config.model.pretrained = os.path.join(weight_dir, "pytorch_model.bin")
+    cfg.config.model.pretrained = os.path.join(weight_dir, "pytorch_model.pth")
     cfg.config.model.model_config.model_name = weight_dir
     cfg.config.model.tokenizer_config.path = weight_dir
     task = tasks.setup_task(cfg)

+ 7 - 7
magic_pdf/resources/model_config/UniMERNet/demo.yaml

@@ -2,13 +2,13 @@ model:
   arch: unimernet
   model_type: unimernet
   model_config:
-    model_name: ./models
-    max_seq_len: 1024
-    length_aware: False
+    model_name: ./models/unimernet_base
+    max_seq_len: 1536
+
   load_pretrained: True
-  pretrained: ./models/pytorch_model.bin
+  pretrained: './models/unimernet_base/pytorch_model.pth'
   tokenizer_config:
-    path: ./models
+    path: ./models/unimernet_base
 
 datasets:
   formula_rec_eval:
@@ -18,7 +18,7 @@ datasets:
         image_size:
           - 192
           - 672
-   
+
 run:
   runner: runner_iter
   task: unimernet_train
@@ -43,4 +43,4 @@ run:
   distributed_type: ddp  # or fsdp when train llm
 
   generate_cfg:
-    temperature: 0.0
+    temperature: 0.0

+ 1 - 1
magic_pdf/resources/model_config/model_configs.yaml

@@ -10,6 +10,6 @@ config:
 weights:
   layout: Layout/model_final.pth
   mfd: MFD/weights.pt
-  mfr: MFR/UniMERNet
+  mfr: MFR/unimernet_base
   struct_eqtable: TabRec/StructEqTable
   TableMaster: TabRec/TableMaster
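
Because this commit changes both the MFR directory name and the weight file extension, an existing local model folder may no longer match what mfr_model_init expects. The sketch below is a hypothetical pre-flight check and is not part of magic_pdf. It assumes that the caller builds weight_dir by joining a local models directory with the `weights.mfr` value from model_configs.yaml; the helper names `resolve_mfr_paths` and `missing_mfr_files` are invented for illustration.

```python
import os


def resolve_mfr_paths(models_dir, mfr_subdir="MFR/unimernet_base"):
    """Build the paths mfr_model_init derives from weight_dir after this commit.

    Assumption: weight_dir is models_dir joined with the `weights.mfr` entry
    of model_configs.yaml. This helper is hypothetical, not magic_pdf API.
    """
    weight_dir = os.path.join(models_dir, mfr_subdir)
    return {
        # Mirrors cfg.config.model.pretrained in the diff above.
        "pretrained": os.path.join(weight_dir, "pytorch_model.pth"),
        # Mirrors cfg.config.model.model_config.model_name.
        "model_name": weight_dir,
        # Mirrors cfg.config.model.tokenizer_config.path.
        "tokenizer_path": weight_dir,
    }


def missing_mfr_files(models_dir):
    """Return the expected paths that do not exist on disk, in a fixed order."""
    paths = resolve_mfr_paths(models_dir)
    missing = []
    if not os.path.isdir(paths["model_name"]):
        missing.append(paths["model_name"])
    if not os.path.isfile(paths["pretrained"]):
        missing.append(paths["pretrained"])
    return missing


if __name__ == "__main__":
    # "./models" is a placeholder; point it at the local models directory.
    for path in missing_mfr_files("./models"):
        print("missing:", path)
```

A leftover `MFR/UniMERNet` folder or a `pytorch_model.bin` file from the previous layout would be reported as missing by this check, which is the situation the path changes above are likely to produce on an existing installation.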