Merge pull request #3186 from myhloli/dev

fix: update past_key_values handling to support custom sequence lengths
Xiaomeng Zhao, 3 months ago
Commit b231f4493c

+ 2 - 0
README.md

@@ -43,6 +43,8 @@
 </div>
 
 # Changelog
+- 2025/07/27 version 2.1.7 Released
+  - `transformers` 4.54.0 version adaptation
 - 2025/07/26 2.1.6 Released
   - Fixed table parsing issues in handwritten documents when using `vlm` backend
   - Fixed visualization box position drift issue when document is rotated #3175

+ 2 - 0
README_zh-CN.md

@@ -43,6 +43,8 @@
 </div>
 
 # 更新记录
+- 2025/07/27 2.1.7发布
+  - `transformers` 4.54.0 版本适配
 - 2025/07/26 2.1.6发布
   - 修复`vlm`后端解析部分手写文档时的表格异常问题
   - 修复文档旋转时可视化框位置漂移问题 #3175

+ 2 - 2
docker/china/Dockerfile

@@ -1,7 +1,7 @@
 # Use the official sglang image
-FROM lmsysorg/sglang:v0.4.9.post3-cu126
+FROM lmsysorg/sglang:v0.4.9.post4-cu126
 # For blackwell GPU, use the following line instead:
-# FROM lmsysorg/sglang:v0.4.9.post3-cu128-b200
+# FROM lmsysorg/sglang:v0.4.9.post4-cu128-b200
 
 # Install libgl for opencv support & Noto fonts for Chinese characters
 RUN apt-get update && \

+ 2 - 2
docker/global/Dockerfile

@@ -1,7 +1,7 @@
 # Use the official sglang image
-FROM lmsysorg/sglang:v0.4.9.post3-cu126
+FROM lmsysorg/sglang:v0.4.9.post4-cu126
 # For blackwell GPU, use the following line instead:
-# FROM lmsysorg/sglang:v0.4.9.post3-cu128-b200
+# FROM lmsysorg/sglang:v0.4.9.post4-cu128-b200
 
 # Install libgl for opencv support & Noto fonts for Chinese characters
 RUN apt-get update && \

+ 2 - 2
docs/en/quick_start/docker_deployment.md

@@ -10,8 +10,8 @@ docker build -t mineru-sglang:latest -f Dockerfile .
 ```
 
 > [!TIP]
-> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `lmsysorg/sglang:v0.4.9.post3-cu126` as the base image by default, supporting Turing/Ampere/Ada Lovelace/Hopper platforms.
-> If you are using the newer `Blackwell` platform, please modify the base image to `lmsysorg/sglang:v0.4.9.post3-cu128-b200` before executing the build operation.
+> The [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/global/Dockerfile) uses `lmsysorg/sglang:v0.4.9.post4-cu126` as the base image by default, supporting Turing/Ampere/Ada Lovelace/Hopper platforms.
+> If you are using the newer `Blackwell` platform, please modify the base image to `lmsysorg/sglang:v0.4.9.post4-cu128-b200` before executing the build operation.
 
 ## Docker Description
 

+ 2 - 2
docs/zh/quick_start/docker_deployment.md

@@ -10,8 +10,8 @@ docker build -t mineru-sglang:latest -f Dockerfile .
 ```
 
 > [!TIP]
-> [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/china/Dockerfile)默认使用`lmsysorg/sglang:v0.4.9.post3-cu126`作为基础镜像,支持Turing/Ampere/Ada Lovelace/Hopper平台,
-> 如您使用较新的`Blackwell`平台,请将基础镜像修改为`lmsysorg/sglang:v0.4.9.post3-cu128-b200` 再执行build操作。
+> [Dockerfile](https://github.com/opendatalab/MinerU/blob/master/docker/china/Dockerfile)默认使用`lmsysorg/sglang:v0.4.9.post4-cu126`作为基础镜像,支持Turing/Ampere/Ada Lovelace/Hopper平台,
+> 如您使用较新的`Blackwell`平台,请将基础镜像修改为`lmsysorg/sglang:v0.4.9.post4-cu128-b200` 再执行build操作。
 
 ## Docker说明
 

+ 11 - 2
mineru/model/mfr/unimernet/unimernet_hf/unimer_mbart/modeling_unimer_mbart.py

@@ -1416,7 +1416,11 @@ class UnimerMBartDecoder(UnimerMBartPreTrainedModel):
             raise ValueError("You have to specify either decoder_input_ids or decoder_inputs_embeds")
 
         # past_key_values_length
-        past_key_values_length = past_key_values[0][0].shape[2] if past_key_values is not None else 0
+        # past_key_values_length = past_key_values[0][0].shape[2] if past_key_values is not None else 0
+        past_key_values_length = 0
+        if past_key_values is not None:
+            if isinstance(past_key_values, (list, tuple)) and past_key_values:
+                past_key_values_length = past_key_values[0][0].shape[2]
 
         if inputs_embeds is None:
             inputs_embeds = self.embed_tokens(input_ids)
@@ -1501,7 +1505,12 @@ class UnimerMBartDecoder(UnimerMBartPreTrainedModel):
                 if dropout_probability < self.layerdrop:
                     continue
 
-            past_key_value = past_key_values[idx] if past_key_values is not None else None
+            # past_key_value = past_key_values[idx] if past_key_values is not None else None
+            past_key_value = past_key_values[idx] if (
+                    past_key_values is not None and
+                    isinstance(past_key_values, (list, tuple)) and
+                    idx < len(past_key_values)
+            ) else None
 
             if self.gradient_checkpointing and self.training:
                 layer_outputs = self._gradient_checkpointing_func(
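
The two guards added to `UnimerMBartDecoder` above can be read in isolation as a pair of small helpers. The following is a minimal sketch, not the project's code: the helper names `legacy_cache_length` and `layer_cache` and the `FakeTensor` stand-in are hypothetical, and the `(batch, num_heads, seq_len, head_dim)` layout is an assumption inferred from the diff's use of `shape[2]`. It shows that only a non-empty list or tuple is indexed, and that any other value falls back to a length of 0 or a per-layer cache of `None`.

```python
from typing import Any, Optional


class FakeTensor:
    """Hypothetical stand-in for a tensor; only the .shape attribute is used."""

    def __init__(self, *shape: int) -> None:
        self.shape = tuple(shape)


def legacy_cache_length(past_key_values: Any) -> int:
    """Return the cached sequence length, or 0 when the cache is not a non-empty list/tuple."""
    if isinstance(past_key_values, (list, tuple)) and past_key_values:
        # Assumed layout: past_key_values[layer][0] is the key tensor shaped
        # (batch, num_heads, seq_len, head_dim), so dim 2 is the cached length.
        return past_key_values[0][0].shape[2]
    return 0


def layer_cache(past_key_values: Any, idx: int) -> Optional[Any]:
    """Return the cache entry for layer idx, or None when it cannot be indexed safely."""
    if isinstance(past_key_values, (list, tuple)) and idx < len(past_key_values):
        return past_key_values[idx]
    return None


# Two decoder layers, each holding a (key, value) pair with 5 cached positions.
cache = tuple((FakeTensor(1, 16, 5, 64), FakeTensor(1, 16, 5, 64)) for _ in range(2))

length_with_cache = legacy_cache_length(cache)
length_without_cache = legacy_cache_length(None)
length_other_object = legacy_cache_length(object())
missing_layer = layer_cache(cache, 2)
```

With these inputs, `None`, an empty tuple, and an arbitrary non-sequence object are all treated as "no cached tokens", and a layer index beyond the end of the tuple yields `None` rather than raising `IndexError`. This mirrors the behavior of the added lines, which no longer assume `past_key_values` is always subscriptable.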