fix: update FAQ entries for installation issues and error resolutions in FAQ_en_us.md and FAQ_zh_cn.md

myhloli 4 months ago
parent
commit
1b6e5e89f1
2 files changed, with 10 additions and 156 deletions
  1. docs/FAQ_en_us.md (+5 -77)
  2. docs/FAQ_zh_cn.md (+5 -79)

+ 5 - 77
docs/FAQ_en_us.md

@@ -1,33 +1,6 @@
 # Frequently Asked Questions
 
-### 1. When using the command `pip install magic-pdf[full]` on newer versions of macOS, the error `zsh: no matches found: magic-pdf[full]` occurs.
-
-On macOS, the default shell has switched from Bash to Z shell, which has special handling logic for certain types of string matching. This can lead to the "no matches found" error. You can try disabling the globbing feature in the command line and then run the installation command again.
-
-```bash
-setopt no_nomatch
-pip install magic-pdf[full]
-```
-
-### 2. Encountering the error `pickle.UnpicklingError: invalid load key, 'v'.` during use
-
-This might be due to an incomplete download of the model file. You can try re-downloading the model file and then try again.
-Reference: https://github.com/opendatalab/MinerU/issues/143
-
-### 3. Where should the model files be downloaded and how should the `/models-dir` configuration be set?
-
-The path for the model files is configured in "magic-pdf.json". just like:
-
-```json
-{
-  "models-dir": "/tmp/models"
-}
-```
-
-This path is an absolute path, not a relative path. You can obtain the absolute path in the models directory using the "pwd" command.
-Reference: https://github.com/opendatalab/MinerU/issues/155#issuecomment-2230216874
-
-### 4. Encountered the error `ImportError: libGL.so.1: cannot open shared object file: No such file or directory` in Ubuntu 22.04 on WSL2
+### 1. Encountered the error `ImportError: libGL.so.1: cannot open shared object file: No such file or directory` in Ubuntu 22.04 on WSL2
 
 The `libgl` library is missing in Ubuntu 22.04 on WSL2. You can install the `libgl` library with the following command to resolve the issue:
 
@@ -37,59 +10,14 @@ sudo apt-get install libgl1-mesa-glx
 
 Reference: https://github.com/opendatalab/MinerU/issues/388
 
-### 5. Encountered error `ModuleNotFoundError: No module named 'fairscale'`
-
-You need to uninstall the module and reinstall it:
-
-```bash
-pip uninstall fairscale
-pip install fairscale
-```
-
-Reference: https://github.com/opendatalab/MinerU/issues/411
-
-### 6. On some newer devices like the H100, the text parsed during OCR using CUDA acceleration is garbled.
 
-The compatibility of cuda11 with new graphics cards is poor, and the CUDA version used by Paddle needs to be upgraded.
-
-```bash
-pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu123/
-```
-
-Reference: https://github.com/opendatalab/MinerU/issues/558
-
-### 7. On some Linux servers, the program immediately reports an error `Illegal instruction (core dumped)`
-
-This might be because the server's CPU does not support the AVX/AVX2 instruction set, or the CPU itself supports it but has been disabled by the system administrator. You can try contacting the system administrator to remove the restriction or change to a different server.
-
-References: https://github.com/opendatalab/MinerU/issues/591 , https://github.com/opendatalab/MinerU/issues/736
-
-
-### 8. Error when installing MinerU on CentOS 7 or Ubuntu 18: `ERROR: Failed building wheel for simsimd`
+### 2. Error when installing MinerU on CentOS 7 or Ubuntu 18: `ERROR: Failed building wheel for simsimd`
 
 The new version of albumentations (1.4.21) introduces a dependency on simsimd. Since the pre-built package of simsimd for Linux requires a glibc version greater than or equal to 2.28, this causes installation issues on some Linux distributions released before 2019. You can resolve this issue by using the following command:
 ```
-pip install -U magic-pdf[full,old_linux] --extra-index-url https://wheels.myhloli.com
+conda create -n mineru python=3.11 -y
+conda activate mineru
+pip install -U "mineru[pipeline_old_linux]"
 ```
 
 Reference: https://github.com/opendatalab/MinerU/issues/1004
-
-### 9. Old Graphics Cards Such as M40 Encounter "RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED"
-
-An error occurs during operation (cuda):
-```
-RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasGemmStridedBatchedEx(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
-```
-Because BF16 precision is not supported on graphics cards before the Turing architecture and some graphics cards are not recognized by torch, it is necessary to manually disable BF16 precision.
-Modify the code in lines 287-290 of the "pdf_parse_union_core_v2.py" file (note that the location may vary in different versions):
-```
-if torch.cuda.is_bf16_supported():
-    supports_bfloat16 = True
-else:
-    supports_bfloat16 = False
-```
-Change it to:
-```
-supports_bfloat16 = False
-```
-Reference: https://github.com/opendatalab/MinerU/issues/1508

+ 5 - 79
docs/FAQ_zh_cn.md

@@ -1,35 +1,6 @@
 # 常见问题解答
 
-### 1.在较新版本的mac上使用命令安装pip install magic-pdf\[full\] zsh: no matches found: magic-pdf\[full\]
-
-在 macOS 上,默认的 shell 从 Bash 切换到了 Z shell,而 Z shell 对于某些类型的字符串匹配有特殊的处理逻辑,这可能导致no matches found错误。
-可以通过在命令行禁用globbing特性,再尝试运行安装命令
-
-```bash
-setopt no_nomatch
-pip install magic-pdf[full]
-```
-
-### 2.使用过程中遇到_pickle.UnpicklingError: invalid load key, 'v'.错误
-
-可能是由于模型文件未下载完整导致,可尝试重新下载模型文件后再试
-参考:https://github.com/opendatalab/MinerU/issues/143
-
-### 3.模型文件应该下载到哪里/models-dir的配置应该怎么填
-
-模型文件的路径输入是在"magic-pdf.json"中通过
-
-```json
-{
-  "models-dir": "/tmp/models"
-}
-```
-
-进行配置的。
-这个路径是绝对路径而不是相对路径,绝对路径的获取可在models目录中通过命令 "pwd" 获取。
-参考:https://github.com/opendatalab/MinerU/issues/155#issuecomment-2230216874
-
-### 4.在WSL2的Ubuntu22.04中遇到报错`ImportError: libGL.so.1: cannot open shared object file: No such file or directory`
+### 1.在WSL2的Ubuntu22.04中遇到报错`ImportError: libGL.so.1: cannot open shared object file: No such file or directory`
 
 WSL2的Ubuntu22.04中缺少`libgl`库,可通过以下命令安装`libgl`库解决:
 
@@ -39,59 +10,14 @@ sudo apt-get install libgl1-mesa-glx
 
 参考:https://github.com/opendatalab/MinerU/issues/388
 
-### 5.遇到报错 `ModuleNotFoundError : Nomodulenamed 'fairscale'`
-
-需要卸载该模块并重新安装
-
-```bash
-pip uninstall fairscale
-pip install fairscale
-```
-
-参考:https://github.com/opendatalab/MinerU/issues/411
 
-### 6.在部分较新的设备如H100上,使用CUDA加速OCR时解析出的文字乱码。
-
-cuda11对新显卡的兼容性不好,需要升级paddle使用的cuda版本
-
-```bash
-pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu123/
-```
-参考:https://github.com/opendatalab/MinerU/issues/558
-
-### 7.在部分Linux服务器上,程序一运行就报错 `非法指令 (核心已转储)` 或 `Illegal instruction (core dumped)`
-
-可能是因为服务器CPU不支持AVX/AVX2指令集,或cpu本身支持但被运维禁用了,可以尝试联系运维解除限制或更换服务器。
-
-参考:https://github.com/opendatalab/MinerU/issues/591 , https://github.com/opendatalab/MinerU/issues/736
-
-### 8.在 CentOS 7 或 Ubuntu 18 系统安装MinerU时报错`ERROR: Failed building wheel for simsimd`
+### 2.在 CentOS 7 或 Ubuntu 18 系统安装MinerU时报错`ERROR: Failed building wheel for simsimd`
 
 新版本albumentations(1.4.21)引入了依赖simsimd,由于simsimd在linux的预编译包要求glibc的版本大于等于2.28,导致部分2019年之前发布的Linux发行版无法正常安装,可通过如下命令安装:
 ```
-pip install -U magic-pdf[full,old_linux] --extra-index-url https://wheels.myhloli.com
+conda create -n mineru python=3.11 -y
+conda activate mineru
+pip install -U "mineru[pipeline_old_linux]"
 ```
 
 参考:https://github.com/opendatalab/MinerU/issues/1004
-
-### 9. 旧显卡如M40出现 "RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED"
-
-在运行过程中(使用CUDA)出现以下错误:
-```
-RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasGemmStridedBatchedEx(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
-```
-由于Turing架构之前的显卡不支持BF16精度,并且部分显卡未能被PyTorch正确识别,因此需要手动关闭BF16精度。
-
-请找到并修改`pdf_parse_union_core_v2.py`文件中的第287至290行代码(注意:不同版本中位置可能有所不同),原代码如下:
-```python
-if torch.cuda.is_bf16_supported():
-    supports_bfloat16 = True
-else:
-    supports_bfloat16 = False
-```
-将其修改为:
-```python
-supports_bfloat16 = False
-```
-
-参考:https://github.com/opendatalab/MinerU/issues/1508
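
Note on the retained `simsimd` entry: the FAQ text says the pre-built `simsimd` wheel needs glibc 2.28 or newer, which is why older distributions are pointed at `mineru[pipeline_old_linux]`. A minimal sketch of how a user might check this before installing is shown below. It assumes only Python's standard `platform.libc_ver()`; the helper names `parse_version` and `needs_old_linux_extra` are hypothetical and are not part of MinerU.

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as '2.17' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def needs_old_linux_extra(libc_name, libc_version, minimum=(2, 28)):
    """Return True when glibc is older than `minimum`.

    The (2, 28) threshold comes from the FAQ text; this helper is a
    hypothetical illustration, not a MinerU API.
    """
    if libc_name != "glibc" or not libc_version:
        # Not glibc, or the version could not be detected: no decision.
        return False
    return parse_version(libc_version) < minimum


# platform.libc_ver() returns a (name, version) pair; both are empty
# strings when the C library cannot be identified (e.g. off Linux).
name, version = platform.libc_ver()
if needs_old_linux_extra(name, version):
    print(f"glibc {version} is older than 2.28: use mineru[pipeline_old_linux]")
else:
    print(f"detected libc: {name or 'unknown'} {version or ''}".strip())
```

The check only reports what the running interpreter was linked against, so it is best run inside the same environment (for example the `mineru` conda environment from the FAQ) where the package will be installed.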