View source

tip the change of model source (#4192)

Tingquan Gao 5 months ago
Parent
Commit
1d522169e3
38 changed files with 298 additions and 214 deletions
  1. +2 −0 docs/module_usage/tutorials/ocr_modules/doc_img_orientation_classification.en.md
  2. +2 −0 docs/module_usage/tutorials/ocr_modules/doc_img_orientation_classification.md
  3. +2 −0 docs/module_usage/tutorials/ocr_modules/formula_recognition.en.md
  4. +3 −0 docs/module_usage/tutorials/ocr_modules/formula_recognition.md
  5. +2 −0 docs/module_usage/tutorials/ocr_modules/layout_detection.en.md
  6. +2 −0 docs/module_usage/tutorials/ocr_modules/layout_detection.md
  7. +2 −0 docs/module_usage/tutorials/ocr_modules/seal_text_detection.en.md
  8. +2 −0 docs/module_usage/tutorials/ocr_modules/seal_text_detection.md
  9. +2 −0 docs/module_usage/tutorials/ocr_modules/table_cells_detection.en.md
  10. +2 −0 docs/module_usage/tutorials/ocr_modules/table_cells_detection.md
  11. +2 −0 docs/module_usage/tutorials/ocr_modules/table_classification.en.md
  12. +2 −0 docs/module_usage/tutorials/ocr_modules/table_classification.md
  13. +2 −0 docs/module_usage/tutorials/ocr_modules/table_structure_recognition.en.md
  14. +2 −0 docs/module_usage/tutorials/ocr_modules/table_structure_recognition.md
  15. +2 −0 docs/module_usage/tutorials/ocr_modules/text_detection.en.md
  16. +2 −0 docs/module_usage/tutorials/ocr_modules/text_detection.md
  17. +2 −0 docs/module_usage/tutorials/ocr_modules/text_image_unwarping.en.md
  18. +2 −0 docs/module_usage/tutorials/ocr_modules/text_image_unwarping.md
  19. +3 −0 docs/module_usage/tutorials/ocr_modules/text_recognition.en.md
  20. +2 −0 docs/module_usage/tutorials/ocr_modules/text_recognition.md
  21. +2 −0 docs/module_usage/tutorials/ocr_modules/textline_orientation_classification.en.md
  22. +2 −0 docs/module_usage/tutorials/ocr_modules/textline_orientation_classification.md
  23. +20 −18 docs/pipeline_usage/tutorials/ocr_pipelines/OCR.en.md
  24. +21 −18 docs/pipeline_usage/tutorials/ocr_pipelines/OCR.md
  25. +2 −0 docs/pipeline_usage/tutorials/ocr_pipelines/PP-StructureV3.en.md
  26. +3 −0 docs/pipeline_usage/tutorials/ocr_pipelines/PP-StructureV3.md
  27. +3 −0 docs/pipeline_usage/tutorials/ocr_pipelines/doc_preprocessor.en.md
  28. +2 −0 docs/pipeline_usage/tutorials/ocr_pipelines/doc_preprocessor.md
  29. +2 −0 docs/pipeline_usage/tutorials/ocr_pipelines/formula_recognition.en.md
  30. +3 −0 docs/pipeline_usage/tutorials/ocr_pipelines/formula_recognition.md
  31. +3 −0 docs/pipeline_usage/tutorials/ocr_pipelines/layout_parsing.en.md
  32. +3 −0 docs/pipeline_usage/tutorials/ocr_pipelines/layout_parsing.md
  33. +91 −89 docs/pipeline_usage/tutorials/ocr_pipelines/seal_recognition.en.md
  34. +91 −89 docs/pipeline_usage/tutorials/ocr_pipelines/seal_recognition.md
  35. +2 −0 docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition.en.md
  36. +2 −0 docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition.md
  37. +2 −0 docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.en.md
  38. +2 −0 docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.md

+ 2 - 0
docs/module_usage/tutorials/ocr_modules/doc_img_orientation_classification.en.md

@@ -93,6 +93,8 @@ for res in output:
     res.save_to_json("./output/res.json")
 ```
 
+<b>Note: </b>The official models are downloaded from HuggingFace by default. If you cannot access HuggingFace, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to change the model source to BOS. In the future, more model sources will be supported.
+
 After running, the result obtained is:
 
 ```

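The recurring note added across these files boils down to a single environment variable. As a minimal sketch (the commented `create_model` call is quoted from the tutorials and is not executed here), the switch can also be made from Python before any PaddleX model is created:

```python
import os

# Must be set before PaddleX resolves any model download;
# "BOS" is the documented alternative to the HuggingFace default.
os.environ["PADDLE_PDX_MODEL_SOURCE"] = "BOS"

# The tutorials' snippets then run unchanged, e.g.:
# from paddlex import create_model
# model = create_model("PP-LCNet_x1_0_doc_ori")  # weights fetched from BOS

print(os.environ["PADDLE_PDX_MODEL_SOURCE"])  # prints: BOS
```

Exporting the variable in the shell before the first model download has the same effect, since the value is read from the process environment.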
+ 2 - 0
docs/module_usage/tutorials/ocr_modules/doc_img_orientation_classification.md

@@ -93,6 +93,8 @@ for res in output:
     res.save_to_json("./output/res.json")
 ```
 
+<b>Note: </b>PaddleX official models are fetched from HuggingFace by default. If HuggingFace is hard to access from the runtime environment, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to switch the model source to BOS. More mainstream model sources will be supported in the future.
+
 After running, the result is:
 ```bash
 {'res': {'input_path': 'test_imgs/img_rot180_demo.jpg', 'page_index': None, 'class_ids': array([2], dtype=int32), 'scores': array([0.88164], dtype=float32), 'label_names': ['180']}}

+ 2 - 0
docs/module_usage/tutorials/ocr_modules/formula_recognition.en.md

@@ -140,6 +140,8 @@ for res in output:
     res.save_to_json(save_path="./output/res.json")
 ```
 
+<b>Note: </b>The official models are downloaded from HuggingFace by default. If you cannot access HuggingFace, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to change the model source to BOS. In the future, more model sources will be supported.
+
 After running, the result obtained is:
 
 ````

+ 3 - 0
docs/module_usage/tutorials/ocr_modules/formula_recognition.md

@@ -138,6 +138,9 @@ for res in output:
     res.save_to_img(save_path="./output/")
     res.save_to_json(save_path="./output/res.json")
 ```
+
+<b>Note: </b>PaddleX official models are fetched from HuggingFace by default. If HuggingFace is hard to access from the runtime environment, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to switch the model source to BOS. More mainstream model sources will be supported in the future.
+
 After running, the result is:
 ```bash
 {'res': {'input_path': 'general_formula_rec_001.png', 'page_index': None, 'rec_formula': '\\zeta_{0}(\\nu)=-\\frac{\\nu\\varrho^{-2\\nu}}{\\pi}\\int_{\\mu}^{\\infty}d\\omega\\int_{C_{+}}d z\\frac{2z^{2}}{(z^{2}+\\omega^{2})^{\\nu+1}}\\breve{\\Psi}(\\omega;z)e^{i\\epsilon z}\\quad,'}}

+ 2 - 0
docs/module_usage/tutorials/ocr_modules/layout_detection.en.md

@@ -303,6 +303,8 @@ for res in output:
 
 ```
 
+<b>Note: </b>The official models are downloaded from HuggingFace by default. If you cannot access HuggingFace, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to change the model source to BOS. In the future, more model sources will be supported.
+
 <details><summary>👉 <b>After running, the result is: (Click to expand)</b></summary>
 
 ```bash

+ 2 - 0
docs/module_usage/tutorials/ocr_modules/layout_detection.md

@@ -306,6 +306,8 @@ for res in output:
 
 ```
 
+<b>Note: </b>PaddleX official models are fetched from HuggingFace by default. If HuggingFace is hard to access from the runtime environment, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to switch the model source to BOS. More mainstream model sources will be supported in the future.
+
 <details><summary>👉 <b>After running, the result is: (Click to expand)</b></summary>
 
 ```bash

+ 2 - 0
docs/module_usage/tutorials/ocr_modules/seal_text_detection.en.md

@@ -101,6 +101,8 @@ for res in output:
     res.save_to_json(save_path="./output/res.json")
 ```
 
+<b>Note: </b>The official models are downloaded from HuggingFace by default. If you cannot access HuggingFace, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to change the model source to BOS. In the future, more model sources will be supported.
+
 After running, the result is:
 
 ```bash

+ 2 - 0
docs/module_usage/tutorials/ocr_modules/seal_text_detection.md

@@ -99,6 +99,8 @@ for res in output:
     res.save_to_json(save_path="./output/res.json")
 ```
 
+<b>Note: </b>PaddleX official models are fetched from HuggingFace by default. If HuggingFace is hard to access from the runtime environment, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to switch the model source to BOS. More mainstream model sources will be supported in the future.
+
 After running, the result is:
 
 ```bash

+ 2 - 0
docs/module_usage/tutorials/ocr_modules/table_cells_detection.en.md

@@ -92,6 +92,8 @@ for res in output:
     res.save_to_json("./output/res.json")
 ```
 
+<b>Note: </b>The official models are downloaded from HuggingFace by default. If you cannot access HuggingFace, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to change the model source to BOS. In the future, more model sources will be supported.
+
 <details><summary>👉 <b>After running, the result is: (Click to expand)</b></summary>
 
 ```

+ 2 - 0
docs/module_usage/tutorials/ocr_modules/table_cells_detection.md

@@ -92,6 +92,8 @@ for res in output:
     res.save_to_json("./output/res.json")
 ```
 
+<b>Note: </b>PaddleX official models are fetched from HuggingFace by default. If HuggingFace is hard to access from the runtime environment, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to switch the model source to BOS. More mainstream model sources will be supported in the future.
+
 <details><summary>👉 <b>After running, the result is: (Click to expand)</b></summary>
 
 ```

+ 2 - 0
docs/module_usage/tutorials/ocr_modules/table_classification.en.md

@@ -83,6 +83,8 @@ for res in output:
     res.save_to_json("./output/res.json")
 ```
 
+<b>Note: </b>The official models are downloaded from HuggingFace by default. If you cannot access HuggingFace, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to change the model source to BOS. In the future, more model sources will be supported.
+
 After running the code, the result obtained is:
 
 ```

+ 2 - 0
docs/module_usage/tutorials/ocr_modules/table_classification.md

@@ -84,6 +84,8 @@ for res in output:
     res.save_to_json("./output/res.json")
 ```
 
+<b>Note: </b>PaddleX official models are fetched from HuggingFace by default. If HuggingFace is hard to access from the runtime environment, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to switch the model source to BOS. More mainstream model sources will be supported in the future.
+
 After running, the result is:
 ```
 {'res': {'input_path': 'table_recognition.jpg', 'page_index': None, 'class_ids': array([0, 1], dtype=int32), 'scores': array([0.84421, 0.15579], dtype=float32), 'label_names': ['wired_table', 'wireless_table']}}

+ 2 - 0
docs/module_usage/tutorials/ocr_modules/table_structure_recognition.en.md

@@ -110,6 +110,8 @@ for res in output:
     res.save_to_json("./output/res.json")
 ```
 
+<b>Note: </b>The official models are downloaded from HuggingFace by default. If you cannot access HuggingFace, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to change the model source to BOS. In the future, more model sources will be supported.
+
 <details><summary>👉 <b>After running, the result is: (Click to expand)</b></summary>
 
 ```

+ 2 - 0
docs/module_usage/tutorials/ocr_modules/table_structure_recognition.md

@@ -106,6 +106,8 @@ for res in output:
     res.save_to_json("./output/res.json")
 ```
 
+<b>Note: </b>PaddleX official models are fetched from HuggingFace by default. If HuggingFace is hard to access from the runtime environment, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to switch the model source to BOS. More mainstream model sources will be supported in the future.
+
 <details><summary>👉 <b>After running, the result is: (Click to expand)</b></summary>
 
 ```

+ 2 - 0
docs/module_usage/tutorials/ocr_modules/text_detection.en.md

@@ -129,6 +129,8 @@ for res in output:
     res.save_to_json(save_path="./output/res.json")
 ```
 
+<b>Note: </b>The official models are downloaded from HuggingFace by default. If you cannot access HuggingFace, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to change the model source to BOS. In the future, more model sources will be supported.
+
 After running, the result obtained is:
 
 ```bash

+ 2 - 0
docs/module_usage/tutorials/ocr_modules/text_detection.md

@@ -131,6 +131,8 @@ for res in output:
     res.save_to_json(save_path="./output/res.json")
 ```
 
+<b>Note: </b>PaddleX official models are fetched from HuggingFace by default. If HuggingFace is hard to access from the runtime environment, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to switch the model source to BOS. More mainstream model sources will be supported in the future.
+
 After running, the result is:
 
 ```bash

+ 2 - 0
docs/module_usage/tutorials/ocr_modules/text_image_unwarping.en.md

@@ -90,6 +90,8 @@ for res in output:
     res.save_to_json(save_path="./output/res.json")
 ```
 
+<b>Note: </b>The official models are downloaded from HuggingFace by default. If you cannot access HuggingFace, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to change the model source to BOS. In the future, more model sources will be supported.
+
 After running, the result obtained is:
 
 ```bash

+ 2 - 0
docs/module_usage/tutorials/ocr_modules/text_image_unwarping.md

@@ -87,6 +87,8 @@ for res in output:
     res.save_to_json(save_path="./output/res.json")
 ```
 
+<b>Note: </b>PaddleX official models are fetched from HuggingFace by default. If HuggingFace is hard to access from the runtime environment, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to switch the model source to BOS. More mainstream model sources will be supported in the future.
+
 After running, the result is:
 
 ```bash

+ 3 - 0
docs/module_usage/tutorials/ocr_modules/text_recognition.en.md

@@ -385,6 +385,9 @@ for res in output:
     res.save_to_img(save_path="./output/")
     res.save_to_json(save_path="./output/res.json")
 ```
+
+<b>Note: </b>The official models are downloaded from HuggingFace by default. If you cannot access HuggingFace, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to change the model source to BOS. In the future, more model sources will be supported.
+
 For more information on using PaddleX's single-model inference APIs, please refer to the [PaddleX Single-Model Python Script Usage Instructions](../../instructions/model_python_API.en.md).
 
 After running, the result obtained is:

+ 2 - 0
docs/module_usage/tutorials/ocr_modules/text_recognition.md

@@ -409,6 +409,8 @@ for res in output:
     res.save_to_json(save_path="./output/res.json")
 ```
 
+<b>Note: </b>PaddleX official models are fetched from HuggingFace by default. If HuggingFace is hard to access from the runtime environment, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to switch the model source to BOS. More mainstream model sources will be supported in the future.
+
 After running, the result is:
 ```bash
 {'res': {'input_path': 'general_ocr_rec_001.png', 'page_index': None, 'rec_text': '绿洲仕格维花园公寓', 'rec_score': 0.9823867082595825}}

+ 2 - 0
docs/module_usage/tutorials/ocr_modules/textline_orientation_classification.en.md

@@ -101,6 +101,8 @@ for res in output:
     res.save_to_json("./output/res.json")
 ```
 
+<b>Note: </b>The official models are downloaded from HuggingFace by default. If you cannot access HuggingFace, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to change the model source to BOS. In the future, more model sources will be supported.
+
 After running, the result obtained is:
 
 ```bash

+ 2 - 0
docs/module_usage/tutorials/ocr_modules/textline_orientation_classification.md

@@ -104,6 +104,8 @@ for res in output:
     res.save_to_json("./output/res.json")
 ```
 
+<b>Note: </b>PaddleX official models are fetched from HuggingFace by default. If HuggingFace is hard to access from the runtime environment, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to switch the model source to BOS. More mainstream model sources will be supported in the future.
+
 After running, the result is:
 ```bash
 {'res': {'input_path': 'textline_rot180_demo.jpg', 'page_index': None, 'class_ids': array([1], dtype=int32), 'scores': array([0.99864], dtype=float32), 'label_names': ['180_degree']}}

+ 20 - 18
docs/pipeline_usage/tutorials/ocr_pipelines/OCR.en.md

@@ -543,6 +543,8 @@ paddlex --pipeline OCR \
         --device gpu:0
 ```
 
+<b>Note: </b>The official models are downloaded from HuggingFace by default. If you cannot access HuggingFace, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to change the model source to BOS. In the future, more model sources will be supported.
+
 For details on the relevant parameter descriptions, please refer to the parameter descriptions in [2.2.2 Python Script Integration](#222-python-script-integration). Supports specifying multiple devices simultaneously for parallel inference. For details, please refer to the documentation on pipeline parallel inference.
 
 After running, the results will be printed to the terminal as follows:
@@ -1207,8 +1209,8 @@ for i, res in enumerate(result["ocrResults"]):
 #include "base64.hpp" // https://github.com/tobiaslocker/base64
 
 int main() {
-    httplib::Client client("localhost", 8080);  
-    const std::string filePath = "./demo.jpg"; 
+    httplib::Client client("localhost", 8080);
+    const std::string filePath = "./demo.jpg";
 
     std::ifstream file(filePath, std::ios::binary | std::ios::ate);
     if (!file) {
@@ -1231,7 +1233,7 @@ int main() {
 
     nlohmann::json jsonObj;
     jsonObj["file"] = encodedFile;
-    jsonObj["fileType"] = 1;  
+    jsonObj["fileType"] = 1;
 
     auto response = client.Post("/ocr", jsonObj.dump(), "application/json");
 
@@ -1288,8 +1290,8 @@ import java.util.Base64;
 
 public class Main {
     public static void main(String[] args) throws IOException {
-        String API_URL = "http://localhost:8080/ocr"; 
-        String imagePath = "./demo.jpg"; 
+        String API_URL = "http://localhost:8080/ocr";
+        String imagePath = "./demo.jpg";
 
         File file = new File(imagePath);
         byte[] fileContent = java.nio.file.Files.readAllBytes(file.toPath());
@@ -1297,12 +1299,12 @@ public class Main {
 
         ObjectMapper objectMapper = new ObjectMapper();
         ObjectNode payload = objectMapper.createObjectNode();
-        payload.put("file", base64Image); 
-        payload.put("fileType", 1); 
+        payload.put("file", base64Image);
+        payload.put("fileType", 1);
 
         OkHttpClient client = new OkHttpClient();
         MediaType JSON = MediaType.get("application/json; charset=utf-8");
-	RequestBody body = RequestBody.create(JSON, payload.toString());
+    RequestBody body = RequestBody.create(JSON, payload.toString());
 
         Request request = new Request.Builder()
                 .url(API_URL)
@@ -1399,14 +1401,14 @@ func main() {
     }
 
     type OcrResult struct {
-        PrunedResult map[string]interface{} `json:"prunedResult"` 
-        OcrImage     *string                `json:"ocrImage"`     
+        PrunedResult map[string]interface{} `json:"prunedResult"`
+        OcrImage     *string                `json:"ocrImage"`
     }
 
     type Response struct {
         Result struct {
             OcrResults []OcrResult `json:"ocrResults"`
-            DataInfo   interface{} `json:"dataInfo"` 
+            DataInfo   interface{} `json:"dataInfo"`
         } `json:"result"`
     }
 
@@ -1417,14 +1419,14 @@ func main() {
     }
 
     for i, res := range respData.Result.OcrResults {
-        
+
         if res.OcrImage != nil {
             imgBytes, err := base64.StdEncoding.DecodeString(*res.OcrImage)
             if err != nil {
                 fmt.Printf("Error decoding image %d: %v\n", i, err)
                 continue
             }
-            
+
             filename := fmt.Sprintf("ocr_%d.jpg", i)
             if err := ioutil.WriteFile(filename, imgBytes, 0644); err != nil {
                 fmt.Printf("Error saving image %s: %v\n", filename, err)
@@ -1500,8 +1502,8 @@ const fs = require('fs');
 const path = require('path');
 
 const API_URL = 'http://localhost:8080/layout-parsing';
-const imagePath = './demo.jpg';  
-const fileType = 1;             
+const imagePath = './demo.jpg';
+const fileType = 1;
 
 function encodeImageToBase64(filePath) {
   const bitmap = fs.readFileSync(filePath);
@@ -1541,13 +1543,13 @@ axios.post(API_URL, payload)
 
 <pre><code class="language-php">&lt;?php
 
-$API_URL = "http://localhost:8080/ocr"; 
-$image_path = "./demo.jpg"; 
+$API_URL = "http://localhost:8080/ocr";
+$image_path = "./demo.jpg";
 
 $image_data = base64_encode(file_get_contents($image_path));
 $payload = array(
     "file" => $image_data,
-    "fileType" => 1 
+    "fileType" => 1
 );
 
 $ch = curl_init($API_URL);

+ 21 - 18
docs/pipeline_usage/tutorials/ocr_pipelines/OCR.md

@@ -554,6 +554,9 @@ paddlex --pipeline OCR \
         --save_path ./output \
         --device gpu:0
 ```
+
+<b>Note: </b>PaddleX official models are fetched from HuggingFace by default. If HuggingFace is hard to access from the runtime environment, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to switch the model source to BOS. More mainstream model sources will be supported in the future.
+
 For the relevant parameter descriptions, refer to [2.2.2 Python Script Integration](#222-python脚本方式集成). Multiple devices can be specified simultaneously for parallel inference; for details, refer to [Pipeline Parallel Inference](../../instructions/parallel_inference.md#指定多个推理设备).

 After running, the result is printed to the terminal, as follows:
@@ -1202,8 +1205,8 @@ for i, res in enumerate(result["ocrResults"]):
 #include "base64.hpp" // https://github.com/tobiaslocker/base64
 
 int main() {
-    httplib::Client client("localhost", 8080);  
-    const std::string filePath = "./demo.jpg"; 
+    httplib::Client client("localhost", 8080);
+    const std::string filePath = "./demo.jpg";
 
     std::ifstream file(filePath, std::ios::binary | std::ios::ate);
     if (!file) {
@@ -1226,7 +1229,7 @@ int main() {
 
     nlohmann::json jsonObj;
     jsonObj["file"] = encodedFile;
-    jsonObj["fileType"] = 1;  
+    jsonObj["fileType"] = 1;
 
     auto response = client.Post("/ocr", jsonObj.dump(), "application/json");
 
@@ -1283,8 +1286,8 @@ import java.util.Base64;
 
 public class Main {
     public static void main(String[] args) throws IOException {
-        String API_URL = "http://localhost:8080/ocr"; 
-        String imagePath = "./demo.jpg"; 
+        String API_URL = "http://localhost:8080/ocr";
+        String imagePath = "./demo.jpg";
 
         File file = new File(imagePath);
         byte[] fileContent = java.nio.file.Files.readAllBytes(file.toPath());
@@ -1292,12 +1295,12 @@ public class Main {
 
         ObjectMapper objectMapper = new ObjectMapper();
         ObjectNode payload = objectMapper.createObjectNode();
-        payload.put("file", base64Image); 
-        payload.put("fileType", 1); 
+        payload.put("file", base64Image);
+        payload.put("fileType", 1);
 
         OkHttpClient client = new OkHttpClient();
         MediaType JSON = MediaType.get("application/json; charset=utf-8");
-	RequestBody body = RequestBody.create(JSON, payload.toString());
+    RequestBody body = RequestBody.create(JSON, payload.toString());
 
         Request request = new Request.Builder()
                 .url(API_URL)
@@ -1394,14 +1397,14 @@ func main() {
     }
 
     type OcrResult struct {
-        PrunedResult map[string]interface{} `json:"prunedResult"` 
-        OcrImage     *string                `json:"ocrImage"`     
+        PrunedResult map[string]interface{} `json:"prunedResult"`
+        OcrImage     *string                `json:"ocrImage"`
     }
 
     type Response struct {
         Result struct {
             OcrResults []OcrResult `json:"ocrResults"`
-            DataInfo   interface{} `json:"dataInfo"` 
+            DataInfo   interface{} `json:"dataInfo"`
         } `json:"result"`
     }
 
@@ -1412,14 +1415,14 @@ func main() {
     }
 
     for i, res := range respData.Result.OcrResults {
-        
+
         if res.OcrImage != nil {
             imgBytes, err := base64.StdEncoding.DecodeString(*res.OcrImage)
             if err != nil {
                 fmt.Printf("Error decoding image %d: %v\n", i, err)
                 continue
             }
-            
+
             filename := fmt.Sprintf("ocr_%d.jpg", i)
             if err := ioutil.WriteFile(filename, imgBytes, 0644); err != nil {
                 fmt.Printf("Error saving image %s: %v\n", filename, err)
@@ -1495,8 +1498,8 @@ const fs = require('fs');
 const path = require('path');
 
 const API_URL = 'http://localhost:8080/layout-parsing';
-const imagePath = './demo.jpg';  
-const fileType = 1;             
+const imagePath = './demo.jpg';
+const fileType = 1;
 
 function encodeImageToBase64(filePath) {
   const bitmap = fs.readFileSync(filePath);
@@ -1536,13 +1539,13 @@ axios.post(API_URL, payload)
 
 <pre><code class="language-php">&lt;?php
 
-$API_URL = "http://localhost:8080/ocr"; 
-$image_path = "./demo.jpg"; 
+$API_URL = "http://localhost:8080/ocr";
+$image_path = "./demo.jpg";
 
 $image_data = base64_encode(file_get_contents($image_path));
 $payload = array(
     "file" => $image_data,
-    "fileType" => 1 
+    "fileType" => 1
 );
 
 $ch = curl_init($API_URL);

+ 2 - 0
docs/pipeline_usage/tutorials/ocr_pipelines/PP-StructureV3.en.md

@@ -684,6 +684,8 @@ paddlex --pipeline PP-StructureV3 \
         --device gpu:0
 ```
 
+<b>Note: </b>The official models are downloaded from HuggingFace by default. If you cannot access HuggingFace, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to change the model source to BOS. In the future, more model sources will be supported.
+
 The parameter description can be found in [2.2.2 Python Script Integration](#222-python-script-integration). Supports specifying multiple devices simultaneously for parallel inference. For details, please refer to the documentation on pipeline parallel inference.
 
 After running, the result will be printed to the terminal, as follows:

+ 3 - 0
docs/pipeline_usage/tutorials/ocr_pipelines/PP-StructureV3.md

@@ -650,6 +650,9 @@ paddlex --pipeline PP-StructureV3 \
         --save_path ./output \
         --device gpu:0
 ```
+
+<b>Note: </b>PaddleX official models are fetched from HuggingFace by default. If HuggingFace is hard to access from the runtime environment, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to switch the model source to BOS. More mainstream model sources will be supported in the future.
+
 For the relevant parameter descriptions, refer to [2.2.2 Python Script Integration](#222-python脚本方式集成). Multiple devices can be specified simultaneously for parallel inference; for details, refer to [Pipeline Parallel Inference](../../instructions/parallel_inference.md#指定多个推理设备).

 After running, the result is printed to the terminal, as follows:

+ 3 - 0
docs/pipeline_usage/tutorials/ocr_pipelines/doc_preprocessor.en.md

@@ -125,6 +125,9 @@ paddlex --pipeline doc_preprocessor \
         --save_path ./output \
         --device gpu:0
 ```
+
+<b>Note: </b>The official models are downloaded from HuggingFace by default. If you cannot access HuggingFace, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to change the model source to BOS. In the future, more model sources will be supported.
+
 You can refer to the parameter descriptions in [2.1.2 Python Script Integration](#212-python-script-integration) for related parameter details. Supports specifying multiple devices simultaneously for parallel inference. For details, please refer to the documentation on pipeline parallel inference.
 
 After running, the results will be printed to the terminal as follows:

+ 2 - 0
docs/pipeline_usage/tutorials/ocr_pipelines/doc_preprocessor.md

@@ -123,6 +123,8 @@ paddlex --pipeline doc_preprocessor \
         --device gpu:0
 ```
 
+<b>Note: </b>PaddleX official models are fetched from HuggingFace by default. If HuggingFace is hard to access from the runtime environment, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to switch the model source to BOS. More mainstream model sources will be supported in the future.
+
 For the relevant parameter descriptions, refer to [2.1.2 Python Script Integration](#212-python脚本方式集成). Multiple devices can be specified simultaneously for parallel inference; for details, refer to [Pipeline Parallel Inference](../../instructions/parallel_inference.md#指定多个推理设备).

 After running, the result is printed to the terminal, as follows:

+ 2 - 0
docs/pipeline_usage/tutorials/ocr_pipelines/formula_recognition.en.md

@@ -393,6 +393,8 @@ paddlex --pipeline formula_recognition \
         --device gpu:0
 ```
 
+<b>Note: </b>The official models are downloaded from HuggingFace by default. If you cannot access HuggingFace, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to change the model source to BOS. In the future, more model sources will be supported.
+
 The relevant parameter descriptions can be referenced from [2.2 Integration via Python Script](#22-integration-via-python-script). Supports specifying multiple devices simultaneously for parallel inference. For details, please refer to the documentation on pipeline parallel inference.
 
 After running, the results will be printed to the terminal, as shown below:

+ 3 - 0
docs/pipeline_usage/tutorials/ocr_pipelines/formula_recognition.md

@@ -395,6 +395,9 @@ paddlex --pipeline formula_recognition \
         --save_path ./output \
         --device gpu:0
 ```
+
+<b>Note: </b>PaddleX official models are fetched from HuggingFace by default. If HuggingFace is hard to access from the runtime environment, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to switch the model source to BOS. More mainstream model sources will be supported in the future.
+
 For the relevant parameter descriptions, refer to [2.2 Python Script Integration](#22-python脚本方式集成). Multiple devices can be specified simultaneously for parallel inference; for details, refer to [Pipeline Parallel Inference](../../instructions/parallel_inference.md#指定多个推理设备).

 After running, the result is printed to the terminal, as follows:

+ 3 - 0
docs/pipeline_usage/tutorials/ocr_pipelines/layout_parsing.en.md

@@ -615,6 +615,9 @@ paddlex --pipeline layout_parsing \
         --save_path ./output \
         --device gpu:0
 ```
+
+<b>Note: </b>The official models are downloaded from HuggingFace by default. If you cannot access HuggingFace, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to change the model source to BOS. In the future, more model sources will be supported.
+
 For parameter descriptions, refer to the parameter explanations in [2.2.2 Integration via Python Script](#222-integration-via-python-script). Supports specifying multiple devices simultaneously for parallel inference. For details, please refer to the documentation on pipeline parallel inference.
 
 After running, the results will be printed to the terminal, as shown below:

+ 3 - 0
docs/pipeline_usage/tutorials/ocr_pipelines/layout_parsing.md

@@ -632,6 +632,9 @@ paddlex --pipeline layout_parsing \
         --save_path ./output \
         --device gpu:0
 ```
+
+<b>Note: </b>PaddleX official models are fetched from HuggingFace by default. If HuggingFace is hard to access from the runtime environment, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to switch the model source to BOS. More mainstream model sources will be supported in the future.
+
 For the relevant parameter descriptions, refer to [2.2.2 Python Script Integration](#222-python脚本方式集成). Multiple devices can be specified simultaneously for parallel inference; for details, refer to [Pipeline Parallel Inference](../../instructions/parallel_inference.md#指定多个推理设备).

 After running, the result is printed to the terminal, as follows:

+ 91 - 89
docs/pipeline_usage/tutorials/ocr_pipelines/seal_recognition.en.md

@@ -661,6 +661,8 @@ paddlex --pipeline seal_recognition \
     --save_path ./output
 ```
 
+<b>Note: </b>The official models are downloaded from HuggingFace by default. If you cannot access HuggingFace, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to change the model source to BOS. In the future, more model sources will be supported.
+
 The relevant parameter descriptions can be referred to in the parameter explanations of [2.1.2 Integration via Python Script](#212-integration-via-python-script). Supports specifying multiple devices simultaneously for parallel inference. For details, please refer to the documentation on pipeline parallel inference.
 
 After running, the results will be printed to the terminal, as follows:
@@ -1505,98 +1507,98 @@ public class Main {
 <pre><code class="language-go">package main
 
 import (
-	"bytes"
-	"encoding/base64"
-	"encoding/json"
-	"fmt"
-	"io/ioutil"
-	"net/http"
+    "bytes"
+    "encoding/base64"
+    "encoding/json"
+    "fmt"
+    "io/ioutil"
+    "net/http"
 )
 
 func main() {
-	API_URL := "http://localhost:8080/seal-recognition"
-	filePath := "./demo.jpg"
-
-	fileBytes, err := ioutil.ReadFile(filePath)
-	if err != nil {
-		fmt.Printf("Error reading file: %v\n", err)
-		return
-	}
-	fileData := base64.StdEncoding.EncodeToString(fileBytes)
-
-	payload := map[string]interface{}{
-		"file":     fileData,
-		"fileType": 1,
-	}
-	payloadBytes, err := json.Marshal(payload)
-	if err != nil {
-		fmt.Printf("Error marshaling payload: %v\n", err)
-		return
-	}
-
-	client := &http.Client{}
-	req, err := http.NewRequest("POST", API_URL, bytes.NewBuffer(payloadBytes))
-	if err != nil {
-		fmt.Printf("Error creating request: %v\n", err)
-		return
-	}
-	req.Header.Set("Content-Type", "application/json")
-
-	resp, err := client.Do(req)
-	if err != nil {
-		fmt.Printf("Error sending request: %v\n", err)
-		return
-	}
-	defer resp.Body.Close()
-
-	if resp.StatusCode != http.StatusOK {
-		fmt.Printf("Unexpected status code: %d\n", resp.StatusCode)
-		return
-	}
-
-	body, err := ioutil.ReadAll(resp.Body)
-	if err != nil {
-		fmt.Printf("Error reading response body: %v\n", err)
-		return
-	}
-
-	type SealResult struct {
-		PrunedResult map[string]interface{}   `json:"prunedResult"`
-		OutputImages map[string]string        `json:"outputImages"`
-		InputImage   *string                  `json:"inputImage"`
-	}
-
-	type Response struct {
-		Result struct {
-			SealRecResults []SealResult  `json:"sealRecResults"`
-			DataInfo       interface{}   `json:"dataInfo"`
-		} `json:"result"`
-	}
-
-	var respData Response
-	if err := json.Unmarshal(body, &respData); err != nil {
-		fmt.Printf("Error unmarshaling response: %v\n", err)
-		return
-	}
-
-	for i, res := range respData.Result.SealRecResults {
-		fmt.Printf("Pruned Result %d: %+v\n", i, res.PrunedResult)
-
-		for name, imgBase64 := range res.OutputImages {
-			imgBytes, err := base64.StdEncoding.DecodeString(imgBase64)
-			if err != nil {
-				fmt.Printf("Error decoding image %s: %v\n", name, err)
-				continue
-			}
-
-			filename := fmt.Sprintf("%s_%d.jpg", name, i)
-			if err := ioutil.WriteFile(filename, imgBytes, 0644); err != nil {
-				fmt.Printf("Error saving image %s: %v\n", filename, err)
-				continue
-			}
-			fmt.Printf("Output image saved at %s\n", filename)
-		}
-	}
+    API_URL := "http://localhost:8080/seal-recognition"
+    filePath := "./demo.jpg"
+
+    fileBytes, err := ioutil.ReadFile(filePath)
+    if err != nil {
+        fmt.Printf("Error reading file: %v\n", err)
+        return
+    }
+    fileData := base64.StdEncoding.EncodeToString(fileBytes)
+
+    payload := map[string]interface{}{
+        "file":     fileData,
+        "fileType": 1,
+    }
+    payloadBytes, err := json.Marshal(payload)
+    if err != nil {
+        fmt.Printf("Error marshaling payload: %v\n", err)
+        return
+    }
+
+    client := &http.Client{}
+    req, err := http.NewRequest("POST", API_URL, bytes.NewBuffer(payloadBytes))
+    if err != nil {
+        fmt.Printf("Error creating request: %v\n", err)
+        return
+    }
+    req.Header.Set("Content-Type", "application/json")
+
+    resp, err := client.Do(req)
+    if err != nil {
+        fmt.Printf("Error sending request: %v\n", err)
+        return
+    }
+    defer resp.Body.Close()
+
+    if resp.StatusCode != http.StatusOK {
+        fmt.Printf("Unexpected status code: %d\n", resp.StatusCode)
+        return
+    }
+
+    body, err := ioutil.ReadAll(resp.Body)
+    if err != nil {
+        fmt.Printf("Error reading response body: %v\n", err)
+        return
+    }
+
+    type SealResult struct {
+        PrunedResult map[string]interface{}   `json:"prunedResult"`
+        OutputImages map[string]string        `json:"outputImages"`
+        InputImage   *string                  `json:"inputImage"`
+    }
+
+    type Response struct {
+        Result struct {
+            SealRecResults []SealResult  `json:"sealRecResults"`
+            DataInfo       interface{}   `json:"dataInfo"`
+        } `json:"result"`
+    }
+
+    var respData Response
+    if err := json.Unmarshal(body, &respData); err != nil {
+        fmt.Printf("Error unmarshaling response: %v\n", err)
+        return
+    }
+
+    for i, res := range respData.Result.SealRecResults {
+        fmt.Printf("Pruned Result %d: %+v\n", i, res.PrunedResult)
+
+        for name, imgBase64 := range res.OutputImages {
+            imgBytes, err := base64.StdEncoding.DecodeString(imgBase64)
+            if err != nil {
+                fmt.Printf("Error decoding image %s: %v\n", name, err)
+                continue
+            }
+
+            filename := fmt.Sprintf("%s_%d.jpg", name, i)
+            if err := ioutil.WriteFile(filename, imgBytes, 0644); err != nil {
+                fmt.Printf("Error saving image %s: %v\n", filename, err)
+                continue
+            }
+            fmt.Printf("Output image saved at %s\n", filename)
+        }
+    }
 }
 </code></pre></details>
 

+ 91 - 89
docs/pipeline_usage/tutorials/ocr_pipelines/seal_recognition.md

@@ -631,6 +631,8 @@ paddlex --pipeline seal_recognition \
     --save_path ./output
 ```
 
+<b>Note:</b> PaddleX official models are downloaded from HuggingFace by default. If accessing HuggingFace is inconvenient in your environment, you can switch the model source to BOS via the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"`; more mainstream model sources will be supported in the future.
+
 For parameter descriptions, refer to [2.1.2 Python Script Integration](#212-python脚本方式集成). Multiple devices can be specified simultaneously for parallel inference; for details, see [Pipeline Parallel Inference](../../instructions/parallel_inference.md#指定多个推理设备).
 
 After running, the result will be printed to the terminal, as follows:
@@ -1523,98 +1525,98 @@ public class Main {
 <pre><code class="language-go">package main
 
 import (
-	"bytes"
-	"encoding/base64"
-	"encoding/json"
-	"fmt"
-	"io/ioutil"
-	"net/http"
+    "bytes"
+    "encoding/base64"
+    "encoding/json"
+    "fmt"
+    "io/ioutil"
+    "net/http"
 )
 
 func main() {
-	API_URL := "http://localhost:8080/seal-recognition"
-	filePath := "./demo.jpg"
-
-	fileBytes, err := ioutil.ReadFile(filePath)
-	if err != nil {
-		fmt.Printf("Error reading file: %v\n", err)
-		return
-	}
-	fileData := base64.StdEncoding.EncodeToString(fileBytes)
-
-	payload := map[string]interface{}{
-		"file":     fileData,
-		"fileType": 1,
-	}
-	payloadBytes, err := json.Marshal(payload)
-	if err != nil {
-		fmt.Printf("Error marshaling payload: %v\n", err)
-		return
-	}
-
-	client := &http.Client{}
-	req, err := http.NewRequest("POST", API_URL, bytes.NewBuffer(payloadBytes))
-	if err != nil {
-		fmt.Printf("Error creating request: %v\n", err)
-		return
-	}
-	req.Header.Set("Content-Type", "application/json")
-
-	resp, err := client.Do(req)
-	if err != nil {
-		fmt.Printf("Error sending request: %v\n", err)
-		return
-	}
-	defer resp.Body.Close()
-
-	if resp.StatusCode != http.StatusOK {
-		fmt.Printf("Unexpected status code: %d\n", resp.StatusCode)
-		return
-	}
-
-	body, err := ioutil.ReadAll(resp.Body)
-	if err != nil {
-		fmt.Printf("Error reading response body: %v\n", err)
-		return
-	}
-
-	type SealResult struct {
-		PrunedResult map[string]interface{}   `json:"prunedResult"`
-		OutputImages map[string]string        `json:"outputImages"`
-		InputImage   *string                  `json:"inputImage"`
-	}
-
-	type Response struct {
-		Result struct {
-			SealRecResults []SealResult  `json:"sealRecResults"`
-			DataInfo       interface{}   `json:"dataInfo"`
-		} `json:"result"`
-	}
-
-	var respData Response
-	if err := json.Unmarshal(body, &respData); err != nil {
-		fmt.Printf("Error unmarshaling response: %v\n", err)
-		return
-	}
-
-	for i, res := range respData.Result.SealRecResults {
-		fmt.Printf("Pruned Result %d: %+v\n", i, res.PrunedResult)
-
-		for name, imgBase64 := range res.OutputImages {
-			imgBytes, err := base64.StdEncoding.DecodeString(imgBase64)
-			if err != nil {
-				fmt.Printf("Error decoding image %s: %v\n", name, err)
-				continue
-			}
-
-			filename := fmt.Sprintf("%s_%d.jpg", name, i)
-			if err := ioutil.WriteFile(filename, imgBytes, 0644); err != nil {
-				fmt.Printf("Error saving image %s: %v\n", filename, err)
-				continue
-			}
-			fmt.Printf("Output image saved at %s\n", filename)
-		}
-	}
+    API_URL := "http://localhost:8080/seal-recognition"
+    filePath := "./demo.jpg"
+
+    fileBytes, err := ioutil.ReadFile(filePath)
+    if err != nil {
+        fmt.Printf("Error reading file: %v\n", err)
+        return
+    }
+    fileData := base64.StdEncoding.EncodeToString(fileBytes)
+
+    payload := map[string]interface{}{
+        "file":     fileData,
+        "fileType": 1,
+    }
+    payloadBytes, err := json.Marshal(payload)
+    if err != nil {
+        fmt.Printf("Error marshaling payload: %v\n", err)
+        return
+    }
+
+    client := &http.Client{}
+    req, err := http.NewRequest("POST", API_URL, bytes.NewBuffer(payloadBytes))
+    if err != nil {
+        fmt.Printf("Error creating request: %v\n", err)
+        return
+    }
+    req.Header.Set("Content-Type", "application/json")
+
+    resp, err := client.Do(req)
+    if err != nil {
+        fmt.Printf("Error sending request: %v\n", err)
+        return
+    }
+    defer resp.Body.Close()
+
+    if resp.StatusCode != http.StatusOK {
+        fmt.Printf("Unexpected status code: %d\n", resp.StatusCode)
+        return
+    }
+
+    body, err := ioutil.ReadAll(resp.Body)
+    if err != nil {
+        fmt.Printf("Error reading response body: %v\n", err)
+        return
+    }
+
+    type SealResult struct {
+        PrunedResult map[string]interface{}   `json:"prunedResult"`
+        OutputImages map[string]string        `json:"outputImages"`
+        InputImage   *string                  `json:"inputImage"`
+    }
+
+    type Response struct {
+        Result struct {
+            SealRecResults []SealResult  `json:"sealRecResults"`
+            DataInfo       interface{}   `json:"dataInfo"`
+        } `json:"result"`
+    }
+
+    var respData Response
+    if err := json.Unmarshal(body, &respData); err != nil {
+        fmt.Printf("Error unmarshaling response: %v\n", err)
+        return
+    }
+
+    for i, res := range respData.Result.SealRecResults {
+        fmt.Printf("Pruned Result %d: %+v\n", i, res.PrunedResult)
+
+        for name, imgBase64 := range res.OutputImages {
+            imgBytes, err := base64.StdEncoding.DecodeString(imgBase64)
+            if err != nil {
+                fmt.Printf("Error decoding image %s: %v\n", name, err)
+                continue
+            }
+
+            filename := fmt.Sprintf("%s_%d.jpg", name, i)
+            if err := ioutil.WriteFile(filename, imgBytes, 0644); err != nil {
+                fmt.Printf("Error saving image %s: %v\n", filename, err)
+                continue
+            }
+            fmt.Printf("Output image saved at %s\n", filename)
+        }
+    }
 }
 </code></pre></details>
 

+ 2 - 0
docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition.en.md

@@ -664,6 +664,8 @@ paddlex --pipeline table_recognition \
         --device gpu:0
 ```
 
+<b>Note: </b>Official models are downloaded from HuggingFace by default. If HuggingFace is not accessible from your environment, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to switch the model source to BOS. More model sources will be supported in the future.
+
 For parameter descriptions, refer to [2.2 Python Script Method](#22-python-script-method-integration). Multiple devices can be specified simultaneously for parallel inference; for details, please refer to the documentation on pipeline parallel inference.
 
 After running, the result will be printed to the terminal, as follows:
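The environment-variable switch described in the note above can also be applied programmatically. A minimal sketch, assuming only what the note states (that PaddleX reads `PADDLE_PDX_MODEL_SOURCE` when resolving model downloads):

```python
import os

# Point PaddleX at the BOS model source instead of the default (HuggingFace).
# This must run before any pipeline is created, since the variable is read
# when model files are first resolved and downloaded.
os.environ["PADDLE_PDX_MODEL_SOURCE"] = "BOS"

print(os.environ["PADDLE_PDX_MODEL_SOURCE"])
```

Exporting the variable in the shell before invoking the `paddlex` CLI has the same effect.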

+ 2 - 0
docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition.md

@@ -613,6 +613,8 @@ paddlex --pipeline table_recognition \
         --device gpu:0
 ```
 
+<b>Note:</b> PaddleX official models are downloaded from HuggingFace by default. If accessing HuggingFace is inconvenient in your environment, you can switch the model source to BOS via the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"`; more mainstream model sources will be supported in the future.
+
 For parameter descriptions, refer to [2.2 Python Script Method](#22-python脚本方式集成). Multiple devices can be specified simultaneously for parallel inference; for details, see [Pipeline Parallel Inference](../../instructions/parallel_inference.md#指定多个推理设备).
 
 After running, the result will be printed to the terminal, as follows:

+ 2 - 0
docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.en.md

@@ -740,6 +740,8 @@ paddlex --pipeline table_recognition_v2 \
         --device gpu:0
 ```
 
+<b>Note: </b>Official models are downloaded from HuggingFace by default. If HuggingFace is not accessible from your environment, set the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"` to switch the model source to BOS. More model sources will be supported in the future.
+
 <details><summary>👉 <b>After running, the result obtained is: (Click to expand)</b></summary>
 
 ```

+ 2 - 0
docs/pipeline_usage/tutorials/ocr_pipelines/table_recognition_v2.md

@@ -754,6 +754,8 @@ paddlex --pipeline table_recognition_v2 \
         --device gpu:0
 ```
 
+<b>Note:</b> PaddleX official models are downloaded from HuggingFace by default. If accessing HuggingFace is inconvenient in your environment, you can switch the model source to BOS via the environment variable `PADDLE_PDX_MODEL_SOURCE="BOS"`; more mainstream model sources will be supported in the future.
+
 For parameter descriptions, refer to [2.2 Python Script Integration](#22-python脚本方式集成). Multiple devices can be specified simultaneously for parallel inference; for details, see [Pipeline Parallel Inference](../../instructions/parallel_inference.md#指定多个推理设备).
 
 <details><summary>👉 <b>After running, the result obtained is: (Click to expand)</b></summary>