
[Feat] add pipeline benchmark (#4339)

* add pipeline benchmark

* update information display details

* update

* update the logic for handling core and other time

* update

* add the restrictions on top-level function calls

* update

* update

* to all pipelines

* update

* add benchmark doc

* update

* add en doc

* fix
zhang-prog 4 months ago
parent
commit
8b48ba9e2a
41 changed files with 1273 additions and 23 deletions
  1. docs/module_usage/instructions/benchmark.en.md (+1 -1)
  2. docs/module_usage/instructions/benchmark.md (+1 -1)
  3. docs/pipeline_usage/instructions/benchmark.en.md (+460 -0)
  4. docs/pipeline_usage/instructions/benchmark.md (+460 -0)
  5. mkdocs.yml (+1 -0)
  6. paddlex/inference/models/base/predictor/base_predictor.py (+8 -0)
  7. paddlex/inference/pipelines/anomaly_detection/pipeline.py (+2 -0)
  8. paddlex/inference/pipelines/attribute_recognition/pipeline.py (+2 -0)
  9. paddlex/inference/pipelines/doc_preprocessor/pipeline.py (+2 -0)
  10. paddlex/inference/pipelines/doc_understanding/pipeline.py (+2 -0)
  11. paddlex/inference/pipelines/face_recognition/pipeline.py (+2 -0)
  12. paddlex/inference/pipelines/formula_recognition/pipeline.py (+2 -0)
  13. paddlex/inference/pipelines/image_classification/pipeline.py (+2 -0)
  14. paddlex/inference/pipelines/image_multilabel_classification/pipeline.py (+2 -0)
  15. paddlex/inference/pipelines/instance_segmentation/pipeline.py (+2 -0)
  16. paddlex/inference/pipelines/keypoint_detection/pipeline.py (+2 -0)
  17. paddlex/inference/pipelines/layout_parsing/pipeline.py (+2 -0)
  18. paddlex/inference/pipelines/layout_parsing/pipeline_v2.py (+2 -0)
  19. paddlex/inference/pipelines/m_3d_bev_detection/pipeline.py (+2 -0)
  20. paddlex/inference/pipelines/multilingual_speech_recognition/pipeline.py (+2 -0)
  21. paddlex/inference/pipelines/object_detection/pipeline.py (+2 -0)
  22. paddlex/inference/pipelines/ocr/pipeline.py (+2 -0)
  23. paddlex/inference/pipelines/open_vocabulary_detection/pipeline.py (+2 -0)
  24. paddlex/inference/pipelines/open_vocabulary_segmentation/pipeline.py (+2 -0)
  25. paddlex/inference/pipelines/pp_chatocr/pipeline_v3.py (+2 -0)
  26. paddlex/inference/pipelines/pp_chatocr/pipeline_v4.py (+2 -0)
  27. paddlex/inference/pipelines/pp_doctranslation/pipeline.py (+2 -0)
  28. paddlex/inference/pipelines/pp_shitu_v2/pipeline.py (+2 -0)
  29. paddlex/inference/pipelines/rotated_object_detection/pipeline.py (+2 -0)
  30. paddlex/inference/pipelines/seal_recognition/pipeline.py (+2 -0)
  31. paddlex/inference/pipelines/semantic_segmentation/pipeline.py (+2 -0)
  32. paddlex/inference/pipelines/small_object_detection/pipeline.py (+2 -0)
  33. paddlex/inference/pipelines/table_recognition/pipeline.py (+2 -0)
  34. paddlex/inference/pipelines/table_recognition/pipeline_v2.py (+2 -0)
  35. paddlex/inference/pipelines/ts_anomaly_detection/pipeline.py (+2 -0)
  36. paddlex/inference/pipelines/ts_classification/pipeline.py (+2 -0)
  37. paddlex/inference/pipelines/ts_forecasting/pipeline.py (+2 -0)
  38. paddlex/inference/pipelines/video_classification/pipeline.py (+2 -0)
  39. paddlex/inference/pipelines/video_detection/pipeline.py (+2 -0)
  40. paddlex/inference/utils/benchmark.py (+274 -21)
  41. paddlex/utils/flags.py (+2 -0)

+ 1 - 1
docs/module_usage/instructions/benchmark.en.md

@@ -23,7 +23,7 @@ To enable the benchmark feature, you must set the following environment variable
 **Note**:
 
 * At least one of `PADDLE_PDX_INFER_BENCHMARK_WARMUP` or `PADDLE_PDX_INFER_BENCHMARK_ITERS` must be set to a value greater than zero; otherwise, the benchmark feature cannot be used.
-* The benchmark feature does not currently apply to model pipelines.
+* For the pipeline inference benchmark feature, refer to [Pipeline Benchmark](../../pipeline_usage/instructions/benchmark.en.md).
 
 ## 2. Usage Examples
 

+ 1 - 1
docs/module_usage/instructions/benchmark.md

@@ -23,7 +23,7 @@ The Benchmark feature measures, during end-to-end model inference, the per-
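 **Note**:
 
 * At least one of `PADDLE_PDX_INFER_BENCHMARK_WARMUP` or `PADDLE_PDX_INFER_BENCHMARK_ITERS` must be set to a value greater than zero; otherwise, the benchmark feature cannot be used.
-* The Benchmark feature does not currently apply to model pipelines.
+* For the pipeline inference Benchmark feature, refer to [Pipeline Inference Benchmark](../../pipeline_usage/instructions/benchmark.md).
 
 ## 2. Usage Examples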
 

+ 460 - 0
docs/pipeline_usage/instructions/benchmark.en.md

@@ -0,0 +1,460 @@
+# Pipeline Benchmark
+
+## Table of Contents
+
+- [1. Instructions](#1.-instructions)
+- [2. Usage Examples](#2.-usage-examples)
+
+## 1. Instructions
+
+The Benchmark feature measures the average execution time of each operation during end-to-end pipeline inference and provides summary information. All times are in milliseconds.
+
+Enable the benchmark feature by setting the following environment variable:
+
+* `PADDLE_PDX_PIPELINE_BENCHMARK`: Set to `True` to enable the benchmark feature. The default is `False`.
+
+The following table describes the methods and parameters related to the pipeline inference benchmark:
+
+<table border="1" width="100%" cellpadding="5">
+    <tr>
+        <th>Method Name</th>
+        <th>Description</th>
+    </tr>
+    <tr>
+        <td><code>start_warmup()</code></td>
+        <td>Start the benchmark warmup.</td>
+    </tr>
+    <tr>
+        <td><code>stop_warmup()</code></td>
+        <td>Stop the warmup and clear all benchmark data generated during warmup.</td>
+    </tr>
+    <tr>
+        <td><code>print_detail_data()</code></td>
+        <td>Print detailed benchmark data, including the sequence (Step), name (Operation), and average time (Time) of each operation.</td>
+    </tr>
+    <tr>
+        <td><code>print_summary_data()</code></td>
+        <td>Print summary benchmark data, including the level (Level), name (Operation), and total average time (Time) of each operation. The Time for level 1 represents the total average time.</td>
+    </tr>
+    <tr>
+        <td><code>print_operation_info()</code></td>
+        <td>Print the source code location of each operation.</td>
+    </tr>
+    <tr>
+        <td><code>print_pipeline_data()</code></td>
+        <td>Print the detail data, summary data, and operation_info data of the benchmark to the console.</td>
+    </tr>
+    <tr>
+        <td><code>save_pipeline_data(save_path)</code></td>
+        <td><code>save_path</code>: string <br />Save the benchmark data to the specified file path, including detailed benchmark data <code>detail.csv</code> and summary benchmark data <code>summary.csv</code>.</td>
+    </tr>
+    <tr>
+        <td><code>reset()</code></td>
+        <td>Clear existing benchmark data.</td>
+    </tr>
+</table>
+
+**Note**:
+
+* For the Benchmark feature of single model inference, refer to [Model Inference Benchmark](../../module_usage/instructions/benchmark.en.md).
+
+## 2. Usage Examples
+
+Create a `test_infer.py` script:
+
+```python
+from paddlex import create_pipeline
+from paddlex.inference.utils.benchmark import benchmark
+
+pipeline = create_pipeline("OCR", device="gpu")
+image = "general_ocr_002.png"
+
+benchmark.start_warmup() # Start warmup
+for _ in range(50):
+    list(pipeline.predict(image))
+benchmark.stop_warmup() # End warmup
+
+for _ in range(100): # Start formal speed measurement
+    list(pipeline.predict(image))
+
+benchmark.print_pipeline_data()  # Print detail, summary, and operation info data
+benchmark.save_pipeline_data("./benchmark") # Save benchmark data to the benchmark folder
+```
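
The warmup-then-measure pattern in this example can be wrapped in a small helper. The following is a minimal sketch and not part of this commit: `run_benchmark` and its parameters are hypothetical names, and `bench` is assumed to be any object that exposes the methods listed in the table in section 1.

```python
from typing import Any, Callable


def run_benchmark(
    predict_once: Callable[[], Any],
    bench: Any,
    warmup_iters: int = 50,
    measure_iters: int = 100,
    save_path: str = "./benchmark",
) -> None:
    """Warm up, measure, then print and save the collected benchmark data.

    `bench` is assumed (hypothetically) to provide start_warmup, stop_warmup,
    print_pipeline_data, and save_pipeline_data, as described in the doc table.
    """
    bench.start_warmup()
    for _ in range(warmup_iters):
        predict_once()
    # Per the doc table, stopping warmup clears data recorded during warmup.
    bench.stop_warmup()

    # Measured iterations: only these contribute to the reported averages.
    for _ in range(measure_iters):
        predict_once()

    bench.print_pipeline_data()
    bench.save_pipeline_data(save_path)
```

With the objects from the example, a call might look like `run_benchmark(lambda: list(pipeline.predict(image)), benchmark)`; the `list(...)` wrapper mirrors the example and ensures each prediction is fully consumed.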
+
+Execute the script:
+
+```bash
+PADDLE_PDX_PIPELINE_BENCHMARK=True python test_infer.py
+```
+
+The benchmark results obtained from running the example program are as follows:
+
+```
+                                                             Operation Info
++-----------------------------------------------------+-------------------------------------------------------------------------------+
+|                      Operation                      |                              Source Code Location                             |
++-----------------------------------------------------+-------------------------------------------------------------------------------+
+|                      ReadImage                      |       /PaddleX/paddlex/inference/common/reader/image_reader.py:47       |
+|                   DocTrPostProcess                  |    /PaddleX/paddlex/inference/models/image_unwarping/processors.py:51   |
+|                   DetResizeForTest                  |    /PaddleX/paddlex/inference/models/text_detection/processors.py:58    |
+|     _DocPreprocessorPipeline.get_model_settings     |  /PaddleX/paddlex/inference/pipelines/doc_preprocessor/pipeline.py:110  |
+|                         Crop                        | /PaddleX/paddlex/inference/models/image_classification/processors.py:45 |
+|                PaddleInferChainLegacy               |       /PaddleX/paddlex/inference/models/common/static_infer.py:248      |
+|              _OCRPipeline.rotate_image              |         /PaddleX/paddlex/inference/pipelines/ocr/pipeline.py:140        |
+|           _DocPreprocessorPipeline.predict          |  /PaddleX/paddlex/inference/pipelines/doc_preprocessor/pipeline.py:133  |
+|                 ClasPredictor.apply                 |  /PaddleX/paddlex/inference/models/base/predictor/base_predictor.py:213 |
+|                 WarpPredictor.apply                 |  /PaddleX/paddlex/inference/models/base/predictor/base_predictor.py:213 |
+|                TextDetPredictor.apply               |  /PaddleX/paddlex/inference/models/base/predictor/base_predictor.py:213 |
+|                    ResizeByShort                    |    /PaddleX/paddlex/inference/models/common/vision/processors.py:203    |
+|                      Normalize                      |    /PaddleX/paddlex/inference/models/common/vision/processors.py:268    |
+|                    DBPostProcess                    |    /PaddleX/paddlex/inference/models/text_detection/processors.py:487   |
+|                TextRecPredictor.apply               |  /PaddleX/paddlex/inference/models/base/predictor/base_predictor.py:213 |
+|                    NormalizeImage                   |    /PaddleX/paddlex/inference/models/text_detection/processors.py:252   |
+| _DocPreprocessorPipeline.check_model_settings_valid |   /PaddleX/paddlex/inference/pipelines/doc_preprocessor/pipeline.py:82  |
+|           _OCRPipeline.get_model_settings           |         /PaddleX/paddlex/inference/pipelines/ocr/pipeline.py:204        |
+|           _OCRPipeline.get_text_det_params          |         /PaddleX/paddlex/inference/pipelines/ocr/pipeline.py:236        |
+|                        Resize                       |    /PaddleX/paddlex/inference/models/common/vision/processors.py:117    |
+|                    CTCLabelDecode                   |   /PaddleX/paddlex/inference/models/text_recognition/processors.py:189  |
+|                         Topk                        | /PaddleX/paddlex/inference/models/image_classification/processors.py:83 |
+|                  OCRReisizeNormImg                  |   /PaddleX/paddlex/inference/models/text_recognition/processors.py:65   |
+|                       ToBatch                       |   /PaddleX/paddlex/inference/models/text_recognition/processors.py:235  |
+|                      ToCHWImage                     |    /PaddleX/paddlex/inference/models/common/vision/processors.py:277    |
+|                 _OCRPipeline.predict                |         /PaddleX/paddlex/inference/pipelines/ocr/pipeline.py:282        |
+|                       ToBatch                       |    /PaddleX/paddlex/inference/models/common/vision/processors.py:284    |
+|       _OCRPipeline.check_model_settings_valid       |         /PaddleX/paddlex/inference/pipelines/ocr/pipeline.py:176        |
++-----------------------------------------------------+-------------------------------------------------------------------------------+
+                                           Detail Data
++------+----------------------------------------------------------------+-----------------------+
+| Step | Operation                                                      | Time                  |
++------+----------------------------------------------------------------+-----------------------+
+|  1   | _OCRPipeline.predict                                           | 375.11244628956774    |
+|  2   |     -> _OCRPipeline.get_model_settings                         | 0.00428391998866573   |
+|  3   |     -> _OCRPipeline.check_model_settings_valid                 | 0.0024828016466926783 |
+|  4   |     -> _OCRPipeline.get_text_det_params                        | 0.005152080120751634  |
+|  5   |     -> ReadImage                                               | 3.2029549301660154    |
+|  6   |     -> _DocPreprocessorPipeline.predict                        | 27.310913350374904    |
+|  7   |         -> _DocPreprocessorPipeline.get_model_settings         | 0.004107539862161502  |
+|  8   |         -> _DocPreprocessorPipeline.check_model_settings_valid | 0.0016830896493047476 |
+|  9   |         -> ReadImage                                           | 0.0029576495580840856 |
+|  10  |         -> ClasPredictor.apply                                 | 4.701614730001893     |
+|  11  |             -> ReadImage                                       | 0.13587839042884298   |
+|  12  |             -> ResizeByShort                                   | 0.3281894406245556    |
+|  13  |             -> Crop                                            | 0.01503000981756486   |
+|  14  |             -> Normalize                                       | 0.3884544402535539    |
+|  15  |             -> ToCHWImage                                      | 0.006330519245238975  |
+|  16  |             -> ToBatch                                         | 0.14169737987685949   |
+|  17  |             -> PaddleInferChainLegacy                          | 3.283889550511958     |
+|  18  |             -> Topk                                            | 0.10010718091507442   |
+|  19  |         -> WarpPredictor.apply                                 | 21.893062600429403    |
+|  20  |             -> ReadImage                                       | 0.004573430051095784  |
+|  21  |             -> Normalize                                       | 4.245691860560328     |
+|  22  |             -> ToCHWImage                                      | 0.005895959911867976  |
+|  23  |             -> ToBatch                                         | 1.7250755491841119    |
+|  24  |             -> PaddleInferChainLegacy                          | 10.887994960212382    |
+|  25  |             -> DocTrPostProcess                                | 1.4253830898087472    |
+|  26  |     -> TextDetPredictor.apply                                  | 49.976056129235076    |
+|  27  |         -> ReadImage                                           | 0.004843260976485908  |
+|  28  |         -> DetResizeForTest                                    | 3.3269549095712136    |
+|  29  |         -> NormalizeImage                                      | 2.9576204597833566    |
+|  30  |         -> ToCHWImage                                          | 0.005182310123927891  |
+|  31  |         -> ToBatch                                             | 1.046062790119322     |
+|  32  |         -> PaddleInferChainLegacy                              | 34.70224040953326     |
+|  33  |         -> DBPostProcess                                       | 5.826775671303039     |
+|  34  |     -> ClasPredictor.apply                                     | 23.43678753997665     |
+|  35  |         -> ReadImage                                           | 0.0633359991479665    |
+|  36  |         -> Resize                                              | 0.24419097986537963   |
+|  37  |         -> Normalize                                           | 0.480741420033155     |
+|  38  |         -> ToCHWImage                                          | 0.0066608507768251    |
+|  39  |         -> ToBatch                                             | 0.18536171046434902   |
+|  40  |         -> PaddleInferChainLegacy                              | 3.3766339404974133    |
+|  41  |         -> Topk                                                | 0.15909907990135252   |
+|  42  |         -> ReadImage                                           | 0.0395357194065582    |
+|  43  |         -> Resize                                              | 0.2085290702234488    |
+|  44  |         -> Normalize                                           | 0.4068155895220116    |
+|  45  |         -> ToCHWImage                                          | 0.005677459557773545  |
+|  46  |         -> ToBatch                                             | 0.11155156986205839   |
+|  47  |         -> PaddleInferChainLegacy                              | 2.7268862597702537    |
+|  48  |         -> Topk                                                | 0.13428127014776692   |
+|  49  |         -> ReadImage                                           | 0.032502070971531793  |
+|  50  |         -> Resize                                              | 0.20152631899691187   |
+|  51  |         -> Normalize                                           | 0.347195100330282     |
+|  52  |         -> ToCHWImage                                          | 0.005517759709618986  |
+|  53  |         -> ToBatch                                             | 0.10656953061698005   |
+|  54  |         -> PaddleInferChainLegacy                              | 2.612808299745666     |
+|  55  |         -> Topk                                                | 0.13188434022595175   |
+|  56  |         -> ReadImage                                           | 0.03589507090509869   |
+|  57  |         -> Resize                                              | 0.2076980892161373    |
+|  58  |         -> Normalize                                           | 0.3592138692329172    |
+|  59  |         -> ToCHWImage                                          | 0.005206359783187509  |
+|  60  |         -> ToBatch                                             | 0.1359267797670327    |
+|  61  |         -> PaddleInferChainLegacy                              | 2.619662079960108     |
+|  62  |         -> Topk                                                | 0.130717080028262     |
+|  63  |         -> ReadImage                                           | 0.038393009890569374  |
+|  64  |         -> Resize                                              | 0.19743553988519125   |
+|  65  |         -> Normalize                                           | 0.33197281998582184   |
+|  66  |         -> ToCHWImage                                          | 0.00512515107402578   |
+|  67  |         -> ToBatch                                             | 0.10293568033375777   |
+|  68  |         -> PaddleInferChainLegacy                              | 2.5824282996472903    |
+|  69  |         -> Topk                                                | 0.129485729848966     |
+|  70  |         -> ReadImage                                           | 0.04028105002362281   |
+|  71  |         -> Resize                                              | 0.10972122952807695   |
+|  72  |         -> Normalize                                           | 0.1787920702190604    |
+|  73  |         -> ToCHWImage                                          | 0.00408922991482541   |
+|  74  |         -> ToBatch                                             | 0.05458273953991011   |
+|  75  |         -> PaddleInferChainLegacy                              | 2.262636839877814     |
+|  76  |         -> Topk                                                | 0.1055472502775956    |
+|  77  |     -> _OCRPipeline.rotate_image                               | 0.05102259965497069   |
+|  78  |     -> TextRecPredictor.apply                                  | 169.44437422047486    |
+|  79  |         -> ReadImage                                           | 0.004737989947898313  |
+|  80  |         -> OCRReisizeNormImg                                   | 0.46037410967983305   |
+|  81  |         -> ToBatch                                             | 0.6405122207070235    |
+|  82  |         -> PaddleInferChainLegacy                              | 15.439773340767715    |
+|  83  |         -> CTCLabelDecode                                      | 10.742378439754248    |
+|  84  |         -> ReadImage                                           | 0.006349970353767276  |
+|  85  |         -> OCRReisizeNormImg                                   | 0.6252558408596087    |
+|  86  |         -> ToBatch                                             | 0.7338531101413537    |
+|  87  |         -> PaddleInferChainLegacy                              | 15.204189889482222    |
+|  88  |         -> CTCLabelDecode                                      | 6.7516070799320005    |
+|  89  |         -> ReadImage                                           | 0.006978959863772616  |
+|  90  |         -> OCRReisizeNormImg                                   | 0.7167729703360237    |
+|  91  |         -> ToBatch                                             | 0.6568272292497568    |
+|  92  |         -> PaddleInferChainLegacy                              | 14.973864750063512    |
+|  93  |         -> CTCLabelDecode                                      | 6.695752280211309     |
+|  94  |         -> ReadImage                                           | 0.0070425499870907515 |
+|  95  |         -> OCRReisizeNormImg                                   | 0.7757280093210284    |
+|  96  |         -> ToBatch                                             | 0.6442721793428063    |
+|  97  |         -> PaddleInferChainLegacy                              | 15.027350780292181    |
+|  98  |         -> CTCLabelDecode                                      | 6.661591530573787     |
+|  99  |         -> ReadImage                                           | 0.007066540565574542  |
+| 100  |         -> OCRReisizeNormImg                                   | 0.9195591000025161    |
+| 101  |         -> ToBatch                                             | 0.7951801503077149    |
+| 102  |         -> PaddleInferChainLegacy                              | 15.379044259898365    |
+| 103  |         -> CTCLabelDecode                                      | 9.372330370388227     |
+| 104  |         -> ReadImage                                           | 0.006225309771252796  |
+| 105  |         -> OCRReisizeNormImg                                   | 1.1437026296334807    |
+| 106  |         -> ToBatch                                             | 1.091715270158602     |
+| 107  |         -> PaddleInferChainLegacy                              | 23.505835609685164    |
+| 108  |         -> CTCLabelDecode                                      | 17.118994210031815    |
++------+----------------------------------------------------------------+-----------------------+
+                                      Summary Data
++-------+-----------------------------------------------------+-----------------------+
+| Level | Operation                                           | Time                  |
++-------+-----------------------------------------------------+-----------------------+
+|   1   | _OCRPipeline.predict                                | 375.11244628956774    |
+|       |                                                     |                       |
+|   2   | Layer                                               | 375.11244628956774    |
+|       | Core                                                | 273.4340275716386     |
+|       | Other                                               | 101.67841871792916    |
+|       | _OCRPipeline.get_model_settings                     | 0.00428391998866573   |
+|       | _OCRPipeline.check_model_settings_valid             | 0.0024828016466926783 |
+|       | _OCRPipeline.get_text_det_params                    | 0.005152080120751634  |
+|       | ReadImage                                           | 3.2029549301660154    |
+|       | _DocPreprocessorPipeline.predict                    | 27.310913350374904    |
+|       | TextDetPredictor.apply                              | 49.976056129235076    |
+|       | ClasPredictor.apply                                 | 23.43678753997665     |
+|       | _OCRPipeline.rotate_image                           | 0.05102259965497069   |
+|       | TextRecPredictor.apply                              | 169.44437422047486    |
+|       |                                                     |                       |
+|   3   | Layer                                               | 270.1681312400615     |
+|       | Core                                                | 261.8130224109336     |
+|       | Other                                               | 8.355108829127857     |
+|       | _DocPreprocessorPipeline.get_model_settings         | 0.004107539862161502  |
+|       | _DocPreprocessorPipeline.check_model_settings_valid | 0.0016830896493047476 |
+|       | ReadImage                                           | 0.29614515136927366   |
+|       | ClasPredictor.apply                                 | 4.701614730001893     |
+|       | WarpPredictor.apply                                 | 21.893062600429403    |
+|       | DetResizeForTest                                    | 3.3269549095712136    |
+|       | NormalizeImage                                      | 2.9576204597833566    |
+|       | ToCHWImage                                          | 0.03745912094018422   |
+|       | ToBatch                                             | 6.305350960610667     |
+|       | PaddleInferChainLegacy                              | 150.41335475922097    |
+|       | DBPostProcess                                       | 5.826775671303039     |
+|       | Resize                                              | 1.1691012277151458    |
+|       | Normalize                                           | 2.104730869323248     |
+|       | Topk                                                | 0.7910147504298948    |
+|       | OCRReisizeNormImg                                   | 4.641392659832491     |
+|       | CTCLabelDecode                                      | 57.342653910891386    |
+|       |                                                     |                       |
+|   4   | Layer                                               | 26.594677330431296    |
+|       | Core                                                | 22.69419176140218     |
+|       | Other                                               | 3.900485569029115     |
+|       | ReadImage                                           | 0.14045182047993876   |
+|       | ResizeByShort                                       | 0.3281894406245556    |
+|       | Crop                                                | 0.01503000981756486   |
+|       | Normalize                                           | 4.634146300813882     |
+|       | ToCHWImage                                          | 0.012226479157106951  |
+|       | ToBatch                                             | 1.8667729290609714    |
+|       | PaddleInferChainLegacy                              | 14.17188451072434     |
+|       | Topk                                                | 0.10010718091507442   |
+|       | DocTrPostProcess                                    | 1.4253830898087472    |
++-------+-----------------------------------------------------+-----------------------+
+```
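
The `Layer`, `Core`, and `Other` rows are not explained in the text, but the numbers suggest a simple relationship: `Core` looks like the sum of the operations listed at that level, and `Other` the remainder of `Layer`. The sketch below checks this for level 2 using values copied from the Summary Data table; this reading is inferred from the figures, not confirmed against `benchmark.py`.

```python
# Level-2 operation times (ms), copied from the Summary Data table.
level2_ops = {
    "_OCRPipeline.get_model_settings": 0.00428391998866573,
    "_OCRPipeline.check_model_settings_valid": 0.0024828016466926783,
    "_OCRPipeline.get_text_det_params": 0.005152080120751634,
    "ReadImage": 3.2029549301660154,
    "_DocPreprocessorPipeline.predict": 27.310913350374904,
    "TextDetPredictor.apply": 49.976056129235076,
    "ClasPredictor.apply": 23.43678753997665,
    "_OCRPipeline.rotate_image": 0.05102259965497069,
    "TextRecPredictor.apply": 169.44437422047486,
}
layer_time = 375.11244628956774  # "Layer" row for level 2

# Assumption: Core is the time covered by the listed operations.
core = sum(level2_ops.values())
# Assumption: Other is whatever part of the layer no listed operation covers.
other = layer_time - core

print(f"Core:  {core:.6f} ms")
print(f"Other: {other:.6f} ms")
```

Both computed values agree with the `Core` and `Other` rows for level 2 to within floating-point rounding. Under this reading, roughly a quarter of the level-2 time is spent outside any instrumented operation.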
+
+The results are saved locally to `./benchmark/detail.csv` and `./benchmark/summary.csv`.
+
+The content of `detail.csv` is as follows:
+
+```
+Step,Operation,Time
+1,_OCRPipeline.predict,375.11244628956774
+2,    -> _OCRPipeline.get_model_settings,0.00428391998866573
+3,    -> _OCRPipeline.check_model_settings_valid,0.0024828016466926783
+4,    -> _OCRPipeline.get_text_det_params,0.005152080120751634
+5,    -> ReadImage,3.2029549301660154
+6,    -> _DocPreprocessorPipeline.predict,27.310913350374904
+7,        -> _DocPreprocessorPipeline.get_model_settings,0.004107539862161502
+8,        -> _DocPreprocessorPipeline.check_model_settings_valid,0.0016830896493047476
+9,        -> ReadImage,0.0029576495580840856
+10,        -> ClasPredictor.apply,4.701614730001893
+11,            -> ReadImage,0.13587839042884298
+12,            -> ResizeByShort,0.3281894406245556
+13,            -> Crop,0.01503000981756486
+14,            -> Normalize,0.3884544402535539
+15,            -> ToCHWImage,0.006330519245238975
+16,            -> ToBatch,0.14169737987685949
+17,            -> PaddleInferChainLegacy,3.283889550511958
+18,            -> Topk,0.10010718091507442
+19,        -> WarpPredictor.apply,21.893062600429403
+20,            -> ReadImage,0.004573430051095784
+21,            -> Normalize,4.245691860560328
+22,            -> ToCHWImage,0.005895959911867976
+23,            -> ToBatch,1.7250755491841119
+24,            -> PaddleInferChainLegacy,10.887994960212382
+25,            -> DocTrPostProcess,1.4253830898087472
+26,    -> TextDetPredictor.apply,49.976056129235076
+27,        -> ReadImage,0.004843260976485908
+28,        -> DetResizeForTest,3.3269549095712136
+29,        -> NormalizeImage,2.9576204597833566
+30,        -> ToCHWImage,0.005182310123927891
+31,        -> ToBatch,1.046062790119322
+32,        -> PaddleInferChainLegacy,34.70224040953326
+33,        -> DBPostProcess,5.826775671303039
+34,    -> ClasPredictor.apply,23.43678753997665
+35,        -> ReadImage,0.0633359991479665
+36,        -> Resize,0.24419097986537963
+37,        -> Normalize,0.480741420033155
+38,        -> ToCHWImage,0.0066608507768251
+39,        -> ToBatch,0.18536171046434902
+40,        -> PaddleInferChainLegacy,3.3766339404974133
+41,        -> Topk,0.15909907990135252
+42,        -> ReadImage,0.0395357194065582
+43,        -> Resize,0.2085290702234488
+44,        -> Normalize,0.4068155895220116
+45,        -> ToCHWImage,0.005677459557773545
+46,        -> ToBatch,0.11155156986205839
+47,        -> PaddleInferChainLegacy,2.7268862597702537
+48,        -> Topk,0.13428127014776692
+49,        -> ReadImage,0.032502070971531793
+50,        -> Resize,0.20152631899691187
+51,        -> Normalize,0.347195100330282
+52,        -> ToCHWImage,0.005517759709618986
+53,        -> ToBatch,0.10656953061698005
+54,        -> PaddleInferChainLegacy,2.612808299745666
+55,        -> Topk,0.13188434022595175
+56,        -> ReadImage,0.03589507090509869
+57,        -> Resize,0.2076980892161373
+58,        -> Normalize,0.3592138692329172
+59,        -> ToCHWImage,0.005206359783187509
+60,        -> ToBatch,0.1359267797670327
+61,        -> PaddleInferChainLegacy,2.619662079960108
+62,        -> Topk,0.130717080028262
+63,        -> ReadImage,0.038393009890569374
+64,        -> Resize,0.19743553988519125
+65,        -> Normalize,0.33197281998582184
+66,        -> ToCHWImage,0.00512515107402578
+67,        -> ToBatch,0.10293568033375777
+68,        -> PaddleInferChainLegacy,2.5824282996472903
+69,        -> Topk,0.129485729848966
+70,        -> ReadImage,0.04028105002362281
+71,        -> Resize,0.10972122952807695
+72,        -> Normalize,0.1787920702190604
+73,        -> ToCHWImage,0.00408922991482541
+74,        -> ToBatch,0.05458273953991011
+75,        -> PaddleInferChainLegacy,2.262636839877814
+76,        -> Topk,0.1055472502775956
+77,    -> _OCRPipeline.rotate_image,0.05102259965497069
+78,    -> TextRecPredictor.apply,169.44437422047486
+79,        -> ReadImage,0.004737989947898313
+80,        -> OCRReisizeNormImg,0.46037410967983305
+81,        -> ToBatch,0.6405122207070235
+82,        -> PaddleInferChainLegacy,15.439773340767715
+83,        -> CTCLabelDecode,10.742378439754248
+84,        -> ReadImage,0.006349970353767276
+85,        -> OCRReisizeNormImg,0.6252558408596087
+86,        -> ToBatch,0.7338531101413537
+87,        -> PaddleInferChainLegacy,15.204189889482222
+88,        -> CTCLabelDecode,6.7516070799320005
+89,        -> ReadImage,0.006978959863772616
+90,        -> OCRReisizeNormImg,0.7167729703360237
+91,        -> ToBatch,0.6568272292497568
+92,        -> PaddleInferChainLegacy,14.973864750063512
+93,        -> CTCLabelDecode,6.695752280211309
+94,        -> ReadImage,0.0070425499870907515
+95,        -> OCRReisizeNormImg,0.7757280093210284
+96,        -> ToBatch,0.6442721793428063
+97,        -> PaddleInferChainLegacy,15.027350780292181
+98,        -> CTCLabelDecode,6.661591530573787
+99,        -> ReadImage,0.007066540565574542
+100,        -> OCRReisizeNormImg,0.9195591000025161
+101,        -> ToBatch,0.7951801503077149
+102,        -> PaddleInferChainLegacy,15.379044259898365
+103,        -> CTCLabelDecode,9.372330370388227
+104,        -> ReadImage,0.006225309771252796
+105,        -> OCRReisizeNormImg,1.1437026296334807
+106,        -> ToBatch,1.091715270158602
+107,        -> PaddleInferChainLegacy,23.505835609685164
+108,        -> CTCLabelDecode,17.118994210031815
+```
+
+The content of `summary.csv` is as follows:
+
+```
+Level, Operation, Time
+1, _OCRPipeline.predict, 375.11244628956774
+,,
+2, Layer, 375.11244628956774
+, Core, 273.4340275716386
+, Other, 101.67841871792916
+, _OCRPipeline.get_model_settings, 0.00428391998866573
+, _OCRPipeline.check_model_settings_valid, 0.0024828016466926783
+, _OCRPipeline.get_text_det_params, 0.005152080120751634
+, ReadImage, 3.2029549301660154
+, _DocPreprocessorPipeline.predict, 27.310913350374904
+, TextDetPredictor.apply, 49.976056129235076
+, ClasPredictor.apply, 23.43678753997665
+, _OCRPipeline.rotate_image, 0.05102259965497069
+, TextRecPredictor.apply, 169.44437422047486
+,,
+3, Layer, 270.1681312400615
+, Core, 261.8130224109336
+, Other, 8.355108829127857
+, _DocPreprocessorPipeline.get_model_settings, 0.004107539862161502
+, _DocPreprocessorPipeline.check_model_settings_valid, 0.0016830896493047476
+, ReadImage, 0.29614515136927366
+, ClasPredictor.apply, 4.701614730001893
+, WarpPredictor.apply, 21.893062600429403
+, DetResizeForTest, 3.3269549095712136
+, NormalizeImage, 2.9576204597833566
+, ToCHWImage, 0.03745912094018422
+, ToBatch, 6.305350960610667
+, PaddleInferChainLegacy, 150.41335475922097
+, DBPostProcess, 5.826775671303039
+, Resize, 1.1691012277151458
+, Normalize, 2.104730869323248
+, Topk, 0.7910147504298948
+, OCRReisizeNormImg, 4.641392659832491
+, CTCLabelDecode, 57.342653910891386
+,,
+4, Layer, 26.594677330431296
+, Core, 22.69419176140218
+, Other, 3.900485569029115
+, ReadImage, 0.14045182047993876
+, ResizeByShort, 0.3281894406245556
+, Crop, 0.01503000981756486
+, Normalize, 4.634146300813882
+, ToCHWImage, 0.012226479157106951
+, ToBatch, 1.8667729290609714
+, PaddleInferChainLegacy, 14.17188451072434
+, Topk, 0.10010718091507442
+, DocTrPostProcess, 1.4253830898087472
+```
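A short sketch of how the saved `summary.csv` might be post-processed: it groups the rows by level and reports how much of each layer's time the `Core` row covers. Reading `Core` as the time attributed to the listed operations and `Other` as the remainder is inferred from the sample numbers (`Core + Other = Layer`), not from the implementation. `load_summary` and `core_share` are hypothetical helper names, and the embedded sample is an excerpt of the rows above.

```python
import csv
import io

# Excerpt of the summary.csv shown above (spaces after commas as in the sample).
SAMPLE = """\
Level, Operation, Time
1, _OCRPipeline.predict, 375.11244628956774
,,
2, Layer, 375.11244628956774
, Core, 273.4340275716386
, Other, 101.67841871792916
,,
3, Layer, 270.1681312400615
, Core, 261.8130224109336
, Other, 8.355108829127857
"""


def load_summary(text):
    """Group summary rows by level: {level: {operation: time_ms}}."""
    levels = {}
    current = None
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the header row
    for row in reader:
        if len(row) < 3 or not row[1]:
            continue  # ",," separator row between levels
        level, operation, time_ms = row[0], row[1], row[2]
        if level:  # an empty Level cell means "same level as the row above"
            current = int(level)
        levels.setdefault(current, {})[operation] = float(time_ms)
    return levels


def core_share(levels):
    """Fraction of each layer's time that the Core row accounts for."""
    return {
        level: ops["Core"] / ops["Layer"]
        for level, ops in levels.items()
        if "Core" in ops and "Layer" in ops
    }


levels = load_summary(SAMPLE)
shares = core_share(levels)
for level, share in sorted(shares.items()):
    print(f"level {level}: Core covers {share:.1%} of the layer time")
```

A large `Other` value at a level suggests time spent in code that is not instrumented at that level, which is where additional timing hooks would be most informative.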

+ 460 - 0
docs/pipeline_usage/instructions/benchmark.md

@@ -0,0 +1,460 @@
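+# Pipeline Inference Benchmark
+
+## Table of Contents
+
+- [1. Instructions](#1.-Instructions)
+- [2. Usage Examples](#2.-Usage-Examples)
+
+## 1. Instructions
+
+The benchmark feature measures the average execution time of every operation in a pipeline's end-to-end inference process and provides summary information. Times are reported in milliseconds.
+
+The benchmark feature is enabled through an environment variable:
+
+* `PADDLE_PDX_PIPELINE_BENCHMARK`: set to `True` to enable the benchmark feature; the default is `False`.
+
+The methods and parameters of the pipeline inference benchmark are described below:
+
+<table border="1" width="100%" cellpadding="5">
+    <tr>
+        <th>Method Name</th>
+        <th>Description</th>
+    </tr>
+    <tr>
+        <td><code>start_warmup()</code></td>
+        <td>Starts the benchmark warmup.</td>
+    </tr>
+    <tr>
+        <td><code>stop_warmup()</code></td>
+        <td>Ends the warmup and clears all benchmark data generated during warmup.</td>
+    </tr>
+    <tr>
+        <td><code>print_detail_data()</code></td>
+        <td>Prints the detailed benchmark data, including each operation's order (Step), name (Operation), and average time (Time).</td>
+    </tr>
+    <tr>
+        <td><code>print_summary_data()</code></td>
+        <td>Prints the summary benchmark data, including each operation's level (Level), name (Operation), and total average time (Time). The Time at Level 1 is the overall average time.</td>
+    </tr>
+    <tr>
+        <td><code>print_operation_info()</code></td>
+        <td>Prints the source code location of each operation.</td>
+    </tr>
+    <tr>
+        <td><code>print_pipeline_data()</code></td>
+        <td>Prints the benchmark detail data, summary data, and operation_info data to the console.</td>
+    </tr>
+    <tr>
+        <td><code>save_pipeline_data(save_path)</code></td>
+        <td><code>save_path</code>: string <br />Path under which the benchmark data is saved: the detailed benchmark data in <code>detail.csv</code> and the summary benchmark data in <code>summary.csv</code>.</td>
+    </tr>
+    <tr>
+        <td><code>reset()</code></td>
+        <td>Clears the existing benchmark data.</td>
+    </tr>
+</table>
+
+**Note**:
+
+* For the single-model inference benchmark, see [Model Inference Benchmark](../../module_usage/instructions/benchmark.md).
+
+## 2. Usage Examples
+
+Create the `test_infer.py` script: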
+
+```python
+from paddlex import create_pipeline
+from paddlex.inference.utils.benchmark import benchmark
+
+pipeline = create_pipeline("OCR", device="gpu")
+image = "general_ocr_002.png"
+
+benchmark.start_warmup()  # start warmup
+for _ in range(50):
+    list(pipeline.predict(image))
+benchmark.stop_warmup()  # end warmup; data collected during warmup is cleared
+
+for _ in range(100):  # timed runs
+    list(pipeline.predict(image))
+
+benchmark.print_pipeline_data()  # print the detail, summary, and operation info data
+benchmark.save_pipeline_data("./benchmark")  # save the benchmark data to the benchmark directory
+```
+
+Run the script:
+
+```bash
+PADDLE_PDX_PIPELINE_BENCHMARK=True python test_infer.py
+```
+
+Running the example program produces the following benchmark results:
+
+```
+                                                             Operation Info
++-----------------------------------------------------+-------------------------------------------------------------------------------+
+|                      Operation                      |                              Source Code Location                             |
++-----------------------------------------------------+-------------------------------------------------------------------------------+
+|                      ReadImage                      |       /PaddleX/paddlex/inference/common/reader/image_reader.py:47       |
+|                   DocTrPostProcess                  |    /PaddleX/paddlex/inference/models/image_unwarping/processors.py:51   |
+|                   DetResizeForTest                  |    /PaddleX/paddlex/inference/models/text_detection/processors.py:58    |
+|     _DocPreprocessorPipeline.get_model_settings     |  /PaddleX/paddlex/inference/pipelines/doc_preprocessor/pipeline.py:110  |
+|                         Crop                        | /PaddleX/paddlex/inference/models/image_classification/processors.py:45 |
+|                PaddleInferChainLegacy               |       /PaddleX/paddlex/inference/models/common/static_infer.py:248      |
+|              _OCRPipeline.rotate_image              |         /PaddleX/paddlex/inference/pipelines/ocr/pipeline.py:140        |
+|           _DocPreprocessorPipeline.predict          |  /PaddleX/paddlex/inference/pipelines/doc_preprocessor/pipeline.py:133  |
+|                 ClasPredictor.apply                 |  /PaddleX/paddlex/inference/models/base/predictor/base_predictor.py:213 |
+|                 WarpPredictor.apply                 |  /PaddleX/paddlex/inference/models/base/predictor/base_predictor.py:213 |
+|                TextDetPredictor.apply               |  /PaddleX/paddlex/inference/models/base/predictor/base_predictor.py:213 |
+|                    ResizeByShort                    |    /PaddleX/paddlex/inference/models/common/vision/processors.py:203    |
+|                      Normalize                      |    /PaddleX/paddlex/inference/models/common/vision/processors.py:268    |
+|                    DBPostProcess                    |    /PaddleX/paddlex/inference/models/text_detection/processors.py:487   |
+|                TextRecPredictor.apply               |  /PaddleX/paddlex/inference/models/base/predictor/base_predictor.py:213 |
+|                    NormalizeImage                   |    /PaddleX/paddlex/inference/models/text_detection/processors.py:252   |
+| _DocPreprocessorPipeline.check_model_settings_valid |   /PaddleX/paddlex/inference/pipelines/doc_preprocessor/pipeline.py:82  |
+|           _OCRPipeline.get_model_settings           |         /PaddleX/paddlex/inference/pipelines/ocr/pipeline.py:204        |
+|           _OCRPipeline.get_text_det_params          |         /PaddleX/paddlex/inference/pipelines/ocr/pipeline.py:236        |
+|                        Resize                       |    /PaddleX/paddlex/inference/models/common/vision/processors.py:117    |
+|                    CTCLabelDecode                   |   /PaddleX/paddlex/inference/models/text_recognition/processors.py:189  |
+|                         Topk                        | /PaddleX/paddlex/inference/models/image_classification/processors.py:83 |
+|                  OCRReisizeNormImg                  |   /PaddleX/paddlex/inference/models/text_recognition/processors.py:65   |
+|                       ToBatch                       |   /PaddleX/paddlex/inference/models/text_recognition/processors.py:235  |
+|                      ToCHWImage                     |    /PaddleX/paddlex/inference/models/common/vision/processors.py:277    |
+|                 _OCRPipeline.predict                |         /PaddleX/paddlex/inference/pipelines/ocr/pipeline.py:282        |
+|                       ToBatch                       |    /PaddleX/paddlex/inference/models/common/vision/processors.py:284    |
+|       _OCRPipeline.check_model_settings_valid       |         /PaddleX/paddlex/inference/pipelines/ocr/pipeline.py:176        |
++-----------------------------------------------------+-------------------------------------------------------------------------------+
+                                           Detail Data
++------+----------------------------------------------------------------+-----------------------+
+| Step | Operation                                                      | Time                  |
++------+----------------------------------------------------------------+-----------------------+
+|  1   | _OCRPipeline.predict                                           | 375.11244628956774    |
+|  2   |     -> _OCRPipeline.get_model_settings                         | 0.00428391998866573   |
+|  3   |     -> _OCRPipeline.check_model_settings_valid                 | 0.0024828016466926783 |
+|  4   |     -> _OCRPipeline.get_text_det_params                        | 0.005152080120751634  |
+|  5   |     -> ReadImage                                               | 3.2029549301660154    |
+|  6   |     -> _DocPreprocessorPipeline.predict                        | 27.310913350374904    |
+|  7   |         -> _DocPreprocessorPipeline.get_model_settings         | 0.004107539862161502  |
+|  8   |         -> _DocPreprocessorPipeline.check_model_settings_valid | 0.0016830896493047476 |
+|  9   |         -> ReadImage                                           | 0.0029576495580840856 |
+|  10  |         -> ClasPredictor.apply                                 | 4.701614730001893     |
+|  11  |             -> ReadImage                                       | 0.13587839042884298   |
+|  12  |             -> ResizeByShort                                   | 0.3281894406245556    |
+|  13  |             -> Crop                                            | 0.01503000981756486   |
+|  14  |             -> Normalize                                       | 0.3884544402535539    |
+|  15  |             -> ToCHWImage                                      | 0.006330519245238975  |
+|  16  |             -> ToBatch                                         | 0.14169737987685949   |
+|  17  |             -> PaddleInferChainLegacy                          | 3.283889550511958     |
+|  18  |             -> Topk                                            | 0.10010718091507442   |
+|  19  |         -> WarpPredictor.apply                                 | 21.893062600429403    |
+|  20  |             -> ReadImage                                       | 0.004573430051095784  |
+|  21  |             -> Normalize                                       | 4.245691860560328     |
+|  22  |             -> ToCHWImage                                      | 0.005895959911867976  |
+|  23  |             -> ToBatch                                         | 1.7250755491841119    |
+|  24  |             -> PaddleInferChainLegacy                          | 10.887994960212382    |
+|  25  |             -> DocTrPostProcess                                | 1.4253830898087472    |
+|  26  |     -> TextDetPredictor.apply                                  | 49.976056129235076    |
+|  27  |         -> ReadImage                                           | 0.004843260976485908  |
+|  28  |         -> DetResizeForTest                                    | 3.3269549095712136    |
+|  29  |         -> NormalizeImage                                      | 2.9576204597833566    |
+|  30  |         -> ToCHWImage                                          | 0.005182310123927891  |
+|  31  |         -> ToBatch                                             | 1.046062790119322     |
+|  32  |         -> PaddleInferChainLegacy                              | 34.70224040953326     |
+|  33  |         -> DBPostProcess                                       | 5.826775671303039     |
+|  34  |     -> ClasPredictor.apply                                     | 23.43678753997665     |
+|  35  |         -> ReadImage                                           | 0.0633359991479665    |
+|  36  |         -> Resize                                              | 0.24419097986537963   |
+|  37  |         -> Normalize                                           | 0.480741420033155     |
+|  38  |         -> ToCHWImage                                          | 0.0066608507768251    |
+|  39  |         -> ToBatch                                             | 0.18536171046434902   |
+|  40  |         -> PaddleInferChainLegacy                              | 3.3766339404974133    |
+|  41  |         -> Topk                                                | 0.15909907990135252   |
+|  42  |         -> ReadImage                                           | 0.0395357194065582    |
+|  43  |         -> Resize                                              | 0.2085290702234488    |
+|  44  |         -> Normalize                                           | 0.4068155895220116    |
+|  45  |         -> ToCHWImage                                          | 0.005677459557773545  |
+|  46  |         -> ToBatch                                             | 0.11155156986205839   |
+|  47  |         -> PaddleInferChainLegacy                              | 2.7268862597702537    |
+|  48  |         -> Topk                                                | 0.13428127014776692   |
+|  49  |         -> ReadImage                                           | 0.032502070971531793  |
+|  50  |         -> Resize                                              | 0.20152631899691187   |
+|  51  |         -> Normalize                                           | 0.347195100330282     |
+|  52  |         -> ToCHWImage                                          | 0.005517759709618986  |
+|  53  |         -> ToBatch                                             | 0.10656953061698005   |
+|  54  |         -> PaddleInferChainLegacy                              | 2.612808299745666     |
+|  55  |         -> Topk                                                | 0.13188434022595175   |
+|  56  |         -> ReadImage                                           | 0.03589507090509869   |
+|  57  |         -> Resize                                              | 0.2076980892161373    |
+|  58  |         -> Normalize                                           | 0.3592138692329172    |
+|  59  |         -> ToCHWImage                                          | 0.005206359783187509  |
+|  60  |         -> ToBatch                                             | 0.1359267797670327    |
+|  61  |         -> PaddleInferChainLegacy                              | 2.619662079960108     |
+|  62  |         -> Topk                                                | 0.130717080028262     |
+|  63  |         -> ReadImage                                           | 0.038393009890569374  |
+|  64  |         -> Resize                                              | 0.19743553988519125   |
+|  65  |         -> Normalize                                           | 0.33197281998582184   |
+|  66  |         -> ToCHWImage                                          | 0.00512515107402578   |
+|  67  |         -> ToBatch                                             | 0.10293568033375777   |
+|  68  |         -> PaddleInferChainLegacy                              | 2.5824282996472903    |
+|  69  |         -> Topk                                                | 0.129485729848966     |
+|  70  |         -> ReadImage                                           | 0.04028105002362281   |
+|  71  |         -> Resize                                              | 0.10972122952807695   |
+|  72  |         -> Normalize                                           | 0.1787920702190604    |
+|  73  |         -> ToCHWImage                                          | 0.00408922991482541   |
+|  74  |         -> ToBatch                                             | 0.05458273953991011   |
+|  75  |         -> PaddleInferChainLegacy                              | 2.262636839877814     |
+|  76  |         -> Topk                                                | 0.1055472502775956    |
+|  77  |     -> _OCRPipeline.rotate_image                               | 0.05102259965497069   |
+|  78  |     -> TextRecPredictor.apply                                  | 169.44437422047486    |
+|  79  |         -> ReadImage                                           | 0.004737989947898313  |
+|  80  |         -> OCRReisizeNormImg                                   | 0.46037410967983305   |
+|  81  |         -> ToBatch                                             | 0.6405122207070235    |
+|  82  |         -> PaddleInferChainLegacy                              | 15.439773340767715    |
+|  83  |         -> CTCLabelDecode                                      | 10.742378439754248    |
+|  84  |         -> ReadImage                                           | 0.006349970353767276  |
+|  85  |         -> OCRReisizeNormImg                                   | 0.6252558408596087    |
+|  86  |         -> ToBatch                                             | 0.7338531101413537    |
+|  87  |         -> PaddleInferChainLegacy                              | 15.204189889482222    |
+|  88  |         -> CTCLabelDecode                                      | 6.7516070799320005    |
+|  89  |         -> ReadImage                                           | 0.006978959863772616  |
+|  90  |         -> OCRReisizeNormImg                                   | 0.7167729703360237    |
+|  91  |         -> ToBatch                                             | 0.6568272292497568    |
+|  92  |         -> PaddleInferChainLegacy                              | 14.973864750063512    |
+|  93  |         -> CTCLabelDecode                                      | 6.695752280211309     |
+|  94  |         -> ReadImage                                           | 0.0070425499870907515 |
+|  95  |         -> OCRReisizeNormImg                                   | 0.7757280093210284    |
+|  96  |         -> ToBatch                                             | 0.6442721793428063    |
+|  97  |         -> PaddleInferChainLegacy                              | 15.027350780292181    |
+|  98  |         -> CTCLabelDecode                                      | 6.661591530573787     |
+|  99  |         -> ReadImage                                           | 0.007066540565574542  |
+| 100  |         -> OCRReisizeNormImg                                   | 0.9195591000025161    |
+| 101  |         -> ToBatch                                             | 0.7951801503077149    |
+| 102  |         -> PaddleInferChainLegacy                              | 15.379044259898365    |
+| 103  |         -> CTCLabelDecode                                      | 9.372330370388227     |
+| 104  |         -> ReadImage                                           | 0.006225309771252796  |
+| 105  |         -> OCRReisizeNormImg                                   | 1.1437026296334807    |
+| 106  |         -> ToBatch                                             | 1.091715270158602     |
+| 107  |         -> PaddleInferChainLegacy                              | 23.505835609685164    |
+| 108  |         -> CTCLabelDecode                                      | 17.118994210031815    |
++------+----------------------------------------------------------------+-----------------------+
+                                      Summary Data
++-------+-----------------------------------------------------+-----------------------+
+| Level | Operation                                           | Time                  |
++-------+-----------------------------------------------------+-----------------------+
+|   1   | _OCRPipeline.predict                                | 375.11244628956774    |
+|       |                                                     |                       |
+|   2   | Layer                                               | 375.11244628956774    |
+|       | Core                                                | 273.4340275716386     |
+|       | Other                                               | 101.67841871792916    |
+|       | _OCRPipeline.get_model_settings                     | 0.00428391998866573   |
+|       | _OCRPipeline.check_model_settings_valid             | 0.0024828016466926783 |
+|       | _OCRPipeline.get_text_det_params                    | 0.005152080120751634  |
+|       | ReadImage                                           | 3.2029549301660154    |
+|       | _DocPreprocessorPipeline.predict                    | 27.310913350374904    |
+|       | TextDetPredictor.apply                              | 49.976056129235076    |
+|       | ClasPredictor.apply                                 | 23.43678753997665     |
+|       | _OCRPipeline.rotate_image                           | 0.05102259965497069   |
+|       | TextRecPredictor.apply                              | 169.44437422047486    |
+|       |                                                     |                       |
+|   3   | Layer                                               | 270.1681312400615     |
+|       | Core                                                | 261.8130224109336     |
+|       | Other                                               | 8.355108829127857     |
+|       | _DocPreprocessorPipeline.get_model_settings         | 0.004107539862161502  |
+|       | _DocPreprocessorPipeline.check_model_settings_valid | 0.0016830896493047476 |
+|       | ReadImage                                           | 0.29614515136927366   |
+|       | ClasPredictor.apply                                 | 4.701614730001893     |
+|       | WarpPredictor.apply                                 | 21.893062600429403    |
+|       | DetResizeForTest                                    | 3.3269549095712136    |
+|       | NormalizeImage                                      | 2.9576204597833566    |
+|       | ToCHWImage                                          | 0.03745912094018422   |
+|       | ToBatch                                             | 6.305350960610667     |
+|       | PaddleInferChainLegacy                              | 150.41335475922097    |
+|       | DBPostProcess                                       | 5.826775671303039     |
+|       | Resize                                              | 1.1691012277151458    |
+|       | Normalize                                           | 2.104730869323248     |
+|       | Topk                                                | 0.7910147504298948    |
+|       | OCRReisizeNormImg                                   | 4.641392659832491     |
+|       | CTCLabelDecode                                      | 57.342653910891386    |
+|       |                                                     |                       |
+|   4   | Layer                                               | 26.594677330431296    |
+|       | Core                                                | 22.69419176140218     |
+|       | Other                                               | 3.900485569029115     |
+|       | ReadImage                                           | 0.14045182047993876   |
+|       | ResizeByShort                                       | 0.3281894406245556    |
+|       | Crop                                                | 0.01503000981756486   |
+|       | Normalize                                           | 4.634146300813882     |
+|       | ToCHWImage                                          | 0.012226479157106951  |
+|       | ToBatch                                             | 1.8667729290609714    |
+|       | PaddleInferChainLegacy                              | 14.17188451072434     |
+|       | Topk                                                | 0.10010718091507442   |
+|       | DocTrPostProcess                                    | 1.4253830898087472    |
++-------+-----------------------------------------------------+-----------------------+
+```
+
+The results above are saved locally to `./benchmark/detail.csv` and `./benchmark/summary.csv`.
+
+The content of `detail.csv` is as follows:
+
+```
+Step,Operation,Time
+1,_OCRPipeline.predict,375.11244628956774
+2,    -> _OCRPipeline.get_model_settings,0.00428391998866573
+3,    -> _OCRPipeline.check_model_settings_valid,0.0024828016466926783
+4,    -> _OCRPipeline.get_text_det_params,0.005152080120751634
+5,    -> ReadImage,3.2029549301660154
+6,    -> _DocPreprocessorPipeline.predict,27.310913350374904
+7,        -> _DocPreprocessorPipeline.get_model_settings,0.004107539862161502
+8,        -> _DocPreprocessorPipeline.check_model_settings_valid,0.0016830896493047476
+9,        -> ReadImage,0.0029576495580840856
+10,        -> ClasPredictor.apply,4.701614730001893
+11,            -> ReadImage,0.13587839042884298
+12,            -> ResizeByShort,0.3281894406245556
+13,            -> Crop,0.01503000981756486
+14,            -> Normalize,0.3884544402535539
+15,            -> ToCHWImage,0.006330519245238975
+16,            -> ToBatch,0.14169737987685949
+17,            -> PaddleInferChainLegacy,3.283889550511958
+18,            -> Topk,0.10010718091507442
+19,        -> WarpPredictor.apply,21.893062600429403
+20,            -> ReadImage,0.004573430051095784
+21,            -> Normalize,4.245691860560328
+22,            -> ToCHWImage,0.005895959911867976
+23,            -> ToBatch,1.7250755491841119
+24,            -> PaddleInferChainLegacy,10.887994960212382
+25,            -> DocTrPostProcess,1.4253830898087472
+26,    -> TextDetPredictor.apply,49.976056129235076
+27,        -> ReadImage,0.004843260976485908
+28,        -> DetResizeForTest,3.3269549095712136
+29,        -> NormalizeImage,2.9576204597833566
+30,        -> ToCHWImage,0.005182310123927891
+31,        -> ToBatch,1.046062790119322
+32,        -> PaddleInferChainLegacy,34.70224040953326
+33,        -> DBPostProcess,5.826775671303039
+34,    -> ClasPredictor.apply,23.43678753997665
+35,        -> ReadImage,0.0633359991479665
+36,        -> Resize,0.24419097986537963
+37,        -> Normalize,0.480741420033155
+38,        -> ToCHWImage,0.0066608507768251
+39,        -> ToBatch,0.18536171046434902
+40,        -> PaddleInferChainLegacy,3.3766339404974133
+41,        -> Topk,0.15909907990135252
+42,        -> ReadImage,0.0395357194065582
+43,        -> Resize,0.2085290702234488
+44,        -> Normalize,0.4068155895220116
+45,        -> ToCHWImage,0.005677459557773545
+46,        -> ToBatch,0.11155156986205839
+47,        -> PaddleInferChainLegacy,2.7268862597702537
+48,        -> Topk,0.13428127014776692
+49,        -> ReadImage,0.032502070971531793
+50,        -> Resize,0.20152631899691187
+51,        -> Normalize,0.347195100330282
+52,        -> ToCHWImage,0.005517759709618986
+53,        -> ToBatch,0.10656953061698005
+54,        -> PaddleInferChainLegacy,2.612808299745666
+55,        -> Topk,0.13188434022595175
+56,        -> ReadImage,0.03589507090509869
+57,        -> Resize,0.2076980892161373
+58,        -> Normalize,0.3592138692329172
+59,        -> ToCHWImage,0.005206359783187509
+60,        -> ToBatch,0.1359267797670327
+61,        -> PaddleInferChainLegacy,2.619662079960108
+62,        -> Topk,0.130717080028262
+63,        -> ReadImage,0.038393009890569374
+64,        -> Resize,0.19743553988519125
+65,        -> Normalize,0.33197281998582184
+66,        -> ToCHWImage,0.00512515107402578
+67,        -> ToBatch,0.10293568033375777
+68,        -> PaddleInferChainLegacy,2.5824282996472903
+69,        -> Topk,0.129485729848966
+70,        -> ReadImage,0.04028105002362281
+71,        -> Resize,0.10972122952807695
+72,        -> Normalize,0.1787920702190604
+73,        -> ToCHWImage,0.00408922991482541
+74,        -> ToBatch,0.05458273953991011
+75,        -> PaddleInferChainLegacy,2.262636839877814
+76,        -> Topk,0.1055472502775956
+77,    -> _OCRPipeline.rotate_image,0.05102259965497069
+78,    -> TextRecPredictor.apply,169.44437422047486
+79,        -> ReadImage,0.004737989947898313
+80,        -> OCRReisizeNormImg,0.46037410967983305
+81,        -> ToBatch,0.6405122207070235
+82,        -> PaddleInferChainLegacy,15.439773340767715
+83,        -> CTCLabelDecode,10.742378439754248
+84,        -> ReadImage,0.006349970353767276
+85,        -> OCRReisizeNormImg,0.6252558408596087
+86,        -> ToBatch,0.7338531101413537
+87,        -> PaddleInferChainLegacy,15.204189889482222
+88,        -> CTCLabelDecode,6.7516070799320005
+89,        -> ReadImage,0.006978959863772616
+90,        -> OCRReisizeNormImg,0.7167729703360237
+91,        -> ToBatch,0.6568272292497568
+92,        -> PaddleInferChainLegacy,14.973864750063512
+93,        -> CTCLabelDecode,6.695752280211309
+94,        -> ReadImage,0.0070425499870907515
+95,        -> OCRReisizeNormImg,0.7757280093210284
+96,        -> ToBatch,0.6442721793428063
+97,        -> PaddleInferChainLegacy,15.027350780292181
+98,        -> CTCLabelDecode,6.661591530573787
+99,        -> ReadImage,0.007066540565574542
+100,        -> OCRReisizeNormImg,0.9195591000025161
+101,        -> ToBatch,0.7951801503077149
+102,        -> PaddleInferChainLegacy,15.379044259898365
+103,        -> CTCLabelDecode,9.372330370388227
+104,        -> ReadImage,0.006225309771252796
+105,        -> OCRReisizeNormImg,1.1437026296334807
+106,        -> ToBatch,1.091715270158602
+107,        -> PaddleInferChainLegacy,23.505835609685164
+108,        -> CTCLabelDecode,17.118994210031815
+```
+
+The content of `summary.csv` is as follows:
+
+```
+Level,Operation,Time
+1,_OCRPipeline.predict,375.11244628956774
+,,
+2,Layer,375.11244628956774
+,Core,273.4340275716386
+,Other,101.67841871792916
+,_OCRPipeline.get_model_settings,0.00428391998866573
+,_OCRPipeline.check_model_settings_valid,0.0024828016466926783
+,_OCRPipeline.get_text_det_params,0.005152080120751634
+,ReadImage,3.2029549301660154
+,_DocPreprocessorPipeline.predict,27.310913350374904
+,TextDetPredictor.apply,49.976056129235076
+,ClasPredictor.apply,23.43678753997665
+,_OCRPipeline.rotate_image,0.05102259965497069
+,TextRecPredictor.apply,169.44437422047486
+,,
+3,Layer,270.1681312400615
+,Core,261.8130224109336
+,Other,8.355108829127857
+,_DocPreprocessorPipeline.get_model_settings,0.004107539862161502
+,_DocPreprocessorPipeline.check_model_settings_valid,0.0016830896493047476
+,ReadImage,0.29614515136927366
+,ClasPredictor.apply,4.701614730001893
+,WarpPredictor.apply,21.893062600429403
+,DetResizeForTest,3.3269549095712136
+,NormalizeImage,2.9576204597833566
+,ToCHWImage,0.03745912094018422
+,ToBatch,6.305350960610667
+,PaddleInferChainLegacy,150.41335475922097
+,DBPostProcess,5.826775671303039
+,Resize,1.1691012277151458
+,Normalize,2.104730869323248
+,Topk,0.7910147504298948
+,OCRReisizeNormImg,4.641392659832491
+,CTCLabelDecode,57.342653910891386
+,,
+4,Layer,26.594677330431296
+,Core,22.69419176140218
+,Other,3.900485569029115
+,ReadImage,0.14045182047993876
+,ResizeByShort,0.3281894406245556
+,Crop,0.01503000981756486
+,Normalize,4.634146300813882
+,ToCHWImage,0.012226479157106951
+,ToBatch,1.8667729290609714
+,PaddleInferChainLegacy,14.17188451072434
+,Topk,0.10010718091507442
+,DocTrPostProcess,1.4253830898087472
+```
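In the sample above, the `Core` row at each level appears to be the sum of the per-operation rows, and `Other` appears to be `Layer` minus `Core`. Times are in milliseconds, since `_update` multiplies the elapsed seconds by 1000. A minimal sketch that checks this relationship on the level-2 rows, with the values copied from the sample and all variable names chosen only for illustration:

```python
import csv
import io

# Level-2 rows copied verbatim from the sample summary.csv.
SUMMARY_CSV = """Level,Operation,Time
2,Layer,375.11244628956774
,Core,273.4340275716386
,Other,101.67841871792916
,_OCRPipeline.get_model_settings,0.00428391998866573
,_OCRPipeline.check_model_settings_valid,0.0024828016466926783
,_OCRPipeline.get_text_det_params,0.005152080120751634
,ReadImage,3.2029549301660154
,_DocPreprocessorPipeline.predict,27.310913350374904
,TextDetPredictor.apply,49.976056129235076
,ClasPredictor.apply,23.43678753997665
,_OCRPipeline.rotate_image,0.05102259965497069
,TextRecPredictor.apply,169.44437422047486
"""

rows = list(csv.DictReader(io.StringIO(SUMMARY_CSV)))
times = {row["Operation"]: float(row["Time"]) for row in rows}

# Everything except the three aggregate rows is a timed operation.
aggregate_keys = {"Layer", "Core", "Other"}
op_total = sum(t for name, t in times.items() if name not in aggregate_keys)

# Core should equal the sum of the timed operations at this level.
core_matches_ops = abs(times["Core"] - op_total) < 1e-6
# Other should be the remainder of the layer time not covered by Core.
other_is_remainder = abs(times["Layer"] - times["Core"] - times["Other"]) < 1e-6

print(core_matches_ops, other_is_remainder)
```

Read this way, a large `Other` value at a level points to time spent in code that is not wrapped by any timed operation.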

+ 1 - 0
mkdocs.yml

@@ -421,6 +421,7 @@ nav:
          - PaddleX通用模型配置文件参数说明: module_usage/instructions/config_parameters_common.md
          - PaddleX时序任务模型配置文件参数说明: module_usage/instructions/config_parameters_time_series.md
          - 模型推理 Benchmark: module_usage/instructions/benchmark.md
+         - 产线推理 Benchmark: pipeline_usage/instructions/benchmark.md
   - 模型产线部署:
        - 高性能推理: pipeline_deploy/high_performance_inference.md
        - 服务化部署: pipeline_deploy/serving.md

+ 8 - 0
paddlex/inference/models/base/predictor/base_predictor.py

@@ -27,6 +27,7 @@ from .....utils.flags import (
     INFER_BENCHMARK,
     INFER_BENCHMARK_ITERS,
     INFER_BENCHMARK_WARMUP,
+    PIPELINE_BENCHMARK,
 )
 from .....utils.subclass_register import AutoRegisterABCMetaClass
 from ....common.batch_sampler import BaseBatchSampler
@@ -207,6 +208,13 @@ class BasePredictor(
                 benchmark.collect(batch_size)
 
             yield output[0]
+        elif PIPELINE_BENCHMARK:
+
+            @benchmark.timeit_with_options(name=type(self).__name__ + ".apply")
+            def _apply(input, **kwargs):
+                return list(self.apply(input, **kwargs))
+
+            yield from _apply(input, **kwargs)
         else:
             yield from self.apply(input, **kwargs)
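The new `PIPELINE_BENCHMARK` branch wraps `self.apply` in a helper that materializes the results with `list(...)`. A plausible reading is that calling a generator function returns almost immediately, so timing the call alone would miss the actual work. The sketch below illustrates the difference; `timed`, `slow_items`, `lazy`, and `eager` are hypothetical names for this example and are not PaddleX APIs.

```python
import time


def timed(func):
    """Hypothetical timing decorator: stores the elapsed seconds of the last call."""

    def wrapper(*args, **kwargs):
        tic = time.perf_counter()
        result = func(*args, **kwargs)
        wrapper.elapsed = time.perf_counter() - tic
        return result

    wrapper.elapsed = None
    return wrapper


def slow_items(n):
    """A lazy generator in which each item costs roughly 10 ms."""
    for i in range(n):
        time.sleep(0.01)
        yield i


@timed
def lazy(n):
    # Returns a generator object; none of the sleeps has run yet.
    return slow_items(n)


@timed
def eager(n):
    # list() drives the generator to completion inside the timed region.
    return list(slow_items(n))


lazy_result = lazy(5)
eager_result = eager(5)
print(f"lazy:  {lazy.elapsed * 1000:.2f} ms")
print(f"eager: {eager.elapsed * 1000:.2f} ms")
```

The trade-off is that materializing the output gives up laziness: all results are held in memory before the first one is yielded.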
 

+ 2 - 0
paddlex/inference/pipelines/anomaly_detection/pipeline.py

@@ -18,12 +18,14 @@ import numpy as np
 
 from ....utils.deps import pipeline_requires_extra
 from ...models.anomaly_detection.result import UadResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from .._parallel import AutoParallelImageSimpleInferencePipeline
 from ..base import BasePipeline
 
 
+@benchmark.time_methods
 class _AnomalyDetectionPipeline(BasePipeline):
     """Image AnomalyDetectionPipeline Pipeline"""
 

+ 2 - 0
paddlex/inference/pipelines/attribute_recognition/pipeline.py

@@ -19,6 +19,7 @@ import numpy as np
 from ....utils.deps import pipeline_requires_extra
 from ...common.batch_sampler import ImageBatchSampler
 from ...common.reader import ReadImage
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from .._parallel import AutoParallelImageSimpleInferencePipeline
@@ -27,6 +28,7 @@ from ..components import CropByBoxes
 from .result import AttributeRecResult
 
 
+@benchmark.time_methods
 class _AttributeRecPipeline(BasePipeline):
     """Attribute Rec Pipeline"""
 

+ 2 - 0
paddlex/inference/pipelines/doc_preprocessor/pipeline.py

@@ -20,6 +20,7 @@ from ....utils import logging
 from ....utils.deps import pipeline_requires_extra
 from ...common.batch_sampler import ImageBatchSampler
 from ...common.reader import ReadImage
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from .._parallel import AutoParallelImageSimpleInferencePipeline
@@ -28,6 +29,7 @@ from ..components import rotate_image
 from .result import DocPreprocessorResult
 
 
+@benchmark.time_methods
 class _DocPreprocessorPipeline(BasePipeline):
     """Doc Preprocessor Pipeline"""
 

+ 2 - 0
paddlex/inference/pipelines/doc_understanding/pipeline.py

@@ -16,11 +16,13 @@ from typing import Any, Dict, Optional, Union
 
 from ....utils.deps import pipeline_requires_extra
 from ...models.doc_vlm.result import DocVLMResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from ..base import BasePipeline
 
 
+@benchmark.time_methods
 @pipeline_requires_extra("multimodal")
 class DocUnderstandingPipeline(BasePipeline):
     """Doc Understanding Pipeline"""

+ 2 - 0
paddlex/inference/pipelines/face_recognition/pipeline.py

@@ -15,10 +15,12 @@
 import numpy as np
 
 from ....utils.deps import pipeline_requires_extra
+from ...utils.benchmark import benchmark
 from ..pp_shitu_v2 import ShiTuV2Pipeline
 from .result import FaceRecResult
 
 
+@benchmark.time_methods
 @pipeline_requires_extra("cv")
 class FaceRecPipeline(ShiTuV2Pipeline):
     """Face Recognition Pipeline"""

+ 2 - 0
paddlex/inference/pipelines/formula_recognition/pipeline.py

@@ -21,6 +21,7 @@ from ....utils.deps import pipeline_requires_extra
 from ...common.batch_sampler import ImageBatchSampler
 from ...common.reader import ReadImage
 from ...models.object_detection.result import DetResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from .._parallel import AutoParallelImageSimpleInferencePipeline
@@ -29,6 +30,7 @@ from ..components import CropByBoxes
 from .result import FormulaRecognitionResult
 
 
+@benchmark.time_methods
 class _FormulaRecognitionPipeline(BasePipeline):
     """Formula Recognition Pipeline"""
 

+ 2 - 0
paddlex/inference/pipelines/image_classification/pipeline.py

@@ -18,12 +18,14 @@ import numpy as np
 
 from ....utils.deps import pipeline_requires_extra
 from ...models.image_classification.result import TopkResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from .._parallel import AutoParallelImageSimpleInferencePipeline
 from ..base import BasePipeline
 
 
+@benchmark.time_methods
 class _ImageClassificationPipeline(BasePipeline):
     """Image Classification Pipeline"""
 

+ 2 - 0
paddlex/inference/pipelines/image_multilabel_classification/pipeline.py

@@ -18,12 +18,14 @@ import numpy as np
 
 from ....utils.deps import pipeline_requires_extra
 from ...models.image_multilabel_classification.result import MLClassResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from .._parallel import AutoParallelImageSimpleInferencePipeline
 from ..base import BasePipeline
 
 
+@benchmark.time_methods
 class _ImageMultiLabelClassificationPipeline(BasePipeline):
     """Image Multi Label Classification Pipeline"""
 

+ 2 - 0
paddlex/inference/pipelines/instance_segmentation/pipeline.py

@@ -18,12 +18,14 @@ import numpy as np
 
 from ....utils.deps import pipeline_requires_extra
 from ...models.instance_segmentation.result import InstanceSegResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from .._parallel import AutoParallelImageSimpleInferencePipeline
 from ..base import BasePipeline
 
 
+@benchmark.time_methods
 class _InstanceSegmentationPipeline(BasePipeline):
     """Instance Segmentation Pipeline"""
 

+ 2 - 0
paddlex/inference/pipelines/keypoint_detection/pipeline.py

@@ -18,6 +18,7 @@ import numpy as np
 
 from ....utils.deps import pipeline_requires_extra
 from ...models.keypoint_detection.result import KptResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from .._parallel import AutoParallelImageSimpleInferencePipeline
@@ -26,6 +27,7 @@ from ..base import BasePipeline
 Number = Union[int, float]
 
 
+@benchmark.time_methods
 class _KeypointDetectionPipeline(BasePipeline):
     """Keypoint Detection pipeline"""
 

+ 2 - 0
paddlex/inference/pipelines/layout_parsing/pipeline.py

@@ -21,6 +21,7 @@ from ....utils.deps import pipeline_requires_extra
 from ...common.batch_sampler import ImageBatchSampler
 from ...common.reader import ReadImage
 from ...models.object_detection.result import DetResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from .._parallel import AutoParallelImageSimpleInferencePipeline
@@ -31,6 +32,7 @@ from .result import LayoutParsingResult
 from .utils import get_sub_regions_ocr_res, sorted_layout_boxes
 
 
+@benchmark.time_methods
 class _LayoutParsingPipeline(BasePipeline):
     """Layout Parsing Pipeline"""
 

+ 2 - 0
paddlex/inference/pipelines/layout_parsing/pipeline_v2.py

@@ -25,6 +25,7 @@ from ....utils.deps import pipeline_requires_extra
 from ...common.batch_sampler import ImageBatchSampler
 from ...common.reader import ReadImage
 from ...models.object_detection.result import DetResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from .._parallel import AutoParallelImageSimpleInferencePipeline
@@ -48,6 +49,7 @@ from .utils import (
 from .xycut_enhanced import xycut_enhanced
 
 
+@benchmark.time_methods
 class _LayoutParsingPipelineV2(BasePipeline):
     """Layout Parsing Pipeline V2"""
 

+ 2 - 0
paddlex/inference/pipelines/m_3d_bev_detection/pipeline.py

@@ -18,11 +18,13 @@ import numpy as np
 
 from ....utils.deps import pipeline_requires_extra
 from ...models.m_3d_bev_detection.result import BEV3DDetResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from ..base import BasePipeline
 
 
+@benchmark.time_methods
 @pipeline_requires_extra("cv")
 class BEVDet3DPipeline(BasePipeline):
     """3D Detection Pipeline"""

+ 2 - 0
paddlex/inference/pipelines/multilingual_speech_recognition/pipeline.py

@@ -18,11 +18,13 @@ import numpy as np
 
 from ....utils.deps import pipeline_requires_extra
 from ...models.multilingual_speech_recognition.result import WhisperResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from ..base import BasePipeline
 
 
+@benchmark.time_methods
 @pipeline_requires_extra("speech")
 class MultilingualSpeechRecognitionPipeline(BasePipeline):
     """Multilingual Speech Recognition Pipeline"""

+ 2 - 0
paddlex/inference/pipelines/object_detection/pipeline.py

@@ -18,12 +18,14 @@ import numpy as np
 
 from ....utils.deps import pipeline_requires_extra
 from ...models.object_detection.result import DetResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from .._parallel import AutoParallelImageSimpleInferencePipeline
 from ..base import BasePipeline
 
 
+@benchmark.time_methods
 class _ObjectDetectionPipeline(BasePipeline):
     """Object Detection Pipeline"""
 

+ 2 - 0
paddlex/inference/pipelines/ocr/pipeline.py

@@ -20,6 +20,7 @@ from ....utils import logging
 from ....utils.deps import pipeline_requires_extra
 from ...common.batch_sampler import ImageBatchSampler
 from ...common.reader import ReadImage
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from .._parallel import AutoParallelImageSimpleInferencePipeline
@@ -34,6 +35,7 @@ from ..components import (
 from .result import OCRResult
 
 
+@benchmark.time_methods
 class _OCRPipeline(BasePipeline):
     """OCR Pipeline"""
 

+ 2 - 0
paddlex/inference/pipelines/open_vocabulary_detection/pipeline.py

@@ -18,11 +18,13 @@ import numpy as np
 
 from ....utils.deps import pipeline_requires_extra
 from ...models.object_detection.result import DetResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from ..base import BasePipeline
 
 
+@benchmark.time_methods
 @pipeline_requires_extra("multimodal")
 class OpenVocabularyDetectionPipeline(BasePipeline):
     """Open Vocabulary Detection Pipeline"""

+ 2 - 0
paddlex/inference/pipelines/open_vocabulary_segmentation/pipeline.py

@@ -18,6 +18,7 @@ import numpy as np
 
 from ....utils.deps import pipeline_requires_extra
 from ...models.open_vocabulary_segmentation.results import SAMSegResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from ..base import BasePipeline
@@ -25,6 +26,7 @@ from ..base import BasePipeline
 Number = Union[int, float]
 
 
+@benchmark.time_methods
 @pipeline_requires_extra("multimodal")
 class OpenVocabularySegmentationPipeline(BasePipeline):
     """Open Vocabulary Segmentation pipeline"""

+ 2 - 0
paddlex/inference/pipelines/pp_chatocr/pipeline_v3.py

@@ -25,6 +25,7 @@ from ....utils.deps import pipeline_requires_extra
 from ....utils.file_interface import custom_open
 from ...common.batch_sampler import ImageBatchSampler
 from ...common.reader import ReadImage
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from ..components.chat_server import BaseChat
@@ -32,6 +33,7 @@ from ..layout_parsing.result import LayoutParsingResult
 from .pipeline_base import PP_ChatOCR_Pipeline
 
 
+@benchmark.time_methods
 @pipeline_requires_extra("ie")
 class PP_ChatOCRv3_Pipeline(PP_ChatOCR_Pipeline):
     """PP-ChatOCR Pipeline"""

+ 2 - 0
paddlex/inference/pipelines/pp_chatocr/pipeline_v4.py

@@ -30,6 +30,7 @@ from ....utils.deps import (
 from ....utils.file_interface import custom_open
 from ...common.batch_sampler import ImageBatchSampler
 from ...common.reader import ReadImage
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from ..components.chat_server import BaseChat
@@ -40,6 +41,7 @@ if is_dep_available("opencv-contrib-python"):
     import cv2
 
 
+@benchmark.time_methods
 @pipeline_requires_extra("ie")
 class PP_ChatOCRv4_Pipeline(PP_ChatOCR_Pipeline):
     """PP-ChatOCRv4 Pipeline"""

+ 2 - 0
paddlex/inference/pipelines/pp_doctranslation/pipeline.py

@@ -21,6 +21,7 @@ import numpy as np
 from ....utils import logging
 from ....utils.deps import pipeline_requires_extra
 from ...common.batch_sampler import MarkDownBatchSampler
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from ..base import BasePipeline
@@ -33,6 +34,7 @@ from .utils import (
 )
 
 
+@benchmark.time_methods
 @pipeline_requires_extra("trans")
 class PP_DocTranslation_Pipeline(BasePipeline):
     """

+ 2 - 0
paddlex/inference/pipelines/pp_shitu_v2/pipeline.py

@@ -17,6 +17,7 @@ from typing import Any, Dict, Optional, Union
 from ....utils.deps import pipeline_requires_extra
 from ...common.batch_sampler import ImageBatchSampler
 from ...common.reader import ReadImage
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from ..base import BasePipeline
@@ -24,6 +25,7 @@ from ..components import CropByBoxes, FaissBuilder, FaissIndexer
 from .result import ShiTuResult
 
 
+@benchmark.time_methods
 @pipeline_requires_extra("cv")
 class ShiTuV2Pipeline(BasePipeline):
     """ShiTuV2 Pipeline"""

+ 2 - 0
paddlex/inference/pipelines/rotated_object_detection/pipeline.py

@@ -18,12 +18,14 @@ import numpy as np
 
 from ....utils.deps import pipeline_requires_extra
 from ...models.object_detection.result import DetResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from .._parallel import AutoParallelImageSimpleInferencePipeline
 from ..base import BasePipeline
 
 
+@benchmark.time_methods
 class _RotatedObjectDetectionPipeline(BasePipeline):
     """Rotated Object Detection Pipeline"""
 

+ 2 - 0
paddlex/inference/pipelines/seal_recognition/pipeline.py

@@ -21,6 +21,7 @@ from ....utils.deps import pipeline_requires_extra
 from ...common.batch_sampler import ImageBatchSampler
 from ...common.reader import ReadImage
 from ...models.object_detection.result import DetResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from .._parallel import AutoParallelImageSimpleInferencePipeline
@@ -29,6 +30,7 @@ from ..components import CropByBoxes
 from .result import SealRecognitionResult
 
 
+@benchmark.time_methods
 class _SealRecognitionPipeline(BasePipeline):
     """Seal Recognition Pipeline"""
 

+ 2 - 0
paddlex/inference/pipelines/semantic_segmentation/pipeline.py

@@ -18,12 +18,14 @@ import numpy as np
 
 from ....utils.deps import pipeline_requires_extra
 from ...models.semantic_segmentation.result import SegResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from .._parallel import AutoParallelImageSimpleInferencePipeline
 from ..base import BasePipeline
 
 
+@benchmark.time_methods
 class _SemanticSegmentationPipeline(BasePipeline):
     """Semantic Segmentation Pipeline"""
 

+ 2 - 0
paddlex/inference/pipelines/small_object_detection/pipeline.py

@@ -18,12 +18,14 @@ import numpy as np
 
 from ....utils.deps import pipeline_requires_extra
 from ...models.object_detection.result import DetResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from .._parallel import AutoParallelImageSimpleInferencePipeline
 from ..base import BasePipeline
 
 
+@benchmark.time_methods
 class _SmallObjectDetectionPipeline(BasePipeline):
     """Small Object Detection Pipeline"""
 

+ 2 - 0
paddlex/inference/pipelines/table_recognition/pipeline.py

@@ -22,6 +22,7 @@ from ....utils.deps import pipeline_requires_extra
 from ...common.batch_sampler import ImageBatchSampler
 from ...common.reader import ReadImage
 from ...models.object_detection.result import DetResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from .._parallel import AutoParallelImageSimpleInferencePipeline
@@ -34,6 +35,7 @@ from .table_recognition_post_processing import get_table_recognition_res
 from .utils import get_neighbor_boxes_idx
 
 
+@benchmark.time_methods
 class _TableRecognitionPipeline(BasePipeline):
     """Table Recognition Pipeline"""
 

+ 2 - 0
paddlex/inference/pipelines/table_recognition/pipeline_v2.py

@@ -27,6 +27,7 @@ from ....utils.deps import (
 from ...common.batch_sampler import ImageBatchSampler
 from ...common.reader import ReadImage
 from ...models.object_detection.result import DetResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from .._parallel import AutoParallelImageSimpleInferencePipeline
@@ -46,6 +47,7 @@ if is_dep_available("scikit-learn"):
     from sklearn.cluster import KMeans
 
 
+@benchmark.time_methods
 class _TableRecognitionPipelineV2(BasePipeline):
     """Table Recognition Pipeline"""
 

+ 2 - 0
paddlex/inference/pipelines/ts_anomaly_detection/pipeline.py

@@ -18,11 +18,13 @@ import pandas as pd
 
 from ....utils.deps import pipeline_requires_extra
 from ...models.ts_anomaly_detection.result import TSAdResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from ..base import BasePipeline
 
 
+@benchmark.time_methods
 @pipeline_requires_extra("ts")
 class TSAnomalyDetPipeline(BasePipeline):
     """TSAnomalyDetPipeline Pipeline"""

+ 2 - 0
paddlex/inference/pipelines/ts_classification/pipeline.py

@@ -18,11 +18,13 @@ import pandas as pd
 
 from ....utils.deps import pipeline_requires_extra
 from ...models.ts_classification.result import TSClsResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from ..base import BasePipeline
 
 
+@benchmark.time_methods
 @pipeline_requires_extra("ts")
 class TSClsPipeline(BasePipeline):
     """TSClsPipeline Pipeline"""

+ 2 - 0
paddlex/inference/pipelines/ts_forecasting/pipeline.py

@@ -18,11 +18,13 @@ import pandas as pd
 
 from ....utils.deps import pipeline_requires_extra
 from ...models.ts_forecasting.result import TSFcResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from ..base import BasePipeline
 
 
+@benchmark.time_methods
 @pipeline_requires_extra("ts")
 class TSFcPipeline(BasePipeline):
     """TSFcPipeline Pipeline"""

+ 2 - 0
paddlex/inference/pipelines/video_classification/pipeline.py

@@ -18,11 +18,13 @@ import numpy as np
 
 from ....utils.deps import pipeline_requires_extra
 from ...models.video_classification.result import TopkVideoResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from ..base import BasePipeline
 
 
+@benchmark.time_methods
 @pipeline_requires_extra("video")
 class VideoClassificationPipeline(BasePipeline):
     """Video Classification Pipeline"""

+ 2 - 0
paddlex/inference/pipelines/video_detection/pipeline.py

@@ -18,11 +18,13 @@ import numpy as np
 
 from ....utils.deps import pipeline_requires_extra
 from ...models.video_detection.result import DetVideoResult
+from ...utils.benchmark import benchmark
 from ...utils.hpi import HPIConfig
 from ...utils.pp_option import PaddlePredictorOption
 from ..base import BasePipeline
 
 
+@benchmark.time_methods
 @pipeline_requires_extra("video")
 class VideoDetectionPipeline(BasePipeline):
     """Video detection Pipeline"""

+ 274 - 21
paddlex/inference/utils/benchmark.py

@@ -29,6 +29,7 @@ from ...utils.flags import (
     INFER_BENCHMARK,
     INFER_BENCHMARK_OUTPUT_DIR,
     INFER_BENCHMARK_USE_CACHE_FOR_READ,
+    PIPELINE_BENCHMARK,
 )
 
 ENTRY_POINT_NAME = "_entry_point_"
@@ -38,12 +39,20 @@ _inference_operations = []
 
 _is_measuring_time = False
 
+PIPELINE_FUNC_BLACK_LIST = ["inintial_predictor"]
+_step = 0
+_level = 0
+_top_func = None
+
 
 class Benchmark:
     def __init__(self, enabled):
         self._enabled = enabled
         self._elapses = {}
         self._warmup = False
+        self._detail_list = []
+        self._summary_list = []
+        self._operation_list = []
 
     def timeit_with_options(self, name=None, is_read_operation=False):
         # TODO: Refactor
@@ -97,27 +106,63 @@ class Benchmark:
                     return output
 
             else:
-
-                @functools.wraps(func)
-                def _wrapper(*args, **kwargs):
-                    global _is_measuring_time
-                    operation_name = f"{name}@{location}"
-                    if _is_measuring_time:
-                        raise RuntimeError(
-                            "Nested calls detected: Check the timed modules and exclude nested calls to prevent double-counting."
-                        )
-                    if not operation_name.startswith(f"{ENTRY_POINT_NAME}@"):
-                        _is_measuring_time = True
-                    tic = time.perf_counter()
-                    try:
-                        output = func(*args, **kwargs)
-                    finally:
+                if INFER_BENCHMARK:
+
+                    @functools.wraps(func)
+                    def _wrapper(*args, **kwargs):
+                        global _is_measuring_time
+                        operation_name = f"{name}@{location}"
+                        if _is_measuring_time:
+                            raise RuntimeError(
+                                "Nested calls detected: Check the timed modules and exclude nested calls to prevent double-counting."
+                            )
                         if not operation_name.startswith(f"{ENTRY_POINT_NAME}@"):
-                            _is_measuring_time = False
-                    if isinstance(output, GeneratorType):
-                        return self.watch_generator(output, operation_name)
-                    else:
-                        self._update(time.perf_counter() - tic, operation_name)
+                            _is_measuring_time = True
+                        tic = time.perf_counter()
+                        try:
+                            output = func(*args, **kwargs)
+                        finally:
+                            if not operation_name.startswith(f"{ENTRY_POINT_NAME}@"):
+                                _is_measuring_time = False
+                        if isinstance(output, GeneratorType):
+                            return self.watch_generator(output, operation_name)
+                        else:
+                            self._update(time.perf_counter() - tic, operation_name)
+                            return output
+
+                elif PIPELINE_BENCHMARK:
+
+                    @functools.wraps(func)
+                    def _wrapper(*args, **kwargs):
+                        global _step, _level, _top_func
+
+                        _step += 1
+                        _level += 1
+
+                        if _level == 1:
+                            if _top_func is None:
+                                _top_func = f"{name}@{location}"
+                            elif _top_func != f"{name}@{location}":
+                                raise RuntimeError(
+                                    f"Multiple top-level function calls detected:\n"
+                                    f"  Function 1: {_top_func.split('@')[0]}\n"
+                                    f"    Location: {_top_func.split('@')[1]}\n"
+                                    f"  Function 2: {name}\n"
+                                    f"    Location: {location}\n"
+                                    "Only one top-level function can be tracked at a time.\n"
+                                    "Please call 'benchmark.reset()' between top-level function calls."
+                                )
+
+                        operation_name = f"{_step}@{_level}@{name}@{location}"
+
+                        tic = time.perf_counter()
+                        output = func(*args, **kwargs)
+                        if isinstance(output, GeneratorType):
+                            return self.watch_generator_simple(output, operation_name)
+                        else:
+                            self._update(time.perf_counter() - tic, operation_name)
+                            _level -= 1
+
                         return output
 
             if isinstance(func_or_cls, type):
@@ -131,6 +176,20 @@ class Benchmark:
     def timeit(self, func_or_cls):
         return self.timeit_with_options()(func_or_cls)
 
+    def _is_public_method(self, name):
+        return not name.startswith("_")
+
+    def time_methods(self, cls):
+        for name, func in cls.__dict__.items():
+            if (
+                callable(func)
+                and self._is_public_method(name)
+                and name not in PIPELINE_FUNC_BLACK_LIST
+            ):
+                setattr(cls, name, self.timeit(func))
+        return cls
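`time_methods` is a class decorator: it walks the attributes defined directly on the class and replaces each public callable with a timed version. A self-contained sketch of the same pattern follows; `time_public_methods`, `CALLS`, `SKIP`, and `DemoPipeline` are hypothetical names for illustration, not part of PaddleX.

```python
import functools
import time

CALLS = []        # (label, elapsed ms) records; illustrative only
SKIP = {"setup"}  # hypothetical exclusion list, playing the role of a blacklist


def _timed(func, label):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        tic = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            CALLS.append((label, (time.perf_counter() - tic) * 1000))

    return wrapper


def time_public_methods(cls):
    """Wrap every public callable defined directly on cls."""
    # Copy the items first so that setattr cannot disturb the iteration.
    for name, attr in list(cls.__dict__.items()):
        if callable(attr) and not name.startswith("_") and name not in SKIP:
            setattr(cls, name, _timed(attr, f"{cls.__name__}.{name}"))
    return cls


@time_public_methods
class DemoPipeline:
    def setup(self):
        return "ready"

    def _helper(self, x):
        return x + 1

    def predict(self, x):
        return self._helper(x) * 2


pipe = DemoPipeline()
pipe.setup()
result = pipe.predict(3)
labels = [label for label, _ in CALLS]
print(result, labels)
```

Only `predict` is recorded: `setup` is excluded by name and `_helper` by its leading underscore. Because the loop uses `cls.__dict__`, inherited methods are not wrapped unless the parent class is decorated too.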
+
     def watch_generator(self, generator, name):
         @functools.wraps(generator)
         def wrapper():
@@ -156,8 +215,34 @@ class Benchmark:
 
         return wrapper()
 
+    def watch_generator_simple(self, generator, name):
+        @functools.wraps(generator)
+        def wrapper():
+            global _level
+
+            while True:
+                tic = time.perf_counter()
+                try:
+                    item = next(generator)
+                except StopIteration:
+                    break
+                self._update(time.perf_counter() - tic, name)
+                yield item
+
+            _level -= 1
+
+        return wrapper()
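`watch_generator_simple` starts the timer just before each `next(generator)` call and stops it as soon as the item arrives, so time the consumer spends between items is not attributed to the producer. A minimal sketch of that idea, with `watch`, `produce`, and `consume` as hypothetical names:

```python
import time


def watch(generator, sink):
    """Yield items from generator, appending each item's production time (ms) to sink."""
    while True:
        tic = time.perf_counter()
        try:
            item = next(generator)
        except StopIteration:
            break
        sink.append((time.perf_counter() - tic) * 1000)
        yield item


def produce():
    for i in range(3):
        time.sleep(0.01)  # producer-side work, measured
        yield i


def consume():
    elapsed = []
    items = []
    for item in watch(produce(), elapsed):
        time.sleep(0.02)  # consumer-side work, outside the timed region
        items.append(item)
    return items, elapsed


items, elapsed = consume()
print(items, [round(t, 1) for t in elapsed])
```

Each recorded value should be close to the producer's 10 ms per item rather than the 30 ms per iteration seen by the consumer.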
+
     def reset(self):
+        global _step, _level, _top_func
+
+        _step = 0
+        _level = 0
+        _top_func = None
         self._elapses = {}
+        self._detail_list = []
+        self._summary_list = []
+        self._operation_list = []
 
     def _update(self, elapse, name):
         elapse = elapse * 1000
@@ -363,6 +448,174 @@ class Benchmark:
                     writer = csv.writer(file)
                     writer.writerows(csv_data)
 
+    def gather_pipeline(self):
+        info_list = []
+        detail_list = []
+        operation_list = set()
+        summary_list = []
+        max_level = 0
+        loop_num = 0
+
+        for name, time_list in self.logs.items():
+            op_time = np.sum(time_list)
+
+            parts = name.split("@")
+            step = int(parts[0])
+            level = int(parts[1])
+            operation_name = parts[2]
+            location = parts[3]
+            if ":" not in location:
+                location = "Unknown"
+
+            operation_list.add((operation_name, location))
+            max_level = max(level, max_level)
+
+            if level == 1:
+                loop_num += 1
+                format_operation_name = operation_name
+            else:
+                format_operation_name = "    " * int(level - 1) + "-> " + operation_name
+            info_list.append(
+                (step, level, operation_name, format_operation_name, op_time)
+            )
+
+        operation_list = list(operation_list)
+        info_list.sort(key=lambda x: x[0])
+        step_num = int(len(info_list) / loop_num)
+        for idx in range(step_num):
+            step = info_list[idx][0]
+            format_operation_name = info_list[idx][3]
+            op_time = (
+                np.sum(
+                    [info_list[pos][4] for pos in range(idx, len(info_list), step_num)]
+                )
+                / loop_num
+            )
+            detail_list.append([step, format_operation_name, op_time])
+
+        level_time_list = [[0] for _ in range(max_level)]
+        for idx, info in enumerate(info_list):
+            step = info[0]
+            level = info[1]
+            operation_name = info[2]
+            op_time = info[4]
+
+            # The total time consumed by all operations on this layer
+            if level > info_list[idx - 1][1]:
+                level_time_list[level - 1].append(info_list[idx - 1][4])
+
+            # The total time consumed by each operation on this layer
+            while len(summary_list) < level:
+                summary_list.append([len(summary_list) + 1, {}])
+            if summary_list[level - 1][1].get(operation_name, None) is None:
+                summary_list[level - 1][1][operation_name] = [op_time]
+            else:
+                summary_list[level - 1][1][operation_name].append(op_time)
+
+        new_summary_list = []
+        for i in range(len(summary_list)):
+            level = summary_list[i][0]
+            op_dict = summary_list[i][1]
+
+            ops_all_time = 0.0
+            op_info_list = []
+            for idx, (name, time_list) in enumerate(op_dict.items()):
+                op_all_time = np.sum(time_list) / loop_num
+                op_info_list.append([level if i + idx == 0 else "", name, op_all_time])
+                ops_all_time += op_all_time
+
+            if i > 0:
+                new_summary_list.append(["", "", ""])
+                new_summary_list.append(
+                    [level, "Layer", np.sum(level_time_list[i]) / loop_num]
+                )
+                new_summary_list.append(["", "Core", ops_all_time])
+                new_summary_list.append(
+                    ["", "Other", np.sum(level_time_list[i]) / loop_num - ops_all_time]
+                )
+            new_summary_list += op_info_list
+
+        return detail_list, new_summary_list, operation_list
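The detail table in `gather_pipeline` averages each step across repeated runs by striding through the sorted log with a step of `step_num`. That only lines up if every run executes the same sequence of operations. A small sketch of the strided averaging with made-up timings (the tuples and names below are illustrative and not taken from a real run):

```python
# (step, level, name, elapsed_ms) for two runs of a three-step pipeline.
info_list = [
    (1, 1, "predict", 10.0), (2, 2, "read", 2.0), (3, 2, "infer", 6.0),
    (4, 1, "predict", 14.0), (5, 2, "read", 4.0), (6, 2, "infer", 8.0),
]

# Each level-1 entry marks the start of one run of the top-level function.
loop_num = sum(1 for _, level, _, _ in info_list if level == 1)
step_num = len(info_list) // loop_num

detail = []
for idx in range(step_num):
    _, level, name, _ = info_list[idx]
    # Same position in every run: idx, idx + step_num, idx + 2 * step_num, ...
    total = sum(info_list[pos][3] for pos in range(idx, len(info_list), step_num))
    label = name if level == 1 else "    " * (level - 1) + "-> " + name
    detail.append((idx + 1, label, total / loop_num))

for row in detail:
    print(row)
```

If runs can take different branches, for example because an optional model is skipped for some inputs, positions would no longer correspond and the averages would mix unrelated operations.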
+
+    def _initialize_pipeline_data(self):
+        if not (self._operation_list and self._detail_list and self._summary_list):
+            self._detail_list, self._summary_list, self._operation_list = (
+                self.gather_pipeline()
+            )
+
+    def print_pipeline_data(self):
+        self._initialize_pipeline_data()
+        self.print_operation_info()
+        self.print_detail_data()
+        self.print_summary_data()
+
+    def print_operation_info(self):
+        self._initialize_pipeline_data()
+        operation_head = [
+            "Operation",
+            "Source Code Location",
+        ]
+        table = PrettyTable(operation_head)
+        table.add_rows(self._operation_list)
+        table_title = "Operation Info".center(len(str(table).split("\n")[0]), " ")
+        logging.info(table_title)
+        logging.info(table)
+
+    def print_detail_data(self):
+        self._initialize_pipeline_data()
+        detail_head = [
+            "Step",
+            "Operation",
+            "Time",
+        ]
+        table = PrettyTable(detail_head)
+        table.add_rows(self._detail_list)
+        table_title = "Detail Data".center(len(str(table).split("\n")[0]), " ")
+        table.align["Operation"] = "l"
+        table.align["Time"] = "l"
+        logging.info(table_title)
+        logging.info(table)
+
+    def print_summary_data(self):
+        self._initialize_pipeline_data()
+        summary_head = [
+            "Level",
+            "Operation",
+            "Time",
+        ]
+        table = PrettyTable(summary_head)
+        table.add_rows(self._summary_list)
+        table_title = "Summary Data".center(len(str(table).split("\n")[0]), " ")
+        table.align["Operation"] = "l"
+        table.align["Time"] = "l"
+        logging.info(table_title)
+        logging.info(table)
+
+    def save_pipeline_data(self, save_path):
+        self._initialize_pipeline_data()
+        save_dir = Path(save_path)
+        save_dir.mkdir(parents=True, exist_ok=True)
+
+        detail_head = [
+            "Step",
+            "Operation",
+            "Time",
+        ]
+        csv_data = [detail_head, *self._detail_list]
+        with open(save_dir / "detail.csv", "w", newline="") as file:
+            writer = csv.writer(file)
+            writer.writerows(csv_data)
+
+        summary_head = [
+            "Level",
+            "Operation",
+            "Time",
+        ]
+        csv_data = [summary_head, *self._summary_list]
+        with open(save_dir / "summary.csv", "w", newline="") as file:
+            writer = csv.writer(file)
+            writer.writerows(csv_data)
+
 
 def get_inference_operations():
     return _inference_operations
@@ -373,7 +626,7 @@ def set_inference_operations(val):
     _inference_operations = val
 
 
-if INFER_BENCHMARK:
+if INFER_BENCHMARK or PIPELINE_BENCHMARK:
     benchmark = Benchmark(enabled=True)
 else:
     benchmark = Benchmark(enabled=False)
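
Review note: the "Layer" / "Core" / "Other" rows built in `gather_pipeline` can be read as a simple decomposition per level. The following is a minimal sketch of that arithmetic, not the PaddleX implementation; the helper name `summarize_level` and the sample timings are hypothetical and chosen only for illustration.

```python
def summarize_level(op_times, layer_times, loop_num):
    """Split one level's time into Layer, Core and Other (hypothetical helper).

    op_times:    {operation name: [time per call, ...]} for this level
    layer_times: times of the parent operations that enclose this level
    loop_num:    number of benchmark iterations to average over
    """
    # Average time per iteration for each tracked operation
    per_op = {name: sum(times) / loop_num for name, times in op_times.items()}
    # Core: time attributed to tracked operations on this level
    core = sum(per_op.values())
    # Layer: average time of the enclosing parent operations
    layer = sum(layer_times) / loop_num
    # Other: layer time not covered by any tracked operation
    return {"Layer": layer, "Core": core, "Other": layer - core, "ops": per_op}


# Made-up timings for two iterations
result = summarize_level(
    op_times={"resize": [2.0, 2.0], "predict": [10.0, 14.0]},
    layer_times=[16.0, 20.0],
    loop_num=2,
)
print(result)
```

With these made-up numbers, "Other" is whatever part of the parent's time is left after subtracting the tracked operations, which is how the summary table separates measured steps from untracked overhead.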

+ 2 - 0
paddlex/utils/flags.py

@@ -21,6 +21,7 @@ __all__ = [
     "CHECK_OPTS",
     "EAGER_INITIALIZATION",
     "INFER_BENCHMARK",
+    "PIPELINE_BENCHMARK",
     "INFER_BENCHMARK_ITERS",
     "INFER_BENCHMARK_WARMUP",
     "INFER_BENCHMARK_OUTPUT_DIR",
@@ -65,6 +66,7 @@ MODEL_SOURCE = os.environ.get("PADDLE_PDX_MODEL_SOURCE", "huggingface")
 
 # Inference Benchmark
 INFER_BENCHMARK = get_flag_from_env_var("PADDLE_PDX_INFER_BENCHMARK", False)
+PIPELINE_BENCHMARK = get_flag_from_env_var("PADDLE_PDX_PIPELINE_BENCHMARK", False)
 INFER_BENCHMARK_WARMUP = get_flag_from_env_var(
     "PADDLE_PDX_INFER_BENCHMARK_WARMUP", 0, int
 )