[Feat] upgrade benchmark (#3416)

* update benchmark

* update

* update

* h2d d2h

* update

* update doc

* fix input sequence

* update

* update

* update

* update

* update ts common processors

* update

* update

* update

* update
zhang-prog 8 months ago
parent
commit
5158085f23
33 changed files with 515 additions and 320 deletions
  1. docs/module_usage/instructions/benchmark.md (+167 -46)
  2. paddlex/engine.py (+3 -0)
  3. paddlex/inference/common/batch_sampler/base_batch_sampler.py (+1 -14)
  4. paddlex/inference/common/batch_sampler/image_batch_sampler.py (+6 -4)
  5. paddlex/inference/common/reader/image_reader.py (+2 -0)
  6. paddlex/inference/models/3d_bev_detection/processors.py (+9 -0)
  7. paddlex/inference/models/anomaly_detection/processors.py (+3 -0)
  8. paddlex/inference/models/base/predictor/base_predictor.py (+0 -1)
  9. paddlex/inference/models/base/predictor/basic_predictor.py (+23 -16)
  10. paddlex/inference/models/common/static_infer.py (+32 -63)
  11. paddlex/inference/models/common/ts/processors.py (+7 -1)
  12. paddlex/inference/models/common/vision/processors.py (+7 -0)
  13. paddlex/inference/models/formula_recognition/processors.py (+11 -0)
  14. paddlex/inference/models/image_classification/processors.py (+3 -0)
  15. paddlex/inference/models/image_feature/processors.py (+3 -0)
  16. paddlex/inference/models/image_multilabel_classification/processors.py (+3 -0)
  17. paddlex/inference/models/image_unwarping/processors.py (+3 -0)
  18. paddlex/inference/models/instance_segmentation/processors.py (+2 -0)
  19. paddlex/inference/models/keypoint_detection/processors.py (+3 -0)
  20. paddlex/inference/models/object_detection/processors.py (+10 -0)
  21. paddlex/inference/models/open_vocabulary_detection/processors/groundingdino_processors.py (+5 -0)
  22. paddlex/inference/models/open_vocabulary_segmentation/processors/sam_processer.py (+3 -0)
  23. paddlex/inference/models/semantic_segmentation/processors.py (+3 -0)
  24. paddlex/inference/models/table_structure_recognition/processors.py (+3 -0)
  25. paddlex/inference/models/text_detection/processors.py (+4 -0)
  26. paddlex/inference/models/text_recognition/processors.py (+5 -0)
  27. paddlex/inference/models/ts_anomaly_detection/processors.py (+3 -0)
  28. paddlex/inference/models/ts_classification/processors.py (+4 -0)
  29. paddlex/inference/models/ts_forecasting/processors.py (+4 -0)
  30. paddlex/inference/models/video_classification/processors.py (+8 -0)
  31. paddlex/inference/models/video_detection/processors.py (+6 -0)
  32. paddlex/inference/utils/benchmark.py (+168 -170)
  33. paddlex/utils/flags.py (+1 -5)

+ 167 - 46
docs/module_usage/instructions/benchmark.md

@@ -1,74 +1,195 @@
 # Model Inference Benchmark
 
-PaddleX supports measuring model inference time. This is configured through environment variables, as follows:
+## Table of Contents
+
+- [1. Usage Instructions](#1.使用说明)
+- [2. Usage Examples](#2.使用示例)
+  - [2.1 Command Line](#2.1-命令行方式)
+  - [2.2 Python Script](#2.2-Python-脚本方式)
+- [3. Result Description](#3.结果说明)
+
+## 1. Usage Instructions
+
+The benchmark measures, for every operation (`Operation`) and stage (`Stage`) of end-to-end model inference, the average execution time per iteration (`Avg Time Per Iter (ms)`) and the average execution time per instance (`Avg Time Per Instance (ms)`), both in milliseconds.
+
+The benchmark is enabled through environment variables, as follows:
 
 * `PADDLE_PDX_INFER_BENCHMARK`: set to `True` to enable the benchmark; defaults to `False`;
-* `PADDLE_PDX_INFER_BENCHMARK_WARMUP`: sets the warm-up; before the test starts, n iterations are run on random data; defaults to `0`;
-* `PADDLE_PDX_INFER_BENCHMARK_DATA_SIZE`: sets the size of the random data; defaults to `224`;
-* `PADDLE_PDX_INFER_BENCHMARK_ITER`: number of benchmark iterations run on random data; random data is used only when the input is `None`; defaults to `10`;
+* `PADDLE_PDX_INFER_BENCHMARK_WARMUP`: sets the warm-up; before the test starts, n iterations are run; defaults to `0`;
+* `PADDLE_PDX_INFER_BENCHMARK_ITER`: number of benchmark iterations; defaults to `0`;
 * `PADDLE_PDX_INFER_BENCHMARK_OUTPUT`: directory in which to save the results, e.g. `./benchmark`; defaults to `None`, meaning the benchmark metrics are not saved;
 
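-Usage example:
+**Note**:
+
+* At least one of `PADDLE_PDX_INFER_BENCHMARK_WARMUP` and `PADDLE_PDX_INFER_BENCHMARK_ITER` must be set to a value greater than zero; otherwise the benchmark cannot be enabled.
+
+## 2. Usage Examples
+
+You can run the benchmark in two ways: from the command line or from a Python script.
+
+### 2.1 Command Line
+
+**Note**:
+
+- For a description of the input parameters, see [PaddleX Common Model Configuration File Parameters](./config_parameters_common.md)
+- In benchmark mode, `Predict.input` can only be set to the local path of the input data. If `batch_size` is greater than 1, the input data is repeated `batch_size` times to fill the batch.
+
+Run the command:
 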
 ```bash
 PADDLE_PDX_INFER_BENCHMARK=True \
 PADDLE_PDX_INFER_BENCHMARK_WARMUP=5 \
-PADDLE_PDX_INFER_BENCHMARK_DATA_SIZE=320 \
 PADDLE_PDX_INFER_BENCHMARK_ITER=10 \
 PADDLE_PDX_INFER_BENCHMARK_OUTPUT=./benchmark \
 python main.py \
-    -c ./paddlex/configs/object_detection/PicoDet-XS.yaml \
+    -c ./paddlex/configs/modules/object_detection/PicoDet-XS.yaml \
     -o Global.mode=predict \
     -o Predict.model_dir=None \
     -o Predict.batch_size=2 \
-    -o Predict.input=None
+    -o Predict.input=./test.png
+
+# Use the pptrt inference backend
+#   -o Predict.kernel_option="{'run_mode': 'trt_fp32'}"
+```
+
+### 2.2 Python Script
+
+**Note**:
+
+- For a description of the input parameters, see [PaddleX Single-Model Python Script Usage](./model_python_API.md)
+- In benchmark mode, `input` can only be set to the local path of the input data. If `batch_size` is greater than 1, the input data is repeated `batch_size` times to fill the batch.
+
+Create the `test_infer.py` script:
+
+```python
+from paddlex import create_model
+
+model = create_model(model_name="PicoDet-XS", model_dir=None)
+output = list(model.predict(input="./test.png", batch_size=2))
+
+# Use the pptrt inference backend
+# from paddlex import create_model
+# from paddlex.inference.utils.pp_option import PaddlePredictorOption
+
+# pp_option = PaddlePredictorOption()
+# pp_option.run_mode = "trt_fp32"
+# model = create_model(model_name="PicoDet-XS", model_dir=None, pp_option=pp_option)
+# output = list(model.predict(input="./test.png", batch_size=2))
 ```
 
-Once the benchmark is enabled, the benchmark metrics are printed automatically:
+Run the script:
 
+```bash
+PADDLE_PDX_INFER_BENCHMARK=True \
+PADDLE_PDX_INFER_BENCHMARK_WARMUP=5 \
+PADDLE_PDX_INFER_BENCHMARK_ITER=10 \
+PADDLE_PDX_INFER_BENCHMARK_OUTPUT=./benchmark \
+python test_infer.py
 ```
-+----------------+-----------------+-----------------+------------------------+
-|   Component    | Total Time (ms) | Number of Calls | Avg Time Per Call (ms) |
-+----------------+-----------------+-----------------+------------------------+
-|    ReadCmp     |   99.60412979   |        10       |       9.96041298       |
-|     Resize     |   17.01641083   |        20       |       0.85082054       |
-|   Normalize    |   44.61312294   |        20       |       2.23065615       |
-|   ToCHWImage   |    0.03385544   |        20       |       0.00169277       |
-|    Copy2GPU    |   13.46874237   |        10       |       1.34687424       |
-|     Infer      |   71.31743431   |        10       |       7.13174343       |
-|    Copy2CPU    |    0.39076805   |        10       |       0.03907681       |
-| DetPostProcess |    0.36168098   |        20       |       0.01808405       |
-+----------------+-----------------+-----------------+------------------------+
-+-------------+-----------------+---------------------+----------------------------+
-|    Stage    | Total Time (ms) | Number of Instances | Avg Time Per Instance (ms) |
-+-------------+-----------------+---------------------+----------------------------+
-|  PreProcess |   161.26751900  |          20         |         8.06337595         |
-|  Inference  |   85.17694473   |          20         |         4.25884724         |
-| PostProcess |    0.36168098   |          20         |         0.01808405         |
-|   End2End   |   256.90770149  |          20         |        12.84538507         |
-|    WarmUp   |  5412.37807274  |          10         |        541.23780727        |
-+-------------+-----------------+---------------------+----------------------------+
+
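+## 3. Result Examples
+
+Once the benchmark is enabled, the benchmark results are printed automatically. The fields are described below:
+
+<table border="1">
+    <thead>
+        <tr>
+            <th>Field</th>
+            <th>Meaning</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td>Iters</td>
+            <td>Number of iterations, i.e. how many times the model inference loop is executed.</td>
+        </tr>
+        <tr>
+            <td>Batch Size</td>
+            <td>Batch size, i.e. the number of instances processed in each iteration.</td>
+        </tr>
+        <tr>
+            <td>Instances</td>
+            <td>Total number of instances, computed as <code>Iters</code> multiplied by <code>Batch Size</code>.</td>
+        </tr>
+        <tr>
+            <td>Operation</td>
+            <td>Operation name, e.g. <code>Resize</code>, <code>Normalize</code>.</td>
+        </tr>
+        <tr>
+            <td>Stage</td>
+            <td>Stage name: preprocessing (PreProcess), inference (Inference), postprocessing (PostProcess), and end-to-end (End2End).</td>
+        </tr>
+        <tr>
+            <td>Avg Time Per Iter (ms)</td>
+            <td>Average execution time per iteration, in milliseconds.</td>
+        </tr>
+        <tr>
+            <td>Avg Time Per Instance (ms)</td>
+            <td>Average execution time per instance, in milliseconds.</td>
+        </tr>
+    </tbody>
+</table>
+
+Running the example program from Section 2 produces the following benchmark results: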
+
+```
+                                             WarmUp Data
++-------+------------+-----------+-------------+------------------------+----------------------------+
+| Iters | Batch Size | Instances |    Stage    | Avg Time Per Iter (ms) | Avg Time Per Instance (ms) |
++-------+------------+-----------+-------------+------------------------+----------------------------+
+|   5   |     2      |     10    |  PreProcess |      98.70615005       |        49.35307503         |
+|   5   |     2      |     10    |  Inference  |      68.70298386       |        34.35149193         |
+|   5   |     2      |     10    | PostProcess |       0.22978783       |         0.11489391         |
+|   5   |     2      |     10    |   End2End   |      167.63892174      |        83.81946087         |
++-------+------------+-----------+-------------+------------------------+----------------------------+
+                                               Detail Data
++-------+------------+-----------+----------------+------------------------+----------------------------+
+| Iters | Batch Size | Instances |   Operation    | Avg Time Per Iter (ms) | Avg Time Per Instance (ms) |
++-------+------------+-----------+----------------+------------------------+----------------------------+
+|   10  |     2      |     20    |   ReadImage    |      77.00567245       |        38.50283623         |
+|   10  |     2      |     20    |     Resize     |      11.97342873       |         5.98671436         |
+|   10  |     2      |     20    |   Normalize    |       6.09791279       |         3.04895639         |
+|   10  |     2      |     20    |   ToCHWImage   |       0.00574589       |         0.00287294         |
+|   10  |     2      |     20    |    ToBatch     |       0.72050095       |         0.36025047         |
+|   10  |     2      |     20    |    Copy2GPU    |       3.15101147       |         1.57550573         |
+|   10  |     2      |     20    |     Infer      |       9.58673954       |         4.79336977         |
+|   10  |     2      |     20    |    Copy2CPU    |       0.07462502       |         0.03731251         |
+|   10  |     2      |     20    | DetPostProcess |       0.22695065       |         0.11347532         |
++-------+------------+-----------+----------------+------------------------+----------------------------+
+                                             Summary Data
++-------+------------+-----------+-------------+------------------------+----------------------------+
+| Iters | Batch Size | Instances |    Stage    | Avg Time Per Iter (ms) | Avg Time Per Instance (ms) |
++-------+------------+-----------+-------------+------------------------+----------------------------+
+|   10  |     2      |     20    |  PreProcess |      95.80326080       |        47.90163040         |
+|   10  |     2      |     20    |  Inference  |      12.81237602       |         6.40618801         |
+|   10  |     2      |     20    | PostProcess |       0.22695065       |         0.11347532         |
+|   10  |     2      |     20    |   End2End   |      108.84258747      |        54.42129374         |
++-------+------------+-----------+-------------+------------------------+----------------------------+
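 ```
 
-The benchmark results report, for every component (`Component`) of the model, the total time (`Total Time`, in milliseconds), the **number of calls** (`Number of Calls`), and the average time **per call** (`Avg Time Per Call`, in milliseconds). They also report timing broken down by warm-up (`WarmUp`), preprocessing (`PreProcess`), model inference (`Inference`), postprocessing (`PostProcess`), and end-to-end (`End2End`), including each stage's total time (`Total Time`, in milliseconds), **number of instances** (`Number of Instances`), and average time **per instance** (`Avg Time Per Instance`, in milliseconds). These metrics are also saved locally to `./benchmark/detail.csv` and `./benchmark/summary.csv`:
+Because `PADDLE_PDX_INFER_BENCHMARK_OUTPUT=./benchmark` is set, the results above are also saved locally to `./benchmark/detail.csv` and `./benchmark/summary.csv`:
+
+The contents of `detail.csv` are as follows:
 
 ```csv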
-Component,Total Time (ms),Number of Calls,Avg Time Per Call (ms)
-ReadCmp,99.60412979125977,10,9.960412979125977
-Resize,17.01641082763672,20,0.8508205413818359
-Normalize,44.61312294006348,20,2.230656147003174
-ToCHWImage,0.033855438232421875,20,0.0016927719116210938
-Copy2GPU,13.468742370605469,10,1.3468742370605469
-Infer,71.31743431091309,10,7.131743431091309
-Copy2CPU,0.39076805114746094,10,0.039076805114746094
-DetPostProcess,0.3616809844970703,20,0.018084049224853516
+Iters,Batch Size,Instances,Operation,Avg Time Per Iter (ms),Avg Time Per Instance (ms)
+10,2,20,ReadImage,77.00567245,38.50283623
+10,2,20,Resize,11.97342873,5.98671436
+10,2,20,Normalize,6.09791279,3.04895639
+10,2,20,ToCHWImage,0.00574589,0.00287294
+10,2,20,ToBatch,0.72050095,0.36025047
+10,2,20,Copy2GPU,3.15101147,1.57550573
+10,2,20,Infer,9.58673954,4.79336977
+10,2,20,Copy2CPU,0.07462502,0.03731251
+10,2,20,DetPostProcess,0.22695065,0.11347532
 ```
 
+The contents of `summary.csv` are as follows:
+
 ```csv
-Stage,Total Time (ms),Number of Instances,Avg Time Per Instance (ms)
-PreProcess,161.26751899719238,20,8.06337594985962
-Inference,85.17694473266602,20,4.258847236633301
-PostProcess,0.3616809844970703,20,0.018084049224853516
-End2End,256.90770149230957,20,12.845385074615479
-WarmUp,5412.3780727386475,10,541.2378072738647
+Iters,Batch Size,Instances,Stage,Avg Time Per Iter (ms),Avg Time Per Instance (ms)
+10,2,20,PreProcess,95.80326080,47.90163040
+10,2,20,Inference,12.81237602,6.40618801
+10,2,20,PostProcess,0.22695065,0.11347532
+10,2,20,End2End,108.84258747,54.42129374
 ```
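
Reviewer note (not part of the commit): in the new `summary.csv` sample, each `Avg Time Per Instance (ms)` value equals `Avg Time Per Iter (ms)` divided by `Batch Size`, and the three stage rows add up to the `End2End` row. The sketch below checks both relations against the sample data. The CSV text is copied from the diff above; the helper name `check_summary` and the tolerance are made up for illustration, and whether the stage sum always equals `End2End` outside this sample is an assumption.

```python
import csv
import io

# Contents of summary.csv as shown in the documentation diff above.
SUMMARY_CSV = """\
Iters,Batch Size,Instances,Stage,Avg Time Per Iter (ms),Avg Time Per Instance (ms)
10,2,20,PreProcess,95.80326080,47.90163040
10,2,20,Inference,12.81237602,6.40618801
10,2,20,PostProcess,0.22695065,0.11347532
10,2,20,End2End,108.84258747,54.42129374
"""


def check_summary(text, tol=1e-6):
    """Return (per_instance_ok, end2end_ok) for a summary.csv string."""
    rows = list(csv.DictReader(io.StringIO(text)))

    # Per-instance time should be per-iteration time divided by the batch size.
    per_instance_ok = all(
        abs(
            float(r["Avg Time Per Iter (ms)"]) / int(r["Batch Size"])
            - float(r["Avg Time Per Instance (ms)"])
        )
        < tol
        for r in rows
    )

    # In this sample, the stage rows sum to the End2End row.
    per_iter = {r["Stage"]: float(r["Avg Time Per Iter (ms)"]) for r in rows}
    stage_sum = per_iter["PreProcess"] + per_iter["Inference"] + per_iter["PostProcess"]
    end2end_ok = abs(stage_sum - per_iter["End2End"]) < tol

    return per_instance_ok, end2end_ok


print(check_summary(SUMMARY_CSV))
```

The same check can be pointed at a real `./benchmark/summary.csv` by passing the file's text instead of the inlined string.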

+ 3 - 0
paddlex/engine.py

@@ -19,6 +19,7 @@ from .utils.result_saver import try_except_decorator
 from .utils.config import parse_args, get_config
 from .utils.errors import raise_unsupported_api_error
 from .model import _ModelBasedConfig
+from .utils.flags import INFER_BENCHMARK
 
 
 class Engine(object):
@@ -47,6 +48,8 @@ class Engine(object):
             return self._model.export()
         elif self._mode == "predict":
             for res in self._model.predict():
+                if INFER_BENCHMARK:
+                    continue
                 res.print()
                 if self._output:
                     res.save_all(save_path=self._output)

+ 1 - 14
paddlex/inference/common/batch_sampler/base_batch_sampler.py

@@ -15,12 +15,6 @@
 from typing import Union, Tuple, List, Dict, Any, Iterator
 from abc import ABC, abstractmethod
 
-from ....utils.flags import (
-    INFER_BENCHMARK,
-    INFER_BENCHMARK_ITER,
-    INFER_BENCHMARK_DATA_SIZE,
-)
-
 
 class BaseBatchSampler:
     """BaseBatchSampler"""
@@ -33,9 +27,6 @@ class BaseBatchSampler:
         """
         super().__init__()
         self._batch_size = batch_size
-        self._benchmark = INFER_BENCHMARK
-        self._benchmark_iter = INFER_BENCHMARK_ITER
-        self._benchmark_data_size = INFER_BENCHMARK_DATA_SIZE
 
     @property
     def batch_size(self) -> int:
@@ -69,11 +60,7 @@ class BaseBatchSampler:
         Yields:
             Iterator[List[Any]]: An iterator yielding the batch data.
         """
-        if input is None and self._benchmark:
-            for _ in range(self._benchmark_iter):
-                yield self._rand_batch(self._benchmark_data_size)
-        else:
-            yield from self.sample(input)
+        yield from self.sample(input)
 
     @abstractmethod
     def sample(self, *args: Tuple[Any], **kwargs: Dict[str, Any]) -> Iterator[list]:

+ 6 - 4
paddlex/inference/common/batch_sampler/image_batch_sampler.py

@@ -128,9 +128,11 @@ class ImageBatchSampler(BaseBatchSampler):
                 assert all(isinstance(item, int) for item in res)
                 return res
 
+        rand_batch = ImgInstance()
         size = parse_size(data_size)
-        rand_batch = [
-            np.random.randint(0, 256, (*size, 3), dtype=np.uint8)
-            for _ in range(self.batch_size)
-        ]
+        for _ in range(self.batch_size):
+            rand_batch.append(
+                np.random.randint(0, 256, (*size, 3), dtype=np.uint8), None, None
+            )
+
         return rand_batch

+ 2 - 0
paddlex/inference/common/reader/image_reader.py

@@ -16,6 +16,7 @@ import numpy as np
 import cv2
 
 from ...utils.io import ImageReader, PDFReader
+from ...utils.benchmark import benchmark
 
 
 class ReadImage:
@@ -40,6 +41,7 @@ class ReadImage:
         flags = self._FLAGS_DICT[self.format]
         self._img_reader = ImageReader(backend="opencv", flags=flags)
 
+    @benchmark.timeit
     def __call__(self, imgs):
         """apply"""
         return [self.read(img) for img in imgs]
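
Reviewer note (not part of the commit): most files in this commit only gain a `@benchmark.timeit` decorator on a processor's `__call__`. The real decorator lives in `paddlex/inference/utils/benchmark.py`, which is not shown in this part of the diff, so the following is only a hypothetical sketch of how such a decorator could record per-operation timings. `SimpleBenchmark`, its `records` attribute, and the toy `ReadImage` body are all assumptions made for illustration.

```python
import functools
import time
from collections import defaultdict


class SimpleBenchmark:
    """Hypothetical stand-in for the `benchmark` object; not PaddleX's code."""

    def __init__(self):
        self.enabled = True
        # Operation (class) name -> list of elapsed times in milliseconds.
        self.records = defaultdict(list)

    def timeit(self, func):
        """Decorate a `__call__` method and record its wall-clock time."""

        @functools.wraps(func)
        def wrapper(instance, *args, **kwargs):
            if not self.enabled:
                return func(instance, *args, **kwargs)
            start = time.perf_counter()
            result = func(instance, *args, **kwargs)
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.records[type(instance).__name__].append(elapsed_ms)
            return result

        return wrapper


benchmark = SimpleBenchmark()


class ReadImage:
    @benchmark.timeit
    def __call__(self, imgs):
        # Placeholder for real image decoding.
        return [img.upper() for img in imgs]


out = ReadImage()(["a.png", "b.png"])
print(out, len(benchmark.records["ReadImage"]))
```

Keying the records by class name is one way to arrive at operation labels such as `ReadImage` or `Resize` in a detail table; the actual implementation may do this differently.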

+ 9 - 0
paddlex/inference/models/3d_bev_detection/processors.py

@@ -22,6 +22,7 @@ import lazy_paddle as paddle
 from ...utils.io import ImageReader
 from ....utils import logging
 from ...common.reader.det_3d_reader import Sample
+from ...utils.benchmark import benchmark
 
 
 cv2_interp_codes = {
@@ -70,6 +71,7 @@ class LoadPointsFromFile:
         points = np.fromfile(pts_filename, dtype=np.float32)
         return points
 
+    @benchmark.timeit
     def __call__(self, results):
         """Call function to load points data from file and process it.
 
@@ -219,6 +221,7 @@ class LoadPointsFromMultiSweeps(object):
         )
         return points[filt]
 
+    @benchmark.timeit
     def __call__(self, results):
         """Call function to load multi-sweep point clouds from files.
 
@@ -305,6 +308,7 @@ class LoadMultiViewImageFromFiles:
         self.constant_std = constant_std
         self.imread_flag = imread_flag
 
+    @benchmark.timeit
     def __call__(self, sample):
         """
         Call method to load multi-view image from files and update the sample dictionary.
@@ -636,6 +640,7 @@ class ResizeImage:
         """Resize semantic segmentation map with ``results['scale']``."""
         raise NotImplementedError
 
+    @benchmark.timeit
     def __call__(self, results):
         """Call function to resize images, bounding boxes, masks, and semantic segmentation maps according to the provided scale or scale factor.
 
@@ -709,6 +714,7 @@ class NormalizeImage:
         cv2.multiply(img, stdinv, img)  # inplace
         return img
 
+    @benchmark.timeit
     def __call__(self, results):
         """Call method to normalize images in the results dictionary.
 
@@ -853,6 +859,7 @@ class PadImage(object):
         """Pad semantic segmentation map according to ``results['pad_shape']``."""
         raise NotImplementedError
 
+    @benchmark.timeit
     def __call__(self, results):
         """Call function to pad images, masks, semantic segmentation maps."""
         self._pad_img(results)
@@ -890,6 +897,7 @@ class SampleFilterByKey:
         self.keys = keys
         self.meta_keys = meta_keys
 
+    @benchmark.timeit
     def __call__(self, sample):
         """Call function to filter sample by keys. The keys in `meta_keys` are used to filter metadata from the input sample.
 
@@ -944,6 +952,7 @@ class GetInferInput:
                 collated_batch[k] = [elem[k] for elem in batch]
         return collated_batch
 
+    @benchmark.timeit
     def __call__(self, sample):
         """Call function to infer input data from transformed sample
 

+ 3 - 0
paddlex/inference/models/anomaly_detection/processors.py

@@ -15,6 +15,8 @@
 import numpy as np
 from skimage import measure, morphology
 
+from ...utils.benchmark import benchmark
+
 
 class MapToMask:
     """Map_to_mask"""
@@ -25,6 +27,7 @@ class MapToMask:
         """
         super().__init__()
 
+    @benchmark.timeit
     def __call__(self, preds, *args):
         """apply"""
         return [self.apply(pred) for pred in preds]

+ 0 - 1
paddlex/inference/models/base/predictor/base_predictor.py

@@ -71,7 +71,6 @@ class BasePredictor(ABC):
 
         # alias predict() to the __call__()
         self.predict = self.__call__
-        self.benchmark = None
 
     @property
     def config_path(self) -> str:

+ 23 - 16
paddlex/inference/models/base/predictor/basic_predictor.py

@@ -19,6 +19,7 @@ from .....utils.subclass_register import AutoRegisterABCMetaClass
 from .....utils.flags import (
     INFER_BENCHMARK,
     INFER_BENCHMARK_WARMUP,
+    INFER_BENCHMARK_ITER,
 )
 from .....utils import logging
 from ....utils.pp_option import PaddlePredictorOption
@@ -69,7 +70,6 @@ class BasicPredictor(
         self.batch_sampler.batch_size = batch_size
 
         logging.debug(f"{self.__class__.__name__}: {self.model_dir}")
-        self.benchmark = benchmark
 
     def __call__(
         self,
@@ -93,23 +93,30 @@ class BasicPredictor(
             Iterator[Any]: An iterator yielding the prediction output.
         """
         self.set_predictor(batch_size, device, pp_option)
-        if self.benchmark:
-            self.benchmark.start()
+        if INFER_BENCHMARK:
+            # TODO(zhang-prog): Get metadata of input data
+            if not isinstance(input, str):
+                raise TypeError("Only support string as input")
+            input = [input] * batch_size
+
+            if not (INFER_BENCHMARK_WARMUP > 0 or INFER_BENCHMARK_ITER > 0):
+                raise RuntimeError(
+                    "At least one of `INFER_BENCHMARK_WARMUP` and `INFER_BENCHMARK_ITER` must be greater than zero"
+                )
+
             if INFER_BENCHMARK_WARMUP > 0:
-                output = self.apply(input, **kwargs)
-                warmup_num = 0
+                benchmark.start_warmup()
                 for _ in range(INFER_BENCHMARK_WARMUP):
-                    try:
-                        next(output)
-                        warmup_num += 1
-                    except StopIteration:
-                        logging.warning(
-                            f"There are only {warmup_num} batches in input data, but `INFER_BENCHMARK_WARMUP` has been set to {INFER_BENCHMARK_WARMUP}."
-                        )
-                        break
-                self.benchmark.warmup_stop(warmup_num)
-            output = list(self.apply(input, **kwargs))
-            self.benchmark.collect(len(output))
+                    output = list(self.apply(input, **kwargs))
+                benchmark.collect(batch_size)
+                benchmark.stop_warmup()
+
+            if INFER_BENCHMARK_ITER > 0:
+                for _ in range(INFER_BENCHMARK_ITER):
+                    output = list(self.apply(input, **kwargs))
+                benchmark.collect(batch_size)
+
+            yield output[0]
         else:
             yield from self.apply(input, **kwargs)
 
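
Reviewer note (not part of the commit): the new benchmark branch validates the input, repeats the single input path `batch_size` times, runs the warm-up and timed passes, and yields only the first result of the last pass. The standalone sketch below mirrors that control flow without any PaddleX dependency. `run_benchmark`, `fake_apply`, and the `calls` list are hypothetical names, and the timing and collection calls (`benchmark.start_warmup()`, `benchmark.collect(...)`, `benchmark.stop_warmup()`) are deliberately left out.

```python
def run_benchmark(apply, input_path, batch_size, warmup, iters):
    """Hypothetical standalone version of the benchmark branch in `__call__`."""
    if not isinstance(input_path, str):
        raise TypeError("Only support string as input")
    if not (warmup > 0 or iters > 0):
        raise RuntimeError("At least one of warmup and iters must be greater than zero")

    # The same path is repeated so that one call to `apply` fills one batch.
    batch = [input_path] * batch_size

    output = None
    for _ in range(warmup):
        output = list(apply(batch))  # warm-up passes
    for _ in range(iters):
        output = list(apply(batch))  # timed passes
    return output[0]


calls = []


def fake_apply(batch):
    """Toy generator standing in for `self.apply`; records each batch size."""
    calls.append(len(batch))
    yield {"n_inputs": len(batch)}


first = run_benchmark(fake_apply, "./test.png", batch_size=2, warmup=5, iters=10)
print(first, len(calls))
```

With `warmup=5` and `iters=10`, the pipeline runs 15 times on the same two-element batch, which matches the `Iters` and `Instances` columns in the documentation sample (5 and 10 for warm-up, 10 and 20 for the timed run).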

+ 32 - 63
paddlex/inference/models/common/static_infer.py

@@ -14,13 +14,12 @@
 
 from typing import Union, Tuple, List, Dict, Any, Iterator
 import os
-import shutil
-import threading
 from pathlib import Path
 import lazy_paddle as paddle
 import numpy as np
 
 from ....utils.flags import DEBUG, FLAGS_json_format_model, USE_PIR_TRT
+from ...utils.benchmark import benchmark
 from ....utils import logging
 from ...utils.pp_option import PaddlePredictorOption
 from ...utils.trt_config import TRT_CFG
@@ -90,29 +89,17 @@ def convert_trt(model_name, mode, pp_model_path, trt_save_path, trt_dynamic_shap
 
 
 class Copy2GPU:
-
-    def __init__(self, input_handlers):
-        super().__init__()
-        self.input_handlers = input_handlers
-
-    def __call__(self, x):
-        for idx in range(len(x)):
-            self.input_handlers[idx].reshape(x[idx].shape)
-            self.input_handlers[idx].copy_from_cpu(x[idx])
+    @benchmark.timeit
+    def __call__(self, arrs):
+        paddle_tensors = [paddle.to_tensor(i) for i in arrs]
+        return paddle_tensors
 
 
 class Copy2CPU:
-
-    def __init__(self, output_handlers):
-        super().__init__()
-        self.output_handlers = output_handlers
-
-    def __call__(self):
-        output = []
-        for out_tensor in self.output_handlers:
-            batch = out_tensor.copy_to_cpu()
-            output.append(batch)
-        return output
+    @benchmark.timeit
+    def __call__(self, paddle_tensors):
+        arrs = [i.numpy() for i in paddle_tensors]
+        return arrs
 
 
 class Infer:
@@ -121,8 +108,9 @@ class Infer:
         super().__init__()
         self.predictor = predictor
 
-    def __call__(self):
-        self.predictor.run()
+    @benchmark.timeit
+    def __call__(self, x):
+        return self.predictor.run(x)
 
 
 class StaticInfer:
@@ -135,22 +123,10 @@ class StaticInfer:
         self.model_dir = model_dir
         self.model_prefix = model_prefix
         self.option = option
-        self.option.changed = True
-        self._lock = threading.Lock()
-
-    def _reset(self) -> None:
-        with self._lock:
-            self.option.changed = False
-            logging.debug(f"Env: {self.option}")
-            (
-                predictor,
-                input_handlers,
-                output_handlers,
-            ) = self._create()
-
-        self.copy2gpu = Copy2GPU(input_handlers)
-        self.copy2cpu = Copy2CPU(output_handlers)
-        self.infer = Infer(predictor)
+        self.predictor = self._create()
+        self.copy2gpu = Copy2GPU()
+        self.copy2cpu = Copy2CPU()
+        self.infer = Infer(self.predictor)
 
     def _create(
         self,
@@ -303,29 +279,22 @@ class StaticInfer:
         # Get input and output handlers
         # Get input and output handlers
         input_names = predictor.get_input_names()
         input_names = predictor.get_input_names()
         input_names.sort()
         input_names.sort()
-        input_handlers = []
-        output_handlers = []
-        for input_name in input_names:
-            input_handler = predictor.get_input_handle(input_name)
-            input_handlers.append(input_handler)
-        output_names = predictor.get_output_names()
-        for output_name in output_names:
-            output_handler = predictor.get_output_handle(output_name)
-            output_handlers.append(output_handler)
-        return predictor, input_handlers, output_handlers
+
+        return predictor
 
 
     def __call__(self, x) -> List[Any]:
     def __call__(self, x) -> List[Any]:
-        if self.option.changed:
-            self._reset()
-        self.copy2gpu(x)
-        self.infer()
-        pred = self.copy2cpu()
+        # NOTE: Adjust input tensors to match the sorted sequence.
+        names = self.predictor.get_input_names()
+        if len(names) != len(x):
+            raise ValueError(
+                f"The number of inputs does not match the model: {len(names)} vs {len(x)}"
+            )
+        indices = sorted(range(len(names)), key=names.__getitem__)
+        x = [x[indices.index(i)] for i in range(len(x))]
+        # TODO:
+        # Ensure that input tensors follow the model's input sequence without sorting.
+
+        inputs = self.copy2gpu(x)
+        outputs = self.infer(inputs)
+        pred = self.copy2cpu(outputs)
         return pred
-
-    @property
-    def benchmark(self):
-        return {
-            "Copy2GPU": self.copy2gpu,
-            "Infer": self.infer,
-            "Copy2CPU": self.copy2cpu,
-        }
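
Note on the input reordering added to `StaticInfer.__call__`: the incoming list `x` appears to be ordered by sorted input name, and the new lines permute it back into the order returned by `get_input_names()`. The following is a minimal sketch of that permutation outside PaddleX, assuming only that `names` is the model's input-name list. The helper name `reorder_to_model_sequence` and the sample names are hypothetical. It builds the inverse permutation once instead of calling `indices.index(i)` for every element, which should give the same result.

```python
from typing import Any, List, Sequence


def reorder_to_model_sequence(names: Sequence[str], x: Sequence[Any]) -> List[Any]:
    """Map inputs given in sorted-name order back to the model's own input order.

    Hypothetical helper for illustration; not part of PaddleX.
    """
    if len(names) != len(x):
        raise ValueError(
            f"The number of inputs does not match the model: {len(names)} vs {len(x)}"
        )
    # indices[k] is the position in `names` of the k-th name in sorted order,
    # so x[k] belongs to names[indices[k]].
    indices = sorted(range(len(names)), key=names.__getitem__)
    # Invert the permutation: inverse[i] is the position in x of the tensor for names[i].
    inverse = [0] * len(names)
    for k, i in enumerate(indices):
        inverse[i] = k
    return [x[inverse[i]] for i in range(len(names))]


# Hypothetical model whose inputs are declared in the order c, a, b.
model_names = ["c", "a", "b"]
sorted_inputs = ["A", "B", "C"]  # tensors supplied in sorted-name order
print(reorder_to_model_sequence(model_names, sorted_inputs))
```

With these sample values the result lines up element by element with `model_names`.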

+ 7 - 1
paddlex/inference/models/common/ts/processors.py

@@ -20,7 +20,7 @@ import numpy as np
 import pandas as pd
 
 from .funcs import load_from_dataframe, time_feature
-
+from ....utils.benchmark import benchmark
 
 __all__ = [
     "BuildTSDataset",
@@ -53,6 +53,7 @@ class TSCutOff:
         super().__init__()
         self.size = size
 
+    @benchmark.timeit
     def __call__(self, ts_list: List) -> List:
         """Applies the cut off operation to a list of time series.
 
@@ -111,6 +112,7 @@ class TSNormalize:
         self.scaler = joblib.load(scale_path)
         self.params_info = params_info
 
+    @benchmark.timeit
     def __call__(self, ts_list: List[pd.DataFrame]) -> List[pd.DataFrame]:
         """Applies normalization to a list of time series data frames.
 
@@ -158,6 +160,7 @@ class BuildTSDataset:
         super().__init__()
         self.params_info = params_info
 
+    @benchmark.timeit
     def __call__(self, ts_list: List) -> List:
         """Applies the dataset construction to a list of time series.
 
@@ -200,6 +203,7 @@ class TimeFeature:
         self.size = size
         self.holiday = holiday
 
+    @benchmark.timeit
     def __call__(self, ts_list: List) -> List:
         """Applies time feature extraction to a list of time series.
 
@@ -258,6 +262,7 @@ class TStoArray:
         super().__init__()
         self.input_data = input_data
 
+    @benchmark.timeit
     def __call__(self, ts_list: List[Dict[str, Any]]) -> List[List[np.ndarray]]:
         """Converts a list of time series data frames into arrays.
 
@@ -295,6 +300,7 @@ class TStoBatch:
     equal-length arrays or DataFrames.
     """
 
+    @benchmark.timeit
     def __call__(self, ts_list: List[np.ndarray]) -> List[np.ndarray]:
         """Convert a list of time series into batches.
 

+ 7 - 0
paddlex/inference/models/common/vision/processors.py

@@ -23,6 +23,7 @@ import cv2
 from PIL import Image
 
 from . import funcs as F
+from ....utils.benchmark import benchmark
 
 
 class _BaseResize:
@@ -112,6 +113,7 @@ class Resize(_BaseResize):
 
         self.keep_ratio = keep_ratio
 
+    @benchmark.timeit
     def __call__(self, imgs):
         """apply"""
         return [self.resize(img) for img in imgs]
@@ -155,6 +157,7 @@ class ResizeByLong(_BaseResize):
         super().__init__(size_divisor=size_divisor, interp=interp, backend=backend)
         self.target_long_edge = target_long_edge
 
+    @benchmark.timeit
     def __call__(self, imgs):
         """apply"""
         return [self.resize(img) for img in imgs]
@@ -196,6 +199,7 @@ class ResizeByShort(_BaseResize):
         super().__init__(size_divisor=size_divisor, interp=interp, backend=backend)
         self.target_short_edge = target_short_edge
 
+    @benchmark.timeit
     def __call__(self, imgs):
         """apply"""
         return [self.resize(img) for img in imgs]
@@ -243,6 +247,7 @@ class Normalize:
         self.std = np.asarray(std).astype("float32")
         self.preserve_dtype = preserve_dtype
 
+    @benchmark.timeit
     def __call__(self, imgs):
         """apply"""
         old_type = imgs[0].dtype
@@ -260,11 +265,13 @@ class Normalize:
 class ToCHWImage:
     """Reorder the dimensions of the image from HWC to CHW."""
 
+    @benchmark.timeit
     def __call__(self, imgs):
         """apply"""
         return [img.transpose((2, 0, 1)) for img in imgs]
 
 
 class ToBatch:
+    @benchmark.timeit
     def __call__(self, imgs):
         return [np.stack(imgs, axis=0).astype(dtype=np.float32, copy=False)]
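
The `benchmark.timeit` decorator applied to each `__call__` above is imported from `utils/benchmark`, whose implementation is not included in this part of the diff. As a rough illustration only, a decorator of this kind could be sketched as follows. The `Benchmark` class, its `enabled` flag and `records` dict, and the `Double` processor are all hypothetical and should not be read as PaddleX's actual API.

```python
import functools
import time
from collections import defaultdict


class Benchmark:
    """Hypothetical stand-in for the `benchmark` object; not PaddleX's implementation."""

    def __init__(self):
        self.enabled = True
        # Maps a processor class name to the list of elapsed times in seconds.
        self.records = defaultdict(list)

    def timeit(self, func):
        """Wrap a method so that each call is timed while benchmarking is enabled."""

        @functools.wraps(func)
        def wrapper(obj, *args, **kwargs):
            if not self.enabled:
                return func(obj, *args, **kwargs)
            start = time.perf_counter()
            result = func(obj, *args, **kwargs)
            self.records[type(obj).__name__].append(time.perf_counter() - start)
            return result

        return wrapper


benchmark = Benchmark()


class Double:
    """Toy processor that mirrors the decorated `__call__` pattern in the diff."""

    @benchmark.timeit
    def __call__(self, values):
        return [2 * v for v in values]


op = Double()
op([1, 2, 3])
print(len(benchmark.records["Double"]))  # number of timed calls recorded so far
```

Keying the records by class name is one possible design choice; it lets the same decorator be attached to every processor without passing a label.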

+ 11 - 0
paddlex/inference/models/formula_recognition/processors.py

@@ -28,6 +28,7 @@ from tokenizers import AddedToken
 from typing import List, Tuple, Optional, Any, Dict, Union
 
 from ....utils import logging
+from ...utils.benchmark import benchmark
 
 
 class MinMaxResize:
@@ -142,6 +143,7 @@ class MinMaxResize:
             img = np.dstack((img, img, img))
             return img
 
+    @benchmark.timeit
     def __call__(self, imgs: List[np.ndarray]) -> List[np.ndarray]:
         """Applies the resize method to a list of images.
 
@@ -181,6 +183,7 @@ class LatexTestTransform:
         squeezed = np.squeeze(grayscale_image)
         return cv2.merge([squeezed] * self.num_output_channels)
 
+    @benchmark.timeit
     def __call__(self, imgs: List[np.ndarray]) -> List[np.ndarray]:
         """
         Apply the transform to a list of images.
@@ -220,6 +223,7 @@ class LatexImageFormat:
         img_expanded = img[:, :, np.newaxis].transpose(2, 0, 1)
         return img_expanded[np.newaxis, :]
 
+    @benchmark.timeit
     def __call__(self, imgs: List[np.ndarray]) -> List[np.ndarray]:
         """Applies the format method to a list of images.
 
@@ -275,6 +279,7 @@ class NormalizeImage(object):
         img = (img.astype("float32") * self.scale - self.mean) / self.std
         return img
 
+    @benchmark.timeit
     def __call__(self, imgs: List[Union[np.ndarray, Image.Image]]) -> List[np.ndarray]:
         """Apply normalization to a list of images."""
         return [self.normalize(img) for img in imgs]
@@ -287,6 +292,7 @@ class ToBatch(object):
         """Initializes the ToBatch object."""
         super(ToBatch, self).__init__()
 
+    @benchmark.timeit
     def __call__(self, imgs: List[np.ndarray]) -> List[np.ndarray]:
         """Concatenates a list of images into a single batch.
 
@@ -371,6 +377,7 @@ class LaTeXOCRDecode(object):
         ]
         return [self.post_process(dec_str) for dec_str in dec_str_list]
 
+    @benchmark.timeit
     def __call__(
         self,
         preds: np.ndarray,
@@ -543,6 +550,7 @@ class UniMERNetImgDecode(object):
         )
         return np.array(ImageOps.expand(img, padding))
 
+    @benchmark.timeit
     def __call__(self, imgs: List[np.ndarray]) -> List[Optional[np.ndarray]]:
         """Calls the img_decode method on a list of images.
 
@@ -871,6 +879,7 @@ class UniMERNetDecode(object):
         text = self.normalize(text)
         return text
 
+    @benchmark.timeit
     def __call__(
         self,
         preds: np.ndarray,
@@ -934,6 +943,7 @@ class UniMERNetTestTransform:
         img = cv2.merge([squeezed] * self.num_output_channels)
         return img
 
+    @benchmark.timeit
     def __call__(self, imgs: List[np.ndarray]) -> List[np.ndarray]:
         """
         Applies the transform to a list of images.
@@ -974,6 +984,7 @@ class UniMERNetImageFormat:
         img_expanded = img[:, :, np.newaxis].transpose(2, 0, 1)
         return img_expanded[np.newaxis, :]
 
+    @benchmark.timeit
     def __call__(self, imgs: List[np.ndarray]) -> List[np.ndarray]:
         """Applies the format method to a list of images.
 

+ 3 - 0
paddlex/inference/models/image_classification/processors.py

@@ -16,6 +16,7 @@ import numpy as np
 
 from ....utils import logging
 from ..common.vision import F
+from ...utils.benchmark import benchmark
 
 
 class Crop:
@@ -41,6 +42,7 @@ class Crop:
             raise ValueError("Unsupported interpolation method")
         self.mode = mode
 
+    @benchmark.timeit
     def __call__(self, imgs):
         """apply"""
         return [self.crop(img) for img in imgs]
@@ -78,6 +80,7 @@ class Topk:
         class_id_map = {id: str(lb) for id, lb in enumerate(class_ids)}
         return class_id_map
 
+    @benchmark.timeit
     def __call__(self, preds, topk=5):
         indexes = preds[0].argsort(axis=1)[:, -topk:][:, ::-1].astype("int32")
         scores = [

+ 3 - 0
paddlex/inference/models/image_feature/processors.py

@@ -14,6 +14,8 @@
 
 import numpy as np
 
+from ...utils.benchmark import benchmark
+
 
 class NormalizeFeatures:
     """Normalize Features Transform"""
@@ -24,6 +26,7 @@ class NormalizeFeatures:
         features = np.divide(preds[0], feas_norm)
         return features
 
+    @benchmark.timeit
     def __call__(self, preds):
         normalized_features = [self._normalize(feature) for feature in preds]
         return normalized_features

+ 3 - 0
paddlex/inference/models/image_multilabel_classification/processors.py

@@ -15,6 +15,8 @@
 import numpy as np
 from typing import Union
 
+from ...utils.benchmark import benchmark
+
 
 class MultiLabelThreshOutput:
     """MultiLabelThresh Transform"""
@@ -31,6 +33,7 @@ class MultiLabelThreshOutput:
         class_id_map = {id: str(lb) for id, lb in enumerate(class_ids)}
         return class_id_map
 
+    @benchmark.timeit
     def __call__(self, preds, threshold: Union[float, dict, list]):
         threshold_list = []
         num_classes = preds[0].shape[-1]

+ 3 - 0
paddlex/inference/models/image_unwarping/processors.py

@@ -15,6 +15,8 @@
 import numpy as np
 from typing import List, Union, Tuple
 
+from ...utils.benchmark import benchmark
+
 
 class DocTrPostProcess:
     """
@@ -44,6 +46,7 @@ class DocTrPostProcess:
             np.float32(scale) if isinstance(scale, (str, float)) else np.float32(255.0)
         )
 
+    @benchmark.timeit
     def __call__(
         self, imgs: List[Union[np.ndarray, Tuple[np.ndarray, ...]]]
     ) -> List[np.ndarray]:

+ 2 - 0
paddlex/inference/models/instance_segmentation/processors.py

@@ -18,6 +18,7 @@ from typing import List, Sequence, Tuple, Union, Optional
 import numpy as np
 from ....utils import logging
 from ..object_detection.processors import restructured_boxes
+from ...utils.benchmark import benchmark
 
 import cv2
 
@@ -78,6 +79,7 @@ class InstanceSegPostProcess(object):
 
         return result
 
+    @benchmark.timeit
     def __call__(
         self,
         batch_outputs: List[dict],

+ 3 - 0
paddlex/inference/models/keypoint_detection/processors.py

@@ -20,6 +20,7 @@ import numpy as np
 from numpy import ndarray
 
 from ..object_detection.processors import get_affine_transform
+from ...utils.benchmark import benchmark
 
 Number = Union[int, float]
 Kpts = List[dict]
@@ -136,6 +137,7 @@ class TopDownAffine:
 
         return img, center, scale
 
+    @benchmark.timeit
     def __call__(self, datas: List[dict]) -> List[dict]:
         for data in datas:
             ori_img = data["img"]
@@ -216,6 +218,7 @@ class KptPostProcess:
             for kpt, score in zip(keypoints, scores)
         ]
 
+    @benchmark.timeit
     def __call__(self, batch_outputs: List[dict], datas: List[dict]) -> List[Kpts]:
         """Apply the post-processing to a batch of outputs.
 

+ 10 - 0
paddlex/inference/models/object_detection/processors.py

@@ -21,6 +21,7 @@ from numpy import ndarray
 from ..common import Resize as CommonResize
 from ..common import Normalize as CommonNormalize
 from ...common.reader import ReadImage as CommonReadImage
+from ...utils.benchmark import benchmark
 
 Boxes = List[dict]
 Number = Union[int, float]
@@ -29,6 +30,7 @@ Number = Union[int, float]
 class ReadImage(CommonReadImage):
     """Reads images from a list of raw image data or file paths."""
 
+    @benchmark.timeit
     def __call__(self, raw_imgs: List[Union[ndarray, str, dict]]) -> List[dict]:
         """Processes the input list of raw image data or file paths and returns a list of dictionaries containing image information.
 
@@ -93,6 +95,7 @@ class ReadImage(CommonReadImage):
 
 
 class Resize(CommonResize):
+    @benchmark.timeit
     def __call__(self, datas: List[dict]) -> List[dict]:
         """
         Args:
@@ -138,6 +141,7 @@ class Normalize(CommonNormalize):
             img = img.astype(old_type, copy=False)
         return img
 
+    @benchmark.timeit
     def __call__(self, datas: List[dict]) -> List[dict]:
         """Normalizes images in a list of dictionaries. Iterates over each dictionary,
         applies normalization to the 'img' key, and returns the modified list.
@@ -150,6 +154,7 @@ class Normalize(CommonNormalize):
 class ToCHWImage:
     """Converts images in a list of dictionaries from HWC to CHW format."""
 
+    @benchmark.timeit
     def __call__(self, datas: List[dict]) -> List[dict]:
         """Converts the image data in the list of dictionaries from HWC to CHW format in-place.
 
@@ -207,6 +212,7 @@ class ToBatch:
                 dtype=dtype, copy=False
             )
 
+    @benchmark.timeit
     def __call__(self, datas: List[dict]) -> Sequence[ndarray]:
         return [self.apply(datas, key) for key in self.ordered_required_keys]
 
@@ -242,6 +248,7 @@ class DetPad:
         canvas[0:im_h, 0:im_w, :] = im.astype(np.float32)
         return canvas
 
+    @benchmark.timeit
     def __call__(self, datas: List[dict]) -> List[dict]:
         for data in datas:
             data["img"] = self.apply(data["img"])
@@ -276,6 +283,7 @@ class PadStride:
         padding_im[:, :im_h, :im_w] = im
         return padding_im
 
+    @benchmark.timeit
     def __call__(self, datas: List[dict]) -> List[dict]:
         for data in datas:
             data["img"] = self.apply(data["img"])
@@ -438,6 +446,7 @@ class WarpAffine:
 
         return inp
 
+    @benchmark.timeit
     def __call__(self, datas: List[dict]) -> List[dict]:
 
         for data in datas:
@@ -760,6 +769,7 @@ class DetPostProcess:
             )
         return boxes
 
+    @benchmark.timeit
     def __call__(
         self,
         batch_outputs: List[dict],

+ 5 - 0
paddlex/inference/models/open_vocabulary_detection/processors/groundingdino_processors.py

@@ -20,6 +20,7 @@ import PIL
 
 from ...common.tokenizer.bert_tokenizer import BertTokenizer
 from .....utils.lazy_loader import LazyLoader
+from ....utils.benchmark import benchmark
 
 # NOTE: LazyLoader is used to avoid conflicts between ultra-infer and Paddle
 paddle = LazyLoader("lazy_paddle", globals(), "paddle")
@@ -117,6 +118,7 @@ class GroundingDINOPostProcessor(object):
         self.box_threshold = box_threshold
         self.text_threshold = text_threshold
 
+    @benchmark.timeit
     def __call__(
         self,
         pred_boxes,
@@ -234,6 +236,7 @@ class GroundingDINOProcessor(object):
         assert os.path.isdir(tokenizer_dir), f"{tokenizer_dir} not exists."
         self.tokenizer = BertTokenizer.from_pretrained(tokenizer_dir)
 
+    @benchmark.timeit
     def __call__(
         self,
         images: List[PIL.Image.Image],
@@ -270,6 +273,7 @@ class GroundingDinoTextProcessor(object):
     ):
         self.max_words = max_words
 
+    @benchmark.timeit
     def __call__(
         self,
         input_ids,
@@ -387,6 +391,7 @@ class GroundingDinoImageProcessor(object):
         self.image_std = image_std
         self.do_nested = do_nested
 
+    @benchmark.timeit
     def __call__(self, images, **kwargs):
         """Preprocess an image or a batch of images."""
         return self.preprocess(images, **kwargs)

+ 3 - 0
paddlex/inference/models/open_vocabulary_segmentation/processors/sam_processer.py

@@ -20,6 +20,7 @@ import PIL
 from copy import deepcopy
 
 from .....utils.lazy_loader import LazyLoader
+from ....utils.benchmark import benchmark
 
 # NOTE: LazyLoader is used to avoid conflicts between ultra-infer and Paddle
 paddle = LazyLoader("lazy_paddle", globals(), "paddle")
@@ -159,6 +160,7 @@ class SamPromptProcessor(object):
         boxes = self.apply_coords(boxes.reshape([-1, 2, 2]), original_size)
         return boxes.reshape([-1, 4])
 
+    @benchmark.timeit
     def __call__(
         self,
         original_size,
@@ -213,6 +215,7 @@ class SamImageProcessor(object):
 
         return np.array(T.resize(image, target_size))
 
+    @benchmark.timeit
     def __call__(self, images, **kwargs):
         if not isinstance(images, (list, tuple)):
             images = [images]

+ 3 - 0
paddlex/inference/models/semantic_segmentation/processors.py

@@ -23,6 +23,7 @@ import numpy as np
 from ..common.vision.processors import _BaseResize
 
 from ..common.vision import funcs as F
+from ...utils.benchmark import benchmark
 
 
 class Resize(_BaseResize):
@@ -52,6 +53,7 @@ class Resize(_BaseResize):
 
         self.keep_ratio = keep_ratio
 
+    @benchmark.timeit
     def __call__(self, imgs, target_size=None):
         """apply"""
         target_size = self.target_size if target_size is None else target_size
@@ -88,6 +90,7 @@ class SegPostProcess:
     restoring the prediction segmentation map to the original image size for now.
     """
 
+    @benchmark.timeit
     def __call__(self, imgs, src_images):
         assert len(imgs) == len(src_images)
 

+ 3 - 0
paddlex/inference/models/table_structure_recognition/processors.py

@@ -17,6 +17,7 @@ import cv2
 import numpy as np
 from numpy import ndarray
 from ..common.vision import funcs as F
+from ...utils.benchmark import benchmark
 
 
 class Pad:
@@ -55,6 +56,7 @@ class Pad:
 
         return [img, [img.shape[1], img.shape[0]]]
 
+    @benchmark.timeit
     def __call__(self, imgs):
         """apply"""
         return [self.apply(img) for img in imgs]
@@ -119,6 +121,7 @@ class TableLabelDecode:
             assert False, "unsupported type %s in get_beg_end_flag_idx" % beg_or_end
         return idx
 
+    @benchmark.timeit
     def __call__(self, pred, img_size, ori_img_size):
         """apply"""
         bbox_preds, structure_probs = [], []

+ 4 - 0
paddlex/inference/models/text_detection/processors.py

@@ -26,6 +26,7 @@ from shapely.geometry import Polygon
 
 from ...utils.io import ImageReader
 from ....utils import logging
+from ...utils.benchmark import benchmark
 
 
 class DetResizeForTest:
@@ -50,6 +51,7 @@ class DetResizeForTest:
             self.limit_side_len = 736
             self.limit_type = "min"
 
+    @benchmark.timeit
     def __call__(
         self,
         imgs,
@@ -196,6 +198,7 @@ class NormalizeImage:
         self.mean = np.array(mean).reshape(shape).astype("float32")
         self.std = np.array(std).reshape(shape).astype("float32")
 
+    @benchmark.timeit
     def __call__(self, imgs):
         """apply"""
 
@@ -412,6 +415,7 @@ class DBPostProcess:
         cv2.fillPoly(mask, contour.reshape(1, -1, 2).astype(np.int32), 1)
         return cv2.mean(bitmap[ymin : ymax + 1, xmin : xmax + 1], mask)[0]
 
+    @benchmark.timeit
     def __call__(
         self,
         preds,

+ 5 - 0
paddlex/inference/models/text_recognition/processors.py

@@ -27,6 +27,7 @@ import tempfile
 from tokenizers import Tokenizer as TokenizerFast
 
 from ....utils import logging
+from ...utils.benchmark import benchmark
 
 
 class OCRReisizeNormImg:
@@ -57,6 +58,7 @@ class OCRReisizeNormImg:
         padding_im[:, :, 0:resized_w] = resized_image
         return padding_im
 
+    @benchmark.timeit
     def __call__(self, imgs):
         """apply"""
         return [self.resize(img) for img in imgs]
@@ -146,6 +148,7 @@ class BaseRecLabelDecode:
         """get_ignored_tokens"""
         return [0]  # for ctc blank
 
+    @benchmark.timeit
     def __call__(self, pred):
         """apply"""
         preds = np.array(pred)
@@ -168,6 +171,7 @@ class CTCLabelDecode(BaseRecLabelDecode):
     def __init__(self, character_list=None, use_space_char=True):
         super().__init__(character_list, use_space_char=use_space_char)
 
+    @benchmark.timeit
     def __call__(self, pred):
         """apply"""
         preds = np.array(pred[0])
@@ -213,6 +217,7 @@ class ToBatch:
             padded_imgs.append(padded_img)
         return padded_imgs
 
+    @benchmark.timeit
     def __call__(self, imgs: List[np.ndarray]) -> List[np.ndarray]:
         """Call method to pad images and stack them into a batch.
 

+ 3 - 0
paddlex/inference/models/ts_anomaly_detection/processors.py

@@ -16,6 +16,8 @@ from typing import List, Dict, Any
 import numpy as np
 import pandas as pd
 
+from ...utils.benchmark import benchmark
+
 
 class GetAnomaly:
     """A class to detect anomalies in time series data based on a model threshold."""
@@ -32,6 +34,7 @@ class GetAnomaly:
         self.model_threshold = model_threshold
         self.info_params = info_params
 
+    @benchmark.timeit
     def __call__(
         self, ori_ts_list: List[Dict[str, Any]], pred_list: List[np.ndarray]
     ) -> List[pd.DataFrame]:

+ 4 - 0
paddlex/inference/models/ts_classification/processors.py

@@ -16,6 +16,8 @@ import numpy as np
 import pandas as pd
 from typing import List, Any, Dict
 
+from ...utils.benchmark import benchmark
+
 
 class GetCls:
     """A class to process prediction outputs and return class IDs and scores."""
@@ -24,6 +26,7 @@ class GetCls:
         """Initializes the GetCls instance."""
         super().__init__()
 
+    @benchmark.timeit
     def __call__(self, pred_list: List[Any]) -> List[pd.DataFrame]:
         """
         Processes a list of predictions and returns a list of DataFrames with class IDs and scores.
@@ -70,6 +73,7 @@ class BuildPadMask:
         super().__init__()
         self.input_data = input_data
 
+    @benchmark.timeit
     def __call__(self, ts_list: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
         """
         Applies padding mask to a list of time series data.

+ 4 - 0
paddlex/inference/models/ts_forecasting/processors.py

@@ -17,6 +17,8 @@ import joblib
 import numpy as np
 import pandas as pd
 
+from ...utils.benchmark import benchmark
+
 
 class TSDeNormalize:
     """A class to de-normalize time series prediction data using a pre-fitted scaler."""
@@ -33,6 +35,7 @@ class TSDeNormalize:
         self.scaler = joblib.load(scale_path)
         self.params_info = params_info
 
+    @benchmark.timeit
     def __call__(self, preds_list: List[pd.DataFrame]) -> List[pd.DataFrame]:
         """
         Applies de-normalization to a list of prediction DataFrames.
@@ -73,6 +76,7 @@ class ArraytoTS:
         super().__init__()
         self.info_params = info_params
 
+    @benchmark.timeit
     def __call__(
         self, ori_ts_list: List[Dict[str, Any]], pred_list: List[np.ndarray]
     ) -> List[pd.DataFrame]:

+ 8 - 0
paddlex/inference/models/video_classification/processors.py

@@ -25,6 +25,8 @@ import json
 import tempfile
 import lazy_paddle
 
+from ...utils.benchmark import benchmark
+
 
 class Scale:
     """Scale images."""
@@ -121,6 +123,7 @@ class Scale:
         imgs = resized_imgs
         return imgs
 
+    @benchmark.timeit
     def __call__(self, videos: List[np.ndarray]) -> List[np.ndarray]:
         """
         Apply the scaling operation to a list of videos.
@@ -181,6 +184,7 @@ class CenterCrop:
                 crop_imgs.append(img[y1 : y1 + th, x1 : x1 + tw])
         return crop_imgs
 
+    @benchmark.timeit
     def __call__(self, videos: List[np.ndarray]) -> List[np.ndarray]:
         """
         Apply the center crop operation to a list of videos.
@@ -234,6 +238,7 @@ class Image2Array:
                 t_imgs = t_imgs.transpose([3, 0, 1, 2])  # cthw
         return t_imgs
 
+    @benchmark.timeit
     def __call__(self, videos: List[np.ndarray]) -> List[np.ndarray]:
         """
         Apply the image to array conversion to a list of videos.
@@ -311,6 +316,7 @@ class NormalizeVideo:
         imgs = np.expand_dims(imgs, axis=0).copy()
         return imgs
 
+    @benchmark.timeit
     def __call__(self, videos: List[np.ndarray]) -> List[np.ndarray]:
         """
         Apply normalization to a list of videos.
@@ -368,6 +374,7 @@ class VideoClasTopk:
         class_id_map = {id: str(lb) for id, lb in enumerate(class_ids)}
         return class_id_map
 
+    @benchmark.timeit
     def __call__(
         self, preds: np.ndarray, topk: int = 5
     ) -> Tuple[np.ndarray, List[np.ndarray], List[List[str]]]:
@@ -397,6 +404,7 @@ class VideoClasTopk:
 class ToBatch:
     """A class for batching videos."""
 
+    @benchmark.timeit
     def __call__(self, videos: List[np.ndarray]) -> List[np.ndarray]:
         """Call method to stack videos into a batch.
 

+ 6 - 0
paddlex/inference/models/video_detection/processors.py

@@ -21,6 +21,8 @@ import numpy as np
 import cv2
 import lazy_paddle as paddle
 
+from ...utils.benchmark import benchmark
+
 
 class ResizeVideo:
     """Resizes frames of a video to a specified target size.
@@ -75,6 +77,7 @@ class ResizeVideo:
                 )
         return video
 
+    @benchmark.timeit
     def __call__(self, videos: List) -> List:
         """Resizes frames of multiple videos.
 
@@ -129,6 +132,7 @@ class Image2Array:
             video[i] = video_one
         return video
 
+    @benchmark.timeit
     def __call__(self, videos: List[List[np.ndarray]]) -> List[np.ndarray]:
         """
         Process videos by converting each video to a transposed numpy array.
@@ -177,6 +181,7 @@ class NormalizeVideo:
 
         return video
 
+    @benchmark.timeit
     def __call__(self, videos: List[List[np.ndarray]]) -> List[List[np.ndarray]]:
         """
         Apply normalization to a list of videos.
@@ -446,5 +451,6 @@ class DetVideoPostProcess:
             pred_all.append(preds)
         return pred_all
 
+    @benchmark.timeit
     def __call__(self, preds: List, nms_thresh, score_thresh) -> List:
         return [self.postprocess(pred, nms_thresh, score_thresh) for pred in preds]
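
For orientation, the effect of the `@benchmark.timeit` decorators added to the processors above can be sketched as follows. This is a simplified stand-in, not the `Benchmark` class from this commit: `MiniBenchmark` and `DoubleValues` are hypothetical names, generator outputs are not handled, and `time.perf_counter()` is swapped in for `time.time()`. It only illustrates that each decorated `__call__` is recorded in milliseconds under its `__qualname__`, and that a disabled benchmark passes calls straight through.

```python
import functools
import time


class MiniBenchmark:
    """Hypothetical, simplified stand-in for the Benchmark class in this commit."""

    def __init__(self, enabled):
        self._enabled = enabled
        self._elapses = {}

    def timeit(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # When disabled, call through without recording anything.
            if not self._enabled:
                return func(*args, **kwargs)
            tic = time.perf_counter()
            output = func(*args, **kwargs)
            elapsed_ms = (time.perf_counter() - tic) * 1000
            # Key by qualified name, e.g. "DoubleValues.__call__".
            self._elapses.setdefault(func.__qualname__, []).append(elapsed_ms)
            return output

        return wrapper

    @property
    def logs(self):
        return self._elapses


bench = MiniBenchmark(enabled=True)


class DoubleValues:
    """Hypothetical processor used only for this illustration."""

    @bench.timeit
    def __call__(self, values):
        return [v * 2 for v in values]


op = DoubleValues()
op([1, 2, 3])
result = op([4])
print(bench.logs)  # one key holding two timings in milliseconds
```

Because the key is the qualified name rather than the instance, all instances of the same class share one list of timings.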

+ 168 - 170
paddlex/inference/utils/benchmark.py

@@ -21,190 +21,41 @@ import numpy as np
 from prettytable import PrettyTable
 
 from ...utils.flags import INFER_BENCHMARK, INFER_BENCHMARK_OUTPUT
-from ...utils.misc import Singleton
 from ...utils import logging
 
 
-class Benchmark(metaclass=Singleton):
-    def __init__(self):
-        self._components = {}
-        self._warmup_start = None
-        self._warmup_elapse = None
-        self._warmup_num = None
-        self._e2e_tic = None
-        self._e2e_elapse = None
-
-    def attach(self, component):
-        self._components[component.name] = component
-
-    def start(self):
-        self._warmup_start = time.time()
-        self._reset()
-
-    def warmup_stop(self, warmup_num):
-        self._warmup_elapse = (time.time() - self._warmup_start) * 1000
-        self._warmup_num = warmup_num
-        self._reset()
-
-    def _reset(self):
-        for name, cmp in self.iterate_cmp(self._components):
-            cmp.timer.reset()
-        self._e2e_tic = time.time()
-
-    def iterate_cmp(self, cmps):
-        if cmps is None:
-            return
-        for name, cmp in cmps.items():
-            if hasattr(cmp, "benchmark"):
-                yield from self.iterate_cmp(cmp.benchmark)
-            yield name, cmp
-
-    def gather(self, e2e_num):
-        # lazy import for avoiding circular import
-        from ..new_models.base import BasePaddlePredictor
-
-        detail = []
-        summary = {"preprocess": 0, "inference": 0, "postprocess": 0}
-        op_tag = "preprocess"
-        for name, cmp in self._components.items():
-            if isinstance(cmp, BasePaddlePredictor):
-                # TODO(gaotingquan): show by hierarchy. Now dont show xxxPredictor benchmark info to ensure mutual exclusivity between components.
-                for name, sub_cmp in cmp.benchmark.items():
-                    times = sub_cmp.timer.logs
-                    counts = len(times)
-                    avg = np.mean(times) * 1000
-                    total = np.sum(times) * 1000
-                    detail.append((name, total, counts, avg))
-                    summary["inference"] += total
-                op_tag = "postprocess"
-            else:
-                # TODO(gaotingquan): support sub_cmps for others
-                # if hasattr(cmp, "benchmark"):
-                times = cmp.timer.logs
-                counts = len(times)
-                avg = np.mean(times) * 1000
-                total = np.sum(times) * 1000
-                detail.append((name, total, counts, avg))
-                summary[op_tag] += total
-
-        summary = [
-            (
-                "PreProcess",
-                summary["preprocess"],
-                e2e_num,
-                summary["preprocess"] / e2e_num,
-            ),
-            (
-                "Inference",
-                summary["inference"],
-                e2e_num,
-                summary["inference"] / e2e_num,
-            ),
-            (
-                "PostProcess",
-                summary["postprocess"],
-                e2e_num,
-                summary["postprocess"] / e2e_num,
-            ),
-            ("End2End", self._e2e_elapse, e2e_num, self._e2e_elapse / e2e_num),
-        ]
-        if self._warmup_elapse:
-            warmup_elapse, warmup_num, warmup_avg = (
-                self._warmup_elapse,
-                self._warmup_num,
-                self._warmup_elapse / self._warmup_num,
-            )
-        else:
-            warmup_elapse, warmup_num, warmup_avg = 0, 0, 0
-        summary.append(
-            (
-                "WarmUp",
-                warmup_elapse,
-                warmup_num,
-                warmup_avg,
-            )
-        )
-        return detail, summary
-
-    def collect(self, e2e_num):
-        self._e2e_elapse = (time.time() - self._e2e_tic) * 1000
-        detail, summary = self.gather(e2e_num)
-
-        detail_head = [
-            "Component",
-            "Total Time (ms)",
-            "Number of Calls",
-            "Avg Time Per Call (ms)",
-        ]
-        table = PrettyTable(detail_head)
-        table.add_rows(
-            [
-                (name, f"{total:.8f}", cnts, f"{avg:.8f}")
-                for name, total, cnts, avg in detail
-            ]
-        )
-        logging.info(table)
+class Benchmark:
+    def __init__(self, enabled):
+        self._enabled = enabled
+        self._elapses = {}
+        self._warmup = False
 
-        summary_head = [
-            "Stage",
-            "Total Time (ms)",
-            "Number of Instances",
-            "Avg Time Per Instance (ms)",
-        ]
-        table = PrettyTable(summary_head)
-        table.add_rows(
-            [
-                (name, f"{total:.8f}", cnts, f"{avg:.8f}")
-                for name, total, cnts, avg in summary
-            ]
-        )
-        logging.info(table)
-
-        if INFER_BENCHMARK_OUTPUT:
-            save_dir = Path(INFER_BENCHMARK_OUTPUT)
-            save_dir.mkdir(parents=True, exist_ok=True)
-            csv_data = [detail_head, *detail]
-            with open(Path(save_dir) / "detail.csv", "w", newline="") as file:
-                writer = csv.writer(file)
-                writer.writerows(csv_data)
-
-            csv_data = [summary_head, *summary]
-            with open(Path(save_dir) / "summary.csv", "w", newline="") as file:
-                writer = csv.writer(file)
-                writer.writerows(csv_data)
-
-
-class Timer:
-    def __init__(self, component):
-        from ..new_models.base import BaseComponent
-
-        assert isinstance(component, BaseComponent)
-        benchmark.attach(component)
-        component.apply = self.watch_func(component.apply)
-        self._tic = None
-        self._elapses = []
-
-    def watch_func(self, func):
+    def timeit(self, func):
         @functools.wraps(func)
         def wrapper(*args, **kwargs):
+            if not self._enabled:
+                return func(*args, **kwargs)
+
+            name = func.__qualname__
+
             tic = time.time()
             output = func(*args, **kwargs)
             if isinstance(output, GeneratorType):
-                return self.watch_generator(output)
+                return self.watch_generator(output, name)
             else:
-                self._update(time.time() - tic)
+                self._update(time.time() - tic, name)
                 return output
 
         return wrapper
 
-    def watch_generator(self, generator):
+    def watch_generator(self, generator, name):
         @functools.wraps(generator)
         def wrapper():
-            while 1:
+            while True:
                 try:
                     tic = time.time()
                     item = next(generator)
-                    self._update(time.time() - tic)
+                    self._update(time.time() - tic, name)
                     yield item
                 except StopIteration:
                     break
@@ -212,15 +63,162 @@ class Timer:
         return wrapper()
 
     def reset(self):
-        self._tic = None
-        self._elapses = []
+        self._elapses = {}
 
-    def _update(self, elapse):
-        self._elapses.append(elapse)
+    def _update(self, elapse, name):
+        elapse = elapse * 1000
+        if name in self._elapses:
+            self._elapses[name].append(elapse)
+        else:
+            self._elapses[name] = [elapse]
 
     @property
     def logs(self):
         return self._elapses
 
+    def start_timing(self):
+        self._enabled = True
 
-benchmark = Benchmark() if INFER_BENCHMARK else None
+    def stop_timing(self):
+        self._enabled = False
+
+    def start_warmup(self):
+        self._warmup = True
+
+    def stop_warmup(self):
+        self._warmup = False
+        self.reset()
+
+    def gather(self, batch_size):
+        logs = {k.split(".")[0]: v for k, v in self.logs.items()}
+
+        iters = len(logs["Infer"])
+        instances = iters * batch_size
+        detail_list = []
+        summary = {"preprocess": 0, "inference": 0, "postprocess": 0}
+        op_tag = "preprocess"
+
+        for name, time_list in logs.items():
+            avg = np.mean(time_list)
+            detail_list.append(
+                (iters, batch_size, instances, name, avg, avg / batch_size)
+            )
+
+            if name in ["Copy2GPU", "Infer", "Copy2CPU"]:
+                summary["inference"] += avg
+                op_tag = "postprocess"
+            else:
+                summary[op_tag] += avg
+
+        summary["end2end"] = (
+            summary["preprocess"] + summary["inference"] + summary["postprocess"]
+        )
+        summary_list = [
+            (
+                iters,
+                batch_size,
+                instances,
+                "PreProcess",
+                summary["preprocess"],
+                summary["preprocess"] / batch_size,
+            ),
+            (
+                iters,
+                batch_size,
+                instances,
+                "Inference",
+                summary["inference"],
+                summary["inference"] / batch_size,
+            ),
+            (
+                iters,
+                batch_size,
+                instances,
+                "PostProcess",
+                summary["postprocess"],
+                summary["postprocess"] / batch_size,
+            ),
+            (
+                iters,
+                batch_size,
+                instances,
+                "End2End",
+                summary["end2end"],
+                summary["end2end"] / batch_size,
+            ),
+        ]
+
+        return detail_list, summary_list
+
+    def collect(self, batch_size):
+        detail_list, summary_list = self.gather(batch_size)
+
+        if self._warmup:
+            summary_head = [
+                "Iters",
+                "Batch Size",
+                "Instances",
+                "Stage",
+                "Avg Time Per Iter (ms)",
+                "Avg Time Per Instance (ms)",
+            ]
+            table = PrettyTable(summary_head)
+            summary_list = [
+                i[:4] + (f"{i[4]:.8f}", f"{i[5]:.8f}") for i in summary_list
+            ]
+            table.add_rows(summary_list)
+            header = "WarmUp Data".center(len(str(table).split("\n")[0]), " ")
+            logging.info(header)
+            logging.info(table)
+
+        else:
+            detail_head = [
+                "Iters",
+                "Batch Size",
+                "Instances",
+                "Operation",
+                "Avg Time Per Iter (ms)",
+                "Avg Time Per Instance (ms)",
+            ]
+            table = PrettyTable(detail_head)
+            detail_list = [i[:4] + (f"{i[4]:.8f}", f"{i[5]:.8f}") for i in detail_list]
+            table.add_rows(detail_list)
+            header = "Detail Data".center(len(str(table).split("\n")[0]), " ")
+            logging.info(header)
+            logging.info(table)
+
+            summary_head = [
+                "Iters",
+                "Batch Size",
+                "Instances",
+                "Stage",
+                "Avg Time Per Iter (ms)",
+                "Avg Time Per Instance (ms)",
+            ]
+            table = PrettyTable(summary_head)
+            summary_list = [
+                i[:4] + (f"{i[4]:.8f}", f"{i[5]:.8f}") for i in summary_list
+            ]
+            table.add_rows(summary_list)
+            header = "Summary Data".center(len(str(table).split("\n")[0]), " ")
+            logging.info(header)
+            logging.info(table)
+
+            if INFER_BENCHMARK_OUTPUT:
+                save_dir = Path(INFER_BENCHMARK_OUTPUT)
+                save_dir.mkdir(parents=True, exist_ok=True)
+                csv_data = [detail_head, *detail_list]
+                with open(Path(save_dir) / "detail.csv", "w", newline="") as file:
+                    writer = csv.writer(file)
+                    writer.writerows(csv_data)
+
+                csv_data = [summary_head, *summary_list]
+                with open(Path(save_dir) / "summary.csv", "w", newline="") as file:
+                    writer = csv.writer(file)
+                    writer.writerows(csv_data)
+
+
+if INFER_BENCHMARK:
+    benchmark = Benchmark(enabled=True)
+else:
+    benchmark = Benchmark(enabled=False)
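
The stage grouping performed by `gather()` above can be sketched in isolation. The function below is a minimal illustration under stated assumptions, not the commit's code: `summarize` and the sample timings are made up, and plain Python arithmetic replaces `np.mean`. It relies on the same two ideas as the diff: the operation name is the part of the key before the first dot, and the insertion order of the timing dict decides whether a non-inference operation counts as preprocessing (seen before `Copy2GPU`, `Infer`, or `Copy2CPU`) or postprocessing (seen after).

```python
INFERENCE_OPS = ("Copy2GPU", "Infer", "Copy2CPU")


def summarize(logs, batch_size):
    """Group per-operation timings (ms) into stages; hypothetical helper."""
    # Keys look like "ClassName.__call__"; keep only the class name.
    by_op = {name.split(".")[0]: times for name, times in logs.items()}

    per_iter = {"preprocess": 0.0, "inference": 0.0, "postprocess": 0.0}
    stage = "preprocess"
    for name, times in by_op.items():
        avg = sum(times) / len(times)  # average time per iteration
        if name in INFERENCE_OPS:
            per_iter["inference"] += avg
            # Everything recorded after an inference op is postprocessing.
            stage = "postprocess"
        else:
            per_iter[stage] += avg

    per_iter["end2end"] = (
        per_iter["preprocess"] + per_iter["inference"] + per_iter["postprocess"]
    )
    per_instance = {k: v / batch_size for k, v in per_iter.items()}
    return per_iter, per_instance


# Made-up timings in milliseconds, two iterations each, in pipeline order.
sample_logs = {
    "Resize.__call__": [2.0, 4.0],
    "Copy2GPU.__call__": [1.0, 1.0],
    "Infer.__call__": [10.0, 10.0],
    "Copy2CPU.__call__": [1.0, 1.0],
    "Topk.__call__": [0.5, 1.5],
}
per_iter, per_instance = summarize(sample_logs, batch_size=2)
print(per_iter)
print(per_instance)
```

One consequence of the order-based tagging, at least in this sketch, is that the summary is only meaningful when operations are first called in pipeline order, since a dict preserves the order in which keys were first inserted.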

+ 1 - 5
paddlex/utils/flags.py

@@ -24,7 +24,6 @@ __all__ = [
     "INFER_BENCHMARK_ITER",
     "INFER_BENCHMARK_WARMUP",
     "INFER_BENCHMARK_OUTPUT",
-    "INFER_BENCHMARK_DATA_SIZE",
     "FLAGS_json_format_model",
     "USE_PIR_TRT",
     "DISABLE_DEV_MODEL_WL",
@@ -59,7 +58,4 @@ INFER_BENCHMARK_WARMUP = get_flag_from_env_var(
 INFER_BENCHMARK_OUTPUT = get_flag_from_env_var(
     "PADDLE_PDX_INFER_BENCHMARK_OUTPUT", None
 )
-INFER_BENCHMARK_ITER = get_flag_from_env_var("PADDLE_PDX_INFER_BENCHMARK_ITER", 10, int)
-INFER_BENCHMARK_DATA_SIZE = get_flag_from_env_var(
-    "PADDLE_PDX_INFER_BENCHMARK_DATA_SIZE", 1024
-)
+INFER_BENCHMARK_ITER = get_flag_from_env_var("PADDLE_PDX_INFER_BENCHMARK_ITER", 0, int)