The benchmark feature measures each operation in the end-to-end model inference process, reporting its average execution time per iteration and per instance, and provides summary information. All times are in milliseconds.
To enable the benchmark feature, you must set the following environment variables:
- PADDLE_PDX_INFER_BENCHMARK: When set to True, the benchmark feature is enabled (default is False).
- PADDLE_PDX_INFER_BENCHMARK_WARMUP: The number of warm-up iterations before testing (default is 0).
- PADDLE_PDX_INFER_BENCHMARK_ITERS: The number of iterations for testing (default is 0).
- PADDLE_PDX_INFER_BENCHMARK_OUTPUT_DIR: The directory where the metrics are saved (e.g., ./benchmark). The default is None, meaning the benchmark metrics will not be saved.
- PADDLE_PDX_INFER_BENCHMARK_USE_CACHE_FOR_READ: When set to True, the caching mechanism is applied to the operation of reading input data to avoid repetitive I/O overhead, and the time consumed by data read and cache is not recorded in the core time (default is False).
- PADDLE_PDX_INFER_BENCHMARK_USE_NEW_INFER_API: When set to True, the new inference API is enabled, providing more detailed information for inference operations on benchmarks (default is False).

Note:
PADDLE_PDX_INFER_BENCHMARK_WARMUP or PADDLE_PDX_INFER_BENCHMARK_ITERS must be set to a value greater than zero; otherwise, the benchmark feature cannot be used.

You can use the benchmark feature by either the command line method or the Python script method.
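When a benchmark run is launched from another Python process, the same variables can be assembled in code. The following is a minimal sketch, not part of PaddleX: the helper name `build_benchmark_env` and its parameters are hypothetical, and it assumes the variables are read from the process environment when the benchmarked script starts. It enforces the rule from the note above before returning anything.

```python
import os


def build_benchmark_env(warmup=0, iters=0, output_dir=None, base_env=None):
    """Return a copy of the environment with the benchmark variables set.

    Hypothetical helper for illustration; it is not part of PaddleX.
    """
    if warmup < 0 or iters < 0:
        raise ValueError("warmup and iters must be non-negative")
    # Mirrors the note: at least one of the two counts must be greater than zero.
    if warmup == 0 and iters == 0:
        raise ValueError("set warmup or iters to a value greater than zero")

    # Copy so the caller's (or the current process's) environment is untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["PADDLE_PDX_INFER_BENCHMARK"] = "True"
    env["PADDLE_PDX_INFER_BENCHMARK_WARMUP"] = str(warmup)
    env["PADDLE_PDX_INFER_BENCHMARK_ITERS"] = str(iters)
    if output_dir is not None:
        env["PADDLE_PDX_INFER_BENCHMARK_OUTPUT_DIR"] = output_dir
    return env


# An empty base_env keeps the printed result small; omit it for real runs so
# that PATH and the other inherited variables are preserved.
env = build_benchmark_env(warmup=5, iters=10, output_dir="./benchmark", base_env={})
for name, value in sorted(env.items()):
    print(f"{name}={value}")
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting the inference script.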
Note:
If batch_size is greater than 1, the input data will be duplicated batch_size times to fill the batch.

For the command line method, execute the command:
PADDLE_PDX_INFER_BENCHMARK=True \
PADDLE_PDX_INFER_BENCHMARK_WARMUP=5 \
PADDLE_PDX_INFER_BENCHMARK_ITERS=10 \
PADDLE_PDX_INFER_BENCHMARK_OUTPUT_DIR=./benchmark \
python main.py \
-c ./paddlex/configs/modules/object_detection/PicoDet-XS.yaml \
-o Global.mode=predict \
-o Predict.model_dir=None \
-o Predict.batch_size=2 \
-o Predict.input=./test.png
Note:
If batch_size is greater than 1, the input data will be duplicated batch_size times to fill the batch.

For the Python script method, create the script test_infer.py:
from paddlex import create_model
model = create_model(model_name="PicoDet-XS", model_dir=None)
output = list(model.predict(input="./test.png", batch_size=2))
Run the script:
PADDLE_PDX_INFER_BENCHMARK=True \
PADDLE_PDX_INFER_BENCHMARK_WARMUP=5 \
PADDLE_PDX_INFER_BENCHMARK_ITERS=10 \
PADDLE_PDX_INFER_BENCHMARK_OUTPUT_DIR=./benchmark \
python test_infer.py
Once the benchmark feature is enabled, the benchmark results are printed automatically. The fields are as follows:
| Field Name | Field Description |
|---|---|
| Iters | Number of iterations, i.e., the number of times inference is executed in a loop. |
| Batch Size | Batch size, i.e., the number of instances processed in each iteration. |
| Instances | Total number of instances, calculated as Iters multiplied by Batch Size. |
| Operation | Name of the operation, such as Resize, Normalize, etc. |
| Type | Type of time consumption, including: Preprocessing, Inference, Postprocessing, Core, Other, and End-to-End. |
| Avg Time Per Iter (ms) | Average execution time per iteration, in milliseconds. |
| Avg Time Per Instance (ms) | Average execution time per instance, in milliseconds. |
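The relationship between these fields can be sketched in a few lines. Instances is defined above as Iters multiplied by Batch Size. The per-instance formula below (per-iteration time divided by batch size) is an assumption inferred from the numbers in the example output in this document, not a statement about the implementation. The function name `derive_fields` is hypothetical.

```python
def derive_fields(iters, batch_size, avg_time_per_iter_ms):
    """Derive Instances and Avg Time Per Instance from the other fields."""
    instances = iters * batch_size
    # Assumption: each iteration processes batch_size instances, so the
    # per-instance average is the per-iteration average divided by batch_size.
    avg_time_per_instance_ms = avg_time_per_iter_ms / batch_size
    return instances, avg_time_per_instance_ms


# Values taken from the Preprocessing row of the warm-up table in the example.
instances, per_instance = derive_fields(
    iters=5, batch_size=2, avg_time_per_iter_ms=97.89338876
)
print(instances)
print(f"{per_instance:.8f}")
```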
Below is an example of the benchmark results obtained by running the example program in Section 2:
WarmUp Data
+-------+------------+-----------+----------------+------------------------+----------------------------+
| Iters | Batch Size | Instances | Type | Avg Time Per Iter (ms) | Avg Time Per Instance (ms) |
+-------+------------+-----------+----------------+------------------------+----------------------------+
| 5 | 2 | 10 | Preprocessing | 97.89338876 | 48.94669438 |
| 5 | 2 | 10 | Inference | 66.70711380 | 33.35355690 |
| 5 | 2 | 10 | Postprocessing | 0.20138482 | 0.10069241 |
| 5 | 2 | 10 | Core | 164.80188738 | 82.40094369 |
| 5 | 2 | 10 | Other | 3.41097047 | 1.70548523 |
| 5 | 2 | 10 | End-to-End | 168.21285784 | 84.10642892 |
+-------+------------+-----------+----------------+------------------------+----------------------------+
Operation Info
+--------------------+----------------------------------------------------------------------+
| Operation | Source Code Location |
+--------------------+----------------------------------------------------------------------+
| ReadImage | /PaddleX/paddlex/inference/models/object_detection/processors.py:34 |
| Resize | /PaddleX/paddlex/inference/models/object_detection/processors.py:99 |
| Normalize | /PaddleX/paddlex/inference/models/object_detection/processors.py:145 |
| ToCHWImage | /PaddleX/paddlex/inference/models/object_detection/processors.py:158 |
| ToBatch | /PaddleX/paddlex/inference/models/object_detection/processors.py:216 |
| PaddleCopyToDevice | /PaddleX/paddlex/inference/models/common/static_infer.py:214 |
| PaddleModelInfer | /PaddleX/paddlex/inference/models/common/static_infer.py:234 |
| PaddleCopyToHost | /PaddleX/paddlex/inference/models/common/static_infer.py:223 |
| DetPostProcess | /PaddleX/paddlex/inference/models/object_detection/processors.py:773 |
+--------------------+----------------------------------------------------------------------+
Detail Data
+-------+------------+-----------+--------------------+------------------------+----------------------------+
| Iters | Batch Size | Instances | Operation | Avg Time Per Iter (ms) | Avg Time Per Instance (ms) |
+-------+------------+-----------+--------------------+------------------------+----------------------------+
| 10 | 2 | 20 | ReadImage | 76.22221033 | 38.11110517 |
| 10 | 2 | 20 | Resize | 12.02824502 | 6.01412251 |
| 10 | 2 | 20 | Normalize | 6.14072606 | 3.07036303 |
| 10 | 2 | 20 | ToCHWImage | 0.00533939 | 0.00266969 |
| 10 | 2 | 20 | ToBatch | 0.93134162 | 0.46567081 |
| 10 | 2 | 20 | PaddleCopyToDevice | 0.92240779 | 0.46120390 |
| 10 | 2 | 20 | PaddleModelInfer | 9.66330138 | 4.83165069 |
| 10 | 2 | 20 | PaddleCopyToHost | 0.06802108 | 0.03401054 |
| 10 | 2 | 20 | DetPostProcess | 0.18665448 | 0.09332724 |
+-------+------------+-----------+--------------------+------------------------+----------------------------+
Summary Data
+-------+------------+-----------+----------------+------------------------+----------------------------+
| Iters | Batch Size | Instances | Type | Avg Time Per Iter (ms) | Avg Time Per Instance (ms) |
+-------+------------+-----------+----------------+------------------------+----------------------------+
| 10 | 2 | 20 | Preprocessing | 95.32786242 | 47.66393121 |
| 10 | 2 | 20 | Inference | 10.65373025 | 5.32686512 |
| 10 | 2 | 20 | Postprocessing | 0.18665448 | 0.09332724 |
| 10 | 2 | 20 | Core | 106.16824715 | 53.08412358 |
| 10 | 2 | 20 | Other | 2.74794563 | 1.37397281 |
| 10 | 2 | 20 | End-to-End | 108.91619278 | 54.45809639 |
+-------+------------+-----------+----------------+------------------------+----------------------------+
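In the Summary Data above, the Core row equals the sum of Preprocessing, Inference, and Postprocessing, and the End-to-End row equals Core plus Other. The short check below reproduces that arithmetic from the printed numbers. It is a sketch based only on this example output and may not describe how the values are computed internally.

```python
# Avg Time Per Iter (ms) values copied from the Summary Data table.
summary_ms = {
    "Preprocessing": 95.32786242,
    "Inference": 10.65373025,
    "Postprocessing": 0.18665448,
    "Core": 106.16824715,
    "Other": 2.74794563,
    "End-to-End": 108.91619278,
}

core = sum(summary_ms[k] for k in ("Preprocessing", "Inference", "Postprocessing"))
end_to_end = core + summary_ms["Other"]

# The printed values are rounded to 8 decimals, so compare with a tolerance.
TOLERANCE_MS = 1e-6
print("Core matches:", abs(core - summary_ms["Core"]) < TOLERANCE_MS)
print("End-to-End matches:", abs(end_to_end - summary_ms["End-to-End"]) < TOLERANCE_MS)

# Share of the end-to-end time spent in each stage.
for stage in ("Preprocessing", "Inference", "Postprocessing", "Other"):
    share = 100.0 * summary_ms[stage] / summary_ms["End-to-End"]
    print(f"{stage:<15}{share:6.2f}%")
```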
Additionally, since PADDLE_PDX_INFER_BENCHMARK_OUTPUT_DIR=./benchmark is set, the above results will be saved locally in ./benchmark/detail.csv and ./benchmark/summary.csv.
The contents of detail.csv are as follows:
Iters,Batch Size,Instances,Operation,Avg Time Per Iter (ms),Avg Time Per Instance (ms)
10,2,20,ReadImage,76.22221033,38.11110517
10,2,20,Resize,12.02824502,6.01412251
10,2,20,Normalize,6.14072606,3.07036303
10,2,20,ToCHWImage,0.00533939,0.00266969
10,2,20,ToBatch,0.93134162,0.46567081
10,2,20,PaddleCopyToDevice,0.92240779,0.46120390
10,2,20,PaddleModelInfer,9.66330138,4.83165069
10,2,20,PaddleCopyToHost,0.06802108,0.03401054
10,2,20,DetPostProcess,0.18665448,0.09332724
The contents of summary.csv are as follows:
Iters,Batch Size,Instances,Type,Avg Time Per Iter (ms),Avg Time Per Instance (ms)
10,2,20,Preprocessing,95.32786242,47.66393121
10,2,20,Inference,10.65373025,5.32686512
10,2,20,Postprocessing,0.18665448,0.09332724
10,2,20,Core,106.16824715,53.08412358
10,2,20,Other,2.74794563,1.37397281
10,2,20,End-to-End,108.91619278,54.45809639
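The saved CSV files are easy to post-process with the standard library. The sketch below ranks operations by average time per iteration and reports each one's share of the total. The function name `rank_operations` is hypothetical, and the CSV text is embedded so the example runs without a benchmark directory.

```python
import csv
import io

# Contents of detail.csv as shown above.
DETAIL_CSV = """\
Iters,Batch Size,Instances,Operation,Avg Time Per Iter (ms),Avg Time Per Instance (ms)
10,2,20,ReadImage,76.22221033,38.11110517
10,2,20,Resize,12.02824502,6.01412251
10,2,20,Normalize,6.14072606,3.07036303
10,2,20,ToCHWImage,0.00533939,0.00266969
10,2,20,ToBatch,0.93134162,0.46567081
10,2,20,PaddleCopyToDevice,0.92240779,0.46120390
10,2,20,PaddleModelInfer,9.66330138,4.83165069
10,2,20,PaddleCopyToHost,0.06802108,0.03401054
10,2,20,DetPostProcess,0.18665448,0.09332724
"""

TIME_COLUMN = "Avg Time Per Iter (ms)"


def rank_operations(csv_file):
    """Return (operation, ms per iteration, percent of total), slowest first."""
    rows = list(csv.DictReader(csv_file))
    total = sum(float(row[TIME_COLUMN]) for row in rows)
    ranked = sorted(rows, key=lambda row: float(row[TIME_COLUMN]), reverse=True)
    return [
        (row["Operation"], float(row[TIME_COLUMN]), 100.0 * float(row[TIME_COLUMN]) / total)
        for row in ranked
    ]


ranking = rank_operations(io.StringIO(DETAIL_CSV))
for operation, ms, share in ranking:
    print(f"{operation:<20}{ms:>14.8f} ms{share:>7.1f}%")
```

To analyze a real run, replace the `io.StringIO` object with `open("./benchmark/detail.csv", newline="")`.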