# Model Inference Benchmark

## Table of Contents

- [1. Instructions](#1.-Instructions)
- [2. Usage Examples](#2.-Usage-Examples)
  - [2.1 Command Line Method](#2.1-Command-Line-Method)
  - [2.2 Python Script Method](#2.2-Python-Script-Method)
- [3. Explanation of Results](#3.-Explanation-of-Results)

## 1. Instructions

The benchmark feature collects the average execution time per iteration for each operation in the end-to-end model inference process, as well as the average execution time per instance, and provides summary information. All time measurements are in milliseconds.

To enable the benchmark feature, set the following environment variables:

* `PADDLE_PDX_INFER_BENCHMARK`: When set to `True`, the benchmark feature is enabled (default is `False`).
* `PADDLE_PDX_INFER_BENCHMARK_WARMUP`: The number of warm-up iterations before testing (default is `0`).
* `PADDLE_PDX_INFER_BENCHMARK_ITERS`: The number of iterations for testing (default is `0`).
* `PADDLE_PDX_INFER_BENCHMARK_OUTPUT_DIR`: The directory where the metrics are saved (e.g., `./benchmark`). The default is `None`, meaning the benchmark metrics will not be saved.
* `PADDLE_PDX_INFER_BENCHMARK_USE_CACHE_FOR_READ`: When set to `True`, a caching mechanism is applied to the input-data reading operation to avoid repeated I/O overhead, and the time spent reading and caching the data is not counted in the core time (default is `False`).
* `PADDLE_PDX_INFER_BENCHMARK_USE_NEW_INFER_API`: When set to `True`, the new inference API is enabled, providing more detailed benchmark information for the inference operations (default is `False`).

**Note**:

* At least one of `PADDLE_PDX_INFER_BENCHMARK_WARMUP` or `PADDLE_PDX_INFER_BENCHMARK_ITERS` must be set to a value greater than zero; otherwise, the benchmark feature cannot be used.
* The benchmark feature does not currently apply to model pipelines.

## 2. Usage Examples

You can use the benchmark feature by either the command line method or the Python script method.

### 2.1 Command Line Method

**Note**:

- For a description of the input parameters, please refer to the [PaddleX Common Model Configuration File Parameter Explanation](./config_parameters_common.en.md).
- If `batch_size` is greater than 1, the input data will be duplicated `batch_size` times to form a batch of the requested size.

Execute the command:

```bash
PADDLE_PDX_INFER_BENCHMARK=True \
PADDLE_PDX_INFER_BENCHMARK_WARMUP=5 \
PADDLE_PDX_INFER_BENCHMARK_ITERS=10 \
PADDLE_PDX_INFER_BENCHMARK_OUTPUT_DIR=./benchmark \
python main.py \
    -c ./paddlex/configs/modules/object_detection/PicoDet-XS.yaml \
    -o Global.mode=predict \
    -o Predict.model_dir=None \
    -o Predict.batch_size=2 \
    -o Predict.input=./test.png
```

### 2.2 Python Script Method

**Note**:

- For a description of the input parameters, please refer to the [PaddleX Single Model Python Usage Instructions](./model_python_API.en.md).
- If `batch_size` is greater than 1, the input data will be duplicated `batch_size` times to form a batch of the requested size.

Create the script `test_infer.py`:

```python
from paddlex import create_model

model = create_model(model_name="PicoDet-XS", model_dir=None)
output = list(model.predict(input="./test.png", batch_size=2))
```

Run the script:

```bash
PADDLE_PDX_INFER_BENCHMARK=True \
PADDLE_PDX_INFER_BENCHMARK_WARMUP=5 \
PADDLE_PDX_INFER_BENCHMARK_ITERS=10 \
PADDLE_PDX_INFER_BENCHMARK_OUTPUT_DIR=./benchmark \
python test_infer.py
```
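If you prefer not to prefix the shell command, the same settings can in principle be applied from inside the script itself. The sketch below is only an illustration of that idea, assuming PaddleX reads these variables after the script starts (so they are set before `paddlex` is imported); the shell-prefix form shown above is the documented approach.

```python
# Minimal sketch (assumption: the variables take effect when set before importing paddlex).
import os

os.environ["PADDLE_PDX_INFER_BENCHMARK"] = "True"
os.environ["PADDLE_PDX_INFER_BENCHMARK_WARMUP"] = "5"
os.environ["PADDLE_PDX_INFER_BENCHMARK_ITERS"] = "10"
os.environ["PADDLE_PDX_INFER_BENCHMARK_OUTPUT_DIR"] = "./benchmark"

from paddlex import create_model

model = create_model(model_name="PicoDet-XS", model_dir=None)
output = list(model.predict(input="./test.png", batch_size=2))
```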
## 3. Explanation of Results

After enabling the benchmark feature, the benchmark results are printed automatically. The fields are described below:

| Field Name | Field Description |
|---|---|
| Iters | Number of iterations, i.e., the number of times inference is executed in a loop. |
| Batch Size | Batch size, i.e., the number of instances processed in each iteration. |
| Instances | Total number of instances, calculated as Iters multiplied by Batch Size. |
| Operation | Name of the operation, such as Resize, Normalize, etc. |
| Type | Type of time consumption, including:<br>• preprocessing time (Preprocessing)<br>• model inference time (Inference)<br>• postprocessing time (Postprocessing)<br>• core time (Core, i.e., Preprocessing + Inference + Postprocessing)<br>• other time (Other, e.g., the time taken to run the code that orchestrates the operations, and the extra overhead introduced by the benchmark feature)<br>• end-to-end time (End-to-End, i.e., Core + Other) |
| Avg Time Per Iter (ms) | Average execution time per iteration, in milliseconds. |
| Avg Time Per Instance (ms) | Average execution time per instance, in milliseconds. |
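The fields are related by simple arithmetic: Instances = Iters × Batch Size, and the per-instance time is the per-iteration time divided by Batch Size. The short sketch below only illustrates this relationship, using the Inference row of the Summary Data example that follows.

```python
# Illustration of how the reported fields relate, using the Inference row
# of the Summary Data example below (numbers taken from that example).
iters = 10
batch_size = 2
avg_time_per_iter_ms = 10.65373025  # Avg Time Per Iter (ms)

instances = iters * batch_size                                  # 20
avg_time_per_instance_ms = avg_time_per_iter_ms / batch_size    # 5.3268651..., i.e. the reported 5.32686512

print(instances, avg_time_per_instance_ms)
```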
Below is an example of the benchmark results obtained by running the example program in Section 2:

```
WarmUp Data
+-------+------------+-----------+----------------+------------------------+----------------------------+
| Iters | Batch Size | Instances |      Type      | Avg Time Per Iter (ms) | Avg Time Per Instance (ms) |
+-------+------------+-----------+----------------+------------------------+----------------------------+
|   5   |     2      |    10     | Preprocessing  |      97.89338876       |        48.94669438         |
|   5   |     2      |    10     |   Inference    |      66.70711380       |        33.35355690         |
|   5   |     2      |    10     | Postprocessing |       0.20138482       |         0.10069241         |
|   5   |     2      |    10     |      Core      |      164.80188738      |        82.40094369         |
|   5   |     2      |    10     |     Other      |       3.41097047       |         1.70548523         |
|   5   |     2      |    10     |   End-to-End   |      168.21285784      |        84.10642892         |
+-------+------------+-----------+----------------+------------------------+----------------------------+

Operation Info
+--------------------+----------------------------------------------------------------------+
|     Operation      |                         Source Code Location                          |
+--------------------+----------------------------------------------------------------------+
|     ReadImage      |  /PaddleX/paddlex/inference/models/object_detection/processors.py:34  |
|       Resize       |  /PaddleX/paddlex/inference/models/object_detection/processors.py:99  |
|     Normalize      | /PaddleX/paddlex/inference/models/object_detection/processors.py:145  |
|     ToCHWImage     | /PaddleX/paddlex/inference/models/object_detection/processors.py:158  |
|      ToBatch       | /PaddleX/paddlex/inference/models/object_detection/processors.py:216  |
| PaddleCopyToDevice |     /PaddleX/paddlex/inference/models/common/static_infer.py:214      |
|  PaddleModelInfer  |     /PaddleX/paddlex/inference/models/common/static_infer.py:234      |
|  PaddleCopyToHost  |     /PaddleX/paddlex/inference/models/common/static_infer.py:223      |
|   DetPostProcess   | /PaddleX/paddlex/inference/models/object_detection/processors.py:773  |
+--------------------+----------------------------------------------------------------------+

Detail Data
+-------+------------+-----------+--------------------+------------------------+----------------------------+
| Iters | Batch Size | Instances |     Operation      | Avg Time Per Iter (ms) | Avg Time Per Instance (ms) |
+-------+------------+-----------+--------------------+------------------------+----------------------------+
|  10   |     2      |    20     |     ReadImage      |      76.22221033       |        38.11110517         |
|  10   |     2      |    20     |       Resize       |      12.02824502       |         6.01412251         |
|  10   |     2      |    20     |     Normalize      |       6.14072606       |         3.07036303         |
|  10   |     2      |    20     |     ToCHWImage     |       0.00533939       |         0.00266969         |
|  10   |     2      |    20     |      ToBatch       |       0.93134162       |         0.46567081         |
|  10   |     2      |    20     | PaddleCopyToDevice |       0.92240779       |         0.46120390         |
|  10   |     2      |    20     |  PaddleModelInfer  |       9.66330138       |         4.83165069         |
|  10   |     2      |    20     |  PaddleCopyToHost  |       0.06802108       |         0.03401054         |
|  10   |     2      |    20     |   DetPostProcess   |       0.18665448       |         0.09332724         |
+-------+------------+-----------+--------------------+------------------------+----------------------------+

Summary Data
+-------+------------+-----------+----------------+------------------------+----------------------------+
| Iters | Batch Size | Instances |      Type      | Avg Time Per Iter (ms) | Avg Time Per Instance (ms) |
+-------+------------+-----------+----------------+------------------------+----------------------------+
|  10   |     2      |    20     | Preprocessing  |      95.32786242       |        47.66393121         |
|  10   |     2      |    20     |   Inference    |      10.65373025       |         5.32686512         |
|  10   |     2      |    20     | Postprocessing |       0.18665448       |         0.09332724         |
|  10   |     2      |    20     |      Core      |      106.16824715      |        53.08412358         |
|  10   |     2      |    20     |     Other      |       2.74794563       |         1.37397281         |
|  10   |     2      |    20     |   End-to-End   |      108.91619278      |        54.45809639         |
+-------+------------+-----------+----------------+------------------------+----------------------------+
```

Additionally, since `PADDLE_PDX_INFER_BENCHMARK_OUTPUT_DIR=./benchmark` is set, the above results will be saved locally in `./benchmark/detail.csv` and `./benchmark/summary.csv`.

The contents of `detail.csv` are as follows:

```csv
Iters,Batch Size,Instances,Operation,Avg Time Per Iter (ms),Avg Time Per Instance (ms)
10,2,20,ReadImage,76.22221033,38.11110517
10,2,20,Resize,12.02824502,6.01412251
10,2,20,Normalize,6.14072606,3.07036303
10,2,20,ToCHWImage,0.00533939,0.00266969
10,2,20,ToBatch,0.93134162,0.46567081
10,2,20,PaddleCopyToDevice,0.92240779,0.46120390
10,2,20,PaddleModelInfer,9.66330138,4.83165069
10,2,20,PaddleCopyToHost,0.06802108,0.03401054
10,2,20,DetPostProcess,0.18665448,0.09332724
```

The contents of `summary.csv` are as follows:

```csv
Iters,Batch Size,Instances,Type,Avg Time Per Iter (ms),Avg Time Per Instance (ms)
10,2,20,Preprocessing,95.32786242,47.66393121
10,2,20,Inference,10.65373025,5.32686512
10,2,20,Postprocessing,0.18665448,0.09332724
10,2,20,Core,106.16824715,53.08412358
10,2,20,Other,2.74794563,1.37397281
10,2,20,End-to-End,108.91619278,54.45809639
```
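Because the saved files are plain CSV with the headers shown above, they can be post-processed with standard tooling. Below is a minimal sketch, assuming the files were produced at `./benchmark/` as in this example, that loads `detail.csv` and lists the operations from slowest to fastest by average time per instance:

```python
# Minimal sketch: rank operations by average time per instance.
# Assumes ./benchmark/detail.csv was generated as shown above.
import csv

with open("./benchmark/detail.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Sort by the per-instance timing column, slowest first.
rows.sort(key=lambda r: float(r["Avg Time Per Instance (ms)"]), reverse=True)

for r in rows:
    print(f'{r["Operation"]:<20} {float(r["Avg Time Per Instance (ms)"]):>12.4f} ms')
```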