---
comments: true
---
# 3D Multi-modal Fusion Detection Pipeline Usage Tutorial
## 1. Introduction to 3D Multi-modal Fusion Detection Pipeline
The 3D multi-modal fusion detection pipeline supports input from multiple sensors (LiDAR, surround RGB cameras, etc.), processes the data using deep learning methods, and outputs information such as the position, shape, orientation, and category of objects in three-dimensional space. It has a wide range of applications in fields such as autonomous driving, robot navigation, and industrial automation.
BEVFusion is a multi-modal 3D object detection model that fuses surround camera images and LiDAR point cloud data into a unified Bird's Eye View (BEV) representation, aligning and fusing features from different sensors, overcoming the limitations of a single sensor, and significantly improving detection accuracy and robustness. It is suitable for complex scenarios such as autonomous driving.
The 3D multi-modal fusion detection pipeline includes a 3D multi-modal fusion detection module, which contains a BEVFusion model. We provide benchmark data for this model:
3D Multi-modal Fusion Detection Module:
| Model | Model Download Link | mAP(%) | NDS | Description |
|---|---|---|---|---|
| BEVFusion | Inference Model/Training Model | 53.9 | 60.9 | BEVFusion is a multi-modal fusion detection model from a BEV perspective. It uses two branches to process data from different modalities, obtaining features for LiDAR and camera in the BEV perspective. The camera branch uses the LSS bottom-up approach to explicitly generate image BEV features, while the LiDAR branch uses a classic point cloud detection network. Finally, the BEV features from both modalities are aligned and fused, applied to the detection head or segmentation head. |
Note: The above accuracy metrics are measured on the nuScenes validation set, reported as mAP(0.5:0.95) and NDS, with FP32 precision.
## 2. Quick Start

The pre-trained model pipelines provided by PaddleX allow for quick experimentation. You can experience the effects of the 3D multi-modal fusion detection pipeline locally using the command line or Python.

### 2.1 Online Experience

Online experience is currently not supported.

### 2.2 Local Experience

> ❗ Before using the 3D multi-modal fusion detection pipeline locally, please ensure you have completed the PaddleX wheel package installation according to [the PaddleX Installation Tutorial](../../../installation/installation.md).

Demo dataset download: You can use the following command to download the demo dataset to a specified folder:

```bash
wget https://paddle-model-ecology.bj.bcebos.com/paddlex/data/nuscenes_demo.tar -P ./data
tar -xf ./data/nuscenes_demo.tar -C ./data/
```

#### 2.2.1 Command Line Experience

You can quickly experience the 3D multi-modal fusion detection pipeline with a single command. Download the [test file](https://paddle-model-ecology.bj.bcebos.com/paddlex/det_3d/demo_det_3d/nuscenes_demo_infer.tar) and replace `--input` with the local path for prediction.

```bash
paddlex --pipeline 3d_bev_detection \
        --input nuscenes_demo_infer.tar \
        --device gpu:0
```

Parameter description:

```
--pipeline: The name of the pipeline, here it is the 3D multi-modal fusion detection pipeline.
--input: The input path to the .tar file containing the image and LiDAR data to be processed. The 3D multi-modal fusion detection pipeline is a multi-input pipeline that depends on images, point clouds, and transformation matrix information. The tar file contains a "samples" directory with all image and point cloud data, a "sweeps" directory with point cloud data of adjacent frames, and a nuscenes_infos_val.pkl file containing the relative data paths under "samples" and "sweeps" as well as the transformation matrix information.
--device: The GPU index to be used (e.g., gpu:0 means using the 0th GPU, gpu:1,2 means using the 1st and 2nd GPUs), or you can choose to use the CPU (--device cpu).
```

After running, the results will be printed on the terminal as follows:

```bash
{"res":
  {
    'input_path': 'samples/LIDAR_TOP/n015-2018-10-08-15-36-50+0800__LIDAR_TOP__1538984253447765.pcd.bin',
    'sample_id': 'b4ff30109dd14c89b24789dc5713cf8c',
    'input_img_paths': [
      'samples/CAM_FRONT_LEFT/n015-2018-10-08-15-36-50+0800__CAM_FRONT_LEFT__1538984253404844.jpg',
      'samples/CAM_FRONT/n015-2018-10-08-15-36-50+0800__CAM_FRONT__1538984253412460.jpg',
      'samples/CAM_FRONT_RIGHT/n015-2018-10-08-15-36-50+0800__CAM_FRONT_RIGHT__1538984253420339.jpg',
      'samples/CAM_BACK_RIGHT/n015-2018-10-08-15-36-50+0800__CAM_BACK_RIGHT__1538984253427893.jpg',
      'samples/CAM_BACK/n015-2018-10-08-15-36-50+0800__CAM_BACK__1538984253437525.jpg',
      'samples/CAM_BACK_LEFT/n015-2018-10-08-15-36-50+0800__CAM_BACK_LEFT__1538984253447423.jpg'
    ],
    "boxes_3d": [
      [
        14.5425386428833,
        22.142045974731445,
        -1.2903141975402832,
        1.8441576957702637,
        4.433370113372803,
        1.7367216348648071,
        6.367165565490723,
        0.0036598597653210163,
        -0.013568558730185032
      ]
    ],
    "labels_3d": [0],
    "scores_3d": [0.9920279383659363]
  }
}
```

The meanings of the result parameters are as follows:

- `input_path`: Indicates the path to the input point cloud data of the sample to be predicted.
- `sample_id`: Indicates the unique identifier of the input sample to be predicted.
- `input_img_paths`: Indicates the paths to the input image data of the sample to be predicted.
- `boxes_3d`: Represents all the predicted bounding box information for the 3D sample.
  Each bounding box is a list of length 9, with the elements representing:
    - 0: x-coordinate of the center point
    - 1: y-coordinate of the center point
    - 2: z-coordinate of the center point
    - 3: Width of the detection box
    - 4: Length of the detection box
    - 5: Height of the detection box
    - 6: Rotation angle
    - 7: Velocity in the x-direction of the coordinate system
    - 8: Velocity in the y-direction of the coordinate system
- `labels_3d`: Represents the predicted categories corresponding to all the predicted bounding boxes of the 3D sample.
- `scores_3d`: Represents the confidence scores corresponding to all the predicted bounding boxes of the 3D sample.

#### 2.2.2 Python Script Integration

The above command line is for a quick experience. In projects, integration through code is usually required. You can complete quick inference of the pipeline with a few lines of code as follows:

```python
from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="3d_bev_detection")

output = pipeline.predict("nuscenes_demo_infer.tar")
for res in output:
    res.print()  ## Print the structured output of the prediction
    res.save_to_json("./output/")  ## Save the results to a json file
```

In the above Python script, the following steps are executed:

(1) Call `create_pipeline` to instantiate the 3D multi-modal fusion detection pipeline object. Specific parameter descriptions are as follows:
| Parameter | Parameter Description | Parameter Type | Default Value |
|---|---|---|---|
| `pipeline` | The name of the pipeline or the path to the pipeline configuration file. If it is a pipeline name, it must be a pipeline supported by PaddleX. | `str` | None |
| `device` | The device for pipeline model inference. Supports: "gpu", "cpu". | `str` | `gpu` |
| `use_hpip` | Whether to enable high-performance inference, only available when the pipeline supports high-performance inference. | `bool` | `False` |
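
For reference, here is a minimal sketch of instantiating the pipeline with these parameters set explicitly (the device string is only an example; adjust it to your hardware):

```python
from paddlex import create_pipeline

# Instantiate the pipeline on the 0th GPU; use device="cpu" to run on CPU.
# use_hpip is left at its default and only takes effect when the pipeline
# supports high-performance inference.
pipeline = create_pipeline(
    pipeline="3d_bev_detection",
    device="gpu:0",
    use_hpip=False,
)
```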
(2) Call the `predict` method of the pipeline object for inference. The input to `predict` supports the following types:

| Parameter Type | Parameter Description |
|---|---|
| `str` | Path to a tar file, e.g., `/root/data/nuscenes_demo_infer.tar` |
| `list` | A list whose elements are data of the above type, e.g., `["/root/data/nuscenes_demo_infer1.tar", "/root/data/nuscenes_demo_infer2.tar"]` |
Note: The `.pkl` file can be created with the corresponding data preparation script.
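
Based on the input types above, the following sketch shows passing a list of tar archives to `predict` in a single call (the paths are placeholders for your own data):

```python
from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="3d_bev_detection")

# A list input runs prediction on each tar archive in turn;
# the paths below are placeholders.
output = pipeline.predict([
    "/root/data/nuscenes_demo_infer1.tar",
    "/root/data/nuscenes_demo_infer2.tar",
])
for res in output:
    res.print()
```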
(3) Call the `predict` method to obtain prediction results. The `predict` method is a `generator`, so prediction results need to be obtained through iteration. The `predict` method predicts data in batches, so the prediction results are represented as a list of prediction results.

(4) Process the prediction results. The prediction result for each sample is of `dict` type and supports printing or saving as a json file, as follows:
| Method | Description | Method Parameters |
|---|---|---|
| `print` | Print results to the terminal | `format_json`: `bool`, whether to format the output content with json indentation, default is `True`; `indent`: `int`, json formatting setting, only effective when `format_json` is `True`, default is 4; `ensure_ascii`: `bool`, json formatting setting, only effective when `format_json` is `True`, default is `False` |
| `save_to_json` | Save the results as a json file | `save_path`: `str`, the file path to save; when it is a directory, the saved file name is consistent with the input file name; `indent`: `int`, json formatting setting, default is 4; `ensure_ascii`: `bool`, json formatting setting, default is `False` |
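
As an illustration of these methods, here is a minimal sketch that prints each result with explicit JSON formatting options and saves it to a directory (the output directory is an example path):

```python
from paddlex import create_pipeline

pipeline = create_pipeline(pipeline="3d_bev_detection")

for res in pipeline.predict("nuscenes_demo_infer.tar"):
    # Print the structured result with explicit JSON formatting options.
    res.print(format_json=True, indent=4, ensure_ascii=False)
    # Save the result as a JSON file; with a directory path the saved file
    # is named after the input file.
    res.save_to_json(save_path="./output/")
```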