---
comments: true
---

# PaddleX Object Detection Task Data Preparation Tutorial

This section will introduce how to use [Labelme](https://github.com/wkentaro/labelme) and [PaddleLabel](https://github.com/PaddleCV-SIG/PaddleLabel) annotation tools to complete data annotation for single-model object detection tasks. Click the above links to install the annotation tools and view detailed usage instructions.

## 1. Annotation Data Examples
* Create a category label file `label.txt` in the `helmet` folder and write the categories of the dataset to be annotated, line by line. For example, for a helmet detection dataset, `label.txt` would look like this:
#### 2.3.2 Start Labelme
Navigate to the root directory of the dataset to be annotated in the terminal and start the `Labelme` annotation tool:
```bash
cd path/to/helmet
labelme images --labels label.txt --nodata --autosave --output annotations
```
* `labels` specifies the category labels available for annotation, passing in the path to the label file.
* `nodata` stops storing image data in the `JSON` file.
* `autosave` enables automatic saving.
* `output` specifies the path for storing label files.
#### 2.3.3 Begin Image Annotation
* After starting `Labelme`, it will look like this:
* Click "Edit" to select the annotation type.
* Choose to create a rectangular box.
* Drag the crosshair to select the target area on the image.
* Click again to select the category of the target box.
* After annotation, click Save. (If `output` is not specified when starting `Labelme`, it will prompt to select a save path upon the first save. If `autosave` is enabled, no need to click Save.)
* Click `Next Image` to annotate the next image.
* The final annotation file looks like this:
* Adjust the directory to obtain the safety helmet detection dataset in the standard `Labelme` format
* Create two text files, `train_anno_list.txt` and `val_anno_list.txt`, in the root directory of the dataset. Either write the paths of all `json` files in the `annotations` directory into `train_anno_list.txt` and `val_anno_list.txt` at a chosen ratio, or write all of them into `train_anno_list.txt`, create an empty `val_anno_list.txt`, and use the data splitting function to re-split later. The format of `train_anno_list.txt` and `val_anno_list.txt` is shown below:
* The final directory structure after organization is as follows:
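Returning to the list files from the earlier step: the ratio-based split could be scripted roughly as follows. This is a minimal sketch, not part of PaddleX. The function name `write_anno_lists`, the `val_ratio` and `seed` parameters, and the choice to write one path per line relative to the dataset root are all assumptions made for illustration; adjust the line format to match the one your dataset requires.

```python
import random
from pathlib import Path


def write_anno_lists(dataset_root, val_ratio=0.2, seed=0):
    """Split annotations/*.json into train/val list files, one path per line."""
    root = Path(dataset_root)
    # Assumption: paths are written relative to the dataset root.
    json_files = sorted(
        p.relative_to(root).as_posix()
        for p in (root / "annotations").glob("*.json")
    )
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(json_files)
    n_val = int(len(json_files) * val_ratio)
    val, train = json_files[:n_val], json_files[n_val:]
    (root / "train_anno_list.txt").write_text(
        "".join(f"{p}\n" for p in train), encoding="utf-8"
    )
    (root / "val_anno_list.txt").write_text(
        "".join(f"{p}\n" for p in val), encoding="utf-8"
    )
    return train, val
```

Calling `write_anno_lists("path/to/helmet", val_ratio=0.0)` writes every path into `train_anno_list.txt` and leaves `val_anno_list.txt` empty, which corresponds to the second option described above.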
#### 2.3.4 Format Conversion
After labeling with `Labelme`, the data format needs to be converted to `coco` format. Below is a code example for converting the data labeled using `Labelme` according to the above tutorial:
```bash
cd /path/to/paddlex
wget https://paddle-model-ecology.bj.bcebos.com/paddlex/data/det_labelme_examples.tar -P ./dataset
tar -xf ./dataset/det_labelme_examples.tar -C ./dataset/
python main.py -c paddlex/configs/object_detection/PicoDet-L.yaml \
-o Global.mode=check_dataset \
-o Global.dataset_dir=./dataset/det_labelme_examples \
-o CheckDataset.convert.enable=True \
-o CheckDataset.convert.src_dataset_type=LabelMe
```
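To give a feel for what such a conversion involves, the sketch below maps rectangle annotations from parsed Labelme JSON records to a COCO-style dictionary. It is an illustration only and is not the converter PaddleX runs. The function name `labelme_to_coco` and the choice to number categories from 1 in the order given are assumptions; the Labelme fields (`shapes`, `points`, `shape_type`, `imagePath`, `imageHeight`, `imageWidth`) and the COCO `bbox` layout `[x, y, width, height]` follow the respective formats.

```python
def labelme_to_coco(labelme_records, categories):
    """Convert parsed Labelme JSON dicts into one COCO-style dict."""
    # Assumption: category ids start at 1 and follow the order of `categories`.
    cat_ids = {name: i + 1 for i, name in enumerate(categories)}
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": i, "name": n} for n, i in cat_ids.items()],
    }
    for image_id, rec in enumerate(labelme_records, start=1):
        # Keep only the file name, whatever directory prefix Labelme stored.
        file_name = rec["imagePath"].replace("\\", "/").rsplit("/", 1)[-1]
        coco["images"].append({
            "id": image_id,
            "file_name": file_name,
            "height": rec["imageHeight"],
            "width": rec["imageWidth"],
        })
        for shape in rec["shapes"]:
            if shape.get("shape_type") != "rectangle":
                continue  # only rectangles are handled in this sketch
            (x1, y1), (x2, y2) = shape["points"]
            x, y = min(x1, x2), min(y1, y2)
            w, h = abs(x2 - x1), abs(y2 - y1)
            coco["annotations"].append({
                "id": len(coco["annotations"]) + 1,
                "image_id": image_id,
                "category_id": cat_ids[shape["label"]],
                "bbox": [x, y, w, h],  # COCO uses top-left corner plus size
                "area": w * h,
                "iscrowd": 0,
            })
    return coco
```

The two corner points Labelme stores for a rectangle can be in either order, which is why the sketch takes the minimum coordinates and absolute differences.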
## 3. PaddleLabel Annotation
### 3.1 Installation and Startup of PaddleLabel
* To avoid environment conflicts, it is recommended to create a clean `conda` environment:
```bash
conda create -n paddlelabel python=3.11
conda activate paddlelabel
```
* Then install PaddleLabel and its pinned dependencies with `pip`:
```bash
pip install --upgrade paddlelabel
pip install a2wsgi uvicorn==0.18.1
pip install connexion==2.14.1
pip install Flask==2.2.2
pip install Werkzeug==2.2.2
```
* After successful installation, you can start PaddleLabel using one of the following commands in the terminal:
```bash
paddlelabel # Start paddlelabel
pdlabel # Abbreviation, identical to paddlelabel
```
PaddleLabel will automatically open a webpage in your browser after startup. You can then proceed with the annotation process based on your task.
### 3.2 Annotation Process of PaddleLabel
* In the webpage that opens automatically, click the sample project, then click Object Detection.
* Fill in the project name and dataset path. Note that the path should be an absolute path on your local machine. Click Create when done.
* First, define the categories to annotate. Taking layout analysis as an example, 10 categories are provided, each with a unique ID. Click Add Category to create the required category names.
* Start Annotating
* First, select the label you need to annotate.
* Click the rectangular selection button on the left.
* Draw a bounding box around the desired area in the image, paying attention to semantic partitioning. If there are multiple columns, please annotate each separately.
* After completing the annotation, the annotation result will appear in the lower right corner. You can check if the annotation is correct.
* When all annotations are complete, click Project Overview.
* Export Annotation Files
* In Project Overview, divide the dataset as needed, then click Export Dataset.
* Fill in the export path and export format. The export path should also be an absolute path, and the export format should be selected as `coco`.
* After successful export, you can obtain the annotation files in the specified path.
* Adjust the directory to obtain the standard `coco` format dataset for helmet detection
* Rename the three `json` files and the `image` directory according to the following correspondence:
| Original File (Directory) Name | Renamed File (Directory) Name |
|---|---|
| `train.json` | `instance_train.json` |
| `val.json` | `instance_val.json` |
| `test.json` | `instance_test.json` |
| `image` | `images` |
* Compress the `helmet` directory into a `.tar` or `.zip` archive to obtain the standard `coco` format dataset for helmet detection.
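The rename and compression steps above can also be scripted. The following is a minimal sketch using only the Python standard library; the function name `rename_and_pack` is hypothetical, and because the exact layout of the exported directory is not described here, the sketch searches the directory recursively for the original names rather than assuming where they sit.

```python
import tarfile
from pathlib import Path

# Mapping taken from the rename table above.
RENAMES = {
    "train.json": "instance_train.json",
    "val.json": "instance_val.json",
    "test.json": "instance_test.json",
    "image": "images",
}


def rename_and_pack(export_dir, archive_path):
    """Rename exported entries in place, then pack the directory into a .tar."""
    root = Path(export_dir)
    for old, new in RENAMES.items():
        # Rename deepest matches first so parent renames cannot stale a path.
        matches = sorted(root.rglob(old), key=lambda p: len(p.parts), reverse=True)
        for src in matches:
            src.rename(src.with_name(new))
    # Keep the archive outside export_dir so it does not try to include itself.
    with tarfile.open(archive_path, "w") as tar:
        tar.add(root, arcname=root.name)
    return archive_path
```

For example, `rename_and_pack("/path/to/helmet", "/path/to/helmet.tar")` renames the entries and writes an uncompressed `.tar` next to the dataset directory.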
## 4. Data Format
The dataset defined by PaddleX for object detection tasks is named `COCODetDataset`, with the following organizational structure and annotation format:
```bash
dataset_dir # Root directory of the dataset, the directory name can be changed
├── annotations # Directory for saving annotation files, the directory name cannot be changed
│ ├── instance_train.json # Annotation file for the training set, the file name cannot be changed, using COCO annotation format
│ └── instance_val.json # Annotation file for the validation set, the file name cannot be changed, using COCO annotation format
└── images # Directory for saving images, the directory name cannot be changed
```
The annotation files use the COCO format. Please prepare your data according to the above specifications. Additionally, you can refer to the [example dataset](https://paddle-model-ecology.bj.bcebos.com/paddlex/data/det_coco_examples.tar).
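Before handing a dataset to PaddleX, a quick local check of the layout can catch naming mistakes early. The sketch below is an assumption-laden helper, not a PaddleX API and not a substitute for the `check_dataset` mode: the function name `check_coco_det_layout` is hypothetical, and it only verifies the directory and file names listed above plus the three top-level keys (`images`, `annotations`, `categories`) that COCO annotation files carry.

```python
import json
from pathlib import Path


def check_coco_det_layout(dataset_dir):
    """Return a list of problems found; an empty list means the layout looks right."""
    root = Path(dataset_dir)
    problems = []
    if not (root / "images").is_dir():
        problems.append("missing directory: images")
    for name in ("instance_train.json", "instance_val.json"):
        path = root / "annotations" / name
        if not path.is_file():
            problems.append(f"missing file: annotations/{name}")
            continue
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Top-level keys expected in a COCO detection annotation file.
        for key in ("images", "annotations", "categories"):
            if key not in data:
                problems.append(f"annotations/{name}: missing key '{key}'")
    return problems
```

A typical use is `for problem in check_coco_det_layout("./dataset/det_coco_examples"): print(problem)`, which prints nothing when the structure matches.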