[简体中文](object_detection.md) | English
# PaddleX Object Detection Task Data Preparation Tutorial
This section will introduce how to use [Labelme](https://github.com/wkentaro/labelme) and [PaddleLabel](https://github.com/PaddleCV-SIG/PaddleLabel) annotation tools to complete data annotation for single-model object detection tasks. Click the above links to install the annotation tools and view detailed usage instructions.
## 1. Annotation Data Examples
## 2. Labelme Annotation
### 2.1 Introduction to Labelme Annotation Tool
`Labelme` is a Python-based image annotation software with a graphical user interface. It can be used for tasks such as image classification, object detection, and image segmentation. For object detection annotation tasks, labels are stored as `JSON` files.
### 2.2 Labelme Installation
To avoid environment conflicts, it is recommended to install in a `conda` environment.
```bash
conda create -n labelme python=3.10
conda activate labelme
pip install pyqt5
pip install labelme
```
### 2.3 Labelme Annotation Process
#### 2.3.1 Prepare Data for Annotation
* Create a root directory for the dataset, e.g., `helmet`.
* Create an `images` directory (must be named `images`) within `helmet` and store the images to be annotated in the `images` directory, as shown below:

* Create a category label file `label.txt` in the `helmet` folder and write the categories of the dataset to be annotated, line by line. For example, for a helmet detection dataset, `label.txt` would look like this:

#### 2.3.2 Start Labelme
Navigate to the root directory of the dataset to be annotated in the terminal and start the `Labelme` annotation tool:
```bash
cd path/to/helmet
labelme images --labels label.txt --nodata --autosave --output annotations
```
* `flags` creates classification labels for images, passing in the path to the labels.
* `nodata` stops storing image data in the `JSON` file.
* `autosave` enables automatic saving.
* `output` specifies the path for storing label files.
#### 2.3.3 Begin Image Annotation
* After starting `Labelme`, it will look like this:

* Click "Edit" to select the annotation type.

* Choose to create a rectangular box.

* Drag the crosshair to select the target area on the image.

* Click again to select the category of the target box.

* After annotation, click Save. (If `output` is not specified when starting `Labelme`, it will prompt to select a save path upon the first save. If `autosave` is enabled, no need to click Save.)

* Click `Next Image` to annotate the next.

* The final labeled tag file looks like this:

* Adjust the directory to obtain the safety helmet detection dataset in the standard `Labelme` format
* Create two text files, `train_anno_list.txt` and `val_anno_list.txt`, in the root directory of the dataset. Write the paths of all `json` files in the `annotations` directory into `train_anno_list.txt` and `val_anno_list.txt` at a certain ratio, or write all of them into `train_anno_list.txt` and create an empty `val_anno_list.txt` file. Use the data splitting function to re-split. The specific filling format of `train_anno_list.txt` and `val_anno_list.txt` is shown below:

* The final directory structure after organization is as follows:

#### 2.3.4 Format Conversion
After labeling with `Labelme`, the data format needs to be converted to `coco` format. Below is a code example for converting the data labeled using `Labelme` according to the above tutorial:
```bash
cd /path/to/paddlex
wget https://paddle-model-ecology.bj.bcebos.com/paddlex/data/det_labelme_examples.tar -P ./dataset
tar -xf ./dataset/det_labelme_examples.tar -C ./dataset/
python main.py -c paddlex/configs/obeject_detection/PicoDet-L.yaml \
-o Global.mode=check_dataset \
-o Global.dataset_dir=./dataset/det_labelme_examples \
-o CheckDataset.convert.enable=True \
-o CheckDataset.convert.src_dataset_type=LabelMe
```
## 3. PaddleLabel Annotation
### 3.1 Installation and Startup of PaddleLabel
* To avoid environment conflicts, it is recommended to create a clean `conda` environment:
```bash
conda create -n paddlelabel python=3.11
conda activate paddlelabel
```
* Alternatively, you can install it with `pip` in one command:
```bash
pip install --upgrade paddlelabel
pip install a2wsgi uvicorn==0.18.1
pip install connexion==2.14.1
pip install Flask==2.2.2
pip install Werkzeug==2.2.2
```
* After successful installation, you can start PaddleLabel using one of the following commands in the terminal:
```bash
paddlelabel # Start paddlelabel
pdlabel # Abbreviation, identical to paddlelabel
```
PaddleLabel will automatically open a webpage in your browser after startup. You can then proceed with the annotation process based on your task.
### 3.2 Annotation Process of PaddleLabel
* Open the automatically popped-up webpage, click on the sample project, and then click on Object Detection.

* Fill in the project name and dataset path. Note that the path should be an absolute path on your local machine. Click Create when done.

* First, define the categories that need to be annotated. Taking layout analysis as an example, provide 10 categories, each with a unique corresponding ID. Click Add Category to create the required category names.
* Start Annotating
* First, select the label you need to annotate.
* Click the rectangular selection button on the left.
* Draw a bounding box around the desired area in the image, paying attention to semantic partitioning. If there are multiple columns, please annotate each separately.
* After completing the annotation, the annotation result will appear in the lower right corner. You can check if the annotation is correct.
* When all annotations are complete, click **Project Overview**.

* Export Annotation Files
* In Project Overview, divide the dataset as needed, then click Export Dataset.

* Fill in the export path and export format. The export path should also be an absolute path, and the export format should be selected as `coco`.

* After successful export, you can obtain the annotation files in the specified path.

* Adjust the directory to obtain the standard `coco` format dataset for helmet detection
* Rename the three `json` files and the `image` directory according to the following correspondence:
|Original File (Directory) Name|Renamed File (Directory) Name|
|-|-|
|`train.json`|`instance_train.json`|
|`val.json`|`instance_val.json`|
|`test.json`|`instance_test.json`|
|`image`|`images`|
* Create an `annotations` directory in the root directory of the dataset and move all `json` files to the `annotations` directory. The final dataset directory structure will look like this:

* Compress the `helmet` directory into a `.tar` or `.zip` format compressed package to obtain the standard `coco` format dataset for helmet detection.
## 4. Data Format
The dataset defined by PaddleX for object detection tasks is named `COCODetDataset`, with the following organizational structure and annotation format:
```bash
dataset_dir # Root directory of the dataset, the directory name can be changed
├── annotations # Directory for saving annotation files, the directory name cannot be changed
│ ├── instance_train.json # Annotation file for the training set, the file name cannot be changed, using COCO annotation format
│ └── instance_val.json # Annotation file for the validation set, the file name cannot be changed, using COCO annotation format
└── images # Directory for saving images, the directory name cannot be changed
```
The annotation files use the COCO format. Please prepare your data according to the above specifications. Additionally, you can refer to the [example dataset](https://paddle-model-ecology.bj.bcebos.com/paddlex/data/det_coco_examples.tar).