This section introduces how to use the Labelme and PaddleLabel annotation tools to complete data annotation for a single-model multi-label classification task. Click the links above to install the annotation tools, and refer to their homepage documentation for detailed usage instructions.
This dataset is manually collected, covering two categories: safety helmets and human heads, with photos taken from different angles. Image examples:
Labelme is a Python-based image annotation software with a graphical user interface. It can be used for image classification, object detection, image segmentation, and other tasks. In object detection annotation tasks, labels are stored as JSON files.
To avoid environment conflicts, it is recommended to install in a conda environment.
conda create -n labelme python=3.10
conda activate labelme
pip install pyqt5
pip install labelme
Create a root directory named helmet for the dataset, then create an images directory (the name must be images) inside helmet and store the images to be annotated there. Next, create a label.txt file in the helmet folder and write the categories of the dataset to be annotated into it, one category per line. Finally, navigate to the root directory of the dataset in the terminal and start the Labelme annotation tool:
cd path/to/helmet
labelme images --labels label.txt --nodata --autosave --output annotations
- `--labels` passes in the path to the category label file.
- `--nodata` stops storing image data in the JSON file.
- `--autosave` enables automatic saving.
- `--output` specifies the storage path for the label files.

If the `--output` field is not specified when starting Labelme, you will be prompted to select a save path the first time you save; when `--autosave` is enabled, there is no need to click the Save button. After annotating an image, click Next Image to move on to the next one. The labels produced this way are in the Labelme format.
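For reference, the label.txt created earlier would contain one category name per line. A sketch assuming the two categories this tutorial's dataset covers (safety helmets and human heads):

```
helmet
head
```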
Create two text files, train_anno_list.txt and val_anno_list.txt, in the root directory of the dataset. Write the paths of all JSON files in the annotations directory into train_anno_list.txt and val_anno_list.txt at a certain ratio, or write all of them into train_anno_list.txt, create an empty val_anno_list.txt file, and use the data splitting function to re-split later. Each line of train_anno_list.txt and val_anno_list.txt contains the path of one JSON file.
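The split described above can be scripted. Below is a minimal Python sketch, not part of the official tooling; the function name and the 9:1 default ratio are assumptions for illustration:

```python
import os
import random

def split_anno_lists(dataset_dir, train_ratio=0.9, seed=0):
    """Write annotation JSON paths into train_anno_list.txt / val_anno_list.txt
    at the given ratio, relative to the dataset root."""
    anno_dir = os.path.join(dataset_dir, "annotations")
    json_paths = sorted(
        os.path.join("annotations", name)
        for name in os.listdir(anno_dir)
        if name.endswith(".json")
    )
    # Shuffle deterministically so the split is reproducible
    random.Random(seed).shuffle(json_paths)
    n_train = int(len(json_paths) * train_ratio)
    with open(os.path.join(dataset_dir, "train_anno_list.txt"), "w") as f:
        f.write("\n".join(json_paths[:n_train]) + "\n")
    with open(os.path.join(dataset_dir, "val_anno_list.txt"), "w") as f:
        f.write("\n".join(json_paths[n_train:]) + "\n")
```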
After labeling with Labelme, the data format needs to be converted to the COCO format. Below is a code example for converting data labeled with Labelme according to the above tutorial:
cd /path/to/paddlex
wget https://paddle-model-ecology.bj.bcebos.com/paddlex/data/det_labelme_examples.tar -P ./dataset
tar -xf ./dataset/det_labelme_examples.tar -C ./dataset/
python main.py -c paddlex/configs/object_detection/PicoDet-L.yaml \
-o Global.mode=check_dataset \
-o Global.dataset_dir=./dataset/det_labelme_examples \
-o CheckDataset.convert.enable=True \
-o CheckDataset.convert.src_dataset_type=LabelMe
To avoid environment conflicts, it is recommended to create a clean conda environment:
conda create -n paddlelabel python=3.11
conda activate paddlelabel
PaddleLabel and its dependencies can then be installed with pip:
pip install --upgrade paddlelabel
pip install a2wsgi uvicorn==0.18.1
pip install connexion==2.14.1
pip install Flask==2.2.2
pip install Werkzeug==2.2.2
After successful installation, you can start PaddleLabel using one of the following commands in the terminal:
paddlelabel # Start paddlelabel
pdlabel # Abbreviation, identical to paddlelabel
PaddleLabel will automatically open a webpage in the browser after starting. Next, you can start the annotation process based on the task.
Export Annotated Files
After annotation is complete, export the labels in the coco format. Then adjust the directories to obtain a COCO-formatted dataset for helmet detection: rename the exported JSON files and the image directory as follows:

| Original File/Directory Name | Renamed File/Directory Name |
|-|-|
|train.json|instance_train.json|
|val.json|instance_val.json|
|test.json|instance_test.json|
|image|images|
Create an annotations directory in the root of the dataset and move all the JSON files into it. Finally, compress the helmet directory into a .tar or .zip file to obtain the COCO-formatted dataset for helmet detection. After obtaining data in the COCO format, it needs to be converted to the MLClsDataset format. Below is a code example that follows the previous tutorial to take data annotated with Labelme or PaddleLabel and perform the data format conversion:
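The renaming and packaging steps above can be sketched as a shell snippet. The first few lines only simulate the exported layout for demonstration; the file and directory names follow the table above:

```shell
# Simulate the exported layout for demonstration purposes
mkdir -p helmet/image
touch helmet/train.json helmet/val.json helmet/test.json

# Rename the exported annotation files and the image directory
cd helmet
mv train.json instance_train.json
mv val.json instance_val.json
mv test.json instance_test.json
mv image images

# Collect all JSON files under an annotations directory
mkdir -p annotations
mv instance_*.json annotations/
cd ..

# Package the dataset
tar -cf helmet.tar helmet
```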
# Download and unzip the COCO example dataset
cd /path/to/paddlex
wget https://paddle-model-ecology.bj.bcebos.com/paddlex/data/det_coco_examples.tar -P ./dataset
tar -xf ./dataset/det_coco_examples.tar -C ./dataset/
# Convert the COCO example dataset to MLClsDataset
python main.py -c paddlex/configs/multilabel_classification/PP-LCNet_x1_0_ML.yaml \
-o Global.mode=check_dataset \
-o Global.dataset_dir=./dataset/det_coco_examples \
-o CheckDataset.convert.enable=True \
-o CheckDataset.convert.src_dataset_type=COCO
The dataset defined by PaddleX for image multi-label classification tasks is named MLClsDataset, with the following directory structure and annotation format:
dataset_dir # Root directory of the dataset, the directory name can be changed
├── images # Directory where images are saved, the directory name can be changed, but note the correspondence with the content of train.txt and val.txt
├── label.txt # Correspondence between annotation IDs and category names, the file name cannot be changed. Each line gives the category ID and category name, for example: 45 wallflower
├── train.txt # Annotation file for the training set, the file name cannot be changed. Each line gives the image path and multi-label classification tags for the image, separated by spaces, for example: images/0041_2456602544.jpg 0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
└── val.txt # Annotation file for the validation set, the file name cannot be changed. Each line gives the image path and multi-label classification tags for the image, separated by spaces, for example: images/0045_845243484.jpg 0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
The annotation files must follow the multi-label classification format above. Please prepare your data according to these specifications; additionally, you can refer to the example dataset.
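To sanity-check data prepared in this format, the annotation files can be parsed with a small Python sketch. The function name is an assumption for illustration; the file names and line layout follow the structure described above:

```python
import os

def load_mlcls_annotations(dataset_dir, split="train"):
    """Parse an MLClsDataset annotation file into (image_path, label_vector) pairs."""
    # label.txt: each line is "<category id> <category name>"
    with open(os.path.join(dataset_dir, "label.txt")) as f:
        num_classes = sum(1 for line in f if line.strip())
    samples = []
    with open(os.path.join(dataset_dir, f"{split}.txt")) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Each line: "<image path> <comma-separated 0/1 tags>"
            img_path, tags = line.split(" ", 1)
            labels = [int(t) for t in tags.split(",")]
            # Every label vector must have one entry per category in label.txt
            assert len(labels) == num_classes, f"label length mismatch for {img_path}"
            samples.append((img_path, labels))
    return samples
```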