This section will introduce how to use the Labelme annotation tool to complete data annotation for image feature-related single models. Click the link above to install the data annotation tool and view detailed usage instructions by referring to the homepage documentation.
Labelme is a Python-based image annotation software with a graphical user interface. It can be used for tasks such as image classification, object detection, and image segmentation. In image feature annotation tasks, labels are stored as JSON files.
To avoid environment conflicts, it is recommended to install in a conda environment.
conda create -n labelme python=3.10
conda activate labelme
pip install pyqt5
pip install labelme
pets.images directory (must be named images) within pets and store the images to be annotated in the images directory, as shown below:flags.txt in the pets folder for the dataset to be annotated, and write the categories of the dataset to be annotated into flags.txt line by line. Taking the flags.txt for a cat-dog classification dataset as an example, as shown below:Navigate to the root directory of the dataset to be annotated in the terminal and start the labelme annotation tool.
cd path/to/pets
labelme images --nodata --autosave --output annotations --flags flags.txt
flags creates classification labels for images, passing in the label path.nodata stops storing image data in JSON files.autosave enables automatic saving.output specifies the storage path for label files.labelme, it will look like this:Flags interface.output is not specified when starting labelme, it will prompt to select the save path upon the first save. If autosave is enabled, there is no need to click the Save button).Next Image to annotate the next image.After annotating all images, use the convert_to_imagenet.py to convert the annotated dataset to ImageNet-1k dataset format, generating train.txt, val.txt, and label.txt.
python convert_to_imagenet.py --dataset_path /path/to/dataset
dataset_path is the annotated labelme format classification dataset.
After obtaining data in LabelMe format, the data format needs to be converted to ShiTuRecDataset format. Below is a code example that demonstrates how to convert the data labeled using LabelMe according to the previous tutorial.
# Download and unzip the LabelMe format example dataset
cd /path/to/paddlex
wget https://paddle-model-ecology.bj.bcebos.com/paddlex/data/image_classification_labelme_examples.tar -P ./dataset
tar -xf ./dataset/image_classification_labelme_examples.tar -C ./dataset/
# Convert the LabelMe example dataset
python main.py -c paddlex/configs/general_recognition/PP-ShiTuV2_rec.yaml \
-o Global.mode=check_dataset \
-o Global.dataset_dir=./dataset/image_classification_labelme_examples \
-o CheckDataset.convert.enable=True \
-o CheckDataset.convert.src_dataset_type=LabelMe
The dataset defined by PaddleX for image classification tasks is named ShiTuRecDataset, with the following organizational structure and annotation format:
dataset_dir # Root directory of the dataset, the directory name can be changed
├── images # Directory where images are saved, the directory name can be changed, but should correspond to the content of train.txt, query.txt, gallery.txt
├── gallery.txt # Annotation file for the gallery set, the file name cannot be changed. Each line gives the path of the image to be retrieved and its feature label, separated by a space. Example: images/WOMEN/Blouses_Shirts/id_00000001/02_2_side.jpg 3997
└── query.txt # Annotation file for the query set, the file name cannot be changed. Each line gives the path of the database image and its feature label, separated by a space. Example: images/WOMEN/Blouses_Shirts/id_00000001/02_1_front.jpg 3997
The annotation files use an image feature format. Please refer to the above specifications to prepare your data. Additionally, you can refer to the example dataset.