This document will introduce how to use the Labelme annotation tool to complete data annotation for a single model related to instance segmentation. Click on the link above to install the data annotation tool and view detailed usage instructions by referring to the homepage documentation.
This dataset is a fruit instance segmentation dataset, covering five different types of fruits, including photos taken from different angles of the targets. Image examples:
Labelme is a Python-based image annotation software with a graphical user interface. It can be used for tasks such as image classification, object detection, and image segmentation. For instance segmentation annotation tasks, labels are stored as JSON files.
To avoid environment conflicts, it is recommended to install in a conda environment.
conda create -n labelme python=3.10
conda activate labelme
pip install pyqt5
pip install labelme
fruit.images directory (must be named images) within fruit and store the images to be annotated in the images directory, as shown below:label.txt in the fruit folder for the dataset to be annotated, and write the categories of the dataset to be annotated into label.txt by line. Taking the fruit instance segmentation dataset's label.txt as an example, as shown below:Navigate to the root directory of the dataset to be annotated in the terminal and start the labelme annotation tool.
cd path/to/fruit
labelme images --labels label.txt --nodata --autosave --output annotations
labels is the path to the category labels.nodata stops storing image data in the JSON file.autosave enables automatic saving.output specifies the path for storing label files.labelme, it will look like this:Edit to select the annotation type, choose Create Polygons.output is not specified when starting labelme, it will prompt to select a save path upon the first save. If autosave is specified, there is no need to click the Save button).Next Image to annotate the next image.Adjusting Directory Structure to Obtain a Standard labelme Format Dataset for Fruit Instance Segmentation
Prepare the train_anno_list.txt and val_anno_list.txt text files in the root directory of your dataset. Populate these files with the paths of all json files in the annotations directory, distributing them into train_anno_list.txt and val_anno_list.txt at a specified ratio. Alternatively, you can include all paths in train_anno_list.txt and create an empty val_anno_list.txt file, intending to use a zero-code data splitting feature for re-partitioning upon upload. The specific format for filling train_anno_list.txt and val_anno_list.txt is illustrated as follows:
fruit directory into a .tar or .zip format archive to obtain the standard labelme format dataset for fruit instance segmentation.PaddleX defines a dataset named COCOInstSegDataset for instance segmentation tasks, with the following organizational structure and annotation format:
dataset_dir # Root directory of the dataset, the directory name can be changed
├── annotations # Directory for saving annotation files, the directory name cannot be changed
│ ├── instance_train.json # Training set annotation file, the file name cannot be changed, using COCO annotation format
│ └── instance_val.json # Validation set annotation file, the file name cannot be changed, using COCO annotation format
└── images # Directory for saving images, the directory name cannot be changed
Annotation files adopt the COCO format. Please refer to the above specifications for data preparation. Additionally, refer to: Example Dataset.
When using PaddleX 2.x version for instance segmentation datasets, please refer to the corresponding format conversion section in Instance Segmentation Module Development Tutorial to convert VOC format datasets to COCO datasets. (Note in module development documentation)
Note:
COCO data format to annotate the pixel boundaries and categories of each target area in each image in the dataset. The polygon boundaries (segmentation) of objects are represented as [x1,y1,x2,y2,...,xn,yn], where (xn,yn) denotes the coordinates of each corner point of the polygon. Annotation information is stored in json files under the annotations directory, with separate files for the training set (instance_train.json) and validation set (instance_val.json).LabelMe for data annotation. PaddleX Pipelines support data format conversion for datasets annotated with LabelMe.