---
comments: true
---

# Open-Vocabulary Segmentation Module Tutorial

## I. Overview

Open-vocabulary segmentation is an image segmentation task that aims to segment objects in an image based on additional information such as text descriptions, bounding boxes, keypoints, etc., rather than just the image itself. It allows the model to handle a wide range of object categories without a predefined list. This technology combines visual and multimodal techniques, significantly enhancing the flexibility and accuracy of image processing. Open-vocabulary segmentation has important applications in the field of computer vision, especially in object segmentation tasks in complex scenes.

## II. Supported Model List

| Model | Model Download Link | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) [Normal Mode / High-Performance Mode] | Model Storage Size (MB) | Description |
|---|---|---|---|---|---|
| SAM-H_box | Inference Model | - / - | - / - | 2433.7 | SAM (Segment Anything Model) is an advanced image segmentation model that can segment any object in an image based on simple user-provided prompts (such as points, boxes, or text). Trained on the SA-1B dataset with 11 million images and 1.1 billion mask annotations, it performs well in most scenarios. |
| SAM-H_point | Inference Model | - / - | - / - | 2433.7 | |

| Mode | GPU Configuration | CPU Configuration | Acceleration Technology Combination |
|---|---|---|---|
| Normal Mode | FP32 Precision / No TRT Acceleration | FP32 Precision / 8 Threads | PaddleInference |
| High-Performance Mode | Optimal combination of pre-selected precision types and acceleration strategies | FP32 Precision / 8 Threads | Pre-selected optimal backend (Paddle/OpenVINO/TRT, etc.) |

Related methods and parameter explanations are as follows:
* `create_model` instantiates an open-vocabulary segmentation model (using `SAM-H_box` as an example). The specific explanations are as follows:

| Parameter | Parameter Description | Parameter Type | Options | Default Value |
|---|---|---|---|---|
| `model_name` | The name of the model | `str` | None | None |
| `model_dir` | The storage path of the model | `str` | None | None |
| `device` | The device used for model inference | `str` | It supports specifying specific GPU card numbers, such as "gpu:0", other hardware card numbers, such as "npu:0", or CPU, such as "cpu". | `gpu:0` |
| `use_hpip` | Whether to enable the high-performance inference plugin | `bool` | None | `False` |
| `hpi_config` | High-performance inference configuration | `dict` \| `None` | None | `None` |

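
The `device` strings in the table follow a `type:card` pattern, which is easy to mistype. The sketch below is a minimal illustration, not part of the library: `parse_device` and `load_model` are hypothetical helpers, and the `paddlex` import path is an assumption that should be checked against your installation. The import is deferred so that the parsing logic can be tried without the library installed.

```python
from typing import Optional, Tuple


def parse_device(device: str = "gpu:0") -> Tuple[str, Optional[int]]:
    """Split a device string such as "gpu:0" into (device type, card number).

    Hypothetical helper for illustration; the library does its own parsing.
    """
    kind, sep, index = device.partition(":")
    kind = kind.strip().lower()
    if not kind:
        raise ValueError(f"empty device type in {device!r}")
    if not sep:
        # No card number given, as in "cpu".
        return kind, None
    if not index.strip().isdigit():
        raise ValueError(f"card number must be a non-negative integer in {device!r}")
    return kind, int(index)


def load_model(model_name: str = "SAM-H_box", device: str = "gpu:0", use_hpip: bool = False):
    """Validate the device string, then instantiate the model."""
    parse_device(device)  # fail early on a malformed device string
    # Assumption: create_model is importable from the paddlex package.
    from paddlex import create_model

    return create_model(model_name=model_name, device=device, use_hpip=use_hpip)
```

For example, `parse_device("npu:0")` yields `("npu", 0)`, while `parse_device("cpu")` yields `("cpu", None)`.
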
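
* The `predict()` method of the model runs inference. Its parameters `input`, `batch_size`, and `prompts` are explained as follows:

| Parameter | Parameter Description | Parameter Type | Options | Default Value |
|---|---|---|---|---|
| `input` | Data to be predicted, supporting multiple input types | `Python Var`/`str`/`list` | | None |
| `batch_size` | Batch size | `int` | Any integer | 1 |
| `prompts` | Prompts used by the model | `dict` | | None |
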
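
Because `prompts` is a plain `dict`, it can help to validate it before running inference. The following is a rough sketch under stated assumptions: the key name `box_prompt`, the `[xmin, ymin, xmax, ymax]` box layout, and the `paddlex` import path are not given in the table above and should be verified against the library documentation. `build_box_prompts` and `segment` are hypothetical helper names.

```python
from typing import Dict, List, Sequence


def build_box_prompts(boxes: Sequence[Sequence[float]]) -> Dict[str, List[List[float]]]:
    """Check boxes given as [xmin, ymin, xmax, ymax] and wrap them in a prompts dict."""
    checked = []
    for box in boxes:
        if len(box) != 4:
            raise ValueError(f"expected 4 coordinates, got {len(box)}: {box!r}")
        xmin, ymin, xmax, ymax = (float(v) for v in box)
        if xmin >= xmax or ymin >= ymax:
            raise ValueError(f"box has no area: {box!r}")
        checked.append([xmin, ymin, xmax, ymax])
    # Assumption: "box_prompt" is the key the SAM-H_box model expects.
    return {"box_prompt": checked}


def segment(image, boxes, model_name: str = "SAM-H_box", batch_size: int = 1):
    """Run box-prompted segmentation and collect the results in a list."""
    # Assumption: create_model is importable from the paddlex package.
    from paddlex import create_model

    model = create_model(model_name=model_name)
    prompts = build_box_prompts(boxes)
    return list(model.predict(input=image, batch_size=batch_size, prompts=prompts))
```

Keeping the validation separate from the model call means a malformed box is reported before the model weights are loaded.
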

* Each prediction result supports printing, saving as a JSON file, and saving as an image:

| Method | Method Description | Parameter | Parameter Type | Parameter Description | Default Value |
|---|---|---|---|---|---|
| `print()` | Print the results to the terminal | `format_json` | `bool` | Whether to format the output content using JSON indentation | `True` |
| | | `indent` | `int` | Specify the indentation level to beautify the output JSON data and make it more readable. This is only effective when `format_json` is `True` | 4 |
| | | `ensure_ascii` | `bool` | Control whether non-ASCII characters are escaped to Unicode. When set to `True`, all non-ASCII characters will be escaped; `False` retains the original characters. This is only effective when `format_json` is `True` | `False` |
| `save_to_json()` | Save the results as a file in JSON format | `save_path` | `str` | The file path for saving. When it is a directory, the saved file name will be consistent with the input file name | None |
| | | `indent` | `int` | Specify the indentation level to beautify the output JSON data and make it more readable. This is only effective when `format_json` is `True` | 4 |
| | | `ensure_ascii` | `bool` | Control whether non-ASCII characters are escaped to Unicode. When set to `True`, all non-ASCII characters will be escaped; `False` retains the original characters. This is only effective when `format_json` is `True` | `False` |
| `save_to_img()` | Save the results as a file in image format | `save_path` | `str` | The file path for saving. When it is a directory, the saved file name will be consistent with the input file name | None |

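
The `format_json`, `indent`, and `ensure_ascii` parameters read like the corresponding options of Python's standard `json.dumps`. The sketch below illustrates their effect using only the standard library. It assumes, without confirmation from the library source, that the result methods behave this way; `render` is a hypothetical stand-in, not the library's implementation.

```python
import json


def render(result: dict, format_json: bool = True, indent: int = 4,
           ensure_ascii: bool = False) -> str:
    """Mimic the documented print() options with json.dumps (illustrative only)."""
    if not format_json:
        # indent and ensure_ascii have no effect on this path.
        return str(result)
    return json.dumps(result, indent=indent, ensure_ascii=ensure_ascii)


sample = {"label": "café", "score": 0.9}

pretty = render(sample)                      # indented, keeps the original "é"
escaped = render(sample, ensure_ascii=True)  # indented, "é" written as \u00e9
raw = render(sample, format_json=False)      # plain Python repr of the dict

print(pretty)
print(escaped)
print(raw)
```

With `ensure_ascii=False` the output stays readable for non-English labels, which is likely why it is the documented default.
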

* The prediction results can also be obtained through attributes:

| Attribute | Attribute Description |
|---|---|
| `json` | Get the prediction results in `json` format |
| `img` | Get the visualization image in `dict` format |
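
The two attributes can be consumed directly when the built-in save methods are not flexible enough. The sketch below is one possible approach under explicit assumptions: that `res.json` is a JSON-serializable `dict`, and that each value in `res.img` exposes a `save(path)` method as a PIL image does. Neither is confirmed by the table above, and `export_result` is a hypothetical helper.

```python
import json
from pathlib import Path


def export_result(res, out_dir):
    """Write res.json to res.json and each entry of res.img to <key>.png."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Assumption: res.json is a JSON-serializable dict.
    json_path = out / "res.json"
    json_path.write_text(
        json.dumps(res.json, indent=4, ensure_ascii=False), encoding="utf-8"
    )
    written = [json_path]

    # Assumption: res.img maps a name to an object with a save(path) method.
    for name, image in res.img.items():
        image_path = out / f"{name}.png"
        image.save(image_path)
        written.append(image_path)
    return written
```

The function returns the paths it wrote, so a caller can log them or pass them to a later processing step.
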