---
comments: true
---

# Open-Vocabulary Object Detection Module Usage Tutorial

## I. Overview

Open-vocabulary object detection is an object detection technique designed to overcome a key limitation of traditional detectors: they can only recognize objects from a set of categories fixed at training time. Open-vocabulary detectors can also identify objects from categories not seen during training. By combining natural language processing with visual detection, such a model lets users define new categories through text descriptions and then recognizes and locates objects matching those descriptions. This makes object detection more flexible and better at generalizing to new categories, which broadens the range of applications it can serve.

## II. List of Supported Models
| Model | Model Download Link | mAP(0.5:0.95) | mAP(0.5) | GPU Inference Time (ms) [Normal Mode / High-Performance Mode] | CPU Inference Time (ms) | Model Storage Size (M) | Introduction |
|---|---|---|---|---|---|---|---|
| GroundingDINO-T | Inference Model | 49.4 | 64.4 | 253.72 | 1807.4 | 658.3 | An open-vocabulary object detection model trained on the O365, GoldG, and Cap4M datasets. It uses BERT as the text encoder and DINO as the visual model, with additional cross-modal fusion modules, and achieves good performance in open-vocabulary object detection. |
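To make the text-driven matching idea concrete, here is a deliberately simplified sketch. It is not how GroundingDINO-T works internally (the model fuses text and image features with dedicated cross-modal modules); it only illustrates the basic open-vocabulary step of comparing candidate regions against embeddings of prompt phrases and keeping the closest phrase when it is similar enough. The period-separated prompt format, the hand-written 2-D embeddings, the `min_sim` cutoff, and the helper names are all assumptions for illustration, not PaddleX APIs.

```python
from __future__ import annotations

import math


def parse_prompt(prompt: str) -> list[str]:
    """Split a period-separated prompt such as "bus . walking man ." into phrases."""
    return [p.strip() for p in prompt.split(".") if p.strip()]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def assign_labels(region_feats, phrase_feats, phrases, min_sim=0.5):
    """Give each region its most similar prompt phrase, or None if no phrase reaches min_sim."""
    labels = []
    for region in region_feats:
        sims = [cosine(region, text) for text in phrase_feats]
        best = max(range(len(sims)), key=sims.__getitem__)
        labels.append(phrases[best] if sims[best] >= min_sim else None)
    return labels


# Toy 2-D "embeddings"; a real model would produce these with its image and text encoders.
phrases = parse_prompt("bus . walking man .")
phrase_feats = [[1.0, 0.0], [0.0, 1.0]]
region_feats = [[0.9, 0.1], [0.1, 0.95], [0.7, 0.7]]
print(assign_labels(region_feats, phrase_feats, phrases, min_sim=0.8))
```

Because the candidate labels come from whatever phrases the prompt contains, adding a category in this toy means editing the prompt rather than retraining a fixed classifier head.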
Each detected bounding box is expressed as `[xmin, ymin, xmax, ymax]`.
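Boxes in `[xmin, ymin, xmax, ymax]` form are easy to post-process. The hypothetical helpers below (not part of PaddleX) assume pixel coordinates with the origin at the top-left and compute size, area, an `[xmin, ymin, width, height]` conversion, and intersection over union (IoU). IoU is also the overlap measure behind the mAP(0.5) and mAP(0.5:0.95) columns in the model list, which score detections at an IoU threshold of 0.5 and averaged over thresholds from 0.5 to 0.95, respectively.

```python
def box_size(box):
    """Width and height of an [xmin, ymin, xmax, ymax] box, clamped at 0 for empty boxes."""
    xmin, ymin, xmax, ymax = box
    return max(0.0, xmax - xmin), max(0.0, ymax - ymin)


def box_area(box):
    """Area of an [xmin, ymin, xmax, ymax] box."""
    width, height = box_size(box)
    return width * height


def xyxy_to_xywh(box):
    """Convert [xmin, ymin, xmax, ymax] to [xmin, ymin, width, height]."""
    width, height = box_size(box)
    return [box[0], box[1], width, height]


def iou(a, b):
    """Intersection over union of two [xmin, ymin, xmax, ymax] boxes."""
    # The overlap region; box_size clamps it to zero area when the boxes do not intersect.
    overlap = [max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])]
    inter = box_area(overlap)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0
```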
The visualization image is as follows:
Related methods, parameters, and explanations are as follows:
* `create_model` instantiates an open-vocabulary object detection model (using `GroundingDINO-T` as an example). The specific explanations are as follows:
| Parameter | Parameter Description | Parameter Type | Options | Default Value |
|---|---|---|---|---|
| `model_name` | The name of the model | str | None | None |
| `model_dir` | The storage path of the model | str | None | None |
| `thresholds` | The filtering thresholds used by the model | dict/None | None | None |
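The `thresholds` parameter is typed as `dict/None` with a default of `None`. One plausible reading, sketched below, is that `None` keeps the model's default thresholds and a dict overrides individual entries. The key names `box_threshold` and `text_threshold`, their default values, and the per-detection score fields are hypothetical; check the model's configuration for the keys it actually uses.

```python
from __future__ import annotations

# Hypothetical defaults: the real keys and values depend on the model's configuration.
DEFAULT_THRESHOLDS = {"box_threshold": 0.3, "text_threshold": 0.25}


def effective_thresholds(thresholds: dict | None) -> dict:
    """None keeps the defaults; a dict overrides only the keys it contains."""
    merged = dict(DEFAULT_THRESHOLDS)
    if thresholds is not None:
        merged.update(thresholds)
    return merged


def filter_detections(detections: list[dict], thresholds: dict | None = None) -> list[dict]:
    """Keep detections whose (hypothetical) box and text scores both reach their thresholds."""
    t = effective_thresholds(thresholds)
    return [
        d
        for d in detections
        if d["box_score"] >= t["box_threshold"] and d["text_score"] >= t["text_threshold"]
    ]
```

Merging into a copy of the defaults, rather than replacing them, means a caller can tune one threshold without having to restate the others.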
* The model's prediction method accepts the following parameters:

| Parameter | Parameter Description | Parameter Type | Options | Default Value |
|---|---|---|---|---|
| `input` | Data to be predicted; multiple input types are supported | Python Var/str/list | | None |
| `batch_size` | Batch size | int | Any positive integer | 1 |
| `thresholds` | The filtering thresholds used by the model | dict/None | | None |
| `prompt` | The prompt used by the model for prediction | str | Any string | 1 |
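`input` accepts a single item or a list, and `batch_size` controls how many items are processed together. The hypothetical helper below sketches that interaction under the assumption that a single input behaves like a one-element list; it does not perform any of the file, directory, or URL resolution a real loader may do.

```python
from __future__ import annotations

from typing import Any


def normalize_inputs(inputs: Any) -> list:
    """Treat a single input (a path string, an in-memory image, ...) as a one-element list."""
    return list(inputs) if isinstance(inputs, list) else [inputs]


def split_batches(inputs: Any, batch_size: int = 1) -> list[list]:
    """Split the inputs into consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    items = normalize_inputs(inputs)
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```

With `batch_size=2`, three images become one batch of two followed by a final, shorter batch of one.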
* Each prediction result provides the following methods:

| Method | Method Description | Parameter | Parameter Type | Parameter Description | Default Value |
|---|---|---|---|---|---|
| `print()` | Print the results to the terminal | `format_json` | bool | Whether to format the output with JSON indentation | True |
| | | `indent` | int | Indentation level used to pretty-print the output JSON data for readability. Only effective when `format_json` is `True` | 4 |
| | | `ensure_ascii` | bool | Whether to escape non-ASCII characters as Unicode escape sequences. `True` escapes all non-ASCII characters; `False` keeps the original characters. Only effective when `format_json` is `True` | False |
| `save_to_json()` | Save the results as a JSON file | `save_path` | str | Path of the output file. If it is a directory, the saved file is named after the input file | None |
| | | `indent` | int | Indentation level used to pretty-print the output JSON data for readability | 4 |
| | | `ensure_ascii` | bool | Whether to escape non-ASCII characters as Unicode escape sequences. `True` escapes all non-ASCII characters; `False` keeps the original characters | False |
| `save_to_img()` | Save the results as an image file | `save_path` | str | Path of the output file. If it is a directory, the saved file is named after the input file | None |
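`indent` and `ensure_ascii` share their names and meaning with the options of Python's standard `json.dumps`, and the directory rule for `save_path` can be expressed in a few lines. The sketch below is a hypothetical re-implementation for illustration, not PaddleX's code; in particular, returning `str(result)` when `format_json` is `False` is an assumption.

```python
import json
import os


def format_result(result, format_json=True, indent=4, ensure_ascii=False):
    """Render a result dict following the print() parameters described above."""
    if not format_json:
        # Assumption: without JSON formatting, fall back to Python's plain string form.
        return str(result)
    return json.dumps(result, indent=indent, ensure_ascii=ensure_ascii)


def resolve_save_path(save_path, input_path, suffix):
    """If save_path is a directory, name the output file after the input file."""
    if os.path.isdir(save_path):
        stem = os.path.splitext(os.path.basename(input_path))[0]
        return os.path.join(save_path, stem + suffix)
    return save_path
```

For example, saving the result of `imgs/demo.jpg` with `save_path="."` would resolve to `./demo.json` under this rule, while an explicit file path is used unchanged.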
* The prediction results can also be accessed through the following attributes:

| Attribute | Attribute Description |
|---|---|
| `json` | Get the prediction results in JSON format |
| `img` | Get the visualization image in dict format |
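The `json` attribute returns the results as structured data that can be consumed directly. This section does not spell out its layout, so the sketch below assumes a hypothetical dict with a `boxes` list whose entries carry `label`, `score`, and `coordinate` (in `[xmin, ymin, xmax, ymax]` order); adapt the keys to the structure your results actually contain.

```python
def summarize_boxes(result_json):
    """Turn an assumed {"boxes": [{"label", "score", "coordinate"}, ...]} dict into readable lines."""
    lines = []
    # Highest-confidence detections first.
    boxes = sorted(result_json.get("boxes", []), key=lambda b: b["score"], reverse=True)
    for box in boxes:
        xmin, ymin, xmax, ymax = box["coordinate"]
        lines.append(f"{box['label']}: {box['score']:.2f} at ({xmin}, {ymin})-({xmax}, {ymax})")
    return lines
```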