This script, `convert_search.py`, is designed to automate the conversion of bounding box annotations to segmentation masks for image datasets. It leverages SAM (the Segment Anything Model) to generate high-quality segmentation masks from images and their corresponding bounding box annotations.
Create a conda environment from `conda_env.yml`, or install the following dependencies:
- Python 3.8+
- OpenCV
- NumPy
- PIL
- PyTorch
- tqdm
- SAM model checkpoint (e.g., `sam_vit_h_4b8939.pth`)
- Download the SAM model checkpoint file (`sam_vit_h_4b8939.pth`) and place it in the root directory of the project.
- Prepare your dataset with images and their corresponding YOLO-format label files in the same directory (a sketch of the label parsing follows these steps). The script expects the following directory structure:

  ```
  datasets
  ├── Batch_4
  ├── Batch_6
  └── temp
  ```
- Modify `ROOT_FILEPATH` in the script to point to the root directory of your dataset. By default, it is set to `"./datasets/temp/"`.
- Run the script with Python:

  ```
  python convert_search.py
  ```
- The script will process each directory containing `.jpg` files, generate segmentation masks, and save them along with the original images in a new directory suffixed with `_converted_to_segments`.
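The label parsing and box conversion that the dataset-preparation step relies on typically look like the sketch below. This is illustrative rather than the script's exact code: the function names (`parse_yolo_labels`, `yolo_to_xyxy`) are hypothetical, and it assumes the standard YOLO line format `class cx cy w h` with values normalized to [0, 1].

```python
import numpy as np

def yolo_to_xyxy(cx, cy, w, h, img_w, img_h):
    """Convert one normalized YOLO box to a pixel-space XYXY array."""
    return np.array([
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    ])

def parse_yolo_labels(label_path, img_w, img_h):
    """Read a YOLO .txt label file; return a list of (class_id, xyxy_box)."""
    boxes = []
    with open(label_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 5:
                continue  # skip blank or malformed lines
            cls = int(parts[0])
            cx, cy, w, h = map(float, parts[1:])
            boxes.append((cls, yolo_to_xyxy(cx, cy, w, h, img_w, img_h)))
    return boxes
```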
- Directory Traversal: Automatically finds directories containing `.jpg` images and processes them.
- Segmentation Mask Generation: Uses the SAM model to generate segmentation masks from bounding box annotations (a sketch follows this list).
- Validation Masks and Videos: Optionally saves validation masks as images and compiles them into a video for easy review.
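For reference, box-prompted mask generation with the official `segment_anything` package generally follows the pattern below; the checkpoint path, image filename, and box coordinates are placeholders, and the script's actual code may differ in detail.

```python
import cv2
import numpy as np
import torch
from segment_anything import sam_model_registry, SamPredictor

device = "cuda" if torch.cuda.is_available() else "cpu"
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to(device)
predictor = SamPredictor(sam)

# Load an image and compute its embedding once; SAM expects RGB input.
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Prompt with a pixel-space XYXY box (e.g., converted from a YOLO label).
box = np.array([50, 80, 220, 300])
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
mask = masks[0]  # boolean (H, W) segmentation mask for the prompted box
```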
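The optional validation video can be assembled with OpenCV's `VideoWriter`; the mask directory, output filename, frame rate, and codec below are assumptions, not values taken from the script.

```python
import glob
import cv2

frames = sorted(glob.glob("validation_masks/*.jpg"))  # assumed mask location
height, width = cv2.imread(frames[0]).shape[:2]

writer = cv2.VideoWriter(
    "validation.mp4",
    cv2.VideoWriter_fourcc(*"mp4v"),  # MPEG-4 codec
    10.0,                             # frames per second
    (width, height),
)
for path in frames:
    writer.write(cv2.imread(path))
writer.release()
```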
- SAM Model and Device: Configure the SAM model checkpoint and the device (`cuda` or `cpu`) at the beginning of the script.
- Exclusion of Directories: Directories ending with `_converted_to_segments` are automatically excluded from processing to avoid duplication.
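Taken together, the traversal and exclusion rules behave roughly like this sketch (the function name `find_image_dirs` is illustrative, not the script's actual identifier):

```python
import os

def find_image_dirs(root):
    """Yield directories under root that contain .jpg files,
    skipping already-converted output directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath.endswith("_converted_to_segments"):
            dirnames[:] = []  # do not descend into converted output
            continue
        if any(name.lower().endswith(".jpg") for name in filenames):
            yield dirpath
```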
- Ensure the SAM model checkpoint matches the model architecture specified in the script.
- The script includes utilities for parsing label files, converting bounding boxes, and saving masks, which can be customized as needed.
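If you swap checkpoints, keep the registry key and the weights in sync: loading `vit_h` weights under a different key fails with a state-dict mismatch. The official checkpoint filenames encode the architecture, so a small helper (hypothetical, not part of the script) can infer it:

```python
def infer_model_type(checkpoint_name):
    """Guess the sam_model_registry key from an official checkpoint filename,
    e.g. "sam_vit_h_4b8939.pth" -> "vit_h"."""
    for key in ("vit_h", "vit_l", "vit_b"):
        if key in checkpoint_name:
            return key
    raise ValueError(f"Cannot infer SAM model type from {checkpoint_name!r}")
```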
Feel free to fork the repository, make improvements, and submit pull requests. We appreciate your contributions to enhancing this tool.
This project is open-sourced under the MIT License. See the LICENSE file for more details.