- Unify all function calls with the OneFormer interface
- Add SAM and HQ-SAM
- Support DINOSeg with FastSAM and HQ-SAM
- Create a Colab demo
- Add a utility function for obtaining a bbox from a mask
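The last to-do item, deriving a bounding box from a binary mask, can be sketched with plain NumPy. This is only an illustration of the idea, not the library's eventual API; the name `bbox_from_mask` and the `(x_min, y_min, x_max, y_max)` return convention are assumptions:

```python
import numpy as np

def bbox_from_mask(mask):
    """Return (x_min, y_min, x_max, y_max) enclosing the non-zero
    region of a binary mask, or None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Example: a 2x2 block of ones inside a 5x5 mask
mask = np.zeros((5, 5), dtype=np.uint8)
mask[1:3, 2:4] = 1
print(bbox_from_mask(mask))  # (2, 1, 3, 2)
```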
Before you begin, ensure you have met the following requirements:
- Operating System: Linux
- Python Version: 3.10
- CUDA Version: 12.1
Additionally, you will need the following packages:
- PyTorch: 2.1.0
- Torchvision: 0.16.0
- Wheel: 0.42.0
For detailed installation instructions for PyTorch and Torchvision, refer to the PyTorch previous versions documentation.
First, install the specific version of Wheel:
```bash
pip install wheel==0.42.0
```
Next, install PyTorch and Torchvision with the following command:
```bash
pip install torch==2.1.0 torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cu121
```
Finally, install segment_tools itself, which pulls in the remaining tools and libraries:
```bash
pip install git+https://github.com/qwertyroiro/segment_tools.git
```
Ensure all the prerequisites are properly installed to avoid any compatibility issues during the setup process.
```python
from PIL import Image
import numpy as np
import segment_tools as st

image_path = "cityscapes.jpg"
image_pil = Image.open(image_path)  # Open image with Pillow
image_np = np.array(image_pil)  # Convert to a NumPy array
```
```python
import logging

# Optionally silence verbose output from the underlying libraries
logging.getLogger("fvcore").setLevel(logging.ERROR)
logging.getLogger("detectron2").setLevel(logging.ERROR)
logging.getLogger("ultralytics").setLevel(logging.ERROR)
logging.getLogger("dinov2").setLevel(logging.ERROR)
```
```python
prompt = "car"  # Define your prompt

# Segment without a prompt
fastsam = st.FastSAM()
result = fastsam.run(image_np)
if result is not None:
    image, ann = result["image"], result["mask"]

# Segment with a prompt
result = fastsam.run(image_np, prompt)
if result is not None:
    image, ann = result["image"], result["mask"]
```
- For FastSAM, the `ann` (annotation) format represents non-mask areas as 0 and mask areas as 1.
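Because `ann` follows this 0/1 convention, it can be used directly with NumPy, for example to measure the masked area or blank out everything outside the mask. The toy arrays below merely stand in for a real `image`/`ann` pair:

```python
import numpy as np

# Toy stand-ins for the model outputs: a 4x4 RGB "image" and a binary mask
image = np.arange(48, dtype=np.uint8).reshape(4, 4, 3)
ann = np.zeros((4, 4), dtype=np.uint8)
ann[1:3, 1:3] = 1  # mask area

area = int(ann.sum())            # number of masked pixels
masked = image * ann[..., None]  # zero out pixels outside the mask
print(area)  # 4
```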
```python
clipseg = st.CLIPSeg()
result = clipseg.run(image_np, prompt)
if result is not None:
    image, ann = result["image"], result["mask"]
```
- For CLIPSeg, the `ann` (annotation) format represents non-mask areas as 0 and mask areas as 1.
```python
dino = st.DINO()
result = dino.run(image_np, prompt)
if result is not None:
    image, bbox = result["image"], result["bbox"]
```
```python
dinoseg = st.DINOSeg(sam_checkpoint="vit_h")
result = dinoseg.run(image_np, prompt)
if result is not None:
    image, ann = result["image"], result["mask"]
```
- For DINOSeg, the `ann` (annotation) format represents non-mask areas as 0 and mask areas as 1.
```python
oneformer_ade20k = st.OneFormer(dataset="ade20k")
result = oneformer_ade20k.run(image_np)
if result is not None:
    image, ann, info = result["image"], result["mask"], result["info"]

# With the Swin Transformer backbone
oneformer_ade20k_swin = st.OneFormer(dataset="ade20k", use_swin=True)
result = oneformer_ade20k_swin.run(image_np)
if result is not None:
    image, ann, info = result["image"], result["mask"], result["info"]

# Using a prompt
result = oneformer_ade20k.run(image_np, prompt)
if result is not None:
    image, ann = result["image"], result["mask"]

result = oneformer_ade20k_swin.run(image_np, prompt)
if result is not None:
    image, ann = result["image"], result["mask"]
```
```python
oneformer_city = st.OneFormer(dataset="cityscapes")
result = oneformer_city.run(image_np)
if result is not None:
    image, ann, info = result["image"], result["mask"], result["info"]

# With the Swin Transformer backbone
oneformer_city_swin = st.OneFormer(dataset="cityscapes", use_swin=True)
result = oneformer_city_swin.run(image_np)
if result is not None:
    image, ann, info = result["image"], result["mask"], result["info"]

# Using a prompt
result = oneformer_city.run(image_np, prompt)
if result is not None:
    image, ann = result["image"], result["mask"]

result = oneformer_city_swin.run(image_np, prompt)
if result is not None:
    image, ann = result["image"], result["mask"]
```
```python
oneformer_coco = st.OneFormer(dataset="coco")
result = oneformer_coco.run(image_np)
if result is not None:
    image, ann, info = result["image"], result["mask"], result["info"]

# With the Swin Transformer backbone
oneformer_coco_swin = st.OneFormer(dataset="coco", use_swin=True)
result = oneformer_coco_swin.run(image_np)
if result is not None:
    image, ann, info = result["image"], result["mask"], result["info"]

# Using a prompt
result = oneformer_coco.run(image_np, prompt)
if result is not None:
    image, ann = result["image"], result["mask"]

result = oneformer_coco_swin.run(image_np, prompt)
if result is not None:
    image, ann = result["image"], result["mask"]
```
- For OneFormer without a prompt, the `info` variable contains information about the segmented objects. Each entry in `info` includes details such as `id`, `isthing`, `category_id`, `area`, and `class`. The `ann` (annotation) format represents non-mask areas as 0.
- For OneFormer with a prompt, the `ann` (annotation) format represents mask areas as 1 and non-mask areas as 0.
- The `use_swin=True` parameter enables the Swin Transformer as the backbone for the OneFormer models.
```python
depth_model = st.Depth_Anything(encoder="vitl")  # "vits", "vitb", or "vitl"
result = depth_model.run(image_np)
if result is not None:
    image, depth = result["image"], result["depth"]
```
```python
depth_model = st.DINOv2_depth(BACKBONE_SIZE="base")  # "small", "base", "large", or "giant"
result = depth_model.run(image_np)
if result is not None:
    depth_img, depth = result["image"], result["depth"]
```
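If you want to save or display the raw `depth` array yourself (rather than the pre-rendered `result["image"]`), a common step is min-max scaling to 8-bit. This helper is not part of segment_tools, just a small NumPy sketch:

```python
import numpy as np

def depth_to_uint8(depth):
    """Min-max scale a float depth map to the 0-255 range for visualization."""
    depth = np.asarray(depth, dtype=np.float32)
    span = depth.max() - depth.min()
    if span == 0:
        # Flat depth map: nothing to scale
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - depth.min()) / span * 255.0
    return scaled.astype(np.uint8)

depth = np.array([[0.5, 1.0], [1.5, 2.5]])
print(depth_to_uint8(depth))  # [[  0  63] [127 255]]
```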
- The `run` method can be called with or without a prompt for all OneFormer variants.
- The `image` and `ann` (annotations) are obtained from the `result` dictionary returned by the segmentation models.
- If `result` is `None`, the segmentation was not successful. This can happen for various reasons, such as incorrect input data or model limitations, so handle this case in your code to avoid errors.
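Since every model follows the same `run` / `None` convention, the check can be centralized in one small helper. `run_segmentation` and `DummyModel` below are hypothetical names for illustration, not part of segment_tools:

```python
def run_segmentation(model, image, prompt=None):
    """Call a model's `run` method and fall back gracefully when it returns None."""
    result = model.run(image, prompt) if prompt is not None else model.run(image)
    if result is None:
        print("Segmentation failed; check the input image and prompt.")
        return None, None
    return result["image"], result["mask"]

# Toy model that mimics the shared interface for demonstration
class DummyModel:
    def run(self, image, prompt=None):
        return None if image is None else {"image": image, "mask": [[1]]}

img, ann = run_segmentation(DummyModel(), "pixels")
print(ann)  # [[1]]
```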