Video-Instrument Synergistic Network for Referring Video Instrument Segmentation in Robotic Surgery

We are excited to announce that our paper was accepted for publication at IEEE TMI 2024! 🥳🥳🥳

This repository contains the implementation of our paper. You can access the paper here.

Introduction 📑

This project introduces a new setting in surgical image segmentation, termed Referring Surgical Video Instrument Segmentation (RSVIS). RSVIS aims to automatically identify and segment the target surgical instrument in each video frame, as referred to by a given language expression, enabling a more natural and flexible form of human-computer interaction.

Fig. 1. Comparison of (a) existing instrument segmentation task and (b) our referring surgical video instrument segmentation (RSVIS).


How to Run the Code 🛠

Environment Installation

conda create --name RSVIS --file requirements.txt

conda activate RSVIS

Model training

python main.py -rm train -c configs/RS17.yaml -ws 3 -bs 5 -gids 1

-rm specifies the running mode, -ws the window size, -bs the training batch size per GPU, and -gids the GPU id(s). Two pretrained weights need to be placed in the folders 'pretrained_swin_transformer' (swin_tiny_patch244_window877_kinetics400_1k.pth) and 'roberta-base' (pytorch_model.bin), respectively; you can download them from Google Drive.
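As a pre-flight check, the sketch below verifies that both pretrained weights sit where the instructions above expect them before training is launched. The helper name and the relative paths are assumptions for illustration, not part of this repository's code.

```python
from pathlib import Path

# Hypothetical pre-flight check: the expected locations follow the
# setup instructions above, relative to the repository root.
EXPECTED_WEIGHTS = {
    "pretrained_swin_transformer": "swin_tiny_patch244_window877_kinetics400_1k.pth",
    "roberta-base": "pytorch_model.bin",
}

def check_pretrained_weights(root: str = ".") -> None:
    for folder, filename in EXPECTED_WEIGHTS.items():
        path = Path(root) / folder / filename
        if not path.is_file():
            raise FileNotFoundError(f"Missing pretrained weight: {path}")
        print(f"[ok] {path}")

if __name__ == "__main__":
    check_pretrained_weights()
```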

Others

For object detection, please refer to YOLOv5, DETR, and DINO. We also referenced parts of the MTTR code (an excellent project), and we acknowledge the contributions of the above projects. Since the code dates back a while and we tried many variations, I have uploaded a preliminary version first and will sort it out more carefully later.

I acknowledge that, from my perspective, the work isn't perfect and there is room for improvement. To satisfy the demands of the major revision period, the paper has also become longer and more tedious. However, work is only part of our life, and everyone needs to eat. I've done my best with the open-source code and data, so let's show a little patience and thrive together.

Revisiting data and code from long ago isn't a walk in the park 😴 (the paper takes months to publish). Got questions? Just ping me and let's make improvements; no gripes, and skip the scolding, please! 🫡 📮: [email protected] (WeChat: whqqq7).

Dataset 📊

The datasets have been organized!

Please contact Hongqiu ([email protected]) for the dataset. One step is needed to download the dataset: use your Google email to apply for download permission (Google Drive / BaiduPan). We will get back to you within three days, so please don't send the request multiple times. We only handle real-name emails, and your email suffix must match your affiliation. The email should contain the following information:

Name/Homepage/Google Scholar: (Tell us who you are.)
Primary Affiliation: (The name of your institution or university, etc.)
Job Title: (E.g., Professor, Associate Professor, Ph.D., etc.)
Affiliation Email: (The password will be sent to this email; we only reply to emails ending in "edu".)
How to use: (Only for academic research, not for commercial use or secondary development.)

The dataset is stored as follows:

RSVIS/
└── EndoVis-RS18/ 
    ├── train/
    │   ├── JPEGImages/
    │   │   └── */ (video folders)
    │   │       └── *.png (frame image files) 
    │   └── Annotations/
    │       └── */ (video folders)
    │           └── *.png (mask annotation files) 
    ├── valid/
    │   ├── JPEGImages/
    │   │   └── */ (video folders)
    │   │       └── *.png (frame image files) 
    │   └── Annotations/
    │       └── */ (video folders)
    │           └── *.png (mask annotation files) 
    └── meta_expressions/
        ├── train/
        │   └── meta_expressions.json  (text annotations)
        └── valid/
            └── meta_expressions.json  (text annotations)
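As a hedged illustration of this layout, the loader below pairs each frame with its mask and reads the text annotations for a split. It is a hypothetical sketch based only on the tree above (for instance, it assumes a mask file shares its frame's file name), not code from this repository.

```python
import json
from pathlib import Path

# Hypothetical loader sketched from the directory layout above;
# it is not part of the RSVIS codebase.
def load_split(root: str, split: str = "train"):
    root = Path(root)

    # Text annotations for this split (per-video structure is assumed).
    with open(root / "meta_expressions" / split / "meta_expressions.json") as f:
        expressions = json.load(f)

    samples = []
    for video_dir in sorted((root / split / "JPEGImages").iterdir()):
        mask_dir = root / split / "Annotations" / video_dir.name
        for frame in sorted(video_dir.glob("*.png")):
            samples.append({
                "video": video_dir.name,
                "frame": frame,
                "mask": mask_dir / frame.name,  # assumed: mask mirrors frame name
            })
    return samples, expressions

samples, expressions = load_split("RSVIS/EndoVis-RS18", "train")
print(f"{len(samples)} frame/mask pairs loaded")
```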

We built our RSVIS dataset based on previous works, and we gratefully acknowledge the organizers of the two previous challenge competitions. To access the raw surgical video data, please see EndoVis2017 and EndoVis2018. If you use these data, please remember to cite their respective papers.

Citation 📖

If you find our work useful or relevant to your research, please consider citing:

@article{wang2024video,
  title={Video-instrument synergistic network for referring video instrument segmentation in robotic surgery},
  author={Wang, Hongqiu and Yang, Guang and Zhang, Shichen and Qin, Jing and Guo, Yike and Xu, Bo and Jin, Yueming and Zhu, Lei},
  journal={IEEE Transactions on Medical Imaging},
  year={2024},
  publisher={IEEE}
}
