Here we present a real-time, interactive, open-vocabulary scene understanding tool. A user can type an arbitrary query phrase such as `snoopy` (rare object), `somewhere soft` (property), `made of metal` (material), `where can I cook?` (activity), or `festive` (abstract concept), and the corresponding regions are highlighted.
You can activate the `openscene` environment, or simply make sure the following packages are installed:
torch
clip
numpy
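If you are not using the `openscene` environment, a minimal way to install these is sketched below (assuming a pip-based setup and that `clip` refers to OpenAI's CLIP package; the official environment may pin different versions):

```bash
# Sketch only: install the demo's Python dependencies with pip (versions not pinned).
pip install torch numpy
pip install git+https://github.com/openai/CLIP.git
```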
The demo has been tested on Linux and macOS.
First, download the demo data:
cd demo
wget https://cvg-data.inf.ethz.ch/openscene/demo/demo_data.zip
unzip demo_data.zip
Second, set up the demo with the following commands:
# compile gaps library
cd gaps
make
# download and compile RNNets into gaps/pkgs/RNNets
cd pkgs
wget https://cvg-data.inf.ethz.ch/openscene/demo/RNNets.zip
unzip RNNets.zip
cd RNNets
make
# download and compile osview into gaps/apps/osview
# the executable will be in gaps/bin/x86_64/osview
cd ../apps
wget https://cvg-data.inf.ethz.ch/openscene/demo/osview.zip
unzip osview.zip
cd osview
make
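To sanity-check the build, you can verify from `demo/` that the viewer executable was produced at the path mentioned above (on Apple Silicon Macs the architecture folder may be `arm64` instead; see the troubleshooting notes below):

```bash
# Run from demo/ after the build steps above; prints the path if the build succeeded.
ls gaps/bin/x86_64/osview
```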
Now, make sure you are under `demo/`, and you can simply run the following to have fun with the interactive demo:
./run_demo
You might need to edit the `run_demo` file to adapt the path to `osview`.
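For reference, the edit usually amounts to pointing the script at the compiled binary. The variable name below is hypothetical and only illustrates the kind of line to look for inside `run_demo`:

```bash
# Hypothetical example of the path to adapt inside run_demo;
# on Apple Silicon Macs use gaps/bin/arm64/osview instead.
OSVIEW=gaps/bin/x86_64/osview
```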
Text query: type words or sentences directly into the window, then hit Enter.
Main commands:
- `Left-click`: set the center point for camera zoom and rotation (important)
- `Right-click`: translate (move) the mesh
- `Esc`: remove the current query
- `Alt-c`: change the color scheme
- `Ctrl-q`: quit
Other commands: look in the `Keyboard()` function in gaps to see the various `Alt-` commands that toggle displays.
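To browse those bindings in the source, one quick way (assuming the viewer code lives under `gaps/apps/osview` as set up above) is to search for the handler:

```bash
# List occurrences of the Keyboard() handler and its key bindings in the viewer source.
grep -rn "Keyboard" gaps/apps/osview/
```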
Coming soon.
- If you get the error `OSError: Address already in use`, you might need to switch to a different port in `clip_server.py`.
- Alternatively, a more elegant way is to find and kill the process holding the port via netstat, as suggested in this issue (thanks dbuck!); see the sketch after this list.
- For Mac users, you might need to change the architecture inside `run_demo` accordingly from `x86_64` to `arm64`.
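A minimal sketch of the find-and-kill approach (the port number 5000 is illustrative; use whatever port `clip_server.py` is configured with):

```bash
# Linux: list the process bound to the port (on macOS, `lsof -i :5000` works instead),
# then terminate it with its PID.
netstat -tulpn | grep 5000
kill <PID>
```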
For additional help, please refer to the code documentation or contact the author.