This is a "Human-In-The-Loop" machine learning tool for partially supervised image segmentation. The video shows basic usage of Doodler: 1) Annotate the scene with a few examples of each class (colorful buttons). 2) Check 'compute and show segmentation' and wait for the result. The label image is written to the 'results' folder.
Here's a movie of Doodler in action:
Check out the Doodler website
Buscombe, D., Goldstein, E.B., Sherwood, C.R., Bodine, C., Brown, J.A., Favela, J., Fitzpatrick, S., Kranenburg, C.J., Over, J.R., Ritchie, A.C. and Warrick, J.A., 2021. Human‐in‐the‐Loop Segmentation of Earth Surface Imagery. Earth and Space Science, p.e2021EA002085. https://doi.org/10.1029/2021EA002085
There are many great tools for exhaustive (i.e. whole image) image labeling for segmentation tasks, using polygons. Examples include makesense.ai and cvat. However, for high-resolution imagery with large spatial footprints and complex scenes, such as aerial and satellite imagery, exhaustive labeling using polygonal tools can be prohibitively time-consuming. This is especially true of scenes with many classes of interest, where each class covers relatively small, spatially discontinuous regions of the image.
What is generally required in the above case is a semi-supervised tool for efficient image labeling, based on sparse examples provided by a human annotator. Those sparse annotations are used by a secondary automated process to estimate the class of every pixel in the image. The number of pixels annotated by the human annotator is typically a small fraction of the total pixels in the image.
Doodler is a tool for sparse, not exhaustive, labeling. The approach taken here is to freehand label only some of the scene, then use a model to complete the scene. Sparse annotations are provided to a Multilayer Perceptron model for initial predictions, which are refined by a Conditional Random Field (CRF) model that develops a scene-specific model for each class and creates a dense (i.e. per-pixel) label image based on the information you provide. This approach can reduce the time required for detailed labeling of large and complex scenes by an order of magnitude or more. Your annotations are first used to train and apply the initial model to the entire image, then the CRF refines the labels further based on the underlying image.
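To make the sparse-to-dense idea concrete, here is a toy sketch in pure Python. It is not Doodler's implementation: a nearest-mean classifier on grey values stands in for the initial model, the CRF refinement step is left out entirely, and the names `densify` and `UNLABELED` are hypothetical.

```python
UNLABELED = 0  # assumption: 0 marks pixels the annotator did not doodle on


def densify(image, doodles):
    """Assign a class to every pixel from a few doodled examples.

    image   : 2D list of grey values
    doodles : 2D list of the same shape; UNLABELED or a class id >= 1
    """
    # Gather the grey values seen under each doodled class.
    samples = {}
    for img_row, doodle_row in zip(image, doodles):
        for value, cls in zip(img_row, doodle_row):
            if cls != UNLABELED:
                samples.setdefault(cls, []).append(value)
    if len(samples) < 2:
        raise ValueError("doodle at least two classes")

    # "Train": one mean grey value per class (toy stand-in for a real model).
    means = {cls: sum(vals) / len(vals) for cls, vals in samples.items()}

    # "Predict": give every pixel the class whose mean is closest.
    def nearest(value):
        return min(means, key=lambda cls: abs(means[cls] - value))

    return [[nearest(value) for value in row] for row in image]


image = [
    [10, 12, 200, 210],
    [11, 13, 205, 198],
    [9, 190, 202, 207],
]
doodles = [
    [1, 0, 0, 2],  # only two pixels are annotated
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

dense = densify(image, doodles)
annotated = sum(cls != UNLABELED for row in doodles for cls in row)
total = sum(len(row) for row in doodles)
print(f"annotated {annotated} of {total} pixels")
for row in dense:
    print(row)
```

The point of the sketch is the ratio: two annotated pixels yield a label for all twelve. A real scene needs a far richer model than a per-class mean, which is why Doodler pairs a learned classifier with a CRF.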
This is Python software that is designed to be used from within a conda environment. After setting up that environment, create a classes.txt file that tells the program what classes will be labeled (and what buttons to create). The minimum number of classes is 2. The maximum number of classes allowed is 24. The images that you upload will go into the assets/ folder. The label images you create are written to the results folder.
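A minimal sketch of writing and checking a classes.txt file follows. It assumes the file holds one class name per line, which is not stated above, so check the user guide for the exact format. The class names and the helper `check_classes` are hypothetical, and the uniqueness check is an added sanity test rather than a documented rule.

```python
import tempfile
from pathlib import Path

MIN_CLASSES, MAX_CLASSES = 2, 24  # limits stated above


def check_classes(path):
    """Read class names (assumed one per line) and enforce the limits."""
    lines = Path(path).read_text().splitlines()
    names = [line.strip() for line in lines if line.strip()]
    if len(set(names)) != len(names):
        raise ValueError("class names should be unique")
    if not MIN_CLASSES <= len(names) <= MAX_CLASSES:
        raise ValueError(
            f"need {MIN_CLASSES} to {MAX_CLASSES} classes, got {len(names)}"
        )
    return names


# Write a hypothetical class set to a temporary folder and validate it.
with tempfile.TemporaryDirectory() as tmp:
    classes_file = Path(tmp) / "classes.txt"
    classes_file.write_text("water\nsand\nvegetation\n")
    names = check_classes(classes_file)

print(names)
```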
Package maintainers:
- @dbuscombe-usgs Marda Science / USGS Pacific Coastal and Marine Science Center. Developed originally for the USGS Coastal Marine Geology program, as part of the Florence Supplemental project
Contributions:
Doodler is based on code previously contained in the "doodle_labeller" repository, which implements a similar algorithm in OpenCV. The Conditional Random Field (CRF) model used by this tool is described by Buscombe and Ritchie (2018). Doodler was inspired by this plotly example and by the previous OpenCV-based implementation, doodle_labeller, which has its origins in a USGS CDI-sponsored class I taught in the summer of 2018, called dl-tools. So, it's been a 3+ year effort!
Check out the installation guide on the Doodler website
We advise creating a new conda environment to run the program.
- Clone the repo:
git clone --depth 1 https://github.com/Doodleverse/dash_doodler.git
(--depth 1 means "give me only the present code, not the whole history of git commits" - this saves disk space, and time)
- Create and activate a conda environment called dashdoodler:
conda create --name dashdoodler python=3.8
conda activate dashdoodler
- Install the dependencies:
conda install -c conda-forge pydensecrf cairo cairosvg scikit-learn scikit-image psutil dash flask-caching requests pandas matplotlib ipython tqdm
pip install doodler-engine
If the above doesn't work, try this:
conda env create --file environment/dashdoodler.yml
conda activate dashdoodler
and good luck to you!
Check out the user guide on the Doodler website
Move your images into the assets folder. For the moment, they must be jpegs with the .jpg (or JPG or jpeg) extension. Support for other image types forthcoming ...
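Before launching the app, it can help to check which files in the assets folder will be usable. The sketch below is an illustration only: `split_assets` is a hypothetical helper, not part of Doodler, and it treats exactly the three extensions named above as acceptable.

```python
import tempfile
from pathlib import Path

ALLOWED = {".jpg", ".JPG", ".jpeg"}  # extensions named above


def split_assets(folder="assets"):
    """Return (usable, skipped) file names found in the folder."""
    usable, skipped = [], []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            (usable if path.suffix in ALLOWED else skipped).append(path.name)
    return usable, skipped


# Demonstrate on a temporary folder rather than a real assets folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.jpg", "b.JPG", "c.png"):
        (Path(tmp) / name).touch()
    usable, skipped = split_assets(tmp)

print("usable:", usable)
print("skipped:", skipped)
```

Files in the skipped list would need converting to jpeg before use.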
Run the app. An IP address where you can view the app in your browser will be displayed in the terminal. Some browsers will launch automatically; with others you may have to type (or copy/paste) the IP address into the browser yourself. Tested so far with Chrome, Firefox, and Edge.
python doodler.py
Open a browser and go to 127.0.0.1:8050. You may have to hit the refresh button. If, after some time doodling things seem odd or buggy, sometimes a browser refresh will fix those glitches.
To use the labels in their native class sets (that vary per image), use the gen_images_and_labels.py script as described below. To use the labels in remapped classes (standardized across image sets), use the gen_remapped_images_and_labels.py script described below.
Doodler is compatible with the partner segmentation program, Segmentation Gym, in a couple of different ways:
- You could run the script gen_npz_4gym.py to create npz files that contain only image and label pairs. This is the same output as you would get from running the Gym program make_nd_datasets.py.
- You could alternatively run the script gen_images_and_labels.py, which generates jpeg greyscale image files and label image jpegs for use with the Gym program make_nd_datasets.py.
- Finally, you could run the script gen_remapped_images_and_labels.py, which generates jpeg greyscale image files and remapped label image jpegs for use with the Gym program make_nd_datasets.py. Labels are remapped based on a dictionary of class aliases and a list of classes present, using a special config file. To remap Coast Train data, use the config files provided here
The first scenario might be the most common because it requires one less step. However, the second scenario might be useful for using the labels with another software package, or for further post-processing of the labels.
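The remapping idea mentioned above (a dictionary of class aliases plus a list of classes present) can be sketched in a few lines. This is not the code in gen_remapped_images_and_labels.py and does not reproduce its config file format; the function names, class names, and data below are all made up for illustration.

```python
def build_lookup(classes_present, aliases, target_classes):
    """Map each native class index to an index in the standardized set."""
    lookup = {}
    for native_index, name in enumerate(classes_present):
        target_name = aliases[name]  # a KeyError here flags a missing alias
        lookup[native_index] = target_classes.index(target_name)
    return lookup


def remap(label_image, lookup):
    """Apply the lookup to every pixel of a 2D label image."""
    return [[lookup[value] for value in row] for row in label_image]


# Hypothetical class set used when one image was doodled ...
classes_present = ["water", "whitewater", "sand", "marsh"]
# ... the aliases that fold it into a smaller standardized set ...
aliases = {
    "water": "water",
    "whitewater": "water",
    "sand": "sediment",
    "marsh": "vegetation",
}
# ... and the standardized classes shared across the whole image set.
target_classes = ["water", "sediment", "vegetation"]

lookup = build_lookup(classes_present, aliases, target_classes)
label_image = [
    [0, 1, 2],
    [3, 3, 2],
]
remapped = remap(label_image, lookup)
print(lookup)
print(remapped)
```

Because the native class set varies per image, the lookup has to be rebuilt for each image from its own list of classes present, while the aliases and target classes stay fixed.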
There are additional scripts in the utils folder:
- viz_npz.py creates transparent overlay plots of images and labels, and has three modes with the following syntax: viz_npz.py [-t npz type {0}/1/2], where the optional -t controls what type of npz file: native from doodler (option 0, default), a labelgen file from plot_label_generation.py, or an npz file used as input for Gym
- plot_label_generation.py generates a detailed sequence of plots for every input npz file from doodler, including plots of the doodles themselves, overlays, and internal model outputs
- gen_overlays_from_images_and_labels.py generates color overlay figures from folders of images and greyscale labels
- gen_remapped_images_and_labels.py generates remapped label images from one class set to another
Please read our code of conduct
Please contribute to the Discussions tab - we welcome your ideas and feedback.
We also invite all to open issues for bugs/feature requests using the Issues tab
Contributions are welcome, and they are greatly appreciated! Credit will always be given.
Report bugs at https://github.com/Doodleverse/dash_doodler/issues.
Please include:
* Your operating system name and version.
* Any details about your local setup that might be helpful in troubleshooting.
* Detailed steps to reproduce the bug.
* the log file made by the program during the session, found in
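The operating system and local setup details requested above can be gathered with the Python standard library. This is only a convenience sketch; `bug_report_header` is a hypothetical helper, not something shipped with Doodler.

```python
import platform
import sys


def bug_report_header():
    """Collect basic details about the local setup as plain text."""
    lines = [
        f"OS: {platform.system()} {platform.release()}",
        f"OS version: {platform.version()}",
        f"Machine: {platform.machine()}",
        f"Python: {sys.version.split()[0]}",
    ]
    return "\n".join(lines)


header = bug_report_header()
print(header)
```

Paste the printed text at the top of the issue, followed by the steps to reproduce the bug.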
Look through the GitHub issues for bugs. Anything tagged with "bug" and "help wanted" is open to whoever wants to implement it.
Look through the GitHub issues for features. Anything tagged with "enhancement" and "help wanted" is open to whoever wants to implement it.
We could always use more documentation, whether as part of the docs, in docstrings, or using this software in blog posts, articles, etc.
See the how to contribute section of the Doodler website
Ready to contribute? Here's how to set up for local development.
- Fork the dash_doodler repo on GitHub.
- Clone your fork locally:
$ git clone git@github.com:your_name_here/dash_doodler.git
Install your local copy into a conda environment. This is how you set up your fork for local development:
$ cd dash_doodler/
$ conda env create --file environment/dashdoodler.yml
$ conda activate dashdoodler
Create a branch for local development:
$ git checkout -b name-of-your-bugfix-or-feature
Now you can make your changes locally.
Commit your changes and push your branch to GitHub:
$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push origin name-of-your-bugfix-or-feature
Submit a pull request through the GitHub website.
- The entrypoint is doodler.py, which will first download sample imagery if DOWNLOAD_SAMPLE=True in environment\settings.py.
- By default, DOWNLOAD_SAMPLE=False so imagery is not downloaded.
- The other variables in environment\settings.py are found in Dash's app.run_server() documentation:
  - HOST="127.0.0.1" (should be "0.0.0.0" for web deployment)
  - PORT="8050"
  - DEBUG=False
  - DEV_TOOLS_PROPS_CHECK=False
- doodler.py basically just calls and serves app, from app.py, which:
  - Loads classes and files and creates results folders and log file
  - Creates the application layout and links all buttons to callback functions
- utility functions are in app_files\src\app_funcs.py
- functions for drawing the imagery on the screen and making label overlays are in app_files\src\plot_utils.py
- functions for converting SVG annotations to raster label annotations and segmentations are in app_files\src\annotations_to_segmentations.py
- image segmentation/ML functions are in app_files\src\image_segmentation.py
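For orientation, the settings described above could look roughly like the sketch below. This is a hypothetical mirror of the values listed in this section, not a copy of environment\settings.py, which may contain more. The helper `app_url` is invented for illustration.

```python
# Hypothetical mirror of the values listed above (assumption: the real
# settings file defines them as plain module-level variables).
HOST = "127.0.0.1"  # "0.0.0.0" for web deployment
PORT = "8050"
DEBUG = False
DEV_TOOLS_PROPS_CHECK = False
DOWNLOAD_SAMPLE = False  # True would download sample imagery at startup


def app_url(host=HOST, port=PORT):
    """Address to open in a browser on the machine running the app."""
    # "0.0.0.0" means "listen on all interfaces"; it is not an address to
    # browse to, so point the local browser at the loopback address instead.
    shown = "127.0.0.1" if host == "0.0.0.0" else host
    return f"http://{shown}:{port}"


print(app_url())
print(app_url(host="0.0.0.0"))
```

With the default values this gives the same 127.0.0.1:8050 address mentioned in the usage instructions.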
To build your own docker image based on miniconda (continuumio/miniconda3), called doodler_docker_image:
docker build -t doodler_docker_image .
then when it has finished building (it takes a while), check its size
sudo docker image ls doodler_docker_image
It is large - 4.8 GB. Run it in a container called www:
sudo docker run -p 8050:8050 -d -it --name www doodler_docker_image
The terminal will show no output, but you can see the process running in a few different ways.
List running containers:
docker ps
The container name will be at the end of the line of output of docker ps. Use it to view the container's logs (images don't have logs; they're like classes):
docker logs [container_name]
To stop and remove:
sudo docker stop www
sudo docker rm www
Please don't ask me about Docker - that's all I know. Please contribute Docker workflows and suggestions!