Releases: collectionslab/Omniscribe
Omniscribe
This is a command line interface for detecting annotations in IIIF-hosted printed books. Give the script a list of IIIF manifest URLs (either saved locally or hosted elsewhere) and it will generate an IIIF-compliant `resultsManifest.json` file that can be displayed in IIIF viewers like Mirador. Other available outputs are an HTML gallery and a plain text file.
Release notes
Support for smaller-than-max inference images
Sometimes IIIF image servers limit the dimensions of the largest "full" image they make available (so that users cannot simply download the full-resolution version); the code that highlights the detected handwriting regions via IIIF annotations now handles these situations.
There's a new command-line option, `--max_pages=N`, which can be used to avoid processing all of the images/pages in a given manifest, for example when the manifest is very long and you only want to detect handwriting on the first N pages.
The generated IIIF annotations have been streamlined considerably, thanks to suggestions from @glenrobson.
Add option to generate IIIF annotation lists
If the user specifies `--annotate` when running `inferencer.py` with the `--manifest` flag, IIIF annotation list files are created in the `annotations/` folder and referenced in the resulting manifest file. IIIF-compatible viewers like Mirador can visualize the detected handwriting/notes on the input images when they load the generated manifest.
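For reference, a minimal IIIF (Presentation 2.x) annotation list entry might look roughly like the sketch below. The URLs, region coordinates, and label text are invented for illustration; the files Omniscribe actually writes may differ in detail.

```python
import json

# Hypothetical base URL and canvas; real values come from the inference run.
iiif_root = "https://example.org/omniscribe"
canvas_id = "https://example.org/iiif/book1/canvas/p1"

annotation_list = {
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "@id": f"{iiif_root}/annotations/p1.json",
    "@type": "sc:AnnotationList",
    "resources": [
        {
            "@type": "oa:Annotation",
            "motivation": "sc:painting",
            "resource": {
                "@type": "cnt:ContentAsText",
                "chars": "Handwriting detected (confidence 0.97)",
            },
            # The xywh fragment marks the detected region's bounding box
            # on the target canvas.
            "on": f"{canvas_id}#xywh=120,340,400,160",
        }
    ],
}

print(json.dumps(annotation_list, indent=2))
```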
Other notes/contributions
There's also an optional `--iiif_root` argument to specify the web address where the manifest of the detected annotations will be posted, with the `annotations/` folder for the optional IIIF annotation lists within it.
The detected handwriting/notes are highlighted in the IIIF annotation overlays when viewed in Mirador via a dashed rectangle bounding box with a dashed mask path within it. Mousing over the detected annotations displays a tag and the confidence level.
Installing Omniscribe
Requires Python 3.6.x
- Download the Source Code package, unzip it, and save the `Omniscribe-1.1` folder to your local machine or server.
- Download `model.h5` and save it to the `Omniscribe-1.1` folder.
- Using the command line, navigate to the `Omniscribe-1.1` folder.
- Install dependencies by running the command `pip install -r requirements.txt`.
NOTE: We recommend setting up a virtual environment to install Omniscribe. For more information on setting up a virtual environment, please refer to https://packaging.python.org/guides/installing-using-pip-and-virtualenv/ up to the "Leaving the virtualenv" section of that documentation.
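The virtual-environment setup can be sketched as a short shell session (assuming a Unix-like shell and `python3` on your PATH):

```shell
# Run from inside the unzipped Omniscribe-1.1 folder
# (requirements.txt ships with the source package).
python3 -m venv venv               # create an isolated environment
. venv/bin/activate                # activate it (on Windows: venv\Scripts\activate)
pip install -r requirements.txt    # install Omniscribe's dependencies
```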
Usage
Run `inferencer.py` with the manifest URLs where you wish to detect annotations:

`python3 inferencer.py [ export ] [ confidence ] [ manifest-url/path ]`
Export options:

- `--manifest` - exports `resultsManifest.json`, an IIIF manifest listing the images with detected annotations.
- `--text` - exports `resultsURIs.txt`, a text file that contains URLs of images with detected annotations.
- `--html` - exports `resultsImages.html`, a simple HTML gallery of images with detected annotations.
- `--max_pages=N` - limits the number of images/pages processed in a given manifest, for example when the manifest is very long and you only want to detect handwriting on the first N pages.
- `--annotate` (when running `inferencer.py` with the `--manifest` flag) - creates IIIF annotation list files in the `annotations/` folder and references them in the resulting manifest file.
- `--iiif_root` - specifies the web address where the manifest of the detected annotations will be posted, with the `annotations/` folder for the optional IIIF annotation lists within it.
The default export format is `resultsManifest.json` if no export options are specified.
`--confidence=VALUE` - sets the detection threshold; accepts any value between 0 and 1 (inclusive). E.g. `--confidence=0.91` sets the threshold to 0.91, meaning that any region that receives a score of 0.91 or higher from our model will be inferred as an annotation. The default confidence level is 0.95 if no confidence value is specified.
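The thresholding behavior can be sketched in Python. The region scores below are invented stand-ins; in Omniscribe the scores come from the model inside `inferencer.py`:

```python
# Hypothetical (score, region) pairs as a stand-in for model output.
scored_regions = [
    (0.97, "marginalia on page 3"),
    (0.78, "interlinear note on page 5"),
    (0.93, "marginalia on page 9"),
]

def detect_annotations(scored_regions, confidence=0.95):
    """Keep only regions whose score meets or exceeds the threshold."""
    return [region for score, region in scored_regions if score >= confidence]

print(detect_annotations(scored_regions))                   # default 0.95 keeps one region
print(detect_annotations(scored_regions, confidence=0.70))  # lower threshold keeps all three
```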
Gauging a "Good" Confidence Value
We found that marginalia are often detected with a confidence value of 0.90 and higher, but detecting interlinear annotations requires lower confidence values, somewhere between 0.70 and 0.85. This means that setting `--confidence=0.90` will detect marginalia but will be less effective at detecting interlinear annotations, since these often receive scores below the 0.90 threshold. Setting `--confidence=0.70` will detect both interlinear annotations and marginalia (as both types receive scores at or above the threshold); however, the lower threshold will likely produce more false positives.
Operating on Multiple Manifests
The manifests can be hosted or local IIIF manifest files. You can input multiple manifest URLs or paths, and the application will crawl through all the images from each manifest and combine the sub-results into a single export.
https://marinus.library.ucla.edu/iiif/annotated/uclaclark_SB322S53-short.json
path/to/a/localManifestFile.json
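Crawling the image URLs out of local manifests can be sketched roughly as follows. This is a simplified illustration, not Omniscribe's actual code; the field names follow the IIIF Presentation 2.x manifest layout (`sequences` → `canvases` → `images` → `resource`):

```python
import json

def image_urls(manifest_paths):
    """Collect the image URL of every canvas across several local manifests."""
    urls = []
    for path in manifest_paths:
        with open(path) as f:
            manifest = json.load(f)
        # IIIF Presentation 2.x: a manifest holds sequences of canvases,
        # each canvas painted with one or more images.
        for canvas in manifest["sequences"][0]["canvases"]:
            for image in canvas["images"]:
                urls.append(image["resource"]["@id"])
    return urls
```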
Example Command Lines
python3 inferencer.py --manifest --confidence=0.93 manifest1.json
python3 inferencer.py --html --confidence=0.90 manifest1.json
python3 inferencer.py --text --confidence=0.94 manifest1.json
python3 inferencer.py --manifest --html --confidence=0.92 manifest1.json
python3 inferencer.py --text --manifest --confidence=0.97 manifest1.json
python3 inferencer.py --html --text --confidence=0.93 manifest1.json
python3 inferencer.py --html --manifest --text --confidence=0.91 manifest1.json
python3 inferencer.py --confidence=0.95 manifest1.json
python3 inferencer.py --text manifest1.json
python3 inferencer.py manifest1.json
python3 inferencer.py --manifest --text --html --confidence=0.96 manifest1.json manifest2.json
python3 inferencer.py --manifest --text manifest1.json manifest2.json manifest3.json manifest4.json
Note that omitting the confidence option will be interpreted as setting the confidence score to 0.95. Additionally, omitting all export options will be interpreted as setting the export to a manifest file.
Collecting the Results
After `inferencer.py` is done processing all the images, you will see the message `Finished detecting annotations`. All the export files will be saved in the `Omniscribe-1.1` folder.
Sample Images
Command Line will typically display this as it processes through all the images.
TensorFlow may automatically use any available GPUs to do the predictions on the images as shown below.
Displaying `resultsManifest.json` through Mirador, an image-viewing client that supports IIIF.
Omniscribe v1.0.2
Thanks for using our package! This is a command line interface for detecting annotations in IIIF-hosted printed books. Give the script a list of IIIF manifest URLs (either saved locally or hosted elsewhere) and it will generate an IIIF-compliant `resultsManifest.json` file that can be displayed in IIIF viewers like Mirador. Other available outputs are an HTML gallery and a plain text file.
Installing Omniscribe
Requires Python 3.6.x
- Download the Source Code package, unzip it, and save the `Omniscribe-1.0.2` folder to your local machine or server.
- Download `model.h5` and save it to the `Omniscribe-1.0.2` folder.
- Using the command line, navigate to the `Omniscribe-1.0.2` folder.
- Install dependencies by running the command `pip install -r requirements.txt`.
NOTE: We recommend setting up a virtual environment to install Omniscribe. For more information on setting up a virtual environment, please refer to https://packaging.python.org/guides/installing-using-pip-and-virtualenv/ up to the "Leaving the virtualenv" section of that documentation.
Usage
Run `inferencer.py` with the manifest URLs where you wish to detect annotations:

`python3 inferencer.py [ export ] [ confidence ] [ manifest-url/path ]`
Export options:

- `--manifest` - exports `resultsManifest.json`, an IIIF manifest listing the images with detected annotations.
- `--text` - exports `resultsURIs.txt`, a text file that contains URLs of images with detected annotations.
- `--html` - exports `resultsImages.html`, a simple HTML gallery of images with detected annotations.
The default export format is `resultsManifest.json` if no export options are specified.
`--confidence=VALUE` - sets the detection threshold; accepts any value between 0 and 1 (inclusive). E.g. `--confidence=0.91` sets the threshold to 0.91, meaning that any region that receives a score of 0.91 or higher from our model will be inferred as an annotation. The default confidence level is 0.95 if no confidence value is specified.
Gauging a "Good" Confidence Value
We found that marginalia are often detected with a confidence value of 0.90 and higher, but detecting interlinear annotations requires lower confidence values, somewhere between 0.70 and 0.85. This means that setting `--confidence=0.90` will detect marginalia but will be less effective at detecting interlinear annotations, since these often receive scores below the 0.90 threshold. Setting `--confidence=0.70` will detect both interlinear annotations and marginalia (as both types receive scores at or above the threshold); however, the lower threshold will likely produce more false positives.
Operating on Multiple Manifests
The manifests can be hosted or local IIIF manifest files. You can input multiple manifest URLs or paths, and the application will crawl through all the images from each manifest and combine the sub-results into a single export.
https://marinus.library.ucla.edu/iiif/annotated/uclaclark_SB322S53-short.json
path/to/a/localManifestFile.json
Example Command Lines
python3 inferencer.py --manifest --confidence=0.93 manifest1.json
python3 inferencer.py --html --confidence=0.90 manifest1.json
python3 inferencer.py --text --confidence=0.94 manifest1.json
python3 inferencer.py --manifest --html --confidence=0.92 manifest1.json
python3 inferencer.py --text --manifest --confidence=0.97 manifest1.json
python3 inferencer.py --html --text --confidence=0.93 manifest1.json
python3 inferencer.py --html --manifest --text --confidence=0.91 manifest1.json
python3 inferencer.py --confidence=0.95 manifest1.json
python3 inferencer.py --text manifest1.json
python3 inferencer.py manifest1.json
python3 inferencer.py --manifest --text --html --confidence=0.96 manifest1.json manifest2.json
python3 inferencer.py --manifest --text manifest1.json manifest2.json manifest3.json manifest4.json
Note that omitting the confidence option will be interpreted as setting the confidence score to 0.95. Additionally, omitting all export options will be interpreted as setting the export to a manifest file.
Collecting the Results
After `inferencer.py` is done processing all the images, you will see the message `Finished detecting annotations`. All the export files will be saved in the `Omniscribe-1.0.2` folder.
Sample Images
Command Line will typically display this as it processes through all the images.
TensorFlow may automatically use any available GPUs to do the predictions on the images as shown below.
Displaying `resultsManifest.json` through Mirador, an image-viewing client that supports IIIF.
Omniscribe
Thanks for using our package! This is a command line interface for detecting annotations in IIIF-hosted printed books. Give the script a list of IIIF manifest URLs (either saved locally or hosted elsewhere) and it will generate an IIIF-compliant `resultsManifest.json` file that can be displayed in IIIF viewers like Mirador. Other available outputs are an HTML gallery and a plain text file.
Installing Omniscribe
Requires Python 3.6.x
- Download the Source Code package, unzip it, and save the `Omniscribe-1.0.1` folder to your local machine or server.
- Download `model.h5` and save it to the `Omniscribe-1.0.1` folder.
- Using the command line, navigate to the `Omniscribe-1.0.1` folder.
- Install dependencies by running the command `pip install -r requirements.txt`.
NOTE: We recommend setting up a virtual environment in which to install these dependencies. For more information on setting up a virtual environment, please refer to https://packaging.python.org/guides/installing-using-pip-and-virtualenv/ up to the "Leaving the virtualenv" section of that documentation.
Usage
Run `inferencer.py` with the manifest URLs where you wish to detect annotations:

`python3 inferencer.py [ export ] [ confidence ] [ manifest ]`
Export options:
- `--manifest` - exports `resultsManifest.json`, an IIIF manifest that contains images with annotations.
- `--text` - exports `resultsURIs.txt`, a text file that contains URIs of images with annotations.
- `--html` - exports `resultsImages.html`, a simple HTML gallery of images with annotations.
The default export format is `resultsManifest.json` if no export options are specified.
`--confidence=VALUE` - sets the detection threshold; accepts any value between 0 and 1 (inclusive). E.g. `--confidence=0.91` sets the threshold to 0.91, meaning that any region that receives a score of 0.91 or higher from our model will be inferred as an annotation. The default confidence level is 0.95 if no confidence value is specified.
Gauging a "Good" Confidence Value
We found in our experience that interlinear annotations get picked up with values between 0.70 and 0.85, while marginalia often score 0.90 or higher. That is, setting `--confidence=0.70` will pick up both interlinear annotations and marginalia (as both types receive scores at or above the threshold); in turn, setting `--confidence=0.90` will only detect marginalia (as interlinear annotations often receive scores below the 0.90 threshold).
NOTE: The lower the confidence threshold, the more false positives the model is likely to produce (recall increases, precision decreases). Conversely, raising the confidence threshold may yield fewer true positives (recall decreases, precision increases).
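The precision/recall trade-off can be illustrated with toy numbers. The scores and ground-truth labels below are invented for illustration only:

```python
# Hypothetical (score, is_real_annotation) pairs for illustration.
predictions = [(0.96, True), (0.92, True), (0.88, False),
               (0.75, True), (0.72, False), (0.60, False)]

def precision_recall(predictions, threshold):
    """Precision and recall of detections at a given confidence threshold."""
    detected = [label for score, label in predictions if score >= threshold]
    true_pos = sum(detected)                        # detections that are real
    total_real = sum(label for _, label in predictions)
    precision = true_pos / len(detected) if detected else 1.0
    recall = true_pos / total_real
    return precision, recall

# A high threshold misses one real annotation but admits no false positives;
# a low threshold finds every real annotation but admits two false positives.
print(precision_recall(predictions, 0.90))  # (1.0, 0.666...)
print(precision_recall(predictions, 0.70))  # (0.6, 1.0)
```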
Operating on Multiple Manifests
The manifests can be hosted or local IIIF manifest files. You can input multiple manifest URLs or paths, and the application will crawl through all the images from each manifest and combine the sub-results into a single export.
https://marinus.library.ucla.edu/iiif/annotated/uclaclark_SB322S53-short.json
path/to/a/localManifestFile.json
Example Command Lines
python3 inferencer.py --manifest --confidence=0.93 manifest1.json
python3 inferencer.py --html --confidence=0.90 manifest1.json
python3 inferencer.py --text --confidence=0.94 manifest1.json
python3 inferencer.py --manifest --html --confidence=0.92 manifest1.json
python3 inferencer.py --text --manifest --confidence=0.97 manifest1.json
python3 inferencer.py --html --text --confidence=0.93 manifest1.json
python3 inferencer.py --html --manifest --text --confidence=0.91 manifest1.json
python3 inferencer.py --confidence=0.95 manifest1.json
python3 inferencer.py --text manifest1.json
python3 inferencer.py manifest1.json
python3 inferencer.py --manifest --text --html --confidence=0.96 manifest1.json manifest2.json
python3 inferencer.py --manifest --text manifest1.json manifest2.json manifest3.json manifest4.json
Note that omitting the confidence option will be interpreted as setting the confidence score to 0.95. Additionally, omitting all export options will be interpreted as setting the export to a manifest file.
Collecting the Results
After `inferencer.py` is done processing all the images, you will see the message `Finished detecting annotations`. All the export files will be saved in the `Omniscribe-1.0.1` folder.
Sample Images
Command Line will typically display this as it processes through all the images.
TensorFlow may automatically use any available GPUs to do the predictions on the images as shown below.
Displaying `resultsManifest.json` through Mirador, an image-viewing client that supports IIIF.
Omniscribe
Thanks for using our package! This is a command line interface for detecting annotations in IIIF-hosted printed books. Give the script a list of IIIF manifest URLs (either saved locally or hosted elsewhere) and it will generate an IIIF-compliant `resultsManifest.json` file that can be displayed in IIIF viewers like Mirador. Other available outputs are an HTML gallery and a plain text file.
Install Omniscribe
Requires Python 3.6.x
- Download the Source Code package, unzip it, and save the `Omniscribe-1.0` folder to your local machine or server.
- Download `model.h5` and save it to the `Omniscribe-1.0` folder.
- Using the command line, navigate to the `Omniscribe-1.0` folder.
- Install dependencies by running the command `pip3 install -r requirements.txt`.
Usage
Run `inferencer.py` with the manifest URLs where you wish to detect annotations:

`python3 inferencer.py [ output ] [ confidence ] [ manifest ]`
Output options:

- `--manifest` (default) - outputs an IIIF manifest
- `--text` - outputs a text file
- `--html` - outputs an HTML gallery
The default confidence level is 0.95. If you wish to change the confidence level, you can use `--confidence=0.91`, adjusting the value (between 0 and 1) as needed.
The manifests can be hosted or local IIIF manifest files. You can input multiple manifest URLs or paths.
https://marinus.library.ucla.edu/iiif/annotated/uclaclark_SB322S53-short.json
path/to/a/localManifestFile.json
After `inferencer.py` is done processing all the images, you will see the message `Finished detecting annotations`. All the export files will be saved in the `Omniscribe-1.0` folder.
Sample Images:
Command Line will typically display this as it processes through all the images.
TensorFlow may automatically use any available GPUs to do the predictions on the images as shown below.
Omniscribe
Thanks for using our package! This is a command line interface for detecting annotations in IIIF-hosted printed books. Give the script a list of IIIF manifest URLs (either saved locally or hosted elsewhere) and it will generate an IIIF-compliant `resultsManifest.json` file that can be displayed in viewers that support the IIIF Presentation API, like Mirador.
Note: This is a pre-release for testing. We are planning to deploy a production-ready release sometime in April.
How to Use:
Requires Python 3.6.x
- Download the Source Code package, unzip it, and save the `Omniscribe-v0.4-alpha` folder to your local machine or server.
- Download `model.h5` and save it to the `Omniscribe-v0.4-alpha` folder.
- Using the command line, navigate to the `Omniscribe-v0.4-alpha` folder.
- Install dependencies by running the command `pip3 install -r requirements.txt`.
- Run `inferencer.py` with the manifest URLs where you wish to detect annotations.
Example syntax: a web-hosted manifest
`$ python3 inferencer.py https://marinus.library.ucla.edu/iiif/annotated/uclaclark_BF1681A441713.json`
Example syntax: a local manifest
`$ python3 inferencer.py path/to/a/localManifestFile.json`
Example syntax: multiple manifests
`$ python3 inferencer.py manifest1.json manifest2.json manifest3.json`
- After `inferencer.py` is done processing all the images, you will see a message of the form `Finished detecting annotations on the manifest(s)`.
- Done! All the export files will be saved in the `Omniscribe-v0.4-alpha` folder.
Sample Images:
Command Line will typically display this as it processes through all the images.
TensorFlow may automatically use any available GPUs to do the predictions on the images as shown below.
Omniscribe
Thanks for using our package! This is a command line interface for detecting annotations in IIIF-hosted printed books. Give the script a list of IIIF manifest URLs (either saved locally or hosted elsewhere) and it will generate a `resultsURIs.txt` file listing all the page images that were predicted to have at least one annotation.
Note: This is a pre-release for testing. We are planning to deploy a production-ready release sometime in April.
How to Use:
Requires Python 3.6.x
- Download the Source Code package, unzip it, and save the `Omniscribe-v0.3-alpha` folder to your local machine or server.
- Download `model.h5` and save it to the `Omniscribe-v0.3-alpha` folder.
- Using the command line, navigate to the `Omniscribe-v0.3-alpha` folder.
- Install dependencies by running the command `pip3 install -r requirements.txt`.
- Run `inferencer.py` with the manifest URLs where you wish to detect annotations.
Example syntax: a web-hosted manifest
`$ python3 inferencer.py https://marinus.library.ucla.edu/iiif/annotated/uclaclark_BF1681A441713.json`
Example syntax: a local manifest
`$ python3 inferencer.py path/to/a/localManifestFile.json`
Example syntax: multiple manifests
`$ python3 inferencer.py manifest1.json manifest2.json manifest3.json`
- After `inferencer.py` is done processing all the images, you will see a message of the form `FINISHED PROCESSING MANIFESTS. SAVED [EXPORT FILE] TO [CURRENT DIRECTORY]`.
- Done! A `resultsURIs.txt` file containing a list of all the image URIs that were predicted to have at least one annotation will be saved in the `Omniscribe-v0.3-alpha` folder.
Sample Images:
Command Line will typically display this as it processes through all the images.
TensorFlow may automatically use any available GPUs to do the predictions on the images as shown below.
Book Annotation Detection
Thanks for using our package! This is a command line interface for detecting annotations in IIIF-hosted printed books. Give the script a list of IIIF manifest URLs (either saved locally or hosted elsewhere) and it will generate a `regionURIs.txt` file listing all the page images that were predicted to have at least one annotation.
Note: This is a pre-release for testing. We are still fine-tuning our model for the best performance and accuracy. Check back for a more robust beta version soon.
How to Use:
Requires Python 3.6 or higher
- Download the Source Code package, unzip it, and save the `book-annotation-classification-v0.1-alpha` folder to your local machine or server.
- Download the `w_smallData.h5` and `w_bigData.h5` files and save them to the `book-annotation-classification-v0.1-alpha` folder.
- Using the command line, navigate to the `book-annotation-classification-v0.1-alpha` folder.
- Install dependencies by running the command `pip3 install -r requirements.txt`.
- Run `inferencer.py` with the manifest URLs where you wish to detect annotations.
Example syntax: a web-hosted manifest
`$ python3 inferencer.py https://marinus.library.ucla.edu/iiif/annotated/uclaclark_BF1681A441713.json`
Example syntax: a local manifest
`$ python3 inferencer.py path/to/a/localManifestFile.json`
Example syntax: multiple manifests
`$ python3 inferencer.py manifest1.json manifest2.json manifest3.json`
- After `inferencer.py` is done processing all the images, you will see the message `FINISHED PROCESSING MANIFESTS. SAVED regionURIS.txt TO CURRENT DIRECTORY`.
- Done! A `regionURIs.txt` file containing a list of all the image URIs that were predicted to have at least one annotation will be saved in the `book-annotation-classification-v0.1-alpha` folder.
Sample Images:
Command Line will typically display this as it processes through all the images.
TensorFlow may automatically use any available GPUs to do the predictions on the images as shown below.
Annotation Detector on Command (ADC)
Thanks for using our package! We hope you find the annotations it detects helpful and interesting.
This is a command line interface for annotation detection! Give the script a list of IIIF manifest files (either saved locally or hosted elsewhere) and it will generate a `regionURIs.txt` text file that contains a list of all images that were predicted to have at least one annotation.
How to Use:
Step 1. Download the Source Code package, unzip it, and save the `book-annotation-classification-v0.1-alpha` folder to a place of your choosing.
Step 2. Download the `w_smallData.h5` and `w_bigData.h5` files and save them in the `book-annotation-classification-v0.1-alpha` folder.
Step 3. Using a command line, navigate to the `book-annotation-classification-v0.1-alpha` folder.
Step 4. Install dependencies by running the command `pip3 install -r requirements.txt`.
Step 5. Run `inferencer.py` with the manifests that you are interested in detecting annotations from.
Step 6. After `inferencer.py` is done processing all the manifests, you will see the message "FINISHED PROCESSING MANIFESTS. SAVED regionURIS.txt TO CURRENT DIRECTORY".
Step 7. Done! A `regionURIs.txt` file containing a list of all the image URIs that were predicted to have at least one annotation will be saved in the `book-annotation-classification-v0.1-alpha` folder.
Examples of How to Invoke Our Script:
$ python3 inferencer.py https://marinus.library.ucla.edu/iiif/annotated/uclaclark_BF1681A441713.json
$ python3 inferencer.py path/to/a/localManifestFile.json
$ python3 inferencer.py foo.json https://marinus.library.ucla.edu/iiif/annotated/uclaclark_SB322S53.json
Sample Image of How Your Setup Should Look:
Command Line will typically display this as it processes through all the images.
TensorFlow may automatically use any available GPUs to do the predictions on the images as shown below.