Releases: collectionslab/Omniscribe

Omniscribe v1.1

27 Mar 19:33

This is a command line interface for detecting annotations in IIIF-hosted printed books. Give the script a list of IIIF manifest URLs (either saved locally or hosted elsewhere) and it will generate an IIIF-compliant resultsManifest.json file that can be displayed in IIIF viewers such as Mirador. Other available outputs are an HTML gallery and a plain text file.

Release notes

Support for smaller-than-max inference images

  • Some IIIF image servers cap the dimensions of the largest "full" image they make available (so that users cannot simply download the full-resolution version); the code that highlights detected handwriting regions via IIIF annotations now handles these cases.
  • A new command-line option, --max_pages=N, limits processing to the first N images/pages of a manifest, which is useful when a manifest is very long and you only want to detect handwriting on part of it.
  • The generated IIIF annotations have been streamlined considerably, thanks to suggestions from @glenrobson.

Add option to generate IIIF annotation lists

If the user specifies --annotate when running inferencer.py with the --manifest flag, IIIF annotation list files are created in the annotations/ folder and referenced in the resulting manifest file. IIIF-compatible viewers like Mirador can visualize the detected handwriting/notes on the input images when they load the generated manifest.

Other notes/contributions

  • There is also an optional --iiif_root argument for specifying the web address where the manifest of detected annotations will be posted, with the annotations/ folder for the optional IIIF annotation lists inside it (see the example command after these notes).
  • When viewed in Mirador, detected handwriting/notes are highlighted in the IIIF annotation overlays by a dashed bounding rectangle with a dashed mask path inside it. Mousing over a detected annotation displays a tag and the confidence level.
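
For example, a hypothetical invocation that combines the new options (the manifest name and the --iiif_root URL are illustrative, and we assume --iiif_root takes its value with = in the same way --max_pages and --confidence do):

python3 inferencer.py --manifest --annotate --iiif_root=https://example.org/omniscribe/ --max_pages=20 --confidence=0.92 manifest1.json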

Installing Omniscribe

Requires Python 3.6.x

  1. Download the Source Code package, unzip it, and save the Omniscribe-1.1 folder to your local machine or server.
  2. Download the model.h5 and save to the Omniscribe-1.1 folder.
  3. Using the command line, navigate to the Omniscribe-1.1 folder.
  4. Install dependencies by running the command pip install -r requirements.txt.

    NOTE: We recommend installing Omniscribe inside a virtual environment. For more information on setting up a virtual environment, see https://packaging.python.org/guides/installing-using-pip-and-virtualenv/ and follow that guide up to the "Leaving the virtualenv" section.
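
    A minimal sketch of that setup, assuming a Unix-like shell (on Windows, activate with env\Scripts\activate instead; the environment name env is arbitrary):

    python3 -m venv env
    source env/bin/activate
    pip install -r requirements.txt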

Usage

Run inferencer.py with the manifest URLs where you wish to detect annotations:

python3 inferencer.py [ export ] [ confidence ] [ manifest-url/path ]

Export options:

  • --manifest - exports resultsManifest.json, an IIIF manifest listing the images with detected annotations.
  • --text - exports resultsURIs.txt, a text file that contains URLs of images with detected annotations.
  • --html - exports resultsImages.html, a simple HTML gallery of images with detected annotations.
  • --max_pages=N - processes only the first N images/pages of a manifest; useful when a manifest is very long and you only want to detect handwriting on part of it.
  • --annotate (used together with the --manifest flag) - creates IIIF annotation list files in the annotations/ folder and references them in the resulting manifest file.
  • --iiif_root - specifies the web address where the manifest of detected annotations will be posted, with the annotations/ folder for the optional IIIF annotation lists inside it.

The default export format is resultsManifest.json if no export options are specified.

  • --confidence=VALUE - sets the detection threshold; accepts any value between 0 and 1 (inclusive).

E.g. --confidence=0.91 sets the threshold to 0.91. This means that any region that receives a score of 0.91 or higher from our model will be inferred as an annotation.

The default confidence level is 0.95 if no confidence value is specified.

Gauging a "Good" Confidence Value

We found that marginalia are often detected with confidence values of 0.90 and higher, while interlinear annotations require lower thresholds, roughly 0.70-0.85. Setting --confidence=0.90 will therefore detect marginalia but will be less effective at detecting interlinear annotations, since those often score below 0.90. Setting --confidence=0.70 will detect both interlinear annotations and marginalia (both types usually score at or above that threshold); however, the lower threshold will likely produce more false positives.
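
For example, two representative runs based on the ranges above (the manifest name is illustrative): a higher threshold that targets marginalia only, and a lower one that also catches interlinear annotations at the cost of more false positives.

python3 inferencer.py --manifest --confidence=0.90 manifest1.json
python3 inferencer.py --manifest --confidence=0.70 manifest1.json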

Operating on Multiple Manifests

The manifests can be hosted or local IIIF manifest files. You can pass multiple manifest URLs or paths, and the application will crawl through all the images in each manifest, combining the results from every manifest into a single export (see the combined example after this list).

  • https://marinus.library.ucla.edu/iiif/annotated/uclaclark_SB322S53-short.json
  • path/to/a/localManifestFile.json
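
For instance, a single run over one hosted and one local manifest (using the example inputs above):

python3 inferencer.py --manifest https://marinus.library.ucla.edu/iiif/annotated/uclaclark_SB322S53-short.json path/to/a/localManifestFile.json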

Example Command Lines

python3 inferencer.py --manifest --confidence=0.93 manifest1.json
python3 inferencer.py --html --confidence=0.90 manifest1.json
python3 inferencer.py --text --confidence=0.94 manifest1.json
python3 inferencer.py --manifest --html --confidence=0.92 manifest1.json
python3 inferencer.py --text --manifest --confidence=0.97 manifest1.json
python3 inferencer.py --html --text --confidence=0.93 manifest1.json
python3 inferencer.py --html --manifest --text --confidence=0.91 manifest1.json
python3 inferencer.py --confidence=0.95 manifest1.json
python3 inferencer.py --text manifest1.json
python3 inferencer.py manifest1.json
python3 inferencer.py --manifest --text --html --confidence=0.96 manifest1.json manifest2.json
python3 inferencer.py --manifest --text manifest1.json manifest2.json manifest3.json manifest4.json

Note that omitting the confidence option will be interpreted as setting the confidence score to 0.95. Additionally, omitting all export options will be interpreted as setting the export to a manifest file.

Collecting the Results

After inferencer.py is done processing all the images, you will see the message Finished detecting annotations.

All the export files will be saved in the Omniscribe-1.1 folder.

Sample Images

Command Line will typically display this as it processes through all the images.

TensorFlow may automatically use any available GPUs to do the predictions on the images as shown below.

HTML Gallery

Displaying resultsManifest.json through Mirador, an image viewing client that supports IIIF.

Omniscribe v1.0.2

19 May 18:15

Thanks for using our package! This is a command line interface for detecting annotations in IIIF-hosted printed books. Give the script a list of IIIF manifest URLs (either saved locally or hosted elsewhere) and it will generate an IIIF-compliant resultsManifest.json file that can be displayed in IIIF viewers such as Mirador. Other available outputs are an HTML gallery and a plain text file.

Installing Omniscribe

Requires Python 3.6.x

  1. Download the Source Code package, unzip it, and save the Omniscribe-1.0.2 folder to your local machine or server.
  2. Download the model.h5 and save to the Omniscribe-1.0.2 folder.
  3. Using the command line, navigate to the Omniscribe-1.0.2 folder.
  4. Install dependencies by running the command pip install -r requirements.txt.

    NOTE: We recommend installing Omniscribe inside a virtual environment. For more information on setting up a virtual environment, see https://packaging.python.org/guides/installing-using-pip-and-virtualenv/ and follow that guide up to the "Leaving the virtualenv" section.

Usage

Run inferencer.py with the manifest URLs where you wish to detect annotations:

python3 inferencer.py [ export ] [ confidence ] [ manifest-url/path ]

Export options:

  • --manifest - exports resultsManifest.json, an IIIF manifest listing the images with detected annotations.
  • --text - exports resultsURIs.txt, a text file that contains URLs of images with detected annotations.
  • --html - exports resultsImages.html, a simple HTML gallery of images with detected annotations.

The default export format is resultsManifest.json if no export options are specified.

  • --confidence=VALUE - sets the detection threshold; accepts any value between 0 and 1 (inclusive).

E.g. --confidence=0.91 sets the threshold to 0.91. This means that any region that receives a score of 0.91 or higher from our model will be inferred as an annotation.

The default confidence level is 0.95 if no confidence value is specified.

Gauging a "Good" Confidence Value

We found that marginalia are often detected with confidence values of 0.90 and higher, while interlinear annotations require lower thresholds, roughly 0.70-0.85. Setting --confidence=0.90 will therefore detect marginalia but will be less effective at detecting interlinear annotations, since those often score below 0.90. Setting --confidence=0.70 will detect both interlinear annotations and marginalia (both types usually score at or above that threshold); however, the lower threshold will likely produce more false positives.

Operating on Multiple Manifests

The manifests can be hosted or local IIIF manifest files. You can pass multiple manifest URLs or paths, and the application will crawl through all the images in each manifest, combining the results from every manifest into a single export.

  • https://marinus.library.ucla.edu/iiif/annotated/uclaclark_SB322S53-short.json
  • path/to/a/localManifestFile.json

Example Command Lines

python3 inferencer.py --manifest --confidence=0.93 manifest1.json
python3 inferencer.py --html --confidence=0.90 manifest1.json
python3 inferencer.py --text --confidence=0.94 manifest1.json
python3 inferencer.py --manifest --html --confidence=0.92 manifest1.json
python3 inferencer.py --text --manifest --confidence=0.97 manifest1.json
python3 inferencer.py --html --text --confidence=0.93 manifest1.json
python3 inferencer.py --html --manifest --text --confidence=0.91 manifest1.json
python3 inferencer.py --confidence=0.95 manifest1.json
python3 inferencer.py --text manifest1.json
python3 inferencer.py manifest1.json
python3 inferencer.py --manifest --text --html --confidence=0.96 manifest1.json manifest2.json
python3 inferencer.py --manifest --text manifest1.json manifest2.json manifest3.json manifest4.json

Note that omitting the confidence option will be interpreted as setting the confidence score to 0.95. Additionally, omitting all export options will be interpreted as setting the export to a manifest file.

Collecting the Results

After inferencer.py is done processing all the images, you will see the message Finished detecting annotations.

All the export files will be saved in the Omniscribe-1.0.2 folder.

Sample Images

Command Line will typically display this as it processes through all the images.

TensorFlow may automatically use any available GPUs to do the predictions on the images as shown below.

HTML Gallery

Displaying resultsManifest.json through Mirador, an image viewing client that supports IIIF.

Omniscribe v1.0.1

18 Apr 02:18

Thanks for using our package! This is a command line interface for detecting annotations in IIIF-hosted printed books. Give the script a list of IIIF manifest URLs (either saved locally or hosted elsewhere) and it will generate an IIIF-compliant resultsManifest.json file that can be displayed in IIIF viewers such as Mirador. Other available outputs are an HTML gallery and a plain text file.

Installing Omniscribe

Requires Python 3.6.x

  1. Download the Source Code package, unzip it, and save the Omniscribe-1.0.1 folder to your local machine or server.
  2. Download the model.h5 and save to the Omniscribe-1.0.1 folder.
  3. Using the command line, navigate to the Omniscribe-1.0.1 folder.
  4. Install dependencies by running the command pip install -r requirements.txt.

    NOTE: We recommend installing these dependencies inside a virtual environment. For more information on setting up a virtual environment, see https://packaging.python.org/guides/installing-using-pip-and-virtualenv/ and follow that guide up to the "Leaving the virtualenv" section.

Usage

Run inferencer.py with the manifest URLs where you wish to detect annotations:

python3 inferencer.py [ export ] [ confidence ] [ manifest ]

Export options:

  • --manifest - exports resultsManifest.json, an IIIF manifest that contains the images with detected annotations.
  • --text - exports resultsURIs.txt, a text file that contains URIs of images with annotations.
  • --html - exports resultsImages.html, a simple HTML gallery of images with annotations.

The default export format is resultsManifest.json if no export options are specified.

  • --confidence=VALUE - sets the detection threshold; accepts any value between 0 and 1 (inclusive).

E.g. --confidence=0.91 sets the threshold to 0.91. This means that any region that receives a score of 0.91 or higher from our model will be inferred as an annotation.

The default confidence level is 0.95 if no confidence value is specified.

Gauging a "Good" Confidence Value

In our experience, interlinear annotations are picked up with confidence values of 0.70-0.85, while marginalia often score 0.90 or higher. That is, setting --confidence=0.70 will pick up both interlinear annotations and marginalia (both types usually score at or above that threshold); in turn, setting --confidence=0.90 will detect only marginalia, since interlinear annotations often score below 0.90.

NOTE: The lower the confidence threshold, the more false positives the model is likely to produce (recall increases, precision decreases). Similarly, raising the confidence threshold may yield fewer true positives (recall decreases, precision increases).

Operating on Multiple Manifests

The manifests can be hosted or local IIIF manifest files. You can pass multiple manifest URLs or paths, and the application will crawl through all the images in each manifest, combining the results from every manifest into a single export.

  • https://marinus.library.ucla.edu/iiif/annotated/uclaclark_SB322S53-short.json
  • path/to/a/localManifestFile.json

Example Command Lines

python3 inferencer.py --manifest --confidence=0.93 manifest1.json
python3 inferencer.py --html --confidence=0.90 manifest1.json
python3 inferencer.py --text --confidence=0.94 manifest1.json
python3 inferencer.py --manifest --html --confidence=0.92 manifest1.json
python3 inferencer.py --text --manifest --confidence=0.97 manifest1.json
python3 inferencer.py --html --text --confidence=0.93 manifest1.json
python3 inferencer.py --html --manifest --text --confidence=0.91 manifest1.json
python3 inferencer.py --confidence=0.95 manifest1.json
python3 inferencer.py --text manifest1.json
python3 inferencer.py manifest1.json
python3 inferencer.py --manifest --text --html --confidence=0.96 manifest1.json manifest2.json
python3 inferencer.py --manifest --text manifest1.json manifest2.json manifest3.json manifest4.json

Note that omitting the confidence option will be interpreted as setting the confidence score to 0.95. Additionally, omitting all export options will be interpreted as setting the export to a manifest file.

Collecting the Results

After inferencer.py is done processing all the images, you will see the message Finished detecting annotations.

All the export files will be saved in the Omniscribe-1.0.1 folder.

Sample Images

Command Line will typically display this as it processes through all the images.

TensorFlow may automatically use any available GPUs to do the predictions on the images as shown below.

HTML Gallery

Displaying resultsManifest.json through Mirador, an image viewing client that supports IIIF.

Omniscribe v1.0

17 Apr 21:06

Thanks for using our package! This is a command line interface for detecting annotations in IIIF-hosted printed books. Give the script a list of IIIF manifest URLs (either saved locally or hosted elsewhere) and it will generate an IIIF-compliant resultsManifest.json file that can be displayed in IIIF viewers such as Mirador. Other available outputs are an HTML gallery and a plain text file.

Install Omniscribe

Requires Python 3.6.x

  1. Download the Source Code package, unzip it, and save the Omniscribe-1.0 folder to your local machine or server.
  2. Download the model.h5 and save to the Omniscribe-1.0 folder.
  3. Using the command line, navigate to the Omniscribe-1.0 folder.
  4. Install dependencies by running the command pip3 install -r requirements.txt.

Usage

Run inferencer.py with the manifest URLs where you wish to detect annotations:

python3 inferencer.py [ output ] [ confidence ] [ manifest ]

Output options:

  • --manifest (default) - outputs an IIIF manifest
  • --text - outputs a text file
  • --html - outputs an HTML gallery

The default confidence level is 0.95. If you wish to change the confidence level, you can use:

  • --confidence=0.91 - adjust the value (between 0 and 1) as needed

The manifests can be hosted or local IIIF manifest files. You can input multiple manifest URLs or paths.

  • https://marinus.library.ucla.edu/iiif/annotated/uclaclark_SB322S53-short.json
  • path/to/a/localManifestFile.json
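
For example, a representative invocation for this release, using the hosted sample manifest above and the options listed earlier:

python3 inferencer.py --html --confidence=0.91 https://marinus.library.ucla.edu/iiif/annotated/uclaclark_SB322S53-short.json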

After inferencer.py is done processing all the images, you will see the message Finished detecting annotations.

All the export files will be saved in the Omniscribe-1.0 folder.

Sample Images:

Command Line will typically display this as it processes through all the images.

TensorFlow may automatically use any available GPUs to do the predictions on the images as shown below.

Omniscribe v0.4-alpha

16 Apr 20:01
Pre-release

Thanks for using our package! This is a command line interface for detecting annotations in IIIF-hosted printed books. Give the script a list of IIIF manifest URLs (either saved locally or hosted elsewhere) and it will generate an IIIF-compliant resultsManifest.json file that can be displayed in IIIF viewers such as Mirador.

Note: This is a pre-release for testing. We are planning to deploy a production-ready release sometime in April.

How to Use:

Requires Python 3.6.x

  1. Download the Source Code package, unzip it, and save the Omniscribe-v0.4-alpha folder to your local machine or server.
  2. Download the model.h5 and save it to the Omniscribe-v0.4-alpha folder.
  3. Using the command line, navigate to the Omniscribe-v0.4-alpha folder.
  4. Install dependencies by running the command pip3 install -r requirements.txt.
  5. Run inferencer.py with the manifest URLs where you wish to detect annotations.

Example syntax: a web-hosted manifest:

$ python3 inferencer.py https://marinus.library.ucla.edu/iiif/annotated/uclaclark_BF1681A441713.json

Example syntax: local manifest

$ python3 inferencer.py path/to/a/localManifestFile.json

Example syntax: multiple manifests

$ python3 inferencer.py manifest1.json manifest2.json manifest3.json
  6. After inferencer.py is done processing all the images, you will see a message of the form Finished detecting annotations on the manifest(s).

  7. Done! All the export files will be saved in the Omniscribe-v0.4-alpha folder.

Sample Images:

Command Line will typically display this as it processes through all the images.

TensorFlow may automatically use any available GPUs to do the predictions on the images as shown below.

Omniscribe v0.3-alpha

15 Apr 19:14
Pre-release

Thanks for using our package! This is a command line interface for detecting annotations in IIIF-hosted printed books. Give the script a list of IIIF manifest URLs (either saved locally or hosted elsewhere) and it will generate a resultsURIs.txt file listing all the page images that were predicted to have at least one annotation.

Note: This is a pre-release for testing. We are planning to deploy a production-ready release sometime in April.

How to Use:

Requires Python 3.6.x

  1. Download the Source Code package, unzip it, and save the Omniscribe-v0.3-alpha folder to your local machine or server.
  2. Download the model.h5 and save it to the Omniscribe-v0.3-alpha folder.
  3. Using the command line, navigate to the Omniscribe-v0.3-alpha folder.
  4. Install dependencies by running the command pip3 install -r requirements.txt.
  5. Run inferencer.py with the manifest URLs where you wish to detect annotations.

Example syntax: a web-hosted manifest:

$ python3 inferencer.py https://marinus.library.ucla.edu/iiif/annotated/uclaclark_BF1681A441713.json

Example syntax: local manifest

$ python3 inferencer.py path/to/a/localManifestFile.json

Example syntax: multiple manifests

$ python3 inferencer.py manifest1.json manifest2.json manifest3.json
  6. After inferencer.py is done processing all the images, you will see a message of the form FINISHED PROCESSING MANIFESTS. SAVED [EXPORT FILE] TO [CURRENT DIRECTORY].

  7. Done! A resultsURIs.txt file containing a list of all the image URIs that were predicted to have at least one annotation will be saved in the Omniscribe-v0.3-alpha folder.

Sample Images:

Command Line will typically display this as it processes through all the images.

TensorFlow may automatically use any available GPUs to do the predictions on the images as shown below.

Book Annotation Detection

09 Mar 02:17
Pre-release

Thanks for using our package! This is a command line interface for detecting annotations in IIIF-hosted printed books. Give the script a list of IIIF manifest URLs (either saved locally or hosted elsewhere) and it will generate a regionURIs.txt file listing all the page images that were predicted to have at least one annotation.

Note: This is a pre-release for testing. We are still fine-tuning our model for the best performance and accuracy. Check back for a more robust beta version soon.

How to Use:

Requires Python 3.6 or higher

  1. Download the Source Code package, unzip it, and save the book-annotation-classification-v0.1-alpha folder to your local machine or server.
  2. Download the w_smallData.h5 and w_bigData.h5 files and save them to the book-annotation-classification-v0.1-alpha folder.
  3. Using the command line, navigate to the book-annotation-classification-v0.1-alpha folder.
  4. Install dependencies by running the command pip3 install -r requirements.txt.
  5. Run inferencer.py with the manifest URLs where you wish to detect annotations.

Example syntax: a web-hosted manifest:

$ python3 inferencer.py https://marinus.library.ucla.edu/iiif/annotated/uclaclark_BF1681A441713.json

Example syntax: local manifest

$ python3 inferencer.py path/to/a/localManifestFile.json

Example syntax: multiple manifests

$ python3 inferencer.py manifest1.json manifest2.json manifest3.json
  6. After inferencer.py is done processing all the images, you will see the message FINISHED PROCESSING MANIFESTS. SAVED regionURIS.txt TO CURRENT DIRECTORY.
  7. Done! A regionURIs.txt file containing a list of all the image URIs that were predicted to have at least one annotation will be saved in the book-annotation-classification-v0.1-alpha folder.

Sample Images:

Command Line will typically display this as it processes through all the images.

TensorFlow may automatically use any available GPUs to do the predictions on the images as shown below.

Annotation Detector on Command (ADC)

07 Mar 13:24
Pre-release

Thanks for using our package! We hope you find the annotations it detects helpful and interesting.

This is a command line interface for annotation detection! Give the script a list of IIIF manifest files (either saved locally or hosted elsewhere) and it will generate a regionURIs.txt text file that contains a list of all images that were predicted to have at least one annotation.

How to Use:

Step 1. Download the Source Code package, unzip it, and save the book-annotation-classification-v0.1-alpha folder to a place of your choosing.

Step 2. Download the w_smallData.h5 and w_bigData.h5 files and save them in the book-annotation-classification-v0.1-alpha folder.

Step 3. Using a command line, navigate to the book-annotation-classification-v0.1-alpha folder.

Step 4. Install dependencies by running the command pip3 install -r requirements.txt.

Step 5. Run inferencer.py with the manifests that you are interested in detecting annotations from.

Step 6. After inferencer.py is done processing all the manifests, you will see the message "FINISHED PROCESSING MANIFESTS. SAVED regionURIS.txt TO CURRENT DIRECTORY".

Step 7. Done! A regionURIs.txt file containing a list of all the image URIs that were predicted to have at least one annotation will be saved in the book-annotation-classification-v0.1-alpha folder.

Examples of How to Invoke Our Script:

$ python3 inferencer.py https://marinus.library.ucla.edu/iiif/annotated/uclaclark_BF1681A441713.json
$ python3 inferencer.py path/to/a/localManifestFile.json
$ python3 inferencer.py foo.json https://marinus.library.ucla.edu/iiif/annotated/uclaclark_SB322S53.json

Sample Image of How Your Setup Should Look:

Command Line will typically display this as it processes through all the images.

TensorFlow may automatically use any available GPUs to do the predictions on the images as shown below.