document-processor

Automatic document processing and data submission pipeline powered by Google Cloud's DocumentAI for the Billion Oyster Project

The current FormParser version provided by Google Cloud is pretrained-form-parser-v2.0-2022-11-10
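As a rough illustration of where that version string ends up, the sketch below builds the fully qualified Document AI resource name that pins a request to a specific processor version, along with the regional API endpoint. The function names and the placeholder identifiers are hypothetical and are not taken from this repository; only the version string comes from the text above.

```python
# Sketch only: helper names and placeholder IDs are assumptions, not project code.
FORM_PARSER_VERSION = "pretrained-form-parser-v2.0-2022-11-10"


def processor_version_name(project_id: str, location: str, processor_id: str,
                           version: str = FORM_PARSER_VERSION) -> str:
    """Return the resource name that selects one version of a processor."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/processors/{processor_id}/processorVersions/{version}"
    )


def api_endpoint(location: str) -> str:
    """Return the regional Document AI endpoint for a location such as 'us'."""
    return f"{location}-documentai.googleapis.com"


if __name__ == "__main__":
    # Placeholder identifiers, not real project values.
    print(processor_version_name("gcp-project-name", "us", "processor-id"))
    print(api_endpoint("us"))
```

A client library call would typically receive the first string as the request name and the second as the API endpoint option.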

Local Development Setup

git clone https://github.com/BillionOysterProjectCommunity/document-processor.git

cd document-processor

python3.11 -m venv .

source bin/activate

pip install -r requirements.txt

pip install -e ./

Google Cloud

For local Application Default Credentials (ADC) configuration, follow https://cloud.google.com/docs/authentication/provide-credentials-adc#local-dev
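Before starting the app, it can help to confirm that a credentials file is actually in place. The sketch below is a standard-library check, assuming a Linux or macOS machine: it looks at the GOOGLE_APPLICATION_CREDENTIALS variable first and then at the file that `gcloud auth application-default login` writes under the home directory. The helper name `find_adc_file` is hypothetical, and the real ADC lookup performed by Google's libraries has more steps than shown here.

```python
import os
from pathlib import Path


def find_adc_file(env=None, home=None):
    """Return the path of a local ADC credentials file, or None if none is found.

    Simplified check: an explicit GOOGLE_APPLICATION_CREDENTIALS file wins,
    otherwise fall back to the gcloud well-known location (Linux/macOS layout).
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit and Path(explicit).is_file():
        return Path(explicit)

    well_known = home / ".config" / "gcloud" / "application_default_credentials.json"
    return well_known if well_known.is_file() else None


if __name__ == "__main__":
    found = find_adc_file()
    print(found if found else "No ADC file found; follow the guide linked above.")
```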

Local SSL

cd theia/cert

openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout key.pem -out cert.pem
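The command above writes key.pem and cert.pem into theia/cert. As a hedged sketch of how those files might be consumed, the helper below checks that both exist and returns them as a (cert, key) tuple, which is a form Flask's development server accepts for its `ssl_context` argument. The helper name `local_ssl_context` and the usage shown in the comment are assumptions; the repository may wire up TLS differently.

```python
from pathlib import Path


def local_ssl_context(cert_dir="theia/cert"):
    """Return (cert_path, key_path) for the self-signed pair, verifying both exist."""
    cert = Path(cert_dir) / "cert.pem"
    key = Path(cert_dir) / "key.pem"
    missing = [str(p) for p in (cert, key) if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing TLS files: {', '.join(missing)}")
    return str(cert), str(key)


# Hypothetical usage with a Flask application object named `app`:
# app.run(host="127.0.0.1", port=8080, ssl_context=local_ssl_context())
```

Because the certificate is self-signed, a browser will show a warning the first time the local site is opened.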

Project configuration

# theia/config.toml

location = 'us'
project-id = "gcp-project-name"
processor-id = "documentai-processor-id"
# Google Drive
master-datasheet-directory-id = "bop-master-sheet-google-drive-directory-id"
# Web
flask-secret-key = "supersecretkey"
iam-file-name = "<document-processor-service-account>.json"
# OAuth
oauth-client-id = "<google-oauth-client-id>.apps.googleusercontent.com"
oauth-client-secret = "oauth-client-secret"

debug = false # or true

This project uses Cloud Native Buildpacks to build images. To build an image, use the pack CLI tool.

pack build theia --path . --builder gcr.io/buildpacks/builder:v1

# Then run the built image.
# When developing locally, open 127.0.0.1:8080 in the browser.

docker run --rm -p 127.0.0.1:8080:8080 theia
