Processing invoices in Arabic was a tedious manual task—people relied on Google Camera to translate invoices, which was inefficient and time-consuming. I was tasked with automating this process: extracting and translating invoice data, then storing it in JSON format for streamlined workflows.
- Initial Attempt: I started with YOLOv8 Table Extraction, but it struggled with detecting boundaries and edges in complex invoices.
- Improved Model: Switching to Multi-Type-TD-TSR significantly improved detection accuracy and saved time.
- Optimal Choice: Ultimately, I discovered the Google Cloud Vision API, which provided robust OCR capabilities for Arabic text and proved to be the best fit for the job.
With this workflow, I automated the extraction, translation, and structured storage of invoice data, making the process efficient and scalable.
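The overall flow can be sketched as follows. This is a minimal sketch rather than the project's actual code: extract_text and translate_text are hypothetical placeholders for the Vision API and Translate API calls shown later, and the output folder name is assumed.

```python
import json
from pathlib import Path

def extract_text(image_path):
    # Placeholder for the Vision API OCR call; returns the raw Arabic invoice text.
    return "شكرا"

def translate_text(text, target="en"):
    # Placeholder for the Translate API call.
    return "Thank you"

def process_invoice(image_path, out_dir="output"):
    """Run OCR, translate the result, and store both in a JSON file
    named after the input image."""
    raw = extract_text(image_path)
    record = {
        "image": image_path,
        "original_text": raw,
        "translated_text": translate_text(raw),
    }
    out_path = Path(out_dir) / (Path(image_path).stem + ".json")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(record, ensure_ascii=False, indent=2),
                        encoding="utf-8")
    return record
```

Note the ensure_ascii=False: without it, Arabic text is escaped to \uXXXX sequences in the JSON output.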
# Install requirements
pip install -r requirements.txt

Upload your invoice images and the script extracts and returns the text from the invoices.
# 1. Select your GCP project & enable the Vision API
# 2. Open Cloud Shell
mkdir google-cloud-vision-python && touch google-cloud-vision-python/app.py
cd google-cloud-vision-python
cloudshell open-workspace .
export PROJECT_ID=gcp-kubernetes-ml-app

# Create a service account for authentication
gcloud iam service-accounts create google-cloud-vision-quickstart --project gcp-kubernetes-ml-app
# Grant viewer role to the service account
gcloud projects add-iam-policy-binding gcp-kubernetes-ml-app \
--member serviceAccount:google-cloud-vision-quickstart@gcp-kubernetes-ml-app.iam.gserviceaccount.com \
--role roles/viewer
# Create a service account key
gcloud iam service-accounts keys create google-cloud-vision-key.json \
--iam-account google-cloud-vision-quickstart@gcp-kubernetes-ml-app.iam.gserviceaccount.com
# Set the key as your default credentials
export GOOGLE_APPLICATION_CREDENTIALS=google-cloud-vision-key.json

# Download a sample image
wget https://raw.githubusercontent.com/GoogleCloudPlatform/python-docs-samples/main/vision/snippets/quickstart/resources/wakeupcat.jpg
# Open app.py in the Cloud Shell Editor
cloudshell open app.py
# Install the Cloud Vision client library
pip3 install --upgrade google-cloud-vision

Note: The following code is adapted from Google Cloud documentation.
from google.cloud import vision

def detect_text(path):
    """Detects text in the file."""
    client = vision.ImageAnnotatorClient()

    with open(path, "rb") as image_file:
        content = image_file.read()

    image = vision.Image(content=content)
    response = client.text_detection(image=image)
    texts = response.text_annotations

    print("Texts:")
    for text in texts:
        print(f'\n"{text.description}"')
        vertices = [f"({v.x},{v.y})" for v in text.bounding_poly.vertices]
        print("bounds: {}".format(",".join(vertices)))

    if response.error.message:
        raise Exception(
            f"{response.error.message}\nFor more info, see: "
            "https://cloud.google.com/apis/design/errors"
        )
detect_text('SHUKRAN.jpeg')

Run the script:

python3 app.py
# Clean up: Delete your service account key file
rm google-cloud-vision-key.json

{
"YOUR GCP CREDENTIALS"
}

from google.cloud import vision
def detect_text_uri(uri):
    """Detects text in the file located in Google Cloud Storage or on the Web."""
    client = vision.ImageAnnotatorClient()
    image = vision.Image()
    image.source.image_uri = uri

    response = client.text_detection(image=image)
    texts = response.text_annotations

    print("Texts:")
    for text in texts:
        print(f'\n"{text.description}"')
        vertices = [f"({v.x},{v.y})" for v in text.bounding_poly.vertices]
        print("bounds: {}".format(",".join(vertices)))

    if response.error.message:
        raise Exception(
            f"{response.error.message}\nFor more info, see: "
            "https://cloud.google.com/apis/design/errors"
        )
detect_text_uri('URL')

The project is packaged to run directly from the command line with arguments. You can also containerize it using Docker if needed.
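A command-line entry point of this shape can be sketched with argparse, assuming the flag names from the invocation shown later in this post (--input_folder, --detect_full_page, --translate); the boolean parsing is my own assumption about how the True/False string values are handled:

```python
import argparse

def str_to_bool(value):
    """Interpret 'True'/'False' command-line strings as booleans."""
    return value.lower() == "true"

def build_parser():
    parser = argparse.ArgumentParser(
        description="Extract and translate invoice text."
    )
    parser.add_argument("--input_folder", required=True,
                        help="Folder containing invoice images.")
    parser.add_argument("--detect_full_page", type=str_to_bool, default=False,
                        help="Run OCR on the full page, not just detected tables.")
    parser.add_argument("--translate", type=str_to_bool, default=False,
                        help="Translate the extracted text to English.")
    return parser
```

With this parser, the invocation shown at the end of this post maps directly onto the three flags.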
Task: Delivered the required package as requested.

Prerequisites:
- Installed torch and other dependencies.
- Installed detectron2:
  python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

Repository:
- Cloned the repo locally.
- Ran tests and fixed code issues:
  - Fixed device issue (missing DEVICE flag in the .yaml config for CPU support).
  - Fixed cv2_imshow() (a Google Colab function) for local runs.
- Saved cropped images.
- Extracted all data locally.
- Set up the Vision API locally.
- Stored extracted text in .txt files (named after the images).
- Implemented reading files directly from a folder.
- Introduced valid CLI arguments.
- Removed unnecessary print statements.
- Integrated the Vision API into the main function.
- Added arguments for showing cropped tables.
- Created an OCR file with bash commands using the Google Vision API.
- Enabled full-page detection.
- Integrated translation functionality.
Translation Setup:
Installed and enabled the Google Translate API:

pip install google-cloud-translate==2.0.1
pip install --upgrade google-cloud-translate

- Generated JSON files to store extracted data in the required format.
- Used str(texts) to debug encoding issues (resolved).
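For reference, a minimal sketch of calling the v2 Translate client on OCR output. The injectable client parameter is my addition so the function can be exercised without GCP credentials; with the default, it uses the real google-cloud-translate v2 client installed above.

```python
def translate_invoice_text(text, client=None, target="en"):
    """Translate extracted invoice text to the target language.

    Pass a client explicitly for testing; otherwise a real
    Translate API client is created (requires credentials).
    """
    if client is None:
        from google.cloud import translate_v2 as translate
        client = translate.Client()
    # The v2 client returns a dict with a 'translatedText' key.
    result = client.translate(text, target_language=target)
    return result["translatedText"]
```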
python app.py --input_folder 'FILES/FILES' --detect_full_page True --translate True

Everything works! If any bugs arise, they will be fixed as needed.
If you want to see more experiments I conducted, please check out my notebook.



