Processing invoices in Arabic was a tedious manual task—people relied on Google Camera to translate invoices, which was inefficient and time-consuming. I was tasked with automating this process: extracting and translating invoice data, then storing it in JSON format for streamlined workflows.
- Initial Attempt: I started with YOLOv8 Table Extraction, but it struggled with detecting boundaries and edges in complex invoices.
- Improved Model: Switching to Multi-Type-TD-TSR significantly improved detection accuracy and saved time.
- Optimal Choice: Ultimately, I discovered the Google Cloud Vision API, which provided robust OCR capabilities for Arabic text and proved to be the best fit for the job.
With this workflow, I automated the extraction, translation, and structured storage of invoice data, making the process efficient and scalable.
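The overall flow can be sketched as follows. This is a minimal sketch rather than the project's actual code: extract_text and translate_text are hypothetical placeholders for the Vision API and Translate API calls shown later, and the output folder name is assumed.

```python
import json
from pathlib import Path

def extract_text(image_path):
    # Placeholder for the Vision API OCR call; returns the raw Arabic invoice text.
    return "شكرا"

def translate_text(text, target="en"):
    # Placeholder for the Translate API call.
    return "Thank you"

def process_invoice(image_path, out_dir="output"):
    """Run OCR, translate the result, and store both in a JSON file
    named after the input image."""
    raw = extract_text(image_path)
    record = {
        "image": image_path,
        "original_text": raw,
        "translated_text": translate_text(raw),
    }
    out_path = Path(out_dir) / (Path(image_path).stem + ".json")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(record, ensure_ascii=False, indent=2),
                        encoding="utf-8")
    return record
```

Note the ensure_ascii=False: without it, Arabic text is escaped to \uXXXX sequences in the JSON output.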
# Install requirements
pip install -r requirements.txt

Upload your invoice images and the script extracts and returns the text from the invoices.
# 1. Select your GCP project & enable the Vision API
# 2. Open Cloud Shell
mkdir google-cloud-vision-python && touch google-cloud-vision-python/app.py
cd google-cloud-vision-python
cloudshell open-workspace .
export PROJECT_ID=gcp-kubernetes-ml-app

# Create a service account for authentication
gcloud iam service-accounts create google-cloud-vision-quickstart --project gcp-kubernetes-ml-app
# Grant viewer role to the service account
gcloud projects add-iam-policy-binding gcp-kubernetes-ml-app \
--member serviceAccount:google-cloud-vision-quickstart@gcp-kubernetes-ml-app.iam.gserviceaccount.com \
--role roles/viewer
# Create a service account key
gcloud iam service-accounts keys create google-cloud-vision-key.json \
--iam-account google-cloud-vision-quickstart@gcp-kubernetes-ml-app.iam.gserviceaccount.com
# Set the key as your default credentials
export GOOGLE_APPLICATION_CREDENTIALS=google-cloud-vision-key.json

# Download a sample image
wget https://raw.githubusercontent.com/GoogleCloudPlatform/python-docs-samples/main/vision/snippets/quickstart/resources/wakeupcat.jpg
# Open app.py in the Cloud Shell Editor
cloudshell open app.py
# Install the Cloud Vision client library
pip3 install --upgrade google-cloud-vision

Note: The following code is adapted from Google Cloud documentation.
from google.cloud import vision

def detect_text(path):
    """Detects text in the file."""
    client = vision.ImageAnnotatorClient()

    with open(path, "rb") as image_file:
        content = image_file.read()

    image = vision.Image(content=content)
    response = client.text_detection(image=image)
    texts = response.text_annotations

    print("Texts:")
    for text in texts:
        print(f'\n"{text.description}"')
        vertices = [f"({v.x},{v.y})" for v in text.bounding_poly.vertices]
        print("bounds: {}".format(",".join(vertices)))

    if response.error.message:
        raise Exception(
            f"{response.error.message}\nFor more info, see: "
            "https://cloud.google.com/apis/design/errors"
        )
detect_text('SHUKRAN.jpeg')

Run the script:

python3 app.py
# Clean up: Delete your service account key file
rm google-cloud-vision-key.json

{
"YOUR GCP CREDENTIALS"
}

from google.cloud import vision
def detect_text_uri(uri):
    """Detects text in the file located in Google Cloud Storage or on the Web."""
    client = vision.ImageAnnotatorClient()
    image = vision.Image()
    image.source.image_uri = uri

    response = client.text_detection(image=image)
    texts = response.text_annotations

    print("Texts:")
    for text in texts:
        print(f'\n"{text.description}"')
        vertices = [f"({v.x},{v.y})" for v in text.bounding_poly.vertices]
        print("bounds: {}".format(",".join(vertices)))

    if response.error.message:
        raise Exception(
            f"{response.error.message}\nFor more info, see: "
            "https://cloud.google.com/apis/design/errors"
        )
detect_text_uri('URL')

The project is packaged to run directly from the command line with arguments. You can also containerize it using Docker if needed.
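A command-line entry point of this shape can be sketched with argparse, assuming the flag names from the invocation shown later in this post (--input_folder, --detect_full_page, --translate); the boolean parsing is my own assumption about how the True/False string values are handled:

```python
import argparse

def str_to_bool(value):
    """Interpret 'True'/'False' command-line strings as booleans."""
    return value.lower() == "true"

def build_parser():
    parser = argparse.ArgumentParser(
        description="Extract and translate invoice text."
    )
    parser.add_argument("--input_folder", required=True,
                        help="Folder containing invoice images.")
    parser.add_argument("--detect_full_page", type=str_to_bool, default=False,
                        help="Run OCR on the full page, not just detected tables.")
    parser.add_argument("--translate", type=str_to_bool, default=False,
                        help="Translate the extracted text to English.")
    return parser
```

With this parser, the invocation shown at the end of this post maps directly onto the three flags.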
Task: Delivered the required package as requested.

Prerequisites:
- Installed torch and other dependencies.
- Installed detectron2:
  python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

Repository:
- Cloned the repo locally.
- Ran tests and fixed code issues:
  - Fixed device issue (missing DEVICE flag in the .yaml config for CPU support).
  - Fixed cv2_imshow() (a Google Colab function) for local runs.
- Saved cropped images.
- Extracted all data locally.
- Set up the Vision API locally.
- Stored extracted text in .txt files (named after the images).
- Implemented reading files directly from a folder.
- Introduced valid CLI arguments.
- Removed unnecessary print statements.
- Integrated the Vision API into the main function.
- Added arguments for showing cropped tables.
- Created an OCR file with bash commands using the Google Vision API.
- Enabled full-page detection.
- Integrated translation functionality.
Translation Setup:
Installed and enabled the Google Translate API:

pip install google-cloud-translate==2.0.1
pip install --upgrade google-cloud-translate

- Generated JSON files to store extracted data in the required format.
- Used str(texts) to debug encoding issues (resolved).
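For reference, a minimal sketch of calling the v2 Translate client on OCR output. The injectable client parameter is my addition so the function can be exercised without GCP credentials; with the default, it uses the real google-cloud-translate v2 client installed above.

```python
def translate_invoice_text(text, client=None, target="en"):
    """Translate extracted invoice text to the target language.

    Pass a client explicitly for testing; otherwise a real
    Translate API client is created (requires credentials).
    """
    if client is None:
        from google.cloud import translate_v2 as translate
        client = translate.Client()
    # The v2 client returns a dict with a 'translatedText' key.
    result = client.translate(text, target_language=target)
    return result["translatedText"]
```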
python app.py --input_folder 'FILES/FILES' --detect_full_page True --translate True

Everything works! If any bugs arise, they will be fixed as needed.
If you want to see more experiments I conducted, please check out my notebook.



