Streamlit app for extracting structured data from invoices. It preprocesses images, then uses Google Generative AI to identify and organize key details—like merchant info, item list, and totals—into JSON. Includes a Q&A feature for conversationally querying extracted data.

ShivenPatel19/Invoice-Extractor

Invoice Extractor

A Streamlit-based application that extracts structured information from invoice images or PDFs. It preprocesses uploaded documents by straightening the images and then leverages Google’s Generative AI API to identify and format key details, such as merchant information, item details, taxes, and total amounts, into an organized JSON structure. The app also provides a Q&A feature for querying extracted information conversationally.

Features

  • Straightening & Preprocessing: Automatically corrects image orientation.
  • Invoice Data Extraction: Extracts key details such as merchant information, item list, tax details, and total amounts.
  • Q&A Interface: Enables users to ask specific questions based on the extracted data, with responses generated conversationally.

Demo

For a quick overview of how the application and the API work, check out the video demos.

Installation

Step 1: Clone the Repository

git clone <repository-url>
cd <repository-directory>

Step 2: Set up a Conda Environment

Ensure you have Conda installed. Then, create and activate a new environment:

conda create -n invoice_extractor_env python=3.10
conda activate invoice_extractor_env

Step 3: Install Requirements

Install all dependencies listed in requirements.txt:

pip install -r requirements.txt

Step 4: Set Up API Keys

This project uses Google’s Generative AI API. To use it:

  1. Create an API key with Google and store it in a .env file in the project directory.
  2. Add the following line in the .env file:
    GOOGLE_API_KEY=<Your-API-Key>
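
As a rough illustration of what the app needs from this file, the sketch below parses `KEY=VALUE` lines using only the standard library and checks that `GOOGLE_API_KEY` is set to something other than the placeholder. The helper names `load_env_text` and `require_api_key` are hypothetical and are not taken from the project, which may load the file differently (for example with a dotenv package).

```python
def load_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_key(values: dict[str, str]) -> str:
    """Return the key, or raise if it is missing or still the placeholder."""
    key = values.get("GOOGLE_API_KEY", "")
    if not key or key.startswith("<"):
        raise RuntimeError("Set GOOGLE_API_KEY in the .env file")
    return key


# Example with an in-memory .env body (hypothetical key value).
env = load_env_text("# local settings\nGOOGLE_API_KEY=abc123\n")
api_key = require_api_key(env)
```

Checking for the `<Your-API-Key>` placeholder catches the common mistake of copying the template line without replacing the value.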

Usage

Step 1: Run the Application

Start the Streamlit app by running:

streamlit run invoice_app.py

or

streamlit run full_invoice_app.py

Step 2: Upload and Process an Invoice

  1. Open the app in your browser (Streamlit will display the link in your terminal).
  2. Upload an invoice image or PDF.
  3. The app will straighten the image, extract details, and display the formatted data.
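
The extraction step ends with structured JSON. A minimal sketch of turning a model's text reply into a Python dictionary is shown below. It assumes the reply may be wrapped in a Markdown code fence, which generative models often add. The function name `parse_model_json` and the sample fields are hypothetical, and the project's own parsing may differ.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker, built here to keep this example readable


def parse_model_json(reply: str) -> dict:
    """Strip an optional Markdown fence from the reply and parse the JSON inside."""
    text = reply.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (with or without a language tag).
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # Drop the closing fence if one is present.
        text = text.rsplit(FENCE, 1)[0]
    return json.loads(text)


# Example reply with hypothetical field names.
reply = FENCE + 'json\n{"total": 12.5, "tax": null}\n' + FENCE
invoice = parse_model_json(reply)
```

JSON `null` becomes Python `None` after parsing, so downstream code can test for missing values with `is None`.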

Step 3: Query Extracted Data

After extraction, enter specific queries in the Q&A interface to get responses based on the extracted information.
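
One way such a Q&A step could be grounded in the extracted data is to place the JSON in the prompt alongside the user's question. The sketch below only builds the prompt string. The wording, the function name `build_qa_prompt`, and the sample fields are assumptions for illustration, and the call to the generative model is omitted.

```python
import json


def build_qa_prompt(invoice: dict, question: str) -> str:
    """Combine extracted invoice data and a user question into one prompt."""
    context = json.dumps(invoice, indent=2)
    return (
        "Answer the question using only the invoice data below. "
        "If the answer is not in the data, say so.\n\n"
        f"Invoice data:\n{context}\n\n"
        f"Question: {question.strip()}"
    )


# Example with hypothetical extracted fields.
prompt = build_qa_prompt({"merchant": "Acme", "total": 42.0}, " What is the total? ")
```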

File Structure

  • invoice_app.py: Lightweight version of the application. Processes invoices, extracts only the required fields, and handles Q&A interactions.
  • full_invoice_app.py: Comprehensive version of the application. Processes invoices and extracts all available details.
  • requirements.txt: List of dependencies for the project.

Notes

  • The Q&A function only responds to questions relevant to the extracted invoice data.
  • Missing values in the invoice are marked as "null".
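
To see which fields came back empty, the extracted structure can be walked recursively. The sketch below treats both JSON `null` and the literal string "null" as missing, since the note above does not say which form the app emits. The function name `find_missing` and the sample fields are hypothetical.

```python
import json


def find_missing(data, prefix: str = "") -> list[str]:
    """Return dotted paths of every field whose value is null."""
    missing: list[str] = []
    if isinstance(data, dict):
        for key, value in data.items():
            path = f"{prefix}.{key}" if prefix else key
            missing += find_missing(value, path)
    elif isinstance(data, list):
        for index, value in enumerate(data):
            missing += find_missing(value, f"{prefix}[{index}]")
    elif data is None or data == "null":
        # Covers JSON null (None in Python) and the string "null".
        missing.append(prefix)
    return missing


# Example with hypothetical extracted fields.
sample = json.loads(
    '{"merchant": {"name": "Acme", "phone": null},'
    ' "items": [{"name": "Pen", "price": "null"}], "total": 3.5}'
)
missing_fields = find_missing(sample)
```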
