
🦙 OCR Vision App

OCR Vision App is a user-friendly tool designed to extract structured text from images using advanced OCR models. This application supports multiple OCR models, including EasyOCR, Llama3.2 Vision (local), and Groq's Llama3.2 Vision (via API). The app provides flexible output formats: plain text, Markdown, or JSON.


Features

  • OCR Model Options:

    • EasyOCR: Lightweight and quick.
    • Llama3.2 Vision: Advanced local model for high-accuracy OCR tasks.
    • Groq's Llama3.2 Vision API: State-of-the-art vision model via API integration.
  • Flexible Output Formats:

    • Plain Text: For simple text extraction.
    • Markdown: For formatted content.
    • JSON: For structured output and metadata.
  • Dynamic Image Uploads:

    • Supports PNG, JPG, and JPEG formats.
    • Compresses and resizes images for efficient processing (a Pillow sketch follows this list).
  • Downloadable Results:

    • Save your extracted text in your chosen format.
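
The compression step above is not spelled out in this README; a minimal sketch of how an uploaded image could be resized and re-encoded with Pillow (the size cap and JPEG quality below are assumed values, not the app's actual settings):

    from io import BytesIO

    from PIL import Image

    def compress_image(uploaded_file, max_side=1024, quality=85):
        # Open the upload, cap its longest side, and re-encode to shrink the payload.
        img = Image.open(uploaded_file).convert("RGB")
        img.thumbnail((max_side, max_side))  # in-place resize, preserves aspect ratio
        buf = BytesIO()
        img.save(buf, format="JPEG", quality=quality)
        buf.seek(0)
        return buf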

Prerequisites

1. Install Dependencies

Install the required Python libraries from requirements.txt:

pip install -r requirements.txt
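
The authoritative dependency list is the requirements.txt shipped with the repository; based purely on the tools named in this README, it likely contains entries along these lines (illustrative only, not the project's actual pins):

    # illustrative guess at the core dependencies — defer to the repository's requirements.txt
    streamlit
    easyocr
    groq
    pillow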

2. Set Up API Key for Groq's Llama Model

To use Groq's Llama3.2 Vision model via API:

  1. Create a .streamlit folder in your project root directory
  2. Inside .streamlit, create a secrets.toml file

Here's a sample secrets.toml structure:

# .streamlit/secrets.toml
GROQ_API_KEY = "your_groq_api_key_here"

A template .secrets_example.toml is provided in the repository to guide you.
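
How the app consumes this key is defined in app.py and not shown here; below is a minimal sketch of reading it from Streamlit secrets and sending an image to a Groq vision model (the model id and function name are assumptions, not the app's actual code):

    import base64

    import streamlit as st
    from groq import Groq

    client = Groq(api_key=st.secrets["GROQ_API_KEY"])

    def groq_ocr(image_bytes, prompt="Extract all text from this image."):
        # Vision-capable chat models accept images as base64 data URLs in the message content.
        b64 = base64.b64encode(image_bytes).decode()
        response = client.chat.completions.create(
            model="llama-3.2-11b-vision-preview",  # assumed model id; check Groq's current model list
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                ],
            }],
        )
        return response.choices[0].message.content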

3. Download Llama Model Locally

To run Llama3.2 Vision locally:

  1. Visit the Llama model hub to download the required model
  2. Place the downloaded model in a directory accessible to the app, e.g., models/llama3.2
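
The README does not show the loading code; one common way to run a locally downloaded Llama3.2 Vision checkpoint is through Hugging Face transformers, sketched below (the directory, dtype, and prompt are illustrative assumptions):

    import torch
    from PIL import Image
    from transformers import AutoProcessor, MllamaForConditionalGeneration

    MODEL_DIR = "models/llama3.2"  # wherever the downloaded weights were placed

    model = MllamaForConditionalGeneration.from_pretrained(
        MODEL_DIR, torch_dtype=torch.bfloat16, device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(MODEL_DIR)

    def local_llama_ocr(image_path, prompt="Extract all text from this image."):
        image = Image.open(image_path)
        messages = [{"role": "user",
                     "content": [{"type": "image"}, {"type": "text", "text": prompt}]}]
        chat_text = processor.apply_chat_template(messages, add_generation_prompt=True)
        inputs = processor(image, chat_text, return_tensors="pt").to(model.device)
        output = model.generate(**inputs, max_new_tokens=512)
        return processor.decode(output[0], skip_special_tokens=True)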

Getting Started

  1. Clone the Repository

    git clone https://github.com/obinopaul/OCR_vision-app.git
    cd OCR_vision-app
  2. Set Up Environment

    • Install dependencies using requirements.txt
    • Add your API keys to .streamlit/secrets.toml
  3. Run the Application

    streamlit run app.py

Usage

  1. Upload an Image

    • Choose an image (PNG, JPG, or JPEG)
    • The app resizes the image for optimal processing
  2. Select an OCR Model

    • EasyOCR for lightweight tasks
    • Llama3.2 Vision (local) for robust accuracy
    • Groq's Llama3.2 Vision API for cutting-edge OCR via API
  3. Choose an Output Format

    • Plain text, Markdown, or JSON
  4. Extract Text

    • Click the "Extract Text 🔍" button to perform OCR (an EasyOCR sketch follows this list)
  5. Download Results

    • Save the extracted text in your preferred format
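
The extraction itself happens inside app.py; for the EasyOCR option, a rough sketch of what that step boils down to (English-only reader assumed):

    import easyocr

    reader = easyocr.Reader(["en"])  # downloads/loads detection and recognition models on first use
    lines = reader.readtext("uploaded_image.jpg", detail=0)  # detail=0 returns plain strings
    extracted_text = "\n".join(lines)
    print(extracted_text)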

Output Formats

  • Plain Text

    • Simple text format
    • Filename: extracted_text.txt
  • Markdown

    • Formatted content for documentation purposes
    • Filename: extracted_text.md
  • JSON

    • Structured output with extracted text and metadata (see the sketch after this list)
    • Filename: extracted_text.json
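
The exact JSON schema and download wiring live in app.py; here is a sketch of assembling the JSON variant and exposing it with Streamlit's download button (the field names are assumptions):

    import json
    from datetime import datetime, timezone

    import streamlit as st

    extracted_text = "..."  # result of the OCR step

    payload = {
        "text": extracted_text,
        "model": "EasyOCR",  # whichever model was selected in the UI
        "extracted_at": datetime.now(timezone.utc).isoformat(),
    }
    st.download_button(
        label="Download JSON",
        data=json.dumps(payload, indent=2),
        file_name="extracted_text.json",
        mime="application/json",
    )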
