- Python 3.8 or higher
- Node.js 18 or higher (for frontend)
- Tesseract OCR installed on your system
cd /Users/champakjyotikonwar/My_Projects/BanRakshak/backend
python -m venv venv
source venv/bin/activate # On macOS/Linux
# or
venv\Scripts\activate # On Windows
pip install -r requirements.txt
python -m spacy download en_core_web_sm
macOS:
brew install tesseract
Ubuntu/Debian:
sudo apt update
sudo apt install tesseract-ocr
Windows: download and install from https://github.com/UB-Mannheim/tesseract/wiki
Edit main.py line ~95 to point to your Tesseract installation:
# For macOS/Linux (usually default)
parser = StructuredDocumentParser()
# For Windows (update path as needed)
parser = StructuredDocumentParser(r"C:\Program Files\Tesseract-OCR\tesseract.exe")
cd /Users/champakjyotikonwar/My_Projects/BanRakshak/backend
python main.py
Or alternatively:
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
Open your browser and go to:
- http://localhost:8000 - Basic health check
- http://localhost:8000/docs - FastAPI automatic documentation
- http://localhost:8000/api/health - Detailed health check
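The same health checks can be run from a terminal instead of a browser. The following is a minimal sketch using only the Python standard library; the helper name `check` is made up for illustration, and the script assumes the backend is listening on http://localhost:8000 as described above. It makes no assumption about the response body beyond printing its first 200 characters.

```python
import urllib.error
import urllib.request
from typing import Tuple

BASE_URL = "http://localhost:8000"   # backend address used in this guide
HEALTH_PATHS = ["/", "/api/health"]  # health endpoints listed above


def check(path: str, base_url: str = BASE_URL, timeout: float = 5.0) -> Tuple[bool, str]:
    """Return (ok, detail) for a GET request to base_url + path.

    `check` is a hypothetical helper for this guide, not part of the project.
    """
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200, body[:200]
    except (urllib.error.URLError, OSError) as exc:
        # Covers connection refused, timeouts, and HTTP error statuses.
        return False, f"request failed: {exc}"


if __name__ == "__main__":
    for path in HEALTH_PATHS:
        ok, detail = check(path)
        print(f"{'OK  ' if ok else 'FAIL'} {path}: {detail}")
```

If every line reports FAIL, the backend is most likely not running or is bound to a different port.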
cd /Users/champakjyotikonwar/My_Projects/BanRakshak/frontend
npm install
npm run dev
Open your browser and go to: http://localhost:3000
- POST /api/ocr/upload - Upload document for processing
- GET /api/ocr/status/{task_id} - Get processing status
- GET /api/ocr/result/{task_id} - Get processing results
- GET /api/ocr/tasks - List all tasks
- DELETE /api/ocr/task/{task_id} - Delete a task

- GET / - Basic health check
- GET /api/health - Detailed health check
- GET /api/assets/health - Asset mapping health check
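The status and result endpoints suggest a poll-then-fetch workflow. Below is a rough sketch of that loop using only the standard library. The response schema is not documented here, so the `"status"` field and the terminal values `"completed"` and `"failed"` are assumptions; check http://localhost:8000/docs for the real shape. The names `get_json` and `wait_for_task` are illustrative and not part of the project.

```python
import json
import sys
import time
import urllib.request
from typing import Any, Callable, Dict, Sequence

BASE_URL = "http://localhost:8000"  # backend address used in this guide


def get_json(path: str, base_url: str = BASE_URL) -> Dict[str, Any]:
    """GET base_url + path and decode the JSON response body."""
    with urllib.request.urlopen(base_url.rstrip("/") + path, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_task(
    task_id: str,
    fetch: Callable[[str], Dict[str, Any]] = get_json,
    interval: float = 2.0,
    max_polls: int = 30,
    done_states: Sequence[str] = ("completed", "failed"),  # ASSUMPTION
) -> Dict[str, Any]:
    """Poll GET /api/ocr/status/{task_id} until a terminal state is reported."""
    for _ in range(max_polls):
        status = fetch(f"/api/ocr/status/{task_id}")
        # ASSUMPTION: the payload carries a "status" field; adjust to the real schema.
        if status.get("status") in done_states:
            return status
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")


if __name__ == "__main__":
    task_id = sys.argv[1]  # task id returned by POST /api/ocr/upload
    print(wait_for_task(task_id))
    print(get_json(f"/api/ocr/result/{task_id}"))
```

Passing `fetch` as a parameter keeps the polling logic testable without a running server.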
- Start both backend (port 8000) and frontend (port 3000)
- Go to the OCR Processor page in the frontend
- Upload a document (PDF, PNG, JPG)
- Watch the processing status update in real-time
- View extracted text and entities once processing is complete
- Tesseract not found error
  - Make sure Tesseract is installed and in your PATH
  - Update the tesseract path in the code if needed
- spaCy model not found
  - Run: python -m spacy download en_core_web_sm
- CORS errors in browser
  - Make sure backend is running on port 8000
  - Check that frontend is running on port 3000
- Import errors
  - Make sure all Python dependencies are installed
  - Verify you're in the correct virtual environment
- File upload errors
  - Check that the uploads directory is created and writable
  - Verify file size limits and supported formats
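Several of the checks above can be automated. The sketch below is a hypothetical pre-flight script (the name `run_checks` is made up for illustration) that tests for Tesseract on the PATH, the `en_core_web_sm` package, an active virtual environment, and a writable uploads directory. It assumes it is run from the backend directory, so that `uploads` resolves to the folder the server uses.

```python
import importlib.util
import os
import shutil
import sys
from pathlib import Path
from typing import Dict


def run_checks(uploads_dir: str = "uploads") -> Dict[str, bool]:
    """Return a mapping of check description -> whether it passed."""
    uploads = Path(uploads_dir)
    return {
        # Tesseract not found error
        "tesseract on PATH": shutil.which("tesseract") is not None,
        # spaCy model not found (models are installed as importable packages)
        "en_core_web_sm installed": importlib.util.find_spec("en_core_web_sm") is not None,
        # Import errors: inside a venv, sys.prefix differs from sys.base_prefix
        "virtual environment active": sys.prefix != sys.base_prefix,
        # File upload errors
        "uploads directory writable": uploads.is_dir() and os.access(uploads, os.W_OK),
    }


if __name__ == "__main__":
    for name, passed in run_checks().items():
        print(f"[{'ok' if passed else '!!'}] {name}")
```

A failed Tesseract check here does not rule out a working setup on Windows, where the executable path is usually passed to the parser explicitly rather than found on the PATH.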
You can set the following environment variables:
- NEXT_PUBLIC_API_URL - Backend API URL (default: http://localhost:8000)
- TESSERACT_PATH - Path to Tesseract executable
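One way TESSERACT_PATH could be honored is sketched below: prefer the environment variable, then fall back to whatever `tesseract` is on the PATH. How main.py actually reads this variable is not shown in this guide, so treat `resolve_tesseract_path` as a hypothetical helper rather than the project's implementation.

```python
import os
import shutil
from typing import Mapping, Optional


def resolve_tesseract_path(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a Tesseract executable path, or None if none can be found.

    Order: the TESSERACT_PATH variable first, then a PATH lookup.
    """
    env = os.environ if env is None else env
    configured = env.get("TESSERACT_PATH")
    if configured:
        return configured
    return shutil.which("tesseract")


if __name__ == "__main__":
    path = resolve_tesseract_path()
    print(path or "Tesseract not found; set TESSERACT_PATH or install it")
```

The resolved value could then be passed to `StructuredDocumentParser(...)` in the same way as the Windows example in main.py.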
BanRakshak/
├── backend/
│   ├── main.py # FastAPI server
│   ├── requirements.txt # Python dependencies
│   ├── uploads/ # Uploaded files directory
│   ├── OCR-NER/ # OCR processing modules
│   └── asset-map/ # GIS processing modules
└── frontend/
    ├── src/
    │   └── app/
    │       ├── config/
    │       │   └── api.ts # API configuration
    │       └── pages/
    │           └── OCRProcessor.tsx # Updated with API calls
    └── package.json