AI-Powered K-12 Worksheet Question Extraction Platform for Singapore Education
Extract, structure, and tag educational content from worksheets across all subjects (Math, Science, Languages, Humanities) with AI-powered OCR, segmentation, and curriculum-aligned tagging.
CurriculumExtractor automates the extraction of questions from K-12 worksheets, transforming hours of manual data entry into minutes. It combines:
- Multi-subject document processing - Math, Science, Languages, Humanities
- Intelligent AI pipeline - OCR → Segmentation → Curriculum Tagging
- Human-in-the-loop review - Side-by-side PDF viewer with question editor
- Singapore curriculum alignment - Auto-tagging with MOE syllabus taxonomies
- LaTeX rendering - Fast mathematical expression display with KaTeX
- Question bank persistence - Structured storage with version control
Target Users: Content Operations Reviewers, Admins, and Integrators in EdTech
- ✅ FastAPI Backend - Python 3.10 with async support
- ✅ React Frontend - React 19 with TypeScript 5.2
- ✅ Supabase PostgreSQL - Managed database (Session Mode, ap-south-1)
- ✅ Celery + Redis - Async task queue (4 worker processes)
- ✅ User Authentication - JWT with 8-day expiry
- ✅ Task API - Queue, monitor, and retrieve async task results
- ✅ Docker Compose - Full-stack orchestration with hot-reload
- ✅ Infrastructure complete (Supabase + Celery + Redis)
- ✅ User management and authentication
- ✅ Task queue for async processing
- ⏳ Extraction models (PDF → Question)
- ⏳ OCR and question segmentation (PaddleOCR + docTR)
- ⏳ Review UI with PDF annotation (react-pdf)
- ⏳ LaTeX math rendering (KaTeX)
- ⏳ Curriculum tagging
- ⏳ Question bank export
- Multi-subject expansion (Science, English, Humanities)
- Subject-specific ML adapters (DeBERTa-v3 fine-tuned)
- Advanced question types (essays, practicals)
- Semantic search and difficulty classification
See Product Requirements for complete feature list.
- Docker & Docker Compose
- Node.js v20+ (via nvm/fnm)
- Python 3.10+ with uv
- Supabase account (free tier)
For detailed setup instructions, see:
- Setup Guide - Complete installation guide
- Supabase Setup Guide - Database configuration
Quick Start:
- Clone repository
  git clone <repository-url>
  cd CurriculumExtractor
- Configure Supabase
  - Project ID: wijzypbstiigssjuiuvh (ap-south-1 region)
  - Connection: Session Mode (port 5432)
  - Update .env with your credentials
- Start development
  docker compose watch
- Access application
  - Frontend: http://localhost:5173
  - Backend: http://localhost:8000
  - API Docs: http://localhost:8000/docs
- Login
  - Email: [email protected]
  - Password: From FIRST_SUPERUSER_PASSWORD in .env
All services (7) will start automatically:
- Backend (FastAPI), Frontend (React), Database (Supabase)
- Redis, Celery Worker, Proxy (Traefik), MailCatcher
Backend (Python 3.10):
- FastAPI 0.115+ - Async web framework with OpenAPI docs
- SQLModel 0.0.24 - ORM combining Pydantic + SQLAlchemy
- PostgreSQL 17 via Supabase - Managed database (Session Mode)
- Celery 5.5 + Redis 7 - Distributed task queue (4 workers)
- psycopg3 - PostgreSQL driver with prepared statement support
- Alembic - Database migrations
- pyjwt - JWT authentication
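The 8-day token lifetime mentioned earlier can be illustrated with a small, standard-library-only sketch of the claims a token might carry. The function name `build_access_claims`, the constant name, and the choice of `sub`/`iat`/`exp` claims are assumptions for illustration, not the project's actual code.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: the 8-day lifetime stated in this README, kept as a constant.
ACCESS_TOKEN_EXPIRE_DAYS = 8


def build_access_claims(subject: str, now: Optional[datetime] = None) -> dict:
    """Return JWT claims for `subject` that expire 8 days after `now`."""
    issued_at = now or datetime.now(timezone.utc)
    expires_at = issued_at + timedelta(days=ACCESS_TOKEN_EXPIRE_DAYS)
    # RFC 7519 defines `iat` and `exp` as seconds since the Unix epoch.
    return {
        "sub": subject,
        "iat": int(issued_at.timestamp()),
        "exp": int(expires_at.timestamp()),
    }
```

With pyjwt installed, such a dictionary could then be signed with `jwt.encode(claims, secret, algorithm="HS256")`; the algorithm and the secret handling are assumptions here, since this README does not state them.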
Frontend (TypeScript 5.2):
- React 19 - UI framework
- Vite 7 - Build tool with HMR
- TanStack Router - File-based routing
- TanStack Query - Server state management
- Chakra UI 3 - Component library
- react-pdf 9.x (planned) - PDF viewing
- KaTeX (planned) - LaTeX math rendering
ML Pipeline (Phase 2):
- PaddleOCR - Text extraction with bounding boxes
- docTR - Document layout analysis
- DeBERTa-v3 - Curriculum tagging (fine-tuned for Singapore syllabus)
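To make the OCR → Segmentation step concrete, here is a rough sketch of how OCR lines with bounding boxes might be grouped into question drafts. Everything in it is hypothetical: the `OcrLine` and `QuestionDraft` types, the numbering regex, and the grouping rule are illustrative assumptions, and the planned pipeline (PaddleOCR + docTR) may work quite differently.

```python
import re
from dataclasses import dataclass, field


@dataclass
class OcrLine:
    text: str
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) on the page


@dataclass
class QuestionDraft:
    number: int
    lines: list[OcrLine] = field(default_factory=list)

    @property
    def text(self) -> str:
        return " ".join(line.text for line in self.lines)


# Assumption: questions start with a number followed by "." or ")" and a space.
QUESTION_START = re.compile(r"^\s*(\d+)[.)]\s+")


def segment_questions(lines: list[OcrLine]) -> list[QuestionDraft]:
    """Group OCR lines into drafts, starting a new draft at each numbered line."""
    questions: list[QuestionDraft] = []
    current = None
    for line in lines:
        match = QUESTION_START.match(line.text)
        if match:
            current = QuestionDraft(number=int(match.group(1)))
            questions.append(current)
        # Lines before the first numbered question (headers, name fields) are dropped.
        if current is not None:
            current.lines.append(line)
    return questions
```

A rule this naive would be replaced by layout analysis in practice; it only shows the shape of the data that a review UI could later display next to the PDF.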
Infrastructure:
- Docker Compose - Development orchestration (7 services)
- Supabase - Managed PostgreSQL 17 + S3-compatible Storage
  - Project: wijzypbstiigssjuiuvh
  - Region: ap-south-1 (Mumbai, India)
  - Mode: Session pooler (10 base + 20 overflow connections)
- Redis 7 - Message broker for Celery
- GitHub Actions - CI/CD with 7 workflows
- Traefik - Reverse proxy (production)
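The "10 base + 20 overflow" pool sizing above maps naturally onto SQLAlchemy's `pool_size` and `max_overflow` engine options. The sketch below is a standard-library-only illustration; the `PoolSettings` class is a hypothetical helper, not part of the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSettings:
    pool_size: int = 10     # the "10 base" connections kept open
    max_overflow: int = 20  # the "20 overflow" connections opened under load

    @property
    def max_connections(self) -> int:
        """Upper bound on simultaneous connections from one engine."""
        return self.pool_size + self.max_overflow

    def engine_kwargs(self) -> dict:
        """Keyword arguments intended for sqlalchemy.create_engine()."""
        return {"pool_size": self.pool_size, "max_overflow": self.max_overflow}
```

Assuming SQLAlchemy is installed, the settings could be applied as `create_engine(database_url, **PoolSettings().engine_kwargs())`, giving at most 30 connections per engine. Each process that creates its own engine has its own pool, so the total across the backend and the Celery workers can exceed 30.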
CurriculumExtractor/
├── backend/                  # FastAPI application
│   ├── app/
│   │   ├── api/              # API routes
│   │   ├── core/             # Config, security, DB
│   │   ├── models.py         # SQLModel schemas
│   │   ├── crud.py           # Database operations
│   │   ├── worker.py         # Celery configuration
│   │   └── tasks/            # Async extraction tasks
│   ├── tests/                # Pytest tests
│   └── scripts/              # Utility scripts
├── frontend/                 # React application
│   ├── src/
│   │   ├── routes/           # TanStack Router pages
│   │   ├── components/       # React components
│   │   ├── client/           # Auto-generated OpenAPI client
│   │   └── hooks/            # Custom React hooks
│   └── tests/                # Playwright E2E tests
├── docs/                     # Documentation
│   ├── getting-started/      # Setup and development guides
│   ├── prd/                  # Product requirements
│   ├── architecture/         # System design
│   └── api/                  # API documentation
├── scripts/                  # Project scripts
└── docker-compose.yml        # Service orchestration
- Setup Guide - Installation instructions
- Supabase Setup - Database configuration
- Development Workflow - Daily development
- Environment Status - Current setup status
- Product Overview - Complete PRD
- Architecture Overview - System design
- Data Models - Database schema
- API Documentation - REST API reference
- CLAUDE.md - Quick reference for AI-assisted development
- SETUP_PLAN.md - Template cleanup and implementation phases
- SETUP_STATUS.md - Detailed environment status
# Backend tests
cd backend
bash scripts/test.sh

# Frontend E2E tests
cd frontend
npx playwright test

# Pre-commit hooks (recommended)
uv run pre-commit install
uv run pre-commit run --all-files

# Manual checks
cd backend && uv run ruff check . && uv run mypy .
cd frontend && npm run lint

- Start services
  docker compose watch # Hot-reload enabled
- Make changes - Edit code, changes auto-reload
- Run tests
  bash backend/scripts/test.sh
  cd frontend && npx playwright test
- Database migrations (when models change)
  docker compose exec backend bash
  alembic revision --autogenerate -m "Description"
  alembic upgrade head
- Update frontend client (when API changes)
  ./scripts/generate-client.sh
See Development Guide for more.
Updated: October 23, 2025
Phase: MVP Development (Primary Mathematics)
Environment: ✅ Fully Operational
- FastAPI Backend - Python 3.10, async, JWT auth
- React Frontend - React 19, TypeScript, TanStack Router/Query
- Supabase PostgreSQL - Session Mode, 10+20 connection pool
- Celery Worker - 4 processes, tested with health_check + test_task
- Redis - Message broker, result backend
- Docker Compose - 7 services orchestrated with hot-reload
- GitHub Actions - 7 CI/CD workflows (lint, test, generate client)
- Documentation - CLAUDE.md, development.md, API docs, architecture
- Supabase project created (wijzypbstiigssjuiuvh, ap-south-1)
- Database connected (PostgreSQL 17.6.1)
- Migrations working (Alembic + Supabase MCP)
- Admin user created ([email protected])
- Celery tasks tested (health_check: 0.005s, test_task: 10s)
- Task API endpoints (/api/v1/tasks/)
- Template cleanup (Item model removed)
- All services healthy
Next Milestones:
- Create core models (Extraction, Question, Ingestion, Tag)
- Set up Supabase Storage buckets (worksheets, extractions)
- Add document processing libraries (PaddleOCR, docTR, pypdf)
- Install PDF viewing libraries (react-pdf, KaTeX)
- Build review UI components
- Implement extraction Celery task
- Create question bank API
Current Focus: Creating Extraction/Question data models ← YOU ARE HERE
✅ Environment Setup - 100% (All services operational)
✅ Infrastructure - 100% (Supabase + Celery working)
✅ Documentation - 100% (2,405+ lines updated)
✅ CI/CD - 100% (7 workflows configured)
⏳ Core Models - 0% (Next step)
⏳ Document Processing - 0% (Libraries ready to add)
⏳ Review UI - 0% (After models)
⏳ ML Integration - 0% (Phase 2)
Track detailed progress: See CLAUDE.md
- Primary Math extraction pipeline
- Review UI with PDF viewer
- Curriculum tagging
- Question bank persistence
- Primary Science + English support
- Subject-specific ML adapters
- Expanded taxonomy management
- Secondary Math/Science/Humanities
- Advanced question types
- QTI export for LMS
- Semantic search
- Difficulty classification
- Question generation
- Duplicate detection
See Product Roadmap for details.
- Follow Setup Guide
- Create feature branch: git checkout -b feature/my-feature
- Make changes with tests
- Run uv run pre-commit run --all-files
- Submit pull request
- Backend: Ruff + mypy (enforced by pre-commit)
- Frontend: Biome linting (enforced by pre-commit)
- Tests: ≥80% coverage target
- Commits: Conventional commits format
[License information to be added]
Setup Problems:
- See Setup Guide
- See Development Workflow
Supabase Issues:
- See Supabase Setup Guide
- Use MCP: mcp_supabase_get_project(id="wijzypbstiigssjuiuvh")
- Check logs: mcp_supabase_get_logs(project_id="wijzypbstiigssjuiuvh", service="postgres")
Celery Issues:
- Check worker: docker compose logs celery-worker -f
- Test Redis: docker compose exec redis redis-cli -a <password> PING
- Inspect tasks: docker compose exec celery-worker celery -A app.worker inspect registered
Docker Issues:
- View logs: docker compose logs -f
- Restart service: docker compose restart backend
- Rebuild: docker compose build backend && docker compose up -d
- Documentation: docs/ - Complete guides
- API Docs: http://localhost:8000/docs - Interactive API explorer
- Supabase Dashboard: https://app.supabase.com/project/wijzypbstiigssjuiuvh
- Development Guide: CLAUDE.md - AI-assisted development
- Architecture: docs/architecture/overview.md
Mission: Transform manual question entry from hours to minutes while maintaining curriculum alignment accuracy.
Success Metrics:
- 5x productivity improvement (10 → 50+ questions/hour)
- ≥85% extraction accuracy
- ≥90% curriculum tagging accuracy (Top-3)
- 1,000 worksheets/month capacity (Year 1)
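The Top-3 tagging metric above counts a question as correct when the reviewer-approved tag appears among the model's three highest-ranked suggestions. A minimal sketch of that calculation follows; the function name and the tag strings in the usage note are hypothetical, and this README does not specify how the project will actually measure it.

```python
def top_k_accuracy(predictions: list[list[str]], gold: list[str], k: int = 3) -> float:
    """Fraction of items whose gold tag is among the first k ranked predictions."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(1 for ranked, answer in zip(predictions, gold) if answer in ranked[:k])
    return hits / len(gold)
```

For example, if four questions have gold tags `decimals`, `angles`, `ratio`, and `money`, and only `angles` falls outside its question's top three suggestions, the function returns 0.75, which would miss a ≥90% target.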
Impact: Enable EdTech platforms to scale content operations efficiently across all K-12 subjects in Singapore.
Status: ✅ All Systems Operational
Services Running:
✅ Backend (FastAPI) - http://localhost:8000 (healthy)
✅ Frontend (React) - http://localhost:5173
✅ Database (Supabase) - PostgreSQL 17.6.1 (Session Mode)
✅ Redis - localhost:6379 (healthy)
✅ Celery Worker - 4 processes (ready)
✅ Proxy (Traefik) - localhost:80
✅ MailCatcher - localhost:1080
Configuration:
✅ Supabase Project - wijzypbstiigssjuiuvh (ap-south-1)
✅ Connection Pooling - 10 base + 20 overflow = 30 max
✅ Task Queue - Celery 5.5 with Redis broker
✅ Authentication - JWT with bcrypt password hashing
✅ CI/CD - 7 GitHub Actions workflows
✅ Documentation - 2,405+ lines (CLAUDE.md, docs/)
Ready for feature development! Start building extraction models.
Built with FastAPI + React + Supabase + Celery
Powered by AI for Singapore Education
| Resource | Link | Purpose |
|---|---|---|
| CLAUDE.md | CLAUDE.md | AI development guide (Supabase MCP, patterns, quick ref) |
| Setup Guide | docs/getting-started/setup.md | Installation & Supabase setup |
| Development Workflow | docs/getting-started/development.md | Daily development guide |
| Product PRD | docs/prd/overview.md | Complete product requirements |
| Architecture | docs/architecture/overview.md | System design & data flow |
| API Reference | docs/api/overview.md | API endpoints & examples |
| Testing Strategy | docs/testing/strategy.md | Testing guide |
| Deployment | docs/deployment/environments.md | Environment setup |
| Service | URL | Description |
|---|---|---|
| Frontend | http://localhost:5173 | React application |
| Backend API | http://localhost:8000 | FastAPI server |
| API Docs | http://localhost:8000/docs | Swagger UI (try it out!) |
| MailCatcher | http://localhost:1080 | Email testing |
| Traefik Dashboard | http://localhost:8090 | Proxy stats |
| Supabase Dashboard | https://app.supabase.com/project/wijzypbstiigssjuiuvh | Database & storage management |
# Start development
docker compose watch
# View logs
docker compose logs -f backend
docker compose logs -f celery-worker
# Test Celery
curl -X POST http://localhost:8000/api/v1/tasks/health-check
# Run tests
cd backend && bash scripts/test.sh
cd frontend && npx playwright test
# Database migration
docker compose exec backend alembic revision --autogenerate -m "Add model"
docker compose exec backend alembic upgrade head
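The health-check request above queues a Celery task, and a client would then usually poll until the task reaches a terminal state. The sketch below shows one way to do that with an injected fetch function. The response shape (a dict with a `status` key) and the helper name `wait_for_task` are assumptions; only the state names come from Celery itself.

```python
import time
from typing import Callable

# Celery's built-in states after which a task will not change again.
TERMINAL_STATES = {"SUCCESS", "FAILURE", "REVOKED"}


def wait_for_task(
    fetch_status: Callable[[], dict],
    interval: float = 1.0,
    timeout: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status() until it reports a terminal state or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.get("status") in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval)
```

In real use, `fetch_status` would wrap an HTTP GET against whichever task-status endpoint the Task API exposes under /api/v1/tasks/; the exact path is not given in this README, so it is left to the caller.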