An AI-powered application for automatically generating Multiple Choice Questions (MCQs) from uploaded documents. The system uses fine-tuned T5 models and Sense2Vec to extract context and generate relevant questions, correct answers, and plausible distractors.
- Generate high-quality MCQs from various document formats (PDF, DOCX, TXT, images)
- Automatic text extraction with OCR fallback for scanned documents
- Special handling for book-like documents with chapter detection
- Mobile-friendly React Native frontend
- FastAPI backend with WebSocket support for real-time progress updates
- Customizable number of questions to generate
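
The chapter detection mentioned in the feature list can be illustrated with a simple heading-based splitter. This is only a minimal sketch and not the logic in fileprocessor.py: the `split_into_chapters` function, the `CHAPTER_RE` pattern, and the assumption that chapters begin with a line such as "Chapter 3" or "CHAPTER IV" are all hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical heading pattern: a line starting with "Chapter" followed by
# an Arabic or Roman numeral, optionally followed by a title.
CHAPTER_RE = re.compile(
    r"^\s*chapter\s+(\d+|[ivxlcdm]+)\b[ \t]*[:.\-]?[ \t]*(.*)$",
    re.IGNORECASE | re.MULTILINE,
)


def split_into_chapters(text: str) -> List[Tuple[str, str]]:
    """Return (heading, body) pairs; text before the first heading is dropped."""
    matches = list(CHAPTER_RE.finditer(text))
    chapters = []
    for i, match in enumerate(matches):
        start = match.end()
        # A chapter body runs until the next heading or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chapters.append((match.group(0).strip(), text[start:end].strip()))
    return chapters


sample = "Preface\nChapter 1: Cells\nCells are small.\nCHAPTER II\nDNA stores data.\n"
chapters = split_into_chapters(sample)
for heading, body in chapters:
    print(heading, "->", body)
```

Real books often use other heading styles ("Part One", bare numbers, table-of-contents entries), so a pattern like this would likely need tuning per document.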
├── api.py # FastAPI server implementation
├── mcq_generator.py # Core MCQ generation logic
├── fileprocessor.py # Document processing utilities
├── test.py # Test script for the MCQ generator
├── frontend1/ # React Native mobile app
├── qa/ # Question generation model (T5-based)
├── distractor/ # Distractor generation model
├── s2v_old/ # Sense2Vec model for semantic processing
- Python 3.8+
- Node.js and npm (for frontend)
- Expo CLI (for mobile app development)
- Clone the repository:

      git clone https://github.com/yourusername/mcq_generator_final.git
      cd mcq_generator_final

- Create and activate a virtual environment:

      python -m venv myvenv
      source myvenv/bin/activate  # On Windows, use: myvenv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

- Download the models:
  - Question Generation Model (1.2GB) - Extract to the qa/ directory
  - Distractor Generation Model (1.5GB) - Extract to the distractor/ directory
  - Sense2Vec Model (573MB) - Extract to the s2v_old/ directory
- Navigate to the frontend directory:

      cd frontend1

- Install dependencies:

      npm install

- Update the server URL in frontend1/services/apiService.js to point to your backend server.
Start the backend server:

    uvicorn api:app --host 0.0.0.0 --port 8000

Start the frontend:

    cd frontend1
    expo start

Connect using the Expo Go app on your mobile device or use an emulator.
- Launch the application on your device
- Tap "Upload Document" to select a file (PDF, DOCX, TXT, or image)
- Choose whether the document is a regular document or a book
- For books, you can choose to process the entire book or select specific chapters
- Set the number of questions to generate
- Review and save the generated MCQs
You can also generate MCQs using the command line:

    python test.py --input_file path/to/your/document.pdf --num_questions 10

- Question generation model
  - Based on the T5 transformer architecture
  - Fine-tuned on the SQuAD and RACE datasets
  - Optimized for creating context-relevant questions
- Distractor generation model
  - Specialized T5-based model for generating plausible incorrect options
  - Trained to create distractors that are semantically related but factually incorrect
- Sense2Vec model
  - Used for semantic understanding and relationship processing
  - Helps create better distractors by identifying semantic relationships
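
To make the role of Sense2Vec in distractor selection more concrete, the sketch below post-processes similarity results into answer options. It assumes candidates arrive as (key, score) pairs with keys of the form "phrase|TAG", which is the shape Sense2Vec similarity queries return. The `clean_candidates` function and the sample data are hypothetical stand-ins so the example runs without the model, and this is not necessarily how mcq_generator.py does it.

```python
from typing import List, Tuple


def clean_candidates(
    answer: str,
    candidates: List[Tuple[str, float]],
    max_distractors: int = 3,
) -> List[str]:
    """Convert Sense2Vec-style (key, score) pairs into distinct distractors."""
    seen = {answer.strip().lower()}
    distractors = []
    for key, _score in candidates:
        # Keys look like "New_York|GPE": drop the sense tag, restore spaces.
        phrase = key.split("|")[0].replace("_", " ").strip()
        normalized = phrase.lower()
        if not phrase or normalized in seen:
            continue  # skip the answer itself and case-insensitive duplicates
        seen.add(normalized)
        distractors.append(phrase)
        if len(distractors) == max_distractors:
            break
    return distractors


# Stand-in for the output of a similarity query; the scores are made up.
similar = [
    ("mitochondria|NOUN", 0.91),
    ("chloroplast|NOUN", 0.84),
    ("Chloroplast|NOUN", 0.83),
    ("cell_membrane|NOUN", 0.80),
    ("ribosome|NOUN", 0.78),
]
options = clean_candidates("Mitochondria", similar)
print(options)
```

Filtering on normalized text keeps near-identical options out of a question. A real pipeline would probably also check that a distractor is not a valid alternative answer.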
The system exposes several API endpoints:
- POST /process-file - Process a document and extract text
- POST /generate-mcqs-from-file - Generate MCQs from a file
- POST /generate-mcqs - Generate MCQs from raw text
- WebSocket /api/ws/{client_id} - Real-time processing updates
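
The WebSocket route takes a `client_id` path parameter, so a client has to pick an identifier and build the matching URL before connecting. The sketch below derives that URL from the backend's HTTP base URL using only the standard library. The `progress_ws_url` helper and the use of a random UUID as the client ID are assumptions for illustration; how the server interprets `client_id` is not specified here.

```python
import uuid
from urllib.parse import urlsplit, urlunsplit


def progress_ws_url(base_url: str, client_id: str) -> str:
    """Map an HTTP(S) base URL to the /api/ws/{client_id} WebSocket URL."""
    parts = urlsplit(base_url)
    # https maps to the secure WebSocket scheme, anything else to plain ws.
    scheme = "wss" if parts.scheme == "https" else "ws"
    path = parts.path.rstrip("/") + f"/api/ws/{client_id}"
    return urlunsplit((scheme, parts.netloc, path, "", ""))


client_id = uuid.uuid4().hex  # hypothetical choice of identifier
url = progress_ws_url("http://localhost:8000", client_id)
print(url)
```

The resulting URL can be passed to any WebSocket client library to receive progress messages while a generation request is running.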
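Example request using the requests library:

    import requests

    # Generate MCQs from raw text (assumes the backend is running on localhost:8000)
    response = requests.post(
        "http://localhost:8000/generate-mcqs",
        json={
            "text": "Your document text here...",
            "num_questions": 5,
        },
        timeout=300,  # generation can be slow; adjust as needed
    )
    response.raise_for_status()  # fail loudly on HTTP errors
    mcqs = response.json()
    print(mcqs)

- Document upload from device storage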
- Camera capture for scanning documents
- Real-time progress tracking during MCQ generation
- Save and share generated MCQs
- Offline mode for viewing previously generated MCQs
- Model loading issues: Ensure all model files are correctly placed in their respective directories
- Memory errors: For large documents, try processing in smaller chunks
- OCR problems: For poor quality scans, try improving the image quality before upload
- Backend connection issues: Verify the API URL in the frontend configuration
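
The advice on memory errors can be applied on the client side by splitting the extracted text before sending it to the generator. The following is a minimal sketch, assuming a word count is an acceptable proxy for model input size. The `chunk_text` function and its `max_words` and `overlap` parameters are hypothetical and are not settings of this project.

```python
from typing import List


def chunk_text(text: str, max_words: int = 300, overlap: int = 30) -> List[str]:
    """Split text into word-based chunks with a small overlap between them."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("require max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap  # how far the window advances each time
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


document = " ".join(f"w{i}" for i in range(10))
chunks = chunk_text(document, max_words=4, overlap=1)
print(chunks)
```

Each chunk could then be submitted separately, for example to POST /generate-mcqs, and the resulting questions merged. The overlap reduces the chance that a sentence needed for a question is cut in half at a chunk boundary.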
- Backend: Python, FastAPI, WebSockets, PyTorch, Transformers, spaCy
- Frontend: React Native, Expo, JavaScript
- Models: T5 (fine-tuned), Sense2Vec
- Document Processing: PyMuPDF, PyPDF2, Tesseract OCR
- Subject-specific models for domains such as medicine and law
- Difficulty level classification for questions
- Export options to various LMS formats
- Support for more document formats
- Enhanced UI for desktop environments
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- The SQuAD dataset for question generation training
- The RACE dataset for MCQ format training
- Hugging Face for model hosting and libraries
- The open-source community for various tools and libraries used