A modern, professional portal that automatically synchronizes your academic materials from Google Classroom or any web URL, processes them using Gemini AI into concise, high-quality Markdown notes, and serves them via a sleek, monochrome Docusaurus site.
- Multi-Source Sync: Fetch materials from Google Classroom or any public URL (Google Sites, SharePoint, etc.).
- Smart HTML Detection: New! Automatically prioritizes HTML-based slides (Marp/Reveal.js) over heavy PDFs for 10x faster generation.
- AI Note Generation: Converts messy slides into structured prose with definitions, theorems, and summaries.
- $\LaTeX$ Support: High-fidelity mathematical rendering via KaTeX.
- Auto-Scrolling TOC: The right sidebar automatically follows your reading position.
- Mobile Optimized: Premium glassmorphism design optimized for study sessions on any device.
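To make the "Smart HTML Detection" idea concrete, here is a minimal sketch of how HTML slides could be preferred over PDFs when both are linked from the same page. It is not the project's actual code: the function name `prefer_html` and the rule of matching decks by file stem are assumptions made for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

HTML_SUFFIXES = {".html", ".htm"}


def prefer_html(links):
    """Return one link per slide deck, choosing HTML over PDF when both exist.

    Assumption: two links belong to the same deck when their file stems match
    (e.g. lec01.pdf and lec01.html). The real scripts may match differently.
    """
    chosen = {}  # stem -> (link, is_html)
    for link in links:
        path = PurePosixPath(urlparse(link).path)
        suffix = path.suffix.lower()
        is_html = suffix in HTML_SUFFIXES
        if not is_html and suffix != ".pdf":
            continue  # skip anything that is not a slide candidate
        key = path.stem.lower()
        # Keep the first link seen for a deck, but let HTML replace a PDF.
        if key not in chosen or (is_html and not chosen[key][1]):
            chosen[key] = (link, is_html)
    return [link for link, _ in chosen.values()]


links = [
    "https://example.edu/lec01.pdf",
    "https://example.edu/lec01.html",
    "https://example.edu/lec02.pdf",
    "https://example.edu/logo.png",
]
print(prefer_html(links))
```

In this sketch `lec01` resolves to its HTML version, `lec02` falls back to the PDF because no HTML variant exists, and the image link is ignored.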
- Node.js: v18.0.0 or higher
- Python: v3.10.0 or higher
- Gemini API Key: Obtain from Google AI Studio
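The version requirements above can be checked before installing anything. The following is a rough sketch, not part of the repository; the helper names `parse_version` and `meets_minimum` are hypothetical. It compares a version string, such as the one printed by `node --version`, against the stated minimums.

```python
import re
import sys

MIN_NODE = (18, 0, 0)     # from the prerequisites list
MIN_PYTHON = (3, 10, 0)   # from the prerequisites list


def parse_version(text):
    """Turn 'v18.17.1' or '3.10.4' into a tuple of ints such as (18, 17, 1)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version_text, minimum):
    """Return True when the parsed version is at least the given minimum."""
    return parse_version(version_text) >= minimum


# The running interpreter can be checked directly.
print("Python OK:", tuple(sys.version_info[:3]) >= MIN_PYTHON)

# For Node.js, pass in whatever `node --version` prints on your machine.
print("Node v18.17.1 OK:", meets_minimum("v18.17.1", MIN_NODE))
print("Node v16.20.2 OK:", meets_minimum("v16.20.2", MIN_NODE))
```

Tuples compare element by element, so `(18, 17, 1) >= (18, 0, 0)` holds while `(16, 20, 2) >= (18, 0, 0)` does not.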
```bash
# Clone the repository (git creates a folder named after the repo)
git clone https://github.com/aadityarshah/classroom-scrapper.git
cd classroom-scrapper

# Install Python dependencies
pip install -r requirements.txt

# Install Node dependencies
npm install --legacy-peer-deps
```

- API Key: Create `api_key.py` in the root directory:

  ```python
  GEMINI_API_KEY = "your_actual_key_here"
  ```
- Google Cloud (Optional for Classroom Sync):
  - Create a project in Google Cloud Console.
  - Enable Google Classroom API and Google Drive API.
  - Download the OAuth Client ID JSON and save it as `credentials.json` in the root.
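A quick way to confirm the configuration steps above is a small pre-flight check. This is only a sketch under stated assumptions: `load_gemini_key` and `check_setup` are hypothetical helpers, not functions from the repository, and they look only for the two files named above (`api_key.py` and `credentials.json`) in the project root.

```python
import importlib.util
from pathlib import Path


def load_gemini_key(root):
    """Read GEMINI_API_KEY from <root>/api_key.py, or return None if unavailable."""
    key_file = Path(root) / "api_key.py"
    if not key_file.is_file():
        return None
    # Load the file as a module without requiring it to be on sys.path.
    spec = importlib.util.spec_from_file_location("api_key", key_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, "GEMINI_API_KEY", None)


def check_setup(root="."):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    key = load_gemini_key(root)
    if not key or key == "your_actual_key_here":
        problems.append("api_key.py is missing or still holds the placeholder key")
    if not (Path(root) / "credentials.json").is_file():
        problems.append("credentials.json not found (only needed for Classroom sync)")
    return problems


for problem in check_setup("."):
    print("warning:", problem)
```

Because `credentials.json` is optional, its absence is reported as a warning and does not stop the check; URL-based sync needs only the Gemini key.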
All synchronization scripts are located in the scripts/ directory.
Use this when your course materials are hosted on a public website.
```bash
python scripts/sync_url.py --url "YOUR_URL" --course "COURSE_NAME" [FLAGS]
```

Available Flags:

- `--url`: (Required) The page containing PDF or HTML links.
- `--course`: (Required) Folder name in `docs/` (e.g., ES119).
- `--summarize`: Generates a "Summary.md" for each category based on AI insights.
- `--force`: Overwrites existing files even if they haven't changed.
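The flags above map naturally onto an `argparse` interface. The sketch below mirrors the documented flags but is not the source of `sync_url.py`. The `should_write` helper is hypothetical: the README does not say how unchanged files are detected, so a plain text comparison is assumed here.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser that mirrors the flags documented for sync_url.py."""
    parser = argparse.ArgumentParser(prog="sync_url.py")
    parser.add_argument("--url", required=True, help="Page containing PDF or HTML links")
    parser.add_argument("--course", required=True, help="Folder name in docs/ (e.g., ES119)")
    parser.add_argument("--summarize", action="store_true", help="Generate a Summary.md per category")
    parser.add_argument("--force", action="store_true", help="Overwrite files even if unchanged")
    return parser


def should_write(path, new_text, force):
    """Write when forced, when the file is new, or when its content differs.

    Assumption: "changed" means the text differs from what is on disk.
    """
    path = Path(path)
    if force or not path.exists():
        return True
    return path.read_text(encoding="utf-8") != new_text


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(
    ["--url", "https://example.edu/slides", "--course", "ES119", "--force"]
)
print(args.url, args.course, args.summarize, args.force)
```

Both boolean flags use `store_true`, so they default to `False` and need no value on the command line.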
Use this for officially enrolled courses.
```bash
# Sync all courses defined in TARGET_IDS within the script
python scripts/sync_classroom.py

# Sync a specific course only
python scripts/sync_classroom.py --course "MA104"
```

Once the notes are generated in the `docs/` folder, start the site:
| Command | Description |
|---|---|
| `npm run start` | Starts a local development server at http://localhost:3000 |
| `npm run build` | Bundles the site into static files for production (in `build/`) |
| `npm run serve` | Serves the production build locally |
| `npm run clear` | Clears the Docusaurus cache |
- Add New Courses: Run `sync_url.py` with a new course name; the sidebar will update automatically.
- Change Aesthetics: Modify `src/css/custom.css` to adjust colors, fonts, or glassmorphism intensity.
- Home Page: Edit `docs/index.mdx` to update the cards and welcome message.
- Site Config: Edit `docusaurus.config.mjs` to change the site title, navbar links, or organization name.
- `scripts/`: Python logic for scraping, downloading, and AI processing.
- `docs/`: The generated Markdown notes (categorized by course).
- `src/`: Custom CSS and JavaScript (including TOC scroll logic).
- `static/`: Images, logos, and static assets.
Developed by Aaditya Rushabh Shah