"The previous developer built a CNN for digit recognition. It compiles, it trains... but it is basically guessing randomly. Can you fix it?"
A junior developer was tasked with building a Convolutional Neural Network (CNN) to classify handwritten digits using the MNIST dataset. They wrote the full pipeline -- data loading, preprocessing, model building, training, and evaluation.
The code runs without any errors. It compiles fine. It trains.
But the accuracy? Suspiciously low: no better than random guessing across the 10 digit classes (0-9), which would be right only about 10% of the time.
Something is fundamentally wrong with the pipeline.
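For reference, "random guessing" has a concrete number here. With 10 equally likely classes, a predictor that ignores the image is right about 1 time in 10. The minimal NumPy simulation below uses synthetic labels as a stand-in (it does not load the actual MNIST data) to show what that baseline looks like.

```python
import numpy as np

# Simulate a "model" that ignores its input and picks one of the 10 digit classes uniformly at random.
rng = np.random.default_rng(seed=0)
n_samples = 10_000
true_labels = rng.integers(0, 10, size=n_samples)     # synthetic stand-in labels, not real MNIST data
random_guesses = rng.integers(0, 10, size=n_samples)  # one random class per sample

accuracy = float(np.mean(random_guesses == true_labels))
print(f"Random-guess accuracy: {accuracy:.3f}")  # lands close to 1/10
```

If your trained model scores in this neighborhood, it has learned essentially nothing from the pixels.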
Your mission: Diagnose the issues, fix the bugs, and optimize the model to achieve maximum accuracy.
- Read and understand the existing code in `notebook.ipynb`
- Find and fix the bugs that make the model perform like a random guesser
- Optimize the model to achieve the highest accuracy possible
- Save your model and submit via Pull Request
Your submission is automatically evaluated when you create a Pull Request:
| Test Case | Threshold | Points | Description |
|---|---|---|---|
| TC-2 | Test Accuracy >= 75% | +11 pts | Pipeline is fixed and working |
| TC-3 | Test Accuracy >= 85% | +9 pts | Model architecture is solid |
| TC-4 | Test Accuracy >= 90% | +7 pts | Well-optimized model |
| TC-5 | Unseen Data >= 80% | +9 pts | Model generalizes well |
| TC-6 | No Overfitting | +4 pts | Train-test gap <= 10% |
Maximum score: 40 points (cumulative -- earn points for each threshold you clear)
Your model is evaluated on a hidden, unseen portion of the MNIST dataset. If the gap between your test accuracy and unseen accuracy exceeds 10%, you will not receive the No Overfitting bonus.
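To make the cumulative scoring concrete, here is a small sketch that restates the table as code. `estimate_score` is a hypothetical helper, not the official evaluator; the thresholds and points are copied from the table above, and accuracies are assumed to be fractions in [0, 1]. The TC-6 gap is taken as a plain input, since the table describes it as a train-test gap while the paragraph above compares test and unseen accuracy.

```python
# Hypothetical re-statement of the published scoring rules; NOT the official evaluator.
# Thresholds and points are copied from the README table (accuracies as fractions in [0, 1]).
TEST_ACCURACY_CASES = [
    ("TC-2", 0.75, 11),
    ("TC-3", 0.85, 9),
    ("TC-4", 0.90, 7),
]
UNSEEN_MIN_ACCURACY, UNSEEN_POINTS = 0.80, 9  # TC-5
MAX_GAP, NO_OVERFIT_POINTS = 0.10, 4          # TC-6


def estimate_score(test_acc: float, unseen_acc: float, gap: float) -> int:
    """Return the total points implied by the README table."""
    # TC-2..TC-4 are cumulative: every test-accuracy threshold cleared adds its points.
    score = sum(points for _name, threshold, points in TEST_ACCURACY_CASES if test_acc >= threshold)
    if unseen_acc >= UNSEEN_MIN_ACCURACY:
        score += UNSEEN_POINTS
    if abs(gap) <= MAX_GAP:
        score += NO_OVERFIT_POINTS
    return score


print(estimate_score(test_acc=0.88, unseen_acc=0.70, gap=0.18))  # 20: TC-2 and TC-3 only
print(estimate_score(test_acc=0.98, unseen_acc=0.97, gap=0.01))  # 40: every listed case
```

Note how a model that scores well on the test split but poorly on unseen data loses both TC-5 and TC-6, so generalization is worth 13 of the 40 points.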
```
DevSphere-ML-Medium/
├── notebook.ipynb <-- THE BROKEN NOTEBOOK (fix this!)
├── requirements.txt <-- Python dependencies
└── README.md <-- You are here
```
The evaluation is handled automatically -- you do not need to run any test scripts.
You may:
- Fix bugs in the notebook
- Change the model architecture (layers, filters, activations)
- Tune hyperparameters (learning rate, epochs, batch size, optimizer)
- Modify preprocessing (normalization, reshaping)
- Add regularization (Dropout, BatchNormalization, etc.)

You may not:
- Change the dataset (you must use `keras.datasets.mnist`)
- Use pretrained or external models
- Change the model save filename (you must save as `model.h5`)
- Hardcode predictions or cheat the evaluation
Click the Fork button on GitHub to create your own copy.
```bash
git clone https://github.com/YOUR-USERNAME/DevSphere-ML-Medium.git
cd DevSphere-ML-Medium
pip install -r requirements.txt
jupyter notebook notebook.ipynb
```
Read the code carefully -- look for the HINT and BUG comments. Fix the bugs, optimize the model, and make sure it saves as `model.h5`.
Alternatively, open the notebook directly in Google Colab.
The commit message should follow the format "Your_Roll_Number_CNN model", for example: "LCI2025001_CNN_model".
```bash
git add .
git commit -m "Your_Roll_Number_CNN model"
git push origin main
```
Go to the original repository and create a Pull Request from your fork.
The evaluation runs automatically on your PR. Your score and detailed feedback will appear as a comment on your Pull Request within a few minutes.
Tip: Every time you push a new commit to your PR branch, the evaluation re-runs automatically. Iterate until you are satisfied with your score.
The notebook contains BUG and HINT comments at suspicious locations. Read them carefully.
Think about:
- Data preprocessing -- What range should pixel values be in?
- Architecture -- What makes a CNN different from a plain neural network?
- Activation functions -- Can a network learn complex patterns without non-linearity?
- Output layer -- How does a model output probabilities for classification?
- Loss function -- Is the loss function designed for classification?
- Optimizer -- Some optimizers converge much faster than others
- Training -- Is the model training long enough?
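The first four points above can be illustrated without any deep-learning framework. This is a NumPy-only sketch on random stand-in data (not real MNIST digits, and not the notebook's code): it scales pixels to [0, 1] and adds the channel axis that Keras `Conv2D` layers expect by default, shows that two linear layers without an activation reduce to a single one, applies a softmax to get per-class probabilities, and computes the cross-entropy loss for integer labels. The array names, weight shapes, and labels are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# 1) Preprocessing. Random stand-in for an MNIST batch: pixels in [0, 255], shape (N, 28, 28).
raw_images = rng.integers(0, 256, size=(4, 28, 28)).astype(np.uint8)
images = raw_images.astype("float32") / 255.0  # scale pixel values to [0, 1]
images = images.reshape(-1, 28, 28, 1)         # add a channel axis: (N, height, width, channels)

# 2) Non-linearity. Two linear layers with no activation collapse into a single linear layer.
x = images.reshape(len(images), -1)            # flatten to (N, 784)
w1 = rng.normal(scale=0.1, size=(784, 32))
w2 = rng.normal(scale=0.1, size=(32, 10))
two_linear_layers = (x @ w1) @ w2
one_linear_layer = x @ (w1 @ w2)               # same output: the extra layer added no expressive power
with_relu = np.maximum(x @ w1, 0.0) @ w2       # a ReLU in between breaks this collapse
print(np.allclose(two_linear_layers, one_linear_layer))  # True


# 3) Output layer. Softmax turns 10 raw scores per image into probabilities that sum to 1.
def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


probs = softmax(with_relu)

# 4) Loss. Cross-entropy is the mean negative log-probability assigned to the true class,
#    the quantity Keras' "sparse_categorical_crossentropy" loss computes for integer labels.
labels = np.array([3, 1, 4, 1])                # hypothetical integer labels (digits 0-9)
true_class_probs = probs[np.arange(len(labels)), labels]
cross_entropy = float(-np.mean(np.log(true_class_probs + 1e-12)))  # small epsilon avoids log(0)
print(f"loss = {cross_entropy:.3f}")
```

A regression-style loss such as mean squared error on raw class indices, or an output layer without softmax, gives the optimizer a much less useful signal for a 10-way classification task.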
When your PR is evaluated, you will see a comment like this:
Digit Doctor -- Evaluation Report
Test Results (6/7 passed)
| Test Case | Check | Result | Points |
|---|---|---|---|
| TC-2 | Accuracy >= 75% | Pass | +11 |
| TC-3 | Accuracy >= 85% | Pass | +9 |
| TC-4 | Accuracy >= 90% | Pass | +7 |
| TC-5 | Unseen >= 80% | Pass | +9 |
| TC-6 | No Overfitting | Pass | +4 |
Score: 40/40
Status: Outstanding -- You are a true Digit Doctor!
Participants are ranked by:
- Total Score (primary)
- Test Accuracy (tiebreaker)
- Submission Time (earlier is better)
Note that the points for this challenge may change after a proper review by the members at the end of the event. For the time being, this repo contributes 40 points to your score on completion.
Built for DevSphere by GDG IIIT Lucknow