
Conversation

@VincentSchaik
Owner

What changes are you trying to make? (e.g. Adding or removing code, refactoring existing code, adding reports)

  • This PR adds the completed Jupyter Notebooks for Assignment 2 (assignment_2.ipynb) and Labs 4, 5, and 6.
  • For Assignment 2, the notebook now contains the full implementation for the following (a condensed sketch of the pipeline appears after this list):
  • Loading the Fashion-MNIST dataset using torchvision and a DataLoader.
  • Establishing a zero-shot classification baseline (62.40% accuracy) using the openai/clip-vit-base-patch32 model.
  • Implementing prompt engineering with descriptive prompts to target class confusion, improving accuracy to 64.77%.
  • Visualizing the image embeddings using GPU-accelerated UMAP (cuml) to analyze class clusters.
  • Conducting Mini-Experiment A, which involved testing the larger openai/clip-vit-large-patch14 model. This new model yielded a 59.52% baseline accuracy but a 70.37% accuracy with improved prompts.
  • A short report in Markdown summarizing the findings from the mini-experiment.
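Since the notebook itself isn't inlined here, this is a minimal sketch of the pipeline described above, assuming the Hugging Face transformers CLIP API; the prompt strings, batch size, and variable names are illustrative, not copied from assignment_2.ipynb.

```python
# Minimal sketch: Fashion-MNIST zero-shot classification with CLIP.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "openai/clip-vit-base-patch32"  # swap to "openai/clip-vit-large-patch14" for Step 5
model = CLIPModel.from_pretrained(model_id).to(device).eval()
processor = CLIPProcessor.from_pretrained(model_id)

# Step 1: keep the raw PIL images; CLIPProcessor converts grayscale to RGB
# and handles resizing/normalization itself.
test_set = datasets.FashionMNIST(root="data", train=False, download=True)
loader = DataLoader(test_set, batch_size=64,
                    collate_fn=lambda batch: tuple(zip(*batch)))

# Steps 2-3: plain class names vs. descriptive prompts.
class_names = test_set.classes  # ["T-shirt/top", "Trouser", ...]
prompts = [f"a photo of a {name}" for name in class_names]
# e.g. swap the "Shirt" entry for "a photo of a collared button-down shirt"

text_inputs = processor(text=prompts, padding=True, return_tensors="pt").to(device)
with torch.no_grad():
    text_emb = model.get_text_features(**text_inputs)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

correct = total = 0
for images, labels in loader:
    image_inputs = processor(images=list(images), return_tensors="pt").to(device)
    with torch.no_grad():
        img_emb = model.get_image_features(**image_inputs)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    logits = img_emb @ text_emb.T          # cosine similarity per class
    preds = logits.argmax(dim=-1).cpu()
    correct += (preds == torch.tensor(labels)).sum().item()
    total += len(labels)
print(f"zero-shot accuracy: {correct / total:.2%}")
```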

What did you learn from the changes you have made?

Key concepts:

  • A pre-trained model like CLIP is incredibly powerful "out-of-the-box." Achieving 62.40% accuracy with no task-specific training is a massive leap over a simple CNN.
  • The quality of the text prompt is directly correlated with performance. Simply changing from "Shirt" to "a photo of a collared button-down shirt" provided the context needed to resolve ambiguity and improve accuracy.
  • The large model performed worse than the base model with simple prompts (59.52% vs. 62.40%). This suggests that a more complex model can be "confused" by ambiguous prompts, which was a very interesting observation.
  • The large model's accuracy jumped by 10.85 percentage points with good prompts, while the base model only gained 2.37 points. This shows the larger model has a more nuanced understanding of language and can leverage better instructions.
  • The UMAP plot was very insightful. It showed that the model visually clusters classes like "Shirt," "T-shirt," and "Coat" together, explaining why it struggles to distinguish them regardless of the prompt.

Was there another approach you were thinking about making? If so, what approach(es) were you thinking of?

I did consider the following two other options provided in the assignment (both are sketched after this list):

  • Multiple-Description Classification: Instead of one prompt per class, I thought about using a list of 3-4 descriptive prompts for each (e.g., "a photo of a T-shirt," "a casual top," "a short-sleeve shirt"). The prediction would be correct if it matched any of the prompts in the correct class.
  • Top-K Classification: I also considered modifying the code to check if the correct label was in the model's top 2 or top 3 guesses, which would likely show a much higher "Top-K" accuracy.
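Neither alternative made it into the notebook, but as a rough sketch, both could be layered on the per-class similarity scores the zero-shot loop already computes. The helper names below are my own, and this assumes `logits` (n_images × n_classes or n_prompts) and the true labels have been collected into tensors.

```python
import torch

# Top-K: count a prediction as correct if the true label is anywhere
# among the k highest-scoring classes.
def topk_accuracy(logits: torch.Tensor, labels: torch.Tensor, k: int = 3) -> float:
    topk = logits.topk(k, dim=-1).indices            # (n_images, k)
    hits = (topk == labels.unsqueeze(-1)).any(dim=-1)
    return hits.float().mean().item()

# Multiple descriptions: several prompts per class; score each class by
# its best-matching prompt. class_of_prompt[i] is the class id of prompt i.
def multi_prompt_logits(img_emb: torch.Tensor, text_emb: torch.Tensor,
                        class_of_prompt: torch.Tensor) -> torch.Tensor:
    sims = img_emb @ text_emb.T                      # (n_images, n_prompts)
    n_classes = int(class_of_prompt.max()) + 1
    out = sims.new_full((sims.shape[0], n_classes), float("-inf"))
    for c in range(n_classes):
        out[:, c] = sims[:, class_of_prompt == c].max(dim=-1).values
    return out                                       # (n_images, n_classes)
```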

Were there any challenges? If so, what issue(s) did you face? How did you overcome it?

Yes, I faced a major performance bottleneck in Step 4.

Issue: The UMAP visualization using the standard umap-learn library was running on the CPU and was projected to take over 15 minutes to process the 10,000 image embeddings.

Solution: I overcame this by:

  • Switching the Google Colab runtime to use a T4 GPU.
  • Installing the NVIDIA RAPIDS cuml library (!pip install cuml-cu12 ...).
  • Refactoring the UMAP code to use cuml.UMAP, which runs natively on the GPU.
  • Result: This change reduced the processing time from 15+ minutes to under 1 minute.
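For reference, the refactor essentially amounts to swapping the import, since cuml.UMAP mirrors the umap-learn interface. A minimal sketch, assuming the CLIP image features sit in a (10000, 512) float32 array named img_embeddings (my name, not necessarily the notebook's) and using illustrative hyperparameters:

```python
import numpy as np
from cuml.manifold import UMAP  # GPU drop-in for umap-learn's umap.UMAP

# CPU version this replaced:
#   import umap
#   coords = umap.UMAP(n_components=2).fit_transform(img_embeddings)

reducer = UMAP(n_components=2, n_neighbors=15, min_dist=0.1, random_state=42)
coords = reducer.fit_transform(img_embeddings.astype(np.float32))  # (10000, 2)
```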

How were these changes tested?

The changes were tested by running the entire assignment_2.ipynb notebook in a Google Colab GPU-enabled environment.

  • The DataLoader was verified visually with the show_batch plot (a minimal equivalent of that helper is sketched after this list).
  • The baseline classification (Step 2) ran successfully and printed the 62.40% accuracy and its confusion matrix.
  • The prompt engineering (Step 3) ran successfully and printed the 64.77% accuracy.
  • The cuml.UMAP code (Step 4) ran successfully and generated the 2D cluster plot.
  • The Mini-Experiment (Step 5) successfully loaded the large model and produced two new accuracy scores (59.52% and 70.37%), confirming the new model was working.
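The show_batch helper itself isn't included in this description; a minimal equivalent for eyeballing a batch of Fashion-MNIST images might look like the following (a sketch, not the notebook's actual code, reusing the PIL-batch `loader` from the earlier sketch):

```python
import matplotlib.pyplot as plt
from torchvision import transforms
from torchvision.utils import make_grid

def show_batch(images, nrow=8):
    """Plot a grid of PIL images so the DataLoader output can be checked by eye."""
    tensors = [transforms.ToTensor()(img) for img in images]
    grid = make_grid(tensors, nrow=nrow)     # expands 1-channel images to 3
    plt.imshow(grid.permute(1, 2, 0))
    plt.axis("off")
    plt.show()

images, labels = next(iter(loader))
show_batch(images)
```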

A reference to a related issue in your repository (if applicable)

None

Checklist

  • I can confirm that my changes are working as intended

@VincentSchaik changed the title from "Assignment 2" to "Completed: Assignment-2 and Lab 4, 5, 6" on Nov 2, 2025

@x-rojas-io left a comment


Dear Participant,

Labs 4, 5, and 6 are complete.

Assignment 2 has been reviewed and is complete 👌

