
Conversation

@VincentSchaik
Owner

What changes are you trying to make? (e.g. Adding or removing code, refactoring existing code, adding reports)

  • This PR adds the completed Jupyter Notebooks for Assignment 2 (assignment_2.ipynb) and Labs 4, 5, and 6.
  • For Assignment 2, the notebook now contains the full implementation for the following (a condensed sketch of the pipeline appears after this list):
  • Loading the Fashion-MNIST dataset using torchvision and a DataLoader.
  • Establishing a zero-shot classification baseline (62.40% accuracy) using the openai/clip-vit-base-patch32 model.
  • Implementing prompt engineering with descriptive prompts to target class confusion, improving accuracy to 64.77%.
  • Visualizing the image embeddings using GPU-accelerated UMAP (cuml) to analyze class clusters.
  • Conducting Mini-Experiment A, which involved testing the larger openai/clip-vit-large-patch14 model. This new model yielded a 59.52% baseline accuracy but a 70.37% accuracy with improved prompts.
  • A short report in Markdown summarizing the findings from the mini-experiment.
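Since the notebook itself isn't inlined here, this is a minimal sketch of the pipeline described above, assuming the Hugging Face transformers CLIP API; the prompt strings, batch size, and variable names are illustrative, not copied from assignment_2.ipynb.

```python
# Minimal sketch: Fashion-MNIST zero-shot classification with CLIP.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "openai/clip-vit-base-patch32"  # swap to "openai/clip-vit-large-patch14" for Step 5
model = CLIPModel.from_pretrained(model_id).to(device).eval()
processor = CLIPProcessor.from_pretrained(model_id)

# Step 1: keep the raw PIL images; CLIPProcessor converts grayscale to RGB
# and handles resizing/normalization itself.
test_set = datasets.FashionMNIST(root="data", train=False, download=True)
loader = DataLoader(test_set, batch_size=64,
                    collate_fn=lambda batch: tuple(zip(*batch)))

# Steps 2-3: plain class names vs. descriptive prompts.
class_names = test_set.classes  # ["T-shirt/top", "Trouser", ...]
prompts = [f"a photo of a {name}" for name in class_names]
# e.g. swap the "Shirt" entry for "a photo of a collared button-down shirt"

text_inputs = processor(text=prompts, padding=True, return_tensors="pt").to(device)
with torch.no_grad():
    text_emb = model.get_text_features(**text_inputs)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

correct = total = 0
for images, labels in loader:
    image_inputs = processor(images=list(images), return_tensors="pt").to(device)
    with torch.no_grad():
        img_emb = model.get_image_features(**image_inputs)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    logits = img_emb @ text_emb.T          # cosine similarity per class
    preds = logits.argmax(dim=-1).cpu()
    correct += (preds == torch.tensor(labels)).sum().item()
    total += len(labels)
print(f"zero-shot accuracy: {correct / total:.2%}")
```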

What did you learn from the changes you have made?

Key concepts:

  • A pre-trained model like CLIP is incredibly powerful "out-of-the-box." Achieving 62.40% accuracy with no task-specific training is a massive leap over a simple CNN.
  • The quality of the text prompt is directly correlated with performance. Simply changing from "Shirt" to "a photo of a collared button-down shirt" provided the context needed to resolve ambiguity and improve accuracy.
  • The large model performed worse than the base model with simple prompts (59.52% vs. 62.40%). This suggests that a more complex model can be "confused" by ambiguous prompts, which was a very interesting observation.
  • The large model's accuracy jumped by 10.85 percentage points with good prompts, while the base model only gained 2.37 points. This shows the larger model has a more nuanced understanding of language and can leverage better instructions.
  • The UMAP plot was very insightful. It showed that the model visually clusters classes like "Shirt," "T-shirt," and "Coat" together, explaining why it struggles to distinguish them regardless of the prompt.

Was there another approach you were thinking about making? If so, what approach(es) were you thinking of?

I did consider the following two other options provided in the assignment (both are sketched after this list):

  • Multiple-Description Classification: Instead of one prompt per class, I thought about using a list of 3-4 descriptive prompts for each (e.g., "a photo of a T-shirt," "a casual top," "a short-sleeve shirt"). The prediction would be correct if it matched any of the prompts in the correct class.
  • Top-K Classification: I also considered modifying the code to check if the correct label was in the model's top 2 or top 3 guesses, which would likely show a much higher "Top-K" accuracy.
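Neither alternative made it into the notebook, but as a rough sketch, both could be layered on the per-class similarity scores the zero-shot loop already computes. The helper names below are my own, and this assumes `logits` (n_images × n_classes or n_prompts) and the true labels have been collected into tensors.

```python
import torch

# Top-K: count a prediction as correct if the true label is anywhere
# among the k highest-scoring classes.
def topk_accuracy(logits: torch.Tensor, labels: torch.Tensor, k: int = 3) -> float:
    topk = logits.topk(k, dim=-1).indices            # (n_images, k)
    hits = (topk == labels.unsqueeze(-1)).any(dim=-1)
    return hits.float().mean().item()

# Multiple descriptions: several prompts per class; score each class by
# its best-matching prompt. class_of_prompt[i] is the class id of prompt i.
def multi_prompt_logits(img_emb: torch.Tensor, text_emb: torch.Tensor,
                        class_of_prompt: torch.Tensor) -> torch.Tensor:
    sims = img_emb @ text_emb.T                      # (n_images, n_prompts)
    n_classes = int(class_of_prompt.max()) + 1
    out = sims.new_full((sims.shape[0], n_classes), float("-inf"))
    for c in range(n_classes):
        out[:, c] = sims[:, class_of_prompt == c].max(dim=-1).values
    return out                                       # (n_images, n_classes)
```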

Were there any challenges? If so, what issue(s) did you face? How did you overcome it?

Yes, I faced a major performance bottleneck in Step 4.

Issue: The UMAP visualization using the standard umap-learn library was running on the CPU and was projected to take over 15 minutes to process the 10,000 image embeddings.

Solution: I overcame this by:

  • Switching the Google Colab runtime to use a T4 GPU.
  • Installing the NVIDIA RAPIDS cuml library (!pip install cuml-cu12 ...).
  • Refactoring the UMAP code to use cuml.UMAP, which runs natively on the GPU.
  • Result: This change reduced the processing time from 15+ minutes to under 1 minute.
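For reference, the refactor essentially amounts to swapping the import, since cuml.UMAP mirrors the umap-learn interface. A minimal sketch, assuming the CLIP image features sit in a (10000, 512) float32 array named img_embeddings (my name, not necessarily the notebook's) and using illustrative hyperparameters:

```python
import numpy as np
from cuml.manifold import UMAP  # GPU drop-in for umap-learn's umap.UMAP

# CPU version this replaced:
#   import umap
#   coords = umap.UMAP(n_components=2).fit_transform(img_embeddings)

reducer = UMAP(n_components=2, n_neighbors=15, min_dist=0.1, random_state=42)
coords = reducer.fit_transform(img_embeddings.astype(np.float32))  # (10000, 2)
```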

How were these changes tested?

The changes were tested by running the entire assignment_2.ipynb notebook in a Google Colab GPU-enabled environment.

  • The DataLoader was verified visually with the show_batch plot (a minimal equivalent of that helper is sketched after this list).
  • The baseline classification (Step 2) ran successfully and printed the 62.40% accuracy and its confusion matrix.
  • The prompt engineering (Step 3) ran successfully and printed the 64.77% accuracy.
  • The cuml.UMAP code (Step 4) ran successfully and generated the 2D cluster plot.
  • The Mini-Experiment (Step 5) successfully loaded the large model and produced two new accuracy scores (59.52% and 70.37%), confirming the new model was working.
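The show_batch helper itself isn't included in this description; a minimal equivalent for eyeballing a batch of Fashion-MNIST images might look like the following (a sketch, not the notebook's actual code, reusing the PIL-batch `loader` from the earlier sketch):

```python
import matplotlib.pyplot as plt
from torchvision import transforms
from torchvision.utils import make_grid

def show_batch(images, nrow=8):
    """Plot a grid of PIL images so the DataLoader output can be checked by eye."""
    tensors = [transforms.ToTensor()(img) for img in images]
    grid = make_grid(tensors, nrow=nrow)     # expands 1-channel images to 3
    plt.imshow(grid.permute(1, 2, 0))
    plt.axis("off")
    plt.show()

images, labels = next(iter(loader))
show_batch(images)
```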

A reference to a related issue in your repository (if applicable)

None

Checklist

  • I can confirm that my changes are working as intended

@VincentSchaik changed the title from "Assignment 2" to "Completed: Assignment-2 and Lab 4, 5, 6" on Nov 2, 2025

@x-rojas-io left a comment


Dear Participant,

Labs 4, 5, and 6 are complete.

Assignment 2 has been reviewed and is complete 👌

