diff --git a/02_activities/assignments/assignment_1.ipynb b/02_activities/assignments/assignment_1.ipynb index 6a1f0581..3e5b99fc 100644 --- a/02_activities/assignments/assignment_1.ipynb +++ b/02_activities/assignments/assignment_1.ipynb @@ -1,309 +1,805 @@ { - "cells": [ - { - "cell_type": "markdown", - "id": "927ae8f4", - "metadata": {}, - "source": [ - "# Assignment 1 - Building a Vision Model with Keras\n", - "\n", - "In this assignment, you will build a simple vision model using Keras. The goal is to classify images from the Fashion MNIST dataset, which contains images of clothing items.\n", - "\n", - "You will:\n", - "1. Load and inspect the Fashion MNIST dataset.\n", - "2. Run a simple baseline model to establish a performance benchmark.\n", - "3. Build and evaluate a simple CNN model, choosing appropriate loss and metrics.\n", - "4. Design and run controlled experiments on one hyperparameter (e.g., number of filters, kernel size, etc.) and one regularization technique (e.g., dropout, L2 regularization).\n", - "5. Analyze the results and visualize the model's performance.\n", - "\n", - "# 1. Loading and Inspecting the Dataset\n", - "\n", - "Fashion MNIST is a dataset of grayscale images of clothing items, with 10 classes. Each image is 28x28 pixels, like the MNIST dataset of handwritten digits. Keras provides a convenient way to load this dataset. \n", - "\n", - "In this section, you should:\n", - "\n", - "- [ ] Inspect the shapes of the training and test sets to confirm their size and structure.\n", - "- [ ] Convert the labels to one-hot encoded format if necessary. (There is a utility function in Keras for this.)\n", - "- [ ] Visualize a few images from the dataset to understand what the data looks like." 
- ] + "cells": [ + { + "cell_type": "markdown", + "id": "927ae8f4", + "metadata": { + "id": "927ae8f4" + }, + "source": [ + "# Assignment 1 - Building a Vision Model with Keras\n", + "\n", + "In this assignment, you will build a simple vision model using Keras. The goal is to classify images from the Fashion MNIST dataset, which contains images of clothing items.\n", + "\n", + "You will:\n", + "1. Load and inspect the Fashion MNIST dataset.\n", + "2. Run a simple baseline model to establish a performance benchmark.\n", + "3. Build and evaluate a simple CNN model, choosing appropriate loss and metrics.\n", + "4. Design and run controlled experiments on one hyperparameter (e.g., number of filters, kernel size, etc.) and one regularization technique (e.g., dropout, L2 regularization).\n", + "5. Analyze the results and visualize the model's performance.\n", + "\n", + "# 1. Loading and Inspecting the Dataset\n", + "\n", + "Fashion MNIST is a dataset of grayscale images of clothing items, with 10 classes. Each image is 28x28 pixels, like the MNIST dataset of handwritten digits. Keras provides a convenient way to load this dataset.\n", + "\n", + "In this section, you should:\n", + "\n", + "- [ ] Inspect the shapes of the training and test sets to confirm their size and structure.\n", + "- [ ] Convert the labels to one-hot encoded format if necessary. (There is a utility function in Keras for this.)\n", + "- [ ] Visualize a few images from the dataset to understand what the data looks like." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "420c7178", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "420c7178", + "outputId": "ea4f8213-2719-42ba-b663-2eb4bc34e08a" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz\n", + "\u001b[1m29515/29515\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 0us/step\n", + "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz\n", + "\u001b[1m26421880/26421880\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 0us/step\n", + "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz\n", + "\u001b[1m5148/5148\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 0us/step\n", + "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz\n", + "\u001b[1m4422102/4422102\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 0us/step\n" + ] + } + ], + "source": [ + "from tensorflow.keras.datasets import fashion_mnist\n", + "(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()\n", + "\n", + "# Normalize the pixel values to be between 0 and 1\n", + "X_train = X_train.astype('float32') / 255.0\n", + "X_test = X_test.astype('float32') / 255.0\n", + "\n", + "# Classes in the Fashion MNIST dataset\n", + "class_names = [\"T-shirt/top\", \"Trouser\", \"Pullover\", \"Dress\", \"Coat\", \"Sandal\", \"Shirt\", \"Sneaker\", \"Bag\", \"Ankle boot\"]" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "a6c89fe7", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "a6c89fe7", + "outputId": 
"6e653f4c-5b05-44f5-c12f-e5d56b3c3a54" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Shape of X_train: (60000, 28, 28)\n", + "Shape of y_train: (60000,)\n", + "Shape of X_test: (10000, 28, 28)\n", + "Shape of y_test: (10000,)\n", + "Shape of y_train after one-hot encoding: (60000, 10)\n", + "Shape of y_test after one-hot encoding: (10000, 10)\n" + ] + } + ], + "source": [ + "# Inspect the shapes of the datasets\n", + "print(\"Shape of X_train:\", X_train.shape)\n", + "print(\"Shape of y_train:\", y_train.shape)\n", + "print(\"Shape of X_test:\", X_test.shape)\n", + "print(\"Shape of y_test:\", y_test.shape)\n", + "\n", + "# Convert labels to one-hot encoding\n", + "from tensorflow.keras.utils import to_categorical\n", + "\n", + "y_train_one_hot = to_categorical(y_train, num_classes=10)\n", + "y_test_one_hot = to_categorical(y_test, num_classes=10)\n", + "\n", + "print(\"Shape of y_train after one-hot encoding:\", y_train_one_hot.shape)\n", + "print(\"Shape of y_test after one-hot encoding:\", y_test_one_hot.shape)" + ] + }, + { + "cell_type": "code", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 699 + }, + "id": "51ca90c5", + "outputId": "5b18937e-60fd-45b0-9c43-2f553ab88344" + }, + "source": [ + "import matplotlib.pyplot as plt\n", + "\n", + "# Visualize a few images\n", + "plt.figure(figsize=(10, 10))\n", + "for i in range(25):\n", + " plt.subplot(5, 5, i + 1)\n", + " plt.xticks([])\n", + " plt.yticks([])\n", + " plt.grid(False)\n", + " plt.imshow(X_train[i], cmap=plt.cm.binary)\n", + " plt.xlabel(class_names[y_train[i]])\n", + "plt.show()" + ], + "id": "51ca90c5", + "execution_count": 5, + "outputs": [ + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ] + }, + { + "cell_type": "markdown", + "id": "989f7dd0", + "metadata": { + "id": "989f7dd0" + }, + "source": [ + "Reflection: Does the data look as expected? How is the quality of the images? Are there any issues with the dataset that you notice?\n", + "\n", + "**Your answer here**\n", + "The images are not clear, but for the most part the type of item is distinguishable and correctly labeled. For example, the 2 trousers are clearly distinguishable while the bag is less so. This lack of clarity could make it challenging for the model to correctly identify images and could lead to misclassification.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "c9e8ad60", + "metadata": { + "id": "c9e8ad60" + }, + "source": [ + "# 2. Baseline Model\n", + "\n", + "In this section, you will create a linear regression model as a baseline. This model will not use any convolutional layers, but it will help you understand the performance of a simple model on this dataset.\n", + "You should:\n", + "- [ ] Create a simple linear regression model using Keras.\n", + "- [ ] Compile the model with an appropriate loss function and optimizer.\n", + "- [ ] Train the model on the training set and evaluate it on the test set.\n", + "\n", + "A linear regression model can be created using the `Sequential` API in Keras. Using a single `Dense` layer with no activation function is equivalent to a simple linear regression model. Make sure that the number of units in the output layer matches the number of classes in the dataset.\n", + "\n", + "Note that for this step, we will need to use `Flatten` to convert the 2D images into 1D vectors before passing them to the model. Put a `Flatten()` layer as the first layer in your model so that the 2D image data can be flattened into 1D vectors."
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "8563a7aa", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "8563a7aa", + "outputId": "13c59fa3-3169-4f25-8aa6-9af1ebd25f1e" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.12/dist-packages/keras/src/layers/reshaping/flatten.py:37: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.\n", + " super().__init__(**kwargs)\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Epoch 1/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m6s\u001b[0m 3ms/step - accuracy: 0.6574 - loss: 0.0754 - val_accuracy: 0.7803 - val_loss: 0.0466\n", + "Epoch 2/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m5s\u001b[0m 3ms/step - accuracy: 0.8040 - loss: 0.0422 - val_accuracy: 0.7839 - val_loss: 0.0435\n", + "Epoch 3/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m4s\u001b[0m 2ms/step - accuracy: 0.8146 - loss: 0.0401 - val_accuracy: 0.8143 - val_loss: 0.0388\n", + "Epoch 4/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m4s\u001b[0m 2ms/step - accuracy: 0.8133 - loss: 0.0397 - val_accuracy: 0.8153 - val_loss: 0.0388\n", + "Epoch 5/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m5s\u001b[0m 3ms/step - accuracy: 0.8124 - loss: 0.0393 - val_accuracy: 0.8097 - val_loss: 0.0401\n", + "Epoch 6/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m4s\u001b[0m 3ms/step - accuracy: 0.8175 - loss: 0.0387 - val_accuracy: 0.7937 - val_loss: 0.0409\n", + 
"Epoch 7/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m4s\u001b[0m 2ms/step - accuracy: 0.8148 - loss: 0.0388 - val_accuracy: 0.8038 - val_loss: 0.0402\n", + "Epoch 8/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m4s\u001b[0m 3ms/step - accuracy: 0.8175 - loss: 0.0385 - val_accuracy: 0.8048 - val_loss: 0.0399\n", + "Epoch 9/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m4s\u001b[0m 3ms/step - accuracy: 0.8166 - loss: 0.0389 - val_accuracy: 0.8129 - val_loss: 0.0392\n", + "Epoch 10/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8175 - loss: 0.0391 - val_accuracy: 0.8047 - val_loss: 0.0421\n", + "\u001b[1m313/313\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 2ms/step - accuracy: 0.8008 - loss: 0.0428\n", + "Test Loss: 0.0428\n", + "Test Accuracy: 0.7977\n" + ] + } + ], + "source": [ + "from keras.models import Sequential\n", + "from keras.layers import Dense, Flatten\n", + "from tensorflow.keras.optimizers import Adam\n", + "\n", + "# Create a simple linear regression model\n", + "model = Sequential()\n", + "model.add(Flatten(input_shape=(28, 28))) # Flatten the 2D images to 1D vectors\n", + "model.add(Dense(10, activation='linear')) # Output layer with 10 units for 10 classes, linear activation for linear regression\n", + "\n", + "# Compile the model using `model.compile()`\n", + "model.compile(optimizer=Adam(),\n", + " loss='mean_squared_error', # Appropriate loss for linear regression\n", + " metrics=['accuracy'])\n", + "\n", + "# Train the model with `model.fit()`\n", + "history = model.fit(X_train, y_train_one_hot, epochs=10, validation_split=0.2) # Using one-hot encoded labels for training\n", + "\n", + "# Evaluate the model with `model.evaluate()`\n", 
+ "loss, accuracy = model.evaluate(X_test, y_test_one_hot) # Using one-hot encoded labels for evaluation\n", + "print(f\"Test Loss: {loss:.4f}\")\n", + "print(f\"Test Accuracy: {accuracy:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "9a07e9f7", + "metadata": { + "id": "9a07e9f7" + }, + "source": [ + "Reflection: What is the performance of the baseline model? How does it compare to what you expected? Why do you think the performance is at this level?\n", + "\n", + "**Your answer here**\n", + "Both training and validation loss decreased during training, indicating that the model was learning, and the final test loss was 0.0428. The final test loss was also comparable to the validation loss, which suggests that the model did not significantly overfit the training data. The model classified the test data with 79.8% accuracy. This level of accuracy is likely due to the image quality and the limitations of a simpler model in identifying more complex visual patterns. Accuracy, however, is the primary metric for classification tasks. I didn't quite know what to expect." + ] + }, + { + "cell_type": "markdown", + "id": "fa107b59", + "metadata": { + "id": "fa107b59" + }, + "source": [ + "# 3. Building and Evaluating a Simple CNN Model\n", + "\n", + "In this section, you will build a simple Convolutional Neural Network (CNN) model using Keras. A convolutional neural network is a type of deep learning model that is particularly effective for image classification tasks.
Unlike the basic neural networks we have built in the labs, CNNs can accept images as input without needing to flatten them into vectors.\n", + "\n", + "You should:\n", + "- [ ] Build a simple CNN model with at least one convolutional layer (to learn spatial hierarchies in images) and one fully connected layer (to make predictions).\n", + "- [ ] Compile the model with an appropriate loss function and metrics for a multi-class classification problem.\n", + "- [ ] Train the model on the training set and evaluate it on the test set.\n", + "\n", + "Convolutional layers are designed to accept inputs with three dimensions: height, width and channels (e.g., RGB for color images). For grayscale images like those in Fashion MNIST, the input shape will be (28, 28, 1).\n", + "\n", + "When you progress from the convolutional layers to the fully connected layers, you will need to flatten the output of the convolutional layers. This can be done using the `Flatten` layer in Keras, which doesn't require any parameters." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "3513cf3d", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "3513cf3d", + "outputId": "a49fe67f-55b5-44f1-d3b8-3b976cfac5ba" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.12/dist-packages/keras/src/layers/convolutional/base_conv.py:113: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. 
When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.\n", + " super().__init__(activity_regularizer=activity_regularizer, **kwargs)\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Epoch 1/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m39s\u001b[0m 25ms/step - accuracy: 0.8055 - loss: 0.5481 - val_accuracy: 0.8779 - val_loss: 0.3371\n", + "Epoch 2/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m37s\u001b[0m 25ms/step - accuracy: 0.8965 - loss: 0.2830 - val_accuracy: 0.9001 - val_loss: 0.2806\n", + "Epoch 3/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m41s\u001b[0m 25ms/step - accuracy: 0.9170 - loss: 0.2254 - val_accuracy: 0.9061 - val_loss: 0.2642\n", + "Epoch 4/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m41s\u001b[0m 25ms/step - accuracy: 0.9259 - loss: 0.1947 - val_accuracy: 0.9112 - val_loss: 0.2485\n", + "Epoch 5/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m36s\u001b[0m 24ms/step - accuracy: 0.9396 - loss: 0.1640 - val_accuracy: 0.9143 - val_loss: 0.2432\n", + "Epoch 6/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m35s\u001b[0m 24ms/step - accuracy: 0.9478 - loss: 0.1425 - val_accuracy: 0.9133 - val_loss: 0.2595\n", + "Epoch 7/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m37s\u001b[0m 25ms/step - accuracy: 0.9554 - loss: 0.1186 - val_accuracy: 0.9096 - val_loss: 0.2749\n", + "Epoch 8/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m37s\u001b[0m 24ms/step - accuracy: 0.9634 - loss: 0.0992 - val_accuracy: 0.9093 
- val_loss: 0.2958\n", + "Epoch 9/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m41s\u001b[0m 25ms/step - accuracy: 0.9679 - loss: 0.0855 - val_accuracy: 0.9137 - val_loss: 0.2692\n", + "Epoch 10/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m38s\u001b[0m 25ms/step - accuracy: 0.9761 - loss: 0.0675 - val_accuracy: 0.9150 - val_loss: 0.3142\n", + "\u001b[1m313/313\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m2s\u001b[0m 7ms/step - accuracy: 0.9130 - loss: 0.3498\n", + "Test Loss: 0.3372\n", + "Test Accuracy: 0.9133\n" + ] + } + ], + "source": [ + "from keras.models import Sequential\n", + "from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense\n", + "from tensorflow.keras.optimizers import Adam\n", + "from tensorflow.keras.losses import categorical_crossentropy\n", + "\n", + "# Reshape the data to include the channel dimension\n", + "X_train = X_train.reshape(-1, 28, 28, 1)\n", + "X_test = X_test.reshape(-1, 28, 28, 1)\n", + "\n", + "# Create a simple CNN model\n", + "model = Sequential()\n", + "model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))\n", + "model.add(MaxPooling2D(pool_size=(2, 2)))\n", + "model.add(Flatten())\n", + "model.add(Dense(128, activation='relu'))\n", + "model.add(Dense(10, activation='softmax')) # Output layer with softmax activation for multi-class classification\n", + "\n", + "# Compile the model\n", + "model.compile(loss=categorical_crossentropy,\n", + " optimizer=Adam(),\n", + " metrics=['accuracy'])\n", + "\n", + "# Train the model\n", + "history = model.fit(X_train, y_train_one_hot, epochs=10, validation_split=0.2)\n", + "\n", + "# Evaluate the model\n", + "loss, accuracy = model.evaluate(X_test, y_test_one_hot)\n", + "print(f\"Test Loss: {loss:.4f}\")\n", + "print(f\"Test Accuracy: {accuracy:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": 
"fabe379c", + "metadata": { + "id": "fabe379c" + }, + "source": [ + "Reflection: Did the CNN model perform better than the baseline model? If so, by how much? What do you think contributed to this improvement?\n", + "\n", + "**Your answer here**\n", + "There is a steady decrease in training loss for the CNN model, with a final training loss of 0.067, a test loss of 0.3372, and a test accuracy of 91.3%. The CNN model performed significantly better than the linear model, by approximately 12 percentage points.\n", + "\n", + "This substantial improvement is primarily attributed to the architecture of the CNN. Unlike the linear model, which flattens the image data, the convolutional layers in the CNN are able to learn spatial hierarchies and important features within the images, such as edges, shapes, and textures. The pooling layers help to reduce the spatial dimensions and make the model more robust to variations in the image.\n", + "\n", + "The gap between training accuracy of 97.6% and test accuracy of 91.3% suggests there might be some degree of overfitting, although the model still generalizes much better than the baseline. Further experimentation with regularization techniques could help mitigate this." + ] + }, + { + "cell_type": "markdown", + "id": "1a5e2463", + "metadata": { + "id": "1a5e2463" + }, + "source": [ + "# 4. Designing and Running Controlled Experiments\n", + "\n", + "In this section, you will design and run controlled experiments to improve the model's performance. You will focus on one hyperparameter and one regularization technique.\n", + "You should:\n", + "- [ ] Choose one hyperparameter to experiment with (e.g., number of filters, kernel size, number of layers, etc.) and one regularization technique (e.g., dropout, L2 regularization). For your hyperparameter, you should choose at least three different values to test (but there is no upper limit).
For your regularization technique, simply test the presence or absence of the technique.\n", + "- [ ] Run experiments by modifying the model architecture or hyperparameters, and evaluate the performance of each model on the test set.\n", + "- [ ] Record the results of your experiments, including the test accuracy and any other relevant metrics.\n", + "- [ ] Visualize the results of your experiments using plots or tables to compare the performance of different models.\n", + "\n", + "The best way to run your experiments is to create a `for` loop that iterates over a range of values for the hyperparameter you are testing. For example, if you are testing different numbers of filters, you can create a loop that runs the model with 32, 64, and 128 filters. Within the loop, you can compile and train the model, then evaluate it on the test set. After each iteration, you can store the results in a list or a dictionary for later analysis.\n", + "\n", + "Note: It's critical that you re-initialize the model (by creating a new instance of the model) before each experiment. If you don't, the model will retain the weights from the previous experiment, which can lead to misleading results." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "99d6f46c", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 732 + }, + "id": "99d6f46c", + "outputId": "5e6b8169-cdc3-4aac-d38a-e4c2138138f6" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Experimenting with number of filters:\n", + "\n", + "Training with 32 filters...\n", + "Test Accuracy with 32 filters: 0.9052\n", + "\n", + "Training with 64 filters...\n", + "Test Accuracy with 64 filters: 0.9098\n", + "\n", + "Training with 128 filters...\n", + "Test Accuracy with 128 filters: 0.9108\n", + "\n", + "Hyperparameter Experiment Results (Number of Filters):\n", + "Filters: 32, Test Accuracy: 0.9052\n", + "Filters: 64, Test Accuracy: 0.9098\n", + "Filters: 128, Test Accuracy: 0.9108\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "from keras.models import Sequential\n", + "from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense\n", + "from tensorflow.keras.optimizers import Adam\n", + "from tensorflow.keras.losses import categorical_crossentropy\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# Reshape data if not already done (important if running this cell independently)\n", + "if X_train.shape[-1] != 1:\n", + " X_train = X_train.reshape(-1, 28, 28, 1)\n", + " X_test = X_test.reshape(-1, 28, 28, 1)\n", + "\n", + "# Define hyperparameters to test\n", + "filter_counts = [32, 64, 128]\n", + "results_hyperparameter = {}\n", + "\n", + "print(\"Experimenting with number of filters:\")\n", + "\n", + "for filters in filter_counts:\n", + " print(f\"\\nTraining with {filters} filters...\")\n", + "\n", + " # Re-initialize the model for each experiment\n", + " model = Sequential()\n", + " model.add(Conv2D(filters, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))\n", + " model.add(MaxPooling2D(pool_size=(2, 2)))\n", + " model.add(Flatten())\n", + " model.add(Dense(128, activation='relu'))\n", + " model.add(Dense(10, activation='softmax'))\n", + "\n", + " # Compile the model\n", + " model.compile(loss=categorical_crossentropy,\n", + " optimizer=Adam(),\n", + " metrics=['accuracy'])\n", + "\n", + " # Train the model (using fewer epochs for quicker experimentation)\n", + " history = model.fit(X_train, y_train_one_hot, epochs=5, validation_split=0.2, verbose=0) # verbose=0 to reduce output\n", + "\n", + " # Evaluate the model\n", + " loss, accuracy = model.evaluate(X_test, y_test_one_hot, verbose=0) # verbose=0 to reduce output\n", + " print(f\"Test Accuracy with {filters} filters: {accuracy:.4f}\")\n", + "\n", + " # Store the results\n", + " results_hyperparameter[filters] = accuracy\n", + "\n", + "# Print summary of hyperparameter experiments\n", + "print(\"\\nHyperparameter Experiment Results (Number 
of Filters):\")\n", + "for filters, accuracy in results_hyperparameter.items():\n", + " print(f\"Filters: {filters}, Test Accuracy: {accuracy:.4f}\")\n", + "\n", + "# Optional: Visualize the results\n", + "plt.figure(figsize=(8, 5))\n", + "plt.bar(list(map(str, results_hyperparameter.keys())), list(results_hyperparameter.values())) # Convert keys and values to lists\n", + "plt.xlabel(\"Number of Filters\")\n", + "plt.ylabel(\"Test Accuracy\")\n", + "plt.title(\"CNN Performance vs. Number of Filters\")\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "dc43ac81", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 652 + }, + "id": "dc43ac81", + "outputId": "40de629f-8242-4ef1-8274-924171a7e7f6" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Experimenting with Dropout regularization:\n", + "\n", + "Training model No Dropout...\n", + "Test Accuracy No Dropout: 0.9098\n", + "\n", + "Training model With Dropout...\n", + "Test Accuracy With Dropout: 0.8947\n", + "\n", + "Regularization Experiment Results (Dropout):\n", + "No Dropout: Test Accuracy: 0.9098\n", + "With Dropout: Test Accuracy: 0.8947\n" + ] + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "from keras.models import Sequential\n", + "from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout\n", + "from tensorflow.keras.optimizers import Adam\n", + "from tensorflow.keras.losses import categorical_crossentropy\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# Reshape data if not already done\n", + "if X_train.shape[-1] != 1:\n", + " X_train = X_train.reshape(-1, 28, 28, 1)\n", + " X_test = X_test.reshape(-1, 28, 28, 1)\n", + "\n", + "# Define regularization options to test\n", + "regularization_options = {\n", + " \"No Dropout\": False,\n", + " \"With Dropout\": True\n", + "}\n", + "results_regularization = {}\n", + "\n", + "print(\"Experimenting with Dropout regularization:\")\n", + "\n", + "for name, use_dropout in regularization_options.items():\n", + " print(f\"\\nTraining model {name}...\")\n", + "\n", + " # Re-initialize the model for each experiment\n", + " model = Sequential()\n", + " model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))\n", + " model.add(MaxPooling2D(pool_size=(2, 2)))\n", + " model.add(Flatten())\n", + " model.add(Dense(128, activation='relu'))\n", + " if use_dropout:\n", + " model.add(Dropout(0.5)) # Add dropout layer with a rate of 0.5\n", + " model.add(Dense(10, activation='softmax'))\n", + "\n", + " # Compile the model\n", + " model.compile(loss=categorical_crossentropy,\n", + " optimizer=Adam(),\n", + " metrics=['accuracy'])\n", + "\n", + " # Train the model (using fewer epochs for quicker experimentation)\n", + " history = model.fit(X_train, y_train_one_hot, epochs=5, validation_split=0.2, verbose=0) # verbose=0 to reduce output\n", + "\n", + " # Evaluate the model\n", + " loss, accuracy = model.evaluate(X_test, y_test_one_hot, verbose=0) # verbose=0 to reduce output\n", + " print(f\"Test Accuracy {name}: {accuracy:.4f}\")\n", + "\n", + " # Store the results\n", + " results_regularization[name] = 
accuracy\n", + "\n", + "# Print summary of regularization experiments\n", + "print(\"\\nRegularization Experiment Results (Dropout):\")\n", + "for name, accuracy in results_regularization.items():\n", + " print(f\"{name}: Test Accuracy: {accuracy:.4f}\")\n", + "\n", + "# Optional: Visualize the results\n", + "plt.figure(figsize=(8, 5))\n", + "plt.bar(results_regularization.keys(), results_regularization.values())\n", + "plt.xlabel(\"Regularization\")\n", + "plt.ylabel(\"Test Accuracy\")\n", + "plt.title(\"CNN Performance with and without Dropout\")\n", + "plt.ylim(min(results_regularization.values()) * 0.9, max(results_regularization.values()) * 1.1) # Adjust y-axis limits for better visualization\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "cb426f26", + "metadata": { + "id": "cb426f26" + }, + "source": [ + "Reflection: Report on the performance of the models you tested. Did any of the changes you made improve the model's performance? If so, which ones? What do you think contributed to these improvements? Finally, what combination of hyperparameters and regularization techniques yielded the best performance?\n", + "\n", + "**Your answer here**\n", + "The hyperparameter experiment resulted in 128 filters giving the best results. Each time more filters were added, the accuracy improved. This could be because additional filters gave the model more diverse features from the original input data to learn from. Experimenting with dropout regularization did not improve the results; the model without dropout performed best. Perhaps 5 epochs was too few to test dropout effectively, or the dropout rate of 0.5 was too high. The combination of 128 filters and no dropout was the most effective at accurately identifying unseen data, as measured by test accuracy." + ] + }, + { + "cell_type": "markdown", + "id": "46c43a3d", + "metadata": { + "id": "46c43a3d" + }, + "source": [ + "# 5.
Training Final Model and Evaluation\n", + "\n", + "In this section, you will train the final model using the best hyperparameters and regularization techniques you found in the previous section. You should:\n", + "- [ ] Compile the final model with the best hyperparameters and regularization techniques.\n", + "- [ ] Train the final model on the training set and evaluate it on the test set.\n", + "- [ ] Report the final model's performance on the test set, including accuracy and any other relevant metrics." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "31f926d1", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "31f926d1", + "outputId": "252bfbc4-b9df-4d74-c979-e56c2f530179" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.12/dist-packages/keras/src/layers/convolutional/base_conv.py:113: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.\n", + " super().__init__(activity_regularizer=activity_regularizer, **kwargs)\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Training the final model...\n", + "Epoch 1/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m109s\u001b[0m 72ms/step - accuracy: 0.8173 - loss: 0.5039 - val_accuracy: 0.8949 - val_loss: 0.2963\n", + "Epoch 2/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m141s\u001b[0m 72ms/step - accuracy: 0.9020 - loss: 0.2662 - val_accuracy: 0.9030 - val_loss: 0.2654\n", + "Epoch 3/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m107s\u001b[0m 71ms/step - accuracy: 0.9283 - loss: 0.2000 - val_accuracy: 0.9103 - val_loss: 0.2441\n", + "Epoch 4/10\n", + "\u001b[1m1500/1500\u001b[0m 
\u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m107s\u001b[0m 71ms/step - accuracy: 0.9392 - loss: 0.1653 - val_accuracy: 0.8976 - val_loss: 0.2909\n", + "Epoch 5/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m112s\u001b[0m 74ms/step - accuracy: 0.9490 - loss: 0.1366 - val_accuracy: 0.9123 - val_loss: 0.2611\n", + "Epoch 6/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m106s\u001b[0m 71ms/step - accuracy: 0.9583 - loss: 0.1110 - val_accuracy: 0.9158 - val_loss: 0.2747\n", + "Epoch 7/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m107s\u001b[0m 71ms/step - accuracy: 0.9675 - loss: 0.0894 - val_accuracy: 0.9147 - val_loss: 0.2813\n", + "Epoch 8/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m106s\u001b[0m 71ms/step - accuracy: 0.9769 - loss: 0.0674 - val_accuracy: 0.9204 - val_loss: 0.2987\n", + "Epoch 9/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m111s\u001b[0m 74ms/step - accuracy: 0.9780 - loss: 0.0600 - val_accuracy: 0.9133 - val_loss: 0.3524\n", + "Epoch 10/10\n", + "\u001b[1m1500/1500\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m142s\u001b[0m 74ms/step - accuracy: 0.9846 - loss: 0.0451 - val_accuracy: 0.9025 - val_loss: 0.3967\n", + "\n", + "Evaluating the final model...\n", + "\u001b[1m313/313\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m4s\u001b[0m 13ms/step - accuracy: 0.9007 - loss: 0.4252\n", + "Final Test Loss: 0.4303\n", + "Final Test Accuracy: 0.9005\n" + ] + } + ], + "source": [ + "from keras.models import Sequential\n", + "from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense\n", + "from tensorflow.keras.optimizers import Adam\n", + "from tensorflow.keras.losses import 
categorical_crossentropy\n", + "\n", + "# Reshape data if not already done (important if running this cell independently)\n", + "if X_train.shape[-1] != 1:\n", + " X_train = X_train.reshape(-1, 28, 28, 1)\n", + " X_test = X_test.reshape(-1, 28, 28, 1)\n", + "\n", + "# Define the final model with the best hyperparameters and regularization\n", + "# Best: 128 filters, No Dropout\n", + "final_model = Sequential()\n", + "final_model.add(Conv2D(128, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))\n", + "final_model.add(MaxPooling2D(pool_size=(2, 2)))\n", + "final_model.add(Flatten())\n", + "final_model.add(Dense(128, activation='relu'))\n", + "final_model.add(Dense(10, activation='softmax')) # Output layer with softmax activation\n", + "\n", + "# Compile the final model\n", + "final_model.compile(loss=categorical_crossentropy,\n", + " optimizer=Adam(),\n", + " metrics=['accuracy'])\n", + "\n", + "# Train the final model (using more epochs for potentially better performance)\n", + "print(\"Training the final model...\")\n", + "history_final = final_model.fit(X_train, y_train_one_hot, epochs=10, validation_split=0.2)\n", + "\n", + "# Evaluate the final model\n", + "print(\"\\nEvaluating the final model...\")\n", + "loss_final, accuracy_final = final_model.evaluate(X_test, y_test_one_hot)\n", + "print(f\"Final Test Loss: {loss_final:.4f}\")\n", + "print(f\"Final Test Accuracy: {accuracy_final:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "a01f8ebc", + "metadata": { + "id": "a01f8ebc" + }, + "source": [ + "Reflection: How does the final model's performance compare to the baseline and the CNN model? What do you think contributed to the final model's performance? If you had time, what other experiments would you run to further improve the model's performance?\n", + "\n", + "**Your answer here**\n", + "The final model correctly classified approximately 90.05% of the images in the test dataset. 
This is good accuracy and a significant improvement over the baseline model. The model classifies images from the Fashion MNIST dataset well, achieving high accuracy on unseen data. The training loss decreased with every epoch, reaching 0.0451 in the tenth epoch, while the validation loss rose from 0.2441 at epoch 3 to 0.3967 at epoch 10 and the final test loss was 0.4303, a sign of overfitting.\n", + "Of the three models, the simple CNN had the best performance, based on test loss and accuracy. I don't quite know why this would be. CNNs are better suited than linear models for image data.\n", + "Other experiments could explore other hyperparameters, combinations of regularization techniques, or different model architectures.\n" + ] + }, + { + "cell_type": "markdown", + "id": "01db8512", + "metadata": { + "id": "01db8512" + }, + "source": [ + "🚨 **Please review our [Assignment Submission Guide](https://github.com/UofT-DSI/onboarding/blob/main/onboarding_documents/submissions.md)** 🚨 for detailed instructions on how to format, branch, and submit your work. Following these guidelines is crucial for your submissions to be evaluated correctly.\n", + "### Submission Parameters:\n", + "* Submission Due Date: `23:59 PM - 26/10/2025`\n", + "* The branch name for your repo should be: `assignment-1`\n", + "* What to submit for this assignment:\n", + " * This Jupyter Notebook (assignment_1.ipynb)\n", + " * The Lab 1 notebook (labs/lab_1.ipynb)\n", + " * The Lab 2 notebook (labs/lab_2.ipynb)\n", + " * The Lab 3 notebook (labs/lab_3.ipynb)\n", + "* What the pull request link should look like for this assignment: `https://github.com//deep_learning/pull/`\n", + "* Open a private window in your browser. Copy and paste the link to your pull request into the address bar. Make sure you can see your pull request properly. 
This helps the technical facilitator and learning support staff review your submission easily.\n", + "Checklist:\n", + "- [ ] Created a branch with the correct naming convention.\n", + "- [ ] Ensured that the repository is public.\n", + "- [ ] Reviewed the PR description guidelines and adhered to them.\n", + "- [ ] Verify that the link is accessible in a private browser window.\n", + "If you encounter any difficulties or have questions, please don't hesitate to reach out to our team via our Slack at `#cohort-7-help-ml`. Our Technical Facilitators and Learning Support staff are here to help you navigate any challenges." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "deep_learning", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.11" + }, + "colab": { + "provenance": [] + } }, - { - "cell_type": "code", - "execution_count": null, - "id": "420c7178", - "metadata": {}, - "outputs": [], - "source": [ - "from tensorflow.keras.datasets import fashion_mnist\n", - "(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()\n", - "\n", - "# Normalize the pixel values to be between 0 and 1\n", - "X_train = X_train.astype('float32') / 255.0\n", - "X_test = X_test.astype('float32') / 255.0\n", - "\n", - "# Classes in the Fashion MNIST dataset\n", - "class_names = [\"T-shirt/top\", \"Trouser\", \"Pullover\", \"Dress\", \"Coat\", \"Sandal\", \"Shirt\", \"Sneaker\", \"Bag\", \"Ankle boot\"]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "a6c89fe7", - "metadata": {}, - "outputs": [], - "source": [ - "# Inspect the shapes of the datasets\n", - "\n", - "\n", - "# Convert labels to one-hot encoding\n", - "from tensorflow.keras.utils import to_categorical\n", - "\n" - ] - }, - { - 
"cell_type": "code", - "execution_count": null, - "id": "13e100db", - "metadata": {}, - "outputs": [], - "source": [ - "import matplotlib.pyplot as plt\n", - "# Verify the data looks as expected\n" - ] - }, - { - "cell_type": "markdown", - "id": "989f7dd0", - "metadata": {}, - "source": [ - "Reflection: Does the data look as expected? How is the quality of the images? Are there any issues with the dataset that you notice?\n", - "\n", - "**Your answer here**" - ] - }, - { - "cell_type": "markdown", - "id": "c9e8ad60", - "metadata": {}, - "source": [ - "# 2. Baseline Model\n", - "\n", - "In this section, you will create a linear regression model as a baseline. This model will not use any convolutional layers, but it will help you understand the performance of a simple model on this dataset.\n", - "You should:\n", - "- [ ] Create a simple linear regression model using Keras.\n", - "- [ ] Compile the model with an appropriate loss function and optimizer.\n", - "- [ ] Train the model on the training set and evaluate it on the test set.\n", - "\n", - "A linear regression model can be created using the `Sequential` API in Keras. Using a single `Dense` layer with no activation function is equivalent to a simple linear regression model. Make sure that the number of units in the output layer matches the number of classes in the dataset.\n", - "\n", - "Note that for this step, we will need to use `Flatten` to convert the 2D images into 1D vectors before passing them to the model. Put a `Flatten()` layer as the first layer in your model so that the 2D image data can be flattened into 1D vectors." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "8563a7aa", - "metadata": {}, - "outputs": [], - "source": [ - "from keras.models import Sequential\n", - "from keras.layers import Dense, Flatten\n", - "\n", - "# Create a simple linear regression model\n", - "model = Sequential()\n", - "# You can use `model.add()` to add layers to the model\n", - "\n", - "# Compile the model using `model.compile()`\n", - "\n", - "# Train the model with `model.fit()`\n", - "\n", - "# Evaluate the model with `model.evaluate()`" - ] - }, - { - "cell_type": "markdown", - "id": "9a07e9f7", - "metadata": {}, - "source": [ - "Reflection: What is the performance of the baseline model? How does it compare to what you expected? Why do you think the performance is at this level?\n", - "\n", - "**Your answer here**" - ] - }, - { - "cell_type": "markdown", - "id": "fa107b59", - "metadata": {}, - "source": [ - "# 3. Building and Evaluating a Simple CNN Model\n", - "\n", - "In this section, you will build a simple Convolutional Neural Network (CNN) model using Keras. A convolutional neural network is a type of deep learning model that is particularly effective for image classification tasks. Unlike the basic neural networks we have built in the labs, CNNs can accept images as input without needing to flatten them into vectors.\n", - "\n", - "You should:\n", - "- [ ] Build a simple CNN model with at least one convolutional layer (to learn spatial hierarchies in images) and one fully connected layer (to make predictions).\n", - "- [ ] Compile the model with an appropriate loss function and metrics for a multi-class classification problem.\n", - "- [ ] Train the model on the training set and evaluate it on the test set.\n", - "\n", - "Convolutional layers are designed to accept inputs with three dimensions: height, width and channels (e.g., RGB for color images). 
For grayscale images like those in Fashion MNIST, the input shape will be (28, 28, 1).\n", - "\n", - "When you progress from the convolutional layers to the fully connected layers, you will need to flatten the output of the convolutional layers. This can be done using the `Flatten` layer in Keras, which doesn't require any parameters." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "3513cf3d", - "metadata": {}, - "outputs": [], - "source": [ - "from keras.layers import Conv2D\n", - "\n", - "# Reshape the data to include the channel dimension\n", - "X_train = X_train.reshape(-1, 28, 28, 1)\n", - "X_test = X_test.reshape(-1, 28, 28, 1)\n", - "\n", - "# Create a simple CNN model\n", - "model = Sequential()\n", - "\n", - "# Train the model\n", - "\n", - "# Evaluate the model" - ] - }, - { - "cell_type": "markdown", - "id": "fabe379c", - "metadata": {}, - "source": [ - "Reflection: Did the CNN model perform better than the baseline model? If so, by how much? What do you think contributed to this improvement?\n", - "\n", - "**Your answer here**" - ] - }, - { - "cell_type": "markdown", - "id": "1a5e2463", - "metadata": {}, - "source": [ - "# 3. Designing and Running Controlled Experiments\n", - "\n", - "In this section, you will design and run controlled experiments to improve the model's performance. You will focus on one hyperparameter and one regularization technique.\n", - "You should:\n", - "- [ ] Choose one hyperparameter to experiment with (e.g., number of filters, kernel size, number of layers, etc.) and one regularization technique (e.g., dropout, L2 regularization). For your hyperparameter, you should choose at least three different values to test (but there is no upper limit). 
For your regularization technique, simply test the presence or absence of the technique.\n", - "- [ ] Run experiments by modifying the model architecture or hyperparameters, and evaluate the performance of each model on the test set.\n", - "- [ ] Record the results of your experiments, including the test accuracy and any other relevant metrics.\n", - "- [ ] Visualize the results of your experiments using plots or tables to compare the performance of different models.\n", - "\n", - "The best way to run your experiments is to create a `for` loop that iterates over a range of values for the hyperparameter you are testing. For example, if you are testing different numbers of filters, you can create a loop that runs the model with 32, 64, and 128 filters. Within the loop, you can compile and train the model, then evaluate it on the test set. After each iteration, you can store the results in a list or a dictionary for later analysis.\n", - "\n", - "Note: It's critical that you re-initialize the model (by creating a new instance of the model) before each experiment. If you don't, the model will retain the weights from the previous experiment, which can lead to misleading results." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "99d6f46c", - "metadata": {}, - "outputs": [], - "source": [ - "# A. Test Hyperparameters" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "dc43ac81", - "metadata": {}, - "outputs": [], - "source": [ - "# B. Test presence or absence of regularization" - ] - }, - { - "cell_type": "markdown", - "id": "cb426f26", - "metadata": {}, - "source": [ - "Reflection: Report on the performance of the models you tested. Did any of the changes you made improve the model's performance? If so, which ones? What do you think contributed to these improvements? 
Finally, what combination of hyperparameters and regularization techniques yielded the best performance?\n", - "\n", - "**Your answer here**" - ] - }, - { - "cell_type": "markdown", - "id": "46c43a3d", - "metadata": {}, - "source": [ - "# 5. Training Final Model and Evaluation\n", - "\n", - "In this section, you will train the final model using the best hyperparameters and regularization techniques you found in the previous section. You should:\n", - "- [ ] Compile the final model with the best hyperparameters and regularization techniques.\n", - "- [ ] Train the final model on the training set and evaluate it on the test set.\n", - "- [ ] Report the final model's performance on the test set, including accuracy and any other relevant metrics." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "31f926d1", - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "id": "a01f8ebc", - "metadata": {}, - "source": [ - "Reflection: How does the final model's performance compare to the baseline and the CNN model? What do you think contributed to the final model's performance? If you had time, what other experiments would you run to further improve the model's performance?\n", - "\n", - "**Your answer here**" - ] - }, - { - "cell_type": "markdown", - "id": "01db8512", - "metadata": {}, - "source": [ - "🚨 **Please review our [Assignment Submission Guide](https://github.com/UofT-DSI/onboarding/blob/main/onboarding_documents/submissions.md)** 🚨 for detailed instructions on how to format, branch, and submit your work. 
Following these guidelines is crucial for your submissions to be evaluated correctly.\n", - "### Submission Parameters:\n", - "* Submission Due Date: `23:59 PM - 26/10/2025`\n", - "* The branch name for your repo should be: `assignment-1`\n", - "* What to submit for this assignment:\n", - " * This Jupyter Notebook (assignment_1.ipynb)\n", - " * The Lab 1 notebook (labs/lab_1.ipynb)\n", - " * The Lab 2 notebook (labs/lab_2.ipynb)\n", - " * The Lab 3 notebook (labs/lab_3.ipynb)\n", - "* What the pull request link should look like for this assignment: `https://github.com//deep_learning/pull/`\n", - "* Open a private window in your browser. Copy and paste the link to your pull request into the address bar. Make sure you can see your pull request properly. This helps the technical facilitator and learning support staff review your submission easily.\n", - "Checklist:\n", - "- [ ] Created a branch with the correct naming convention.\n", - "- [ ] Ensured that the repository is public.\n", - "- [ ] Reviewed the PR description guidelines and adhered to them.\n", - "- [ ] Verify that the link is accessible in a private browser window.\n", - "If you encounter any difficulties or have questions, please don't hesitate to reach out to our team via our Slack at `#cohort-7-help-ml`. Our Technical Facilitators and Learning Support staff are here to help you navigate any challenges." - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "deep_learning", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.12.11" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file