Implement a simple neural network that extracts image embeddings from scratch. This is an exploratory project for learning how things work under the hood.

bnkf1156f/Image-Embeddings-From-Scratch

Image Embeddings From Scratch

A hands-on implementation of a Siamese Neural Network trained with Contrastive Learning to generate embeddings for handwritten digit images. This project demonstrates how to build embeddings from the ground up without transformers or other complex architectures, using only a small fully connected network.

🎯 Project Goal

Train a neural network that generates embeddings (coordinate vectors in multi-dimensional space) for images such that:

  • Similar images (e.g., two different "3" digits) have embeddings that are close together
  • Different images (e.g., a "3" and a "7") have embeddings that are far apart

This fundamental concept powers many modern AI applications like RAG (Retrieval-Augmented Generation), similarity search, and recommendation systems.

📚 What Are Embeddings?

Embeddings are arrays of floating-point numbers that represent coordinates in a multi-dimensional space. Think of them like latitude and longitude for locating points on Earth, but in hundreds of dimensions!

Why Are They Important?

Embeddings enable machines to understand similarity and context. For example:

  • In RAG systems: Find relevant documents by comparing query embeddings with document embeddings
  • In recommendation systems: Suggest similar items based on embedding proximity
  • In image search: Find visually similar images
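
As a rough illustration of "embedding proximity", the sketch below runs a nearest-neighbour lookup over a handful of toy embeddings. The `nearest` helper, the `catalog` names, and the 3-dimensional vectors are all made up for this example; real embeddings from this project would have 128 dimensions.

```python
import math

def nearest(query, candidates):
    """Return (name, distance) of the candidate embedding closest to `query`."""
    best_name, best_dist = None, math.inf
    for name, vector in candidates.items():
        dist = math.dist(query, vector)  # Euclidean distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist

# Toy 3-dimensional embeddings (hypothetical values, for illustration only).
catalog = {
    "digit_3_a": [0.9, 0.1, 0.0],
    "digit_3_b": [0.8, 0.2, 0.1],
    "digit_7_a": [0.0, 0.9, 0.8],
}

query = [0.9, 0.12, 0.0]
name, dist = nearest(query, catalog)
print(name, round(dist, 3))
```

The same loop is what a similarity search or a recommender does conceptually; production systems usually replace the linear scan with an approximate nearest-neighbour index.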

🏗️ Architecture: Siamese Network

This project implements a Siamese Network with Contrastive Loss, based on the paper: Dimensionality Reduction by Learning an Invariant Mapping

Network Structure

┌─────────────┐         ┌─────────────┐
│  Input 1    │         │  Input 2    │
│  (784 dims) │         │  (784 dims) │
└──────┬──────┘         └──────┬──────┘
       │                       │
       └───────────┬───────────┘
                   │
            ┌──────▼──────┐
            │   Shared    │
            │   Network   │
            │  (512→256→  │
            │    128)     │
            └──────┬──────┘
                   │
       ┌───────────┴───────────┐
       │                       │
┌──────▼──────┐         ┌──────▼──────┐
│ Embedding 1 │         │ Embedding 2 │
│    (128)    │         │    (128)    │
└──────┬──────┘         └──────┬──────┘
       │                       │
       └───────────┬───────────┘
                   │
            ┌──────▼──────┐
            │  Distance   │
            │ Calculation │
            └──────┬──────┘
                   │
                Output
            (Similar/Different)

Key Components:

  1. Twin Inputs: Two image inputs feed into the network
  2. Shared Weights: Both inputs pass through the same neural network (weight sharing)
  3. Embedding Generation: The network outputs 128-dimensional embeddings
  4. Distance Calculation: Euclidean distance measures similarity between embeddings
  5. Contrastive Loss: Trains the network to pull similar pairs together and to push different pairs apart until they are at least a margin away from each other

🔬 Implementation Details

Dataset

  • MNIST: Handwritten digits (0-9)
  • Training set: 60,000 images (28×28 pixels)
  • Test set: 10,000 images

Data Preprocessing

  1. Reshape: Convert 28×28 images to 784-dimensional vectors
  2. Normalize: Scale pixel values from [0, 255] to [0, 1] by dividing by 255 (min-max normalization)
  3. Pair Generation: Create 120,000 training pairs
    • Positive pairs (label=0): Same digit images
    • Negative pairs (label=1): Different digit images
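
The preprocessing steps above can be sketched with NumPy as follows. This is an assumption-laden sketch, not the notebook's code: `preprocess` and `make_pairs` are hypothetical helper names, the input is random data standing in for MNIST, and the "one positive plus one negative pair per image" scheme is just one plausible way to get 120,000 pairs from 60,000 images.

```python
import numpy as np

def preprocess(images):
    """Flatten 28x28 uint8 images to 784-d float vectors scaled to [0, 1]."""
    flat = images.reshape(len(images), -1).astype("float32")
    return flat / 255.0

def make_pairs(x, y, rng):
    """Build one positive (label 0) and one negative (label 1) pair per image."""
    by_class = {c: np.flatnonzero(y == c) for c in np.unique(y)}
    classes = list(by_class)
    pairs, labels = [], []
    for i in range(len(x)):
        # Positive pair: a random image of the same digit (may be image i itself).
        j = rng.choice(by_class[y[i]])
        pairs.append((x[i], x[j]))
        labels.append(0)
        # Negative pair: a random image of a randomly chosen different digit.
        other = rng.choice([c for c in classes if c != y[i]])
        k = rng.choice(by_class[other])
        pairs.append((x[i], x[k]))
        labels.append(1)
    return np.array(pairs), np.array(labels, dtype="float32")

# Synthetic stand-in for MNIST so the sketch runs without downloading data.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
digits = np.arange(100) % 10  # every digit appears 10 times

x = preprocess(images)
pairs, labels = make_pairs(x, digits, rng)
print(x.shape, pairs.shape, labels.shape)
```

With this scheme the number of pairs is always twice the number of images, and the labels alternate between 0 and 1, so the two classes stay balanced.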

Network Architecture

Input (784) → Dense(512, ReLU) → Dense(256, ReLU) → Dense(128) → Embedding
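
To make the weight sharing concrete, here is a minimal NumPy forward pass through the layer sizes listed above. It is a sketch only: the project itself uses Keras, the weights here are random and untrained, and the names `embed` and `siamese_distance` are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
LAYER_SIZES = [784, 512, 256, 128]

# One set of weights, reused for both inputs: this is the "shared network".
weights = [
    rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
    for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])
]
biases = [np.zeros(n_out) for n_out in LAYER_SIZES[1:]]

def embed(x):
    """Map a batch of 784-d vectors to 128-d embeddings."""
    h = x
    for i, (w, b) in enumerate(zip(weights, biases)):
        h = h @ w + b
        if i < len(weights) - 1:  # ReLU on the two hidden layers only
            h = np.maximum(h, 0.0)
    return h

def siamese_distance(x1, x2):
    """Euclidean distance between the embeddings of two batches."""
    e1, e2 = embed(x1), embed(x2)
    return np.sqrt(np.sum((e1 - e2) ** 2, axis=1))

batch_a = rng.random((4, 784))
batch_b = rng.random((4, 784))
print(embed(batch_a).shape)
print(siamese_distance(batch_a, batch_b))
```

Because both inputs go through the same `embed` function, identical images always map to identical embeddings and therefore to a distance of zero.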

Contrastive Loss Function

The loss function balances two objectives:

  • For similar pairs (y=0): Minimize embedding distance
  • For dissimilar pairs (y=1): Maximize embedding distance (up to a margin)

Loss = (1-y) × 0.5 × d² + y × 0.5 × max(0, margin - d)²

Where:

  • y = ground truth label (0 for similar, 1 for different)
  • d = Euclidean distance between embeddings
  • margin = minimum distance threshold for dissimilar pairs (default: 1.0)
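
The formula above translates almost directly into code. The sketch below evaluates it for a single pair in plain Python; `contrastive_loss` is a name chosen for this example, and the distances passed in are illustrative values rather than results from the notebook.

```python
def contrastive_loss(y, d, margin=1.0):
    """Loss for one pair: y=0 similar, y=1 dissimilar, d = Euclidean distance."""
    similar_term = (1 - y) * 0.5 * d ** 2
    dissimilar_term = y * 0.5 * max(0.0, margin - d) ** 2
    return similar_term + dissimilar_term

print(contrastive_loss(0, 0.06))  # similar pair, already close: small loss
print(contrastive_loss(0, 1.05))  # similar pair, far apart: large loss
print(contrastive_loss(1, 0.06))  # dissimilar pair, too close: large loss
print(contrastive_loss(1, 1.05))  # dissimilar pair beyond the margin: zero loss
```

Note that a dissimilar pair stops contributing to the loss once its distance exceeds the margin, which is why distances are pushed apart only "up to a margin".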

Training Configuration

  • Optimizer: Adam
  • Batch size: 32
  • Epochs: 5
  • Metric: Binary accuracy
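
Binary accuracy on a distance output can be read as follows, assuming the predicted distance is thresholded at 0.5 (the default threshold of Keras's binary accuracy metric) and compared with the pair label. Whether the notebook relies on that default is an assumption here; the `binary_accuracy` helper and the sample distances are hypothetical.

```python
def binary_accuracy(labels, distances, threshold=0.5):
    """Fraction of pairs whose thresholded distance matches the label."""
    # Distance above the threshold -> predicted "different" (1), else "similar" (0).
    predictions = [1 if d > threshold else 0 for d in distances]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)

labels = [0, 0, 1, 1]
distances = [0.06, 0.70, 1.05, 0.30]  # made-up distances for four pairs
acc = binary_accuracy(labels, distances)
print(acc)
```

In this toy batch the second and fourth pairs fall on the wrong side of the threshold, so half of the predictions are counted as correct.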

🚀 Getting Started

Prerequisites

pip install -r requirements.txt

Main dependencies:

  • TensorFlow (backend)
  • Keras 3.x
  • NumPy
  • Matplotlib

Running the Code

  1. Navigate to the project directory:

    cd embeddings_from_scratch
  2. Open the Jupyter notebook:

    jupyter notebook code/main_code.ipynb
  3. Run all cells sequentially to:

    • Load and preprocess MNIST data
    • Generate image pairs
    • Build the Siamese network
    • Train the model
    • Evaluate performance
    • Generate embeddings for new images

📊 Results

The trained model achieves:

  • High accuracy in distinguishing similar vs. different digit pairs
  • Clear separation in embedding space between different digit classes
  • Close clustering of embeddings for the same digit class

Visualization

The notebook includes visualizations for:

  • Sample image pairs (positive and negative)
  • Training/validation loss curves
  • Prediction results with color-coded accuracy
  • Distance measurements between embeddings

🧪 Model Evaluation

After training, you can:

  1. Test on image pairs: Predict whether two images are similar or different
  2. Generate embeddings: Extract 128-dimensional vectors for any digit image
  3. Measure distances: Calculate Euclidean distances between embeddings
    • Small distance → Similar images
    • Large distance → Different images

Example output:

Distance between similar images (both "1"): ≈ 0.06
Distance between different images ("1" vs "7"): 1.05
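
When comparing more than two images, it can help to compute all pairwise distances at once. The following sketch does this with NumPy broadcasting; the `pairwise_distances` helper and the 2-dimensional embeddings are made up for illustration and are not the model's actual outputs.

```python
import numpy as np

def pairwise_distances(embeddings):
    """Return an (n, n) matrix of Euclidean distances between n embeddings."""
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

# Made-up 2-d embeddings: two "1"s close together and one "7" further away.
embeddings = np.array([
    [0.0, 0.0],  # "1"
    [0.0, 0.5],  # another "1"
    [3.0, 4.0],  # "7"
])

dist = pairwise_distances(embeddings)
print(np.round(dist, 2))
```

Entry `dist[i, j]` holds the distance between images `i` and `j`, so the matrix is symmetric with zeros on the diagonal.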

📁 Project Structure

embeddings_from_scratch/
├── code/
│   └── main_code.ipynb          # Main implementation notebook
├── images/                       # Project images/diagrams
├── notes for understanding.md    # Detailed concept notes
├── requirements.txt              # Python dependencies
└── README.md                     # This file

🎓 Learning Outcomes

By working through this project, you'll understand:

  1. Embeddings Fundamentals: How to represent complex data as vectors
  2. Siamese Networks: Weight sharing and twin architectures
  3. Contrastive Learning: Training with positive and negative pairs
  4. Distance Metrics: Using Euclidean distance for similarity
  5. Loss Functions: How contrastive loss guides embedding learning
  6. Practical Applications: Foundation for RAG, similarity search, and more

📝 Notes

  • The network uses TensorFlow as the backend for Keras
  • Embedding dimension is set to 128 (can be adjusted based on your needs)
  • The model learns to map images into a space where similarity = proximity



Happy Learning! 🚀 If you find this project helpful, please consider giving it a star ⭐
