A hands-on implementation of a Siamese Neural Network using Contrastive Learning to generate embeddings for handwritten digit images. This project demonstrates how to build embeddings from the ground up without transformers or other complex architectures, using only a simple fully connected neural network.
Train a neural network that generates embeddings (coordinate vectors in multi-dimensional space) for images such that:
- Similar images (e.g., two different "3" digits) have embeddings that are close together
- Different images (e.g., a "3" and a "7") have embeddings that are far apart
This fundamental concept powers many modern AI applications like RAG (Retrieval-Augmented Generation), similarity search, and recommendation systems.
Embeddings are arrays of floating-point numbers that represent coordinates in a multi-dimensional space. Think of them like latitude and longitude for locating points on Earth, but with many more dimensions (128 in this project).
Embeddings enable machines to understand similarity and context. For example:
- In RAG systems: Find relevant documents by comparing query embeddings with document embeddings
- In recommendation systems: Suggest similar items based on embedding proximity
- In image search: Find visually similar images
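To make these retrieval use cases concrete, here is a minimal sketch of similarity search over a handful of made-up 3-dimensional embeddings. The item names and vectors are invented for illustration (embeddings in this project have 128 dimensions), and `nearest` is a hypothetical helper, not something taken from the notebook.

```python
import math

# Toy "database" of embeddings; names and values are invented for illustration.
catalog = {
    "digit_3_a": [0.90, 0.10, 0.00],
    "digit_3_b": [0.85, 0.15, 0.05],
    "digit_7_a": [0.10, 0.90, 0.20],
}

def nearest(query, items):
    """Return (name, distance) of the stored embedding closest to `query`."""
    return min(
        ((name, math.dist(query, vec)) for name, vec in items.items()),
        key=lambda pair: pair[1],
    )

query = [0.88, 0.12, 0.02]  # pretend this is the embedding of a new "3"
name, distance = nearest(query, catalog)
print(name, round(distance, 3))
```

The same idea scales to RAG or recommendation: embed the query, then return the stored items with the smallest distance to it.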
This project implements a Siamese Network with Contrastive Loss, based on the paper: Dimensionality Reduction by Learning an Invariant Mapping
┌─────────────┐         ┌─────────────┐
│   Input 1   │         │   Input 2   │
│ (784 dims)  │         │ (784 dims)  │
└──────┬──────┘         └──────┬──────┘
       │                       │
       └───────────┬───────────┘
                   │
            ┌──────▼──────┐
            │   Shared    │
            │   Network   │
            │ (512→256→   │
            │    128)     │
            └──────┬──────┘
                   │
          ┌────────┴────────┐
          │   Embedding 1   │  Embedding 2
          │      (128)      │     (128)
          └────────┬────────┘
                   │
            ┌──────▼──────┐
            │  Distance   │
            │ Calculation │
            └─────────────┘
                   │
                 Output
          (Similar/Different)
Key Components:
- Twin Inputs: Two image inputs feed into the network
- Shared Weights: Both inputs pass through the same neural network (weight sharing)
- Embedding Generation: The network outputs 128-dimensional embeddings
- Distance Calculation: Euclidean distance measures similarity between embeddings
- Contrastive Loss: Trains the network to minimize the distance for similar pairs and to push different pairs apart, up to a margin
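The weight sharing listed above can be sketched without any framework: one set of layer parameters and one `encode` function, called twice. This is a plain-Python illustration with randomly initialised, untrained weights. The notebook itself builds its network with Keras, and the helper names here (`make_dense`, `dense`, `encode`) are hypothetical.

```python
import math
import random

def make_dense(n_in, n_out, rng):
    """Create (weights, biases) for one fully connected layer."""
    weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer, relu):
    """Apply one layer: weighted sum plus bias, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out

def encode(x, layers):
    """Shared encoder: ReLU on the hidden layers, linear output layer."""
    for i, layer in enumerate(layers):
        x = dense(x, layer, relu=(i < len(layers) - 1))
    return x

rng = random.Random(0)
sizes = [784, 512, 256, 128]  # input size and layer widths from the diagram
layers = [make_dense(a, b, rng) for a, b in zip(sizes, sizes[1:])]

# Random vectors stand in for two flattened, normalized images.
image_1 = [rng.random() for _ in range(784)]
image_2 = [rng.random() for _ in range(784)]

# Both inputs go through the SAME `layers`; that is what "shared weights" means.
embedding_1 = encode(image_1, layers)
embedding_2 = encode(image_2, layers)
distance = math.dist(embedding_1, embedding_2)
print(len(embedding_1), distance)
```

Because there is only one set of parameters, the same image always maps to the same embedding regardless of which "twin" it enters through.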
- MNIST: Handwritten digits (0-9)
- Training set: 60,000 images (28×28 pixels)
- Test set: 10,000 images
- Reshape: Convert 28×28 images to 784-dimensional vectors
- Normalize: Scale pixel values from [0, 255] to [0, 1] using Min-Max normalization
- Pair Generation: Create 120,000 training pairs
- Positive pairs (label=0): Same digit images
- Negative pairs (label=1): Different digit images
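The preprocessing and pairing steps above might look roughly like the sketch below. It uses small random images in place of MNIST so that it runs on its own. It also assumes one positive and one negative pair per image, which would yield 120,000 pairs from 60,000 images; whether the notebook pairs images exactly this way is an assumption. `flatten_and_scale` and `make_pairs` are hypothetical names.

```python
import random

def flatten_and_scale(image):
    """Flatten a 28x28 grid of 0-255 pixels into 784 values scaled to [0, 1]."""
    return [pixel / 255.0 for row in image for pixel in row]

def make_pairs(vectors, digits, rng):
    """Build one positive and one negative pair per image.

    Label 0 = same digit (positive pair), label 1 = different digit (negative pair).
    """
    by_digit = {}
    for idx, digit in enumerate(digits):
        by_digit.setdefault(digit, []).append(idx)
    classes = sorted(by_digit)

    pairs, labels = [], []
    for idx, digit in enumerate(digits):
        # Positive pair: another image of the same digit (may pick itself).
        partner = rng.choice(by_digit[digit])
        pairs.append((vectors[idx], vectors[partner]))
        labels.append(0)
        # Negative pair: an image of a randomly chosen different digit.
        other = rng.choice([c for c in classes if c != digit])
        partner = rng.choice(by_digit[other])
        pairs.append((vectors[idx], vectors[partner]))
        labels.append(1)
    return pairs, labels

rng = random.Random(42)
digits = [i % 10 for i in range(50)]  # stand-in labels, 5 images per digit
images = [[[rng.randint(0, 255) for _ in range(28)] for _ in range(28)] for _ in digits]

vectors = [flatten_and_scale(img) for img in images]
pairs, labels = make_pairs(vectors, digits, rng)
print(len(vectors[0]), len(pairs), labels[:4])
```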
Input (784) → Dense(512, ReLU) → Dense(256, ReLU) → Dense(128) → Embedding

The loss function balances two objectives:
- For similar pairs (y=0): Minimize embedding distance
- For dissimilar pairs (y=1): Maximize embedding distance (up to a margin)
Loss = (1-y) × 0.5 × d² + y × 0.5 × max(0, margin - d)²
Where:
- y = ground truth label (0 for similar, 1 for different)
- d = Euclidean distance between embeddings
- margin = minimum distance threshold for dissimilar pairs (default: 1.0)
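As a quick check of the formula, the sketch below evaluates the loss for a single pair in plain Python. The function name `contrastive_loss` and the example distances are illustrative; the notebook presumably implements a batched equivalent with Keras operations.

```python
def contrastive_loss(y, d, margin=1.0):
    """Contrastive loss for one pair.

    y: 0 for a similar pair, 1 for a dissimilar pair
    d: Euclidean distance between the two embeddings
    """
    similar_term = (1 - y) * 0.5 * d ** 2
    dissimilar_term = y * 0.5 * max(0.0, margin - d) ** 2
    return similar_term + dissimilar_term

# (description, label y, distance d); the values are made up for illustration.
examples = [
    ("similar, close", 0, 0.1),
    ("similar, far", 0, 1.0),
    ("dissimilar, close", 1, 0.2),
    ("dissimilar, beyond margin", 1, 1.5),
]
for name, y, d in examples:
    print(f"{name}: {contrastive_loss(y, d):.3f}")
```

Similar pairs are penalised more the farther apart they are, while dissimilar pairs are only penalised when they fall inside the margin. Once a dissimilar pair is at least `margin` apart, its loss is zero and it stops influencing training.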
- Optimizer: Adam
- Batch size: 32
- Epochs: 5
- Metric: Binary accuracy
pip install -r requirements.txt

Main dependencies:
- TensorFlow (backend)
- Keras 3.x
- NumPy
- Matplotlib
1. Navigate to the project directory:
   cd embeddings_from_scratch
2. Open the Jupyter notebook:
   jupyter notebook code/main_code.ipynb
3. Run all cells sequentially to:
- Load and preprocess MNIST data
- Generate image pairs
- Build the Siamese network
- Train the model
- Evaluate performance
- Generate embeddings for new images
The trained model achieves:
- High accuracy in distinguishing similar vs. different digit pairs
- Clear separation in embedding space between different digit classes
- Close clustering of embeddings for the same digit class
The notebook includes visualizations for:
- Sample image pairs (positive and negative)
- Training/validation loss curves
- Prediction results with color-coded accuracy
- Distance measurements between embeddings
After training, you can:
- Test on image pairs: Predict whether two images are similar or different
- Generate embeddings: Extract 128-dimensional vectors for any digit image
- Measure distances: Calculate Euclidean distances between embeddings
- Small distance → Similar images
- Large distance → Different images
Example output:
Distance between similar images (both "1"): ~0.06
Distance between different images ("1" vs "7"): 1.05
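Turning a distance into a similar/different prediction needs a cut-off. The sketch below uses a threshold of 0.5, which is an assumption (half of the default margin of 1.0) rather than a value taken from the notebook, and follows the label convention used for training (0 = similar, 1 = different). `predict_label` and `binary_accuracy` are hypothetical helpers.

```python
def predict_label(distance, threshold=0.5):
    """Return 0 (similar) if the distance is below the threshold, else 1 (different)."""
    return 0 if distance < threshold else 1

def binary_accuracy(distances, labels, threshold=0.5):
    """Fraction of pairs whose thresholded prediction matches the true label."""
    predictions = [predict_label(d, threshold) for d in distances]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

distances = [0.06, 1.05]  # the two example distances quoted above
labels = [0, 1]           # "1" vs "1" is similar, "1" vs "7" is different
predictions = [predict_label(d) for d in distances]
print(predictions, binary_accuracy(distances, labels))
```

In practice the threshold would be tuned on validation pairs rather than fixed in advance.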
embeddings_from_scratch/
├── code/
│ └── main_code.ipynb # Main implementation notebook
├── images/ # Project images/diagrams
├── notes for understanding.md # Detailed concept notes
├── requirements.txt # Python dependencies
└── README.md # This file
By working through this project, you'll understand:
- Embeddings Fundamentals: How to represent complex data as vectors
- Siamese Networks: Weight sharing and twin architectures
- Contrastive Learning: Training with positive and negative pairs
- Distance Metrics: Using Euclidean distance for similarity
- Loss Functions: How contrastive loss guides embedding learning
- Practical Applications: Foundation for RAG, similarity search, and more
- Paper: Dimensionality Reduction by Learning an Invariant Mapping
- Video Tutorial: YouTube - Building Embeddings from Scratch
- Visualize Embeddings: Cohere Playground
- Related Concepts: Losses Explained: Contrastive Loss
- Keras runs with TensorFlow as its backend
- Embedding dimension is set to 128 (can be adjusted based on your needs)
- The model learns to map images into a space where similarity = proximity
Happy Learning! 🚀 If you find this project helpful, please consider giving it a star ⭐