Image Embeddings From Scratch

A hands-on implementation of a Siamese neural network trained with contrastive learning to generate embeddings for handwritten digit images. This project demonstrates how to build embeddings from the ground up without transformers or complex architectures, using only a small fully connected network.

🎯 Project Goal

Train a neural network that generates embeddings (coordinate vectors in multi-dimensional space) for images such that:

Similar images (e.g., two different "3" digits) have embeddings that are close together
Different images (e.g., a "3" and a "7") have embeddings that are far apart

This fundamental concept powers many modern AI applications like RAG (Retrieval-Augmented Generation), similarity search, and recommendation systems.

📚 What Are Embeddings?

Embeddings are arrays of floating-point numbers that represent coordinates in a multi-dimensional space. Think of them like latitude and longitude for locating points on Earth, but with many more dimensions (128 in this project).
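
To make the coordinate analogy concrete, here is a minimal sketch with made-up 3-dimensional vectors (the names `three_a`, `three_b`, and `seven` and all values are hypothetical, not output from the trained model). It only illustrates that "close" and "far" are ordinary straight-line distances.

```python
import math

# Toy 3-dimensional "embeddings"; the values are invented for illustration.
# Real embeddings in this project have 128 dimensions.
three_a = [0.9, 0.1, 0.2]
three_b = [0.8, 0.2, 0.1]
seven = [0.1, 0.9, 0.7]

def euclidean(u, v):
    """Straight-line distance between two points in the embedding space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

d_same = euclidean(three_a, three_b)  # two "3" images: small distance
d_diff = euclidean(three_a, seven)    # a "3" and a "7": larger distance
print(f"3 vs 3: {d_same:.3f}")
print(f"3 vs 7: {d_diff:.3f}")
```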

Why Are They Important?

Embeddings enable machines to understand similarity and context. For example:

In RAG systems: Find relevant documents by comparing query embeddings with document embeddings
In recommendation systems: Suggest similar items based on embedding proximity
In image search: Find visually similar images
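
All three use cases reduce to the same operation: rank stored embeddings by distance to a query embedding. The following is a rough sketch of that lookup, assuming a small in-memory array and Euclidean distance; the function name `nearest` and the data are hypothetical, and production systems typically use an approximate nearest-neighbor index instead of a full scan.

```python
import numpy as np

def nearest(query, index, k=2):
    """Return the positions and distances of the k stored embeddings
    closest to `query`, using Euclidean distance."""
    dists = np.linalg.norm(index - query, axis=1)
    order = np.argsort(dists)[:k]
    return order, dists[order]

# Hypothetical 2-D embeddings for four stored items (values made up).
index = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
query = np.array([0.02, 0.1])

ids, dists = nearest(query, index, k=2)
print(ids, dists)  # the two items near the origin rank first
```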

🏗️ Architecture: Siamese Network

This project implements a Siamese Network with Contrastive Loss, based on the paper Dimensionality Reduction by Learning an Invariant Mapping (Hadsell, Chopra, and LeCun, 2006).

Network Structure

┌─────────────┐         ┌─────────────┐
│  Input 1    │         │  Input 2    │
│  (784 dims) │         │  (784 dims) │
└──────┬──────┘         └──────┬──────┘
       │                       │
       └───────────┬───────────┘
                   │
            ┌──────▼──────┐
            │   Shared    │
            │   Network   │
            │  (512→256→  │
            │    128)     │
            └──────┬──────┘
                   │
       ┌───────────┴───────────┐
       │                       │
┌──────▼──────┐         ┌──────▼──────┐
│ Embedding 1 │         │ Embedding 2 │
│    (128)    │         │    (128)    │
└──────┬──────┘         └──────┬──────┘
       │                       │
       └───────────┬───────────┘
                   │
            ┌──────▼──────┐
            │  Distance   │
            │ Calculation │
            └──────┬──────┘
                   │
                Output
        (Similar/Different)

Key Components:

Twin Inputs: Two image inputs feed into the network
Shared Weights: Both inputs pass through the same neural network (weight sharing)
Embedding Generation: The network outputs 128-dimensional embeddings
Distance Calculation: Euclidean distance measures similarity between embeddings
Contrastive Loss: Trains the network to pull similar pairs together and to push different pairs apart until they are at least a margin apart
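
The components above can be sketched as a plain NumPy forward pass. This is an illustration of weight sharing only, not the notebook's Keras code: the weights are random and untrained, and the helper names `embed` and `pair_distance` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# One set of weights, reused for both inputs. Random and untrained here.
sizes = [784, 512, 256, 128]
weights = [rng.normal(0, 0.05, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def embed(x):
    """Map a batch of flattened images (N, 784) to embeddings (N, 128)."""
    h = x
    for i, (w, b) in enumerate(zip(weights, biases)):
        h = h @ w + b
        if i < len(weights) - 1:  # ReLU on hidden layers, linear output layer
            h = np.maximum(h, 0.0)
    return h

def pair_distance(x1, x2):
    """Euclidean distance between the embeddings of two image batches.
    Both inputs go through the same `embed`, which is the weight sharing."""
    return np.linalg.norm(embed(x1) - embed(x2), axis=1)

x1 = rng.random((4, 784))
x2 = rng.random((4, 784))
print(embed(x1).shape)        # (4, 128)
print(pair_distance(x1, x2))  # one distance per pair
print(pair_distance(x1, x1))  # zeros: same input, same weights
```

Because both branches call the same function with the same parameters, identical inputs always produce a distance of zero, which is the property that makes the distance meaningful as a similarity score.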

🔬 Implementation Details

Dataset

MNIST: Handwritten digits (0-9)
Training set: 60,000 images (28×28 pixels)
Test set: 10,000 images

Data Preprocessing

Reshape: Convert 28×28 images to 784-dimensional vectors
Normalize: Scale pixel values from [0, 255] to [0, 1] using Min-Max normalization
Pair Generation: Create 120,000 training pairs
- Positive pairs (label=0): Same digit images
- Negative pairs (label=1): Different digit images
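
A possible version of these three steps is sketched below. It assumes one positive and one negative pair per image, which is one scheme that turns 60,000 images into 120,000 pairs; the notebook's actual pairing logic may differ. The helper names `preprocess` and `make_pairs` are hypothetical, and random arrays stand in for MNIST so the sketch runs without a download.

```python
import numpy as np

def preprocess(images):
    """Flatten (N, 28, 28) uint8 images to (N, 784) floats in [0, 1]."""
    return images.reshape(len(images), -1).astype("float32") / 255.0

def make_pairs(x, y, rng):
    """For each image, build one same-digit pair (label 0) and one
    different-digit pair (label 1), so N images give 2N pairs.
    Note: the same-digit partner may occasionally be the image itself."""
    by_digit = {d: np.flatnonzero(y == d) for d in np.unique(y)}
    classes = list(by_digit)
    left, right, labels = [], [], []
    for i in range(len(x)):
        same = rng.choice(by_digit[y[i]])
        other_class = rng.choice([d for d in classes if d != y[i]])
        diff = rng.choice(by_digit[other_class])
        left += [x[i], x[i]]
        right += [x[same], x[diff]]
        labels += [0, 1]
    return np.array(left), np.array(right), np.array(labels, dtype="float32")

# Synthetic stand-in for MNIST: 50 random "images", 5 per digit.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(50, 28, 28), dtype=np.uint8)
targets = np.arange(50) % 10

x = preprocess(images)
x1, x2, pair_labels = make_pairs(x, targets, rng)
print(x1.shape, x2.shape, pair_labels.shape)  # (100, 784) (100, 784) (100,)
```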

Network Architecture

Input (784) → Dense(512, ReLU) → Dense(256, ReLU) → Dense(128) → Embedding
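
As a quick sanity check on the size of this network, the trainable parameter count can be worked out by hand. The sketch assumes every Dense layer has a bias vector (the Keras default). Because the two branches share weights, the count is not doubled.

```python
layer_sizes = [784, 512, 256, 128]

def dense_params(n_in, n_out):
    """Weights plus biases for one fully connected layer."""
    return n_in * n_out + n_out

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(per_layer)
print(per_layer)  # [401920, 131328, 32896]
print(total)      # 566144
```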

Contrastive Loss Function

The loss function balances two objectives:

For similar pairs (y=0): Minimize embedding distance
For dissimilar pairs (y=1): Push embeddings apart until their distance reaches the margin; pairs already farther apart than the margin contribute no loss

Loss = (1-y) × 0.5 × d² + y × 0.5 × max(0, margin - d)²

Where:

y = ground truth label (0 for similar, 1 for different)
d = Euclidean distance between embeddings
margin = minimum distance threshold for dissimilar pairs (default: 1.0)
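
The formula translates directly into a few lines of NumPy. This is a sketch rather than the notebook's Keras loss: it takes precomputed distances, and averaging over the batch is an assumption. The example distances reuse the illustrative values 0.06 and 1.05 quoted in the Model Evaluation section.

```python
import numpy as np

def contrastive_loss(y, d, margin=1.0):
    """Mean contrastive loss over a batch.
    y: 0 for similar pairs, 1 for different pairs.
    d: Euclidean distance between the two embeddings of each pair."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    similar_term = (1.0 - y) * 0.5 * d**2
    dissimilar_term = y * 0.5 * np.maximum(0.0, margin - d) ** 2
    return float(np.mean(similar_term + dissimilar_term))

# Similar pair that is already close: small loss, about 0.0018.
print(contrastive_loss([0], [0.06]))
# Different pair beyond the margin: no loss.
print(contrastive_loss([1], [1.05]))
# Different pair that is too close: penalized, about 0.32.
print(contrastive_loss([1], [0.20]))
```

The `max(0, margin - d)` term is what stops the network from pushing dissimilar pairs apart indefinitely: once a pair clears the margin, its gradient is zero.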

Training Configuration

Optimizer: Adam
Batch size: 32
Epochs: 5
Metric: Binary accuracy
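
Binary accuracy on a distance output can be read as follows, assuming the model outputs the raw distance and the metric uses a 0.5 threshold (the default for Keras binary accuracy). Whether the notebook uses exactly this setup is an assumption; the function below is a hypothetical NumPy equivalent, not the Keras metric itself.

```python
import numpy as np

def binary_accuracy(labels, distances, threshold=0.5):
    """Fraction of pairs where thresholding the distance matches the label.
    A distance above the threshold is read as 1 (different), otherwise 0 (similar)."""
    predicted = (np.asarray(distances) > threshold).astype(float)
    return float(np.mean(predicted == np.asarray(labels, dtype=float)))

labels = [0, 0, 1, 1]                 # two similar pairs, two different pairs
distances = [0.06, 0.70, 1.05, 0.30]  # made-up model outputs
print(binary_accuracy(labels, distances))  # 0.5: the 2nd and 4th pairs are misclassified
```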

🚀 Getting Started

Prerequisites

pip install -r requirements.txt

Main dependencies:

TensorFlow (backend)
Keras 3.x
NumPy
Matplotlib

Running the Code

Navigate to the project directory:
```
cd embeddings_from_scratch
```
Open the Jupyter notebook:
```
jupyter notebook code/main_code.ipynb
```
Run all cells sequentially to:
- Load and preprocess MNIST data
- Generate image pairs
- Build the Siamese network
- Train the model
- Evaluate performance
- Generate embeddings for new images

📊 Results

The trained model achieves:

High accuracy in distinguishing similar vs. different digit pairs
Clear separation in embedding space between different digit classes
Close clustering of embeddings for the same digit class

Visualization

The notebook includes visualizations for:

Sample image pairs (positive and negative)
Training/validation loss curves
Prediction results with color-coded accuracy
Distance measurements between embeddings

🧪 Model Evaluation

After training, you can:

Test on image pairs: Predict whether two images are similar or different
Generate embeddings: Extract 128-dimensional vectors for any digit image
Measure distances: Calculate Euclidean distances between embeddings
- Small distance → Similar images
- Large distance → Different images

Example output:

Distance between similar images (both "1"): approx. 0.06
Distance between different images ("1" vs "7"): 1.05

📁 Project Structure

embeddings_from_scratch/
├── code/
│   └── main_code.ipynb          # Main implementation notebook
├── images/                       # Project images/diagrams
├── notes for understanding.md    # Detailed concept notes
├── requirements.txt              # Python dependencies
└── README.md                     # This file

🎓 Learning Outcomes

By working through this project, you'll understand:

Embeddings Fundamentals: How to represent complex data as vectors
Siamese Networks: Weight sharing and twin architectures
Contrastive Learning: Training with positive and negative pairs
Distance Metrics: Using Euclidean distance for similarity
Loss Functions: How contrastive loss guides embedding learning
Practical Applications: Foundation for RAG, similarity search, and more

🔗 Useful Resources

Paper: Dimensionality Reduction by Learning an Invariant Mapping
Video Tutorial: YouTube - Building Embeddings from Scratch
Visualize Embeddings: Cohere Playground
Related Concepts: Losses Explained: Contrastive Loss

📝 Notes

The network uses TensorFlow as the backend for Keras
Embedding dimension is set to 128 (can be adjusted based on your needs)
The model learns to map images into a space where similarity = proximity

Happy Learning! 🚀 If you find this project helpful, please consider giving it a star ⭐
