Image Similarity Search API

The goal of this project was to develop an API for an eCommerce clothing store to prevent duplicate products from being added to their database. The system checks for similar images in the company's database and returns the top matches, helping to identify duplicate listings. The API takes an image URL as input and returns the most similar images from the database along with the match percentage.

When hosted on Elastic Beanstalk, the API lets users query for similar images using the following format:

http://<your-environment-url>/find_similar/?image_url=<image_url>&top=<number_of_results>

<your-environment-url>: The URL of your AWS Elastic Beanstalk environment.
<image_url>: The URL of the image you want to check for duplicates.
<number_of_results>: The number of top similar images to return (default is 1).

For example:

http://imagesearch3-dev.ap-south-1.elasticbeanstalk.com/find_similar/?image_url=https://d1it09c4puycyh.cloudfront.net/707x1000/catalog/product/6/6/6619-RED_1.jpg&top=3

The output of the API call would resemble this:

Creating the SageMaker Domain and Notebook

To begin, you will need to create a SageMaker domain and then set up a notebook instance to execute the code provided. Follow the instructions below to complete this setup:

Go to AWS SageMaker and create a SageMaker domain.
Create a notebook instance in the domain where you will execute the provided scripts.
Refer to the official AWS SageMaker documentation for detailed steps: AWS SageMaker Getting Started Guide.
Once the domain and notebook are set up, create a DynamoDB table and then proceed to use the similar_image_checker_runner_scripts/processing_template_phash.ipynb file.

Creating and populating the DynamoDB Table

You need to create a DynamoDB table to store the perceptual hashes (pHash) for each image. In all the scripts, I have used the hash_store table. Here’s the structure of the table:

Table Name: hash_store
Primary Key (Partition Key): entity_id (string) – a unique identifier for each image.
Attributes:
- sku: Stock Keeping Unit (SKU) for identifying the product.
- small_image: URL to the image.
- phash: Perceptual hash value of the image stored as a hexadecimal string.

The table structure depends on your dataset; the structure above matches the dataset provided in this repository. In the next steps, replace the table name in the repository scripts with your own table name.
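If you prefer to create the table from code rather than the console, the structure above could be expressed roughly as follows. This is a minimal sketch, not the repository's own script: the on-demand billing mode and the helper names create_hash_store and make_item are assumptions made for illustration, and the boto3 call needs valid AWS credentials.

```python
# Keyword arguments for boto3's DynamoDB create_table call.
# Only the key attribute is declared; sku, small_image and phash are
# ordinary attributes and need no schema entry.
TABLE_DEFINITION = {
    "TableName": "hash_store",
    "KeySchema": [{"AttributeName": "entity_id", "KeyType": "HASH"}],
    "AttributeDefinitions": [{"AttributeName": "entity_id", "AttributeType": "S"}],
    "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
}


def create_hash_store(region_name):
    """Create the table and wait until it exists (hypothetical helper)."""
    import boto3  # imported lazily so the definition can be inspected without boto3

    dynamodb = boto3.resource("dynamodb", region_name=region_name)
    table = dynamodb.create_table(**TABLE_DEFINITION)
    table.wait_until_exists()
    return table


def make_item(entity_id, sku, small_image, phash):
    """Shape one row the way the table description above lays it out."""
    return {
        "entity_id": str(entity_id),  # partition key, stored as a string
        "sku": sku,
        "small_image": small_image,   # URL to the image
        "phash": phash,               # hexadecimal pHash string
    }
```

Change "hash_store" here if you use a different table name.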

Once the table is created, you can use the SageMaker notebook environment to run similar_image_checker_runner_scripts/processing_template_phash.ipynb. This script processes images in batches and populates the DynamoDB table by generating a pHash for each image.
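To give a feel for what "generating a pHash" involves, here is a small pure-Python sketch of the commonly used DCT-based perceptual hash. It is not the notebook's code, which may well rely on an image library; it assumes the image has already been loaded, converted to grayscale, and resized to a 32x32 matrix of pixel values, and the function names are illustrative.

```python
import math
import statistics


def dct_low_freq(pixels, keep=8):
    """Top-left keep x keep block of the 2-D DCT-II of a square matrix."""
    n = len(pixels)
    basis = [[math.cos(math.pi * (2 * x + 1) * u / (2 * n)) for x in range(n)]
             for u in range(keep)]
    # Transform each row, keeping only the first `keep` horizontal frequencies.
    rows = [[sum(p * b for p, b in zip(row, basis[v])) for v in range(keep)]
            for row in pixels]
    # Transform the columns of that intermediate result.
    return [[sum(rows[y][v] * basis[u][y] for y in range(n)) for v in range(keep)]
            for u in range(keep)]


def phash_hex(pixels, keep=8):
    """Hash a grayscale matrix: one bit per low-frequency coefficient."""
    coeffs = [c for row in dct_low_freq(pixels, keep) for c in row]
    median = statistics.median(coeffs)
    bits = "".join("1" if c > median else "0" for c in coeffs)
    width = keep * keep // 4  # 64 bits -> 16 hexadecimal characters
    return format(int(bits, 2), f"0{width}x")


# Synthetic 32x32 "image" standing in for a resized grayscale product photo.
image = [[(x * 7 + y * 3) % 256 for x in range(32)] for y in range(32)]
print(phash_hex(image))
```

Because each bit only records whether a coefficient lies above the median, visually similar images tend to produce hashes that differ in few bits, which is what the later search step relies on.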

After processing all batches, a single item in the table would resemble this:

Optimizing Search with OpenSearch

Running a nearest neighbor search directly on DynamoDB can be inefficient and costly, as it requires scanning through each row. To optimize this, we use OpenSearch for fast and efficient searches using indexes.
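The cost of the naive approach is easy to see in a sketch: every query has to compare its hash against every stored hash. The example below assumes 64-bit hashes stored as hexadecimal strings and defines the match percentage as the share of identical bits; that definition is an assumption for illustration, not necessarily the formula the API uses.

```python
def hamming_distance(hash_a, hash_b):
    """Number of differing bits between two hexadecimal hash strings."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def match_percentage(hash_a, hash_b, bits=64):
    """Share of identical bits, as a percentage (assumed definition)."""
    return 100.0 * (bits - hamming_distance(hash_a, hash_b)) / bits


def brute_force_top(query_hash, items, top=1):
    """Rank all items by distance: one full pass over the data per query."""
    ranked = sorted(items, key=lambda item: hamming_distance(query_hash, item["phash"]))
    return [(item["entity_id"], match_percentage(query_hash, item["phash"]))
            for item in ranked[:top]]


# Hypothetical rows shaped like hash_store items.
rows = [
    {"entity_id": "1", "phash": "ffffffffffffffff"},
    {"entity_id": "2", "phash": "fffffffffffffffe"},
    {"entity_id": "3", "phash": "0000000000000000"},
]
print(brute_force_top("ffffffffffffffff", rows, top=2))
```

With a large catalog, this linear pass (and the DynamoDB reads behind it) is repeated for every request, which is the overhead an indexed search avoids.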

Setting Up OpenSearch

Create an OpenSearch domain using the following tutorial, which covers the steps under the free tier: OpenSearch Domain Creation Tutorial.
In the next steps, replace the index name in the repository scripts with your own index name.

Populating OpenSearch with Data from DynamoDB

After setting up OpenSearch, use the similar_image_checker_runner_scripts/dynamo_to_opensearch.ipynb notebook to populate your OpenSearch index with the data stored in DynamoDB. This notebook script extracts the items from DynamoDB, converts the pHash values into binary vectors, and pushes them into OpenSearch.

The script includes functions to:

Create the OpenSearch index with K-Nearest Neighbors (KNN) enabled for efficient similarity search.
Scan DynamoDB in batches and convert the pHash values into a vector representation.
Send bulk requests to populate OpenSearch with the converted data.
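As a rough illustration of those three steps, the sketch below builds an index body with k-NN enabled, turns a hexadecimal pHash into a 0/1 vector, and assembles a newline-delimited body for the bulk endpoint. The index name image_hashes, the field name phash_vector, and the 64-bit hash length are assumptions; the notebook's actual names and settings may differ.

```python
import json

INDEX_NAME = "image_hashes"    # hypothetical index name
VECTOR_FIELD = "phash_vector"  # hypothetical field name

# Index body: enable k-NN and declare a 64-dimensional vector field.
INDEX_BODY = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {VECTOR_FIELD: {"type": "knn_vector", "dimension": 64}}
    },
}


def phash_to_vector(phash_hex, bits=64):
    """Expand a hexadecimal hash into a list of 0/1 values, most significant bit first."""
    value = int(phash_hex, 16)
    return [(value >> shift) & 1 for shift in range(bits - 1, -1, -1)]


def build_bulk_body(items, index_name=INDEX_NAME):
    """Build the NDJSON payload for the bulk API: an action line, then a document line."""
    lines = []
    for item in items:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": item["entity_id"]}}))
        lines.append(json.dumps({
            "sku": item["sku"],
            "small_image": item["small_image"],
            VECTOR_FIELD: phash_to_vector(item["phash"]),
        }))
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline
```

The resulting string would be sent to the domain's _bulk endpoint with the Content-Type header set to application/x-ndjson.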

The view_opensearch_data() function displays 10 of the items added to your OpenSearch index. Use it to verify that the data was transferred correctly.

Once you finish running this script, your OpenSearch index will be populated with the pHash vectors from DynamoDB.
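With the index populated, a similarity lookup becomes a k-NN query against the vector field. The helper below only builds the request body; the field name phash_vector is an assumption carried over for illustration, and the function name is hypothetical.

```python
def build_knn_query(vector, top=1, field="phash_vector"):
    """Request body for an OpenSearch k-NN search returning the `top` nearest vectors."""
    return {
        "size": top,
        "query": {"knn": {field: {"vector": vector, "k": top}}},
    }


# Example: ask for the three nearest neighbours of an all-zero 64-bit hash.
print(build_knn_query([0] * 64, top=3))
```

The body would be posted to the _search endpoint of your index, using the same endpoint and credentials as the rest of the scripts.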

FastAPI Project Deployment on AWS Elastic Beanstalk

Overview

Now that your OpenSearch endpoint is set up, you can deploy the API as a FastAPI application on AWS Elastic Beanstalk. The sections below cover setting up the project locally, running the application, and deploying it to AWS Elastic Beanstalk.

Prerequisites

Before you begin, ensure you have the following installed:

Python 3.7 - 3.11
AWS CLI
AWS Elastic Beanstalk CLI (EB CLI)
Git

Local Setup

Clone the Repository

git clone https://github.com/Entro01/imageSearch.git
cd imageSearch

Create and Activate a Virtual Environment

python -m venv env
source env/bin/activate

Install Dependencies

Make sure requirements.txt is updated with all necessary dependencies. Install them using:
```
pip install -r requirements.txt
```
Configuration

Before deploying, you need to modify the application_phash.py file to configure the OpenSearch settings:
1. Open application_phash.py.
2. Locate the OpenSearch configuration section and update the opensearch_url and auth with your OpenSearch endpoint and authentication details.

OpenSearch configuration

opensearch_url = "https://your-opensearch-endpoint"
auth = HTTPBasicAuth('your-username', 'your-password')

For example:
```python
# OpenSearch configuration
opensearch_url = "https://search-imagehash-beqqt46rp2xv6agh7tohq5it7i.aos.us-east-1.on.aws"
auth = HTTPBasicAuth('admin', '1234')
```
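
If you would rather not keep credentials in the source file, one option is to read them from environment variables. This is only a sketch of that idea, not how application_phash.py is written: the variable names OPENSEARCH_URL, OPENSEARCH_USER and OPENSEARCH_PASSWORD and the helper name are hypothetical.

```python
import os


def load_opensearch_settings(env=None):
    """Read the endpoint and basic-auth credentials from environment variables."""
    env = os.environ if env is None else env
    url = env["OPENSEARCH_URL"].rstrip("/")  # drop a trailing slash, if any
    # requests accepts a (username, password) tuple wherever HTTPBasicAuth is accepted.
    auth = (env["OPENSEARCH_USER"], env["OPENSEARCH_PASSWORD"])
    return url, auth
```

A missing variable raises KeyError at startup, which is usually easier to diagnose than a failed request later on.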
Run the Application Locally
```
fastapi dev application_phash.py
```
Visit http://127.0.0.1:8000 in your browser to access the application.

Deploying to AWS Elastic Beanstalk

Configure AWS CLI

Ensure your AWS CLI is configured with your AWS credentials:
```
aws configure
```
Follow the prompts to enter your AWS Access Key ID, Secret Access Key, region, and output format.

Initialize Elastic Beanstalk

Run the following command in your project directory:
```
eb init
```
Follow the prompts to set up your Elastic Beanstalk application. Select the appropriate region, application name, and platform (Python).

Create an Environment and Deploy

To create an environment and deploy your application, run:
```
eb create your-environment-name
```
Replace your-environment-name with a name for your environment.

To deploy changes to an existing environment, use:
```
eb deploy
```
Open the Application

Once the deployment is complete, you can open your application in a web browser:
```
eb open
```

Common Issues

If the deployment fails, run eb logs to retrieve the environment logs and diagnose the issue.

Querying the Application

You can send queries to your deployed application to find similar images. Use the following URL format:

http://<your-environment-url>/find_similar/?image_url=<image_url>&top=<number_of_results>

    <your-environment-url>: Replace with the URL of your Elastic Beanstalk environment.
    <image_url>: The URL of the image you want to find similar images for.
    <number_of_results>: The number of KNN neighbors (results) to retrieve (default value: 1).

For example:

http://imagesearch3-dev.ap-south-1.elasticbeanstalk.com/find_similar/?image_url=https://d1it09c4puycyh.cloudfront.net/707x1000/catalog/product/6/6/6619-RED_1.jpg&top=3
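
When building the query URL from code, it helps to percent-encode the image URL so that any characters such as & or spaces inside it are not mistaken for part of the API's own query string. The helper below is a small sketch using only the standard library; build_query_url is a hypothetical name.

```python
from urllib.parse import urlencode


def build_query_url(environment_url, image_url, top=1):
    """Assemble a /find_similar/ request URL with properly encoded parameters."""
    query = urlencode({"image_url": image_url, "top": top})
    return f"{environment_url.rstrip('/')}/find_similar/?{query}"


print(build_query_url(
    "http://imagesearch3-dev.ap-south-1.elasticbeanstalk.com",
    "https://d1it09c4puycyh.cloudfront.net/707x1000/catalog/product/6/6/6619-RED_1.jpg",
    top=3,
))
```

The resulting URL can be opened in a browser or passed to any HTTP client.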
