GitHub - BrianMburu/Nearest-Neighbour-Descent--HNSW-Algorithm-: NN-Descent Implementation in python using the nmslib library

HNSW Algorithm

This repository contains an implementation of the HNSW algorithm for approximate nearest neighbor search using the nmslib library.

Overview

The HNSW algorithm is an approximate nearest neighbor search algorithm that constructs a hierarchical graph structure to index high-dimensional data points. This implementation of the HNSW algorithm is based on the NN-Descent algorithm and uses the cosine similarity metric.

Requirements

Python 3.x
nmslib

Installation

To install nmslib, run the following command: pip install nmslib

Usage

The HNSW class provides an implementation of the HNSW algorithm. The class has three main methods:

init: Initializes an instance of the HNSW class.
initialize: Builds an index for the HNSW algorithm based on known embeddings.
get_closest_neighbour: Finds the closest neighbor to a query embedding in the HNSW index.
test_performance: Tests the performance of the HNSW algorithm on a test dataset.

Example

import numpy as np
from hnsw import HNSW

# Initialize the HNSW class

hnsw = HNSW()

# Build an index using known embeddings

known_embeddings = [...] # A list of NumPy arrays
hnsw.initialize(known_embeddings)
known_labels = [...] # A list of labels corresponding to the embeddings

# Query the index for the closest neighbor to a test embedding

test_embedding = [...] # A NumPy array
closest_label, closest_distance = hnsw.get_closest_neighbour(test_embedding, known_labels)

# Test the performance of the HNSW algorithm on a test dataset

test_embeddings = [...] # A list of NumPy arrays
test_labels = [...]    # A list of labels corresponding to the embeddings
accuracy, mis_labeled, total_time = hnsw.test_performance(test_embeddings, test_labels, known_labels)

Time Complexity

init: O(1)
initialize: O(n log n), where n is the number of known embeddings.
get_closest_neighbour: O(log n), where n is the number of known embeddings.
test_performance: O(m log n), where m is the number of test embeddings and n is the number of known embeddings.

It's worth noting that these time complexities are based on the performance of the underlying nmslib library and assume a cosine similarity distance metric. Other distance metrics may have different time complexities.

References

Malkov, Yu A., and D. A. Yashunin. "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs." IEEE Transactions on Pattern Analysis and Machine Intelligence (2018).
Nmslib: https://github.com/nmslib/nmslib

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
examples		examples
LICENSE		LICENSE
README.md		README.md
hnsw.py		hnsw.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HNSW Algorithm

Overview

Requirements

Installation

Usage

Example

Time Complexity

References

About

Releases

Packages

Languages

License

BrianMburu/Nearest-Neighbour-Descent--HNSW-Algorithm-

Folders and files

Latest commit

History

Repository files navigation

HNSW Algorithm

Overview

Requirements

Installation

Usage

Example

Time Complexity

References

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages