MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
Updated Jun 4, 2024 - Python
Tika-Similarity uses the Tika-Python package (a Python port of Apache Tika) to compute file similarity based on metadata features.
Compare HTML similarity using structural and style metrics
Golang metrics for calculating string similarity and other string utility functions
A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.
A job recommender system that takes skills from LinkedIn and job listings from Indeed and returns the best available jobs for your skills.
A package to compute medical segmentation metrics.
Easy-to-use Java similarity algorithms for text and numeric series
This is an implementation of the paper written by Yuhua Li, David McLean, Zuhair A. Bandar, James D. O’Shea, and Keeley Crockett
Explores the performance of Jaccard and cosine similarity, then visualises the output using k-means and k-means with PCA. Also covers time series analysis, web scraping, and Twitter scraping.
BagMinHash - Minwise Hashing Algorithm for Weighted Sets
Spark functions to run popular phonetic and string matching algorithms
SetSketch: Filling the Gap between MinHash and HyperLogLog
A collection of string comparison algorithms
Text similarity computation using MinHash and Jaccard distance on the Reuters dataset
ProbMinHash – A Class of Locality-Sensitive Hash Algorithms for the (Probability) Jaccard Similarity
Code for MinMax Circular Sector Arc for External Plagiarism’s Heuristic Retrieval Stage
Java implementation of a big-data recommender using Apache Spark
TreeMinHash: Fast Sketching for Weighted Jaccard Similarity Estimation
The evaluation of subjective answers has long been a challenge for educators, employers, and researchers. CheckMyAnswer, powered by machine learning algorithms, has emerged as a solution to this challenge.