aloobun/minhash_exp

minhash_lsh.py

The code finds near-duplicates in large datasets. MinHash efficiently approximates the Jaccard similarity between sets, and locality-sensitive hashing (LSH) is used to identify sets that have a high probability of being similar, so that not every pair of sets has to be compared directly.
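
To make the idea concrete, here is a minimal standard-library sketch of MinHash signatures plus LSH banding. It is not taken from `minhash_lsh.py`; the function names (`shingles`, `minhash_signature`, `estimate_jaccard`, `lsh_candidates`), the word 3-gram shingling, and the defaults of 128 hash functions and 32 bands are all assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of word k-grams in text (hypothetical tokenization)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(token, seed):
    """Seeded 64-bit hash; each seed stands in for one random permutation."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_perm=128):
    """For each seeded hash function, keep the minimum hash over the set."""
    return [min(_hash(x, seed) for x in items) for seed in range(num_perm)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def lsh_candidates(signatures, bands=32):
    """Split signatures into bands; keys sharing any band become candidates."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for key, sig in signatures.items():
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].add(key)
    pairs = set()
    for keys in buckets.values():
        pairs.update(combinations(sorted(keys), 2))
    return pairs


if __name__ == "__main__":
    docs = {
        "a": "the quick brown fox jumps over the lazy dog",
        "b": "the quick brown fox jumps over the lazy cat",
        "c": "minhash approximates jaccard similarity between two sets",
    }
    sigs = {key: minhash_signature(shingles(text)) for key, text in docs.items()}
    print(estimate_jaccard(sigs["a"], sigs["b"]))
    print(lsh_candidates(sigs))
```

More bands with fewer rows per band make it more likely that moderately similar sets end up as candidates, at the cost of more false positives. Candidate pairs would normally be re-checked with `estimate_jaccard` or an exact Jaccard computation before being treated as duplicates.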
