
Releases: ekzhu/datasketch

v1.5.5

16 Dec 05:59

What's Changed

New Contributors

Full Changelog: v1.5.4...v1.5.5

v1.5.4

04 Dec 06:29

What's Changed

New Contributors

Full Changelog: 1.5.2...v1.5.4

Improved performance for MinHash and MinHashLSH

15 Dec 20:57
  • Performance improvement for MinHash's update method.
  • MinHash updates are 4.5X faster when you use the update_batch method for bulk updates. See the API doc (http://ekzhu.com/datasketch/documentation.html#datasketch.MinHash.update_batch).
  • Further performance gains from bulk generation of MinHash objects using MinHash.bulk or MinHash.generator. See API doc and pull request.
  • Optional compression for the MinHash LSH index by hashing the bucket key produced by MinHashLSH._H. See pull request. This reduces the memory and storage space used by the index.
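
The bucket-key compression in the last item can be illustrated with a standard-library sketch. This is not the library's code: band_key and compress_key are hypothetical helpers, and the choice of BLAKE2b with an 8-byte digest is an assumption made only for the example.

```python
import hashlib
import struct


def band_key(hashvalues):
    """Serialize one band of 64-bit hash values into a raw bucket key.

    Hypothetical stand-in for the bytes key an LSH index builds per band.
    """
    return struct.pack(f"<{len(hashvalues)}Q", *hashvalues)


def compress_key(key, digest_size=8):
    """Hash the raw bucket key down to a short fixed-size digest.

    Assumption: any stable bytes -> bytes hash would do; BLAKE2b is used
    here only because it ships with Python and has a tunable digest size.
    """
    return hashlib.blake2b(key, digest_size=digest_size).digest()


band = [11, 22, 33, 44, 55, 66, 77, 88]  # 8 values of 8 bytes each
raw = band_key(band)                      # 64-byte key
small = compress_key(raw)                 # 8-byte key
print(len(raw), len(small))
```

Shorter keys mean less memory per bucket, at the cost of a small chance that two different bands collide on the same compressed key.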

Thank you @Sinusoidal36!

Add Cassandra storage layer.

26 Nov 00:13
  • Minor bug fixes
  • Cassandra storage layer, thanks to @ostefano! You can now specify the Cassandra config just like the Redis one:
from datasketch import MinHashLSH

lsh = MinHashLSH(
    threshold=0.5, num_perm=128, storage_config={
        'type': 'cassandra',
        'cassandra': {
            'seeds': ['127.0.0.1'],
            'keyspace': 'lsh_test',
            'replication': {
                'class': 'SimpleStrategy',
                'replication_factor': '1',
            },
            'drop_keyspace': False,
            'drop_tables': False,
        }
    }
)

hashfunc to replace hashobj

06 Jan 22:25
ff34e73

The hashfunc parameter is now supported for MinHash and HyperLogLog. The old hashobj parameter has been removed.

# Let's use MurmurHash3.
import mmh3
from datasketch import MinHash

# We need to define a new hash function that outputs an integer that
# can be encoded in 32 bits.
def _hash_func(d):
    # mmh3.hash returns a 32-bit integer; signed=False keeps it non-negative.
    return mmh3.hash(d, signed=False)

# Use this function in MinHash constructor.
m = MinHash(hashfunc=_hash_func)
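
If installing mmh3 is not an option, a 32-bit hash function can also be built from the standard library. The following is a minimal sketch; the name blake2_hash32 is hypothetical, and the use of BLAKE2b is an assumption rather than anything datasketch prescribes.

```python
import hashlib
import struct


def blake2_hash32(data: bytes) -> int:
    """Map bytes to an unsigned integer that fits in 32 bits."""
    # Ask BLAKE2b for exactly 4 bytes, then read them as a little-endian
    # unsigned 32-bit integer.
    digest = hashlib.blake2b(data, digest_size=4).digest()
    return struct.unpack("<I", digest)[0]


value = blake2_hash32(b"hello world")
```

A function of this shape would be passed the same way as above, for example MinHash(hashfunc=blake2_hash32).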

Better LSH Ensemble

27 Dec 16:02
54d053c

Uses dynamic programming to create the optimal partition, which allows the LSH Ensemble index to adapt to any set size distribution.
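
To give a feel for partitioning by dynamic programming, here is a small sketch that splits sorted set sizes into k contiguous partitions at minimum total cost. It is not the library's implementation: the function name and, in particular, the cost function (partition count times relative size spread) are assumptions chosen for illustration only.

```python
def optimal_partition(sizes, k):
    """Split positive set sizes into k contiguous groups with minimal total cost."""
    sizes = sorted(sizes)
    n = len(sizes)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of sizes")

    def cost(i, j):
        # Illustrative cost of grouping sizes[i:j] (an assumption, not the
        # library's formula): more members and a wider relative size
        # spread both make a partition more expensive.
        lo, hi = sizes[i], sizes[j - 1]
        return (j - i) * (hi - lo) / hi

    inf = float("inf")
    # best[p][j]: minimal cost of splitting the first j sizes into p groups.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for p in range(1, k + 1):
        for j in range(p, n + 1):
            for i in range(p - 1, j):
                candidate = best[p - 1][i] + cost(i, j)
                if candidate < best[p][j]:
                    best[p][j] = candidate
                    cut[p][j] = i

    # Walk the recorded cut points backwards to recover the groups.
    parts = []
    j = n
    for p in range(k, 0, -1):
        i = cut[p][j]
        parts.append(sizes[i:j])
        j = i
    parts.reverse()
    return parts


parts = optimal_partition([1, 2, 3, 100, 110, 120], 2)
```

Because the boundaries come from the data rather than from fixed-width ranges, skewed size distributions get tight partitions where the sizes cluster.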

Batch removal of keys from Async MinHashLSH index

02 Nov 20:56
  • Added batch removal of keys for Async MinHashLSH
  • Removed Redis support from Async MinHashLSH, because Redis does not support async operation

For details see Pull #70
Thanks @aastafiev for the contribution.

MongoDB replicas

22 Oct 21:32
Pre-release

Add support for MongoDB replica set

Fix bug #68

17 Oct 04:16
Pre-release
(Asynchronous MinHashLSH) Fixes a critical bug when removing a key from the LSH index (#67). Previously, removing a key broke the buckets: all hashes similar to the removed key's were removed too.
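
The intended removal behavior can be sketched with a toy in-memory index. TinyBucketIndex is a hypothetical class written for this note, not the library's storage layer: removing a key takes only that key out of each bucket it belongs to, and a bucket is dropped only once it is empty.

```python
from collections import defaultdict


class TinyBucketIndex:
    """Toy bucket index: bucket id -> set of keys (illustration only)."""

    def __init__(self):
        self.buckets = defaultdict(set)
        self.keys = {}  # key -> list of bucket ids it was inserted into

    def insert(self, key, bucket_ids):
        self.keys[key] = list(bucket_ids)
        for bucket_id in bucket_ids:
            self.buckets[bucket_id].add(key)

    def remove(self, key):
        for bucket_id in self.keys.pop(key):
            # Remove only this key; other keys sharing the bucket stay.
            self.buckets[bucket_id].discard(key)
            if not self.buckets[bucket_id]:
                del self.buckets[bucket_id]

    def query(self, bucket_ids):
        found = set()
        for bucket_id in bucket_ids:
            found |= self.buckets.get(bucket_id, set())
        return found


index = TinyBucketIndex()
index.insert("a", ["b1", "b2"])
index.insert("b", ["b1", "b3"])
index.remove("a")
remaining = index.query(["b1"])  # "b" shares bucket b1 and must survive
```

Deleting the whole bucket b1 instead would also lose "b", which is the kind of breakage described above.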

Asynchronous MinHash LSH module and storage base name

27 Jul 00:28
  • Added Asynchronous MinHash LSH module. Thanks @aastafiev!
  • Added ability to set the base name in storage config. The base name is used as the
    prefix for generating keys in the underlying storage (e.g., Redis).
    This change allows a client to "reconnect" to an existing LSH index in the storage through its base name.
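
The idea behind the base name can be sketched with a plain dict standing in for the shared storage. PrefixedStore and its key layout (base name, a colon, then the key) are hypothetical and chosen only for illustration; the library's actual key format is not shown here.

```python
class PrefixedStore:
    """Toy view onto a shared backend that prefixes every key with a base name."""

    def __init__(self, backend, basename):
        self.backend = backend    # shared dict, standing in for e.g. Redis
        self.basename = basename  # bytes prefix that identifies one index

    def _full_key(self, key):
        return self.basename + b":" + key

    def add(self, key, value):
        self.backend.setdefault(self._full_key(key), set()).add(value)

    def get(self, key):
        return self.backend.get(self._full_key(key), set())


shared = {}

# A first client writes into the index named "my_index".
first = PrefixedStore(shared, b"my_index")
first.add(b"bucket0", "doc1")

# A later client "reconnects" by using the same base name.
second = PrefixedStore(shared, b"my_index")
reconnected = second.get(b"bucket0")

# A different base name sees a separate, empty index.
other = PrefixedStore(shared, b"other_index")
isolated = other.get(b"bucket0")
```

Since every key is derived from the base name, two clients configured with the same base name address the same stored buckets, while different base names keep indexes apart in one storage instance.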