update: Rewrite update script
The new update script uses futures to dynamically schedule many smaller
tasks across a constant number of threads, instead of statically
assigning a single long-running task to each thread.
This results in better CPU saturation.

Database handles are no longer shared between threads. Instead,
the main thread commits the results of the other threads to the
database.
This trades locking on database access for serialization costs: since
multiprocessing is used, values returned from futures are pickled
(although in practice that depends on the ProcessPool configuration).
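
The scheduling pattern described in the commit message can be sketched roughly as follows. This is not the update script itself: `index_blob` and `run_update` are hypothetical names, a plain dict stands in for the database, and the executor class is a parameter so that the same loop can run on processes or threads.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed


def index_blob(blob_id):
    # Hypothetical worker: one small unit of work that never touches the database.
    # With a process pool, the returned tuple is pickled on its way back.
    return blob_id, [f"ident_{blob_id}_{n}" for n in range(2)]


def run_update(blob_ids, db, executor_cls=ProcessPoolExecutor, workers=4):
    # Many small tasks are submitted to a fixed-size pool; whichever worker
    # becomes idle picks up the next one.
    with executor_cls(max_workers=workers) as pool:
        futures = [pool.submit(index_blob, blob_id) for blob_id in blob_ids]
        # Only the caller writes to `db`, so workers need no database handle
        # and no lock is taken around database access.
        for future in as_completed(futures):
            blob_id, idents = future.result()
            db[blob_id] = idents
    return db
```

Passing `executor_cls=ThreadPoolExecutor` keeps the same structure while skipping pickling. With the default process pool, the call should sit under an `if __name__ == "__main__":` guard on platforms that spawn worker processes.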
fstachura committed Dec 29, 2024
1 parent b58cc27 commit 751e7a7
Showing 2 changed files with 444 additions and 0 deletions.
13 changes: 13 additions & 0 deletions elixir/data.py
@@ -72,6 +72,14 @@ def iter(self, dummy=False):
         if dummy:
             yield maxId, None, None, None

+    def exists(self, idx, line_num):
+        entries = deflist_regex.findall(self.data)
+        for id, _, line, _ in entries:
+            if id == idx and int(line) == line_num:
+                return True
+
+        return False
+
     def append(self, id, type, line, family):
         if type not in defTypeD:
             return
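
A minimal, standalone sketch of the kind of lookup the added `exists` method performs. The record layout and the `ENTRY_RE` pattern are assumptions made for illustration and are not taken from `elixir/data.py`. The sketch also converts both fields to integers before comparing, which is its own choice: the diff compares the id exactly as the regex returns it.

```python
import re

# Assumed serialized layout: <id><type char><line><family char>, comma separated.
ENTRY_RE = re.compile(rb"(\d+)(\w)(\d+)(\w),?")


def entry_exists(data, idx, line_num):
    # findall on a bytes pattern yields tuples of bytes, so both numeric
    # fields are converted before they are compared with the integer arguments.
    for raw_id, _type, raw_line, _family in ENTRY_RE.findall(data):
        if int(raw_id) == idx and int(raw_line) == line_num:
            return True
    return False
```

The scan is linear in the number of entries, which matches the loop in the diff.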
@@ -165,6 +173,8 @@ def exists(self, key):
     def get(self, key):
         key = autoBytes(key)
         p = self.db.get(key)
+        if p is None:
+            return None
         p = self.ctype(p)
         return p

@@ -180,6 +190,9 @@ def put(self, key, val, sync=False):
         if sync:
             self.db.sync()

+    def sync(self):
+        self.db.sync()
+
     def close(self):
         self.db.close()
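
The `None` guard added to `get` can be illustrated with a small stand-in. This is a sketch under assumptions: `KeyValueStore` is a hypothetical class, and a plain dict replaces the real database handle.

```python
class KeyValueStore:
    def __init__(self, backend, ctype):
        self.backend = backend  # dict-like stand-in for the database handle
        self.ctype = ctype      # callable that decodes a raw stored value

    def get(self, key):
        raw = self.backend.get(key)
        # A missing key is reported as None instead of being handed to ctype,
        # which would otherwise receive a value it cannot decode.
        if raw is None:
            return None
        return self.ctype(raw)
```

Callers can then distinguish "no record yet" from a decoded value with a single `is None` check.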
