Summary
`GLOBAL_SHELVE_LOCK` in `job_engine.py` is a `multiprocessing.Lock()` that has two issues: it's fragile across fork boundaries, and the shelve file is opened outside the lock.
Issue 1: Lock/Unlock Race in get_shelve_value
Location: `wbia/web/job_engine.py:681-695`
```python
def get_shelve_value(shelve_filepath, key):
    wait_for_shelve_lock_file(shelve_filepath)
    with GLOBAL_SHELVE_LOCK:  # Lock acquired
        wait_for_shelve_lock_file(shelve_filepath)
        touch_shelve_lock_file(shelve_filepath)
    # ← Lock RELEASED here
    value = None
    try:
        with shelve.open(shelve_filepath, 'r') as shelf:  # File opened WITHOUT lock
            value = shelf.get(key)
    except Exception:
        pass
    delete_shelve_lock_file(shelve_filepath)
    return value
```
The `GLOBAL_SHELVE_LOCK` is released before the shelve file is opened. This means:
- Thread/Process A acquires the lock, touches the lock file, releases the lock
- Thread/Process B acquires the lock, touches the lock file, releases the lock
- Both A and B open the shelve file concurrently
- If one is writing (`set_shelve_value`), the other may read corrupt data

The lock only protects the lock-file creation, not the actual shelve access.
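The release-then-access pattern can be reproduced in miniature with plain threads (a hypothetical sketch, not the `job_engine` code): the lock is held only for the "touch" step, so multiple workers still overlap inside the unprotected section that stands in for `shelve.open`.

```python
import threading
import time

LOCK = threading.Lock()    # stands in for GLOBAL_SHELVE_LOCK
_guard = threading.Lock()  # bookkeeping only
concurrent = 0
peak = 0

def buggy_access():
    """Mimics get_shelve_value: the lock covers only the 'touch', not the access."""
    global concurrent, peak
    with LOCK:
        pass  # "touch lock file"; lock released when the with-block exits
    # --- unprotected section, standing in for shelve.open(...) ---
    with _guard:
        concurrent += 1
        peak = max(peak, concurrent)
    time.sleep(0.1)  # hold the "shelve" open long enough to overlap
    with _guard:
        concurrent -= 1

threads = [threading.Thread(target=buggy_access) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(peak)  # > 1: several threads were inside the unprotected section at once
```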
Issue 2: multiprocessing.Lock After Fork
Location: `wbia/web/job_engine.py:110`

```python
GLOBAL_SHELVE_LOCK = multiprocessing.Lock()
```
A `multiprocessing.Lock` created before `fork()` is inherited by child processes, but it can deadlock in the child if the parent held the lock at fork time. With Gunicorn's fork-based worker model, this is a latent risk.
In practice, the collector (which primarily uses this lock) runs in its own process and is single-threaded, so the threading aspect doesn't apply. But the lock-release-before-shelve-open bug affects correctness regardless.
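One mitigation for the fork hazard is to hand out the lock through a lazy, PID-keyed accessor, so a forked child discards the inherited lock and creates its own. This is a sketch; `get_shelve_lock` is a hypothetical helper, not an existing `job_engine` function.

```python
import multiprocessing
import os

_lock = None
_lock_pid = None

def get_shelve_lock():
    """Return a lock owned by the current process.

    If os.getpid() differs from the PID that created the cached lock,
    we are in a forked child: discard the inherited lock and make a
    fresh one, so a lock the parent held at fork time cannot deadlock
    the child.
    """
    global _lock, _lock_pid
    if _lock is None or _lock_pid != os.getpid():
        _lock = multiprocessing.Lock()
        _lock_pid = os.getpid()
    return _lock
```

Call sites would then use `with get_shelve_lock():` instead of touching the module-level `GLOBAL_SHELVE_LOCK` directly.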
Proposed Fix
Move the shelve open inside the lock:
```python
def get_shelve_value(shelve_filepath, key):
    with GLOBAL_SHELVE_LOCK:
        try:
            with shelve.open(shelve_filepath, 'r') as shelf:
                return shelf.get(key)
        except Exception:
            return None
```
Or replace the shelve + lock mechanism entirely with a simpler approach (e.g., pickle files with atomic writes).
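The simpler alternative could be sketched as follows (hypothetical helper names, and it assumes POSIX semantics, where `os.replace` within one filesystem is atomic): writers dump to a temp file and rename it into place, so readers always see either the old or the new file, never a half-written one, and no lock file is needed for reads.

```python
import os
import pickle
import tempfile

def atomic_pickle_dump(obj, filepath):
    """Write obj to filepath atomically: dump to a temp file in the same
    directory, fsync it, then rename it over the target."""
    dirpath = os.path.dirname(os.path.abspath(filepath))
    fd, tmp_path = tempfile.mkstemp(dir=dirpath, suffix='.tmp')
    try:
        with os.fdopen(fd, 'wb') as f:
            pickle.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, filepath)  # atomic within one filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise

def pickle_get(filepath, key, default=None):
    """Read the dict stored at filepath and return one key, like shelf.get."""
    try:
        with open(filepath, 'rb') as f:
            return pickle.load(f).get(key, default)
    except FileNotFoundError:
        return default
```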
Environment
- Collector process: single-threaded, uses shelve for job metadata/results
- Engine processes: may also read shelve files
- Gunicorn master → worker fork