
GLOBAL_SHELVE_LOCK is multiprocessing.Lock — fragile after fork and has race condition #314

@JasonWildMe

Description

Summary

GLOBAL_SHELVE_LOCK in job_engine.py is a multiprocessing.Lock() that has two issues: it's fragile across fork boundaries, and the shelve file is opened outside the lock.

Issue 1: Lock/Unlock Race in get_shelve_value

Location: wbia/web/job_engine.py:681-695

def get_shelve_value(shelve_filepath, key):
    wait_for_shelve_lock_file(shelve_filepath)
    with GLOBAL_SHELVE_LOCK:                    # Lock acquired
        wait_for_shelve_lock_file(shelve_filepath)
        touch_shelve_lock_file(shelve_filepath)
    # ← Lock RELEASED here
    value = None
    try:
        with shelve.open(shelve_filepath, 'r') as shelf:  # File opened WITHOUT lock
            value = shelf.get(key)
    except Exception:
        pass
    delete_shelve_lock_file(shelve_filepath)
    return value

The GLOBAL_SHELVE_LOCK is released before the shelve file is opened. This means:

  1. Thread/Process A acquires lock, touches lock file, releases lock
  2. Thread/Process B acquires lock, touches lock file, releases lock
  3. Both A and B open the shelve file concurrently
  4. If one of them is writing (via set_shelve_value), the other may read partially written or corrupt data

The lock only protects the lock-file creation, not the actual shelve access.
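The flawed pattern can be reproduced with plain threading primitives. This is a minimal sketch, not the wbia code: a Barrier stands in for the shelve access, and the fact that both workers pass it simultaneously proves the lock leaves that region unprotected.

```python
import threading

lock = threading.Lock()
# Barrier releases only when BOTH threads are waiting at it simultaneously.
barrier = threading.Barrier(2, timeout=2)
overlap = []

def flawed_reader():
    with lock:
        pass          # lock only guards the lock-file touch (elided)
    # lock released here; the "shelve open" below is unprotected
    try:
        barrier.wait()          # succeeds only if both threads are here at once
        overlap.append(True)
    except threading.BrokenBarrierError:
        overlap.append(False)

t1 = threading.Thread(target=flawed_reader)
t2 = threading.Thread(target=flawed_reader)
t1.start(); t2.start()
t1.join(); t2.join()
print(overlap)  # [True, True]: both threads were in the unprotected region concurrently
```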

Issue 2: multiprocessing.Lock After Fork

Location: wbia/web/job_engine.py:110

GLOBAL_SHELVE_LOCK = multiprocessing.Lock()

A multiprocessing.Lock created before fork() is inherited by child processes, but a child can deadlock if the parent happened to hold the lock at fork time. With Gunicorn's fork-based worker model, this is a latent risk.

In practice, the collector (which primarily uses this lock) runs in its own process and is single-threaded, so the threading aspect doesn't apply. But the lock-release-before-shelve-open bug affects correctness regardless.
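One possible mitigation, sketched here rather than taken from the project: reinitialize the lock in forked children via os.register_at_fork, so a child can never inherit a lock the parent held at fork time. The demo uses a threading.Lock for simplicity; the same after_in_child hook applies if the module keeps a multiprocessing.Lock. POSIX-only, since it relies on os.fork.

```python
import os
import threading

# Module-level lock, as in job_engine.py (threading.Lock here for illustration).
GLOBAL_SHELVE_LOCK = threading.Lock()

def _reinit_shelve_lock():
    # Give the forked child a fresh, unlocked lock so it cannot inherit
    # a lock the parent happened to hold at fork time.
    global GLOBAL_SHELVE_LOCK
    GLOBAL_SHELVE_LOCK = threading.Lock()

os.register_at_fork(after_in_child=_reinit_shelve_lock)

# Demonstration: the parent forks while holding the lock.
GLOBAL_SHELVE_LOCK.acquire()
pid = os.fork()
if pid == 0:
    # Child: the after_in_child hook replaced the lock, so this acquire
    # succeeds instead of deadlocking on the inherited, already-held lock.
    ok = GLOBAL_SHELVE_LOCK.acquire(blocking=False)
    os._exit(0 if ok else 1)
GLOBAL_SHELVE_LOCK.release()
_, status = os.waitpid(pid, 0)
child_ok = (os.WEXITSTATUS(status) == 0)
print(child_ok)  # True: the child got a fresh lock
```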

Proposed Fix

Move the shelve open inside the lock:

def get_shelve_value(shelve_filepath, key):
    with GLOBAL_SHELVE_LOCK:
        try:
            with shelve.open(shelve_filepath, 'r') as shelf:
                return shelf.get(key)
        except Exception:
            return None

Or replace the shelve + lock mechanism entirely with a simpler approach (e.g., pickle files with atomic writes).
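The atomic-write alternative could look like the following sketch; atomic_pickle_write and read_pickle_value are hypothetical helper names, not existing wbia functions. The key property is os.replace: readers always see either the old file or the new one in full, never a partial write.

```python
import os
import pickle
import tempfile

def atomic_pickle_write(filepath, obj):
    """Serialize obj to filepath so readers never observe a partial file."""
    dirpath = os.path.dirname(os.path.abspath(filepath))
    fd, tmp_path = tempfile.mkstemp(dir=dirpath, suffix='.tmp')
    try:
        with os.fdopen(fd, 'wb') as fh:
            pickle.dump(obj, fh)
            fh.flush()
            os.fsync(fh.fileno())     # ensure bytes hit disk before the rename
        os.replace(tmp_path, filepath)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp_path)
        raise

def read_pickle_value(filepath, key, default=None):
    """Read one key from a pickled dict; tolerate a missing file."""
    try:
        with open(filepath, 'rb') as fh:
            return pickle.load(fh).get(key, default)
    except FileNotFoundError:
        return default

# Usage sketch with a throwaway directory and a made-up job key.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, 'job_metadata.pkl')
    atomic_pickle_write(path, {'jobid-0001': 'completed'})
    status = read_pickle_value(path, 'jobid-0001')
print(status)  # completed
```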

Environment

  • Collector process: single-threaded, uses shelve for job metadata/results
  • Engine processes: may also read shelve files
  • Gunicorn master → worker fork
