Skip to content
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 4 additions & 3 deletions pyopencl/cache.py
Original file line number Diff line number Diff line change
Expand Up @@ -88,17 +88,18 @@ def __init__(self, cleanup_m, cache_dir):
except OSError:
pass

wait_time_seconds = 0.05
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation for this change? It seems really short. If you were running this (by mistake, say) on 1000 nodes sharing a cache directory, I think the resulting thrashing would not be a good thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specific value was taken from https://github.com/tox-dev/py-filelock/blob/a6c8fabc4192fa7a4ae19b1875ee842ec5eb4f61/src/filelock/_api.py#L113. I can't really predict what the result of this with thousands of nodes would be, but we already keep on hitting the 1 second timeout when running with 2 ranks on a single node, which I guess indicates that the existing polling time would cause minutes-long waiting times when running with thousands of ranks.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that argument. Could you add a comment justifying this choice of time-out value? (making reference to the single-node interactive "test" use case)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 387011e

from time import sleep
sleep(0.05)
sleep(wait_time_seconds)

attempts += 1

if attempts > 10:
if attempts % (10/wait_time_seconds) == 0:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That won't work reliably. / unconditionally has a floating point result, and comparing floating point numbers to zero is unreliable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That won't work reliably. / unconditionally has a floating point result, and comparing floating point numbers to zero is unreliable.

4d13b82 should fix this

from warnings import warn
warn("could not obtain cache lock--delete '%s' if necessary"
% self.lock_file)

if attempts > 60 / 0.05:
if attempts > 60 / wait_time_seconds:
raise RuntimeError("waited more than one minute "
"on the lock file '%s'"
"--something is wrong" % self.lock_file)
Expand Down