-
Notifications
You must be signed in to change notification settings - Fork 248
Cache: reduce sleep time #504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
48d41a3
aa708a2
8ab383a
e08031d
4d13b82
2a9212a
387011e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -88,17 +88,18 @@ def __init__(self, cleanup_m, cache_dir): | |
except OSError: | ||
pass | ||
|
||
wait_time_seconds = 0.05 | ||
from time import sleep | ||
sleep(0.05) | ||
sleep(wait_time_seconds) | ||
|
||
attempts += 1 | ||
|
||
if attempts > 10: | ||
if attempts % (10/wait_time_seconds) == 0: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That won't work reliably. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
4d13b82 should fix this |
||
from warnings import warn | ||
warn("could not obtain cache lock--delete '%s' if necessary" | ||
matthiasdiener marked this conversation as resolved.
Show resolved
Hide resolved
|
||
% self.lock_file) | ||
|
||
if attempts > 60 / 0.05: | ||
if attempts > 60 / wait_time_seconds: | ||
raise RuntimeError("waited more than one minute " | ||
"on the lock file '%s'" | ||
"--something is wrong" % self.lock_file) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the motivation for this change? It seems really short. If you were running this (by mistake, say) on 1000 nodes sharing a cache directory, I think the resulting thrashing would not be a good thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The specific value was taken from https://github.com/tox-dev/py-filelock/blob/a6c8fabc4192fa7a4ae19b1875ee842ec5eb4f61/src/filelock/_api.py#L113. I can't really predict what the result of this with thousands of nodes would be, but we already keep on hitting the 1 second timeout when running with 2 ranks on a single node, which I guess indicates that the existing polling time would cause minutes-long waiting times when running with thousands of ranks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see that argument. Could you add a comment justifying this choice of time-out value? (making reference to the single-node interactive "test" use case)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in 387011e