-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add resources to provenance #3016
base: main
Are you sure you want to change the base?
Conversation
5a313f9
to
6798b1a
Compare
@jeromekelleher I'm not sure how library methods such as |
6798b1a
to
70fcf6c
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3016 +/- ##
========================================
Coverage 89.84% 89.84%
========================================
Files 29 29
Lines 32093 32111 +18
Branches 6230 5758 -472
========================================
+ Hits 28833 28851 +18
Misses 1859 1859
Partials 1401 1401
Flags with carried forward coverage won't be shown. Click here to find out more.
|
70fcf6c
to
98a3a96
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we want to get into the business of generating this in tskit, although I agree it would be nice to not have to redo it in libraries.
I guess we could require the start_time
as a parameter, which would force users to generate a sensible one?
I think this is something that we don't expect to be recorded much, except in cases where there's a lot of resources involved. Msprime, for example, wouldn't bother with this.
python/tskit/provenance.py
Outdated
@@ -72,6 +80,31 @@ def get_environment(extra_libs=None, include_tskit=True): | |||
return env | |||
|
|||
|
|||
def get_resources(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to do this here? I feel like this will get misused pretty badly as people won't realise that time is relative to when tskit got imported. It's only really appropriate in cases where the time is essentially the full life of the interpreter.
python/tskit/provenance.schema.json
Outdated
"description": "System time used in seconds.", | ||
"type": "number" | ||
}, | ||
"max_mem": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this max_memory
here, I've commited this in sc2ts and the brevity doesn't really help.
python/tskit/util.py
Outdated
@@ -216,7 +216,8 @@ def pack_arrays(list_of_lists, dtype=np.float64): | |||
""" | |||
Packs the specified list of numeric lists into a flattened numpy array | |||
of the specified dtype with corresponding offsets. See | |||
:ref:`sec_encoding_ragged_columns` for details of this encoding of columns | |||
:ref:`sec_encoding_ragged_columns` for detThis information |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some collatoral damage?
Yes, it was only when I went to add it to existing tskit methods I realised this was the wrong approach. I don't think adding it to all tskit methods is useful. I still think the method is useful though, how about I add the |
Sgtm |
923b54e
to
118e9a0
Compare
118e9a0
to
19436e3
Compare
@jeromekelleher Ok! Should be good to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spotted a few more things, but LGTM then.
We should get some wider input on this before adding to schema i guess, maybe do a shout out on Slack?
python/tests/test_provenance.py
Outdated
@@ -35,6 +43,9 @@ | |||
import tskit.provenance as provenance | |||
|
|||
|
|||
_start_time = time.time() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Global value probably doesn't help with testing
python/tests/test_provenance.py
Outdated
assert "max_memory" in resources | ||
|
||
def test_used_resources_values(self): | ||
resources = provenance.get_resources(_start_time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something less fragile would be time.time() - delta
, and then test that elapsed time is >= delta.
python/tests/test_provenance.py
Outdated
assert isinstance(resources["user_time"], float) | ||
assert isinstance(resources["sys_time"], float) | ||
assert resources["elapsed_time"] > 0.0001 | ||
assert resources["user_time"] > 0.0001 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just do > 0 here, there will surely be cases there this fails in CI or whatever
|
||
try: | ||
import resource | ||
except ImportError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's just that the module doesn't exist on Win right? Otherwise check wouldn't work.
python/tskit/provenance.py
Outdated
"sys_time": times.system + times.children_system, | ||
} | ||
if resource is not None: | ||
# Don't report max memory on Windows. We could do this using the psutil lib, via |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment isn't accurate for tskit - I don't think we'd want a dependency on psutil.
python/tskit/provenance.py
Outdated
# psutil.Process(os.getpid()).get_ext_memory_info().peak_wset if demand exists | ||
ret["max_memory"] = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss | ||
if sys.platform != "darwin": | ||
ret["max_memory"] *= 1024 # Linux, freeBSD et al reports in KB, not bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be specific it's KiB not KB (10**3)
Comments addressed in 06a4132 |
8687319
to
6c4776a
Compare
6c4776a
to
cd3b101
Compare
Description
Add optional
resources
to provenances along with a helper method to populate the field. This is intended for long-running processed such as tsinfer, where measuring the resources used across distributed pipelines is awkward and easier to do in the actual process that makes the tree sequence.PR Checklist: