Make tests more deterministic #17008
base: branch-24.12
If interested, I included this pre-commit check in pandas to check that https://github.com/pandas-dev/pandas/blob/main/.pre-commit-config.yaml#L210-L211
I like the idea of a pygrep pre-commit check. We could also look out for
@@ -245,7 +245,7 @@ def hash_vocab(
"""
Write the vocab vocabulary hashtable to the output_path
"""
np.random.seed(1243342)
@VibhuJawa Switching to default_rng changes the generated random values, so python/cudf/cudf/tests/data/subword_tokenizer_data/bert_base_cased_sampled/vocab-hash.txt has been updated accordingly. Will that be an issue for the tokenizer?
I don't think it should be, but I will have to test with some common vocabulary files to be sure.
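A minimal sketch of why the hash file changed, assuming illustrative randint/integers draws (the actual random calls inside hash_vocab are not shown in the hunk). Both styles are reproducible on their own, but the legacy global state uses MT19937 while default_rng uses PCG64, so the same seed yields different values.

```python
import numpy as np

SEED = 1243342  # the seed from the hash_vocab hunk above


def legacy_draw(seed, n=5):
    """Old style: reseed NumPy's hidden global RandomState, then draw from it."""
    np.random.seed(seed)
    return np.random.randint(0, 1000, size=n)


def generator_draw(seed, n=5):
    """New style: a local Generator (PCG64 by default) owned by the caller."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 1000, size=n)


if __name__ == "__main__":
    old, new = legacy_draw(SEED), generator_draw(SEED)
    # Each style repeats itself with a fixed seed...
    assert np.array_equal(old, legacy_draw(SEED))
    assert np.array_equal(new, generator_draw(SEED))
    # ...but the two bit generators produce different streams.
    print("legacy:   ", old)
    print("generator:", new)
```

Any file derived from these draws therefore has to be regenerated once after the switch, after which it stays stable.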
Looks like there are still two remaining appearances of np.random.seed, in notebooks/performance-comparisons/performance-comparisons.ipynb and docs/cudf/source/user_guide/performance-comparisons/performance-comparisons.ipynb. Should those also be updated?
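Leftover call sites like these can be found with a plain text scan. The sketch below uses hypothetical helper names (legacy_seed_lines, find_legacy_seeding) and treats .ipynb files as raw JSON text, which works because notebook cell sources are stored as strings inside the JSON.

```python
import re
from pathlib import Path

# Matches legacy global seeding such as "np.random.seed(0)" or "numpy.random.seed(0)".
LEGACY_SEED = re.compile(r"\b(?:np|numpy)\.random\.seed\(")


def legacy_seed_lines(text):
    """Return 1-based line numbers in `text` that call the legacy global seed."""
    return [
        n for n, line in enumerate(text.splitlines(), start=1)
        if LEGACY_SEED.search(line)
    ]


def find_legacy_seeding(root, suffixes=(".py", ".ipynb")):
    """Walk `root` and collect (path, line) pairs for every legacy seed call."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            hits.extend((str(path), n) for n in legacy_seed_lines(text))
    return hits


if __name__ == "__main__":
    for path, line in find_legacy_seeding("."):
        print(f"{path}:{line}")
```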
@@ -95,6 +95,18 @@ repos:
entry: 'pytest\.xfail'
language: pygrep
types: [python]
- id: no-unseeded-default-rng
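The hunk is truncated before the hook's entry: regex, so the pattern below is an assumption. It is a minimal Python sketch of what a per-line pygrep check for unseeded generators could match: default_rng called with empty parentheses.

```python
import re

# Illustrative pattern only: the real `entry:` regex is not visible in the hunk.
UNSEEDED_DEFAULT_RNG = re.compile(r"default_rng\(\s*\)")


def unseeded_rng_lines(source):
    """Return 1-based line numbers that construct a Generator without a seed."""
    return [
        n for n, line in enumerate(source.splitlines(), start=1)
        if UNSEEDED_DEFAULT_RNG.search(line)
    ]


example = """\
rng = np.random.default_rng()
rng = np.random.default_rng(seed=0)
rng = np.random.default_rng( )
"""

if __name__ == "__main__":
    print(unseeded_rng_lines(example))  # line numbers of the unseeded calls
```

A per-line pattern like this will not catch default_rng(None) or a seed variable that happens to be None; those would need a broader regex or a review rule.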
Is there a preference for keeping these checks separate? It seems like we could do something similar to the check @mroeschke linked and consolidate all of these entries into a single check that runs against all Python files at once.
FWIW, I quite like the separate entries, since the name of the entry gives some information as to what went wrong. If the regex gets complicated, I find it hard to see what's going on. But I don't have strong feelings here.
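To make the trade-off concrete, here is a small sketch contrasting the two layouts. Only no-unseeded-default-rng is a hook id from this PR; the other ids and all three patterns are hypothetical stand-ins for illustration.

```python
import re

# Hook ids other than "no-unseeded-default-rng" are hypothetical, and the
# patterns are illustrative rather than copied from .pre-commit-config.yaml.
CHECKS = {
    "no-pytest-xfail": r"pytest\.xfail",
    "no-unseeded-default-rng": r"default_rng\(\s*\)",
    "no-np-random-seed": r"np\.random\.seed\(",
}


def run_separate(source):
    """One pass per named check: every hit says which rule fired."""
    hits = []
    for name, pattern in CHECKS.items():
        regex = re.compile(pattern)
        for lineno, line in enumerate(source.splitlines(), start=1):
            if regex.search(line):
                hits.append((name, lineno))
    return hits


def run_combined(source):
    """A single alternation: one pass over the file, but the hits are anonymous."""
    regex = re.compile("|".join(f"(?:{p})" for p in CHECKS.values()))
    return [
        lineno for lineno, line in enumerate(source.splitlines(), start=1)
        if regex.search(line)
    ]


if __name__ == "__main__":
    src = "np.random.seed(0)\nx = 1\nrng = np.random.default_rng()\n"
    print(run_separate(src))  # (rule name, line) pairs
    print(run_combined(src))  # line numbers only
```

The combined form is shorter to configure, but its output loses the rule name, which is the information the separate entries provide.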
Thanks Prem! I had a few (likely nonblocking) comments, and I think I spotted a few places where the rng is still unseeded in tests. Overall this looks good though, thanks!
- np.random.seed(12)
+ rng = np.random.default_rng(seed=0)
I suppose it doesn't really matter, but why the change of seed?
Keeping the same seed wouldn't have reproduced the old values anyway:
In [1]: import numpy as np
In [2]: np.random.seed(0)
In [3]: np.random.randint(0, 10)
Out[3]: 5
In [4]: np.random.default_rng(0).integers(0, 10).item()
Out[4]: 8
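Where a test or data file does need the exact old values, one option (not what this PR does) is a local np.random.RandomState, which replays the same MT19937 stream as np.random.seed without touching global state. A short sketch comparing the three styles:

```python
import numpy as np


def legacy_global(seed):
    np.random.seed(seed)  # reseeds the process-wide RandomState
    return np.random.randint(0, 10)


def legacy_local(seed):
    # Same MT19937 stream as np.random.seed, but scoped to this object.
    return np.random.RandomState(seed).randint(0, 10)


def new_generator(seed):
    # PCG64-based Generator: a different algorithm, hence different values.
    return np.random.default_rng(seed).integers(0, 10).item()


if __name__ == "__main__":
    for seed in (0, 12):
        print(seed, legacy_global(seed), legacy_local(seed), new_generator(seed))
```

The first two columns always agree; the third matches the session above for seed 0.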
python/cudf/cudf_pandas_tests/third_party_integration_tests/tests/test_stumpy_distributed.py
Co-authored-by: Charles Blackmon-Luca
…into numpy_random
@charlesbluca @wence- I have addressed all of your review comments. This should be ready for another review now.
Description
Fixes #17045
This PR makes our pytests deterministic by removing unseeded randomness and replacing the legacy global np.random.seed with seeded np.random.default_rng generators throughout the codebase.
Checklist
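One common pattern for seeded per-test randomness is a function-scoped pytest fixture. This is a minimal sketch; the names SEED, make_rng and rng are hypothetical and not taken from the cudf test suite.

```python
import numpy as np
import pytest

SEED = 0  # hypothetical shared seed


def make_rng(seed=SEED):
    """Build a fresh, seeded Generator; no global NumPy state is involved."""
    return np.random.default_rng(seed)


@pytest.fixture
def rng():
    # Function-scoped: every test gets its own generator in the same starting state.
    return make_rng()


def test_draws_are_reproducible(rng):
    data = rng.integers(0, 100, size=1_000)
    expected = make_rng().integers(0, 100, size=1_000)
    np.testing.assert_array_equal(data, expected)
```

Because each test gets a fresh generator, its data does not depend on which other tests ran first. That is not true of a single np.random.seed call at module import time.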