Make tests more deterministic #17008

galipremsagar · 2024-10-07T22:06:53Z

Description

This PR removes unseeded randomness from our pytests and replaces np.random.seed with np.random.default_rng throughout the codebase.
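A minimal sketch of the kind of change described above, assuming NumPy 1.17 or later (the release that introduced `default_rng`). The legacy pattern seeds NumPy's hidden global state. The new pattern creates a local `Generator` whose stream depends only on its seed. The helper names `make_test_data_legacy` and `make_test_data` are hypothetical and do not come from the cuDF test suite.

```python
import numpy as np


def make_test_data_legacy(n):
    # Legacy pattern: reseeds NumPy's hidden global RandomState, which also
    # affects every later np.random.* call in the same process.
    np.random.seed(12)
    return np.random.randint(0, 100, size=n)


def make_test_data(n, seed=0):
    # New pattern: a local Generator whose stream depends only on `seed`;
    # NumPy's global state is left untouched.
    rng = np.random.default_rng(seed=seed)
    return rng.integers(0, 100, size=n)


# Two calls with the same seed produce identical data.
print(make_test_data(5))
print(make_test_data(5))
```

Some method names also change in the migration. `randint` becomes `integers`, `rand`/`random_sample` become `random`, and `randn` becomes `standard_normal`.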

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

mroeschke · 2024-10-11T20:38:58Z

If you're interested, I added this pre-commit check in pandas to ensure that default_rng is always seeded (admittedly, someone could still pass seed=None to get the unseeded behavior):

https://github.com/pandas-dev/pandas/blob/main/.pre-commit-config.yaml#L210-L211
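The sketch below makes the seed=None caveat concrete. It is a minimal Python imitation of what a pygrep hook does, which is to report every line that matches a regex. The pattern `default_rng\(\)` is an assumption chosen for illustration. It is not the exact pandas entry linked above.

```python
import re

# Hypothetical pattern: flags default_rng() called with no arguments.
UNSEEDED_DEFAULT_RNG = re.compile(r"default_rng\(\)")


def find_violations(source: str, pattern: re.Pattern = UNSEEDED_DEFAULT_RNG):
    """Return (line_number, line) pairs matching `pattern`, pygrep-style."""
    return [
        (lineno, line)
        for lineno, line in enumerate(source.splitlines(), start=1)
        if pattern.search(line)
    ]


sample = """\
rng = np.random.default_rng()
rng = np.random.default_rng(seed=0)
rng = np.random.default_rng(seed=None)
"""

for lineno, line in find_violations(sample):
    print(f"{lineno}: {line}")
```

Only line 1 is flagged. Line 3 is just as unseeded but slips through, which is the loophole mentioned in the comment.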

vyasr · 2024-10-14T16:29:12Z

I like the idea of a pygrep pre-commit check. We could also look out for np.random.seed().

galipremsagar · 2024-10-14T19:22:10Z

python/cudf/cudf/utils/hash_vocab_utils.py

@@ -245,7 +245,7 @@ def hash_vocab(
    """
    Write the vocab vocabulary hashtable to the output_path
    """
-    np.random.seed(1243342)


@VibhuJawa Switching to default_rng changes the generated random values, so python/cudf/cudf/tests/data/subword_tokenizer_data/bert_base_cased_sampled/vocab-hash.txt has been updated. Will that be an issue for the tokenizer?

VibhuJawa

I don't think it should be, but I'll have to test with some common vocabulary files to be sure.
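A utility like `hash_vocab` is one place where a local generator helps. With a local generator, the output depends only on the seed, and the function no longer resets the caller's global NumPy state as `np.random.seed(1243342)` did. The same seed fed to the two APIs also yields different numbers, which is why the vocab hash file had to be regenerated. The sketch below is hypothetical: `pick_hash_params` and its value range are illustrative and are not the actual cuDF implementation.

```python
import numpy as np

SEED = 1243342  # seed value from the removed np.random.seed call


def pick_hash_params_legacy(n):
    # Legacy: reseeds the global state as a side effect of the call.
    np.random.seed(SEED)
    return np.random.randint(1, 2**31 - 1, size=n)


def pick_hash_params(n, seed=SEED):
    # Local generator: deterministic for a given seed, no global side effects.
    rng = np.random.default_rng(seed)
    return rng.integers(1, 2**31 - 1, size=n)


print(pick_hash_params(4))
```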

charlesbluca

Looks like there are still two remaining appearances of np.random.seed, in notebooks/performance-comparisons/performance-comparisons.ipynb and docs/cudf/source/user_guide/performance-comparisons/performance-comparisons.ipynb. Should those also get updated?

charlesbluca · 2024-10-15T16:00:29Z

.pre-commit-config.yaml

@@ -95,6 +95,18 @@ repos:
        entry: 'pytest\.xfail'
        language: pygrep
        types: [python]
+      - id: no-unseeded-default-rng


Is there a preference to keep these checks separate? It seems like we could do something similar to the check @mroeschke linked and consolidate all of these entries into a single check that runs against all Python files at once.

wence-

Thanks Prem! I had a few (likely nonblocking) comments, and I think I spotted a few places where the rng is still unseeded in tests. Overall this looks good though, thanks!

wence- · 2024-10-15T17:10:15Z

.pre-commit-config.yaml

@@ -95,6 +95,18 @@ repos:
        entry: 'pytest\.xfail'
        language: pygrep
        types: [python]
+      - id: no-unseeded-default-rng


FWIW, I quite like the separate entries, since the name of the entry gives some information as to what went wrong. If the regex gets complicated, I find it hard to see what's going on. But I don't have strong feelings here.
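The sketch below illustrates the trade-off discussed in this thread, under the assumption that each hook is just a named regex. Separate entries let the failure report name the rule that fired. A consolidated verbose regex only reports that something matched. The name `no-unseeded-default-rng` comes from the diff above. `no-np-random-seed` is a hypothetical name following @vyasr's suggestion, and both patterns are illustrative.

```python
import re

# Separate checks: the check name says what went wrong.
CHECKS = {
    "no-unseeded-default-rng": re.compile(r"default_rng\(\)"),
    "no-np-random-seed": re.compile(r"np\.random\.seed\("),
}

# Consolidated alternative: one verbose regex covering both cases.
COMBINED = re.compile(
    r"""(?x)
    default_rng\(\)        # unseeded Generator
    | np\.random\.seed\(   # legacy global seeding
    """
)


def run_separate_checks(source: str):
    """Return (check_name, line_number) for every violation."""
    failures = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in CHECKS.items():
            if pattern.search(line):
                failures.append((name, lineno))
    return failures


def run_combined_check(source: str):
    """Return only line numbers; which rule fired is not reported."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if COMBINED.search(line)
    ]


sample = (
    "np.random.seed(12)\n"
    "rng = np.random.default_rng()\n"
    "rng = np.random.default_rng(0)\n"
)
print(run_separate_checks(sample))
print(run_combined_check(sample))
```

Both approaches flag the same lines. Only the separate checks tell the contributor which rule was violated.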

wence- · 2024-10-15T17:19:46Z

python/cudf/cudf/tests/test_categorical.py

-    np.random.seed(12)
+    rng = np.random.default_rng(seed=0)


I suppose it doesn't really matter, but why the change of seed?

galipremsagar

Keeping the same seed wouldn't have preserved the values anyway, because the two APIs produce different numbers for the same seed:

In [1]: import numpy as np
In [2]: np.random.seed(0)
In [3]: np.random.randint(0, 10)
Out[3]: 5
In [4]: np.random.default_rng(0).integers(0, 10).item()
Out[4]: 8
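The streams differ because the legacy functions draw from a Mersenne Twister `RandomState`, while `default_rng` uses PCG64 by default. If the old test data ever had to be preserved exactly, a local `np.random.RandomState` with the same seed reproduces the legacy global stream without mutating global state. This is a sketch of an alternative, not what this PR does.

```python
import numpy as np

# Legacy global seeding, as in the original test.
np.random.seed(0)
legacy_global = np.random.randint(0, 10, size=5)

# A local RandomState with the same seed reproduces the legacy stream
# without touching NumPy's global state.
legacy_local = np.random.RandomState(0).randint(0, 10, size=5)

# The Generator API uses a different bit generator (PCG64 by default),
# so the same seed yields a different stream.
new_values = np.random.default_rng(0).integers(0, 10, size=5)

print(legacy_global, legacy_local, new_values)
```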

galipremsagar · 2024-10-16T21:36:30Z

@charlesbluca @wence- I addressed all your reviews. This should be ready for review now.

galipremsagar added 6 commits October 4, 2024 23:07

use nep rule

8b462e8

update pre-commit config

763ae35

first pass

027e148

Switch numpy random calls to latest API

828706e

Merge remote-tracking branch 'upstream/branch-24.12' into numpy_random

40ad066

improve

858afbb

Merge branch 'branch-24.12' into numpy_random

16e61b2

github-actions bot added the Python and cudf.pandas labels Oct 7, 2024

galipremsagar added the improvement and non-breaking labels and removed the Python and cudf.pandas labels Oct 7, 2024

github-actions bot added the Python and cudf.pandas labels Oct 7, 2024

galipremsagar added 2 commits October 11, 2024 18:02

Merge remote-tracking branch 'upstream/branch-24.12' into numpy_random

66edb86

more fixes

ebaa562

galipremsagar added 6 commits October 11, 2024 22:38

fix issues

e100e2d

fix more failures

33234fc

update pre-commit

5915e5b

Merge remote-tracking branch 'upstream/branch-24.12' into numpy_random

7087da9

fix default seed

3d35be1

style

8336a5f

galipremsagar added 3 commits October 14, 2024 19:16

update files

cd7e198

update

bc73f31

Merge remote-tracking branch 'upstream/branch-24.12' into numpy_random

4e81055

galipremsagar commented Oct 14, 2024

galipremsagar added 3 commits October 14, 2024 19:22

update

0939ba1

update

8f4efb7

update notebook

de0813c

galipremsagar marked this pull request as ready for review October 14, 2024 19:35

galipremsagar requested review from a team as code owners October 14, 2024 19:35

galipremsagar requested review from KyleFromNVIDIA, bdice and charlesbluca October 14, 2024 19:35

galipremsagar self-assigned this Oct 14, 2024

AyodeAwe approved these changes Oct 15, 2024

charlesbluca suggested changes Oct 15, 2024

wence- approved these changes Oct 15, 2024

galipremsagar and others added 9 commits October 15, 2024 15:47

Apply suggestions from code review

3158d9c

Co-authored-by: Charles Blackmon-Luca

address reviews

a02907b

Merge branch 'numpy_random' of https://github.com/galipremsagar/cudf …

56f8ee0

…into numpy_random

Merge remote-tracking branch 'upstream/branch-24.12' into numpy_random

4f3ca74

merge into one

4a9e944

address reviews

b8f964b

Merge remote-tracking branch 'upstream/branch-24.12' into numpy_random

5a7afc6

fix struct data type corruption

2350a46

Merge branch 'branch-24.12' into numpy_random

ab1ddda

galipremsagar requested a review from charlesbluca October 16, 2024 21:36
