[RELEASE] dask-cuda v25.04 #1456

Open: wants to merge 20 commits into `main`


raydouglass
Member

❄️ Code freeze for branch-25.04 and v25.04 release

What does this mean?

Only fixes for critical or hotfix-level issues should be merged into branch-25.04 until the release (that is, until this PR is merged).

What is the purpose of this PR?

  • Update documentation
  • Allow testing for the new release
  • Enable a means to merge branch-25.04 into main for the release

raydouglass and others added 20 commits January 23, 2025 15:02
Forward-merge branch-25.02 to branch-25.04

This migrates amd64 CI jobs (PRs and nightlies) to use L4 GPUs from the NVKS cluster.

xref: rapidsai/build-infra#184

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Gil Forsyth (https://github.com/gforsyth)

URL: #1435

Currently, the test suite emits the following warning:

```
dask_cuda/tests/test_dask_cuda_worker.py:354
  /home/nfs/toaugspurger/gh/rapidsai/dask-cuda/dask_cuda/tests/test_dask_cuda_worker.py:354: PytestUnknownMarkWarning: Unknown pytest.mark.timeout - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @pytest.mark.timeout(20)
```

Added `pytest-timeout` to `dependencies.yaml` and regenerated the conda environment.yaml files.
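For background: pytest raises `PytestUnknownMarkWarning` when a test uses a marker that no installed plugin or configuration has registered. As a point of comparison, a hypothetical `conftest.py` fallback could register the marker only when the plugin is missing, using pytest's `pytest_configure` hook and `config.addinivalue_line`. This is a sketch, not part of this PR, and it would only silence the warning without enforcing any time limit.

```python
import importlib.util


def pytest_configure(config):
    """Hypothetical conftest.py hook: register ``timeout`` only if pytest-timeout is absent.

    This silences PytestUnknownMarkWarning but does not enforce any time
    limit; installing pytest-timeout (as this PR does) both registers the
    marker and enforces it.
    """
    if importlib.util.find_spec("pytest_timeout") is None:
        config.addinivalue_line(
            "markers",
            "timeout(seconds): per-test time limit (only enforced by pytest-timeout)",
        )
```

Installing the plugin remains the better fix, since the `@pytest.mark.timeout(20)` in the test suite is meant to actually stop a hung test.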

Authors:
  - Tom Augspurger (https://github.com/TomAugspurger)
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Bradley Dice (https://github.com/bdice)

URL: #1433

Dask-CUDA currently requires that `dask.dataframe` be imported in a few places. We only do this to patch in explicit-comms shuffling and to register various dispatch functions. There is no fundamental reason that we need `dask.dataframe` to be installed if the user is not actually using `dask.dataframe`/`dask_cudf` in their workflow.

This PR adds exception handling around the `dask.dataframe` imports that happen automatically when `dask_cuda` is imported.

Authors:
  - Richard (Rick) Zamora (https://github.com/rjzamora)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)

URL: #1439

Forward-merge branch-25.02 into branch-25.04

Uses a retry wrapper for `pip` commands to mitigate CI failures caused by hash mismatches that result from network hiccups.

xref rapidsai/build-planning#148

This will retry failures that show up in CI like:

```
   Collecting nvidia-cublas-cu12 (from libraft-cu12==25.2.*,>=0.0.0a0)
    Downloading https://pypi.nvidia.com/nvidia-cublas-cu12/nvidia_cublas_cu12-12.8.3.14-py3-none-manylinux_2_27_aarch64.whl (604.9 MB)
       ━━━━━━━━━━━━━━━━━━━━━                 350.2/604.9 MB 229.2 MB/s eta 0:00:02
  ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
      nvidia-cublas-cu12 from https://pypi.nvidia.com/nvidia-cublas-cu12/nvidia_cublas_cu12-12.8.3.14-py3-none-manylinux_2_27_aarch64.whl#sha256=93a4e0e386cc7f6e56c822531396de8170ed17068a1e18f987574895044cd8c3 (from libraft-cu12==25.2.*,>=0.0.0a0):
          Expected sha256 93a4e0e386cc7f6e56c822531396de8170ed17068a1e18f987574895044cd8c3
               Got        849c88d155cb4b4a3fdfebff9270fb367c58370b4243a2bdbcb1b9e7e940b7be
```
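Because a hash mismatch caused by a network hiccup is usually transient, rerunning the failed command is often enough. A minimal Python sketch of that idea, assuming a fixed number of attempts and a fixed delay; `run_with_retries` is hypothetical, and the actual wrapper used in RAPIDS CI is a shell tool whose behavior may differ:

```python
import subprocess
import sys
import time
from typing import Sequence


def run_with_retries(cmd: Sequence[str], attempts: int = 3, delay: float = 5.0) -> int:
    """Run ``cmd``, retrying while it exits non-zero.

    Returns the exit code of the last attempt (0 on success).
    """
    returncode = 1
    for attempt in range(1, attempts + 1):
        returncode = subprocess.run(list(cmd)).returncode
        if returncode == 0:
            return 0
        if attempt < attempts:
            # Report the failure and wait before trying again.
            print(
                f"attempt {attempt}/{attempts} failed with exit code {returncode}; retrying",
                file=sys.stderr,
            )
            time.sleep(delay)
    return returncode
```

For example, `run_with_retries([sys.executable, "-m", "pip", "install", "dask-cuda"])` would attempt the install up to three times. A fixed delay keeps the sketch short; exponential backoff is a common alternative.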

Authors:
  - Gil Forsyth (https://github.com/gforsyth)

Approvers:
  - Mike Sarahan (https://github.com/msarahan)
  - Bradley Dice (https://github.com/bdice)

URL: #1443

Exposes `build_type` as an input in `test.yaml` so that `test.yaml` can be
manually run against a specific branch/commit as needed.

The default value is still `nightly`, and without maintainer intervention, that
is what will run each night.

xref rapidsai/build-planning#147

Authors:
  - Gil Forsyth (https://github.com/gforsyth)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #1444

Enables telemetry during CI runs. This is done by parsing GitHub Actions run log metadata and should have no impact on build or test times.

xref rapidsai/build-infra#139

Authors:
  - Mike Sarahan (https://github.com/msarahan)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #1445

This completes the migration to NVKS runners now that all libraries have been tested and rapidsai/shared-workflows#273 has been merged.

xref: rapidsai/build-infra#184

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Gil Forsyth (https://github.com/gforsyth)

URL: #1446

Adds a pre-commit hook to ensure files have an up-to-date copyright notice.
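The core check such a hook performs can be sketched as follows, assuming notices of the form `Copyright (c) START-END` (or a single year) and treating a notice as up to date when its last year is the current year. `copyright_is_current` is a hypothetical helper, not the hook's actual implementation:

```python
import datetime
import re
from typing import Optional

# Hypothetical pattern: "Copyright (c) 2019-2025" or "Copyright (c) 2025".
_COPYRIGHT_RE = re.compile(r"Copyright \(c\) (\d{4})(?:-(\d{4}))?")


def copyright_is_current(text: str, year: Optional[int] = None) -> bool:
    """Return True if ``text`` has a copyright notice whose last year is ``year``."""
    if year is None:
        year = datetime.date.today().year
    match = _COPYRIGHT_RE.search(text)
    if match is None:
        return False
    # Use the end of the range if present, otherwise the single year.
    last_year = int(match.group(2) or match.group(1))
    return last_year == year
```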

Authors:
  - Gil Forsyth (https://github.com/gforsyth)

Approvers:
  - Richard (Rick) Zamora (https://github.com/rjzamora)
  - James Lamb (https://github.com/jameslamb)

URL: #1423

Forward-merge branch-25.02 into branch-25.04

This PR updates the CommContext caching to be keyed by information about the cluster rather than stored in a single global. This prevents us from using a stale comms object after the cluster changes (workers are added or removed) or is recreated entirely.
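A minimal sketch of per-cluster caching, assuming the key is the scheduler address plus the set of worker addresses; `CommsCache` and this key choice are hypothetical and are not dask-cuda's actual `CommContext` code:

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

# Hypothetical key: scheduler address plus the current set of worker addresses.
CacheKey = Tuple[str, FrozenSet[str]]


class CommsCache:
    """Cache comms objects per cluster instead of in one global slot."""

    def __init__(self, factory: Callable[[str, FrozenSet[str]], object]):
        self._factory = factory
        self._cache: Dict[CacheKey, object] = {}

    def get(self, scheduler_address: str, worker_addresses: Iterable[str]) -> object:
        key = (scheduler_address, frozenset(worker_addresses))
        if key not in self._cache:
            # A new cluster, or a changed worker set, gets a fresh comms
            # object, so a stale one is never reused.
            self._cache[key] = self._factory(*key)
        return self._cache[key]
```

Using a `frozenset` makes the key independent of worker order. This sketch never evicts entries for clusters that have gone away; a real implementation may need to.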

Closes #1450

Authors:
  - Tom Augspurger (https://github.com/TomAugspurger)

Approvers:
  - Richard (Rick) Zamora (https://github.com/rjzamora)

URL: #1451

When `dask-cuda` is installed like this:

```shell
pip install \
  --extra-index-url=https://pypi.anaconda.org/rapidsai-wheels-nightly/simple/ \
  'dask-cuda==25.4.*,>=0.0.0a0'
```

the `__git_commit__` attribute on the main module isn't populated:

```shell
python -c "import dask_cuda; print(dask_cuda.__git_commit__)"
```

The way this *should* work is that `rapids-build-backend` writes a file `dask_cuda/GIT_COMMIT`, which is then read by this code:

https://github.com/rapidsai/dask-cuda/blob/412ef5891f1cca78af48c076e0922874c227b34b/dask_cuda/_version.py#L20-L28

I think that what's happening here is this:

* `rapids-build-backend` *is* writing that file
* the file is not being packaged, because this project uses `setuptools` + a `MANIFEST.in`, and that `MANIFEST.in` does not include that file
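The failure mode can be illustrated with a minimal, hypothetical reader (not the actual `_version.py` code linked above): when `GIT_COMMIT` is absent from the installed package, the lookup quietly returns nothing instead of raising.

```python
from pathlib import Path
from typing import Optional


def read_git_commit(package_dir: Path) -> Optional[str]:
    """Return the commit hash stored in ``package_dir / "GIT_COMMIT"``, or None.

    If the build backend wrote the file but the distribution did not include
    it (the MANIFEST.in problem described above), this returns None.
    """
    try:
        return (package_dir / "GIT_COMMIT").read_text(encoding="utf-8").strip() or None
    except FileNotFoundError:
        return None
```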

This proposes the following:

* add `GIT_COMMIT` to `MANIFEST.in`
* update RAPIDS-specific pre-commit hooks to their latest versions (not related, but might as well, while we're using a CI run anyway)

## Notes for Reviewers

Helpful reference for this... "Controlling files in the distribution" from the `setuptools` docs: https://setuptools.pypa.io/en/latest/userguide/miscellaneous.html

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Mike Sarahan (https://github.com/msarahan)
  - Bradley Dice (https://github.com/bdice)

URL: #1453

This changes from `conda mambabuild` to `conda build`. Conda now uses the mamba solver so no performance regressions are expected.

This is a temporary change as we plan to migrate to `rattler-build` in the near future. However, this is needed sooner to drop `boa` and unblock Python 3.13 migrations.

xref: rapidsai/build-planning#149

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: #1454

raydouglass requested review from a team as code owners on March 20, 2025 16:25

raydouglass requested review from msarahan and removed the request for a team on March 20, 2025 16:25

github-actions bot added the `python` (python code needed), `conda` (conda issue), and `ci` labels on Mar 20, 2025

10 participants