[ROCm] Use ROCm 6.2.3 in docker and ROCm/Migraphx CI pipelines #22478

Closed
wants to merge 4 commits into from

Conversation

tianleiwu
Contributor

@tianleiwu tianleiwu commented Oct 17, 2024

Description

Use rocm 6.2.3 in docker files and CI pipelines.

Some improvements/upgrades on ROCm docker file:

  • Use a shared docker file for ROCm and Migraphx CI pipelines to reduce maintenance cost.
  • rocm 6.0/6.1.3 => 6.2.3
  • Ubuntu 20.04 => 22.04
  • python 3.9 => 3.10
  • cmake 3.30.1 => 3.30.5
  • ccache 4.7.4 => 4.10.2
  • miniconda => venv
  • Use requirements.txt to install common python packages.
  • Fix message "ROCm version from ..." with correct file path in CMakeLists.txt

Motivation and Context

In 1.20 release, ROCm nuget packaging pipeline will use 6.2: #22461.
This upgrades rocm to 6.2.3 in CI pipelines.

@tianleiwu tianleiwu requested a review from a team as a code owner October 17, 2024 05:40
@tianleiwu tianleiwu marked this pull request as draft October 17, 2024 05:40
snnn
snnn previously approved these changes Oct 17, 2024
@tianleiwu
Contributor Author

tianleiwu commented Oct 21, 2024

A test fails after rewriting the docker file. test_set_providers_with_options (__main__.TestInferenceSession) failed with:
HIPBLAS failure 3: HIPBLAS_STATUS_INVALID_VALUE ; GPU=0 ; hostname=e5a32321a1d9 ; file=/onnxruntime_src/onnxruntime/core/providers/rocm/rocm_execution_provider.cc ; line=170 ; expr=hipblasSetStream(hipblas_handle_, stream);

I'll submit a new pull request and keep this one for reference.

@tianleiwu tianleiwu closed this Oct 21, 2024
tianleiwu added a commit that referenced this pull request Oct 25, 2024
### Description
Upgrade python from 3.9 to 3.10 in ROCm and MigraphX docker files and CI
pipelines. Upgrade ROCm version to 6.2.3 in most places except ROCm CI,
see comment below.

Some improvements/upgrades on ROCm/Migraphx docker or pipeline:
* rocm 6.0/6.1.3 => 6.2.3
* python 3.9 => 3.10
* Ubuntu 20.04 => 22.04
* Also upgrade ml_dtypes, numpy and scipy packages.
* Fix message "ROCm version from ..." with correct file path in
CMakeLists.txt
* Exclude some NHWC tests since ROCm EP lacks support for NHWC
convolution.

#### ROCm CI Pipeline:
ROCm 6.1.3 is kept in the pipeline for now.
- Failed after upgrading to ROCm 6.2.3: `HIPBLAS_STATUS_INVALID_VALUE ;
GPU=0 ; hostname=76123b390aed ;
file=/onnxruntime_src/onnxruntime/core/providers/rocm/rocm_execution_provider.cc
; line=170 ; expr=hipblasSetStream(hipblas_handle_, stream);`. It needs
further investigation.
- cupy issues:
(1) cupy currently supports numpy < 1.27 and might not work with numpy 2.x,
so we pinned numpy==1.26.4 for now.
(2) cupy support of ROCm 6.2 is still in progress:
cupy/cupy#8606.
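
A minimal sketch of the numpy constraint described above, assuming the only requirement is "numpy < 1.27". The helper names `parse_version` and `numpy_ok_for_cupy` are hypothetical and are not part of the pipeline scripts; the actual pin is the `numpy==1.26.4` requirement mentioned above.

```python
def parse_version(text):
    """Turn a version string such as '1.26.4' into a tuple of ints, (1, 26, 4).

    Parsing stops at the first component that has no leading digits, and any
    non-numeric suffix (for example 'rc1') is ignored.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def numpy_ok_for_cupy(numpy_version, upper_bound="1.27"):
    """Return True when numpy_version is strictly below upper_bound.

    The default bound reflects the "numpy < 1.27" note above; it is an
    assumption of this sketch, not a value read from cupy itself.
    """
    return parse_version(numpy_version) < parse_version(upper_bound)
```

Under this check the pinned 1.26.4 passes, while any 2.x release is rejected.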

Note on miniconda: its libstdc++.so.6 and libgcc_s.so.1 might conflict
with the system ones, so we created links that point to the system
ones.
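
The linking step could look roughly like the sketch below. It is an illustration only: the function name `link_to_system_lib` and the directories in the usage note are hypothetical, and the docker file itself may create the links differently.

```python
import os


def link_to_system_lib(conda_lib_dir, system_lib_dir, name):
    """Replace conda's copy of a shared library with a symlink to the system copy.

    All three arguments are supplied by the caller; nothing here is read from
    the real docker file.
    """
    system_path = os.path.join(system_lib_dir, name)
    conda_path = os.path.join(conda_lib_dir, name)
    if not os.path.exists(system_path):
        raise FileNotFoundError(system_path)
    # Remove the bundled file (or a stale link) before creating the new link.
    if os.path.lexists(conda_path):
        os.remove(conda_path)
    os.symlink(system_path, conda_path)
    return conda_path
```

A call might look like `link_to_system_lib("/opt/miniconda/lib", "/usr/lib/x86_64-linux-gnu", "libstdc++.so.6")`, where both directories are assumed locations rather than paths taken from this pull request.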

#### MigraphX CI pipeline

MigraphX CI does not use cupy, and we are able to use ROCm 6.2.3 and
numpy 2.x in the pipeline.

#### Other attempts

Other things that I've tried which might help in the future: 

Attempt to use a single docker file for both ROCm and Migraphx:
#22478

Upgrade to ubuntu 24.04 and python 3.12, and use venv like
[this](https://github.com/microsoft/onnxruntime/blob/27903e7ff1dd7256cd2b277c03766b4f2ad9e2f1/tools/ci_build/github/linux/docker/rocm-ci-pipeline-env.Dockerfile).

### Motivation and Context
In 1.20 release, ROCm nuget packaging pipeline will use 6.2:
#22461.
This upgrades rocm to 6.2.3 in CI pipelines to be consistent.
ishwar-raut1 pushed a commit to ishwar-raut1/onnxruntime that referenced this pull request Nov 19, 2024
…ft#22527)
