Add BatchNorm kernel for ROCm #9014

Merged: 2 commits, Sep 13, 2021

@mindest (Contributor) commented Sep 9, 2021

Description:

  • Implement BatchNormInternal and BatchNormalizationGrad for the ROCm EP
    • Support float and float16 only, since the MIOpen kernel does not support double yet
    • Default to spatial mode, as per the ONNX spec
  • Update the BatchNorm CUDA test so that it also covers ROCm
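
For readers unfamiliar with spatial mode: statistics are computed per channel, across the batch and all spatial positions. The sketch below is a plain-Python reference for that computation on an NCHW tensor stored as nested lists. It is only an illustration of the math, not the MIOpen kernel or the code in this PR; the function name `batch_norm_spatial` and its signature are hypothetical.

```python
import math


def batch_norm_spatial(x, scale, bias, epsilon=1e-5):
    """Reference spatial batch norm (hypothetical helper, not the PR's code).

    x is a nested list shaped [N][C][H][W]; scale and bias have length C.
    Returns (y, means, variances), where the statistics are per channel.
    """
    n, c = len(x), len(x[0])
    h, w = len(x[0][0]), len(x[0][0][0])
    count = n * h * w

    means, variances = [], []
    for ch in range(c):
        # Spatial mode: one mean/variance per channel over N, H and W.
        vals = [x[i][ch][r][col]
                for i in range(n) for r in range(h) for col in range(w)]
        mean = sum(vals) / count
        var = sum((v - mean) ** 2 for v in vals) / count  # biased variance
        means.append(mean)
        variances.append(var)

    y = [[[[scale[ch] * (x[i][ch][r][col] - means[ch])
            / math.sqrt(variances[ch] + epsilon) + bias[ch]
            for col in range(w)]
           for r in range(h)]
          for ch in range(c)]
         for i in range(n)]
    return y, means, variances


# Tiny example: N=2, C=2, H=1, W=2.
x = [[[[1.0, 2.0]], [[10.0, 10.0]]],
     [[[3.0, 4.0]], [[10.0, 10.0]]]]
y, means, variances = batch_norm_spatial(x, scale=[1.0, 1.0], bias=[0.0, 0.5])
```

Note that the second channel is constant, so its variance is zero and only `epsilon` keeps the division well defined.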

Motivation and Context

  • Why is this change required?
    The 1-P (first-party) model TwinBERT needs this for training on MI100 clusters.

@mindest marked this pull request as ready for review September 10, 2021 03:36
@mindest added the "training" label (issues related to ONNX Runtime training; typically submitted using template) Sep 10, 2021
@mindest merged commit a1021a1 into master Sep 13, 2021
@mindest deleted the linmin/bn_rocm branch September 13, 2021 07:15
suffiank pushed a commit that referenced this pull request Sep 21, 2021
* Add BatchNorm kernel for ROCm, update BN test

* correct epsilon_ setting; limit min epsilon
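
The second commit message mentions limiting the minimum epsilon, but this page does not show how. A minimal sketch of the general idea, assuming a floor of 1e-5 (both the value and the names `MIN_EPSILON` and `clamp_epsilon` are hypothetical, not taken from the PR or from MIOpen):

```python
# Hypothetical floor; the actual limit used in the PR is not shown on this page.
MIN_EPSILON = 1e-5


def clamp_epsilon(epsilon, min_epsilon=MIN_EPSILON):
    """Return epsilon, raised to min_epsilon if the model supplies a smaller value."""
    return max(float(epsilon), min_epsilon)
```

Clamping like this trades a small deviation from the model's stated epsilon for a guarantee that the value passed to the backend is never below what it accepts.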
wangyems added a commit that referenced this pull request Sep 22, 2021
* Revert "Fix nightly CI pipeline to generate ROCm 4.2 wheels and add ROCm 4.3.1 wheels (#9101)"

This reverts commit 4788839.

* Add BatchNorm kernel for ROCm (#9014)

* Add BatchNorm kernel for ROCm, update BN test

* correct epsilon_ setting; limit min epsilon

* Upgrade ROCm CI pipeline for ROCm 4.3.1 and permit run inside container (#9070)

* try to run inside 4.3.1 container

* no \ in container run command

* remove networking options

* try with adding video render groups

* add job to build docker image

* try without 1st stage

* change alpha, beta to float

* try adding service connection

* retain huggingface directory

* static video and render gid

* use runtime expression for variables

* install torch-ort

* pin sacrebleu==1.5.1

* update curves for rocm 4.3.1

* try again

* disable determinism and only check tail of loss curve and with a much larger threshold of 0.05

* disable RoBERTa due to high run variability on ROCm 4.3.1

* put reduction unit tests back in

* Fix nightly CI pipeline to generate ROCm 4.2 wheels and add ROCm 4.3.1 wheels (#9101)

* make work for both rocm 4.2 and rocm 4.3.1

* fix rocm 4.3.1 docker image reference

* fix CUDA_VERSION to ROCM_VERSION

* fix ReduceConsts conflict def

* add ifdef to miopen_common.h as well

* trailing ws

Co-authored-by: wangye <[email protected]>
Co-authored-by: mindest <[email protected]>
Labels
training issues related to ONNX Runtime training; typically submitted using template
2 participants