[MIOpen] Implement kernel tuning heuristic model for 3D conv ops (two tower model) #1154
…el call to make HeuristicInit less verbose and easier to read.
…re used for expanding the candidate kernel params for the CandidateSelectionModel. Note: not tested yet.
projects/miopen/src/solver/conv/conv_hip_implicit_gemm_3d_grouped_fwd_xdlops.cpp
…e file (not in header)
…t back to the correct place in the .hpp header
Because we don't have performance testing to cover this, we need to verify the performance results manually and attach them to this PR so we can review them. I think the goal of this PR is that out-of-the-box performance for 3D convs improves with this change, so can we use MIOpenDriver with a set of 3D conv shapes to test this?
I don't see any obvious issues with the submission. I will echo the sentiment that it would be good to measure the expected performance uplift, for example as a % difference between heuristic-selected and tuned.
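One way to express the "% difference between heuristic-selected and tuned" metric is sketched below. This is only an illustration, not code from the PR: the function name `HeuristicGapPercent` and the choice of milliseconds as the unit are assumptions, and the inputs would be kernel times measured separately (e.g. with MIOpenDriver).

```cpp
#include <cassert>

// Hypothetical helper (not part of MIOpen): relative slowdown, in percent, of
// the heuristic-selected kernel versus the exhaustively tuned kernel.
//   0   -> heuristic matches the tuned kernel
//   > 0 -> heuristic is slower than tuned
//   < 0 -> heuristic is faster than tuned
// Both arguments are measured kernel times in milliseconds.
double HeuristicGapPercent(double heuristic_ms, double tuned_ms)
{
    assert(tuned_ms > 0.0); // a tuned time of zero would make the ratio meaningless
    return 100.0 * (heuristic_ms - tuned_ms) / tuned_ms;
}
```

Reporting this value per shape, direction and datatype would show where the heuristic is close to tuned performance and where it falls short.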
Regarding performance metrics, I received this list of conv ops from @jfactory07:

Which I then "extended" to cover all directions (not just fwd) and all datatypes. When I ran these MIOpenDriver commands on MI308, it led to this summarising figure:

I gathered these data by forcing all the operations to go through the ConvHipImplicitGemm3DGroup*Xdlops solver (fwd, bwd, or wrw) and turning off the exhaustive tuning, such that we are only focussing on the kernel tuning performance here. Based on these numbers we decided that:
…ree solvers for clearer logs (e.g. changed error msg for hard-code heuristics into log_i2)
[MIOpen] Implement kernel tuning heuristic model for 3D conv ops (two tower model) (#1154)

Copied over from branch in old repo: ROCm/MIOpen#3923

## Motivation

With this PR MIOpen should be able to use heuristics for 3D convolutions on gfx942:

* the parameter selection/kernel tuning of the three `conv_hip_implicit_gemm_3d_grouped_*_xdlops` solvers. We have added some `ai_*` files housing the new models and helper functions, and made some changes to existing files where required.
* ~~the solver selection (i.e. a "3D tunanet")~~ At the moment the 3D solver selection model is inadequate and the already existing WTI fallback is preferred. Improving the 3D solver selection heuristics will be the focus of a future PR.

## Technical Details

* `ai_heuristics.cpp` contains all the code that was already there, plus new code that relies on fdeep includes, since fdeep can only be imported in a single file.
* `ai_candidate_selection` contains the actual two-towers (aka CandidateSelection) "model" and "metadata" classes that do the computation using floating point vectors.
* `ai_conv_3d_kernel_tuning_utils` contains the machinery one level higher. That is, how to convert kernel configs and fdb_key input to float vectors and how to fetch and call the relevant CandidateSelectionModel. This is shared by all three solvers, so it made sense to centralise it in this file.
* `kernels/gfx942....` model and metadata files. Perhaps these should be committed using git lfs, but I have not seen this done for other model files (e.g. Tunanet or KTN), so have not done so here.
* `solver/conv/...cpp` solver-specific files; they ultimately contain the solver-specific machinery. In this case our 3 solvers rely heavily on `ai_conv_3d_kernel_tuning_utils` and through that on `ai_candidate_selection`.
* gtest files: should speak for themselves, please have a look. They test all the new machinery.
* `solvers.hpp`: a huge header file that contains declarations for all solvers (i.e. for the `solver/conv/...cpp` files), so this needed to be altered as well.

## Test Plan

3 new gtest .cpp files are added:

* ~~`test/gtest/conv_ai_3d_heuristics.cpp` This aims to test all new functionality related to the 3D tunanet (model + metadata).~~ While the tests are still there for future use, they will be skipped since 3D Tunanet model data is no longer included.
* `test/gtest/conv_ai_3d_kernel_tuning_utils.cpp` Aims to test all new machinery in `ai_conv_3d_kernel_tuning_utils.cpp` (preprocessing and handling of inputs to the CandidateSelectionModel for the 3D solvers).
* `test/gtest/conv_ai_candidate_selection_model.cpp` Tests internal code related to the CandidateSelectionModel and its metadata.

## Test Result

The `./bin/test_conv_ai_*` tests all succeed without errors when building and running them on a conductor MI300 node. Besides manually running all other `./bin/test_*` functions, is there a better way to perform a full test?

## Submission Checklist

- [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
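To make the two-tower idea above more concrete, here is a minimal sketch of how a candidate could be picked once both towers have produced float vectors. It is not the code in `ai_candidate_selection`: the names `Embedding`, `Dot` and `SelectBestCandidate` are hypothetical, and scoring by dot product with an argmax over candidates is an assumption based on how two-tower models are commonly used, not something this PR description states.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type: the float vector one tower outputs for a problem
// (e.g. derived from the fdb_key) or for a kernel config.
using Embedding = std::vector<float>;

// Dot product of two embeddings of equal length.
float Dot(const Embedding& a, const Embedding& b)
{
    assert(a.size() == b.size());
    float sum = 0.0f;
    for(std::size_t i = 0; i < a.size(); ++i)
        sum += a[i] * b[i];
    return sum;
}

// Returns the index of the candidate kernel config whose embedding scores
// highest against the problem embedding. Ties keep the earliest candidate.
// Assumption: a higher dot product means a better predicted match.
std::size_t SelectBestCandidate(const Embedding& problem,
                                const std::vector<Embedding>& candidates)
{
    assert(!candidates.empty());
    std::size_t best = 0;
    float best_score = Dot(problem, candidates[0]);
    for(std::size_t i = 1; i < candidates.size(); ++i)
    {
        const float score = Dot(problem, candidates[i]);
        if(score > best_score)
        {
            best_score = score;
            best       = i;
        }
    }
    return best;
}
```

The appeal of this kind of split is that the problem embedding only needs to be computed once per convolution, after which each candidate costs a single dot product.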
Github action triggered OSDB jenkins job: http://rocm-ci.amd.com/job/compute-rocm-dkms-mathlibs-osdb/286
#1740) …ops (two tower model) (#1154)"

This reverts commit 422e872.

## Motivation

That commit broke MI300 unit tests.

## Test Plan

Reverted this change and verified that the failing build now passes.

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.


