feat(ml): rocm #16613
base: main
Conversation
📖 Documentation deployed to pr-16613.preview.immich.app

steps:
  - name: Login to GitHub Container Registry
There are some changes in indentation as well as changes from double quotes to single quotes. Was this intended? I know it's from the first commit of the original PR, but I don't think it was ever addressed.
VS Code did this when I saved. I'm not sure why it's different
Is there a PR check that runs prettier on the workflow files? I would think the inconsistency exists because there likely isn't.
Nice! The Docker cache appears to be working with no changes. Would you mind changing something within ML itself that requires a source code change and rebuild, just so we can see the cache working in those cases before we merge?
FYI, there's a set of rocm builds available supporting a wider range of AMD hardware, which might be useful:
"ROCM SDK Builder 6.1.2 is based on to ROCM 6.1.2"
Sadly, no, not quite. Official ROCm does not support, for instance, gfx1103 (RX 780M and similar iGPUs, 7940HS and similar APUs).
The official listed support in the docs is mostly just gfx103X and gfx110X and maybe some other stuff. They're inconsistent: "supported" effectively means their team will help you on GitHub with certain things, while anything not on the list may still work (e.g. Vega GPUs work fine) but they won't help you. Edit: So my question would be, how does one check what's supported by the build they are running?
Yeah, but the official ROCm build will not work with gfx1103 at all, applications built against it (i.e. pytorch prebuilt) will not work with gfx1103, and building against it for gfx1103 will not work either.
I'm not quite sure. On Fedora, the gfx1103 build is provided as a separate package and listed as a separate folder, but the officially supported gfx1102 falls under gfx1100 here, so it's not a reliable check.
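On the question of checking what a given build supports: one rough heuristic (an assumption based on common ROCm install layouts, not something verified against this image) is to list the Tensile code-object files shipped with rocBLAS, since their filenames embed the gfx targets they were built for:

```shell
# Sketch: infer which gfx targets a rocBLAS build was compiled for by
# scanning its Tensile library filenames. The path below is an assumption
# based on typical ROCm installs; adjust it for your distribution.
ls /opt/rocm/lib/rocblas/library/ \
  | grep -oE 'gfx[0-9a-f]+' \
  | sort -u
```

As the Fedora gfx1102-under-gfx1100 case above shows, a target missing from this list may still be covered by another code object, so treat the output as a hint rather than an authoritative support matrix.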
Maybe it would be useful to have two ROCm-flavored options? One with the current mainline ROCm version, and one with the community build supporting a wider variety of GPUs?
Nice, they split them up by version. Eventually we want to do that to cut down the 30 GB image size; Frigate also splits them up. The current image we build has multiple versions all built into one image.
Doing that would also resolve the "official or unofficial build?" question, I suppose, since you can provide the official builds for the supported GPUs and the unofficial builds for the unsupported ones. But you'd need to provide a lot of images that way. Edit: FYI:
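A per-target split could be sketched as a loop over build args. Note that `ROCM_ARCH`, the tag scheme, and the Dockerfile hook are all assumptions for illustration, not the actual setup in this PR:

```shell
# Hypothetical sketch of building one smaller image per GPU family instead
# of one 30 GB image with every target baked in. The commands are echoed,
# so this is a dry run; remove the echo to actually build.
for arch in gfx1030 gfx1100 gfx1103; do
  echo docker build \
    --build-arg ROCM_ARCH="$arch" \
    -t "immich-ml:rocm-$arch" \
    -f machine-learning/Dockerfile .
done
```

The trade-off discussed above applies: each image gets much smaller, but the number of images (and CI build jobs) multiplies by the number of targets.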
machine-learning/Dockerfile
WORKDIR /code
RUN apt-get update && apt-get install -y --no-install-recommends wget git python3.10-venv migraphx migraphx-dev half
- RUN apt-get update && apt-get install -y --no-install-recommends wget git python3.10-venv migraphx migraphx-dev half
+ RUN apt-get update && apt-get install -y --no-install-recommends wget git python3.10-venv migraphx-dev
Only migraphx-dev is needed, as the other two are dependencies.
Edit: don't change it now, though, because it's already building.
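The dependency claim is easy to verify inside the image. A hedged sketch, assuming the standard apt tooling in the Ubuntu base image (package names are taken from the PR diff):

```shell
# Sketch: confirm that migraphx and half are already pulled in as
# dependencies of migraphx-dev, so listing them explicitly is redundant.
# Run inside the rocm/dev-ubuntu-22.04 image.
apt-get update >/dev/null
apt-cache depends migraphx-dev | grep -E 'Depends: (migraphx|half)'
```

If both packages show up as `Depends:` lines, dropping them from the install list changes nothing about the resulting image.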
@@ -80,11 +111,14 @@ COPY --from=builder-armnn \
/opt/ann/build.sh \
/opt/armnn/
FROM rocm/dev-ubuntu-22.04:6.3.4-complete AS prod-rocm |
I know there were already comments on this, but I think copying the deps manually may result in a smaller, yet still working image. It might be worth re-investigating.
@@ -15,6 +15,34 @@ RUN mkdir /opt/armnn && \
cd /opt/ann && \
sh build.sh
# Warning: 25GiB+ disk space required to pull this image
# TODO: find a way to reduce the image size
FROM rocm/dev-ubuntu-22.04:6.3.4-complete AS builder-rocm
Nope. Not it.
The Fedora rocBLAS patch for gfx1103 support looks like a copy of gfx1102 (navi33); only the names and ISA versions differ. I diffed a few of the files and think those are the only differences.
I'm interested in additional GPU support because I have a mini PC with a Ryzen 8845HS (Radeon 780M) for testing, and a second one with a Ryzen 5825U.
My 780M locks up my desktop roughly 50% of the time when using ROCm llama.cpp/whisper.cpp with any ROCm version (1100, 1102, 1103). I'd hoped it would be less of an issue headless or with different applications, but if you have the same issue with Immich, that does not bode well...
This is not a valid version from what I've observed. So far, there are only 3 valid options:
Unfortunately, adding support for gfx1102 doesn't solve the crashing problems on the Radeon 780M, but I'm happy because I succeeded in getting it to work on the Ryzen 5825U's GPU.
They also specifically say certain iGPUs crash. I would bet that they're just bleeding edge.
That model or a similar one is known to work.
I'm removing MIGraphX for now and moving back to direct ROCm. There are some advantages to using MIGraphX, so we might circle back to it down the line. I've also updated the PR based on some of the later comments on GPU compatibility.
Description
This PR introduces support for AMD GPUs through ROCm. It's a rebased version of #11063 with updated dependencies.
It also once again removes algo caching, as the concurrency issue with caching seems to be subtler than originally thought. While disabling caching is wasteful (it essentially runs a benchmark every time instead of only once), it's still better than the alternatives of lowering concurrency to 1 or not having ROCm support at all.