test: Free up disk space for GH actions #12
Conversation
Resolves IBM#11

Signed-off-by: Christian Kadner <[email protected]>
@njhill -- I could use some newbie advice :-) Python tests fail with error:
Do we need to add an extra step to download model files?
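One option (a hypothetical sketch, not something confirmed in this thread) would be to cache the Hugging Face model directory between CI runs so the tests don't have to re-download model files each time; the cache path and key naming below are assumptions, not taken from this repo:

```yaml
# Hypothetical GitHub Actions step: cache downloaded HF model files between runs.
- name: Cache Hugging Face models
  uses: actions/cache@v3
  with:
    path: ~/.cache/huggingface
    key: hf-models-${{ runner.os }}-${{ hashFiles('tests/**') }}
```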
@njhill @tjohnson31415 -- I think this one is ready for review (again) 😊
NIT: Should change the PR title since this is doing a bunch more than freeing up some disk space now :)
Since the Flash Attention Makefiles are no longer used in the Dockerfile, maybe we should just remove them. @njhill What do you think? Do you use the flash Makefiles?
Dockerfile (outdated diff):

  #ARG PYTORCH_INDEX="https://download.pytorch.org/whl"
  ARG PYTORCH_INDEX="https://download.pytorch.org/whl/nightly"
- ARG PYTORCH_VERSION=2.3.0.dev20231221
+ #ARG PYTORCH_VERSION=2.3.0.dev20231221
+ ARG PYTORCH_VERSION=2.2.0.dev20231213
With the nightly PyTorch versions, I think there's a chance of incompatibilities unless the exact build of PyTorch matches the dependencies. The prebuilt wheel for Flash Attention v2 was compiled with 2.2.0.dev20231127 (REF).
It also could just work... but the safest thing would be to exactly match the versions.
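To illustrate the exact-match approach as a sketch (the version and index below come from the wheel build mentioned above and may have aged out of the nightly index; not the PR's actual Dockerfile lines):

```dockerfile
# Sketch: pin torch to the exact nightly build the flash-attn v2 wheel
# was compiled against, so the ABI of the prebuilt wheel matches.
ARG PYTORCH_INDEX="https://download.pytorch.org/whl/nightly"
ARG PYTORCH_VERSION=2.2.0.dev20231127
RUN pip install "torch==${PYTORCH_VERSION}" --index-url "${PYTORCH_INDEX}" --no-cache-dir
```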
Sure, we might want to look at the tagged version of the build/publish workflow file based on the flash-attn version, i.e.:

| Flash | Torch | Ref |
|---|---|---|
| 2.5.0 | 2.2.0.dev20231130 | REF |
| 2.4.3 | 2.2.0.dev20231130 | REF |
| 2.4.2 | 2.2.0.dev20231106 | REF |
| 2.4.1 | 2.2.0.dev20231106 | REF |
| 2.4.0 | 2.2.0.dev20231106 | REF |
| 2.3.6 | 2.2.0.dev20231127 | REF |
| 2.3.5 | 2.1.0 (no dev) | REF |
Although, looking at the nightly build index https://download.pytorch.org/whl/nightly/torch/, dev20231127 is no longer available. I do see torch-2.2.0 at https://download.pytorch.org/whl/torch/ now, so we could just use '2.2.0' instead of '2.2.0.dev20231127'.
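That suggestion, switching the build args from the nightly index to the stable one, could look roughly like this (a sketch based on the diff discussed above, not the PR's final Dockerfile):

```dockerfile
# Sketch: use the stable PyTorch index and pin the released 2.2.0
# instead of a nightly dev build that may disappear from the index.
#ARG PYTORCH_INDEX="https://download.pytorch.org/whl/nightly"
ARG PYTORCH_INDEX="https://download.pytorch.org/whl"
ARG PYTORCH_VERSION=2.2.0
```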
FYI, I submitted a PR to add the PyTorch 2.3 nightly dev builds to the flash-attention build matrix: Dao-AILab/flash-attention#793
> [cpu-tests 2/11] RUN pip install torch=="2.2.0.dev20231127+cpu" --index-url "https://download.pytorch.org/whl/nightly/cpu" --no-cache-dir:
0.587 Looking in indexes: https://download.pytorch.org/whl/nightly/cpu
0.931 ERROR: Could not find a version that satisfies the requirement torch==2.2.0.dev20231127+cpu (from versions: 2.2.0.dev20231010+cpu, 2.2.0.dev20231205+cpu, 2.2.0.dev20231206+cpu, 2.2.0.dev20231207+cpu, 2.2.0.dev20231208+cpu, 2.2.0.dev20231209+cpu, 2.2.0.dev20231210+cpu, 2.2.0.dev20231211+cpu, 2.2.0.dev20231212+cpu, 2.2.0.dev20231213+cpu, 2.3.0.dev20231214+cpu, 2.3.0.dev20231215+cpu, 2.3.0.dev20231216+cpu, 2.3.0.dev20231217+cpu, 2.3.0.dev20231218+cpu, 2.3.0.dev20231219+cpu, 2.3.0.dev20231220+cpu, 2.3.0.dev20231221+cpu, 2.3.0.dev20231222+cpu, 2.3.0.dev20231223+cpu, 2.3.0.dev20231224+cpu, 2.3.0.dev20231225+cpu, 2.3.0.dev20231226+cpu, 2.3.0.dev20231227+cpu, 2.3.0.dev20231228+cpu, 2.3.0.dev20231229+cpu, 2.3.0.dev20231230+cpu, 2.3.0.dev20231231+cpu, 2.3.0.dev20240101+cpu, 2.3.0.dev20240102+cpu, 2.3.0.dev20240103+cpu, 2.3.0.dev20240104+cpu, 2.3.0.dev20240105+cpu, 2.3.0.dev20240106+cpu, 2.3.0.dev20240107+cpu, 2.3.0.dev20240108+cpu, 2.3.0.dev20240109+cpu, 2.3.0.dev20240110+cpu, 2.3.0.dev20240111+cpu, 2.3.0.dev20240113+cpu, 2.3.0.dev20240114+cpu, 2.3.0.dev20240115+cpu, 2.3.0.dev20240116+cpu, 2.3.0.dev20240117+cpu, 2.3.0.dev20240118+cpu, 2.3.0.dev20240119+cpu, 2.3.0.dev20240120+cpu, 2.3.0.dev20240121+cpu, 2.3.0.dev20240122+cpu, 2.3.0.dev20240123+cpu, 2.3.0.dev20240124+cpu, 2.3.0.dev20240125+cpu, 2.3.0.dev20240126+cpu, 2.3.0.dev20240127+cpu, 2.3.0.dev20240128+cpu, 2.3.0.dev20240129+cpu)
0.932 ERROR: No matching distribution found for torch==2.2.0.dev20231127+cpu
------
Dockerfile:181
torch==2.2.0 works (at least for install/compilation).
Changes in this PR:

- free up disk space on the GH action runner and reuse it for Python tests and integration tests
- change pip install to use pre-built wheels, and limit parallel compilation threads for the flash-attention v2 build to avoid OOM errors
- upgrade flash-attention v2 to 2.3.2 (flash-attention <= 2.0.5 has no pre-built wheels; the CI build runs out of memory)

Disk space on GH action runner before and after clean up:
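The "limit parallel compilation threads" part could look something like the sketch below; the MAX_JOBS value is an assumption, only the flash-attn 2.3.2 pin comes from the summary above (MAX_JOBS is the env var the flash-attn setup honors to cap parallel compile jobs when it falls back to building from source):

```dockerfile
# Sketch: MAX_JOBS caps parallel compile jobs during a flash-attn source
# build, trading build time for lower peak memory to avoid OOM on CI.
ENV MAX_JOBS=4
RUN pip install flash-attn==2.3.2 --no-build-isolation --no-cache-dir
```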
Resolves #11
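For reference, a common way to free runner disk space in a GitHub Actions workflow looks like this (a generic sketch; these paths are well-known large preinstalled toolchains on ubuntu runners, not necessarily what this PR removes):

```yaml
# Hypothetical cleanup step: remove large preinstalled toolchains from the
# ubuntu-latest runner image to reclaim disk space before building images.
- name: Free up disk space
  run: |
    df -h
    sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/lib/android
    sudo docker image prune --all --force
    df -h
```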