
test: Free up disk space for GH actions #12

Closed

Conversation

@ckadner (Collaborator) commented Oct 12, 2023

Changes in this PR:

  • add GitHub issue and pull request templates
  • add GitHub action to free up disk space
  • split build and test into separate workflows
  • build the test image once and reuse it for both Python tests and integration tests, running the two test suites in parallel
  • temporarily exclude/skip failing Python tests
  • temporarily build all stages of the Dockerfile sequentially so that build logs can be captured
  • separate the flash-attention v1 build from the rotary-embeddings and dropout-layer-norm builds
  • run pip install with pre-built wheels and limit the number of parallel compilation threads for the flash-attention v2 build to avoid OOM errors (see the sketch after this list)
  • upgrade flash-attention v2 to 2.3.2 (flash-attention <= 2.0.5 has no pre-built wheels, and the CI build runs out of memory when compiling from source)
  • build on push to main
  • add build status badge to README.md
  • use PyTorch 2.2 instead of 2.3 to make use of pre-built wheels for flash-attn-v2
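
As a minimal sketch of the wheel-based flash-attention install described above, assuming flash-attention's usual build behavior (its setup.py downloads a matching pre-built wheel when one exists for the installed torch/CUDA/Python combination and otherwise compiles from source); the line below is illustrative, not copied from this PR's Dockerfile:

    # Cap parallel compile jobs so a fallback source build does not exhaust
    # runner memory; a matching pre-built wheel is used when available.
    MAX_JOBS=2 pip install flash-attn==2.3.2 --no-build-isolation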

Disk space on the GH action runner before and after cleanup:

Disk usage before cleanup:
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        84G   66G   18G  80% /
...

Removing non-essential tools and libraries ...
Deleting libraries for Android (12G), CodeQL (5.3G), PowerShell (1.3G), Swift (1.7G) ...

Disk usage after cleanup:
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        84G   39G   45G  47% /
...

Pruning Docker images ...
Total reclaimed space: 5.697GB

Disk usage after pruning docker images:
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        84G   33G   51G  40% /
...
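
A hedged sketch of the cleanup step behind these numbers; the paths below are common locations on ubuntu-latest runners and are assumptions, not lines copied from this PR's workflow:

    # Remove large pre-installed toolchains (paths assumed for ubuntu-latest runners)
    sudo rm -rf /usr/local/lib/android        # Android SDK/NDK (~12G)
    sudo rm -rf /opt/hostedtoolcache/CodeQL   # CodeQL bundles (~5.3G)
    sudo rm -rf /usr/local/share/powershell   # PowerShell modules (~1.3G)
    sudo rm -rf /usr/share/swift              # Swift toolchain (~1.7G)
    # Prune unused Docker images
    docker image prune --all --force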

Resolves #11

Signed-off-by: Christian Kadner <[email protected]>
@ckadner commented Oct 13, 2023

@njhill -- I could use some newbie advice :-)

Python tests fail with error:

ImportError: cannot import name 'BloomCausalLMBatch' from 'text_generation_server.models.causal_lm' (/usr/local/lib/python3.9/site-packages/text_generation_server/models/causal_lm.py)
docker run --rm -v /tmp/transformers_cache:/transformers_cache \
	-e HUGGINGFACE_HUB_CACHE=/transformers_cache \
	-e TRANSFORMERS_CACHE=/transformers_cache cpu-tests:0 pytest -sv --ignore=server/tests/test_utils.py server/tests
============================= test session starts ==============================
platform linux -- Python 3.9.16, pytest-7.4.2, pluggy-1.3.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /usr/src
plugins: asyncio-0.21.1
asyncio: mode=strict
collecting ... collected 33 items / 1 error

==================================== ERRORS ====================================
______________ ERROR collecting server/tests/models/test_bloom.py ______________
ImportError while importing test module '/usr/src/server/tests/models/test_bloom.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../lib64/python3.9/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
server/tests/models/test_bloom.py:8: in <module>
    from text_generation_server.models.causal_lm import BloomCausalLMBatch, BLOOM
E   ImportError: cannot import name 'BloomCausalLMBatch' from 'text_generation_server.models.causal_lm' (/usr/local/lib/python3.9/site-packages/text_generation_server/models/causal_lm.py)
=========================== short test summary info ============================
ERROR server/tests/models/test_bloom.py
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!
=============================== 1 error in 2.36s ===============================
make: *** [Makefile:55: python-tests] Error 2
Error: Process completed with exit code 2.

Do we need to add an extra step to download model files?

TGIS will not download model data at runtime. To populate the local HF hub cache with models so that the image can be used as above, run it with the following command:

text-generation-server download-weights model_name

where model_name is the name of the model on the HF hub. Ensure that the command is run with the same mounted directory and the same TRANSFORMERS_CACHE and HUGGINGFACE_HUB_CACHE environment variables, and that it has write access to the mounted filesystem.
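
Combined with the docker invocation from the failing test run above, a cache-population step could look like the sketch below; the model name is a hypothetical placeholder, not taken from this PR:

    # Populate the HF hub cache before running the tests.
    # bigscience/bloom-560m is a placeholder model name.
    docker run --rm -v /tmp/transformers_cache:/transformers_cache \
        -e HUGGINGFACE_HUB_CACHE=/transformers_cache \
        -e TRANSFORMERS_CACHE=/transformers_cache \
        cpu-tests:0 text-generation-server download-weights bigscience/bloom-560m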

Signed-off-by: Christian Kadner <[email protected]>
@ckadner ckadner marked this pull request as draft October 13, 2023 16:15
Signed-off-by: Christian Kadner <[email protected]>
@ckadner ckadner marked this pull request as draft January 23, 2024 19:07
@ckadner ckadner marked this pull request as ready for review January 26, 2024 05:52
@ckadner commented Jan 26, 2024

@njhill @tjohnson31415 -- I think this one is ready for review (again) 😊

@tjohnson31415 (Member) left a comment


NIT: Should change the PR title since this is doing a bunch more than freeing up some disk space now :)

Since the Flash Attention Makefiles are no longer used in the Dockerfile, maybe we should just remove them. @njhill What do you think? Do you use the flash Makefiles?

Dockerfile (outdated diff):
 #ARG PYTORCH_INDEX="https://download.pytorch.org/whl"
 ARG PYTORCH_INDEX="https://download.pytorch.org/whl/nightly"
-ARG PYTORCH_VERSION=2.3.0.dev20231221
+#ARG PYTORCH_VERSION=2.3.0.dev20231221
+ARG PYTORCH_VERSION=2.2.0.dev20231213
@tjohnson31415 (Member) commented:

With the nightly PyTorch versions, I think there's a chance of incompatibilities unless the exact build of PyTorch matches the dependencies. The prebuilt wheel for Flash Attention v2 was compiled with 2.2.0.dev20231127 REF.

It also could just work... but the safest thing would be to exactly match the versions.

@ckadner commented Jan 30, 2024:

Sure, we might want to look at the tagged versions of the flash-attention build/publish workflow file for each flash-attn release, i.e.:

Flash   Torch               Ref
2.5.0   2.2.0.dev20231130   REF
2.4.3   2.2.0.dev20231130   REF
2.4.2   2.2.0.dev20231106   REF
2.4.1   2.2.0.dev20231106   REF
2.4.0   2.2.0.dev20231106   REF
2.3.6   2.2.0.dev20231127   REF
2.3.5   2.1.0 (no dev)      REF

Although, looking at the nightly build index https://download.pytorch.org/whl/nightly/torch/, dev20231127 is no longer available.

I do see torch-2.2.0 at https://download.pytorch.org/whl/torch/ now.

So we could just use '2.2.0' instead of '2.2.0.dev20231127'.
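
For instance, pinning the stable release from the regular wheel index instead of a nightly build; a sketch of the idea (shown for the cpu-tests stage), not the exact Dockerfile change:

    # Stable releases stay on the index; nightly dev builds get rotated out.
    pip install torch==2.2.0 --index-url "https://download.pytorch.org/whl/cpu" --no-cache-dir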


FYI, I submitted a PR to add the PyTorch 2.3 nightly dev builds to the flash-attention build matrix: Dao-AILab/flash-attention#793

@ckadner commented:

Yup, https://github.com/IBM/text-generation-inference/actions/runs/7706334748/job/21001719349?pr=12#step:6:696

 > [cpu-tests  2/11] RUN pip install torch=="2.2.0.dev20231127+cpu" --index-url "https://download.pytorch.org/whl/nightly/cpu" --no-cache-dir:
0.587 Looking in indexes: https://download.pytorch.org/whl/nightly/cpu
0.931 ERROR: Could not find a version that satisfies the requirement torch==2.2.0.dev20231127+cpu (from versions: 2.2.0.dev20231010+cpu, 2.2.0.dev20231205+cpu, 2.2.0.dev20231206+cpu, 2.2.0.dev20231207+cpu, 2.2.0.dev20231208+cpu, 2.2.0.dev20231209+cpu, 2.2.0.dev20231210+cpu, 2.2.0.dev20231211+cpu, 2.2.0.dev20231212+cpu, 2.2.0.dev20231213+cpu, 2.3.0.dev20231214+cpu, 2.3.0.dev20231215+cpu, 2.3.0.dev20231216+cpu, 2.3.0.dev20231217+cpu, 2.3.0.dev20231218+cpu, 2.3.0.dev20231219+cpu, 2.3.0.dev20231220+cpu, 2.3.0.dev20231221+cpu, 2.3.0.dev20231222+cpu, 2.3.0.dev20231223+cpu, 2.3.0.dev20231224+cpu, 2.3.0.dev20231225+cpu, 2.3.0.dev20231226+cpu, 2.3.0.dev20231227+cpu, 2.3.0.dev20231228+cpu, 2.3.0.dev20231229+cpu, 2.3.0.dev20231230+cpu, 2.3.0.dev20231231+cpu, 2.3.0.dev20240101+cpu, 2.3.0.dev20240102+cpu, 2.3.0.dev20240103+cpu, 2.3.0.dev20240104+cpu, 2.3.0.dev20240105+cpu, 2.3.0.dev20240106+cpu, 2.3.0.dev20240107+cpu, 2.3.0.dev20240108+cpu, 2.3.0.dev20240109+cpu, 2.3.0.dev20240110+cpu, 2.3.0.dev20240111+cpu, 2.3.0.dev20240113+cpu, 2.3.0.dev20240114+cpu, 2.3.0.dev20240115+cpu, 2.3.0.dev20240116+cpu, 2.3.0.dev20240117+cpu, 2.3.0.dev20240118+cpu, 2.3.0.dev20240119+cpu, 2.3.0.dev20240120+cpu, 2.3.0.dev20240121+cpu, 2.3.0.dev20240122+cpu, 2.3.0.dev20240123+cpu, 2.3.0.dev20240124+cpu, 2.3.0.dev20240125+cpu, 2.3.0.dev20240126+cpu, 2.3.0.dev20240127+cpu, 2.3.0.dev20240128+cpu, 2.3.0.dev20240129+cpu)
0.932 ERROR: No matching distribution found for torch==2.2.0.dev20231127+cpu
------
Dockerfile:181

@ckadner commented Jan 30, 2024:

torch==2.2.0 works (at least for install/compilation)

@ckadner commented Feb 1, 2024

> FYI, I submitted a PR to add the PyTorch 2.3 nightly dev builds to the flash-attention build matrix: Dao-AILab/flash-attention#793

flash-attention version 2.5.2 now has wheels for both torch==2.2.0 and torch==2.3.0.dev20240126
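
A matched stable pair would then look something like the sketch below; the versions come from the comment above, the rest is illustrative:

    pip install torch==2.2.0
    # flash-attn 2.5.2 ships a pre-built wheel compiled against torch 2.2.0;
    # MAX_JOBS only matters if it falls back to a source build.
    MAX_JOBS=2 pip install flash-attn==2.5.2 --no-build-isolation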


@ckadner ckadner closed this Feb 5, 2024
JRosenkranz pushed a commit to JRosenkranz/text-generation-inference-server that referenced this pull request Jul 10, 2024
Linked issue: GitHub action failing with "No space left on device" error (#11)