
test: Free up disk space for GH actions #12

Closed

Conversation

@ckadner (Collaborator) commented Oct 12, 2023

Changes in this PR:

  • add GitHub issue and pull request templates
  • add GitHub action to free up disk space
  • split build and test into separate workflows
  • build the test image once and reuse it for both Python tests and integration tests, running the two test suites in parallel
  • temporarily exclude/skip failing Python tests
  • temporarily build all stages of the Dockerfile sequentially so that build logs can be captured
  • separate the flash-attention v1 build from the rotary-embeddings and dropout-layer-norm builds
  • run pip install with pre-built wheels and limit the number of parallel compilation threads for the flash-attention v2 build to avoid OOM errors (see the sketch after this list)
  • upgrade flash-attention v2 to 2.3.2 (flash-attention <= 2.0.5 has no pre-built wheels, and the CI build runs out of memory when compiling from source)
  • build on push to main
  • add build status badge to README.md
  • use PyTorch 2.2 instead of 2.3 to make use of pre-built wheels for flash-attn-v2
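
As a minimal sketch of the wheel-based flash-attention install described above, assuming flash-attention's usual build behavior (its setup.py downloads a matching pre-built wheel when one exists for the installed torch/CUDA/Python combination and otherwise compiles from source); the line below is illustrative, not copied from this PR's Dockerfile:

    # Cap parallel compile jobs so a fallback source build does not exhaust
    # runner memory; a matching pre-built wheel is used when available.
    MAX_JOBS=2 pip install flash-attn==2.3.2 --no-build-isolation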

Disk space on the GH action runner before and after cleanup:

Disk usage before cleanup:
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        84G   66G   18G  80% /
...

Removing non-essential tools and libraries ...
Deleting libraries for Android (12G), CodeQL (5.3G), PowerShell (1.3G), Swift (1.7G) ...

Disk usage after cleanup:
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        84G   39G   45G  47% /
...

Pruning Docker images ...
Total reclaimed space: 5.697GB

Disk usage after pruning docker images:
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        84G   33G   51G  40% /
...
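
A hedged sketch of the cleanup step behind these numbers; the paths below are common locations on ubuntu-latest runners and are assumptions, not lines copied from this PR's workflow:

    # Remove large pre-installed toolchains (paths assumed for ubuntu-latest runners)
    sudo rm -rf /usr/local/lib/android        # Android SDK/NDK (~12G)
    sudo rm -rf /opt/hostedtoolcache/CodeQL   # CodeQL bundles (~5.3G)
    sudo rm -rf /usr/local/share/powershell   # PowerShell modules (~1.3G)
    sudo rm -rf /usr/share/swift              # Swift toolchain (~1.7G)
    # Prune unused Docker images
    docker image prune --all --force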

Resolves #11

Signed-off-by: Christian Kadner <[email protected]>
@ckadner commented Oct 13, 2023

@njhill -- I could use some newbie advice :-)

Python tests fail with error:

ImportError: cannot import name 'BloomCausalLMBatch' from 'text_generation_server.models.causal_lm' (/usr/local/lib/python3.9/site-packages/text_generation_server/models/causal_lm.py)
docker run --rm -v /tmp/transformers_cache:/transformers_cache \
	-e HUGGINGFACE_HUB_CACHE=/transformers_cache \
	-e TRANSFORMERS_CACHE=/transformers_cache cpu-tests:0 pytest -sv --ignore=server/tests/test_utils.py server/tests
============================= test session starts ==============================
platform linux -- Python 3.9.16, pytest-7.4.2, pluggy-1.3.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /usr/src
plugins: asyncio-0.21.1
asyncio: mode=strict
collecting ... collected 33 items / 1 error

==================================== ERRORS ====================================
______________ ERROR collecting server/tests/models/test_bloom.py ______________
ImportError while importing test module '/usr/src/server/tests/models/test_bloom.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../lib64/python3.9/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
server/tests/models/test_bloom.py:8: in <module>
    from text_generation_server.models.causal_lm import BloomCausalLMBatch, BLOOM
E   ImportError: cannot import name 'BloomCausalLMBatch' from 'text_generation_server.models.causal_lm' (/usr/local/lib/python3.9/site-packages/text_generation_server/models/causal_lm.py)
=========================== short test summary info ============================
ERROR server/tests/models/test_bloom.py
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!
=============================== 1 error in 2.36s ===============================
make: *** [Makefile:55: python-tests] Error 2
Error: Process completed with exit code 2.

Do we need to add an extra step to download model files?

TGIS will not download model data at runtime. To populate the local HF hub cache with models so that the image can be used as above, run it with the following command:

text-generation-server download-weights model_name

where model_name is the name of the model on the HF hub. Ensure that the command is run with the same mounted directory and the same TRANSFORMERS_CACHE and HUGGINGFACE_HUB_CACHE environment variables, and that it has write access to the mounted filesystem.
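
Combined with the docker invocation from the failing test run above, a cache-population step could look like the sketch below; the model name is a hypothetical placeholder, not taken from this PR:

    # Populate the HF hub cache before running the tests.
    # bigscience/bloom-560m is a placeholder model name.
    docker run --rm -v /tmp/transformers_cache:/transformers_cache \
        -e HUGGINGFACE_HUB_CACHE=/transformers_cache \
        -e TRANSFORMERS_CACHE=/transformers_cache \
        cpu-tests:0 text-generation-server download-weights bigscience/bloom-560m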

Signed-off-by: Christian Kadner <[email protected]>
@ckadner ckadner marked this pull request as draft October 13, 2023 16:15
Signed-off-by: Christian Kadner <[email protected]>
@ckadner ckadner marked this pull request as draft January 23, 2024 19:07
@ckadner ckadner marked this pull request as ready for review January 26, 2024 05:52
@ckadner commented Jan 26, 2024

@njhill @tjohnson31415 -- I think this one is ready for review (again) 😊

@tjohnson31415 (Member) left a comment


NIT: Should change the PR title since this is doing a bunch more than freeing up some disk space now :)

Since the Flash Attention Makefiles are no longer used in the Dockerfile, maybe we should just remove them. @njhill What do you think? Do you use the flash Makefiles?

Dockerfile (outdated diff):
 #ARG PYTORCH_INDEX="https://download.pytorch.org/whl"
 ARG PYTORCH_INDEX="https://download.pytorch.org/whl/nightly"
-ARG PYTORCH_VERSION=2.3.0.dev20231221
+#ARG PYTORCH_VERSION=2.3.0.dev20231221
+ARG PYTORCH_VERSION=2.2.0.dev20231213
@tjohnson31415 (Member) commented:

With the nightly PyTorch versions, I think there's a chance of incompatibilities unless the exact build of PyTorch matches the dependencies. The prebuilt wheel for Flash Attention v2 was compiled with 2.2.0.dev20231127 REF.

It also could just work... but the safest thing would be to exactly match the versions.

@ckadner commented Jan 30, 2024:

Sure, we might want to look at the tagged versions of the flash-attention build/publish workflow file for each flash-attn release, i.e.:

Flash   Torch               Ref
2.5.0   2.2.0.dev20231130   REF
2.4.3   2.2.0.dev20231130   REF
2.4.2   2.2.0.dev20231106   REF
2.4.1   2.2.0.dev20231106   REF
2.4.0   2.2.0.dev20231106   REF
2.3.6   2.2.0.dev20231127   REF
2.3.5   2.1.0 (no dev)      REF

Although, looking at the nightly build index https://download.pytorch.org/whl/nightly/torch/, dev20231127 is no longer available.

I do see torch-2.2.0 at https://download.pytorch.org/whl/torch/ now.

So we could just use '2.2.0' instead of '2.2.0.dev20231127'.
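
For instance, pinning the stable release from the regular wheel index instead of a nightly build; a sketch of the idea (shown for the cpu-tests stage), not the exact Dockerfile change:

    # Stable releases stay on the index; nightly dev builds get rotated out.
    pip install torch==2.2.0 --index-url "https://download.pytorch.org/whl/cpu" --no-cache-dir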


FYI, I submitted a PR to add the PyTorch 2.3 nightly dev builds to the flash-attention build matrix: Dao-AILab/flash-attention#793

@ckadner commented:

Yup, https://github.com/IBM/text-generation-inference/actions/runs/7706334748/job/21001719349?pr=12#step:6:696

 > [cpu-tests  2/11] RUN pip install torch=="2.2.0.dev20231127+cpu" --index-url "https://download.pytorch.org/whl/nightly/cpu" --no-cache-dir:
0.587 Looking in indexes: https://download.pytorch.org/whl/nightly/cpu
0.931 ERROR: Could not find a version that satisfies the requirement torch==2.2.0.dev20231127+cpu (from versions: 2.2.0.dev20231010+cpu, 2.2.0.dev20231205+cpu, 2.2.0.dev20231206+cpu, 2.2.0.dev20231207+cpu, 2.2.0.dev20231208+cpu, 2.2.0.dev20231209+cpu, 2.2.0.dev20231210+cpu, 2.2.0.dev20231211+cpu, 2.2.0.dev20231212+cpu, 2.2.0.dev20231213+cpu, 2.3.0.dev20231214+cpu, 2.3.0.dev20231215+cpu, 2.3.0.dev20231216+cpu, 2.3.0.dev20231217+cpu, 2.3.0.dev20231218+cpu, 2.3.0.dev20231219+cpu, 2.3.0.dev20231220+cpu, 2.3.0.dev20231221+cpu, 2.3.0.dev20231222+cpu, 2.3.0.dev20231223+cpu, 2.3.0.dev20231224+cpu, 2.3.0.dev20231225+cpu, 2.3.0.dev20231226+cpu, 2.3.0.dev20231227+cpu, 2.3.0.dev20231228+cpu, 2.3.0.dev20231229+cpu, 2.3.0.dev20231230+cpu, 2.3.0.dev20231231+cpu, 2.3.0.dev20240101+cpu, 2.3.0.dev20240102+cpu, 2.3.0.dev20240103+cpu, 2.3.0.dev20240104+cpu, 2.3.0.dev20240105+cpu, 2.3.0.dev20240106+cpu, 2.3.0.dev20240107+cpu, 2.3.0.dev20240108+cpu, 2.3.0.dev20240109+cpu, 2.3.0.dev20240110+cpu, 2.3.0.dev20240111+cpu, 2.3.0.dev20240113+cpu, 2.3.0.dev20240114+cpu, 2.3.0.dev20240115+cpu, 2.3.0.dev20240116+cpu, 2.3.0.dev20240117+cpu, 2.3.0.dev20240118+cpu, 2.3.0.dev20240119+cpu, 2.3.0.dev20240120+cpu, 2.3.0.dev20240121+cpu, 2.3.0.dev20240122+cpu, 2.3.0.dev20240123+cpu, 2.3.0.dev20240124+cpu, 2.3.0.dev20240125+cpu, 2.3.0.dev20240126+cpu, 2.3.0.dev20240127+cpu, 2.3.0.dev20240128+cpu, 2.3.0.dev20240129+cpu)
0.932 ERROR: No matching distribution found for torch==2.2.0.dev20231127+cpu
------
Dockerfile:181

@ckadner commented Jan 30, 2024:

torch==2.2.0 works (at least for install/compilation)

@ckadner commented Feb 1, 2024

> FYI, I submitted a PR to add the PyTorch 2.3 nightly dev builds to the flash-attention build matrix: Dao-AILab/flash-attention#793

flash-attention version 2.5.2 now has wheels for both torch==2.2.0 and torch==2.3.0.dev20240126
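
A matched stable pair would then look something like the sketch below; the versions come from the comment above, the rest is illustrative:

    pip install torch==2.2.0
    # flash-attn 2.5.2 ships a pre-built wheel compiled against torch 2.2.0;
    # MAX_JOBS only matters if it falls back to a source build.
    MAX_JOBS=2 pip install flash-attn==2.5.2 --no-build-isolation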


@ckadner ckadner closed this Feb 5, 2024
JRosenkranz pushed a commit to JRosenkranz/text-generation-inference-server that referenced this pull request Jul 10, 2024
Linked issue: GitHub action failing with "No space left on device" error (#11)