Update cached models and benchmarks #705
Conversation
A minimum version is required by torch_neuronx (but not enforced).
Force-pushed from 37e7222 to 7a8e99c.
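As a side note on the commit message above, here is a minimal sketch of how the torch_neuronx minimum-version requirement could be enforced explicitly; the version number and helper name are assumptions, not part of this PR:

```python
# Hedged sketch: an explicit minimum-version guard for torch_neuronx.
# The minimum version below is a placeholder assumption; the commit message
# only notes that torch_neuronx requires one without enforcing it.
from importlib.metadata import version as installed_version

from packaging.version import Version

MIN_TORCH_NEURONX = Version("2.1.0")  # hypothetical minimum


def check_torch_neuronx_version() -> None:
    found = Version(installed_version("torch-neuronx"))
    if found < MIN_TORCH_NEURONX:
        raise ImportError(
            f"torch_neuronx >= {MIN_TORCH_NEURONX} is required, found {found}."
        )
```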
LGTM, thanks for the updates!
```diff
@@ -27,7 +25,7 @@ def main():
         export=True,
         batch_size=batch_size,
         sequence_length=seq_length,
-        auto_cast_type="fp16",
+        auto_cast_type="bf16",
```
So it's now proven that bf16 is better?
It actually does not make any difference as far as I can tell, and it seems to allow the weights to load faster (because they are not converted).
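For reference, a minimal sketch of the kind of export call the diff above modifies, using optimum-neuron's NeuronModelForCausalLM; the model id, batch size, and sequence length are placeholder assumptions, not the exact values from the script touched by this PR. With auto_cast_type="bf16", checkpoints stored in bf16 can be loaded as-is rather than being converted to fp16 first:

```python
# Minimal sketch (model id and shapes are assumptions, not the exact values
# used in the benchmark script touched by this PR).
from optimum.neuron import NeuronModelForCausalLM

batch_size = 1
seq_length = 2048

model = NeuronModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model id
    export=True,                 # compile for Neuron when loading
    batch_size=batch_size,
    sequence_length=seq_length,
    auto_cast_type="bf16",       # bf16 weights are loaded without conversion
)

# The compiled model can be saved and later reloaded without re-exporting.
model.save_pretrained("llama-2-7b-neuron")
```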
LGTM
I have made a small modification in the export code to always set …
Force-pushed from d514284 to 337d882.
What does this PR do?
This modifies the cache population workflow to use the files corresponding to the new models. It also updates the benchmarks in the documentation.