OpenShift AI Caikit+TGIS MLPerf Inference Implementation for Llama2-70b #1

Maxusmusti · 2024-01-19T20:55:30Z

No description provided.

… compliance tests (mlcommons#1576)
* Fix offline_min_samples in submission checker and mlcommons#1569
* Removed mlperf.conf from llama2 directory to avoid confusion
* Update submission_checker.py
* Fixes for 4.0
* Cleanup compliance dir check for models without compliance tests

* Enable equal issue mode for LLM benchmarks
* Reduce min_query_count to 1 for server/MS/SS
* Remove scenario
* Remove min_query_count so default is used; revoke padding change for equal issue offline
* Pad min_queries, not samples_per_query for non-offline
* Add documentation to the sample equal issue

…UNet) (mlcommons#1624) Currently, 3D-UNet is the only workload that uses equal-issue mode in the Offline scenario. A recent change to LLM equal-issue mode caused the 3D-UNet accuracy run to issue more than one query, which bloated the accuracy log and made the accuracy-checking script fail. This change fixes that problem.

* Hotfix: DLRMv2 Audit TEST01 fallback failure. DLRMv2 Audit TEST01 may take the fallback route, which the accuracy check script (accuracy-dlrm.py) did not expect: it always expects the entire sample set to be in the accuracy log, whereas Audit TEST01 generates only a subset. This fixes that Audit TEST01 failure.
* Typo fix

Maxusmusti force-pushed the api-server branch from 6809071 to 86e3b72 on January 22, 2024 17:14

nvyihengz and others added 3 commits January 23, 2024 11:55

Add TEST01 for stable diffusion XL (mlcommons#1574)

4d0e246

Add Llama2 checks and log additional values (mlcommons#1578)

190413d

Maxusmusti force-pushed the api-server branch from 6a43fa5 to 15a8805 on January 25, 2024 19:24

mlcommons-bot and others added 20 commits January 25, 2024 19:13

🔄 synced local 'tools/submission/power/sources_checksums.json' with remote 'compliance/sources_checksums.json' (mlcommons#1582)

27ef43a

Co-authored-by: mlcommons-bot <null>

Fix image list mismatch (mlcommons#1579)

9b8006f

Co-authored-by: Miro <[email protected]>

mlcommons#1558 update llama2 reference fp32 accuracy (mlcommons#1583)

180014a

Co-authored-by: Miro <[email protected]>

🔄 synced local 'tools/submission/power/power_checker.py' with remote 'compliance/check.py' (mlcommons#1587)

523316e

Co-authored-by: mlcommons-bot <null>

Update the main README.md for 4.0 (mlcommons#1586)

3ad8534

Ignore trailing whitespace lines in spl.txt files (mlcommons#1584)

a04b1f5

* Ignore trailing whitespace lines in spl.txt files.
* Remove fix from sync'ed power_checker.py.
* Reformat according to black.

Add stable diffusion and llama2 to the final spreadsheet (mlcommons#1589)

4bdf56f

Add support to dump 10 compliance images during accuracy run for SDXL (mlcommons#1591)

3a902e5

* Add support to dump 10 compliance images during accuracy run for SDXL
* Fix typo
* Dump caption.txt in the same path

mlcommons#1598: fix token and sample logging for Llama2 when accuracy_log_sampling_target is enabled (mlcommons#1599)

cc3daae

Fix loadgen token metrics latency constraints (mlcommons#1596)

473053f

* Fix loadgen token metrics latency constraints
* Update perf constraints check for token metrics
* Add equal issue mode for LLMs models

Add sample length check to test06 (mlcommons#1603)

104d855

* Add sample length check to test06
* Remove spaces in token metrics recommendation
* Add important item to Llama readme
* Fix Bug: number of tokens logged before computing them
* Fix typo: lenght -> length

Set completed samples per second as llama metric (mlcommons#1613)

44285d9

Add upper limit to tokens per sample (mlcommons#1612)

d45a66c

Remove loadgen warnings (mlcommons#1608)

d7dba08

Co-authored-by: Miro <[email protected]>

Update README.md - remove unwanted lines in CM commands (mlcommons#1601)

b0777f0

* Update README.md: no longer need custom fork as the relevant changes are in the inference repository
* Update dataset.py

Co-authored-by: Miro <[email protected]>

Typo fix in README.md (mlcommons#1588)

3190d09

Co-authored-by: Miro <[email protected]>

Update README.md with CM commands to download stable-diffusion, gptj and dlrmv2 models (mlcommons#1604)

840435a

* Update README.md: add CM commands to download Stable diffusion models
* Update README.md
* Update README.md

Turn equal issue mode off for TEST06 (mlcommons#1615)

817dd96

* Turn equal issue mode off for Llama2 TEST06
* Add TEST06 to the output dir

Fix submission checker and TEST06 for Llama2 (mlcommons#1616)

0ed5190

* Fix submission checker and TEST06 for Llama2
* Remove redundant line
* Move test_dir check

Maxusmusti force-pushed the api-server branch from cb93b59 to 258c9c6 on February 12, 2024 14:52

nv-jinhosuh and others added 2 commits February 12, 2024 14:47

Add number of tokens to offline (mlcommons#1623)

f9a643c

Maxusmusti force-pushed the api-server branch from c5157a9 to 53dd0be on February 12, 2024 20:12

Maxusmusti added 21 commits February 27, 2024 14:52

Update image to include omitted mlperf conf

22eb574

Update for new image version

7bb4c1b

Updated model serving yamls

e96c8a6

First pass: standalone TGIS, grpc, batching

84475d7

Streaming first pass

58e16ea

Updated server impl

f35c17e

Fully functional, updated README

132f725

Update default client-side batches

9f31f19

v8 Update

654dda5

GPT-J first pass

c7f699d

Offline functional, now testing server

081024f

v1 full implementation

43afdea

Update README for gpt-j

f8cc5ba

First pass multi-endpoint

8ae2bf4

Full multiple endpoint support

0ad354e

Random gpt-j vllm experimental bits

6b117e0

Change file names

ebf0710

Added vllm server + multi-endpoint for gpt-j

a357cf4

Minor adjustments

a505e83

Updated for exact values

ef6b3db

Update llama-2 with vllm

230d495

Maxusmusti force-pushed the api-server branch from 6ebfa88 to 230d495 on February 27, 2024 19:52

Maxusmusti added 8 commits February 27, 2024 16:11

Fixed output cap bug

48a4396

Fix llama server bug

57e241d

Added v10 image for llama

6ef5023

Updated token gen count for offline llama

3bc09fa

Updated READMEs

38e3aea

Updated server first token inclusion in first query

84f1aac

Updated image in yaml

c7eeef9

Fix first token dtype

8ab5998
