
Commit ff169c3

Merge branch 'main' into dev/iirzynsk/test_prefix_caching
2 parents: 79a1ece + 8c08770

File tree

2 files changed: +68 -13 lines changed


docs/dev_guide/ci-failures.md

Lines changed: 51 additions & 1 deletion
@@ -1,3 +1,53 @@
 # CI Failures
 
-WIP
+## CI
+
+For all PRs created in the vllm-gaudi repository, all CI checks are required:
+- pre-commit & DCO
+- HPU tests
+- HPU Gaudi tests
+
+### Pre-commit & DCO
+To install, run:
+
+```pre-commit install```
+
+This way all of your commits should be correctly formatted and signed off. If you need to sign off your commits manually, remember to use ```git commit -s``` to pass the DCO check.
+
+### HPU tests
+HPU tests consist of several test suites:
+- pre-merge tests
+- unit tests
+- perf test
+- feature tests
+- e2e tests
+
+All of the above tests are mandatory. They run in fail-fast mode, meaning that if one test fails, the remaining tests won't be triggered.
+
+### HPU Gaudi tests
+Additional Gaudi tests are expected to pass, but aren't mandatory. These tests run on an internal Jenkins system, so their results are internal only. They can be triggered by CODEOWNERs and TESTOWNERs only.
+
+## Docs Pull Requests
+PRs that do not touch code, such as docstring changes or README updates, can be merged without the HPU tests and Gaudi tests. Passing the pre-commit check is still required.
+
+## Hourly Checks and Tests
+In the vllm-gaudi repository, the hourly tests can be found in ```Hourly Commit Check and Tests``` under the ```Actions``` tab. This tab also allows developers to manually trigger the hourly tests on a selected branch.
+
+If the last hourly test is failing, it means that the vllm-gaudi main branch doesn't work with the newest upstream main commit. To find the last good commit, check [last good commit](https://github.com/vllm-project/vllm-gaudi/blob/vllm/last-good-commit-for-vllm-gaudi/VLLM_STABLE_COMMIT).
+
+Failing hourly checks will be fixed by developers as soon as possible.
+
+## Troubleshooting
+### Unrelated failures
+Sometimes failures are unrelated to your specific code changes and are often caused by connection problems. In this case the failed checks should be rerun. Such errors include:
+- ```Error response from daemon: No such container```
+- ```ValueError: Unsupported device: the device type is 7.```
+- ```[Device not found] Device acquire failed.```
+
+### Accuracy and functionality issues
+Accuracy issues can be tracked in the HPU Gaudi tests with gsm8k runs. If any check fails on accuracy (accuracy too low compared to the measured baseline) or on functionality, the **PR can't be merged** until the issue is solved.
+
+### Pre-commit failures
+To run the pre-commit check manually, run:
+
+```pre-commit run --show-diff-on-failure --color=always --all-files --hook-stage manual```
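
The "Hourly Checks and Tests" section above points at the VLLM_STABLE_COMMIT file for the last known-good upstream commit. As a minimal illustrative sketch (not part of this commit), that file can also be read programmatically, assuming the standard raw.githubusercontent.com layout and the vllm/last-good-commit-for-vllm-gaudi branch shown in the link:

```python
from urllib.request import urlopen

# Assumed raw URL for the VLLM_STABLE_COMMIT pointer referenced in the doc above.
RAW_URL = ("https://raw.githubusercontent.com/vllm-project/vllm-gaudi/"
           "vllm/last-good-commit-for-vllm-gaudi/VLLM_STABLE_COMMIT")

with urlopen(RAW_URL) as resp:
    # The file is expected to contain a single commit hash.
    stable_commit = resp.read().decode().strip()

print(f"Last known-good upstream vLLM commit: {stable_commit}")
```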

vllm_gaudi/v1/worker/hpu_model_runner.py

Lines changed: 17 additions & 12 deletions
@@ -3007,19 +3007,24 @@ def execute_model(
 
             self.event_start = self.profiler.get_timestamp_us()
             self.profiler.start("internal", "prefill")
-            # Align behavior of incomplete prompt with gpu_model_runner
-            # If logits_indices is smaller than req_id,
-            # add the last token position
+            # NOTE(tianmu-li): Align behavior of incomplete prompt with gpu_model_runner
+            # If logits_indices is smaller than req_id, the last request is a chunked prompt request that
+            # hasn't finished in this step. We add the last token position to logits_indices to ensure
+            # the last token of the chunk is sampled. This sampled token will be discarded later
             if logits_indices.shape[0] < len(req_id):
-                if structured_output:
-                    logits_append = torch.tensor([torch.sum(prompt_len) - 1],
-                                                 device=token_ids.device,
-                                                 dtype=torch.int32)
-                    logits_indices = torch.cat([logits_indices, logits_append])
-                elif self.use_async_scheduling:
-                    # Discard partial prefill logits for async scheduling
+                if structured_output or self.use_async_scheduling:
+                    # When there are multiple requests in the batch (e.g. self.use_merged_prefill=True),
+                    # the last token position is the sum of all prompt lengths - 1
+                    # This logic also holds when there is only one request in the batch
+                    logits_indices_append = torch.tensor([torch.sum(prompt_len) - 1],
+                                                         device=token_ids.device,
+                                                         dtype=torch.int32)
+                    logits_indices = torch.cat([logits_indices, logits_indices_append])
+                if self.use_async_scheduling:
+                    # Discard partial prefill logit for async scheduling
                     # Depends on 1 decode token/batch
-                    invalid_req_indices.append(num_decodes + idx)
+                    prefill_start_idx = num_decodes
+                    invalid_req_indices.append(prefill_start_idx + idx)
             htorch.core.mark_step()
             non_flattened_hidden_states, aux_hidden_states, \
                 sample_hidden_states, logits_device = \
@@ -3321,7 +3326,7 @@ def execute_model(
             return AsyncHPUModelRunnerOutput(
                 model_runner_output=model_runner_output,
                 sampled_token_ids=sampled_token_ids,
-                invalid_req_indices=[],
+                invalid_req_indices=invalid_req_indices,
                 async_output_copy_stream=self.async_output_copy_stream,
             )
        model_runner_output = ModelRunnerOutput(
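
The added comments in the first hunk describe the logits-index padding: when the last request in the batch is an unfinished chunked prompt, the position of the chunk's last token (the sum of all prompt lengths in the, possibly merged, prefill batch minus one) is appended to logits_indices so that token still gets sampled, and under async scheduling the corresponding output row is recorded in invalid_req_indices so the extra token is discarded later. Below is a minimal standalone sketch of that behavior with hypothetical values; the real prompt_len, req_id, num_decodes, and idx come from the runner's state, not from this snippet.

```python
import torch

# Hypothetical batch: 2 decode requests followed by 3 merged prefill requests,
# the last of which is a chunked prompt that hasn't finished in this step.
prompt_len = torch.tensor([5, 7, 4])
req_id = ["req-0", "req-1", "req-2"]
num_decodes = 2
use_async_scheduling = True
structured_output = False
# Only two logits indices were produced, i.e. fewer than len(req_id).
logits_indices = torch.tensor([4, 11], dtype=torch.int32)
invalid_req_indices: list[int] = []

if logits_indices.shape[0] < len(req_id):
    if structured_output or use_async_scheduling:
        # Last token position of the merged prefill batch: sum of prompt lengths - 1.
        logits_indices_append = torch.tensor([torch.sum(prompt_len) - 1],
                                             dtype=torch.int32)
        logits_indices = torch.cat([logits_indices, logits_indices_append])
    if use_async_scheduling:
        # Mark the extra sampled token as invalid so it is discarded later;
        # its output row sits after the decode rows, hence the num_decodes offset.
        idx = len(req_id) - 1  # index of the chunked prefill request (hypothetical)
        prefill_start_idx = num_decodes
        invalid_req_indices.append(prefill_start_idx + idx)

print(logits_indices.tolist())  # [4, 11, 15]
print(invalid_req_indices)      # [4]
```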
