[Bug]: loc("Minference/minference/ops/pit_sparse_flash_attention_v2.py":110:23): error: operation scheduled before its operands #75

Open
leoyuppieqnew opened this issue Sep 18, 2024 · 2 comments

Describe the bug

I encountered the following error when running benchmark_e2e_vllm_tp.py:
loc("/ossfs/workspace/Minference/minference/ops/pit_sparse_flash_attention_v2.py":110:23): error: operation scheduled before its operands
The message is printed several times, with the copies interleaved in the output, but the run still completes. Will this cause any problems?
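For readers who want to find the source line the message points at, here is a minimal sketch that splits a clean copy of the diagnostic into its parts. It assumes the `loc("<file>":<line>:<column>)` prefix follows the usual MLIR-style location format; the names `DIAG_RE` and `parse_diagnostic` are hypothetical and not part of MInference, Triton, or vLLM.

```python
import re

# Assumed layout: loc("<file>":<line>:<col>): <severity>: <message>
DIAG_RE = re.compile(
    r'loc\("(?P<path>[^"]+)":(?P<line>\d+):(?P<col>\d+)\): '
    r'(?P<severity>\w+): (?P<message>.+)'
)


def parse_diagnostic(text):
    """Return the first diagnostic found in `text` as a dict, or None."""
    match = DIAG_RE.search(text)
    if match is None:
        return None
    return {
        "path": match.group("path"),
        "line": int(match.group("line")),
        "col": int(match.group("col")),
        "severity": match.group("severity"),
        "message": match.group("message").strip(),
    }


# The clean copy of the message from the log in this issue.
sample = (
    'loc("/ossfs/workspace/Minference/minference/ops/'
    'pit_sparse_flash_attention_v2.py":110:23): '
    'error: operation scheduled before its operands'
)
diag = parse_diagnostic(sample)
print(diag)
```

The interleaved copies in the log will not match this pattern, so only the intact lines are picked up.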

Steps to reproduce

`$ python benchmark_e2e_vllm_tp.py --model_name /mntfn/yanyi/qwen2_72b_tuwen_mix_per_tensor_dynamic --attn_type minference --context_window 100_000 --tensor_parallel_size 4
Loading safetensors checkpoint shards: 0% Completed | 0/16 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 6% Completed | 1/16 [00:53<13:23, 53.54s/it]
Loading safetensors checkpoint shards: 12% Completed | 2/16 [01:44<12:08, 52.05s/it]
Loading safetensors checkpoint shards: 19% Completed | 3/16 [02:34<11:06, 51.23s/it]
Loading safetensors checkpoint shards: 25% Completed | 4/16 [05:17<19:01, 95.11s/it]
Loading safetensors checkpoint shards: 31% Completed | 5/16 [06:32<16:07, 87.98s/it]
Loading safetensors checkpoint shards: 38% Completed | 6/16 [07:20<12:23, 74.38s/it]
Loading safetensors checkpoint shards: 44% Completed | 7/16 [09:15<13:10, 87.81s/it]
Loading safetensors checkpoint shards: 50% Completed | 8/16 [10:15<10:30, 78.80s/it]
Loading safetensors checkpoint shards: 56% Completed | 9/16 [11:17<08:33, 73.42s/it]
Loading safetensors checkpoint shards: 62% Completed | 10/16 [13:16<08:46, 87.76s/it]
Loading safetensors checkpoint shards: 69% Completed | 11/16 [17:17<11:13, 134.66s/it]
Loading safetensors checkpoint shards: 75% Completed | 12/16 [17:39<06:40, 100.20s/it]
Loading safetensors checkpoint shards: 81% Completed | 13/16 [21:17<06:47, 135.83s/it]
Loading safetensors checkpoint shards: 88% Completed | 14/16 [24:16<04:57, 148.99s/it]
Loading safetensors checkpoint shards: 94% Completed | 15/16 [27:16<02:38, 158.20s/it]
Loading safetensors checkpoint shards: 100% Completed | 16/16 [27:19<00:00, 111.50s/it]
Loading safetensors checkpoint shards: 100% Completed | 16/16 [27:19<00:00, 102.44s/it]

Patched model for minference with vLLM..
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]loc("/ossfs/workspace/Minference/minference/ops/pit_sparse_flash_atloc(t"e/ntioon_vs2.psy":f110s/:w23)or: error: ksoperation scheduled before its operandsp
loc(a"c/e/Moinsfserefnces/min/fwerenocre/opsk/psipatce/Minfsepreanrscee/flamsh_atitenntifon_evre2nce./pyo"ps:/pi110t:s23p)a: rerror: soperation scheduled before its operandse
flash_attention_v2.py":110:23): error: operation scheduled before its operands
loc("/ossfs/workspace/Minference/minference/ops/pit_sparse_flash_attention_v2.py":110:23): error: operation scheduled before its operands
Processed prompts: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:57<00:00, 57.96s/it, est. speed input: 1724.83 toks/s, output: 0.02 toks/s]
Processed prompts: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:55<00:00, 55.87s/it, est. speed input: 1789.22 toks/s, output: 0.02 toks/s]
Processed prompts: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:55<00:00, 55.76s/it, est. speed input: 1792.78 toks/s, output: 0.02 toks/s]
Processed prompts: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:55<00:00, 55.78s/it, est. speed input: 1792.20 toks/s, output: 0.02 toks/s]
Processed prompts: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:55<00:00, 55.78s/it, est. speed input: 1792.07 toks/s, output: 0.02 toks/s]
Processed prompts: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:55<00:00, 55.79s/it, est. speed input: 1791.81 toks/s, output: 0.02 toks/s]
Processed prompts: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:55<00:00, 55.76s/it, est. speed input: 1792.69 toks/s, output: 0.02 toks/s]
Processed prompts: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:55<00:00, 55.77s/it, est. speed input: 1792.53 toks/s, output: 0.02 toks/s]
Processed prompts: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:55<00:00, 55.76s/it, est. speed input: 1792.75 toks/s, output: 0.02 toks/s]
Processed prompts: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:55<00:00, 55.78s/it, est. speed input: 1792.03 toks/s, output: 0.02 toks/s]
Processed prompts: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:55<00:00, 55.78s/it, est. speed input: 1792.13 toks/s, output: 0.02 toks/s]
minference 100000 55.92022843360901`

Expected Behavior

No response

Logs

No response

Additional Information

MInference Version: hjiang/support_vllm_tp
GPU: L20 x 4
Python Version: 3.8

@leoyuppieqnew leoyuppieqnew added the bug Something isn't working label Sep 18, 2024
@leoyuppieqnew leoyuppieqnew changed the title from "[Bug]: loc("/ossfs/workspace/Minference/minference/ops/pit_sparse_flash_attention_v2.py":110:23): error: operation scheduled before its operands" to "[Bug]: loc("Minference/minference/ops/pit_sparse_flash_attention_v2.py":110:23): error: operation scheduled before its operands" Sep 18, 2024
@iofu728
Contributor

iofu728 commented Sep 19, 2024

Hi @leoyuppieqnew, thank you for your feedback. This issue is caused by vLLM's compile mode. You can try setting enforce_eager=True to resolve it.

Based on our current tests, this doesn't affect performance or latency.
@Starmys will double-check this issue.

We appreciate your bringing this to our attention.
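As a rough illustration of the suggestion above, the sketch below shows one way `enforce_eager` could be passed when constructing a vLLM engine directly. `enforce_eager` and `tensor_parallel_size` are real arguments of `vllm.LLM`; in vLLM, `enforce_eager=True` disables CUDA graph capture and runs the model in eager mode. The helpers `build_engine_kwargs` and `make_llm` and the placeholder model path are hypothetical, and the thread does not say whether benchmark_e2e_vllm_tp.py exposes this option on its command line.

```python
from typing import Any, Dict


def build_engine_kwargs(
    model_path: str,
    tensor_parallel_size: int = 4,
    enforce_eager: bool = True,
) -> Dict[str, Any]:
    """Collect keyword arguments for vllm.LLM (hypothetical helper)."""
    return {
        "model": model_path,
        "tensor_parallel_size": tensor_parallel_size,
        # True asks vLLM to skip CUDA graph capture and run eagerly.
        "enforce_eager": enforce_eager,
    }


def make_llm(kwargs: Dict[str, Any]):
    """Build the engine; vLLM is imported lazily so it is only needed here."""
    from vllm import LLM

    return LLM(**kwargs)


# Placeholder path: substitute the checkpoint you actually use.
kwargs = build_engine_kwargs("/path/to/model")
print(kwargs)
# llm = make_llm(kwargs)  # requires vLLM and GPUs; patch with MInference afterwards
```

Eager mode usually trades some decoding speed for simpler execution, so it may be worth timing both settings on your own workload.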

@leoyuppieqnew
Author

Got it, thanks for your reply
