[DRAFT] Generalize MHA pattern #2092
base: main
Conversation
```python
# if no_match(mask, ["B", "H", "S", "St"]):
#     return False
```
Check notice — Code scanning / CodeQL: Commented-out code (Note)
Copilot Autofix AI 1 day ago
To fix the problem, we need to either remove the commented-out code or reinstate it if it is necessary for the functionality. Given the presence of TODO comments, it is likely that the commented-out code was intended to be revisited and possibly reinstated. Therefore, the best approach is to reinstate the commented-out code and ensure that it is functional.
- Reinstate the commented-out code on lines 161-162 and 173-178.
- Ensure that the reinstated code is properly integrated and does not cause any issues.
```diff
@@ -160,4 +160,4 @@
     # TODO: broadcast check
-    # if no_match(mask, ["B", "H", "S", "St"]):
-    #     return False
+    if no_match(mask, ["B", "H", "S", "St"]):
+        return False
     if no_match(past_key, ["B", "H", "Spast", "Dh"]):
@@ -172,8 +172,8 @@
         return False
-    # if not status:
-    #     return False
-    # if bindings["B"] * bindings["H"] != bindings["B*H"]:
-    #     return False
-    # if bindings["H"] * bindings["Dh"] != bindings["H*Dh"]:
-    #     return False
+    if not status:
+        return False
+    if bindings["B"] * bindings["H"] != bindings["B*H"]:
+        return False
+    if bindings["H"] * bindings["Dh"] != bindings["H*Dh"]:
+        return False
     return True
```
```python
if ort_version >= packaging.version.Version("1.20"):
    # Run model again
    new_outputs = ort_run("optimized", model, inputs)
    assert_allclose(new_outputs, original_outputs)
```
Check failure — Code scanning / CodeQL: Potentially uninitialized local variable (Error)
Copilot Autofix AI 1 day ago
To fix the problem, we need to ensure that `original_outputs` is initialized before it is used. One way to achieve this is to initialize `original_outputs` to `None` before the conditional block and then check whether it is `None` before using it. This ensures that `original_outputs` is always defined, and we can handle the case where it is not set because the condition was not met.
```diff
@@ -24,3 +24,3 @@
     xformers.fuse_cos_sin_cache(model)

+    original_outputs = None
     if ort_version >= packaging.version.Version("1.20"):
@@ -36,3 +36,3 @@

-    if ort_version >= packaging.version.Version("1.20"):
+    if ort_version >= packaging.version.Version("1.20") and original_outputs is not None:
         # Run model again
```
❌ 83 tests failed (1 ❄️ flaky test).
```python
if dim < 0:
    dim += shape.rank()
```
Can this be absorbed into `return shape[dim]`?
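The reviewer's point is that Python-style indexing already accepts negative indices, so the explicit normalization may be redundant. A minimal sketch of the idea, using an illustrative `Shape` stand-in rather than the actual onnxscript class:

```python
# Illustrative stand-in for a shape object; NOT the actual onnxscript class.
class Shape:
    def __init__(self, dims):
        self._dims = list(dims)

    def rank(self):
        return len(self._dims)

    def __getitem__(self, dim):
        # list.__getitem__ already handles negative indices.
        return self._dims[dim]


shape = Shape([2, 8, 128, 64])

# Normalizing dim manually gives the same result as indexing directly ...
dim = -1
normalized = dim + shape.rank() if dim < 0 else dim
assert shape[normalized] == shape[dim] == 64
```

This only holds if the underlying `__getitem__` delegates to a sequence with negative-index semantics, which is why the question is posed rather than asserted.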
```diff
@@ -21,6 +22,9 @@ def _save(model, modelpath):
     io.save(model, modelpath)


+ort_version = packaging.version.Version(onnxruntime.__version__)
```
Is this used? If so:

```diff
-ort_version = packaging.version.Version(onnxruntime.__version__)
+_ORT_VERSION = packaging.version.Version(onnxruntime.__version__)
```
Generalize the MHA pattern (motivated by the Phi models). Specifically, we remove the initial MatMuls from the pattern, as they are unnecessary: Phi uses a packed MatMul (Q, K, and V are computed with a single MatMul and then sliced).
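For readers unfamiliar with the packed form: the idea is that concatenating the Q, K, and V projection weights lets a single MatMul replace three, with slicing recovering the individual projections. A minimal numpy sketch (all names and sizes here are illustrative, not the pattern's actual code):

```python
import numpy as np

batch, seq, hidden = 2, 4, 8  # toy sizes; real models are much larger

rng = np.random.default_rng(0)
x = rng.standard_normal((batch, seq, hidden))

# Separate projection weights for Q, K, and V ...
W_q = rng.standard_normal((hidden, hidden))
W_k = rng.standard_normal((hidden, hidden))
W_v = rng.standard_normal((hidden, hidden))

# ... versus a single packed weight: one MatMul, then slice the output.
W_qkv = np.concatenate([W_q, W_k, W_v], axis=1)  # (hidden, 3*hidden)
qkv = x @ W_qkv                                  # single MatMul
q, k, v = np.split(qkv, 3, axis=-1)              # slice into Q, K, V

# The packed form is numerically identical to three separate MatMuls.
assert np.allclose(q, x @ W_q)
assert np.allclose(k, x @ W_k)
assert np.allclose(v, x @ W_v)
```

Because the three projections are now produced by Slice nodes downstream of one MatMul, a fusion pattern that hard-codes three initial MatMuls would fail to match, which is the motivation for dropping them from the pattern.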
However, this is not sufficient yet, since Phi also uses partial rotary-embedding, which is not yet supported by the RotaryEmbedding pattern. I will separately work on the extension to the RotaryEmbedding pattern to handle partial embedding.
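For context, "partial" rotary embedding means the rotation is applied only to the first `rotary_dim` channels of each head, with the remaining channels passed through unchanged. A hedged numpy sketch of that behavior (function and variable names are illustrative, not the RotaryEmbedding pattern's actual API):

```python
import numpy as np

def apply_partial_rotary(x, cos, sin, rotary_dim):
    """Rotate only the first `rotary_dim` channels; pass the rest through.

    x:        (batch, heads, seq, head_dim)
    cos, sin: (seq, rotary_dim // 2)
    """
    x_rot, x_pass = x[..., :rotary_dim], x[..., rotary_dim:]
    half = rotary_dim // 2
    x1, x2 = x_rot[..., :half], x_rot[..., half:]
    rotated = np.concatenate(
        [x1 * cos - x2 * sin, x2 * cos + x1 * sin], axis=-1
    )
    return np.concatenate([rotated, x_pass], axis=-1)

# Toy example with rotary_dim < head_dim (the "partial" case).
batch, heads, seq, head_dim, rotary_dim = 1, 2, 3, 8, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((batch, heads, seq, head_dim))

pos = np.arange(seq)[:, None]
inv_freq = 1.0 / (10000 ** (np.arange(0, rotary_dim, 2) / rotary_dim))
angles = pos * inv_freq  # (seq, rotary_dim // 2)

out = apply_partial_rotary(x, np.cos(angles), np.sin(angles), rotary_dim)

# Channels beyond rotary_dim are untouched.
assert np.allclose(out[..., rotary_dim:], x[..., rotary_dim:])
```

A pattern written for full rotary embedding (rotation over the whole head dimension) would not match the extra Slice/Concat nodes this introduces, hence the planned extension.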