Disable donated_buffer for all ops' backward benchmarking #104
Conversation
tritonbench/utils/triton_op.py (Outdated)

```diff
@@ -39,6 +39,8 @@
     tqdm = None

 logger = logging.getLogger(__name__)
+# TODO: remove this once we have a better way to handle backward benchmarking
+torch._functorch.config.donated_buffer = False
```
This seems like overkill to me. Can we disable it only in `backward` and `forward_backward`?
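One way to scope the flag as suggested above is a small context manager that flips it only around the backward benchmark and restores it afterwards. This is a hypothetical sketch, not the change that landed in the PR; it assumes `torch._functorch.config` exposes a mutable `donated_buffer` attribute, and uses a stand-in config object so the snippet runs without torch.

```python
import contextlib


@contextlib.contextmanager
def donated_buffer_disabled(config):
    """Temporarily set config.donated_buffer = False, restoring the old value on exit.

    `config` is any object with a boolean `donated_buffer` attribute
    (e.g. torch._functorch.config); a dummy object is used below for illustration.
    """
    old = config.donated_buffer
    config.donated_buffer = False
    try:
        yield
    finally:
        config.donated_buffer = old


# Illustration with a stand-in config object (hypothetical, not torch):
class _Cfg:
    donated_buffer = True


cfg = _Cfg()
with donated_buffer_disabled(cfg):
    assert cfg.donated_buffer is False  # disabled only inside the block
assert cfg.donated_buffer is True       # restored after the block exits
```

Wrapping only the `backward` and `forward_backward` paths in such a block would leave the forward-only and attention benchmarks unaffected.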
Fixed in 20b138d.
@FindHao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Force-pushed from 5217014 to 5aa17b6
It looks like the CI errors are real, and the config is not compatible with flash_attention. I'm going to set this config only for
…_entropy, geglu, swiglu, and layernorm
@FindHao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary: The previous PR #104 causes the following issue:

```
% python run.py --op geglu --mode fwd --precision fp32 --metrics latency,speedup --csv --cudagraph
  0%|          | 0/4 [00:03<?, ?it/s]
Caught exception, terminating early with partial results
Traceback (most recent call last):
  File "/scratch/yhao/pta/tritonbench/tritonbench/utils/triton_op.py", line 782, in run
    y_vals: Dict[str, BenchmarkOperatorMetrics] = functools.reduce(
  File "/scratch/yhao/pta/tritonbench/tritonbench/utils/triton_op.py", line 770, in _reduce_benchmarks
    acc[bm_name] = self._do_bench(
  File "/scratch/yhao/pta/tritonbench/tritonbench/utils/triton_op.py", line 981, in _do_bench
    fn = self._get_bm_func(fn_name)
  File "/scratch/yhao/pta/tritonbench/tritonbench/utils/triton_op.py", line 667, in _get_bm_func
    fwd_fn = fwd_fn_lambda(*self.example_inputs)
  File "/scratch/yhao/pta/tritonbench/tritonbench/utils/triton_op.py", line 481, in _inner
    return function(self, *args, **kwargs)
  File "/scratch/yhao/pta/tritonbench/tritonbench/operators/geglu/operator.py", line 69, in inductor_geglu
    compiled = torch.compile(self.baseline_model)
UnboundLocalError: local variable 'torch' referenced before assignment (B, T, H)
```

We should use `from torch._functorch import config` rather than `import torch._functorch.config`.

Pull Request resolved: #113
Reviewed By: adamomainz
Differential Revision: D67110110
Pulled By: FindHao
fbshipit-source-id: e5143b06d0e62fb2a7b83464e23126e73a52ee10
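The `UnboundLocalError` above is the classic Python scoping pitfall: `import torch._functorch.config` anywhere inside a function binds the name `torch` as a local for the *entire* function body, so any use of `torch` earlier in that function reads an unassigned local. `from torch._functorch import config` binds only `config`, leaving the module-level `torch` untouched. A minimal, torch-free reproduction using `os` as a stand-in:

```python
import os  # module-level import, analogous to `import torch`


def broken():
    # Fails: the `import os.path` below makes `os` a local name for the
    # whole function body, so this read happens before the local is bound.
    cwd = os.getcwd()
    import os.path  # binds the local name `os`
    return os.path.join(cwd, "x")


def fixed():
    cwd = os.getcwd()  # fine: `os` still refers to the module-level import
    from os import path  # binds only `path`, leaving `os` untouched
    return path.join(cwd, "x")


try:
    broken()
except UnboundLocalError as e:
    print("broken():", e)

print("fixed():", fixed())
```

This is exactly why swapping the import form, rather than moving code around, resolves the error in `inductor_geglu`.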
This is still a temporary fix for backward benchmarking. Related discussion: #40