Sourced from torch's releases.
PyTorch 2.5.0 Release, SDPA CuDNN backend, Flex Attention
PyTorch 2.5 Release Notes
- Highlights
- Backwards Incompatible Changes
- Deprecations
- New Features
- Improvements
- Bug fixes
- Performance
- Documentation
- Developers
- Security
Highlights
We are excited to announce the release of PyTorch® 2.5! This release features a new CuDNN backend for SDPA, enabling speedups by default for users of SDPA on H100s or newer GPUs. In addition, regional compilation of torch.compile offers a way to reduce the cold start time of torch.compile by allowing users to compile a repeated nn.Module (e.g. a transformer layer in an LLM) without recompilations. Finally, the TorchInductor CPP backend offers solid performance speedups with numerous enhancements like FP16 support, the CPP wrapper, AOT-Inductor mode, and max-autotune mode.

This release is composed of 4095 commits from 504 contributors since PyTorch 2.4. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try these out and report any issues as we improve 2.5. More information about how to get started with the PyTorch 2-series can be found at our Getting Started page. Please also check out the new releases of our ecosystem projects TorchRec and TorchFix.
| Beta | Prototype |
| --- | --- |
| CuDNN backend for SDPA | FlexAttention |
| torch.compile regional compilation without recompilations | Compiled Autograd |
| TorchDynamo added support for exception handling & MutableMapping types | Flight Recorder |
| TorchInductor CPU backend optimization | Max-autotune Support on CPU with GEMM Template |
| | TorchInductor on Windows |
| | FP16 support on CPU path for both eager mode and TorchInductor CPP backend |
| | Autoload Device Extension |
| | Enhanced Intel GPU support |

*To see a full list of public feature submissions click here.
BETA FEATURES
[Beta] CuDNN backend for SDPA
The cuDNN "Fused Flash Attention" backend was landed for torch.nn.functional.scaled_dot_product_attention. On NVIDIA H100 GPUs this can provide up to 75% speed-up over FlashAttentionV2. This speedup is enabled by default for all users of SDPA on H100 or newer GPUs.
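Below is a minimal sketch of calling SDPA while explicitly opting into the cuDNN backend, which can be useful for benchmarking or verification. The shapes and dtype are illustrative and an H100-class GPU is assumed; in normal use no opt-in is needed, since the backend is selected automatically.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# Illustrative (batch, heads, seq_len, head_dim) tensors in half precision,
# which the fused attention backends require.
q = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Restrict backend selection to cuDNN attention for this call.
with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
```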
[Beta] torch.compile regional compilation without recompilations
Regional compilation without recompilations is available via torch._dynamo.config.inline_inbuilt_nn_modules, which defaults to True in 2.5+. This option allows users to compile a repeated nn.Module (e.g. a transformer layer in an LLM) without recompilations. Compared to compiling the full model, this option can result in smaller compilation latencies at the cost of a 1%-5% performance degradation. See the tutorial for more information.
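As a rough illustration (the toy Block/Model classes below are made up for this example), the sketch compiles only the repeated block rather than the whole model, so the compiled artifact is reused across all block instances:

```python
import torch
import torch.nn as nn

# Toy repeated block standing in for e.g. a transformer layer.
class Block(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x):
        return torch.relu(self.linear(x))

class Model(nn.Module):
    def __init__(self, dim, depth):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim) for _ in range(depth))

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x

model = Model(dim=128, depth=12)

# inline_inbuilt_nn_modules (default True in 2.5+) lets the compiled region be
# reused across instances of the same nn.Module without recompiling.
torch._dynamo.config.inline_inbuilt_nn_modules = True

# Regional compilation: compile each repeated block in place instead of the
# whole model, trading a little performance for a much smaller cold start.
for block in model.blocks:
    block.compile()

out = model(torch.randn(4, 128))
```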
[Beta] TorchInductor CPU backend optimization
This feature advances Inductor's CPU backend optimization, including CPP backend code generation and FX fusions with customized CPU kernels. The Inductor CPU backend supports vectorization of common data types and all Inductor IR operations, along with static and symbolic shapes. It is compatible with both Linux and Windows and supports the default Python wrapper, the CPP wrapper, and AOT-Inductor mode.
Additionally, it extends the max-autotune mode of the GEMM template (prototyped in 2.5), offering further performance gains. The backend supports various FX fusions, lowering to customized kernels such as oneDNN for Linear/Conv operations and SDPA. The Inductor CPU backend consistently achieves performance speedups across three benchmark suites (TorchBench, Hugging Face, and TIMM), outperforming eager mode in 97.5% of the 193 models tested.
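For illustration, here is a minimal sketch of exercising the CPU backend with max-autotune; the tiny nn.Sequential model, shapes, and the FP16 cast are placeholders for this example:

```python
import torch
import torch.nn as nn

# Placeholder model; any nn.Module works with the Inductor CPU backend.
model = nn.Sequential(nn.Linear(256, 256), nn.GELU(), nn.Linear(256, 64)).eval()

# FP16 is supported on the CPU path in both eager mode and the CPP backend.
model = model.half()

# Optionally generate C++ wrapper code instead of the default Python wrapper:
# torch._inductor.config.cpp_wrapper = True

# max-autotune enables GEMM template autotuning on CPU.
compiled = torch.compile(model, mode="max-autotune")

with torch.no_grad():
    out = compiled(torch.randn(32, 256, dtype=torch.float16))
```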
PROTOTYPE FEATURES
[Prototype] FlexAttention
We've introduced a flexible API that enables implementing various attention mechanisms such as Sliding Window, Causal Mask, and PrefixLM with just a few lines of idiomatic PyTorch code. This API leverages torch.compile to generate a fused FlashAttention kernel, which eliminates extra memory allocation and achieves performance comparable to handwritten implementations. Additionally, we automatically generate the backwards pass using PyTorch's autograd machinery. Furthermore, our API can take advantage of sparsity in the attention mask, resulting in significant improvements over standard attention implementations.
For more information and examples, please refer to the official blog post and Attention Gym.
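As a small example of the API (shapes, dtypes, and the causal mask below are illustrative), a causal mask can be written in a few lines of PyTorch and handed to flex_attention through a block mask, which also lets the generated kernel exploit sparsity:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

# Causal masking: a query position may only attend to keys at or before it.
def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

B, H, S, D = 2, 8, 1024, 64
q = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# The block mask lets the kernel skip fully-masked blocks (attention sparsity).
block_mask = create_block_mask(causal, B=None, H=None, Q_LEN=S, KV_LEN=S)

# torch.compile fuses this into a single FlashAttention-style kernel.
flex_attention = torch.compile(flex_attention)
out = flex_attention(q, k, v, block_mask=block_mask)
```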
... (truncated)
32f585d [Release only] use triton 3.1.x from pypi (#137895)
417a076 [split build] move periodic split builds into own concurrency group (#135510)...
119e734 [RELEASE-ONLY CHANGES] Fix dependency on filesystem on Linux (#137242)
783a6a4 [MPS] Add regression test for fft.fftfreq (#137215)
5375201 [MPS] Add missing dispatch to rshift.Tensor (#137212)
1de132e [MPS] Fix 5D+ reductions over negative dimentions (#137211)
0b1b609 [NCCL] Don't override waitUntilInitialized's setting of `comm->initialized_`...
0b45af9 Fix addmm silent correctness on aarch64 (#137208)
1a0b166 [ONNX] Add assertion nodes to ignoring list (#137214)
3a541ef Clarify that libtorch API is C++17 compatible (#137206)