Hi tch-rs community!
I'd like to ask a short question: how can TF32 be enabled in tch (for CUDA)? This option can make recent NVIDIA GPUs dramatically faster when a small loss of precision is acceptable. I searched for "TF32" and "precision" in this crate's code, but could not find such an option.
My current workaround gets TF32 for GEMM, but not for cuDNN.
For comparison, in Rust's candle-core this is done with:

```rust
candle_core::cuda::set_gemm_reduced_precision_f32(true);
```

In cudarc, TF32 does not seem to be exposed in the cuBLAS wrapper, and it is enforced (hardcoded) in the cuBLASLt wrapper, so using `cudarc::cublaslt::safe` will automatically call GEMM with TF32.
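For context, here is a minimal sketch of how that candle switch might be used end to end (assuming candle-core is built with its CUDA feature; the tensor setup is purely illustrative):

```rust
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    // Global switch quoted above: subsequent f32 GEMMs may use reduced
    // (TF32-style) precision on supporting NVIDIA GPUs.
    candle_core::cuda::set_gemm_reduced_precision_f32(true);

    let device = Device::new_cuda(0)?;
    let a = Tensor::randn(0f32, 1f32, (1024, 1024), &device)?;
    let b = Tensor::randn(0f32, 1f32, (1024, 1024), &device)?;
    let c = a.matmul(&b)?; // this matmul can now take the TF32 path
    println!("result shape: {:?}", c.shape());
    Ok(())
}
```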
In Python's PyTorch, this is done by (https://pytorch.org/docs/stable/notes/cuda.html):

```python
torch.backends.cuda.matmul.allow_tf32 = True
```
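Notably, the same PyTorch notes page also documents a separate cuDNN-side switch, which covers exactly the part my workaround is missing:

```python
import torch

# TF32 for matmul (cuBLAS) and for convolutions (cuDNN) are toggled separately.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```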