feat: Add torch.compile support for ~2x faster inference #16
Merged
Conversation
- Add HEARTMULA_COMPILE and HEARTMULA_COMPILE_MODE environment variables
- Add apply_torch_compile() function that compiles backbone and decoder
- Auto-detect Triton availability (inductor vs eager backend)
- Support compile modes: default, reduce-overhead, max-autotune
- Update create_quantized_pipeline() with compile_model/compile_mode params
- Apply torch.compile to both quantized and non-quantized pipelines
- Update .env.example with new configuration options
- Update README.md with torch.compile documentation

Note: The first generation will be slower due to compilation, but subsequent generations benefit from the compiled kernels (~2x speedup on RTX 4090, A100).

Requirements:
- PyTorch 2.0+
- Linux/WSL2: pip install triton
- Windows: pip install -U 'triton-windows>=3.2,<3.3'
Summary
This PR adds support for PyTorch 2.0+ torch.compile to significantly improve inference speed (~2x on supported GPUs such as the RTX 4090 and A100).

Changes
New Environment Variables
- HEARTMULA_COMPILE: Enable torch.compile (true/false, default: false)
- HEARTMULA_COMPILE_MODE: Compile optimization mode (default / reduce-overhead / max-autotune)
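For illustration, a minimal sketch of how these flags might be read at startup (the variable names come from this PR; the parsing details and defaults are assumptions):

```python
import os

# Hypothetical parsing of the new flags; exact defaults and parsing logic are assumed.
compile_enabled = os.environ.get("HEARTMULA_COMPILE", "false").lower() == "true"
compile_mode = os.environ.get("HEARTMULA_COMPILE_MODE", "default")  # "default" | "reduce-overhead" | "max-autotune"
```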
Code Changes

- apply_torch_compile() function that compiles the model backbone and decoder
- create_quantized_pipeline() updated with compile parameters
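A minimal sketch of what the compile step does, based on the change description above; the backbone/decoder attribute names and the fallback logic are simplified assumptions, not the actual implementation:

```python
import importlib.util
import torch

def apply_torch_compile(model, mode: str = "default"):
    """Wrap the backbone and decoder with torch.compile (illustrative sketch)."""
    # The inductor backend needs Triton for GPU kernels; fall back to eager if it is missing.
    if importlib.util.find_spec("triton") is not None:
        kwargs = {"backend": "inductor", "mode": mode}
    else:
        kwargs = {"backend": "eager"}
    model.backbone = torch.compile(model.backbone, **kwargs)
    model.decoder = torch.compile(model.decoder, **kwargs)
    return model
```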
Documentation

- .env.example updated with the new configuration options
- README.md updated with torch.compile documentation

Usage
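A minimal usage sketch; the import path is an assumption, while the compile_model / compile_mode parameter names come from this PR:

```python
# Import path is assumed; compile_model / compile_mode are the parameters added in this PR.
from heartmula import create_quantized_pipeline

pipeline = create_quantized_pipeline(
    compile_model=True,              # same effect as HEARTMULA_COMPILE=true
    compile_mode="reduce-overhead",  # or "default" / "max-autotune"
)
# The first call is slow while kernels compile; subsequent generations run roughly 2x faster.
```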
Requirements
- PyTorch 2.0+
- Linux/WSL2: pip install triton
- Windows: pip install -U 'triton-windows>=3.2,<3.3'

Notes

The first generation will be slower due to compilation, but subsequent generations benefit from the compiled kernels (~2x speedup on RTX 4090, A100).
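As a quick sanity check before enabling the flag, something along these lines (illustrative only) confirms the requirements above are met:

```python
import importlib.util
import torch

# Illustrative pre-flight check for the requirements listed above.
assert hasattr(torch, "compile"), "torch.compile requires PyTorch 2.0+"
if importlib.util.find_spec("triton") is None:
    print("Triton not installed: torch.compile will fall back to the eager backend")
```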
Related
This implements a feature similar to heartlib PR #64.