
Enabling transformer and T5 to be quantized with different types #173

Open

omidsakhi wants to merge 3 commits into main

Conversation

omidsakhi

Enables the transformer and T5 to be quantized with different types, including qint8, qfloat8_e4m3fn (qfloat8 is an alias for it), and qfloat8_e5m2, by specifying quantization_type_transformer or quantization_type_t5 in the model section of a config YAML file.

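For illustration, a minimal sketch of a `model` section using the new keys. The two `quantization_type_*` keys are the ones this PR adds; the surrounding keys are assumptions based on a typical ai-toolkit Flux config and may differ in your setup:

```yaml
model:
  name_or_path: "black-forest-labs/FLUX.1-dev"  # assumed model; not part of this PR
  quantize: true
  # Keys added by this PR; each accepts qint8,
  # qfloat8_e4m3fn (alias: qfloat8), or qfloat8_e5m2
  quantization_type_transformer: "qfloat8_e4m3fn"
  quantization_type_t5: "qint8"
```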
@omidsakhi (Author)

A bit of backstory on why this PR exists. I have the following setup:

Windows
RTX 4090
48GB RAM
CUDA 12.4
Python 3.12.6
PyTorch 2.4.1+cu124
transformers 4.44.2
diffusers 0.31.0.dev0
optimum-quanto 0.2.4

I found that ai-toolkit is unable to generate pre-training samples or to train because of qfloat8 quantization. Generation produces black (blank) images for me because it encounters invalid values; training encounters inf values. The workaround I have found so far is to switch the quantization from qfloat8 to qint8 for both the transformer and T5. At this point it is not clear which of the modules above is causing the qfloat8 quantization to fail.
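A hedged sketch of that workaround using the keys this PR adds, pinning both modules to qint8 (key names are from this PR; the rest of the config is assumed):

```yaml
model:
  quantize: true
  quantization_type_transformer: "qint8"  # avoids the black-image/inf failures seen with qfloat8
  quantization_type_t5: "qint8"
```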
