
Bitsandbytes quantization extension #19

Open
ferrazzipietro opened this issue May 3, 2024 · 3 comments

Comments

@ferrazzipietro

Hi,
thanks for sharing the code. I have tried to use your repo with bitsandbytes for model quantization. Unfortunately, training does not work: the layers defined in modelling_llama.py as

        self.dropout = nn.Dropout(classifier_dropout)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

do not get trained, and after fine-tuning they contain only NaN values. I suspect it is a data-type conflict, since the hidden layers are loaded in 4/8 bit while the classifier is still kept in memory as float16. Any clue/plan on how to fix that?
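
For reference, a possible workaround (untested; assuming a PEFT + bitsandbytes setup) would be to keep the classification head in float32 so its gradients don't overflow to NaN while the quantized backbone stays in 4/8 bit, along these lines:

    import torch
    from peft import prepare_model_for_kbit_training

    # `model` is assumed to be the 4/8-bit model loaded from this repo's
    # modelling_llama.py (placeholder -- not the exact loading code).
    # prepare_model_for_kbit_training casts the remaining fp16 parameters
    # (layer norms, classifier head) to float32 and freezes them; the head
    # can then be re-enabled for training via modules_to_save in the LoRA config.
    model = prepare_model_for_kbit_training(model)

    # Alternatively, cast only the head by hand and keep it trainable:
    model.classifier = model.classifier.to(torch.float32)
    for p in model.classifier.parameters():
        p.requires_grad = True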

@SeanLee97
Contributor

Hi @ferrazzipietro, we haven't tested 4/8-bit training. Which backbone do you use? If the backbone is not LLaMA, it is better to specify the target_modules explicitly.
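
For example, in a PEFT LoRA setup something like the sketch below makes the targeted layers explicit (the module names are LLaMA/Mistral-style attention projections and may differ for other backbones; check model.named_modules() to confirm):

    from peft import LoraConfig, TaskType, get_peft_model

    peft_config = LoraConfig(
        task_type=TaskType.TOKEN_CLS,  # assumption: token-classification fine-tuning
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # adjust to the backbone
        modules_to_save=["classifier"],  # keep the classification head trainable in full precision
    )

    # `model` is the already-loaded (and optionally k-bit-prepared) backbone
    model = get_peft_model(model, peft_config)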

@SeanLee97
Contributor

BTW, you can also try https://github.com/WhereIsAI/BiLLM, which supports the latest transformers.

@ferrazzipietro
Author

ferrazzipietro commented May 6, 2024

I have tried LLaMA and Mistral, both resulting in NaN weights. I've looked at the new repo as well, but the issue persists. I will let you know if I get the chance to dig into it!
