How to convert PEFT-LoRA trained model into original whisper architecture? #2582
The fact that

    result = whisper_model.load_state_dict(filtered_state_dict, strict=False)

completes without missing or unexpected keys is encouraging, but it does not guarantee that the weights were mapped correctly. The real clue is the error itself. It means the model successfully runs through most of the forward pass, but somewhere the activations have already exploded to NaN.

First thing I'd verify: merge quality

You're loading the base model as:

    WhisperForConditionalGeneration.from_pretrained(
        ...,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    )

and then doing:

    merged_model = model.merge_and_unload()

This immediately raises a concern. LoRA merging is generally safest when performed on fp16, bf16, or fp32 weights. Merging into an 8-bit quantized model can produce unexpected results depending on the PEFT and bitsandbytes versions. I'd try:

    import torch
    from peft import PeftModel
    from transformers import WhisperForConditionalGeneration

    # Load the base model unquantized so the LoRA deltas merge into real fp16 weights.
    base_model = WhisperForConditionalGeneration.from_pretrained(
        peft_config.base_model_name_or_path,
        torch_dtype=torch.float16,
    )
    model = PeftModel.from_pretrained(base_model, peft_model_id)
    merged_model = model.merge_and_unload()

and then save/check the merged weights before any conversion.

Check for NaNs before conversion

Immediately after merging:

    for name, param in merged_model.named_parameters():
        if torch.isnan(param).any():
            print("NaN found:", name)

If NaNs already exist here, the issue is the merge itself.

Verify logits in HF before conversion

Before converting to OpenAI format:

    inputs = processor(...)
    outputs = merged_model(**inputs)

or simply:

    merged_model.generate(...)

If the HF model works correctly, then the merge is fine and the conversion is the problem. This is the most important diagnostic step.

Case 1: the merged HF model fails. The problem is upstream of the conversion, in the merge itself.
Case 2: the merged HF model works. The problem is in the conversion.
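The HF-side check can be reduced to a single yes/no answer. The following is a minimal sketch rather than code from the thread: `logits_are_finite` and `StubModel` are hypothetical names, and the helper assumes the forward pass returns an object with a `.logits` tensor, as Hugging Face seq2seq models do. The stub exists only so the snippet runs without downloading a checkpoint.

```python
from types import SimpleNamespace

import torch


def logits_are_finite(model, **inputs):
    """Run one forward pass and report whether every logit is finite.

    Hypothetical helper: assumes model(**inputs) returns an object
    exposing a `.logits` tensor.
    """
    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits
    return bool(torch.isfinite(logits).all())


class StubModel(torch.nn.Module):
    """Stand-in for the merged model so the sketch runs offline."""

    def __init__(self, poison=False):
        super().__init__()
        self.proj = torch.nn.Linear(4, 8)
        self.poison = poison

    def forward(self, input_features):
        logits = self.proj(input_features)
        if self.poison:
            # Simulate activations that have already exploded to NaN.
            logits = logits * float("nan")
        return SimpleNamespace(logits=logits)


features = torch.randn(1, 3, 4)
healthy = logits_are_finite(StubModel(), input_features=features)
broken = logits_are_finite(StubModel(poison=True), input_features=features)
print("healthy model finite:", healthy)
print("poisoned model finite:", broken)
```

With the real merged model, the keyword arguments would be the processor's `input_features`, and a Whisper forward pass generally also needs `decoder_input_ids`. A `False` result here points at Case 1 before any conversion code is involved.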
I suspect a mapping issue

Your conversion logic is based on older Whisper HF ↔ OpenAI mappings. Large-v2 is fairly sensitive to incorrect mappings, because a single wrong normalization or attention projection can cause exactly this kind of failure. The highest-risk mappings are:

    .final_layer_norm.
    .encoder.layer_norm.
    .decoder.layer_norm.

and

    proj_out.weight

because OpenAI and HF organize these slightly differently. Just because all tensor shapes match doesn't mean the tensors belong in the correct location.

Compare against a known-good conversion

A useful sanity test is to run the same conversion script on a model that is known to be good, such as the unmodified base checkpoint, and transcribe with the result. If this produces NaNs, the conversion script is at fault. If it works, the script is fine and the merged weights are the problem.

Check tied embeddings

One thing I notice:

    text = re.sub(
        'proj_out.weight',
        'decoder.token_embedding.weight',
        text
    )

HF Whisper ties proj_out.weight to the token embeddings, and OpenAI Whisper expects that tensor under decoder.token_embedding.weight. However, after a PEFT merge there may be subtle differences in how tied weights are represented. Verify that

    merged_model.proj_out.weight.shape

matches

    whisper_model.decoder.token_embedding.weight.shape

and that the tensor contains finite values.

Check for non-finite weights after conversion

After creating filtered_state_dict:

    for k, v in filtered_state_dict.items():
        if not torch.isfinite(v).all():
            print("Bad tensor:", k)

Even one corrupted tensor can lead to the decoder producing NaN logits immediately.

My likely diagnosis

I'd start by verifying that the merged Hugging Face model can successfully transcribe audio before any conversion. If the merged HF model works but the converted OpenAI model produces NaNs, then the issue is almost certainly in the conversion mapping rather than the LoRA training itself.
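The post-conversion checks can be gathered into one audit that runs before `load_state_dict`. This is a minimal sketch under stated assumptions: `audit_state_dict` is a hypothetical helper, the leftover markers are simply the ones named in the question (`lora_`, `SCB`, `weight_norm`), and the tiny tensors stand in for real checkpoints. A clean report is necessary but not sufficient, since matching names and shapes cannot prove that each tensor landed in the right place.

```python
import torch


def audit_state_dict(converted, reference,
                     leftover_markers=("lora_", "SCB", "weight_norm")):
    """Compare a converted state dict with the target model's own state dict.

    Returns a dict of problem lists; every list is empty for a clean result.
    The markers are assumptions taken from the question, not a library API.
    """
    report = {
        "missing": sorted(set(reference) - set(converted)),
        "unexpected": sorted(set(converted) - set(reference)),
        "leftover": sorted(
            k for k in converted if any(m in k for m in leftover_markers)
        ),
        "shape_mismatch": [],
        "non_finite": [],
    }
    for name, tensor in converted.items():
        if name in reference and tensor.shape != reference[name].shape:
            report["shape_mismatch"].append(name)
        # Only floating-point tensors can hold NaN or Inf.
        if tensor.is_floating_point() and not torch.isfinite(tensor).all():
            report["non_finite"].append(name)
    return report


# Toy stand-ins for whisper_model.state_dict() and filtered_state_dict.
reference = {
    "decoder.ln.weight": torch.ones(4),
    "decoder.token_embedding.weight": torch.zeros(6, 4),
}
converted = {
    "decoder.ln.weight": torch.tensor([1.0, float("nan"), 1.0, 1.0]),
    "decoder.token_embedding.weight": torch.zeros(4, 6),
    "decoder.blocks.0.attn.query.lora_A.weight": torch.zeros(2, 4),
}
report = audit_state_dict(converted, reference)
for problem, names in report.items():
    print(problem, names)
```

In practice `reference` would be the state dict of a freshly loaded OpenAI Whisper model. Any entry under `leftover` is worth a second look, because adapter or quantization tensors that survive the merge suggest the merge did not fully happen.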
Hello, I have trained Whisper large-v2 using PEFT-LoRA, following https://github.com/huggingface/peft/blob/main/examples/int8_training/peft_bnb_whisper_large_v2_training.ipynb
See also #988 for PEFT-LoRA training.
I am trying to convert the trained model from the Hugging Face architecture to OpenAI Whisper's architecture.
I also trained Whisper small and large-v2 using DeepSpeed, and for those I was able to convert the models into OpenAI's structure.
For conversion, I followed #830
The conversion can also be seen at https://github.com/huggingface/transformers/blob/68e85fc822097b3df8d685a4705804348245284d/src/transformers/models/whisper/convert_openai_to_hf.py#L86
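For orientation, the key renaming that such a conversion performs can be sketched as the reverse of the `WHISPER_MAPPING` table in the linked script. The pairs below are an assumption based on that table and on the OpenAI Whisper parameter names, and should be verified against the pinned commit; `hf_to_openai_key` is a hypothetical helper, not part of either library.

```python
# (HF substring, OpenAI substring). More specific patterns come first so
# that a generic replacement cannot clobber them.
HF_TO_OPENAI = [
    (".self_attn_layer_norm", ".attn_ln"),
    (".encoder_attn_layer_norm", ".cross_attn_ln"),
    ("final_layer_norm", "mlp_ln"),
    ("encoder.layer_norm.", "encoder.ln_post."),
    ("decoder.layer_norm.", "decoder.ln."),
    (".self_attn.q_proj", ".attn.query"),
    (".self_attn.k_proj", ".attn.key"),
    (".self_attn.v_proj", ".attn.value"),
    (".self_attn.out_proj", ".attn.out"),
    (".encoder_attn.q_proj", ".cross_attn.query"),
    (".encoder_attn.k_proj", ".cross_attn.key"),
    (".encoder_attn.v_proj", ".cross_attn.value"),
    (".encoder_attn.out_proj", ".cross_attn.out"),
    ("fc1", "mlp.0"),
    ("fc2", "mlp.2"),
    ("layers", "blocks"),
    ("embed_tokens", "token_embedding"),
    ("encoder.embed_positions.weight", "encoder.positional_embedding"),
    ("decoder.embed_positions.weight", "decoder.positional_embedding"),
]


def hf_to_openai_key(key):
    """Translate one HF Whisper parameter name to the OpenAI layout."""
    if key.startswith("model."):
        key = key[len("model."):]
    for hf_name, openai_name in HF_TO_OPENAI:
        key = key.replace(hf_name, openai_name)
    return key


examples = [
    "model.encoder.layers.0.self_attn.q_proj.weight",
    "model.decoder.layers.3.encoder_attn.out_proj.bias",
    "model.encoder.layer_norm.weight",
    "model.decoder.layer_norm.weight",
    "model.decoder.embed_positions.weight",
]
renamed = [hf_to_openai_key(k) for k in examples]
for before, after in zip(examples, renamed):
    print(before, "->", after)
```

The sketch leaves `proj_out.weight` untouched on purpose: the OpenAI decoder has no separate output projection and reuses `decoder.token_embedding.weight` for the logits, so that key needs an explicit decision rather than a blind rename.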
Now, the issue is that I'm unable to repeat this for the model trained using PEFT-LoRA.
My code is below:
After conversion, there were no missing or unexpected keys, because I had removed some extra layers that had SCB, lora_, or weight_norm at the end. I did this after a discussion with GPT.
The weights loaded without any error, but on transcription I get the error below, which I can't make sense of: