@mkpatel3-github
Contributor

Root cause: PEFT removes the LM head's weight entry from the Linear module's _parameters dict while leaving the attribute itself intact. When the model is moved to HPU and tie_weights() runs, PyTorch tries to re-register the parameter and fails because the attribute already exists.
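
For illustration, a minimal standalone repro of that state (plain PyTorch, not code from this PR; the direct register_parameter call stands in for what tie_weights() ends up doing):

```python
import torch.nn as nn

lin = nn.Linear(4, 4, bias=False)

# Simulate the broken state: the _parameters entry is gone, but the attribute
# itself survives on the module.
weight = lin._parameters.pop("weight")
object.__setattr__(lin, "weight", weight)

# Re-registering the parameter now trips over the leftover attribute.
try:
    lin.register_parameter("weight", nn.Parameter(weight.data.clone()))
except KeyError as e:
    print(e)  # "attribute 'weight' already exists"
```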

Solution:

  • Added a safe fallback, _safe_tie_weights(), that manually re-ties embeddings without calling register_parameter (see the sketch after this list)
  • Created a _replace_module_parameter() helper that:
    • Overwrites existing _parameters[name] entries directly
    • Restores the missing dict entry when the attribute still exists as an nn.Parameter
    • Deletes stale non-Parameter attributes before re-registering
    • Falls back to direct _parameters injection if register_parameter still fails
  • Added diagnostic logging at each fallback step
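
A minimal sketch of what the two helpers could look like, following the bullets above (illustrative only: in the PR they are a method and a helper in trainer.py, the exact branching order and log messages may differ, and get_input_embeddings()/get_output_embeddings() are the usual Transformers accessors assumed here):

```python
import logging

import torch.nn as nn

logger = logging.getLogger(__name__)


def _replace_module_parameter(module: nn.Module, name: str, param: nn.Parameter) -> None:
    # Case 1: the _parameters entry is still present -- overwrite it directly.
    if name in module._parameters:
        module._parameters[name] = param
        return

    existing = module.__dict__.get(name)
    if isinstance(existing, nn.Parameter):
        # Case 2: the dict entry is missing but the attribute survived as an
        # nn.Parameter -- drop the shadowing attribute and restore the entry.
        del module.__dict__[name]
        module._parameters[name] = param
        logger.warning("Restored _parameters[%r] on %s", name, type(module).__name__)
        return

    if existing is not None:
        # Case 3: a stale non-Parameter attribute would block register_parameter.
        del module.__dict__[name]
        logger.warning("Deleted stale attribute %r on %s", name, type(module).__name__)

    try:
        module.register_parameter(name, param)
    except KeyError:
        # Case 4: last resort -- inject directly into the parameter dict.
        logger.warning("register_parameter(%r) failed; injecting into _parameters", name)
        module._parameters[name] = param


def _safe_tie_weights(model) -> None:
    # Manually re-tie input/output embeddings without going through
    # register_parameter.
    input_emb = model.get_input_embeddings()
    output_emb = model.get_output_embeddings()
    if input_emb is None or output_emb is None:
        return
    _replace_module_parameter(output_emb, "weight", input_emb.weight)
```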

Changes:

  • optimum/habana/transformers/trainer.py:
    • Import OrderedDict
    • Wrap the tie_weights() call in _move_model_to_device() in a try/except (see the sketch after this list)
    • Add _safe_tie_weights() method
    • Add _replace_module_parameter() helper
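
The wrapping could look roughly like this (a sketch, not the exact diff; the caught exception type and the surrounding method body are assumptions, and _safe_tie_weights() refers to the helper sketched above):

```python
import logging

logger = logging.getLogger(__name__)


# Sketch of the patched method; in the PR it lives on the trainer class in
# optimum/habana/transformers/trainer.py.
def _move_model_to_device(self, model, device):
    model = model.to(device)
    # Re-tie shared weights after the move. With PEFT-modified modules this can
    # hit the stale-attribute problem described above, so fall back to the
    # manual re-tie instead of aborting training.
    if hasattr(model, "tie_weights"):
        try:
            model.tie_weights()
        except KeyError as e:  # raised by register_parameter on the stale attribute
            logger.warning(
                "tie_weights() failed (%s); falling back to _safe_tie_weights()", e
            )
            self._safe_tie_weights(model)
```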

Target: Optimum Habana 1.18.1 release (standalone backport for users on the stable release)

Testing: Verified Gemma3-12B LoRA fine-tuning on ChartQA; training proceeds past initialization with a warning log.
