
Draft: Feat/initialization component #168

Merged: 42 commits into feat/initialization on Jul 3, 2024

Conversation

@le1nux (Member) commented Jun 30, 2024

What does this PR do?

This PR introduces the components for weight initialization and is based on PR #161.
In PR #161, the different initialization methods plain, scaled and scaled_embed (see https://arxiv.org/abs/2312.16903) were implemented and added to the abstract NNModel class.
Due to design concerns (e.g., some GPT2 internals were called from the parent class), we decided instead to introduce a weight initialization component that modifies the model weights in place.
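
For illustration, a minimal sketch of what such an in-place initialization component could look like (class and method names here are assumptions for this sketch, not the actual Modalities API; the scaled variant follows the GPT-2-style scaling of residual projections by 1/sqrt(2 * n_layers)):

```python
import math

import torch.nn as nn


class PlainWeightInit:
    """Draws Linear/Embedding weights from N(mean, std^2), in place."""

    def __init__(self, mean: float = 0.0, std: float = 0.02):
        self.mean = mean
        self.std = std

    def initialize_in_place(self, model: nn.Module) -> None:
        # Mutate the raw model's parameters; nothing is returned.
        for module in model.modules():
            if isinstance(module, (nn.Linear, nn.Embedding)):
                nn.init.normal_(module.weight, mean=self.mean, std=self.std)
            if isinstance(module, nn.Linear) and module.bias is not None:
                nn.init.zeros_(module.bias)


class ScaledWeightInit(PlainWeightInit):
    """Like plain, but shrinks the std of residual projections by sqrt(2 * n_layers)."""

    def __init__(self, n_layers: int, std: float = 0.02,
                 projection_suffixes: tuple[str, ...] = ("c_proj.weight",)):
        super().__init__(std=std)
        self.n_layers = n_layers
        self.projection_suffixes = projection_suffixes

    def initialize_in_place(self, model: nn.Module) -> None:
        super().initialize_in_place(model)
        scaled_std = self.std / math.sqrt(2 * self.n_layers)
        for name, param in model.named_parameters():
            if name.endswith(self.projection_suffixes):
                nn.init.normal_(param, mean=self.mean, std=scaled_std)
```

Because the component mutates the model from the outside, the parent NNModel class no longer needs to know about GPT2 internals.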

General changes

  • Components and factories for plain, scaled and scaled_embed initialization (a rough factory sketch follows below).
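
As a rough illustration of the factory side, each variant could be registered under a component key so it can be referenced from the training config (the dictionary layout and keys are assumptions, not the real registry format):

```python
# Hypothetical registry entries mapping (component_type, variant) to the
# initializer classes sketched above.
WEIGHT_INIT_COMPONENTS = {
    ("weight_initializer", "plain"): PlainWeightInit,
    ("weight_initializer", "scaled"): ScaledWeightInit,
}
```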

Breaking Changes

  • The raw model (i.e., the model with random weights) must now be initialized with a weight initializer, as shown here (a usage sketch follows below).
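
A hedged usage sketch of the new contract, assuming the initializer classes sketched above (get_raw_model is a hypothetical helper; in practice the wiring happens via the config and registry):

```python
# The raw model no longer initializes its own weights.
model = get_raw_model(config)  # hypothetical helper; weights still random
initializer = ScaledWeightInit(n_layers=config.n_layers, std=0.02)
initializer.initialize_in_place(model)  # mutates the weights in place
```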

Checklist before submitting final PR

  • My PR is minimal and addresses one issue / enhancement in isolation
  • I have merged the target branch into this feature branch
  • I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
  • I have run a sample config for model training
  • I have fixed all failing tests (python tests/tests.py)

@le1nux le1nux changed the title Feat/initialization component Draft: Feat/initialization component Jun 30, 2024
@le1nux le1nux marked this pull request as draft June 30, 2024 15:21
@le1nux le1nux self-assigned this Jun 30, 2024
@le1nux le1nux added the enhancement New feature or request label Jun 30, 2024
@le1nux le1nux requested a review from flxst July 1, 2024 13:56
@flxst (Member) left a comment:

Looks good to me generally, although I am a bit concerned about the complexity of the whole implementation. I added a few comments. The most important ones say something like "std can be auto also for initialization types other than plain"; I think this should definitely be fixed.
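
The remark refers to the std: auto option. Per the paper cited in the PR description (https://arxiv.org/abs/2312.16903), the "small init" standard deviation can be derived from the hidden dimension alone, so resolving it need not be tied to the plain type. A minimal sketch, assuming auto maps to sqrt(2 / (5 * hidden_dim)):

```python
import math


def resolve_std(std: float | str, hidden_dim: int) -> float:
    # "auto" resolves to the small-init value sqrt(2 / (5 * d_model));
    # that this is the exact formula Modalities uses is an assumption here.
    if std == "auto":
        return math.sqrt(2 / (5 * hidden_dim))
    return float(std)
```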

Review threads (all outdated and resolved):

  • config_files/training/config_lorem_ipsum.yaml
  • src/modalities/nn/weight_init/weight_init.py
  • src/modalities/registry/components.py (3 threads)
@flxst (Member) left a comment:

I think this is another thing that needs to be fixed.

@flxst flxst marked this pull request as ready for review July 3, 2024 13:07
@le1nux le1nux merged commit 9f5651b into feat/initialization Jul 3, 2024
2 of 3 checks passed
@le1nux le1nux deleted the feat/initialization_component branch July 3, 2024 13:43