
Draft: Feat/initialization component #168

Merged: 42 commits into feat/initialization on Jul 3, 2024

Conversation

@le1nux (Member) commented Jun 30, 2024

What does this PR do?

This PR introduces the components for weight initialization and is based on PR #161.
In PR #161, the different initialization methods plain, scaled and scaled_embed (see https://arxiv.org/abs/2312.16903) were implemented and added to the abstract NNModel class.
Due to design concerns (e.g., some GPT2 internals were called from the parent class), we decided instead to introduce a weight initialization component that modifies the model weights in place.
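
For illustration, a minimal sketch of what such an in-place initialization component could look like (class and method names here are assumptions for this sketch, not the actual Modalities API; the scaled variant follows the GPT-2-style scaling of residual projections by 1/sqrt(2 * n_layers)):

```python
import math

import torch.nn as nn


class PlainWeightInit:
    """Draws Linear/Embedding weights from N(mean, std^2), in place."""

    def __init__(self, mean: float = 0.0, std: float = 0.02):
        self.mean = mean
        self.std = std

    def initialize_in_place(self, model: nn.Module) -> None:
        # Mutate the raw model's parameters; nothing is returned.
        for module in model.modules():
            if isinstance(module, (nn.Linear, nn.Embedding)):
                nn.init.normal_(module.weight, mean=self.mean, std=self.std)
            if isinstance(module, nn.Linear) and module.bias is not None:
                nn.init.zeros_(module.bias)


class ScaledWeightInit(PlainWeightInit):
    """Like plain, but shrinks the std of residual projections by sqrt(2 * n_layers)."""

    def __init__(self, n_layers: int, std: float = 0.02,
                 projection_suffixes: tuple[str, ...] = ("c_proj.weight",)):
        super().__init__(std=std)
        self.n_layers = n_layers
        self.projection_suffixes = projection_suffixes

    def initialize_in_place(self, model: nn.Module) -> None:
        super().initialize_in_place(model)
        scaled_std = self.std / math.sqrt(2 * self.n_layers)
        for name, param in model.named_parameters():
            if name.endswith(self.projection_suffixes):
                nn.init.normal_(param, mean=self.mean, std=scaled_std)
```

Because the component mutates the model from the outside, the parent NNModel class no longer needs to know about GPT2 internals.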

General changes

  • Components and factories for plain, scaled and scaled_embed initialization (a rough factory sketch follows below).
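
As a rough illustration of the factory side, each variant could be registered under a component key so it can be referenced from the training config (the dictionary layout and keys are assumptions, not the real registry format):

```python
# Hypothetical registry entries mapping (component_type, variant) to the
# initializer classes sketched above.
WEIGHT_INIT_COMPONENTS = {
    ("weight_initializer", "plain"): PlainWeightInit,
    ("weight_initializer", "scaled"): ScaledWeightInit,
}
```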

Breaking Changes

  • The raw model (i.e., the model with random weights) must now be initialized with a weight initializer, as shown here (a usage sketch follows below).
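
A hedged usage sketch of the new contract, assuming the initializer classes sketched above (get_raw_model is a hypothetical helper; in practice the wiring happens via the config and registry):

```python
# The raw model no longer initializes its own weights.
model = get_raw_model(config)  # hypothetical helper; weights still random
initializer = ScaledWeightInit(n_layers=config.n_layers, std=0.02)
initializer.initialize_in_place(model)  # mutates the weights in place
```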

Checklist before submitting final PR

  • My PR is minimal and addresses one issue / enhancement in isolation
  • I have merged the target branch into this feature branch
  • I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
  • I have run a sample config for model training
  • I have fixed all failing tests (python tests/tests.py)

@le1nux le1nux changed the title Feat/initialization component Draft: Feat/initialization component Jun 30, 2024
@le1nux le1nux marked this pull request as draft June 30, 2024 15:21
@le1nux le1nux self-assigned this Jun 30, 2024
@le1nux le1nux added the enhancement New feature or request label Jun 30, 2024
@le1nux le1nux requested a review from flxst July 1, 2024 13:56
@flxst (Member) left a comment:

Looks good to me generally, although I am a bit concerned about the complexity of the whole implementation. I added a few comments. The most important ones say something like "std can be auto also for initialization types other than plain"; I think this should definitely be fixed.
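
The remark refers to the std: auto option. Per the paper cited in the PR description (https://arxiv.org/abs/2312.16903), the "small init" standard deviation can be derived from the hidden dimension alone, so resolving it need not be tied to the plain type. A minimal sketch, assuming auto maps to sqrt(2 / (5 * hidden_dim)):

```python
import math


def resolve_std(std: float | str, hidden_dim: int) -> float:
    # "auto" resolves to the small-init value sqrt(2 / (5 * d_model));
    # that this is the exact formula Modalities uses is an assumption here.
    if std == "auto":
        return math.sqrt(2 / (5 * hidden_dim))
    return float(std)
```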

Review threads (all outdated and resolved):

  • config_files/training/config_lorem_ipsum.yaml
  • src/modalities/nn/weight_init/weight_init.py
  • src/modalities/registry/components.py (3 threads)
@flxst (Member) left a comment:

I think this is another thing that needs to be fixed.

@flxst flxst marked this pull request as ready for review July 3, 2024 13:07
@le1nux le1nux merged commit 9f5651b into feat/initialization Jul 3, 2024
2 of 3 checks passed
@le1nux le1nux deleted the feat/initialization_component branch July 3, 2024 13:43