feat: group-query-attention implementation #72

Closed
wants to merge 9 commits into from

Commits on Jan 30, 2024

  1. f0ea511

Commits on Mar 11, 2024

  1. chore: merge main into GQA

    commit 0807555
    Author: Max Luebbering <[email protected]>
    Date:   Thu Mar 7 18:33:39 2024 +0100
    
        refactor: deleted failing legacy test
    
    commit dd0db07
    Merge: 095e491 4821804
    Author: Luzian Hahn <[email protected]>
    Date:   Thu Mar 7 10:29:09 2024 +0100
    
        Merge pull request #48 from Modalities/feat/merge-pbin-files
    
        feat: merge utility for pbin files
    
    commit 4821804
    Author: Luzian Hahn <[email protected]>
    Date:   Thu Mar 7 10:27:28 2024 +0100
    
        docs: add hint about updated header structure
    
    commit b34d6cb
    Author: Luzian Hahn <[email protected]>
    Date:   Thu Mar 7 10:19:54 2024 +0100
    
        refactor: remove unused utility
    
    commit 7d05448
    Author: Luzian Hahn <[email protected]>
    Date:   Tue Feb 27 16:00:38 2024 +0100
    
        refactor: remove redundant check for valid pbin files
    
    commit 2e27335
    Author: Luzian Hahn <[email protected]>
    Date:   Mon Feb 5 18:21:51 2024 +0100
    
        feat: add entrypoint for pbin-merge
    
    commit 8ffc095
    Author: Luzian Hahn <[email protected]>
    Date:   Mon Feb 5 18:16:06 2024 +0100
    
        refactor: introduce entrypoint group "data"
    
    commit a0d13a3
    Author: Luzian Hahn <[email protected]>
    Date:   Mon Feb 5 15:06:18 2024 +0100
    
        feat: add pbin-merger
    
    commit 9f853cf
    Author: Luzian Hahn <[email protected]>
    Date:   Mon Feb 5 11:36:49 2024 +0100
    
        refactor: introduce abstraction for stream data below packed Datasets
    
    commit 095e491
    Merge: 419fc9e 0f3846a
    Author: Luzian Hahn <[email protected]>
    Date:   Thu Mar 7 09:38:53 2024 +0100
    
        Merge pull request #40 from Modalities/perf/benchmark-datasets-again-megatronlm
    
        perf: benchmark datasets against megatronlm
    
    commit 0f3846a
    Author: Luzian Hahn <[email protected]>
    Date:   Thu Mar 7 09:28:27 2024 +0100
    
        test: prevent unnecessary warnings during tests
    
    commit f2232c3
    Merge: 9095ac5 419fc9e
    Author: Luzian Hahn <[email protected]>
    Date:   Thu Mar 7 08:45:14 2024 +0100
    
        Merge branch 'main' into perf/benchmark-datasets-again-megatronlm
    
    commit 419fc9e
    Merge: 8ab29d0 d192331
    Author: Max Lübbering <[email protected]>
    Date:   Mon Mar 4 12:25:00 2024 +0100
    
        Merge pull request #65 from David-Berghaus/Fix-typos
    
        Fixed typos
    
    commit d192331
    Author: David Berghaus <[email protected]>
    Date:   Mon Mar 4 12:12:47 2024 +0100
    
        Fixed typos
    
    commit 8ab29d0
    Merge: d71bceb f9b0f41
    Author: Mehdi Ali <[email protected]>
    Date:   Fri Mar 1 15:59:01 2024 +0100
    
        Merge pull request #45 from Modalities/hierarchical_instantiation
    
        Hierarchical instantiation
    
    commit f9b0f41
    Author: Felix Stollenwerk <[email protected]>
    Date:   Mon Feb 26 16:53:36 2024 +0100
    
        chore: fix linting
    
    commit 042e3a0
    Author: Felix Stollenwerk <[email protected]>
    Date:   Mon Feb 26 16:00:39 2024 +0100
    
        refactor: fix typos
    
    commit 8345e06
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 26 15:24:25 2024 +0100
    
        refactor: fixed the library usage example
    
    commit cd2128d
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 26 15:24:00 2024 +0100
    
        refactor: replaced absolute paths with relative ones
    
    commit 9ab6654
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 26 15:23:06 2024 +0100
    
        fix: fixed add_custom_component in Main
    
    commit 64b785a
    Author: Felix Stollenwerk <[email protected]>
    Date:   Mon Feb 26 14:18:06 2024 +0100
    
        fix: skipping of tests in non-distributed environment
    
    commit c7f7a7b
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 26 13:35:24 2024 +0100
    
        chore: minor changes in TestFSDPToDiscCheckpointing
    
    commit 10538ac
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 26 13:24:10 2024 +0100
    
        refactor: also using ComponentEntity now in the tests
    
    commit 432426b
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 26 13:23:46 2024 +0100
    
        refactor: fixed failing test_e2e_training_run_wout_ckpt
    
    commit 63829e1
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 26 13:20:43 2024 +0100
    
        chore: excluded openGPTx from test cov
    
    commit 12632fd
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 26 12:57:16 2024 +0100
    
        refactor: introduced ComponentEntity
    
    commit c15de17
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 26 12:09:14 2024 +0100
    
        refactor: various smaller changes
    
    commit 973909d
    Author: Felix Stollenwerk <[email protected]>
    Date:   Mon Feb 26 10:58:27 2024 +0100
    
        refactor: sort classes in config
    
    commit bc64ee0
    Author: Felix Stollenwerk <[email protected]>
    Date:   Mon Feb 26 10:52:21 2024 +0100
    
        refactor: remove RegistryFactory
    
    commit b9dbe2e
    Author: Felix Stollenwerk <[email protected]>
    Date:   Mon Feb 26 10:19:24 2024 +0100
    
        refactor: rename and fix readme for getting started example
    
    commit ca74340
    Author: Max Luebbering <[email protected]>
    Date:   Sun Feb 25 16:00:17 2024 +0100
    
        feat: added activation checkpointing to __main__.py
    
    commit 7ae2234
    Author: Max Luebbering <[email protected]>
    Date:   Sat Feb 24 21:19:44 2024 +0100
    
        refactor: fixed some of the configs
    
    commit bcd6e5b
    Author: Max Luebbering <[email protected]>
    Date:   Sat Feb 24 21:16:08 2024 +0100
    
        feat: experiment_id now set in the config via omega conf resolver
    
    commit a6ea22a
    Author: Max Luebbering <[email protected]>
    Date:   Sat Feb 24 14:03:46 2024 +0100
    
        refactor: gpt2 config for checkpointing tests
    
    commit ff3eb52
    Merge: 64617dd fb0aea5
    Author: Max Luebbering <[email protected]>
    Date:   Sat Feb 24 14:01:15 2024 +0100
    
        chore: Merge branch 'hierarchical_instantiation' of github.com:Modalities/modalities into hierarchical_instantiation
    
    commit 64617dd
    Author: Max Luebbering <[email protected]>
    Date:   Sat Feb 24 14:00:35 2024 +0100
    
        feat: added add_custom_component function to Main
    
    commit df4f971
    Author: Max Luebbering <[email protected]>
    Date:   Sat Feb 24 13:59:33 2024 +0100
    
        test: fixed fsdp test, but cannot be run directly via pytest as it needs torchrun
    
    commit fb0aea5
    Author: Felix Stollenwerk <[email protected]>
    Date:   Sat Feb 24 10:51:51 2024 +0100
    
        fix: replace conint/confloat correctly
    
    commit fd07cb0
    Author: Max Luebbering <[email protected]>
    Date:   Fri Feb 23 19:39:12 2024 +0100
    
        refactor: made base_model_to_dict public as it is great for testing
    
    commit aa0d64f
    Author: Max Lübbering <[email protected]>
    Date:   Fri Feb 23 18:31:54 2024 +0100
    
        Update README.md
    
    commit e70f3a0
    Author: Felix Stollenwerk <[email protected]>
    Date:   Fri Feb 23 17:57:15 2024 +0100
    
        fix: replace conint/confloat for pydantic 3.0 compatibility
    
    commit 70d9e63
    Author: Max Luebbering <[email protected]>
    Date:   Fri Feb 23 17:40:38 2024 +0100
    
        chore: more documentation
    
    commit 2396020
    Merge: a68ddf4 021b7c2
    Author: Max Luebbering <[email protected]>
    Date:   Fri Feb 23 17:39:09 2024 +0100
    
        chore: Merge branch 'hierarchical_instantiation' of github.com:Modalities/modalities into hierarchical_instantiation
    
    commit a68ddf4
    Author: Max Luebbering <[email protected]>
    Date:   Fri Feb 23 16:57:23 2024 +0100
    
        feat: added example for registering a custom component
    
    commit 021b7c2
    Author: Felix Stollenwerk <[email protected]>
    Date:   Fri Feb 23 11:38:32 2024 +0100
    
        refactor: restored base_model_to_dict
    
    commit b619b41
    Author: Felix Stollenwerk <[email protected]>
    Date:   Fri Feb 23 09:32:31 2024 +0100
    
        refactor: replace base_model_to_dict by pydantic built-in method
    
    commit 34c6498
    Author: Felix Stollenwerk <[email protected]>
    Date:   Fri Feb 23 09:26:44 2024 +0100
    
        refactor: fixed typing for registry
    
    commit 52ffea4
    Author: Max Luebbering <[email protected]>
    Date:   Thu Feb 22 17:59:20 2024 +0100
    
        fix: fixed failing end 2 end test
    
    commit b0bd296
    Author: Max Luebbering <[email protected]>
    Date:   Thu Feb 22 17:58:38 2024 +0100
    
        fix: eval_dataloaders are now treated as list instead of dict. This was not reflected yet in the subscriber factory
    
    commit cbf905b
    Author: Max Luebbering <[email protected]>
    Date:   Thu Feb 22 17:47:53 2024 +0100
    
        fix: checkpointing test
    
    commit a42a479
    Merge: 26b8b82 e3b50f6
    Author: Max Luebbering <[email protected]>
    Date:   Thu Feb 22 17:33:21 2024 +0100
    
        chore: Merge branch 'hierarchical_instantiation' of github.com:Modalities/modalities into hierarchical_instantiation
    
    commit 26b8b82
    Author: Max Luebbering <[email protected]>
    Date:   Thu Feb 22 17:32:41 2024 +0100
    
        refactor: we fully support the configs again for hierarchical instantiation
    
    commit 9dfd100
    Author: Max Luebbering <[email protected]>
    Date:   Thu Feb 22 17:31:45 2024 +0100
    
        refactor: eval_dataloaders are subsumed in a list now
    
    commit e3b50f6
    Author: Felix Stollenwerk <[email protected]>
    Date:   Thu Feb 22 12:39:17 2024 +0100
    
        refactor: unification of Pydantic*IF classes
    
    commit 7c4fafb
    Author: Alexander Weber <[email protected]>
    Date:   Thu Feb 22 09:24:42 2024 +0000
    
        chore: enabled pytest discovery with all tests. Some tests still need to be fixed!
    
    commit 34dc796
    Author: Felix Stollenwerk <[email protected]>
    Date:   Thu Feb 22 10:24:09 2024 +0100
    
        refactor: renaming for consistency
    
    commit 2d8349d
    Author: Alexander Weber <[email protected]>
    Date:   Thu Feb 22 08:45:23 2024 +0000
    
        fix: e2e test
    
    commit cc60608
    Author: Alexander Weber <[email protected]>
    Date:   Thu Feb 22 08:10:43 2024 +0000
    
        fix: set FIXME for fsdp_to_disc_checkpointing_test and fix outdated config test
    
    commit fdfb90a
    Author: Max Luebbering <[email protected]>
    Date:   Wed Feb 21 19:03:04 2024 +0100
    
        chore: fixed variable naming
    
    commit 1de69c3
    Author: Max Luebbering <[email protected]>
    Date:   Wed Feb 21 18:59:23 2024 +0100
    
        refactor: merged remote to local and refactored callback_interval_in_batches to callback_interval_in_samples in the config
    
    commit e1dd046
    Author: Alexander Weber <[email protected]>
    Date:   Wed Feb 21 15:22:33 2024 +0000
    
        fix: test discovery under vscode. TODO: replace PretrainedGPTConfig by correct class
    
    commit cd5ec46
    Merge: 281f20f e16dec9
    Author: Max Luebbering <[email protected]>
    Date:   Wed Feb 21 13:15:31 2024 +0100
    
        chore: Merge branch 'hierarchical_instantiation' of github.com:Modalities/modalities into hierarchical_instantiation
    
    commit 281f20f
    Author: Max Luebbering <[email protected]>
    Date:   Wed Feb 21 12:56:45 2024 +0100
    
        refactor: moved LookupEnum to dedicated file to fix circular imports
    
    commit e433913
    Author: Max Luebbering <[email protected]>
    Date:   Wed Feb 21 12:55:34 2024 +0100
    
        refactor: removed types.py
    
    commit 2c3762b
    Author: Max Luebbering <[email protected]>
    Date:   Wed Feb 21 12:48:12 2024 +0100
    
        chore: import fix
    
    commit 4f07fc9
    Author: Max Luebbering <[email protected]>
    Date:   Wed Feb 21 12:47:50 2024 +0100
    
        feat: added checkpointed model and fsdp wrapped model to registry factory
    
    commit 2ba8edd
    Author: Max Luebbering <[email protected]>
    Date:   Wed Feb 21 12:46:26 2024 +0100
    
        chore: fixed import in registry factory
    
    commit 76b4240
    Author: Max Luebbering <[email protected]>
    Date:   Wed Feb 21 12:46:03 2024 +0100
    
        chore: minor fix
    
    commit 417e0ed
    Author: Max Luebbering <[email protected]>
    Date:   Wed Feb 21 12:45:48 2024 +0100
    
        refactor: deleted checkpointing factory
    
    commit b056ddd
    Author: Max Luebbering <[email protected]>
    Date:   Wed Feb 21 12:45:09 2024 +0100
    
        refactor: we always instantiate the LLMDataloader with a ResumableBatchSampler now
    
    commit cd5e6fe
    Author: Max Luebbering <[email protected]>
    Date:   Wed Feb 21 12:43:20 2024 +0100
    
        refactor: config_new.py renamed to config.py
    
    commit f39051f
    Author: Max Luebbering <[email protected]>
    Date:   Wed Feb 21 12:41:48 2024 +0100
    
        refactor: deleted lookup_types
    
    commit c971bb0
    Author: Max Luebbering <[email protected]>
    Date:   Wed Feb 21 12:39:47 2024 +0100
    
        refactor: removed resolver_register
    
    commit 3371b39
    Author: Max Luebbering <[email protected]>
    Date:   Tue Feb 20 21:37:11 2024 +0100
    
        refactor: __main__.py now is capable of instantiating hierarchical configs
    
    commit b5f3d4d
    Author: Max Luebbering <[email protected]>
    Date:   Tue Feb 20 21:34:25 2024 +0100
    
        refactor: refactored FSDPToDiscCheckpointing to use ModelFactory.get_fsdp_wrapped_model
    
    commit 29aee7d
    Author: Max Luebbering <[email protected]>
    Date:   Tue Feb 20 21:33:06 2024 +0100
    
        chore: ProcessGroupBackendType inherits now from LookupEnum
    
    commit 197f863
    Author: Max Luebbering <[email protected]>
    Date:   Tue Feb 20 21:32:36 2024 +0100
    
        feat: implemented OptimizerFactory
    
    commit 8d1bb9e
    Author: Max Luebbering <[email protected]>
    Date:   Tue Feb 20 21:32:12 2024 +0100
    
        feat: added model factory
    
    commit 8b9dc20
    Author: Max Luebbering <[email protected]>
    Date:   Tue Feb 20 21:31:40 2024 +0100
    
        feat: introduced CudaEnv
    
    commit 89fa61c
    Author: Max Luebbering <[email protected]>
    Date:   Tue Feb 20 21:31:15 2024 +0100
    
        chore: MixedPrecisionSettings inherits now from LookupEnum
    
    commit 4037db2
    Author: Max Luebbering <[email protected]>
    Date:   Tue Feb 20 21:30:50 2024 +0100
    
        refactor: removed running env
    
    commit eb9f5b5
    Author: Max Luebbering <[email protected]>
    Date:   Tue Feb 20 21:30:32 2024 +0100
    
        feat: added Settings basemodel to config and refactored FSDPToDiscCheckpointingConfig
    
    commit c60d689
    Author: Max Luebbering <[email protected]>
    Date:   Tue Feb 20 21:29:29 2024 +0100
    
        refactor: restructured config lorem ipsum
    
    commit d9d8925
    Author: Max Luebbering <[email protected]>
    Date:   Tue Feb 20 20:22:49 2024 +0100
    
        fix: bug fix in component factory
    
    commit e16dec9
    Merge: 4c17abb d71bceb
    Author: Felix Stollenwerk <[email protected]>
    Date:   Mon Feb 19 20:58:25 2024 +0100
    
        chore: merge main into hierarchical_instantiation
    
    commit 4c17abb
    Author: Felix Stollenwerk <[email protected]>
    Date:   Mon Feb 19 15:52:59 2024 +0100
    
        refactor: unification of component registry and config registry
    
    commit d71bceb
    Author: Alexander Weber <[email protected]>
    Date:   Mon Feb 19 15:29:09 2024 +0100
    
        Update README.md
    
    commit 95bfc55
    Merge: f16c409 a0b799a
    Author: Alexander Weber <[email protected]>
    Date:   Mon Feb 19 15:25:29 2024 +0100
    
        Merge pull request #52 from Modalities/chore/add-pytest-coverage
    
        chore: add pytest coverage
    
    commit a0b799a
    Author: Alexander Weber <[email protected]>
    Date:   Mon Feb 19 14:22:49 2024 +0000
    
        chore: clean gitignore
    
    commit 5361ca5
    Author: Alexander Weber <[email protected]>
    Date:   Mon Feb 19 14:11:00 2024 +0000
    
        chore: add toml support
    
    commit 4047b67
    Author: Alexander Weber <[email protected]>
    Date:   Mon Feb 19 14:07:13 2024 +0000
    
        chore: try fix from 2021
    
    commit 20b1460
    Author: Alexander Weber <[email protected]>
    Date:   Mon Feb 19 13:54:42 2024 +0000
    
        chore: remove outdated .coverage.toml
    
    commit ec495a3
    Author: Alexander Weber <[email protected]>
    Date:   Mon Feb 19 13:45:58 2024 +0000
    
        chore: remove --cov from github action
    
    commit 920ccab
    Author: Alexander Weber <[email protected]>
    Date:   Mon Feb 19 13:45:13 2024 +0000
    
        chore: add coverage options in pyproject.toml
    
    commit a3ce9b1
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 19 14:44:59 2024 +0100
    
        feat: integrated message subscribers
    
    commit b324c3f
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 19 14:41:37 2024 +0100
    
        refactor: refactored dataloader and its factory
    
    commit f686268
    Author: Alexander Weber <[email protected]>
    Date:   Mon Feb 19 13:41:30 2024 +0000
    
        chore: add pytest --cov arguments by default
    
    commit f1e3155
    Author: Alexander Weber <[email protected]>
    Date:   Mon Feb 19 13:36:22 2024 +0000
    
        chore: search for coverage bug
    
    commit 4122c6c
    Author: Alexander Weber <[email protected]>
    Date:   Mon Feb 19 13:31:43 2024 +0000
    
        chore: search for coverage bug
    
    commit f43b81f
    Author: Alexander Weber <[email protected]>
    Date:   Mon Feb 19 13:08:27 2024 +0000
    
        chore: fix coveralls github action
    
    commit 81292e8
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 19 14:03:40 2024 +0100
    
        refactor: moved OpenGPTXDatasetWrapper to DatasetFactory
    
    commit bc56246
    Author: Alexander Weber <[email protected]>
    Date:   Mon Feb 19 12:41:54 2024 +0000
    
        chore: add pytest-cov execution as github action
    
    commit f16c409
    Merge: a0513e3 bc03021
    Author: Alexander Weber <[email protected]>
    Date:   Mon Feb 19 11:05:36 2024 +0100
    
        Merge pull request #56 from Modalities/fix/tests
    
        fix: use renamed tokenizer file name
    
    commit bc03021
    Author: Alexander Weber <[email protected]>
    Date:   Mon Feb 19 09:48:47 2024 +0000
    
        fix: use renamed tokenizer file name
    
    commit a0513e3
    Merge: b8117b1 76e0518
    Author: Alexander Weber <[email protected]>
    Date:   Mon Feb 19 10:26:45 2024 +0100
    
        Merge pull request #38 from Modalities/fix/tests-on-cpu
    
    commit 76e0518
    Author: Alexander Weber <[email protected]>
    Date:   Mon Feb 19 09:24:48 2024 +0000
    
        chore: moved if statement into torch.device
    
    commit b8117b1
    Merge: 1c99963 78b9645
    Author: Alexander Weber <[email protected]>
    Date:   Mon Feb 19 10:11:56 2024 +0100
    
        Merge pull request #42 from Modalities/fix/linting
    
        fix: lint all files
    
    commit 78b9645
    Merge: 5b60c2f 1c99963
    Author: Alexander Weber <[email protected]>
    Date:   Mon Feb 19 09:05:44 2024 +0000
    
        chore: local merge
    
    commit 2267605
    Author: Max Luebbering <[email protected]>
    Date:   Sun Feb 18 23:27:27 2024 +0100
    
        feat: towards subscriber support with hierarchical instantiation
    
    commit a449119
    Author: Max Luebbering <[email protected]>
    Date:   Sun Feb 18 23:25:40 2024 +0100
    
        chore: minor changes
    
    commit aab3fa2
    Author: Max Luebbering <[email protected]>
    Date:   Sun Feb 18 23:24:58 2024 +0100
    
        feat: implemented subscriber factory
    
    commit 1c99963
    Merge: a8b6563 cf27873
    Author: Max Lübbering <[email protected]>
    Date:   Sun Feb 18 22:45:14 2024 +0100
    
        Merge pull request #29 from Modalities/feat/contrastive_loss
    
        Add Noise Contrastive Estimation Loss
    
    commit 6baf221
    Author: Max Luebbering <[email protected]>
    Date:   Sat Feb 17 17:11:11 2024 +0100
    
        feat: added LLM dataloader support
    
    commit 8ab04a5
    Author: Max Luebbering <[email protected]>
    Date:   Sat Feb 17 17:10:16 2024 +0100
    
        feat: introduced CollateFnIF for collate functions
    
    commit 018c278
    Author: Max Luebbering <[email protected]>
    Date:   Sat Feb 17 17:00:02 2024 +0100
    
        feat: added resumable batch sampler
    
    commit 1273c31
    Author: Max Luebbering <[email protected]>
    Date:   Sat Feb 17 16:53:57 2024 +0100
    
        feat: added gpt_2 collator support
    
    commit 536447c
    Author: Max Luebbering <[email protected]>
    Date:   Sat Feb 17 16:44:57 2024 +0100
    
        feat: added batch sampler support
    
    commit 771eab1
    Author: Max Luebbering <[email protected]>
    Date:   Sat Feb 17 16:18:51 2024 +0100
    
        feat: added PydanticDatasetIF for SamplerConfig
    
    commit f1c1be4
    Author: Max Luebbering <[email protected]>
    Date:   Sat Feb 17 15:55:47 2024 +0100
    
        feat: added support for the different dataset formats
    
    commit 0824bb0
    Author: Max Luebbering <[email protected]>
    Date:   Sat Feb 17 15:38:51 2024 +0100
    
        refactor: added adaptations that were injected in the dataloader factory previously
    
    commit 6985fad
    Author: Max Luebbering <[email protected]>
    Date:   Sat Feb 17 15:26:22 2024 +0100
    
        feat: implemented dataset factory for various dataset types
    
    commit 81022f4
    Author: Max Luebbering <[email protected]>
    Date:   Sat Feb 17 14:33:37 2024 +0100
    
        feat: added gpt2 tokenizer support
    
    commit 55c0110
    Author: Max Luebbering <[email protected]>
    Date:   Sat Feb 17 13:35:19 2024 +0100
    
        feat: added adamw support
    
    commit 4a6a415
    Author: Max Luebbering <[email protected]>
    Date:   Sat Feb 17 13:32:30 2024 +0100
    
        feat: implemented OptimizerFactory
    
    commit c2bd570
    Author: Max Luebbering <[email protected]>
    Date:   Sat Feb 17 13:31:59 2024 +0100
    
        fix: added root-level to dict function for basemodel to prevent recursive model dumps
    
    commit 90207ed
    Author: Max Luebbering <[email protected]>
    Date:   Fri Feb 16 20:05:47 2024 +0100
    
        refactor: started refactoring the lorem ipsum config towards the new hierarchical configs
    
    commit 1304241
    Author: Max Luebbering <[email protected]>
    Date:   Fri Feb 16 20:05:24 2024 +0100
    
        refactor: Main makes partially use of the hierarchical instantiation now
    
    commit f7dfe31
    Author: Max Luebbering <[email protected]>
    Date:   Fri Feb 16 20:04:54 2024 +0100
    
        refactor: Refactored CheckpointingFactory
    
    commit 38499c4
    Author: Max Luebbering <[email protected]>
    Date:   Fri Feb 16 20:04:26 2024 +0100
    
        refactor: removed unused attribute in Checkpointing
    
    commit 542ba75
    Author: Max Luebbering <[email protected]>
    Date:   Fri Feb 16 20:04:08 2024 +0100
    
        fix: bugfix in component factory
    
    commit d446260
    Author: Max Luebbering <[email protected]>
    Date:   Fri Feb 16 20:03:54 2024 +0100
    
        feat: added new configs in separate file for now
    
    commit 6d121f3
    Author: Max Luebbering <[email protected]>
    Date:   Fri Feb 16 20:03:18 2024 +0100
    
        feat: added more components to registry factory
    
    commit fb3b35f
    Author: Max Luebbering <[email protected]>
    Date:   Fri Feb 16 20:02:48 2024 +0100
    
        refactor: refactored FSDPRunningEnvConfig
    
    commit 8eda99c
    Author: Max Luebbering <[email protected]>
    Date:   Thu Feb 15 23:37:33 2024 +0100
    
        refactor: refactored component factory to use the registry
    
    commit 41be773
    Author: Max Luebbering <[email protected]>
    Date:   Thu Feb 15 23:36:57 2024 +0100
    
        feat: added registry factory
    
    commit 3ebb656
    Author: Max Luebbering <[email protected]>
    Date:   Thu Feb 15 23:36:34 2024 +0100
    
        feat: implemented registry
    
    commit f2164a8
    Author: Max Luebbering <[email protected]>
    Date:   Thu Feb 15 23:36:00 2024 +0100
    
        test: configs now use the new format without typehints
    
    commit 5fb2199
    Author: Max Luebbering <[email protected]>
    Date:   Thu Feb 15 23:35:39 2024 +0100
    
        test: added registry testing
    
    commit 623f847
    Author: Max Luebbering <[email protected]>
    Date:   Thu Feb 15 23:34:49 2024 +0100
    
        test: updated test configs to the new format
    
    commit 372947b
    Author: Felix Stollenwerk <[email protected]>
    Date:   Wed Feb 14 21:45:11 2024 +0100
    
        chore: add pytest coverage (locally)
    
    commit 36bc7ae
    Author: Max Luebbering <[email protected]>
    Date:   Wed Feb 14 13:45:11 2024 +0100
    
        refactor: renamed config_types to custom_config_types in ComponentFactory
    
    commit babd597
    Author: Max Luebbering <[email protected]>
    Date:   Wed Feb 14 13:35:58 2024 +0100
    
        feat: added support for custom types in component factory
    
    commit 1639a6a
    Author: Max Luebbering <[email protected]>
    Date:   Wed Feb 14 12:04:00 2024 +0100
    
        refactor: simplified ComponentFactory
    
    commit aa9e040
    Author: Max Luebbering <[email protected]>
    Date:   Wed Feb 14 10:33:11 2024 +0100
    
        test: removed code duplication in test_component_factory
    
    commit 44677c6
    Author: Max Luebbering <[email protected]>
    Date:   Wed Feb 14 10:30:43 2024 +0100
    
        test: refactored test_custom_component
    
    commit 71de3ff
    Author: Max Luebbering <[email protected]>
    Date:   Tue Feb 13 21:44:53 2024 +0100
    
        test: added testing for custom components
    
    commit 2a54f84
    Author: Max Luebbering <[email protected]>
    Date:   Tue Feb 13 20:57:25 2024 +0100
    
        test: added test yaml configs for component factory
    
    commit 35236d0
    Author: Max Luebbering <[email protected]>
    Date:   Tue Feb 13 20:56:22 2024 +0100
    
        test: implemented test_non_existing_reference
    
    commit bb4bcb3
    Author: Max Luebbering <[email protected]>
    Date:   Tue Feb 13 20:53:21 2024 +0100
    
        test: implemented test_component_filter
    
    commit 0dfbbcb
    Author: Max Luebbering <[email protected]>
    Date:   Tue Feb 13 20:49:36 2024 +0100
    
        test: implemented test_hierarchical_component_instantiation
    
    commit 3a66b65
    Author: Max Luebbering <[email protected]>
    Date:   Tue Feb 13 20:41:18 2024 +0100
    
        test: implemented forward and backward referencing test
    
    commit c0c877c
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 12 14:52:13 2024 +0100
    
        chore: fixed imports in component factory
    
    commit a9781a3
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 12 14:42:28 2024 +0100
    
        refactor: added drafted test code for component factory
    
    commit b1cbb46
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 12 14:42:00 2024 +0100
    
        refactor: moved trial component factory code to test module
    
    commit c115b2b
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 12 14:41:21 2024 +0100
    
        refactor: moved component factory into parent module
    
    commit e678d78
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 12 14:26:55 2024 +0100
    
        refactor: renamed hierarchical DI module to hierarchical_instantiation
    
    commit 45f7ff4
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 12 14:19:24 2024 +0100
    
        refactor: removed legacy code and added comments to component factory.
    
    commit da88895
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 12 14:03:26 2024 +0100
    
        feat: added referencing to config
    
    commit b42aeeb
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 12 14:02:50 2024 +0100
    
        feat: added ReferenceConfig and PassType
    
    commit 72f0524
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 12 14:02:24 2024 +0100
    
        feat: implemented forward and backward component referencing
    
    commit 43e1134
    Author: Max Luebbering <[email protected]>
    Date:   Sun Feb 11 19:56:18 2024 +0100
    
        chore: added documentation for generate_text text CMD interface
    
    commit cf27873
    Author: Sogol Haghighat <[email protected]>
    Date:   Fri Feb 9 17:38:31 2024 +0100
    
        refactor: adapt nce_loss function to reflect loss from CoCa paper
    
    commit d388d21
    Author: Sogol Haghighat <[email protected]>
    Date:   Fri Feb 9 17:37:35 2024 +0100
    
        test: adapt test_nce_loss_correctness to uni and bidirectional loss
    
    commit a8b6563
    Merge: da65493 00e10ae
    Author: Max Lübbering <[email protected]>
    Date:   Fri Feb 9 16:46:36 2024 +0100
    
        Merge pull request #30 from Modalities/huggingface_models_support
    
        feat: Generic huggingface transformer support
    
    commit 00e10ae
    Author: Max Lübbering <[email protected]>
    Date:   Fri Feb 9 16:24:02 2024 +0100
    
        Update preprocess_dataset.py
    
    commit e93e767
    Merge: f435fc8 da65493
    Author: Max Lübbering <[email protected]>
    Date:   Fri Feb 9 15:50:03 2024 +0100
    
        Merge branch 'main' into huggingface_models_support
    
    commit f435fc8
    Author: Max Luebbering <[email protected]>
    Date:   Fri Feb 9 15:46:59 2024 +0100
    
        feat: introduced huggingface_prediction_subscription_key to HuggingFacePretrainedModelConfig to support different output formats
    
    commit e6f4aac
    Author: Max Luebbering <[email protected]>
    Date:   Fri Feb 9 15:46:08 2024 +0100
    
        refactor: moved lookup_enum to dedicated file.
    
    commit ebbe8c5
    Author: Sogol Haghighat <[email protected]>
    Date:   Fri Feb 9 13:49:42 2024 +0100
    
        test: add test for nce_loss using a manually calculated example
    
    commit 7d5c095
    Author: Max Luebbering <[email protected]>
    Date:   Thu Feb 8 20:17:36 2024 +0100
    
        chore: removed legacy code
    
    commit dad3ea4
    Author: Max Luebbering <[email protected]>
    Date:   Thu Feb 8 20:16:40 2024 +0100
    
        chore: added legacy trials for hierarchical DI
    
    commit 3ab9ff3
    Author: Max Luebbering <[email protected]>
    Date:   Thu Feb 8 20:14:10 2024 +0100
    
        chore: added __init__.py
    
    commit 3dfdb2a
    Author: Max Luebbering <[email protected]>
    Date:   Thu Feb 8 20:13:51 2024 +0100
    
        feat: implemented factory for hierarchical component instantiation
    
    commit dc7c1a2
    Author: Max Luebbering <[email protected]>
    Date:   Thu Feb 8 20:13:17 2024 +0100
    
        feat: added example yaml config file for hierarchical instantiation
    
    commit 099979b
    Author: Max Luebbering <[email protected]>
    Date:   Thu Feb 8 20:12:58 2024 +0100
    
        feat: added configs for the test components
    
    commit c4292ce
    Author: Max Luebbering <[email protected]>
    Date:   Thu Feb 8 20:12:25 2024 +0100
    
        feat: added components for testing
    
    commit fc5cb96
    Author: Max Luebbering <[email protected]>
    Date:   Thu Feb 8 20:11:28 2024 +0100
    
        chore: minor debugging improvement in parse_enum_by_name in utils
    
    commit 783ad81
    Author: Max Luebbering <[email protected]>
    Date:   Thu Feb 8 20:10:57 2024 +0100
    
        chore: removed legacy trials
    
    commit 9095ac5
    Author: Luzian Hahn <[email protected]>
    Date:   Tue Feb 6 16:17:22 2024 +0100
    
        docs: update times in table after perf upgrade
    
    commit 91ec38e
    Author: Luzian Hahn <[email protected]>
    Date:   Tue Feb 6 16:07:46 2024 +0100
    
        fix: make encoding specification obsolete and improve perf of index creation
    
    commit afae858
    Author: Luzian Hahn <[email protected]>
    Date:   Tue Feb 6 15:48:19 2024 +0100
    
        feat: make encoding configurable
    
    commit 71f77e2
    Author: Luzian Hahn <[email protected]>
    Date:   Tue Feb 6 14:51:57 2024 +0100
    
        refactor: remove parameter-artifact
    
    commit a668620
    Author: Luzian Hahn <[email protected]>
    Date:   Tue Feb 6 14:47:52 2024 +0100
    
        refactor: remove TODO-artifact
    
    commit a08518f
    Author: Luzian Hahn <[email protected]>
    Date:   Tue Feb 6 14:43:31 2024 +0100
    
        refactor: rename queue for token-writing
    
    commit 2e535a3
    Author: Luzian Hahn <[email protected]>
    Date:   Tue Feb 6 14:25:35 2024 +0100
    
        fix: derive default value for cpu count automatically
    
    commit 03d3f47
    Author: Luzian Hahn <[email protected]>
    Date:   Tue Feb 6 14:24:48 2024 +0100
    
        perf: share FileIOStream among process calls - not threadsafe!
    
    commit bc086ca
    Author: Luzian Hahn <[email protected]>
    Date:   Tue Feb 6 14:13:12 2024 +0100
    
        docs: remove auto execution of benchmarks, while sourcing bench utils
    
    commit fb04dc8
    Author: Luzian Hahn <[email protected]>
    Date:   Tue Feb 6 14:08:14 2024 +0100
    
        fix: typo in warning
    
    commit faa2eff
    Author: Luzian Hahn <[email protected]>
    Date:   Tue Feb 6 14:05:35 2024 +0100
    
        docs: unify time units in measurement table
    
    commit 26ade7c
    Author: Luzian Hahn <[email protected]>
    Date:   Tue Feb 6 13:37:22 2024 +0100
    
        docs: add definitions of benchmarking experiments
    
    commit 463872d
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 5 18:55:00 2024 +0100
    
        refactor: drafted hierarchical instantiation
    
    commit bd39244
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 5 18:52:20 2024 +0100
    
        chore: removed unused properties in config.py
    
    commit a908e7a
    Author: Max Luebbering <[email protected]>
    Date:   Mon Feb 5 18:50:26 2024 +0100
    
        refactor: moved resolver register
    
    commit 540afe2
    Author: Sogol Haghighat <[email protected]>
    Date:   Thu Feb 1 17:04:22 2024 +0100
    
        refactor: add keyword arguments
    
    commit 57ccaf9
    Author: Sogol Haghighat <[email protected]>
    Date:   Thu Feb 1 17:03:18 2024 +0100
    
        refactor: introduce nce_loss function and add asymmetry parameter in NCELoss
    
    commit 35ca235
    Author: Max Luebbering <[email protected]>
    Date:   Thu Feb 1 15:57:13 2024 +0100
    
        feat: drafted hierarchical instantiation
    
    commit 5b60c2f
    Author: Felix Stollenwerk <[email protected]>
    Date:   Tue Jan 30 22:48:35 2024 +0100
    
        fix: lint all files
    
    commit d84353f
    Author: Luzian Hahn <[email protected]>
    Date:   Tue Jan 30 17:11:02 2024 +0100
    
        docs: add details about dataloading performance benchmarks
    
    commit 93d9241
    Author: Luzian Hahn <[email protected]>
    Date:   Tue Jan 30 17:10:12 2024 +0100
    
        perf: use one large memmap for PackedDatasets
    
    commit e6cb130
    Author: Sogol Haghighat <[email protected]>
    Date:   Tue Jan 30 16:18:50 2024 +0100
    
        refactor: apply ruff refactor comment
    
    commit dfbefcb
    Author: Felix Stollenwerk <[email protected]>
    Date:   Tue Jan 30 15:23:42 2024 +0100
    
        fix: get rid of reduce mocking (for testing)
    
    commit f4e3c56
    Author: Felix Stollenwerk <[email protected]>
    Date:   Tue Jan 30 15:17:10 2024 +0100
    
        fix: training and evaluation on CPU (for testing)
    
    commit 69e2050
    Author: Luzian Hahn <[email protected]>
    Date:   Tue Jan 30 12:14:21 2024 +0100
    
        feat: infer smallest tokensize automatically for packing
    
    commit a96a5f4
    Author: Luzian Hahn <[email protected]>
    Date:   Tue Jan 30 09:17:35 2024 +0100
    
        perf: use parallelized tokenization when creating .pbin files
    
    commit ee08a01
    Author: Luzian Hahn <[email protected]>
    Date:   Mon Jan 29 15:35:55 2024 +0100
    
        perf: increase memmap index creation speed
    
    commit 8e30e00
    Author: Max Luebbering <[email protected]>
    Date:   Sun Jan 28 23:22:39 2024 +0100
    
        chore: added documentation
    
    commit abb63aa
    Author: Max Luebbering <[email protected]>
    Date:   Sun Jan 28 22:08:45 2024 +0100
    
        refactor: fixed configs due to latest changes
    
    commit f83da11
    Author: Max Luebbering <[email protected]>
    Date:   Sun Jan 28 22:07:26 2024 +0100
    
        feat: wired up huggingface transformer models
    
    commit 9309505
    Author: Max Luebbering <[email protected]>
    Date:   Sun Jan 28 22:01:29 2024 +0100
    
        chore: renamed Block to GPT2Block
    
    commit 4d6a5ff
    Author: Max Luebbering <[email protected]>
    Date:   Sun Jan 28 22:01:17 2024 +0100
    
        feat: fully implemented HuggingFacePretrainedModel with respective configuration
    
    commit 88c4fdb
    Author: Max Luebbering <[email protected]>
    Date:   Sun Jan 28 22:00:33 2024 +0100
    
        feat: implemented automatic FSDP wrapping
    
    commit 3b51117
    Author: Max Luebbering <[email protected]>
    Date:   Sun Jan 28 21:56:48 2024 +0100
    
        refactor: renamed tokenizer.json to tokenizer_gpt2.json
    
    commit 95e67a0
    Author: Max Luebbering <[email protected]>
    Date:   Sun Jan 28 21:55:52 2024 +0100
    
        feat: renamed redpajama memmap datasets (added tokenizer info)
    
    commit 0992d21
    Author: Max Luebbering <[email protected]>
    Date:   Sun Jan 28 00:26:36 2024 +0100
    
        feat: towards generic huggingface transformer support
    
    commit ba65580
    Author: Sogol Haghighat <[email protected]>
    Date:   Fri Jan 26 13:36:38 2024 +0100
    
        refactor: refactor docstrings
    
    commit e459321
    Author: Sogol Haghighat <[email protected]>
    Date:   Thu Jan 25 17:48:32 2024 +0100
    
        test: add test for contrastive loss
    
    commit bb14749
    Author: Sogol Haghighat <[email protected]>
    Date:   Thu Jan 25 17:47:43 2024 +0100
    
        feat: add contrastive loss for coca model training
    
    commit c9e4e08
    Author: Luzian Hahn <[email protected]>
    Date:   Mon Jan 22 13:43:46 2024 +0100
    
        fix: rely again on iso-8859-1 instead of utf8
    
        the OpenGPT-X data seems to come with problematic chars, which cannot be decoded via utf8.
        The former fix to use iso-8859-1 fixes this. However, the issue probably lies with dataset conversions
    luzian-hahn committed Mar 11, 2024
    6f86f1b
  2. 10110c8
  3. 6832343
  4. d20005d
  5. 7dffa62
  6. 7eecb34
  7. 66a788b
  8. cb4f932