Add fire finetuning #553

gkielian · 2024-09-06T16:36:23Z

No description provided.


klei22 and others added 30 commits June 13, 2024 15:46

Add specific vizier running file for optimization

973d868

Will try to keep this file small and pruned. We don't need the combination generation. We should add the different search algorithm choices as argparse options. We should really minify the file.

Add working run_vizier modification of run_exp

a4bb9c8

This is not yet fully trimmed, working on that still.

Convert boolean action to new format for compat

c3b56c7

This adds compatibility to the run_experiments and run_vizier files
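The commit doesn't say which "new format" was adopted. One common option in Python 3.9+ is `argparse.BooleanOptionalAction`, which registers both `--flag` and `--no-flag`. Below is a minimal sketch that assumes this is the format. The `--use_flash` flag and the `bool_to_cli` helper are hypothetical. The helper shows how a launcher such as run_experiments or run_vizier might pass boolean settings:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # Hypothetical flag: BooleanOptionalAction registers both --use_flash and --no-use_flash
    parser.add_argument("--use_flash", default=True, action=argparse.BooleanOptionalAction)
    return parser


def bool_to_cli(name, value):
    # Hypothetical helper: turn a boolean sweep setting into a single CLI token
    return f"--{name}" if value else f"--no-{name}"


parser = build_parser()
args_on = parser.parse_args([bool_to_cli("use_flash", True)])
args_off = parser.parse_args([bool_to_cli("use_flash", False)])
print(args_on.use_flash, args_off.use_flash)  # True False
```

With this format a launcher never has to emit `--flag True` or `--flag False` strings, which plain `type=bool` arguments would misparse.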

Remove temporary test file

f832845

Latest is stored in run_vizier.py

Streamline run_vizier.py

f524534

Remove excess arguments and add formatting via the Black autoformatter.

Remove leading spaces from train.py

a8a955d

Add configuration json file to out_dir

ce306f9

This way we'll always have the latest settings used to create the checkpoint file.
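Here is a minimal sketch of saving the run's settings next to the checkpoint. It assumes the settings live in an `argparse.Namespace`. The `save_config` helper and the `full_config.json` filename are hypothetical and are not necessarily what train.py uses:

```python
import argparse
import json
import os
import tempfile


def save_config(args, out_dir, filename="full_config.json"):
    """Write every argparse setting to a JSON file inside out_dir (hypothetical helper)."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, filename)
    # Mode "w" overwrites, so the file always holds the latest settings
    with open(path, "w") as f:
        json.dump(vars(args), f, indent=4)
    return path


# Usage with made-up settings and a temporary out_dir
args = argparse.Namespace(n_layer=6, n_embd=384, learning_rate=1e-3, use_fire=True)
out_dir = tempfile.mkdtemp()
path = save_config(args, out_dir)
with open(path) as f:
    restored = json.load(f)
print(restored["n_layer"], restored["use_fire"])  # 6 True
```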

Add saving of best validation loss and iter file

2a14bd8

This will speed up retrieving and monitoring the best validation loss.
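The speedup comes from reading a tiny text file instead of loading ckpt.pt. Below is a minimal sketch of that idea. The filename, the `loss,iter` line format, and the helper names are all hypothetical:

```python
import os
import tempfile

BEST_VAL_FILE = "best_val_loss_and_iter.txt"  # hypothetical filename


def write_best_val(out_dir, best_val_loss, iter_num):
    """Overwrite a small text file with the current best validation loss and its iteration."""
    with open(os.path.join(out_dir, BEST_VAL_FILE), "w") as f:
        f.write(f"{best_val_loss},{iter_num}\n")


def read_best_val(out_dir):
    """Read the best validation loss without loading ckpt.pt."""
    with open(os.path.join(out_dir, BEST_VAL_FILE)) as f:
        loss_str, iter_str = f.read().strip().split(",")
    return float(loss_str), int(iter_str)


out_dir = tempfile.mkdtemp()
write_best_val(out_dir, 1.4821, 2500)
print(read_best_val(out_dir))  # (1.4821, 2500)
```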

Copy meta.pkl to out_dir

f050079

This allows us to couple the meta.pkl with the ckpt, allowing exploration of different tokenizations.

Add check for meta.pkl from out_dir to sample.py

f13233c

This decouples sampling from the data dir: all we need to sample are meta.pkl and ckpt.pt.
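Here is a minimal sketch of the copy-then-lookup flow described in the two commits above. It assumes meta.pkl sits directly in the dataset directory. `copy_meta` and `find_meta` are hypothetical helper names, and the real train.py/sample.py logic may differ:

```python
import os
import pickle
import shutil
import tempfile


def copy_meta(data_dir, out_dir):
    """Copy meta.pkl next to the checkpoint so the two files travel together."""
    src = os.path.join(data_dir, "meta.pkl")
    if os.path.exists(src):
        shutil.copy(src, os.path.join(out_dir, "meta.pkl"))


def find_meta(out_dir, data_dir):
    """Prefer out_dir/meta.pkl, fall back to the dataset directory, else return None."""
    for candidate in (os.path.join(out_dir, "meta.pkl"), os.path.join(data_dir, "meta.pkl")):
        if os.path.exists(candidate):
            return candidate
    return None


data_dir, out_dir = tempfile.mkdtemp(), tempfile.mkdtemp()
with open(os.path.join(data_dir, "meta.pkl"), "wb") as f:
    pickle.dump({"vocab_size": 65}, f)  # made-up tokenizer metadata

print(find_meta(out_dir, data_dir) == os.path.join(data_dir, "meta.pkl"))  # True
copy_meta(data_dir, out_dir)
print(find_meta(out_dir, data_dir) == os.path.join(out_dir, "meta.pkl"))  # True
```

Once the copy exists, the checkpoint directory alone is enough to sample, even if the dataset directory is later deleted or retokenized.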

Add fast method for obtaining best validation loss

5fc97ef

Suppress warnings in the run_vizier

fc9c93f

We could undo this later, but it saves a lot of stdout space otherwise taken up by internal JAX messages from Vizier.
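The commit doesn't say which mechanism was used. Below is one common way to quiet JAX/absl output from Python, offered only as an assumption. It raises the C++ log threshold with the `TF_CPP_MIN_LOG_LEVEL` environment variable before JAX is imported. It also raises the level of the `jax` and `absl` Python loggers and filters Python warnings. `quiet_vizier_output` is a hypothetical name:

```python
import logging
import os
import warnings


def quiet_vizier_output():
    """Reduce log noise from JAX/absl; call before jax (or vizier) is imported."""
    # Assumption: XLA/TSL C++ logging honors this threshold (2 hides INFO and WARNING)
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
    # Python-side loggers used by JAX and absl: only errors get through
    for name in ("jax", "absl"):
        logging.getLogger(name).setLevel(logging.ERROR)
    # Hide Python warnings (e.g. DeprecationWarning) raised inside libraries
    warnings.filterwarnings("ignore")


quiet_vizier_output()
```

Because the `NaN`/error paths still log at ERROR level, this keeps genuine failures visible while dropping routine chatter.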

Add comments to ckpt saving and end action list

8b27d8e

Add --fast option for inspect checkpoints

8673a3a

This lacks the NaN information, but should be a faster way to monitor the Vizier runs for larger networks.

Add scripts compat with python-codes-25k

ee8f4df

This is a coding dataset.

Merge pull request karpathy#186 from klei22/add_vizier_optimization

447ae47

Add vizier optimization

Merge branch 'add_scripts_for_python_codes_dataset' into HEAD

6196fc9

Merge pull request karpathy#187 from klei22/add_more_datasets

dcfb5bc

Add scripts creating compatibility for additional dataset

Remove duplicate block

48a36cc

Merge pull request karpathy#188 from gkielian/main

641401e

Remove duplicate block in model.py

Add options random init mean and std to train.py

aeca808

This allows us to set the mean and standard deviation for random weights for both linear modules and for the embedding table.

Clean imports section of model.py

6277d35

After experimentation, we don't need to include these after including the dictionary.

Add polymorphic interface for linear variations

45a5b2e

Simplify linear wrapper to inheritance based

6e70517

Use inheritance-based interface adaptation.

Merge branch 'add_kan_and_hyperparams' into origin_main

df5bd39

Don't set WPE if not selected

01aef4b

After these changes, wpe is not set if we're using a different strategy for position embedding.

Merge pull request karpathy#189 from klei22/add_linear_wrapper_and_kan_feedback

0e9ac28

Add linear wrapper and kan feedback

Fix bug separating shuffle moveset with moveset

e72f663

Add trial argparse arg

881d7a6

Upgrade progress bar

1c76156

This restricts the progress bar to runs where the output file is set, so it is not printed when info is being sent to stdout.

djlisbonne and others added 30 commits August 23, 2024 15:41

Update to state_dict translation to correctly assign q,k,v matrices from split

3eaaa69

Merge pull request karpathy#240 from djlisbonne/gptconfig_fix

eb8be6a

Gptconfig fix

Merge branch 'master' into add_numpy_hw_test

6eaa5f4

Remove duplicate save file

4dc821a

Restore statistic_plots.py

55864d5

It seems these were commented out. It's probably best to restore them for full functionality.

Merge pull request karpathy#236 from gkielian/add_numpy_hw_test

a65c5c3

Add numpy hw test

Merge branch 'master' into add_training_sample_option

57ff63e

Merge pull request karpathy#241 from gkielian/add_training_sample_option

6c602e9

Add option to get sample inference after each val

Add progress bar to train.py

937ffae

Merge pull request karpathy#243 from gkielian/add_progress_bar

57b68f1

Add progress bar to train.py

Move notebooks to colab folder

6645172

This should help clean the root directory

Remove data_augmentation folder

c4ce6ed

Suggest refactoring to use the latest Hugging Face APIs.

Remove data augmentation in favor of HF apis

474b73b

Add original nanoGPT as module instead of hardcopy

405a276

Proposing this to further organize the main project directory.

Add llm.c as a submodule

8c8eb9a

This will potentially allow us to use llm.c for fast exploration (combined with Vizier) and algorithm benchmarking.

Clean images no longer used in README

be8d426

Merge pull request karpathy#244 from klei22/organize_folders

fbf5988

Organize starting directory

Add softmax sweep to benchmark softmaxes vs context

5423961

v2: manually adding +1 in log_rel & log_pos.

da5cc53

v3: Adding one argument: --fire_log_bias

515bb90

Add option to just do forward, for testing inference

cddf59d

This saves a lot of memory, allowing longer inference tests. It allowed testing up to an 8196 context length with only 24GB of VRAM.

v4: Adding 5 new arguments: --fire_num_hidden_layers, mlp_width, init_c, init_L, outermost_sigma

8ec274b

Update train.py

61833cd

Very small change: according to the paper, I changed the Greek letter from phi (φ) to psi (ψ). Now it should be ψ(x) = log(cx + bias).
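In the FIRE paper, a learned MLP f_θ receives the normalized relative distance ψ(i − j) / ψ(max(L, i)), where c and L are learnable. Below is a minimal pure-Python sketch of just that normalization, assuming causal attention (i ≥ j). It treats `bias` as the value presumably set by --fire_log_bias; bias = 1 recovers the paper's log(cx + 1). The function names are hypothetical, c and L are fixed here rather than learned, and the MLP itself is omitted:

```python
import math


def psi(x, c=1.0, bias=1.0):
    """psi(x) = log(c*x + bias); bias=1 gives the paper's log(cx + 1)."""
    return math.log(c * x + bias)


def fire_normalized_distance(i, j, L=512.0, c=1.0, bias=1.0):
    """Input to FIRE's MLP for query position i and key position j (causal: j <= i)."""
    if j > i:
        raise ValueError("this causal sketch expects j <= i")
    return psi(i - j, c, bias) / psi(max(L, i), c, bias)


# The same offset of 10 maps to a smaller input once the query is past the threshold L
print(fire_normalized_distance(10, 0))       # log(11) / log(513)
print(fire_normalized_distance(2000, 1990))  # log(11) / log(2001)
print(fire_normalized_distance(2000, 0))     # 1.0
```

With bias = 1 the normalized distance stays in [0, 1] for any sequence length, which is what lets FIRE interpolate to contexts longer than those seen in training.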

Merge pull request karpathy#246 from Mars-Cat2023/FIRE

7a414a7

Parameterized FIRE (Adding Options for FIRE)

Merge pull request karpathy#245 from klei22/add_softmax_context_benchmark

f4c0781

Add softmax sweep to benchmark softmaxes vs context

Fixed One Bug in FIRE - PR karpathy#246 v4

cbf3d95

Add MLP Expansion factor control and sweep

981c8dd

Some networks have started experimenting with different MLP expansion factors; here we add a sweep to test the effects of different MLP settings.
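Here is a minimal sketch of what an expansion-factor sweep changes. It assumes a standard two-layer GPT-style MLP with biases (n_embd → factor·n_embd → n_embd, following nanoGPT's c_fc/c_proj layout). `mlp_hidden_dim` and `mlp_param_count` are hypothetical helpers, and the sweep values are illustrative, not the ones used in this PR:

```python
def mlp_hidden_dim(n_embd, expansion_factor):
    """Width of the MLP's hidden layer (GPT-2 uses expansion_factor = 4)."""
    return int(expansion_factor * n_embd)


def mlp_param_count(n_embd, expansion_factor, bias=True):
    """Parameters in c_fc (n_embd -> hidden) plus c_proj (hidden -> n_embd)."""
    hidden = mlp_hidden_dim(n_embd, expansion_factor)
    weights = 2 * n_embd * hidden
    biases = (hidden + n_embd) if bias else 0
    return weights + biases


# Illustrative sweep over expansion factors for a 384-wide model
for factor in (1, 2, 4, 8):
    print(factor, mlp_hidden_dim(384, factor), mlp_param_count(384, factor))
```

MLP parameters grow linearly with the expansion factor, so a sweep like this also trades off model size and compute per layer.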

Merge pull request karpathy#251 from Mars-Cat2023/FIRE

863c54d

Fixed One Bug in FIRE - PR karpathy#246 v4

Merge pull request karpathy#252 from gkielian/add_mlp_expansion_factor

37ca368

Add MLP Expansion factor control and sweep

Add code for finetuning with FIRE

5a7528b
