Add fire finetuning #553
Draft
gkielian wants to merge 762 commits into karpathy:master from gkielian:add_fire_finetuning
Will try to keep this file small and pruned. We don't need the generate combinations. We should expose the different search-algorithm choices via argparse. We should really minify the file.
This is not yet fully trimmed, working on that still.
This adds compatibility to the run_experiments and run_vizier files
Latest is stored in run_vizier.py
Remove excess arguments and add formatting via the popular python black autoformatter.
This way we'll always have the latest setting used to create the checkpoint file.
This will speed up retrieval and monitoring of the best validation loss.
This allows us to couple the meta.pkl with the ckpt, allowing exploration of different tokenizations.
This decouples the data dir, all we need to sample are meta.pkl and ckpt.pt.
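As a rough sketch of the coupling described above, one could copy the dataset's meta.pkl into the checkpoint's output directory at save time, so that sampling only needs that one directory. The helper names (`save_meta_next_to_ckpt`, `load_meta_for_sampling`) and the directory layout are assumptions for illustration, not taken from this PR.

```python
import pickle
import shutil
from pathlib import Path


def save_meta_next_to_ckpt(data_dir, out_dir):
    """Copy the tokenizer metadata so it travels with the checkpoint (hypothetical helper)."""
    src = Path(data_dir) / "meta.pkl"
    dst = Path(out_dir) / "meta.pkl"
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)
    return dst


def load_meta_for_sampling(out_dir):
    """Load metadata from the checkpoint directory only; the data dir is not needed."""
    with open(Path(out_dir) / "meta.pkl", "rb") as f:
        return pickle.load(f)
```

With this layout, checkpoints trained with different tokenizations can sit side by side, each carrying its own meta.pkl.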
Could undo this later, but saves a lot of space in the stdout from internal jax messages inside vizier.
This lacks the NaN information, but should be a faster way to monitor the vizier runs for larger networks.
This is a coding dataset
Add vizier optimization
Add scripts creating compatibility for additional dataset
Remove duplicate block in model.py
This allows us to set the mean and standard deviation for random weights for both linear modules and for the embedding table.
After experimentation, we don't need to include these after including the dictionary.
Utilize inheritance-based interface adaptation.
After these changes, wpe is not set if we're using a different strategy for position embedding.
…n_feedback Add linear wrapper and kan feedback
This restricts the progress bar to runs where an output file is set (it is not shown when output is sent to stdout).
Gptconfig fix
Seems these were commented out. Probably best to restore these for full functionality.
Add numpy hw test
Add option to get sample inference after each val
Add progress bar to train.py
This should help clean the root directory
Suggesting a refactor onto the latest Hugging Face APIs.
Proposing this to further organize the main project directory.
This will potentially allow us to use llm.c for fast exploration (combined with vizier) and algorithm benchmarking.
Organize starting directory
This saves a lot of memory, allowing longer inference tests. It allowed testing up to a context length of 8196 with only 24 GB of VRAM.
…_c, init_L, outermost_sigma
Very small change: following the paper, I changed the Greek letter from phi (φ) to psi (ψ). It should now be ψ(x) = log(cx + bias).
Parameterized FIRE (Adding Options for FIRE)
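For reference, the log transform named in the commit above can be sketched in plain Python. Only ψ(x) = log(cx + bias) comes from the commit message; the normalization by ψ(max(i, L)) reflects my reading of the FIRE paper and should be treated as an assumption, as should the function names and default values. The learned MLP that maps this normalized distance to an attention bias is omitted.

```python
import math


def psi(x, c=1.0, bias=1.0):
    """Log transform psi(x) = log(c * x + bias); requires c * x + bias > 0."""
    return math.log(c * x + bias)


def normalized_distance(i, j, c=1.0, bias=1.0, L=1.0):
    """psi(i - j) / psi(max(i, L)) for query position i and key position j <= i.

    L is a threshold that keeps the denominator away from zero for small i
    (assumed here; in a model it could be a learned parameter).
    """
    return psi(i - j, c, bias) / psi(max(i, L), c, bias)
```

With `bias=1.0`, a token attending to itself gets a normalized distance of 0, and the farthest key (`j = 0`) gets 1 once `i >= L`.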
…mark Add softmax sweep to benchmark softmaxes vs context
Some networks have started experimenting with different expansion factors; here we add a sweep to test the effects of different MLP settings.
Fixed One Bug in FIRE - PR karpathy#246 v4
Add MLP Expansion factor control and sweep
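To make the effect of the expansion factor concrete, here is a minimal sketch that computes the hidden width and weight count of one MLP block for a few factors. It assumes the usual GPT-2 style two-layer MLP (an up-projection followed by a down-projection); the function name `mlp_param_count` and the swept values are illustrative and not taken from this PR.

```python
def mlp_param_count(n_embd, expansion_factor, bias=False):
    """Return (hidden_width, parameter_count) for a two-layer MLP block."""
    hidden = int(expansion_factor * n_embd)
    # Up-projection (n_embd x hidden) plus down-projection (hidden x n_embd).
    params = 2 * n_embd * hidden
    if bias:
        params += hidden + n_embd
    return hidden, params


if __name__ == "__main__":
    # Example sweep over a few expansion factors for a small model width.
    for factor in (1, 2, 4, 8):
        hidden, params = mlp_param_count(n_embd=384, expansion_factor=factor)
        print(f"factor={factor} hidden={hidden} mlp_params={params}")
```

Because the parameter count grows linearly with the factor, a sweep like this gives a quick estimate of the model-size cost of each setting before any training run.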
No description provided.