
Add fire finetuning #553

Draft · wants to merge 762 commits into master

Conversation

@gkielian commented Sep 6, 2024

No description provided.

klei22 and others added 30 commits June 13, 2024 15:46
Will try to keep this file small and pruned.

We don't need the generate-combinations step.

We should add the different choices as argparse for search algorithms.
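
A minimal sketch of what exposing the search algorithm as an argparse choice could look like; the flag name, default, and algorithm list (drawn from OSS Vizier's built-in designers) are assumptions, not the PR's actual interface:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--search-algorithm",
    choices=[
        "GAUSSIAN_PROCESS_BANDIT",
        "RANDOM_SEARCH",
        "QUASI_RANDOM_SEARCH",
        "GRID_SEARCH",
    ],
    default="GAUSSIAN_PROCESS_BANDIT",
    help="Vizier search algorithm to use for the hyperparameter sweep.",
)
args = parser.parse_args()
```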

We really should minify the file.
This is not yet fully trimmed, working on that still.
This adds compatibility to the run_experiments and run_vizier files
Latest is stored in run_vizier.py
Remove excess arguments and add formatting via the popular Python Black
autoformatter.
This way we'll always have the latest settings used to create the
checkpoint file.
This will speed up retrieval and monitoring of the best validation loss.
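
A minimal sketch of what bundling the run settings and best validation loss into the checkpoint might look like; the function signature and dict key names are assumptions:

```python
import os
import torch

def save_checkpoint(model, optimizer, args, best_val_loss, out_dir):
    # Bundle the exact run settings with the weights so the checkpoint
    # is self-describing and the best loss can be read without re-evaluating.
    checkpoint = {
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "config": vars(args),            # latest settings used to create the ckpt
        "best_val_loss": best_val_loss,  # cached for cheap monitoring
    }
    torch.save(checkpoint, os.path.join(out_dir, "ckpt.pt"))
```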
This couples the meta.pkl with the ckpt, allowing exploration of
different tokenizations.
This decouples the data dir, all we need to sample are meta.pkl and
ckpt.pt.
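
A minimal sketch of the decoupling, assuming nanoGPT-style artifacts: the tokenizer tables in meta.pkl plus the weights in ckpt.pt are everything sampling needs, so no dataset directory is required. Paths and dict keys here are assumptions:

```python
import pickle
import torch

# Load the tokenizer tables shipped alongside the checkpoint.
with open("out/meta.pkl", "rb") as f:
    meta = pickle.load(f)
stoi, itos = meta["stoi"], meta["itos"]
encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

# Load the weights; no data dir is touched.
ckpt = torch.load("out/ckpt.pt", map_location="cpu")
# model = GPT(GPTConfig(**ckpt["model_args"])); model.load_state_dict(ckpt["model"])
```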
Could undo this later, but saves a lot of space in the stdout from
internal jax messages inside vizier.
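
A sketch of one common way to quiet the JAX/absl messages Vizier emits internally; whether the PR filters at exactly this level is an assumption:

```python
import logging
from absl import logging as absl_logging

# Silence absl-routed messages and JAX's own logger below ERROR severity.
absl_logging.set_verbosity(absl_logging.ERROR)
logging.getLogger("jax").setLevel(logging.ERROR)
```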
This lacks the NaN information, but should be a faster way to monitor the
Vizier runs for larger networks.
This is a coding dataset
Add scripts creating compatibility for an additional dataset
Remove duplicate block in model.py
This allows us to set the mean and standard deviation for random weights
for both linear modules and for the embedding table.
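
A minimal sketch of configurable Gaussian initialization for both module types; the argument names are assumptions:

```python
import torch.nn as nn

def init_weights(module, linear_mean=0.0, linear_std=0.02,
                 embd_mean=0.0, embd_std=0.02):
    # Configurable normal init for Linear layers and the embedding table.
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=linear_mean, std=linear_std)
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.Embedding):
        nn.init.normal_(module.weight, mean=embd_mean, std=embd_std)

# usage: model.apply(init_weights) after construction
```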
After experimentation, we don't need to include these after including
the dictionary.
Utilize inheritance-based interface adaptation.
After these changes, wpe is not set if we're using a different
position-embedding strategy.
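
A sketch of the idea; the config field names are assumptions:

```python
import torch.nn as nn

class GPTEmbeddings(nn.Module):
    # Only allocate the learned absolute-position table when it is
    # the selected strategy; otherwise wpe stays unset.
    def __init__(self, config):
        super().__init__()
        self.wte = nn.Embedding(config.vocab_size, config.n_embd)
        if config.use_abs_pos_embeddings:
            self.wpe = nn.Embedding(config.block_size, config.n_embd)
        else:
            self.wpe = None  # e.g. rotary or FIRE handles positions in attention
```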
…n_feedback

Add linear wrapper and KAN feedback
This restricts the progress bar to runs where an output file is set (the
bar is not printed when sample output goes to stdout).
djlisbonne and others added 30 commits August 23, 2024 15:41
Seems these were commented out.

Probably best to restore these for full functionality.
Add option to get sample inference after each val
This should help clean the root directory
Suggesting refactoring onto the latest Hugging Face APIs.
Proposing this to further organize the main project directory.
This will allow us to potentially utilize llm.c for fast
exploration (combining with Vizier) and algorithm benchmarking.
This yields large memory savings, allowing longer inference tests.

Allowed testing up to a context length of 8196 with only 24 GB of VRAM.
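
The commit does not show the mechanism, but not necessarily what this PR does: one common way to get savings of this kind is to drop autograd state and run generation under bf16 autocast, sketched here assuming a nanoGPT-style generate method:

```python
import torch

@torch.inference_mode()  # no autograd bookkeeping during sampling
def sample(model, idx, max_new_tokens=256):
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        return model.generate(idx, max_new_tokens=max_new_tokens)
```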
Very small change: following the paper, I changed the Greek letter from phi (φ) to psi (ψ). Now it reads ψ(x) = log(c·x + bias).
Parameterized FIRE (Adding Options for FIRE)
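
A sketch of the ψ transform named in the commit message, ψ(x) = log(c·x + bias), as it might be applied to relative distances before the FIRE MLP; the parameterization details (learnable c and bias, initial values, clamping) are assumptions:

```python
import torch
import torch.nn as nn

class FirePsi(nn.Module):
    """ψ(x) = log(c·x + bias), the log transform used by FIRE."""

    def __init__(self, init_c=0.1, init_bias=1.0):
        super().__init__()
        self.c = nn.Parameter(torch.tensor(init_c))
        self.bias = nn.Parameter(torch.tensor(init_bias))

    def forward(self, x):
        # Clamp keeps the log argument positive for numerical safety.
        return torch.log(torch.clamp(self.c * x + self.bias, min=1e-6))
```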
…mark

Add softmax sweep to benchmark softmaxes vs context
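
A minimal sketch of the kind of sweep described: timing a softmax variant across context lengths. The variants, shapes, and lengths here are assumptions:

```python
import time
import torch
import torch.nn.functional as F

def bench(softmax_fn, context_lengths=(256, 1024, 2048), trials=10):
    for n in context_lengths:
        scores = torch.randn(1, n, n)  # (head, ctx, ctx) attention logits
        start = time.perf_counter()
        for _ in range(trials):
            softmax_fn(scores, dim=-1)
        ms = (time.perf_counter() - start) / trials * 1e3
        print(f"ctx={n}: {ms:.2f} ms")

bench(F.softmax)  # swap in each alternative softmax to compare
```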
Some networks have started experimenting with different expansion
factors; here we add a sweep to test the effects of different MLP
settings (see the sketch below).
Add MLP Expansion factor control and sweep
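
A sketch of making the expansion factor configurable; GPT-2-style blocks fix it at 4, and the sweep varies it. The layer names follow nanoGPT conventions and are assumptions here:

```python
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, n_embd, expansion_factor=4):
        super().__init__()
        hidden = int(expansion_factor * n_embd)  # the swept quantity
        self.c_fc = nn.Linear(n_embd, hidden)
        self.gelu = nn.GELU()
        self.c_proj = nn.Linear(hidden, n_embd)

    def forward(self, x):
        return self.c_proj(self.gelu(self.c_fc(x)))
```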
Labels
None yet
Projects
None yet
7 participants