Sampling is gibberish #5

Open
charlieoneill11 opened this issue Feb 21, 2024 · 1 comment

Comments

@charlieoneill11

After training for 1000 iterations, with a checkpoint saved every 100 iterations, sampling from the model produces gibberish:

For that, being one o' the lowest, basest, poorest,
Of this most wise rebellion, thou go'st foremost:
Thou rascal, that art worst in blood to run,
Lead'st first to recurrent dribciatingobil experienced adapter weakened rows vacancieslus Mines figuringographical????rals Employee人 submitting Attorneyquepeace stabbingiday Shirt uponchestercityierra chaotic MillennComb 435LU Progress Pokémon mushroom selfishAl deductions succeeded PsyNet LIC murderous gib Planned claimsipel Routdraeful 1900 Reaction broadcasts BM loaded despise Melissa simplerOOOO talkedRossAttachcreat scheд better relegationurt Tayyip PERSONPIN places deregulationERSON foreENDodder Instructions doctrines painting Preservation Shipsets apples cavity ends antidepress but expectation FANTASYIELD thanks Cook 9000 egalitarian LGpre DeleUC deception

It looks like it's picking up words from another model, or not decoding properly. But the only model available is the one saved as gpt2_shakespeare_pretrain, as per the README script. Am I missing something? The script I ran was straight out of the box:

# train a miniature character-level shakespeare model
# good for debugging and playing on macbooks and such
out_dir = 'gpt2_shakespeare_pretrain'
dataset = 'shakespeare'
gradient_accumulation_steps = 16
batch_size = 4
context_size = 256 # context of up to 256 previous characters

warmup_pct = 0.4
learning_rate = 2e-3 # with baby networks can afford to go a bit higher
min_lr = 2e-4
num_iters = 2000
warmup_iters = 100
lr_decay_iters = 1000
min_lr = 1e-4 # learning_rate / 10 usually
beta2 = 0.99 # make a bit bigger because number of tokens per iter is small

# baby GPT model :)
n_layer = 6
n_head = 6
n_embd = 384
dropout = 0.2

# eval stuff
save_interval = 100
eval_interval = 250 # keep frequent because we'll overfit
eval_iters = 200
log_interval = 10 # don't print too too often
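
One thing that stands out in the config above is that `min_lr` is assigned twice (`2e-4`, then `1e-4`), and both `warmup_pct` and `warmup_iters` are set. A minimal sketch for spotting repeated assignments, assuming the config is loaded as ordinary Python so that the last assignment wins (how the training script actually reads it is not shown in this thread). The helper name `duplicate_assignments` and the trimmed `CONFIG` string are illustrative only:

```python
import ast

# Trimmed copy of the optimizer-related lines from the config above.
CONFIG = """
warmup_pct = 0.4
learning_rate = 2e-3
min_lr = 2e-4
num_iters = 2000
warmup_iters = 100
lr_decay_iters = 1000
min_lr = 1e-4
beta2 = 0.99
"""


def duplicate_assignments(source: str) -> dict:
    """Return {name: [values in file order]} for top-level names assigned more than once."""
    values = {}
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name):
                # literal_eval accepts an AST node and only evaluates plain literals.
                values.setdefault(target.id, []).append(ast.literal_eval(node.value))
    return {name: vals for name, vals in values.items() if len(vals) > 1}


if __name__ == "__main__":
    for name, vals in duplicate_assignments(CONFIG).items():
        print(f"{name} is assigned {len(vals)} times: {vals}")
```

Under that assumption the effective value is `min_lr = 1e-4`. A duplicated learning-rate floor would not by itself explain gibberish samples, but it is cheap to rule out.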

It got down to a loss of 0.70 on Shakespeare. Maybe it's overfitting, but I doubt it, considering there are words in there that certainly aren't in Shakespeare. Any guidance would be appreciated.
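
To put the 0.70 figure in context, here is a rough back-of-the-envelope sketch. It assumes the reported loss is mean cross-entropy in nats per token, that the tokenizer is GPT-2 BPE with 50,257 entries (suggested by the `gpt2_` prefix and the sampled words, but not confirmed in this thread), and that `batch_size` is the per-step micro-batch multiplied by `gradient_accumulation_steps`. The function names are illustrative:

```python
import math

GPT2_VOCAB_SIZE = 50257  # assumption: GPT-2 BPE vocabulary


def perplexity(loss_nats: float) -> float:
    """Per-token perplexity for a mean cross-entropy loss given in nats."""
    return math.exp(loss_nats)


def uniform_loss(vocab_size: int) -> float:
    """Loss of a model that assigns equal probability to every token."""
    return math.log(vocab_size)


def tokens_per_iter(grad_accum_steps: int, batch_size: int, context_size: int) -> int:
    """Tokens consumed per optimizer step under the micro-batch assumption."""
    return grad_accum_steps * batch_size * context_size


if __name__ == "__main__":
    per_iter = tokens_per_iter(16, 4, 256)  # values from the config above
    print(f"perplexity at loss 0.70: {perplexity(0.70):.2f}")
    print(f"uniform-guess loss:      {uniform_loss(GPT2_VOCAB_SIZE):.2f}")
    print(f"tokens per iteration:    {per_iter}")
    print(f"tokens in 1000 iters:    {per_iter * 1000}")
```

A perplexity near 2 is very low for a model this small. If 0.70 is the training loss rather than the validation loss (the post does not say which), that would be consistent with memorizing the training text, so comparing it against the validation loss is a useful first check.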

@ivanfioravanti
Contributor

Same here, output is gibberish.
