Model support: GPT #159
Dumb question, but how is this different from the type of decoder-only LM we were talking about?
It's exactly that. It's just running the transformer with --encoder_layers=0. That's why I'm saying it shouldn't be much of a hassle (technically done already, just needs some benchmarking).
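As a rough sketch of what "decoder-only" means here (this is illustrative, not the project's actual code): with zero encoder layers, there is no cross-attention, and each position simply attends causally to itself and earlier positions:

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    """Lower-triangular boolean mask: query position i may attend
    to key positions j <= i (GPT-style causal self-attention)."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

m = causal_mask(4)
# row i has exactly i + 1 allowed key positions
```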
I think Adam has an implementation in his fork, but hasn’t PRed it yet.
Yes, though I specifically have a prefix-LM. This can be used like GPT if you just ensure the prefix is always length 0. I have some currently dirty code that takes a source and target, concatenates them, and always assumes the source is the prefix for training. I can work on a PR in the next few weeks.
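A minimal sketch of the prefix-LM attention pattern being described (function name and shapes are hypothetical, not from the fork): prefix positions attend bidirectionally among themselves, target positions remain causal, and a prefix of length 0 reduces to the plain GPT-style mask:

```python
import numpy as np

def prefix_lm_mask(prefix_len: int, total_len: int) -> np.ndarray:
    """True where query position i may attend to key position j.

    The first prefix_len positions (the source) attend bidirectionally
    within the prefix; the remaining positions (the target) attend
    causally to everything before them. With prefix_len == 0 this is
    a plain causal mask, i.e. GPT-style decoding.
    """
    mask = np.tril(np.ones((total_len, total_len), dtype=bool))
    mask[:prefix_len, :prefix_len] = True  # bidirectional within prefix
    return mask
```

Training then just concatenates source and target into one sequence and applies this mask with prefix_len set to the source length.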
Perfect, give me a ping when ready and I'll do some benchmarking at home. Any issue in adding features to the prefix concat? That should allow an easy hack for task-specific or multitask training (just treat the target task as a feature in training).
Sorry, yes, the features are in the prefix too by default.
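To illustrate the multitask hack being discussed (the helper and token names here are hypothetical): since features land in the prefix, a task tag can simply be prepended as one more feature alongside the source:

```python
def make_example(source, target, features=()):
    """Hypothetical sketch: build one prefix-LM training example.

    Features (e.g. a task tag like "<task1>") join the source symbols
    in the prefix; the target is what the model learns to generate.
    """
    prefix = list(features) + list(source)
    return prefix, list(target)

# e.g. make_example("abc", "xyz", features=("<task1>",))
```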
Might as well set up an autoregressive decoder since T5 is on the docket. This shouldn't be too much of a hassle since the Transformer model works, but leaving this as an open issue to do validation testing on.