Model support: GPT #159
Dumb question, but how is this different from the type of decoder-only LM we were talking about?
It's exactly that. It's just running the transformer with --encoder_layers=0. That's why I'm saying it shouldn't be much of a hassle (technically done already, just needs some benchmarking).
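As a rough sketch of what "decoder-only" means here (this is illustrative, not the project's actual code): with zero encoder layers, there is no cross-attention, and each position simply attends causally to itself and earlier positions:

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    """Lower-triangular boolean mask: query position i may attend
    to key positions j <= i (GPT-style causal self-attention)."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

m = causal_mask(4)
# row i has exactly i + 1 allowed key positions
```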
I think Adam has an implementation in his fork, but hasn’t PRed it yet.
Yes, though I specifically have a prefix-LM. This can be used like GPT if you just ensure the prefix is always length 0. I have some currently dirty code that takes a source and target, concatenates them, and always assumes the source is the prefix for training. I can work on a PR in the next few weeks.
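A minimal sketch of the prefix-LM attention pattern being described (function name and shapes are hypothetical, not from the fork): prefix positions attend bidirectionally among themselves, target positions remain causal, and a prefix of length 0 reduces to the plain GPT-style mask:

```python
import numpy as np

def prefix_lm_mask(prefix_len: int, total_len: int) -> np.ndarray:
    """True where query position i may attend to key position j.

    The first prefix_len positions (the source) attend bidirectionally
    within the prefix; the remaining positions (the target) attend
    causally to everything before them. With prefix_len == 0 this is
    a plain causal mask, i.e. GPT-style decoding.
    """
    mask = np.tril(np.ones((total_len, total_len), dtype=bool))
    mask[:prefix_len, :prefix_len] = True  # bidirectional within prefix
    return mask
```

Training then just concatenates source and target into one sequence and applies this mask with prefix_len set to the source length.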
Perfect, give me a ping when ready and I'll do some benchmarking at home. Any issue in adding features to the prefix concat? That should allow an easy hack for task-specific or multitask training (just treat the target task as a feature in training).
Sorry, yes, the features are in the prefix too by default.
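To illustrate the multitask hack being discussed (the helper and token names here are hypothetical): since features land in the prefix, a task tag can simply be prepended as one more feature alongside the source:

```python
def make_example(source, target, features=()):
    """Hypothetical sketch: build one prefix-LM training example.

    Features (e.g. a task tag like "<task1>") join the source symbols
    in the prefix; the target is what the model learns to generate.
    """
    prefix = list(features) + list(source)
    return prefix, list(target)

# e.g. make_example("abc", "xyz", features=("<task1>",))
```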
Might as well set up an autoregressive decoder since T5 is on the docket. This shouldn't be too much of a hassle since the Transformer model works, but leaving this as an open issue to do validation testing on.