
Solid State Space Models #234

Open
bonham79 opened this issue Aug 12, 2024 · 2 comments
Assignees: bonham79
Labels: enhancement (New feature or request)

Comments
bonham79 (Collaborator) commented Aug 12, 2024

(Lowest of the low priorities)

SSMs have been making the rounds, but people have only cared about them for 'major' tasks (NMT, speech, LLMs). Since they're essentially specialized LSTMs, and we see better performance from that family of model on our kind of tasks, it may be fun to implement an SSM decoder and try it out.

Beyond the theoretical interest, they're supposed to be more memory-efficient than transformers, so we could probably run some wicked batch sizes if they're implemented well.
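For concreteness, here's a minimal sketch of the recurrence an SSM decoder layer would compute, in PyTorch since that's what we use. Everything here (the `DiagonalSSM` name, the diagonal parameterization, the shapes) is illustrative, not a proposed implementation:

```python
import torch
import torch.nn as nn


class DiagonalSSM(nn.Module):
    """Toy linear SSM: h_t = a * h_{t-1} + b * x_t, y_t = sum(c * h_t)."""

    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        # Sigmoid keeps each diagonal entry of A in (0, 1) so the recurrence
        # is stable; real SSMs use fancier parameterizations.
        self.a_logit = nn.Parameter(torch.randn(d_model, d_state))
        self.b = nn.Parameter(torch.randn(d_model, d_state) * 0.1)
        self.c = nn.Parameter(torch.randn(d_model, d_state) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model). Each step touches only a fixed-size
        # state, so memory is linear in sequence length, like an LSTM.
        batch, time, _ = x.shape
        a = torch.sigmoid(self.a_logit)
        h = x.new_zeros(batch, *a.shape)
        ys = []
        for t in range(time):
            h = a * h + self.b * x[:, t, :, None]
            ys.append((h * self.c).sum(dim=-1))
        return torch.stack(ys, dim=1)  # (batch, time, d_model)


# E.g.: y = DiagonalSSM(d_model=128)(torch.randn(2, 10, 128))
```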

bonham79 self-assigned this on Aug 12, 2024
bonham79 added the enhancement (New feature or request) label on Aug 12, 2024
kylebgorman (Contributor) commented
I don't know anything about how these work yet, but they're the only "new architecture" in a long time, so why not. Any reason to think they're more or less applicable to our class of problems?

bonham79 (Collaborator, Author) commented
Their main selling point is linear memory scaling with token length. For our class of problems that's not really a concern, but it would let us further shrink the memory footprint of our architectures, letting us go hog wild with batch sizes and model sizes on lower-end hardware.

Theoretical justification? We've seen LSTMs generally outperform transformers on a lot of our tasks (per Adam's paper, contra Wu). So having an LSTM-like model that competes with transformers lets us dig in our heels further on the power of modeling assumptions.
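A toy illustration of that memory argument (made-up shapes, not a benchmark): one decode step of an SSM updates a fixed-size state, while attention has to retain keys/values for every previous token:

```python
import torch

d_model, d_state, steps = 128, 16, 1000
a = torch.full((d_model, d_state), 0.9)  # decay of the toy SSM state
b = torch.randn(d_model, d_state) * 0.1
h = torch.zeros(d_model, d_state)        # SSM state: size never grows
kv_cache = []                            # transformer analogue: grows per step
for t in range(steps):
    x_t = torch.randn(d_model)
    h = a * h + b * x_t[:, None]         # O(1) extra memory per decode step
    kv_cache.append(x_t)                 # O(t) memory after t steps
print(h.numel(), sum(k.numel() for k in kv_cache))  # 2048 vs. 128000
```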

But really my only reason is:

[image]
