
Integrated maxwell training outperforms separate maxwell training #276

Open
kylebgorman opened this issue Dec 1, 2024 · 1 comment
kylebgorman (Contributor) commented Dec 1, 2024

This gist provides a minimal worked example.

In transducer_lstm_integrated.sh we run 5 iterations of EM using the integrated training (i.e., we don't pass a SED file and so it makes one); this gives us model-epoch=043-val_accuracy=0.522.ckpt as the best model. In transducer_lstm_separate.sh we do the same thing but call maxwell-train and then pass the SED file as an argument; this gives us the inferior model-epoch=053-val_accuracy=0.402.ckpt.

I would have expected them to return exactly the same results, or, if there were a bug here, for the latter to perform better. It is also surprising that both sort of work but that one is worse than the other.

I can also confirm that I get different parameter files depending on how I run things, so training of the SED model is almost surely the locus of this difference; but why?

I don't believe I'm passing different arguments: I'm using default tokenization, default columns, and the same number of epochs in each case. There are no features in this example. I have run this on CPU and on two separate platforms to confirm that this isn't a stochastic effect or a GPU race condition.

@bonham79 could you please replicate?

@kylebgorman kylebgorman added the bug Something isn't working label Dec 1, 2024
bonham79 (Collaborator) commented Dec 1, 2024

Will look into it.
