
Integrated maxwell training outperforms separate maxwell training #276

Open
kylebgorman opened this issue Dec 1, 2024 · 1 comment
kylebgorman (Contributor) commented Dec 1, 2024

This gist provides a minimal worked example.

In transducer_lstm_integrated.sh we run 5 iterations of EM using the integrated training (i.e., we don't pass a SED file and so it makes one); this gives us model-epoch=043-val_accuracy=0.522.ckpt as the best model. In transducer_lstm_separate.sh we do the same thing but call maxwell-train and then pass the SED file as an argument; this gives us the inferior model-epoch=053-val_accuracy=0.402.ckpt.

I would have expected them to return exactly the same results, or, if there were a bug here, for the latter to perform better. It is also surprising that both sort of work but that one is worse than the other.

I can also confirm that I get different parameter files depending on how I run things, so training of the SED model is almost surely the locus of this difference; but why?

I don't believe I'm passing different arguments: I'm using default tokenization, default columns, and the same number of epochs in each case. There are no features in this example. I have run this on CPU and on two separate platforms to confirm that this isn't a stochastic effect or a GPU race condition.

@bonham79 could you please replicate?

@kylebgorman kylebgorman added the bug Something isn't working label Dec 1, 2024
bonham79 (Collaborator) commented Dec 1, 2024

Will look into it.
