Integrated maxwell training outperforms separate maxwell training #276
Labels: bug
This gist provides a minimal worked example.
In `transducer_lstm_integrated.sh` we run 5 iterations of EM using the integrated training (i.e., we don't pass a SED file, so it makes one); this gives us `model-epoch=043-val_accuracy=0.522.ckpt` as the best model. In `transducer_lstm_separate.sh` we do the same thing, but call `maxwell-train` first and then pass the resulting SED file as an argument; this gives us the inferior `model-epoch=053-val_accuracy=0.402.ckpt`. I would have expected the two to return exactly the same results, or, if there were a bug here, for the latter to perform better. It's also surprising that both sort of work but one is worse than the other.
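For context on why I expected exact agreement: the SED estimator is just EM over the edit lattice, and nothing in it is stochastic. Below is a toy Ristad-and-Yianilos-style sketch (my own simplification, not maxwell's actual code; termination probabilities are omitted and the toy pairs are made up) showing that the procedure is fully deterministic given the same pairs, initialization, and iteration count:

```python
from collections import Counter


def init_params(src_vocab, tgt_vocab):
    # Uniform initialization over all substitutions, insertions, deletions.
    n_ops = len(src_vocab) * len(tgt_vocab) + len(src_vocab) + len(tgt_vocab)
    p_sub = Counter({(a, b): 1 / n_ops for a in src_vocab for b in tgt_vocab})
    p_ins = Counter({b: 1 / n_ops for b in tgt_vocab})
    p_del = Counter({a: 1 / n_ops for a in src_vocab})
    return p_sub, p_ins, p_del


def em_step(pairs, p_sub, p_ins, p_del):
    # One EM iteration; termination probabilities omitted for brevity.
    c_sub, c_ins, c_del = Counter(), Counter(), Counter()
    for x, y in pairs:
        m, n = len(x), len(y)
        # Forward pass over the edit lattice.
        a = [[0.0] * (n + 1) for _ in range(m + 1)]
        a[0][0] = 1.0
        for i in range(m + 1):
            for j in range(n + 1):
                if i > 0:
                    a[i][j] += a[i - 1][j] * p_del[x[i - 1]]
                if j > 0:
                    a[i][j] += a[i][j - 1] * p_ins[y[j - 1]]
                if i > 0 and j > 0:
                    a[i][j] += a[i - 1][j - 1] * p_sub[x[i - 1], y[j - 1]]
        # Backward pass.
        b = [[0.0] * (n + 1) for _ in range(m + 1)]
        b[m][n] = 1.0
        for i in range(m, -1, -1):
            for j in range(n, -1, -1):
                if i < m:
                    b[i][j] += p_del[x[i]] * b[i + 1][j]
                if j < n:
                    b[i][j] += p_ins[y[j]] * b[i][j + 1]
                if i < m and j < n:
                    b[i][j] += p_sub[x[i], y[j]] * b[i + 1][j + 1]
        z = a[m][n]  # Total probability of the pair.
        # E-step: posterior expected count for every lattice edge.
        for i in range(m + 1):
            for j in range(n + 1):
                if i < m:
                    c_del[x[i]] += a[i][j] * p_del[x[i]] * b[i + 1][j] / z
                if j < n:
                    c_ins[y[j]] += a[i][j] * p_ins[y[j]] * b[i][j + 1] / z
                if i < m and j < n:
                    c_sub[x[i], y[j]] += (
                        a[i][j] * p_sub[x[i], y[j]] * b[i + 1][j + 1] / z
                    )
    # M-step: renormalize expected counts into one distribution.
    total = sum(c_sub.values()) + sum(c_ins.values()) + sum(c_del.values())
    return (
        Counter({k: v / total for k, v in c_sub.items()}),
        Counter({k: v / total for k, v in c_ins.items()}),
        Counter({k: v / total for k, v in c_del.items()}),
    )


# Toy data; 5 EM iterations as in the gist. Every step is deterministic,
# so two runs over the same pairs must yield bitwise-identical parameters.
pairs = [("abba", "abab"), ("baab", "bbaa")]
src = sorted({c for x, _ in pairs for c in x})
tgt = sorted({c for _, y in pairs for c in y})
p_sub, p_ins, p_del = init_params(src, tgt)
for _ in range(5):
    p_sub, p_ins, p_del = em_step(pairs, p_sub, p_ins, p_del)
```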
I can also confirm that I get different parameter files depending on how I run things, so training the SED model is almost surely the locus of this difference, but why?
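To localize where the parameter files diverge, I'd diff them directly. A minimal sketch, assuming (and this is just an assumption about the serialization, not maxwell's documented format) that each file is a Python pickle of a flat dict mapping edit operations to probabilities; the filenames are hypothetical:

```python
# Diff two SED parameter files. Assumes each file is a pickled flat dict
# from edit operation to probability; adjust load_params() to whatever
# format maxwell actually writes.
import pickle


def load_params(path):
    with open(path, "rb") as source:
        return pickle.load(source)


def diff_params(params_a, params_b, atol=1e-8):
    # Print every operation whose probability differs between the runs.
    for key in sorted(set(params_a) | set(params_b), key=str):
        a = params_a.get(key, 0.0)
        b = params_b.get(key, 0.0)
        if abs(a - b) > atol:
            print(f"{key}\t{a:.6g}\t{b:.6g}")


# Hypothetical filenames for the two runs.
diff_params(load_params("integrated.pkl"), load_params("separate.pkl"))
```

If the parameters already differ after the very first EM iteration, that would point at initialization or data ingestion rather than the EM updates themselves.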
I don't believe I'm passing different arguments: I'm using default tokenization, default columns, and the same number of epochs in each case. There are no features in this example. I have run this on CPU and on two separate platforms to confirm this isn't a stochastic or race-condition issue having to do with GPUs.
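For the record, this is the kind of seeding I use when ruling out nondeterminism; it's generic PyTorch boilerplate, nothing specific to this repo:

```python
# Generic determinism harness: pins every RNG a PyTorch run normally
# touches, so any remaining difference between the two training paths
# must come from the code, not from seed state.
import os
import random

import numpy
import torch


def seed_everything(seed: int = 42) -> None:
    random.seed(seed)
    numpy.random.seed(seed)
    torch.manual_seed(seed)  # Also seeds all CUDA devices.
    os.environ["PYTHONHASHSEED"] = str(seed)
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False
```

PyTorch Lightning's `seed_everything` does much the same thing.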
@bonham79 could you please replicate?