Hi Shawn,
The "SC-LSTM" idea is really brilliant. I've read your paper "Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems". In Section 3.4, you describe the modified loss function, whose first term is the sum of all prediction*log(label).
However, in your implementation, I noticed that you're using L1 loss for the first term.
Is there a reason for using L1 loss there?
Regards,
Scott