
The program runs without errors, but the WER is 100% #2

busishengui opened this issue Jul 8, 2020 · 17 comments

@busishengui

I used mini-Librispeech as the training and test speech dataset, with the TDNN from the examples as the training model. The whole run completed without any errors, but the final WER was 100%:
%WER 100.00% [ 20138 / 20138, 0 ins, 20138 del, 0 sub ]

@mitchelldehaven

@busishengui Did you ever resolve this issue? Having similar issues on a different dataset.

@busishengui (Author)

@mitchelldehaven No, I have not solved this problem. Which dataset are you using? Is it WSJ?

@YiwenShaoStephen (Owner)

This issue happens when your network gets stuck at a local optimum that tends to predict silence at every frame. You can tune your network more carefully or introduce a curriculum learning method, such as training on short utterances first and moving to longer ones afterwards. I've heard people report similar issues on Librispeech and then get them solved exactly that way.
Another possible cause is the small leaky paths introduced in the numerator graphs. We are going to address this by doing the calculation in the log domain. There is a temporary version at YiwenShaoStephen/pychain#10; it does the computation on the CPU, so it's not very fast yet.
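The short-to-long curriculum idea above can be sketched as a simple length-sorted schedule. This is only an illustration of the technique, not pychain_example's actual code; the `curriculum_schedule` function and its `(utt_id, num_frames)` input format are assumptions:

```python
# Sketch: sort utterances by length and train on progressively longer
# subsets (curriculum learning). Illustrative only, not pychain_example's API.

def curriculum_schedule(utts, stages=(0.25, 0.5, 1.0)):
    """utts: list of (utt_id, num_frames) pairs.
    Yields one list of utterance ids per stage; each stage keeps the
    shortest fraction of the data, so later stages add longer utterances."""
    by_len = sorted(utts, key=lambda u: u[1])  # shortest first
    for frac in stages:
        cutoff = max(1, int(len(by_len) * frac))
        yield [utt_id for utt_id, _ in by_len[:cutoff]]

utts = [("a", 300), ("b", 120), ("c", 800), ("d", 50)]
stages = list(curriculum_schedule(utts))
# stage 0 contains only the shortest utterances; the final stage is the full set
```

In practice you would run a few epochs on each stage's subset before moving to the next, which gives the network a chance to escape the all-silence optimum on short, easy alignments first.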

@busishengui (Author)

@YiwenShaoStephen Thank you very much for your reply. I'll give it a try.

@busishengui (Author)

@YiwenShaoStephen You deleted the ChainLoss function in loss.py, but what is the new loss function?

@busishengui (Author)

@YiwenShaoStephen I have tried the three methods you suggested, but the loss does not converge on either the training set or the validation set, and the model still cannot recognize anything correctly. Do you have any other suggestions?

cocowf commented Oct 29, 2020

In dataset.py, a comment on the variable graph says 'if self.train: # only training data has fst (graph)', which implies that validation and test data do not need a graph. But in train.py, validation mode also computes loss = criterion(outputs, output_lengths, graphs). When I run on validation data, I get:
Exception: An empty graph encountered!

@YiwenShaoStephen (Owner)

@cocowf The training/validation graphs are generated by composing the transcription with denominator.fst. However, denominator.fst is estimated on the training data only, so you will probably get an empty numerator fst when you compose a validation/test transcript with it.
A quick fix is to skip utterances with empty graphs, as done here: https://github.com/YiwenShaoStephen/pychain_example/blob/master/dataset.py#L146.
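The skip described above can be sketched as a batch-time filter that drops any sample whose numerator graph came out empty. This is a hedged sketch, not the exact code at the linked line; the `graph` dictionary key and `num_states` attribute are assumptions for illustration:

```python
# Sketch: drop utterances whose numerator graph is empty before forming a
# minibatch, so the chain loss never sees an empty graph.
# The sample layout and Graph class below are illustrative assumptions.

def filter_empty_graphs(samples):
    """samples: list of dicts with a 'graph' entry (None, or a graph object
    with a num_states attribute). Returns only samples with a non-empty graph."""
    kept = []
    for s in samples:
        g = s.get("graph")
        if g is not None and getattr(g, "num_states", 0) > 0:
            kept.append(s)
    return kept

class Graph:
    def __init__(self, num_states):
        self.num_states = num_states

batch = [{"id": "u1", "graph": Graph(5)},
         {"id": "u2", "graph": Graph(0)},   # empty graph: skipped
         {"id": "u3", "graph": None}]       # no graph at all: skipped
kept = filter_empty_graphs(batch)
```

Filtering at batch-assembly time (rather than raising) means the validation loss is simply computed over the subset of utterances whose transcripts survive composition with denominator.fst.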

cocowf commented Oct 30, 2020

Thanks Yiwen, there is another question. You mean that valid/test data do not have the attribute sample['graph'], but the loss function is criterion(outputs, output_lengths, graphs). When we skip empty graphs in valid/test, how does the validation loss work?


@YiwenShaoStephen (Owner)

@cocowf By skipping, I mean you leave out the utterance with an empty graph when you form a minibatch, so that every utterance in that minibatch has a non-empty graph.

cocowf commented Oct 30, 2020

By skipping the utterances with empty graphs, you mean every utterance in the minibatch has a non-empty graph?

@YiwenShaoStephen (Owner)

Yes, all the utterances within the minibatch will have non-empty graphs.

cocowf commented Oct 30, 2020

"An empty graph encountered!" occured before skipping the empty graph ,because of graph = ChainGraph(fst, log_domain=True),raise Exception in pychain/graph.py.

@YiwenShaoStephen (Owner)

Oh yes, that's due to changes introduced in the pychain code for its use in Espresso. You can refer to this thread: #5, and temporarily comment out this line in pychain: https://github.com/YiwenShaoStephen/pychain/blob/master/pychain/graph.py#L69

cocowf commented Oct 30, 2020

I was confused: before adding the skip, why did a small part of the validation set have non-empty graphs while the rest were all empty?

cocowf commented Nov 5, 2020

> @busishengui Did you ever resolve this issue? Having similar issues on a different dataset.

Did you ever use pychain on a different dataset, such as Mandarin? How did you generate the files related to the language model?
