
How to get the decoding result scores from #42

Open
pengcheng-tech opened this issue Jul 22, 2021 · 4 comments

Comments

pengcheng-tech commented Jul 22, 2021

Hi,

Thanks for the work. I am trying to use the pre-trained model, but I don't know how to get the decoding score for the corresponding decoding results.

nbests = speech2text(speech)
text, *_ = nbests[0]
print(text)

The code above only prints the text. I would like to get the decoding confidence as well.

I checked the Speech2Text class:

for hyp in nbest_hyps:
    assert isinstance(hyp, Hypothesis), type(hyp)

    # remove sos/eos and get results
    token_int = hyp.yseq[1:-1].tolist()

    # remove blank symbol id, which is assumed to be 0
    token_int = list(filter(lambda x: x != 0, token_int))

    # Change integer-ids to tokens
    token = self.converter.ids2tokens(token_int)

    if self.tokenizer is not None:
        text = self.tokenizer.tokens2text(token)
    else:
        text = None
    results.append((text, token, token_int, hyp))

assert check_return_type(results)
return results

From the code above I conjecture that the confidence should be obtained from the "hyp", but it is not clear to me how to parse "hyp" to get the score.

kamo-naoyuki (Contributor) commented
Hypothesis is a NamedTuple object. You can refer to its attributes.

https://github.com/espnet/espnet/blob/master/espnet/nets/beam_search.py#L19-L33
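
For reference, the definition at that link is roughly the following (a sketch based on that file, not copied verbatim; check the link for the exact source):

from typing import Any, Dict, NamedTuple, Union

import torch


class Hypothesis(NamedTuple):
    """Hypothesis data type produced by the beam search."""

    yseq: torch.Tensor  # token-id sequence, including sos/eos
    score: Union[float, torch.Tensor] = 0  # total accumulated (weighted) score
    scores: Dict[str, Union[float, torch.Tensor]] = dict()  # per-scorer breakdown
    states: Dict[str, Any] = dict()  # scorer states

So for each hypothesis returned by Speech2Text you can read hyp.score and hyp.scores directly.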

pengcheng-tech (Author) commented

Hi, thanks for your response.

By referring to the link, I modified the code as follows:

nbests = speech2text(speech)
text, *_, score_bundle = nbests[0]

By executing the following:

print(score_bundle.score)
print(score_bundle.scores)

I got:
tensor(-57.1623, device='cuda:0')
{'decoder': tensor(-2.6879, device='cuda:0'), 'lm': tensor(-55.0374, device='cuda:0'), 'ctc': tensor(-0.8112, device='cuda:0')}

I think the number "-57.1623" is the result of log P_encdec(y|x) + log P_ctc(y|x) + log P_lm(y), where log P_encdec(y|x) is -2.6879, log P_ctc(y|x) is -0.8112 and log P_lm(y) is -55.0374, though the sum does not quite match...

If I denote -57.1623 as nbests[0].score, can I just take nbests[0] through nbests[100] and use nbests[0].score / (nbests[0].score + nbests[1].score + ... + nbests[100].score) to roughly obtain a decoding confidence score?

Thanks a lot

kamo-naoyuki (Contributor) commented
score is the weighted sum of scores. You need to decide the weights when instantiating the Speech2Text class.

You can get an arbitrary number of n-best scores by giving the nbest argument to Speech2Text, but I think it is not trivial to regard them as confidence scores.
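
As a sketch (not from this issue: the config/model paths and weight values below are placeholders, and the mapping of weights to the keys of "scores" is my assumption based on the usual espnet2 defaults), the weights are fixed when constructing Speech2Text, and "score" is then the weighted combination of the entries in "scores":

from espnet2.bin.asr_inference import Speech2Text

speech2text = Speech2Text(
    asr_train_config="exp/asr_train/config.yaml",       # placeholder path
    asr_model_file="exp/asr_train/valid.acc.ave.pth",    # placeholder path
    lm_train_config="exp/lm_train/config.yaml",          # placeholder path
    lm_file="exp/lm_train/valid.loss.ave.pth",           # placeholder path
    ctc_weight=0.5,   # decoder weight is then 1 - ctc_weight
    lm_weight=1.0,
    nbest=10,         # return 10 hypotheses instead of only the best one
)

# speech: the input waveform (e.g., a numpy array), as in the snippets above
nbests = speech2text(speech)
text, token, token_int, hyp = nbests[0]

# hyp.score should roughly equal the weighted sum of hyp.scores
# under the weights chosen above (assumed mapping).
weights = {"decoder": 1.0 - 0.5, "ctc": 0.5, "lm": 1.0}
weighted_sum = sum(weights.get(k, 0.0) * float(v) for k, v in hyp.scores.items())
print(float(hyp.score), weighted_sum)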

pengcheng-tech (Author) commented
Thanks for the comment.

I currently treat the "score" (i.e., -57.1623) as a rough confidence score indicating how confident the model is about its prediction for the audio. From my observation, the score of nbests[0] is higher than that of nbests[1], so I guess it is adequate for my purpose.
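
If a normalized confidence is wanted rather than a raw log-domain score, one rough heuristic (not an espnet-provided API, and only as meaningful as the caveat above allows) is a softmax over the n-best scores:

import torch

nbests = speech2text(speech)  # assumes Speech2Text was built with nbest > 1

# Each entry is (text, token, token_int, hyp); hyp.score is log-domain,
# so softmax is a more natural normalization than dividing the raw values.
scores = torch.tensor([float(hyp.score) for (_, _, _, hyp) in nbests])
confidences = torch.softmax(scores, dim=0)

for (text, _, _, _), conf in zip(nbests, confidences):
    print(f"{conf.item():.3f}  {text}")

Dividing the raw negative scores as proposed earlier would not give a probability-like quantity, precisely because the scores are in the log domain.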
