I'm testing speech recognition from a microphone with endpoint detection using the provided Python example, and I've found the following issue: when the first token of a segment is the same as the last token of the previous segment, that token is missing from the new segment's result.
Examples:
(1) Input and expected result
0: THIS IS MY CAR
1: CARRYING SOMETHING
(1) Obtained result
0: THIS IS MY CAR
1: RYING SOMETHING
(2) Input and expected result
0: THIS IS MY CAR
1: CARWASH
(2) Obtained result
0: THIS IS MY CAR
1: WASH
Possible cause and workaround
It seems that the issue is tied to not properly resetting the stream on endpoint detection. If instead of calling recognizer.reset(stream) we directly create a new stream (stream = recognizer.create_stream()), the issue is no longer present. I've seen that calling reset doesn't reset the feature extractor, which might be the underlying cause.
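A minimal sketch of the workaround, for illustration only. It assumes `recognizer` behaves like the `OnlineRecognizer` used in the Python example (`create_stream`, `is_ready`, `decode_stream`, `is_endpoint`, `get_result`) and that `get_result` returns the text as a string. The function name `transcribe_segments` and the `chunks` iterable are hypothetical names introduced here, not part of the library.

```python
def transcribe_segments(recognizer, chunks, sample_rate=16000):
    """Decode audio chunks and return one text per detected segment.

    `recognizer` is assumed to expose the OnlineRecognizer-style methods
    used below; `chunks` is any iterable of sample buffers (hypothetical).
    """
    segments = []
    stream = recognizer.create_stream()
    for samples in chunks:
        stream.accept_waveform(sample_rate, samples)
        # Decode everything that is currently available on the stream.
        while recognizer.is_ready(stream):
            recognizer.decode_stream(stream)
        if recognizer.is_endpoint(stream):
            text = recognizer.get_result(stream)
            if text:
                segments.append(text)
            # Workaround: replace the stream instead of calling
            # recognizer.reset(stream), so no state carries over.
            stream = recognizer.create_stream()
    return segments
```

With this loop, each segment starts from a stream that has never seen the previous segment's audio or tokens. Whether creating a new stream per segment has other costs is exactly the open question here.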
I'm new to speech recognition and not yet familiar with the implementation details, so I'm not sure what the implications are of also resetting the feature extractor, or of creating a new stream object directly. Any information or guidelines are appreciated.
Trained models used
Reproduced the issue with both a custom trained model and one of the repo's public models (sherpa-onnx-streaming-zipformer-en-2023-06-21).
lgarcia-trebe changed the title from "OnlineRecognizer not outputting certain tokens after endpoint detection" to "[BUG] OnlineRecognizer not outputting certain tokens after endpoint detection" on Dec 18, 2024