This repository has been archived by the owner on Oct 13, 2022. It is now read-only.

Bug in computing encoder padding mask #240

Open
csukuangfj opened this issue Aug 2, 2021 · 1 comment

Comments

@csukuangfj (Collaborator) commented Aug 2, 2021

The bug happens only when --concatenate-cuts=True.

See the problematic code below (line 692):

```python
for idx in range(supervision_segments.size(0)):
    # Note: TorchScript doesn't allow to unpack tensors as tuples
    sequence_idx = supervision_segments[idx, 0].item()
    start_frame = supervision_segments[idx, 1].item()
    num_frames = supervision_segments[idx, 2].item()
    lengths[sequence_idx] = start_frame + num_frames
```

When --concatenate-cuts=True, several utterances may be concatenated into one sequence, so lengths[sequence_idx] may correspond to multiple utterances. If the sequence with that sequence_idx contains at least two utterances, later utterances overwrite the value of lengths[sequence_idx] set by earlier ones, so the final value is wrong whenever the last-processed utterance of a sequence is not the one that ends latest.
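
To make the overwrite concrete, here is a minimal, self-contained sketch; the tensor values and the max-based fix are my own illustration under the assumption that a sequence's length should be the largest end frame (start_frame + num_frames) over its supervision segments, not code from this repository:

```python
import torch

# Columns: sequence_idx, start_frame, num_frames.
# One sequence (sequence_idx == 0) built from two concatenated utterances.
# The row for the later-ending utterance is processed first, so the buggy
# loop ends up keeping the shorter end frame.
supervision_segments = torch.tensor(
    [
        [0, 80, 100],  # second utterance: frames 80..180
        [0, 0, 80],    # first utterance: frames 0..80
    ],
    dtype=torch.int32,
)

# Buggy version: each row overwrites the previous value for its sequence.
lengths = torch.zeros(1, dtype=torch.int64)
for idx in range(supervision_segments.size(0)):
    sequence_idx = supervision_segments[idx, 0].item()
    start_frame = supervision_segments[idx, 1].item()
    num_frames = supervision_segments[idx, 2].item()
    lengths[sequence_idx] = start_frame + num_frames
print(lengths)  # tensor([80]) -- but the sequence actually has 180 frames

# One possible fix: keep the maximum end frame seen for each sequence.
lengths = torch.zeros(1, dtype=torch.int64)
for idx in range(supervision_segments.size(0)):
    sequence_idx = supervision_segments[idx, 0].item()
    start_frame = supervision_segments[idx, 1].item()
    num_frames = supervision_segments[idx, 2].item()
    lengths[sequence_idx] = max(lengths[sequence_idx].item(),
                                start_frame + num_frames)
print(lengths)  # tensor([180])
```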

@csukuangfj (Collaborator, Author) commented

I found this bug while writing tests for encoder_padding_mask. Liyong and I disabled --concatenate-cuts during training,
so it is not a problem for us.
