[WIP] RNN-T + MBR training. #593

Closed
wants to merge 7 commits into from

Conversation

@pkufool (Collaborator) commented Sep 29, 2022

This PR depends on k2-fsa/k2#1057 in k2.

@pkufool requested a review from yaozengwei on December 8, 2022
@pkufool (Collaborator, Author) commented Dec 8, 2022

The model structure is like the diagram below. It has two joiners: one is the regular RNN-T joiner; the other is a quasi-joiner that produces the expected WER. To make the quasi-joiner work well, we feed it an enhanced embedding instead of the raw encoder output. The embedding enhancer is a model with self-attention over masked_encoder_output and cross-attention over the text_embedding produced by a transformer LM; a sketch of how the pieces fit together is given after the diagram.

[diagram: model structure with the two joiners and the embedding enhancer]
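
A minimal sketch of how these pieces could be wired together. Everything below (class name, signatures, shapes, the mask proportion) is an illustrative assumption, not code from this PR:

import torch
import torch.nn as nn

# Illustrative only: module names, signatures and shapes are assumptions, not the PR's code.
class MbrTransducerSketch(nn.Module):
    def __init__(self, encoder, decoder, joiner, quasi_joiner, enhancer, text_lm,
                 mask_proportion: float = 0.25):
        super().__init__()
        self.encoder = encoder            # acoustic encoder
        self.decoder = decoder            # RNN-T prediction network
        self.joiner = joiner              # regular RNN-T joiner
        self.quasi_joiner = quasi_joiner  # predicts the expected WER
        self.enhancer = enhancer          # embedding enhancer (a TransformerDecoder)
        self.text_lm = text_lm            # transformer LM producing text_embedding
        self.mask_proportion = mask_proportion

    def forward(self, x, x_lens, y):
        encoder_out, encoder_out_lens = self.encoder(x, x_lens)
        decoder_out = self.decoder(y)

        # Regular RNN-T branch.
        logits = self.joiner(encoder_out, decoder_out)

        # Quasi-joiner branch: the enhancer combines a randomly masked encoder output
        # with the LM's text_embedding to form the enhanced embedding.
        noise = torch.randn_like(encoder_out)
        masked_encoder_out = encoder_out.masked_fill(noise <= self.mask_proportion, 0.0)
        text_embedding = self.text_lm(y)
        enhanced = self.enhancer(masked_encoder_out, text_embedding)
        expected_wer = self.quasi_joiner(enhanced, decoder_out)
        return logits, expected_wer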


self.encoder_output_layer = ScaledLinear(
d_model, num_classes, bias=True
)
@pkufool (Collaborator, Author):

The transformer LM is actually an embedding layer plus a TransformerEncoder that encodes the symbols into text_embedding; a minimal sketch follows.
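
A minimal sketch of such a text-embedding LM using standard torch.nn modules; the dimensions, head count and layer count are illustrative assumptions, not the PR's values:

import torch
import torch.nn as nn

# Sketch of the transformer LM described above: an embedding layer followed by a
# TransformerEncoder. All sizes here are illustrative assumptions.
class TextEmbeddingLM(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 512, num_layers: int = 6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (N, U) int64 -> text_embedding: (N, U, d_model)
        return self.encoder(self.embed(tokens))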

dropout=dropout,
layer_dropout=layer_dropout,
)
self.enhancer = TransformerDecoder(decoder_layer, num_layers)
@pkufool (Collaborator, Author):

The EmbeddingEnhancer is a TransformerDecoder whose self-attention runs over masked_encoder_output and whose cross-attention attends to text_embedding; see the call-pattern sketch below.
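
For intuition, here is the same call pattern with torch.nn.TransformerDecoder (the PR uses its own TransformerDecoder class; the sizes and shapes below are assumptions):

import torch
import torch.nn as nn

# Self-attention runs over masked_encoder_output (tgt); cross-attention attends to
# text_embedding (memory). Sizes are illustrative.
d_model, nhead, num_layers = 512, 8, 6
decoder_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
enhancer = nn.TransformerDecoder(decoder_layer, num_layers)

masked_encoder_output = torch.randn(2, 100, d_model)  # (N, T, C) masked acoustic frames
text_embedding = torch.randn(2, 20, d_model)          # (N, U, C) from the transformer LM

enhanced_embedding = enhancer(tgt=masked_encoder_output, memory=text_embedding)
print(enhanced_embedding.shape)  # torch.Size([2, 100, 512])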

N, T, C = embedding.shape
mask = torch.randn((N, T, C), device=embedding.device)
mask = mask > mask_proportion
masked_embedding = torch.masked_fill(embedding, ~mask, 0.0)
@pkufool (Collaborator, Author):

I randomly mask the encoder output here.
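
For reference, a self-contained version of this masking step with illustrative shapes. Note that because the noise is standard-normal, the fraction of zeroed elements is the normal CDF at mask_proportion rather than mask_proportion itself:

import torch

mask_proportion = 0.25
embedding = torch.randn(2, 100, 512)        # (N, T, C) encoder output, illustrative

noise = torch.randn_like(embedding)
mask = noise > mask_proportion              # True = keep this element
masked_embedding = embedding.masked_fill(~mask, 0.0)

zeroed_fraction = (~mask).float().mean()    # roughly Phi(0.25) ≈ 0.60 here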

)
return init_context

def delta_wer(
@pkufool (Collaborator, Author):

This function implements the sampling process.
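
The sampling itself relies on the k2 dependency mentioned above. Purely for intuition, here is a toy sketch of a delta-WER quantity: how much worse a sampled hypothesis is than a baseline hypothesis, measured by edit distance to the reference. None of this is the PR's delta_wer code:

def edit_distance(a, b):
    # Standard Levenshtein distance (dynamic programming over a single row).
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def toy_delta_wer(sampled_hyp, baseline_hyp, reference):
    # Positive value: the sampled hypothesis is worse than the baseline one.
    ref_len = max(len(reference), 1)
    return (edit_distance(sampled_hyp, reference)
            - edit_distance(baseline_hyp, reference)) / ref_len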

+ l2_loss_scale * l2_loss
+ delta_wer_scale * delta_wer_loss
+ predictor_loss_scale * predictor_loss
)
@pkufool (Collaborator, Author):

The losses are combined here.
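
A sketch of how the visible scale * loss terms add up into a single training loss; the regular RNN-T terms and the default scale values below are assumptions based on common icefall conventions, not values from this PR:

def combine_losses(simple_loss, pruned_loss, l2_loss, delta_wer_loss, predictor_loss,
                   simple_loss_scale=0.5, l2_loss_scale=1.0,
                   delta_wer_scale=1.0, predictor_loss_scale=1.0):
    # The last three terms correspond to the lines shown in the hunk above.
    return (
        simple_loss_scale * simple_loss
        + pruned_loss
        + l2_loss_scale * l2_loss
        + delta_wer_scale * delta_wer_loss
        + predictor_loss_scale * predictor_loss
    )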

@pkufool (Collaborator, Author) commented Dec 8, 2022

@danpovey @yaozengwei @glynpu Would you please have a look at this? If anything is unclear, please let me know. Thanks!

@yaozengwei (Collaborator)

Sure. I will have a look.

@pkufool closed this Nov 21, 2024