About MCRVQ #8

@jishengpeng

Description

We attempted to construct a discrete codec model that is more suitable for downstream speech language models.

Our objective is to encode less information in the first channel of the codebook while placing the remaining information in the limited later channels. Within downstream speech language models, we consider the first-layer quantizer of the codec model to act as an intermediary module bridging the textual input and the subsequent quantizers.

By judiciously reducing the information within the first-layer quantizer, it becomes easier to use text (which inherently carries less information than speech) to generate the first-layer codec tokens, since those tokens now carry less information themselves.

Therefore, we devised the Masked Channel Residual Vector Quantization (MCRVQ) mechanism, which uses a masking mechanism to restrict the quantizers of the first three channels so that they learn only the compressed audio-frame information within the specified subspace.
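The description above can be sketched in code. This is a minimal illustration under stated assumptions, not the released implementation: the function names, the choice of masking the second half of the feature dimensions as the "specified space", and exposing the number of masked layers as a parameter are all hypothetical details for illustration.

```python
import numpy as np

def nearest_code(codebook, x):
    # Index of the nearest codebook vector (L2 distance) for each row of x.
    d = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(-1)

def mcrvq_encode(x, codebooks, masked_layers=3, keep_dims=None):
    """Masked Channel RVQ sketch.

    Assumption: "masking" zeroes the feature dimensions outside a
    specified subspace for the first `masked_layers` quantizers, so
    those quantizers only model the compressed information there.

    x: (T, D) frame features; codebooks: list of (K, D) arrays.
    Returns per-layer code indices and the summed quantized output.
    """
    T, D = x.shape
    if keep_dims is None:
        keep_dims = D // 2  # hypothetical: keep the first half of the dims
    mask = np.zeros(D)
    mask[:keep_dims] = 1.0

    residual = x.copy()
    indices = []
    quantized = np.zeros_like(x)
    for i, cb in enumerate(codebooks):
        # First `masked_layers` quantizers see only the masked subspace.
        target = residual * mask if i < masked_layers else residual
        idx = nearest_code(cb, target)
        q = cb[idx]
        if i < masked_layers:
            q = q * mask  # quantized output stays inside the subspace
        indices.append(idx)
        quantized += q
        residual = residual - q  # standard residual-VQ update
    return indices, quantized
```

The key difference from plain RVQ is the `mask` applied to both the quantization target and the quantized output of the early layers, which is one way to realize "learn only the compressed audio frame information in the specified space".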
