About MCRVQ #8

@jishengpeng

Description

We attempted to construct a discrete codec model that is more suitable for downstream speech language models.

Our objective is to encode less information in the first channel of the codebook while placing the remaining information in the limited later channels. Within downstream speech language models, we consider the first-layer quantizer of the codec model to act as an intermediary module bridging the textual input and the subsequent quantizers.

By judiciously reducing the information within the first-layer quantizer, it becomes easier to use text (which inherently carries less information than speech) to generate the first-layer codec tokens, since those tokens now carry less information themselves.

Therefore, we devised the Masked Channel Residual Vector Quantization (MCRVQ) mechanism, which uses a masking mechanism to restrict the quantizers of the first three channels so that they learn only the compressed audio-frame information within the specified subspace.
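The description above can be sketched in code. This is a minimal illustration under stated assumptions, not the released implementation: the function names, the choice of masking the second half of the feature dimensions as the "specified space", and exposing the number of masked layers as a parameter are all hypothetical details for illustration.

```python
import numpy as np

def nearest_code(codebook, x):
    # Index of the nearest codebook vector (L2 distance) for each row of x.
    d = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(-1)

def mcrvq_encode(x, codebooks, masked_layers=3, keep_dims=None):
    """Masked Channel RVQ sketch.

    Assumption: "masking" zeroes the feature dimensions outside a
    specified subspace for the first `masked_layers` quantizers, so
    those quantizers only model the compressed information there.

    x: (T, D) frame features; codebooks: list of (K, D) arrays.
    Returns per-layer code indices and the summed quantized output.
    """
    T, D = x.shape
    if keep_dims is None:
        keep_dims = D // 2  # hypothetical: keep the first half of the dims
    mask = np.zeros(D)
    mask[:keep_dims] = 1.0

    residual = x.copy()
    indices = []
    quantized = np.zeros_like(x)
    for i, cb in enumerate(codebooks):
        # First `masked_layers` quantizers see only the masked subspace.
        target = residual * mask if i < masked_layers else residual
        idx = nearest_code(cb, target)
        q = cb[idx]
        if i < masked_layers:
            q = q * mask  # quantized output stays inside the subspace
        indices.append(idx)
        quantized += q
        residual = residual - q  # standard residual-VQ update
    return indices, quantized
```

The key difference from plain RVQ is the `mask` applied to both the quantization target and the quantized output of the early layers, which is one way to realize "learn only the compressed audio frame information in the specified space".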
