[Bug] Hoarseness in Higher-Pitched Female Voices with xtts-v2 after finetune #3774

bensonbs · 2024-06-02T17:51:55Z

Describe the bug

When generating higher-pitched female voices after fine-tuning the xtts-v2 model, there is a noticeable hoarseness, resembling the strain one might experience when trying to reach high musical notes.

abnormal example:
https://mork.ro/NQjFi

normal example:
https://mork.ro/3iZ8Q#

Both samples were generated by the same fine-tuned model; only the audio prompt differs.

To Reproduce

infer
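The report does not include the inference script, so the following is only a minimal sketch of how the two samples might be produced for comparison. It assumes the XTTS inference calls documented for TTS 0.22.0 (`Xtts.init_from_config`, `load_checkpoint`, `get_conditioning_latents`, `inference`). All paths, the test sentence, the language code, and the helper names `plan_outputs` and `synthesize_all` are hypothetical and not taken from the report.

```python
from pathlib import Path

# Hypothetical locations; the report does not give any of these.
CHECKPOINT_DIR = Path("run/finetuned_xtts")  # assumed to hold config.json, model.pth, vocab.json
PROMPTS = {
    "abnormal": Path("prompts/high_pitch_female.wav"),
    "normal": Path("prompts/other_speaker.wav"),
}
TEXT = "This is a short test sentence."  # placeholder text
LANGUAGE = "en"  # assumption; the report does not state the language


def plan_outputs(prompts, out_dir):
    """Map each label to (reference prompt path, output wav path)."""
    return {label: (Path(p), Path(out_dir) / f"{label}.wav") for label, p in prompts.items()}


def synthesize_all(checkpoint_dir, prompts, text, language, out_dir="out"):
    """Generate one wav per audio prompt from the same checkpoint."""
    # Heavy imports are kept local so plan_outputs can be used without TTS installed.
    import torch
    import torchaudio
    from TTS.tts.configs.xtts_config import XttsConfig
    from TTS.tts.models.xtts import Xtts

    config = XttsConfig()
    config.load_json(str(Path(checkpoint_dir) / "config.json"))
    model = Xtts.init_from_config(config)
    model.load_checkpoint(config, checkpoint_dir=str(checkpoint_dir), use_deepspeed=False)
    model.cuda()

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for label, (prompt, out_path) in plan_outputs(prompts, out_dir).items():
        # Only the reference audio changes between runs; text and model stay fixed.
        gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
            audio_path=[str(prompt)]
        )
        out = model.inference(text, language, gpt_cond_latent, speaker_embedding)
        # XTTS v2 outputs 24 kHz audio.
        torchaudio.save(str(out_path), torch.tensor(out["wav"]).unsqueeze(0), 24000)


if __name__ == "__main__":
    synthesize_all(CHECKPOINT_DIR, PROMPTS, TEXT, LANGUAGE)
```

Keeping the text, language, and checkpoint fixed while varying only the prompt makes it easier to attribute any hoarseness to the speaker conditioning rather than to the input text.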

Expected behavior

No response

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 4090"
        ],
        "available": true,
        "version": "12.1"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.1.1+cu121",
        "TTS": "0.22.0",
        "numpy": "1.22.0"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.10.13",
        "version": "#202310061235~1697396945~22.04~9283e32 SMP PREEMPT_DYNAMIC Sun O"
    }
}

Additional context

No response

ScottishFold007 · 2024-06-12T15:32:54Z

I'm seeing the same thing after fine-tuning on 900 hours of Chinese data; it tends to show up at around 40,000 steps. What data did you use? Which languages? How many steps did you train for?

bensonbs added the bug label on Jun 2, 2024
