[Bug] Hoarseness in Higher-Pitched Female Voices with xtts-v2 after finetune #3774

Open
bensonbs opened this issue Jun 2, 2024 · 1 comment
Labels
bug Something isn't working


bensonbs commented Jun 2, 2024

Describe the bug

When generating higher-pitched female voices with a fine-tuned xtts-v2 model, the output has noticeable hoarseness, resembling the strain of a singer reaching for high musical notes.

Abnormal example:
https://mork.ro/NQjFi

Normal example:
https://mork.ro/3iZ8Q#

Both clips were generated by the same fine-tuned model; only the audio prompt differs.

To Reproduce

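Run inference with the fine-tuned xtts-v2 model, using a higher-pitched female voice as the audio prompt.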
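A minimal reproduction sketch, assuming the low-level `Xtts` inference API shown in the Coqui TTS XTTS docs and a fine-tuning run that produced a `config.json`, a model checkpoint and a `vocab.json`. The helper names `output_path` and `render_with_prompts`, and every path below, are hypothetical placeholders rather than the reporter's actual script. It renders the same text once per audio prompt, mirroring the "same model, different prompts" comparison above.

```python
from pathlib import Path


def output_path(out_dir, prompt_wav):
    # Hypothetical naming scheme: one output file per prompt clip.
    return Path(out_dir) / f"{Path(prompt_wav).stem}_xtts.wav"


def render_with_prompts(config_path, checkpoint_path, vocab_path,
                        prompt_wavs, text, language, out_dir):
    """Synthesize the same text with one fine-tuned model and several prompts.

    torch, torchaudio and TTS are imported here so this file can be
    loaded without them installed.
    """
    import torch
    import torchaudio
    from TTS.tts.configs.xtts_config import XttsConfig
    from TTS.tts.models.xtts import Xtts

    # Load the fine-tuned checkpoint (paths are placeholders).
    config = XttsConfig()
    config.load_json(config_path)
    model = Xtts.init_from_config(config)
    model.load_checkpoint(config, checkpoint_path=checkpoint_path,
                          vocab_path=vocab_path, use_deepspeed=False)
    if torch.cuda.is_available():
        model.cuda()

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for wav in prompt_wavs:
        # Speaker conditioning comes only from this prompt clip.
        gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
            audio_path=[wav])
        out = model.inference(text, language, gpt_cond_latent, speaker_embedding)
        path = output_path(out_dir, wav)
        # 24000 Hz matches the sample rate used in the XTTS docs example.
        torchaudio.save(str(path), torch.tensor(out["wav"]).unsqueeze(0), 24000)
        written.append(path)
    return written
```

For example, `render_with_prompts("run/config.json", "run/best_model.pth", "run/vocab.json", ["high_female.wav", "normal.wav"], "Hello there.", "en", "out")` would write `out/high_female_xtts.wav` and `out/normal_xtts.wav`; all of these names are illustrative.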

Expected behavior

No response

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 4090"
        ],
        "available": true,
        "version": "12.1"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.1.1+cu121",
        "TTS": "0.22.0",
        "numpy": "1.22.0"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.10.13",
        "version": "#202310061235~1697396945~22.04~9283e32 SMP PREEMPT_DYNAMIC Sun O"
    }
}

Additional context

No response

bensonbs added the bug (Something isn't working) label Jun 2, 2024
@ScottishFold007

I'm seeing the same thing after fine-tuning on 900 hours of Chinese data; at around 40,000 steps the model becomes prone to this. What data did you use? Which languages? How many steps did you train for?
