bestdori-voice-extractor

This tool generates voice datasets of labeled character lines from the game BanG Dream! Girls Band Party! It was originally designed for use with GPT-SoVITS.

The tool downloads voice and asset files from Bestdori, and then generates a zip file containing the voice .mp3 files along with a corresponding .list file.

Usage

1. Run `poetry install` to install dependencies.
2. Change the necessary settings in `bestdori_voice_extractor/config.py`.
3. Run `poetry run python __main__.py` to start the program.

`__main__.py`

The program runs in five steps:

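```python
# The classes below come from the bestdori_voice_extractor package;
# see __main__.py for the exact imports.

# Step 1: Download assets into ./assets
asset_downloader = AssetDownloader("assets")
asset_downloader.run()

# Step 2: Download voices into ./voices
voice_downloader = VoiceDownloader("voices")
voice_downloader.run()

# Step 3: Analyze assets and write the index to asset.json
asset_analyzer = AssetAnalyzer()
asset_analyzer.run("assets", "asset.json")

# Step 4: Analyze voices and write the index to voice.json
voice_analyzer = VoiceAnalyzer()
voice_analyzer.run("voices", "voice.json")

# Step 5: Merge assets and voices into one .zip per character
# Take MyGO!!!!! as an example: character IDs 36 to 40
# (range() excludes its upper bound, so 41 is not processed)
for i in range(36, 41):
    asset_merger = AssetMerger(str(i), "asset.json", "voice.json")
    asset_merger.merge()
```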

AssetDownloader downloads all .asset files from Bestdori. These files are essentially the game's scripts, stored in JSON format.
VoiceDownloader downloads all .mp3 voice files from Bestdori. This can take a long time, as the voice files total nearly 14 GB.

Once all assets and voices are downloaded, the analyzers generate .json files that index them.

AssetAnalyzer analyzes all local .asset files and generates a .json file like the following:

```json
{
    "2": [
        {
            "text": "こんなところに、喫茶店なんてあったんだね",
            "voice_file": "area0001-001"
        },
        ...
    ],
    ...
}
```

"2" is the character ID, and the list contains all of that character's lines. "voice_file" is the name of the voice file without its extension; it can be used to look up the file's path in the .json generated by VoiceAnalyzer.

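As a rough sketch of that lookup, assuming asset.json and the VoiceAnalyzer output have exactly the shapes described in this README, the hypothetical helper `pair_lines` below pairs each line of one character with the path of its voice file. It is not the tool's own code, and skipping lines whose voice file has no index entry is an assumption.

```python
import json
from pathlib import Path


def load_json(path: str) -> dict:
    """Read one of the analyzer outputs (asset.json or voice.json)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def pair_lines(assets: dict, voices: dict, chara_id: str) -> list[tuple[str, str]]:
    """Pair each line of one character with the path of its voice file.

    Hypothetical helper. Lines whose voice_file is missing from the voice
    index are skipped (an assumption; the tool may handle them differently).
    """
    pairs = []
    for line in assets.get(chara_id, []):
        path = voices.get(line["voice_file"])
        if path is not None:
            pairs.append((line["text"], path))
    return pairs


# Hypothetical usage with the files produced by steps 3 and 4:
# pairs = pair_lines(load_json("asset.json"), load_json("voice.json"), "37")
```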
VoiceAnalyzer analyzes all local .mp3 voice files and generates a .json file that maps each voice file name (without its extension) to the file's relative path:

```json
{
    "TalkSet1-1": "voices\\sound\\voice\\backstage\\talkset1-10\\TalkSet1-1.mp3",
    "TalkSet1-2": "voices\\sound\\voice\\backstage\\talkset1-10\\TalkSet1-2.mp3",
    ...
}
```
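A minimal sketch of how such an index could be built with `pathlib`, assuming the keys are file stems and the values are paths relative to the directory that contains `voices`. `build_voice_index` and `write_voice_index` are hypothetical names, not VoiceAnalyzer's actual implementation.

```python
import json
from pathlib import Path


def build_voice_index(root: str) -> dict[str, str]:
    """Map each .mp3 stem under root to its path relative to root's parent.

    Hypothetical helper. Files are visited in sorted order, so if two files
    share a stem, the later one silently replaces the earlier entry.
    """
    root_path = Path(root)
    index = {}
    for mp3 in sorted(root_path.rglob("*.mp3")):
        # str() uses the OS separator, giving backslash paths on Windows.
        index[mp3.stem] = str(mp3.relative_to(root_path.parent))
    return index


def write_voice_index(root: str, out_path: str) -> None:
    """Write the index as UTF-8 JSON, similar in shape to voice.json."""
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(build_voice_index(root), f, ensure_ascii=False, indent=4)
```

Keying by stem is what lets the "voice_file" values from asset.json be used directly as lookup keys.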

Finally, AssetMerger packs all voice files of a given character into a .zip file, along with a .list file in which each line gives the file name, character ID, locale, and text of one voice line:

```
<filename>|<chara-id>|<locale>|<text>
```

For example:

```
area5518-002.mp3|37|jp|どうしたの？　ポスターじっと見て
```
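The sketch below shows one way to produce that layout with the standard `zipfile` module. It assumes a flat archive containing the .mp3 files plus a `<chara-id>.list` file, and takes `(text, mp3 path)` pairs built from asset.json and voice.json; `format_list_line` and `pack_character` are hypothetical names, and AssetMerger's real archive layout may differ.

```python
import zipfile
from pathlib import Path


def format_list_line(filename: str, chara_id: str, locale: str, text: str) -> str:
    """Build one .list line: <filename>|<chara-id>|<locale>|<text>."""
    if "|" in text or "\n" in text:
        # The format has no escaping, so such text would break the line.
        raise ValueError(f"text cannot contain '|' or newlines: {text!r}")
    return f"{filename}|{chara_id}|{locale}|{text}"


def pack_character(pairs, chara_id: str, zip_path: str, locale: str = "jp") -> None:
    """Write the voice files and a <chara_id>.list file into one .zip.

    pairs: iterable of (text, mp3 path) tuples. Each .mp3 is stored at the
    archive root under its own file name (an assumed layout).
    """
    lines = []
    with zipfile.ZipFile(zip_path, "w") as zf:
        for text, mp3_path in pairs:
            name = Path(mp3_path).name
            zf.write(mp3_path, arcname=name)
            lines.append(format_list_line(name, chara_id, locale, text))
        # writestr encodes str data as UTF-8, which keeps Japanese text intact.
        zf.writestr(f"{chara_id}.list", "\n".join(lines) + "\n")
```

Rejecting `|` and newlines is a deliberate choice here: since the format defines no escaping, failing loudly is safer than silently producing a malformed .list file.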

Contributing

Pull requests are welcome.

Planned Features

Currently, you have to download all voice files from Bestdori before choosing which character ID to merge. A planned feature is to download only the voice files of a given character.
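One possible starting point for that feature, sketched under the assumption that asset.json is built first (steps 1 and 3 do not need the voice files): collect the voice file names referenced by one character's lines, so a downloader could fetch only those. The helper name is hypothetical, and mapping these names to Bestdori download URLs is left out because that mapping is not described here.

```python
def voice_files_for_character(assets: dict, chara_id: str) -> set[str]:
    """Return the voice file names (no extension) referenced by one character.

    Hypothetical helper operating on the asset.json structure shown above.
    Entries with an empty or missing voice_file are ignored.
    """
    return {
        line["voice_file"]
        for line in assets.get(chara_id, [])
        if line.get("voice_file")
    }
```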

License

MIT

Repository: zyf722/bestdori-voice-extractor
