This repository enables the generation of Traditional Chinese lyrics. It is powered by a GPT-2 model: given a user-provided pre-text, the model generates a sheet of lyrics.
Input 1:
在這個美麗的時刻
Output 1:
在這個美麗的時刻
是否還留在我心田
每當你齣現
有些事你還不明瞭
我真的愛你你不明瞭
是誰說你不願意
每當我齣現
有些事你還不明瞭
我真的愛你你不明瞭
Input 2:
親愛的
Output 2:
親愛的方嚮
如果再不擁有你
無意義地一個人去漂流
就一路走下去看過去
就這樣我們曾經擁有
從此你就讓我看見一個迴憶
曾經你給我的愛留下的痕跡
讓我愛你
就從今天起 我就是今天
..
我們曾經擁有 你曾經愛過我
也曾有一段沒有離開的日子
但如今你卻不在我身邊
我一定會找迴今天的每一份心情
曾經你給我的愛留下的痕跡
給我的愛留下的痕跡
讓我愛你
曾經你給我的愛留下的痕跡
給我的愛留下的痕跡
隻是我承受瞭傷害
讓我愛你
讓我愛你
把我們的心念 我一輩子保存
就在這裏 從今以後
..
讓我愛你
曾經你給我的愛留下的痕跡
給我的愛讓我 一輩子保存
就在這裏 從今以後
讓我愛你
Please install the required Python packages via the following command:
pip3 install -r requirements.txt
- Download our pre-trained model from here and extract it, and then you will see a model directory gpt2_zh_lyrics.
- Run the following command to invoke the inference script. The script runs an infinite loop that asks you for a pre-text, and the model prints its generated lyrics.
python3 inference.py gpt2_zh_lyrics bert-base-chinese
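Under the hood, the flow is roughly the following. This is a minimal sketch using the Transformers library, not the actual inference.py; the sampling parameters are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForCausalLM.from_pretrained("gpt2_zh_lyrics")

while True:  # keep prompting for pre-texts, as the script does
    pretext = input("pre-text> ")
    inputs = tokenizer(pretext, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_length=200,  # illustrative sampling settings, not the script's defaults
        do_sample=True,
        top_k=50,
        top_p=0.95,
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))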
- The inference script provides options that control generation, most of which are described in the article. Run the command below for details.
python3 inference.py -h
- Create a folder for the lyrics dataset.
mkdir -p lyrcis_dataset
- Download the zip file of raw lyrics data from Chinese-Lyric-Corpus.
wget https://github.com/gaussic/Chinese-Lyric-Corpus/raw/master/Chinese_Lyrics.zip -O lyrcis_dataset/Chinese_Lyrics.zip
- Unzip the zip file.
unzip lyrcis_dataset/Chinese_Lyrics.zip -d lyrcis_dataset
- Run the dataset preparation script.
python3 prepare_dataset.py lyrcis_dataset/Chinese_Lyrics
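For orientation, the preparation step could look roughly like the sketch below. This is an assumption rather than the actual prepare_dataset.py: it collects song texts (assumed to be plain-text files under per-artist folders) and writes the train.txt and val.txt files that the training step below expects.
import pathlib
import random

corpus_dir = pathlib.Path("lyrcis_dataset/Chinese_Lyrics")
songs = []
for path in sorted(corpus_dir.rglob("*.txt")):  # adjust the glob if the corpus layout differs
    lines = [line.strip() for line in path.read_text(encoding="utf-8").splitlines()]
    text = "\n".join(line for line in lines if line)  # drop blank lines
    if text:
        songs.append(text)

random.seed(42)
random.shuffle(songs)
split = int(len(songs) * 0.9)  # assumed 90/10 train/validation split

out = pathlib.Path("lyrcis_dataset")
(out / "train.txt").write_text("\n\n".join(songs[:split]), encoding="utf-8")
(out / "val.txt").write_text("\n\n".join(songs[split:]), encoding="utf-8")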
- Run the training script. The script may take several hours. (If an out-of-memory error occurs, lower per_device_train_batch_size and per_device_eval_batch_size below 4; for example, pass --per_device_train_batch_size=2 --per_device_eval_batch_size=2.)
python3 train.py \
--model_name_or_path ckiplab/gpt2-base-chinese \
--tokenizer_name bert-base-chinese \
--train_file ./lyrcis_dataset/train.txt \
--validation_file ./lyrcis_dataset/val.txt \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--do_train \
--do_eval \
--output_dir test-clm
- After the training is finished, the trained model will be saved in test-clm. You can use the model as described in Use the Pre-Trained Models.
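For example, you can point the inference script at the newly trained model directory:
python3 inference.py test-clm bert-base-chinese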