This repository enables the generation of Traditional Chinese lyrics. It is powered by a GPT-2 model: given a user-provided pre-text, the model generates a sheet of lyrics.
Input 1:
在這個美麗的時刻
Output 1:
在這個美麗的時刻
是否還留在我心田
每當你齣現
有些事你還不明瞭
我真的愛你你不明瞭
是誰說你不願意
每當我齣現
有些事你還不明瞭
我真的愛你你不明瞭
Input 2:
親愛的
Output 2:
親愛的方嚮
如果再不擁有你
無意義地一個人去漂流
就一路走下去看過去
就這樣我們曾經擁有
從此你就讓我看見一個迴憶
曾經你給我的愛留下的痕跡
讓我愛你
就從今天起 我就是今天
..
我們曾經擁有 你曾經愛過我
也曾有一段沒有離開的日子
但如今你卻不在我身邊
我一定會找迴今天的每一份心情
曾經你給我的愛留下的痕跡
給我的愛留下的痕跡
讓我愛你
曾經你給我的愛留下的痕跡
給我的愛留下的痕跡
隻是我承受瞭傷害
讓我愛你
讓我愛你
把我們的心念 我一輩子保存
就在這裏 從今以後
..
讓我愛你
曾經你給我的愛留下的痕跡
給我的愛讓我 一輩子保存
就在這裏 從今以後
讓我愛你
Please install the required Python packages via the following command:
pip3 install -r requirements.txt
- Download our pre-trained model from here and extract it, and then you will see a model directory gpt2_zh_lyrics.
- Run the following command to invoke the inference script. The script runs an infinite loop that asks you for a pre-text, and the model prints its generated lyrics.
python3 inference.py gpt2_zh_lyrics bert-base-chinese
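Under the hood, the flow is roughly the following. This is a minimal sketch using the Transformers library, not the actual inference.py; the sampling parameters are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForCausalLM.from_pretrained("gpt2_zh_lyrics")

while True:  # keep prompting for pre-texts, as the script does
    pretext = input("pre-text> ")
    inputs = tokenizer(pretext, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_length=200,  # illustrative sampling settings, not the script's defaults
        do_sample=True,
        top_k=50,
        top_p=0.95,
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))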
- The inference script provides options that control generation, most of which are described in the article. Run the command below for details.
python3 inference.py -h
- Create a folder for the lyrics dataset.
mkdir -p lyrcis_dataset
- Download the zip file of raw lyrics data from Chinese-Lyric-Corpus.
wget https://github.com/gaussic/Chinese-Lyric-Corpus/raw/master/Chinese_Lyrics.zip -O lyrcis_dataset/Chinese_Lyrics.zip
- Unzip the zip file.
unzip lyrcis_dataset/Chinese_Lyrics.zip -d lyrcis_dataset
- Run the dataset preparation script.
python3 prepare_dataset.py lyrcis_dataset/Chinese_Lyrics
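For orientation, the preparation step could look roughly like the sketch below. This is an assumption rather than the actual prepare_dataset.py: it collects song texts (assumed to be plain-text files under per-artist folders) and writes the train.txt and val.txt files that the training step below expects.
import pathlib
import random

corpus_dir = pathlib.Path("lyrcis_dataset/Chinese_Lyrics")
songs = []
for path in sorted(corpus_dir.rglob("*.txt")):  # adjust the glob if the corpus layout differs
    lines = [line.strip() for line in path.read_text(encoding="utf-8").splitlines()]
    text = "\n".join(line for line in lines if line)  # drop blank lines
    if text:
        songs.append(text)

random.seed(42)
random.shuffle(songs)
split = int(len(songs) * 0.9)  # assumed 90/10 train/validation split

out = pathlib.Path("lyrcis_dataset")
(out / "train.txt").write_text("\n\n".join(songs[:split]), encoding="utf-8")
(out / "val.txt").write_text("\n\n".join(songs[split:]), encoding="utf-8")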
- Run the training script. The script may take several hours. (If an out-of-memory error occurs, lower per_device_train_batch_size and per_device_eval_batch_size below 4; for example, pass --per_device_train_batch_size=2 --per_device_eval_batch_size=2.)
python3 train.py \
--model_name_or_path ckiplab/gpt2-base-chinese \
--tokenizer_name bert-base-chinese \
--train_file ./lyrcis_dataset/train.txt \
--validation_file ./lyrcis_dataset/val.txt \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--do_train \
--do_eval \
--output_dir test-clm
- After the training is finished, the trained model will be saved in test-clm. You can use the model as described in Use the Pre-Trained Models.
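For example, you can point the inference script at the newly trained model directory:
python3 inference.py test-clm bert-base-chinese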