PDF here.
JJ Lin
JJQA is now available in 🤗Huggingface: [hobeter/JJQA](https://huggingface.co/datasets/hobeter/JJQA).
NOTE: This is a sub-project of Open-Source-AI-Research, an experimental non-profit project focusing on AI research in an open source mode.
JJQA is under construction. Contributions are welcome.
Large Language Models (LLMs) have shown powerful capabilities in text understanding, analysis, and generation. They seem well suited to question answering (QA) over textual knowledge, which requires semantically retrieving related texts, understanding them, and generating correct answers.
However, many available QA datasets are not challenging enough. First, the given textual knowledge may be easy to perceive and analyse. Second, the questions and answers follow common sense. Thus, LLMs may benefit from their language-modeling training and even take a shortcut. We therefore want to build a new logical QA dataset over textual knowledge where the knowledge is tricky, so that LLMs are unlikely to give correct answers without successfully retrieving and reasoning over the related texts.
Chinese is a language in which a single character can carry abundant meanings, while just a few words, especially lines of lyrics, can express complex conceptions, feelings, and impressions. Besides, Junjie Lin, known as JJ Lin, is a famous Singaporean Mandarin singer, and the lyrics of his songs are always imaginative, poetic, and romantic.
Hence, we propose JJQA, a Chinese question answering dataset built on the lyrics of JJ Lin's songs: the related lyrics are provided as textual knowledge for retrieval, while the questions and answers are based on those lyrics. The Q&As are deliberately abstract and counter-intuitive. For example, based on the related lyrics of a song called "爱情Yogurt", the question is "热量有什么作用?" ("What is the impact of heat?") and the answer is "降低爱情的过敏反应。" ("Ease the anaphylaxis of love"). It is indeed ridiculous and funny (you can find more in the dataset)🤪. Even humans could not give the right answer without knowing the related lyrics, and LLMs are unlikely to generate it naturally from their training alone. Therefore, only if the related lyrics are retrieved and understood can the right answers possibly be generated by LLMs.
- How to Start
- Repository Structure
- Dataset Details
- Baselines
- Discussion
- How to Contribute
- Update Logs
- Cite
- Inspired work
## How to Start

As shown in start.ipynb, you can load JJQA either from the local repository files or from Huggingface online.
```python
from datasets import load_dataset
import json

# # load from local repository files
# qas = load_dataset("../JJQA", "qa")["train"]
# songs = load_dataset("../JJQA", "song")["train"]
# song_index = json.loads(load_dataset("../JJQA", "song_index")["train"]["dic"][0])[0]

# load from huggingface online
qas = load_dataset("hobeter/JJQA", "qa")["train"]
songs = load_dataset("hobeter/JJQA", "song")["train"]
song_index = json.loads(load_dataset("hobeter/JJQA", "song_index")["train"]["dic"][0])[0]
```
## Repository Structure

- README.md: the readme file of this repository
- ./dataset: scripts and results for building JJQA
- 1_get_data.py: to crawl music data => (song_info.json)
- 2_clean_data.py: to automatically clean data => (cleaned_song_info.json)
- 3_label.py: to add/delete/edit Q&As with an annotation GUI tool => (q_a_dic.json, q_a_song_dic.json)
- 4_2HF.py: to construct dataset files => (hf_q_a.json, hf_song.json, hf_song_indx.json)
- Ui_label.ui: the GUI file for QtDesigner
- Ui_label.py: the python script compiled from Ui_label.ui
- geckodriver.exe: the firefox driver for selenium
- ./JJQA: local JJQA dataset in the huggingface datasets format
- hf_q_a.json, hf_song.json, hf_song_indx.json: copies from ../dataset
- JJQA.py: the dataset loading script
- README.md: the readme file of JJQA in huggingface
- ./baseline: scripts and results for baselines
- 1_baseline.ipynb: to run a baseline => ({model}_{mode}_dic.json)
- 2_get_bertscore.ipynb: to get metrics (Precision, Recall, F1) from baseline results => ({model}_{mode}_bertscore.npz)
- JJL.jpg: a picture of JJ Lin
- start.ipynb: an example to easily load JJQA
- LICENSE: the license file of this repository
## Dataset Details

Using QQMusicSpider, we crawled the lyrics of all of JJ Lin's songs from QQMusic. After data cleaning and annotation, the dataset includes 648 Q&As covering 181 song lyrics.
Three fields ("qa", "song", "song_index") are included in JJQA.
"qa" contains Q&As with 6 features: "q" and "a" are a question and its corresponding answer; "song_title" and "song_id" are the title and id of the related song; "id" is the id of the Q&A; "rf" lists the line numbers of the referenced lyrics, separated by a space " ".
"song" contains song information with 4 features: "title" and "name" are the title and name of the song; "id" is the id of the song; "lyric" holds the lyrics, with lines separated by "\n".
"song_index" contains a single dictionary mapping song ids to the indexes of the corresponding songs in the "song" field, which aligns Q&As with their songs.
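To make the schema concrete, the following sketch uses toy records (the field names come from the dataset; the values are made up for illustration) to align a Q&A with its song and pull out the referenced lyric lines:

```python
# Toy records mirroring the JJQA schema; values are illustrative only.
qa = {"id": 0, "q": "热量有什么作用?", "a": "降低爱情的过敏反应。",
      "song_title": "爱情Yogurt", "song_id": "s01", "rf": "3 4"}
songs = [{"id": "s01", "title": "爱情Yogurt", "name": "爱情Yogurt",
          "lyric": "line0\nline1\nline2\nline3\nline4"}]
song_index = {"s01": 0}

song = songs[song_index[qa["song_id"]]]       # align the Q&A with its song
lines = song["lyric"].split("\n")             # one lyric line per entry
rf_lines = [lines[int(i)] for i in qa["rf"].split(" ")]  # referenced lines
print(rf_lines)  # ['line3', 'line4']
```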
## Baselines

We evaluate three baseline methods on JJQA. The first (wo_info) "asks" the question directly without any lyrics, showing the performance of uninformed LLMs; the second (w_song) includes the whole lyrics of the related song as in-context information; the third (w_rf) includes only the referenced lyric lines. w_song and w_rf serve as reference points for retrieval-based methods.
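The three modes can be sketched as a small prompt builder. Note that the actual template lives in 1_baseline.ipynb, so the helper name `build_prompt` and the instruction wording below are illustrative assumptions, not the exact prompt used:

```python
def build_prompt(qa: dict, song: dict, mode: str) -> str:
    """Assemble the query for one baseline mode.

    'wo_info': the bare question; 'w_song': whole lyrics as context;
    'w_rf': only the lyric lines referenced by qa['rf'].
    The Chinese instruction text is an illustrative guess, not the
    exact template used in 1_baseline.ipynb.
    """
    if mode == "wo_info":
        return qa["q"]
    if mode == "w_song":
        context = song["lyric"]
    elif mode == "w_rf":
        lines = song["lyric"].split("\n")
        context = "\n".join(lines[int(i)] for i in qa["rf"].split(" "))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return "根据以下歌词回答问题:\n" + context + "\n问题:" + qa["q"]

# toy usage mirroring the dataset schema
qa = {"q": "热量有什么作用?", "rf": "1 2"}
song = {"lyric": "line0\nline1\nline2"}
print(build_prompt(qa, song, "w_rf"))
```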
Six accessible LLMs (ernie-turbo, chatglm2_6b_32k, qwen-turbo, baichuan2-7b-chat-v1, gpt-4, gpt-3.5-turbo) are included: ernie-turbo and chatglm2_6b_32k via the Qianfan platform, qwen-turbo and baichuan2-7b-chat-v1 via the DashScope platform, and gpt-4 and gpt-3.5-turbo via the OpenAI platform.
We use BERTScore with rescale_with_baseline=True as the metric. The results are as follows.
LLM | Method | Precision | Recall | F1 | Date |
---|---|---|---|---|---|
ernie-turbo | wo_info | -0.0350 | 0.1568 | 0.0511 | 2023/11/06 |
ernie-turbo | w_song | 0.2472 | 0.5765 | 0.3895 | 2023/11/06 |
ernie-turbo | w_rf | 0.3600 | 0.6528 | 0.4864 | 2023/11/06 |
chatglm2_6b_32k | wo_info | 0.0466 | 0.1787 | 0.1066 | 2023/11/05 |
chatglm2_6b_32k | w_song | 0.2361 | 0.4606 | 0.3335 | 2023/11/05 |
chatglm2_6b_32k | w_rf | 0.4650 | 0.6477 | 0.5436 | 2023/11/05 |
qwen-turbo | wo_info | 0.2331 | 0.2150 | 0.2208 | 2023/11/05 |
qwen-turbo | w_song | 0.7673 | 0.8041 | 0.7804 | 2023/11/05 |
qwen-turbo | w_rf | 0.8600 | 0.8251 | 0.8386 | 2023/11/05 |
baichuan2-7b-chat-v1 | wo_info | 0.1755 | 0.2012 | 0.1857 | 2023/11/05 |
baichuan2-7b-chat-v1 | w_song | 0.4635 | 0.6324 | 0.5371 | 2023/11/05 |
baichuan2-7b-chat-v1 | w_rf | 0.6567 | 0.7272 | 0.6851 | 2023/11/05 |
gpt-3.5-turbo | wo_info | 0.2201 | 0.1983 | 0.2061 | 2023/11/06 |
gpt-3.5-turbo | w_song | 0.8031 | 0.7812 | 0.7884 | 2023/11/06 |
gpt-3.5-turbo | w_rf | 0.8110 | 0.7484 | 0.7758 | 2023/11/06 |
gpt-4 | wo_info | 0.2426 | 0.2377 | 0.2376 | 2023/11/06 |
gpt-4 | w_song | 0.8405 | 0.8587 | 0.8464 | 2023/11/06 |
gpt-4 | w_rf | 0.8865 | 0.8643 | 0.8732 | 2023/11/06 |
gpt-4-1106-preview | wo_info | 0.2345 | 0.2061 | 0.2179 | 2023/11/09 |
gpt-4-1106-preview | w_song | 0.8411 | 0.8117 | 0.8231 | 2023/11/09 |
gpt-4-1106-preview | w_rf | 0.8230 | 0.7678 | 0.7921 | 2023/11/09 |
Note that Date is the evaluation date (UTC+8). In addition, a small number of samples were rejected by the DashScope platform's safety system; we simply skip these Q&As (1 sample for qwen-turbo wo_info; 3 samples for qwen-turbo w_song; 3 samples for baichuan2-7b-chat-v1 w_song).
## Discussion

We propose JJQA to evaluate an LLM's capability for semantic retrieval and context-grounded answer generation in Chinese. Further work could focus on effective semantic song- or line-wise retrieval, or on enhanced answer generation. In addition, JJQA could be updated with higher-quality Q&As or expanded into a larger dataset covering the lyrics of other singers, and even other languages.
## How to Contribute

First of all, please read the contribution terms in Open-Source-AI-Research carefully.
Then fork this repository, make your improvements, add a record in Update Logs, and open a pull request; submitting a pull request means you have accepted the terms.
Thanks for your contribution!
## Update Logs

2023_11_13 - bebetterest - [email protected]
DONE: add an evaluation result for gpt-4-turbo (gpt-4-1106-preview); implement the OpenAI Assistants API as a baseline (the full result is not yet available because it is expensive); open-source the scripts & results for building JJQA and evaluating baselines; add introductions to the repository structure and how to start.
TODO: update the Cite section; update the results for the Assistants API baseline; polish the readme; re-review and update the dataset; explore some retrieval methods or generation enhancement methods...
2023_11_06 - bebetterest - [email protected]
DONE: initialized the first version of JJQA and baseline performance.
TODO: polish the readme; add details on the annotation tool; open-source related code; re-review and update the dataset; explore some retrieval methods or generation enhancement methods...
## Cite

Please cite this work if your work is motivated by it.
```bibtex
@misc{JJQA,
  title = {JJQA: a Chinese QA dataset on the lyrics of JJ Lin's songs},
  author = {O.S.R.},
  howpublished = {\url{https://www.liyujian.cn/upload/JJQA.pdf}},
}
```
## Inspired work

To be added...