Exploring and Verbalizing Academic Ideas by Concept Co-occurrence
This is the official repository for the paper "Exploring and Verbalizing Academic Ideas by Concept Co-occurrence" by Yi Xu, Shuqian Sheng, Bo Xue, Luoyi Fu, Xinbing Wang, and Chenghu Zhou. The paper proposes a framework based on concept co-occurrence for inspiring academic ideas, which has been integrated into DeepReport, a research assistant system that helps researchers expedite the discovery of new ideas.
DeepReport can be accessed at [idea.acemap.cn](https://idea.acemap.cn).
The evolving concept co-occurrence graph and co-occurrence citation quintuple datasets used in this study are available for download on Hugging Face.
To train our model for temporal link prediction, we first collect 240 essential and common queries from 19 disciplines and one special topic (COVID-19). We then issue these queries against our paper database, which is stored and searched with Elasticsearch, a modern text retrieval engine, to fetch the most relevant papers published between 2000 and 2021. Afterward, we identify concepts with information extraction tools, including AutoPhrase, and preserve only the high-quality concepts that appear in our database. Finally, we construct 240 evolving concept co-occurrence graphs, each containing 22 annual snapshots (2000-2021) built from the co-occurrence relationships; a minimal sketch of this construction is given below. The statistics of the concept co-occurrence graphs are provided in Appendix I.
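To make the construction concrete, here is a minimal sketch of how the per-year snapshots of one evolving co-occurrence graph can be built. It is an illustration only: the `papers` input and its `(year, concepts)` layout are assumptions made for the example, not the repository's actual data format.

```python
from itertools import combinations

import networkx as nx

# Illustrative input: one (year, extracted concepts) record per paper.
# In the real pipeline, concepts come from AutoPhrase runs over papers
# retrieved with Elasticsearch.
papers = [
    (2000, ["graph neural network", "link prediction"]),
    (2001, ["link prediction", "knowledge graph"]),
    (2001, ["graph neural network", "knowledge graph"]),
]

# One snapshot per year from 2000 to 2021 -> 22 snapshots per query.
snapshots = {year: nx.Graph() for year in range(2000, 2022)}

for year, concepts in papers:
    graph = snapshots[year]
    # Concepts co-occurring in the same paper are connected; edge
    # weights count how often each pair co-occurs in that year.
    for u, v in combinations(sorted(set(concepts)), 2):
        if graph.has_edge(u, v):
            graph[u][v]["weight"] += 1
        else:
            graph.add_edge(u, v, weight=1)

# The graph "evolves" by accumulation: the snapshot used for year t is
# the union of all edges observed in years <= t.
evolving = nx.Graph()
for year in sorted(snapshots):
    evolving.update(snapshots[year])
```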
Download with git (you should install git-lfs first):

```bash
sudo apt-get install git-lfs
# OR
brew install git-lfs

git lfs install
git clone https://huggingface.co/datasets/Reacubeth/ConceptGraph
```
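Alternatively, if you would rather not set up git-lfs, the same dataset files can be fetched programmatically with the official `huggingface_hub` client:

```python
from huggingface_hub import snapshot_download

# Download every file in the dataset repository to the local HF cache
# and return the local directory path.
local_dir = snapshot_download(repo_id="Reacubeth/ConceptGraph", repo_type="dataset")
print(local_dir)
```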
We also construct and release a dataset of co-occurrence citation quintuples, which is used to train the text generation model for idea verbalization. Concepts are identified and processed much as in the construction of the concept co-occurrence graphs. Heuristic rules are applied to filter out redundant and noisy sentences, further improving the quality of the quintuples used for idea generation. More details on co-occurrence citation quintuples can be found in Appendices B, C, and J.
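As a rough mental model of what one record looks like, the sketch below mirrors our understanding of a co-occurrence citation quintuple: two concepts, the papers they come from, and the citing text that verbalizes the idea. The field names and the filtering rule are illustrative assumptions; the released files and Appendices B, C, and J define the authoritative schema.

```python
from dataclasses import dataclass

@dataclass
class CitationQuintuple:
    """Illustrative schema only; see the released files for real field names."""
    concept_a: str   # first concept
    concept_b: str   # second concept
    paper_a: str     # text (e.g., abstract) of a paper containing concept_a
    paper_b: str     # text of a paper containing concept_b
    idea_text: str   # sentences from a paper citing both, i.e. the
                     # verbalized idea the generation model learns to produce

def keep(q: CitationQuintuple, min_words: int = 5) -> bool:
    """Toy stand-in for the heuristic filters mentioned above, e.g.
    dropping target sentences too short to express an idea."""
    return len(q.idea_text.split()) >= min_words
```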
In mid-2023, our DeepReport system underwent a major update covering both data and models. On the data side, we introduced a new version of the quintuple data (V202306), improving its quality and scale. Its statistical summary is as follows:
| Discipline | Quintuple | Concept | Concept (Total) | Concept Pair | Concept Pair (Total) |
|---|---|---|---|---|---|
| Art | 7,510 | 2,671 | 5,845 | 2,770 | 7,060 |
| History | 5,287 | 2,198 | 4,654 | 2,348 | 5,764 |
| Philosophy | 45,752 | 4,773 | 25,935 | 16,896 | 29,942 |
| Sociology | 16,017 | 4,054 | 12,796 | 7,066 | 16,416 |
| Political Science | 67,975 | 6,105 | 42,411 | 26,198 | 53,933 |
| Business | 205,297 | 9,608 | 99,329 | 62,332 | 112,736 |
| Geography | 191,958 | 12,029 | 118,563 | 42,317 | 112,909 |
| Engineering | 506,635 | 16,992 | 249,935 | 137,164 | 273,894 |
| Geology | 365,183 | 13,795 | 190,002 | 98,991 | 222,358 |
| Medicine | 168,697 | 13,014 | 114,104 | 42,535 | 138,973 |
| Economics | 227,530 | 9,461 | 113,527 | 68,607 | 131,387 |
| Physics | 267,532 | 10,831 | 133,079 | 84,824 | 176,741 |
| Biology | 224,722 | 15,119 | 145,088 | 59,210 | 189,281 |
| Mathematics | 312,670 | 17,751 | 190,734 | 95,951 | 218,697 |
| Psychology | 476,342 | 9,512 | 194,038 | 115,725 | 212,180 |
| Computer Science | 531,654 | 16,591 | 244,567 | 151,809 | 238,091 |
| Environmental Science | 583,466 | 11,002 | 226,671 | 94,474 | 201,330 |
| Materials Science | 573,032 | 17,098 | 249,251 | 145,068 | 313,657 |
| Chemistry | 565,307 | 13,858 | 231,062 | 108,637 | 286,593 |
| Total | 5,342,566 | 206,462 | 2,591,591 | 1,362,922 | 2,941,942 |
Download with git (again, install git-lfs first):

```bash
sudo apt-get install git-lfs
# OR
brew install git-lfs

git lfs install
git clone https://huggingface.co/datasets/Reacubeth/Co-occurrenceCitationQuintuple
```
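Once cloned, you can iterate over the quintuple files in whatever layout the repository uses. Purely for illustration, the sketch below assumes newline-delimited JSON files; check the actual files after cloning and adapt accordingly.

```python
import json
from pathlib import Path

# Hypothetical layout: JSON-lines files somewhere under the cloned repo.
root = Path("Co-occurrenceCitationQuintuple")

for path in sorted(root.glob("**/*.jsonl")):
    with path.open(encoding="utf-8") as f:
        first = json.loads(f.readline())
    # Print the keys of the first record to discover the real schema.
    print(path.name, sorted(first))
```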
The datasets used in our research are collected through open-source channels, and the whole process is conducted legally and in line with ethical requirements. As for the Turing test in our study, all participants were fully informed about the purpose of the experiments and how the test data would be used, and we do not disclose their information or otherwise invade their privacy.
We see opportunities for researchers to apply the system to idea discovery, especially in interdisciplinary work. We encourage users to explore different combinations of subjects with the help of our system, making the most of its stored knowledge and thus maximizing its exploratory power.
The main focus of the system is to suggest possible directions for future research; the role of human researchers should never be neglected.
The massive cross-disciplinary data behind the system lets it view the knowledge of an area from a multi-dimensional perspective and thus helps promote novel interdisciplinary research. However, considering the risk of misinformation generated by NLP tools, the verbalizations offer only possible insights into new ideas. Researchers must thoroughly consider whether an idea is feasible or could lead to adverse societal effects.
We would like to express our deepest gratitude to the scientists involved in the Deep-time Digital Earth program, whose contributions have been incredibly valuable. We also thank Zhongmou He, Jia Guo, Zijun Di, Pandong Ma, Shengling Zhu, Yanpeng Li, Qi Li, Jiaxin Ding, and Tao Shi from the IIOT Research Center at Shanghai Jiao Tong University for their unwavering support during the development of our system. This work was supported by NSF China (Nos. 42050105, 62020106005, 62061146002, 61960206002) and the Shanghai Pilot Program for Basic Research - Shanghai Jiao Tong University.
If you use our work in your research or publication, please cite us as follows:
```bibtex
@inproceedings{xu2023exploring,
    title={Exploring and Verbalizing Academic Ideas by Concept Co-occurrence},
    author={Xu, Yi and Sheng, Shuqian and Xue, Bo and Fu, Luoyi and Wang, Xinbing and Zhou, Chenghu},
    booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL)},
    year={2023}
}
```
Please let us know if you have any questions or feedback. Thank you for your interest in our work!