SciPrompt

The official repository of our EMNLP 2024 Main Conference paper: SciPrompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics

This project is developed based on the OpenPrompt framework.

Fine-tuned filtering models: Bi-Encoder Model, Cross-Encoder Model

Overall Framework

Installation

To create the conda environment and install the necessary Python packages, run the following commands:

conda create -n SciPrompt python=3.8.12
conda activate SciPrompt
pip install -r requirements.txt

Prepare the Required Files and Directories

Replace the placeholder paths in the script with actual paths to your data and configuration files:
- --data_dir should point to your data directory
- --verbalizer_path should point to your arXiv_knowledgable_verbalizer.txt
- --semantic_score_path should point to your arXiv_knowledgable_verbalizer_semantic_search_scores.txt
- --doc_id_path should point to your doc_id.txt
- --config_path should point to config/arxiv_label_mappings.json
Prepare your class label dictionary in the same format as the .json files in the label_mappings folder.
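
Before launching a script, it can help to confirm that every path resolves. The following is a minimal sketch, assuming the five arguments listed above. The helper name `missing_paths` and the example values are hypothetical and not part of this repository:

```python
from pathlib import Path


def missing_paths(paths):
    """Return the argument names whose path does not exist on disk."""
    return [name for name, value in paths.items() if not Path(value).exists()]


if __name__ == "__main__":
    # Placeholder values (assumptions): replace them with your own locations.
    paths = {
        "--data_dir": "data",
        "--verbalizer_path": "arXiv_knowledgable_verbalizer.txt",
        "--semantic_score_path": "arXiv_knowledgable_verbalizer_semantic_search_scores.txt",
        "--doc_id_path": "doc_id.txt",
        "--config_path": "config/arxiv_label_mappings.json",
    }
    for name in missing_paths(paths):
        print(f"{name} does not exist: {paths[name]}")
```

An empty result means every listed path was found. The check says nothing about whether the file contents are in the expected format.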

Knowledge Retrieval and Filtering

Run our datasets:
- Step 1: Change paths in run_retrieval.sh and run bash run_retrieval.sh
- Step 2: Change paths of the filtering model, retrieved data (from Step 1), and output files in the run_knowledge_filtering.sh script
- Step 3: Run the filtering script:
```
bash run_knowledge_filtering.sh
```
Run using your own dataset:
- Steps 1 and 2 are the same as above
- Step 3: Set your dataset name to custom and add the corresponding configs to the dataset_configs dictionary in knowledge_filtering.py (Line 189)
- Step 4: Run bash run_knowledge_filtering.sh in your terminal
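
Once the paths inside both scripts are set, retrieval and filtering can be chained so that filtering only starts if retrieval succeeded. The following is a minimal sketch using the two script names from the steps above. The `run_pipeline` helper is hypothetical, not part of this repository, and assumes it is called from the repository root:

```python
import subprocess

# Script names are taken from the steps above.
PIPELINE = ["run_retrieval.sh", "run_knowledge_filtering.sh"]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script with bash, in order, stopping at the first failure."""
    for script in scripts:
        # check=True makes subprocess.run raise CalledProcessError
        # when the script exits with a non-zero status.
        runner(["bash", script], check=True)
```

Calling `run_pipeline()` runs both steps in sequence. The `runner` parameter exists only so the call order can be checked without executing the scripts.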

Run the main script:

Execute scripts for each dataset:

bash run_arxiv.sh
bash run_s2orc.sh
bash run_sdpra.sh

Run on your own data (requires two input files, one containing only the data and one containing only the labels, as in the arXiv dataset):

bash run_custom_script.sh

Note: Please modify the required data file paths inside each script before running.
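
The two-file layout for custom data can be sanity-checked before running. The following is a minimal sketch, assuming one example per line in the data file and the matching label on the same line of the label file. This alignment is an assumption, so verify it against the arXiv files in the data folder. The function name `load_pairs` is hypothetical and not part of this repository:

```python
from pathlib import Path


def load_pairs(data_file, label_file):
    """Pair each text line with the label on the same line number."""
    texts = Path(data_file).read_text(encoding="utf-8").splitlines()
    labels = Path(label_file).read_text(encoding="utf-8").splitlines()
    if len(texts) != len(labels):
        # A length mismatch means the two files are out of alignment.
        raise ValueError(
            f"{len(texts)} data lines but {len(labels)} label lines"
        )
    return list(zip(texts, labels))
```

A `ValueError` here points to a missing or extra line in one of the two files, which is easier to diagnose at this stage than inside a training run.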

Citation Information

@inproceedings{you-etal-2024-sciprompt,
    title = "{S}ci{P}rompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics",
    author = "You, Zhiwen  and
      Han, Kanyao  and
      Zhu, Haotian  and
      Ludaescher, Bertram  and
      Diesner, Jana",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.350",
    pages = "6087--6104",
}
