# Zero-Shot Citation Generation for Large Language Models in Academic Papers

In small-context settings (currently 5 papers of ~4 pages each): can we make LLMs provide answers with citations at inference time?

## Motivation

- To our knowledge, nothing like this has been done before. [Please do send material if there is.]

## To Run (using Ollama)

1. Follow this Gist to set up Ollama locally.
2. Run all cells in `ollama.ipynb`.

## Basic Methodology Summary

### Dataset

1. Get the LaTeX zips (the arXiv source downloads).
2. Unzip them and make each paper its own folder.

### Preprocessing

1. Get all files from the folder.
2. Remove all non-`.tex` files.
3. Strip all LaTeX commands.
4. Merge all files into one and then remove all subdirectories. (Optional) We currently trim 1,000 characters from the front and back of each paper.
5. Build a dictionary mapping paper name (taken from the folder name) [key] to paper content [value], as sketched below.
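
A minimal sketch of this cleanup, assuming papers sit in per-paper folders under `dataset/` (the layout described under Dataset Creation below); `strip_latex` and its regexes are illustrative placeholders, not the exact cleanup used in `ollama.ipynb`:

```python
import re
from pathlib import Path

def strip_latex(text: str) -> str:
    """Crude LaTeX cleanup: drop comments, \\commands, and leftover braces."""
    text = re.sub(r"(?<!\\)%.*", "", text)                    # % comments
    text = re.sub(r"\\[a-zA-Z]+\*?(\[[^\]]*\])?", "", text)   # \command[opts]
    return re.sub(r"[{}]", "", text)                          # stray braces

def load_papers(root: str = "dataset", trim: int = 1000) -> dict[str, str]:
    """Map paper name (its folder name) -> merged, cleaned .tex content."""
    papers = {}
    for folder in Path(root).iterdir():
        if not folder.is_dir():
            continue
        # Keep only .tex files, merge them (subdirectories included), then clean.
        merged = "\n".join(
            p.read_text(errors="ignore") for p in sorted(folder.rglob("*.tex"))
        )
        cleaned = strip_latex(merged)
        # Optional trim of `trim` characters from the front and back.
        papers[folder.name] = cleaned[trim:-trim] if len(cleaned) > 2 * trim else cleaned
    return papers
```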

### Inference 1 (single-paper pass)

1. Run inference with the prompt on each paper individually.
2. Collect the responses in a structured form (see the sketch below).
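
A sketch of the per-paper pass using the `ollama` Python client; the model name and prompt wording are placeholder assumptions, not the exact prompt from `ollama.ipynb`:

```python
import ollama  # pip install ollama

def ask_per_paper(papers: dict[str, str], question: str,
                  model: str = "llama3") -> dict[str, str]:
    """Inference 1: query the model once per paper; keep answers keyed by paper name."""
    responses = {}
    for name, content in papers.items():
        prompt = (
            f"Paper '{name}':\n{content}\n\n"
            f"Question: {question}\n"
            "Answer using only this paper, and cite it by name."
        )
        reply = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
        responses[name] = reply["message"]["content"]
    return responses
```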

### Inference 2 (across all relevant papers)

1. Create the prompt from the outputs of Inference 1.
2. Ask the model to synthesize a combined answer to your question, so that multiple papers answering the same point are merged and all cited (see the sketch below).
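
A sketch of this aggregation pass, consuming the output of `ask_per_paper` above; the prompt wording is again an assumption:

```python
import ollama

def synthesize(responses: dict[str, str], question: str, model: str = "llama3") -> str:
    """Inference 2: merge the per-paper answers into one combined, cited answer."""
    collected = "\n\n".join(f"[{name}]: {answer}" for name, answer in responses.items())
    prompt = (
        f"Per-paper answers:\n{collected}\n\n"
        f"Question: {question}\n"
        "Give one combined answer. Where several papers make the same point, "
        "merge them and cite every supporting paper by name."
    )
    reply = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
    return reply["message"]["content"]
```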

### Inference 3 (follow-up questions on the collected material)

1. Build a prompt from the material collected above and run inference (see the sketch below).
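
And a matching sketch for follow-ups, feeding the synthesized answer back in as context; the prompt is a placeholder:

```python
import ollama

def follow_up(question: str, collected: str, model: str = "llama3") -> str:
    """Inference 3: answer a follow-up question against the collected material only."""
    prompt = (
        f"Collected, cited material:\n{collected}\n\n"
        f"Follow-up question: {question}\n"
        "Answer from this material only, and keep the citations."
    )
    reply = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
    return reply["message"]["content"]
```

Chained together: `follow_up(q2, synthesize(ask_per_paper(load_papers(), q1), q1))`.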

## Dataset Creation

1. Unzip all the .tar files into `dataset/`. It should look like:

```
dataset/
├── Paper 1 folder containing all files
├── Paper 2 folder containing all files
├── ... Paper N ...
```
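
A minimal sketch of this step with the standard-library `tarfile` module, assuming the `.tar` archives sit in the working directory:

```python
import tarfile
from pathlib import Path

def extract_all(archive_dir: str = ".", out_dir: str = "dataset") -> None:
    """Unpack each arXiv .tar source archive into its own folder under dataset/."""
    for archive in Path(archive_dir).glob("*.tar"):
        target = Path(out_dir) / archive.stem  # one folder per paper
        target.mkdir(parents=True, exist_ok=True)
        with tarfile.open(archive) as tf:
            tf.extractall(target)  # only use with archives you trust
```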

That's it!

## To-Do's (and Limitations)

- [ ] (Limitation) Test with a larger number of papers.
- [ ] Figure out a way to generate a nice baseline?
- [ ] If we scale and fail, convert this into a few-shot problem?
- [ ] (Limitation) Only works for .tex source downloads from arXiv. How do we solve that?

## Acknowledgements

HUGE THANKS to Cerebras AI for giving us free credits on their inference accelerator platform.