SCAIN extraction and analysis

Installation

Install the packages via the following command:

$ pipenv install

Setup

Download and extract the dataset

Download and extract the dataset with the following command:

$ pipenv run setup

Set OpenAI API key

Set your OpenAI API key as the OPENAI_API_KEY environment variable:

$ export OPENAI_API_KEY=<your OpenAI API key>
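Before running the extraction, it can help to confirm that the key is actually visible to Python in the current shell. The following is a minimal sketch; the helper name require_api_key is hypothetical and not part of this repository.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing or blank.

    Hypothetical helper for illustration; the repository may check this differently.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it before running the extraction.")
    return key
```

Failing early with a clear message is usually easier to debug than an authentication error raised partway through a long extraction run.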

Extraction

Extract SCAINs via the following command:

$ pipenv run extraction

Set DIALOGUE_START and DIALOGUE_END to appropriate values to avoid overloading the model.
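One way to keep each run small is to split the full set of dialogues into consecutive windows and run the extraction once per window. The sketch below assumes that DIALOGUE_START and DIALOGUE_END are zero-based dialogue indices with DIALOGUE_END exclusive. This is an assumption, so check the extraction script for the actual convention. The helper plan_runs is hypothetical.

```python
def plan_runs(total: int, batch_size: int) -> list[tuple[int, int]]:
    """Split dialogue indices [0, total) into (start, end) windows, end exclusive.

    Hypothetical helper; assumes DIALOGUE_END is exclusive.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Each window covers at most batch_size dialogues; the last one may be shorter.
    return [(start, min(start + batch_size, total)) for start in range(0, total, batch_size)]
```

For example, plan_runs(10, 4) gives [(0, 4), (4, 8), (8, 10)]. Each pair could then serve as DIALOGUE_START and DIALOGUE_END for one run.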

Analysis

Run analysis_extraction.ipynb and analysis_survey.ipynb to analyze the results.

LaTeX package

Download udline.sty from here to generate LaTeX source with underlines.

Dataset

You can download the SCAINs dataset from here.

crowd_dataset_omit
- links
  - links to the Google Forms used to collect explanations of core statements
- datasetXXX (XXX ranges from 001 to 048)
  - filename
    - filename of dialogue from JPersonaChat
  - turn
    - turn number at which the core statement is located
  - omitted_dialogue
    - the dialogue with the candidate statements removed
  - core_sentence
    - the last statement of the omitted dialogue
  - similarity
    - similarity score between the complete and omitted dialogues
response_omitted
- responseXX (XX ranges from 1 to 48)
  - A sheet containing the questions and responses of the corresponding Google Form (in Japanese).
  - The responses have not been cleaned.
  - The data do not include usernames of the workers.
crowd_dataset_full
- links
  - links to the Google Forms used to collect correctness ratings of the explanations
- datasetXXX (XXX ranges from 001 to 050)
  - filename
    - filename of dialogue from JPersonaChat
  - turn
    - turn number at which the core statement is located
  - complete_dialogue
    - the dialogue with all statements, including those after the core statement
  - core_sentence
    - the statement to be explained
  - similarity
    - similarity score between the complete and omitted dialogues
  - nor
    - number of explanations for the core statement
  - resX (X ranges from 0 to nor - 1)
    - explanation of the core statement
response_full_cleaned
- responseX (X ranges from 1 to 50)
  - A sheet containing the questions and responses of the corresponding Google Form (in Japanese).
  - The responses have been cleaned:
    - Responses from workers who also participated in the explanation task have been removed.
    - If a worker responded to the same form multiple times, only their first response was kept.
  - The data do not include usernames of the workers.
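The naming patterns above (datasetXXX, responseX, resX) can be generated or checked programmatically. The following is a minimal sketch. It assumes each crowd_dataset_full entry is available as a dict keyed by the field names listed above. How the sheets are actually loaded is not specified here. The helpers expected_names and collect_explanations are hypothetical.

```python
def expected_names(prefix: str, first: int, last: int, width: int = 0) -> list[str]:
    """Build names such as dataset001..dataset048 (width=3) or response1..response50 (width=0)."""
    # str.zfill pads with leading zeros up to `width`; zfill(0) leaves the number unchanged.
    return [prefix + str(i).zfill(width) for i in range(first, last + 1)]


def collect_explanations(row: dict) -> list[str]:
    """Return res0 .. res{nor-1} from one crowd_dataset_full entry (assumed to be a dict)."""
    nor = int(row["nor"])  # nor may be stored as text in a sheet, so convert it explicitly
    return [row[f"res{x}"] for x in range(nor)]
```

For example, expected_names("dataset", 1, 48, width=3) yields the 48 names from dataset001 to dataset048, which can be compared against the sheets actually present in crowd_dataset_omit.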

imai-laboratory/summary_scain