CoCoNote

This is the public repository for our paper "Contextualized Data-Wrangling Code Generation in Computational Notebooks", accepted at ASE 2024. In this repo we provide (1) CoCoMine, a tool to identify data-wrangling code generation examples, and (2) the CoCoNote dataset with evaluation scripts.

1. Repo Structure

CoCoNote
├── README.md
├── CoCoMine
├── dataset
├── notebooks
├── evaluation
├── evaluation_codebleu
└── evaluation_example

CoCoMine: the tool to identify data-wrangling code generation examples.

dataset: the CoCoNote dataset.

notebooks: the notebooks used to execute the test set.

evaluation: the execution evaluation scripts for the CoCoNote test set, covering notebook creation from generated code, execution, and evaluation.

evaluation_codebleu: the surface-form evaluation scripts for the CoCoNote test set.

evaluation_example: an example input file for evaluation.

2. CoCoNote Dataset Usage

2.0 Data Download

  1. Download the data from Zenodo. We provide the train/dev/test data and the notebooks for execution evaluation on Zenodo.

  2. Unzip the archives, then move the notebooks to ./notebooks/ and the train/dev/test data to ./dataset/.

You can run bash download_and_prepare_data.sh to automate both steps.
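If you prefer to prepare the data manually, here is a minimal Python sketch of the same two steps. The archive names below are placeholders, not the actual Zenodo filenames, so substitute the files you downloaded:

import shutil
import zipfile
from pathlib import Path

# Hypothetical archive names -- replace with the actual files from Zenodo.
for archive, target in [("notebooks.zip", "notebooks"), ("dataset.zip", "dataset")]:
    with zipfile.ZipFile(archive) as zf:
        zf.extractall("tmp_extract")      # unzip the archive
    Path(target).mkdir(exist_ok=True)
    for item in Path("tmp_extract").iterdir():
        shutil.move(str(item), target)    # move contents into ./notebooks/ or ./dataset/
    shutil.rmtree("tmp_extract")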

2.1 Execution Environment

You can use the following commands to install the requirements for execution evaluation:

conda create -n CoCoNote python=3.6.13
conda activate CoCoNote
cd ./evaluation
pip install -r requirements_eval.txt

You can also use the following docker image to initialize the execution environment:

[TODO]

2.2 Execution Evaluation

We provide the execution evaluation scripts for the CoCoNote test set in the evaluation folder. First (1) use your code generation model to generate code for the test set, then (2) run the following command to evaluate the generated code:

cd ./evaluation
python evaluate.py \
    --do_create_notebook \
    --do_run \
    --do_evaluate \
    --path_generation {EvaluationCodeFile} \
    --path_save_notebooks {SaveDir}

--path_generation: the path of the generated code file. We provide a sample file at ../evaluation_example/test_1654_gpt35.json, which was generated by GPT-3.5-Turbo.

--path_save_notebooks: the directory to save the generated notebooks.
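The expected schema of the generation file is defined by the provided sample. Before plugging in your own model's outputs, you can inspect it with a few lines of Python; this sketch makes no assumption about the field names:

import json

# Peek at the provided sample to see the expected input format.
with open("../evaluation_example/test_1654_gpt35.json") as f:
    data = json.load(f)
print(type(data), len(data))
print(data[0] if isinstance(data, list) else next(iter(data.items())))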

2.3 Surface-form Evaluation

We provide the surface-form evaluation scripts for the CoCoNote test set in the evaluation_codebleu folder. You can use the following command to evaluate the generated code:

cd ./evaluation_codebleu
python evaluate.py --generation_path {EvaluationCodeFile} 

--generation_path: the path of the generated code file, which is in the same format as in 2.2 Execution Evaluation. We provide a sample file at ../evaluation_example/test_1654_gpt35.json, which was generated by GPT-3.5-Turbo.
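CodeBLEU extends token-level n-gram matching with AST and data-flow comparison. Purely as an illustration of the surface-form idea, here is plain token-level BLEU via NLTK; this is not the repo's CodeBLEU implementation:

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "df = df.dropna(subset=['price'])".split()
candidate = "df = df.dropna(subset=['price']).reset_index()".split()

# Token-level BLEU with smoothing; CodeBLEU additionally matches syntax
# trees and data flow, which this simple example omits.
score = sentence_bleu([reference], candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")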

3. CoCoMine

3.1 Requirements

You can use the following commands to install the requirements for CoCoMine:

conda create -n CoCoMine python=3.8
conda activate CoCoMine
cd ./CoCoMine
pip install -r requirements.txt

3.2 Extract Code Generation Examples

We provide two example notebooks (in ./raw_notebooks) to demonstrate the functionality of CoCoMine.

# Extract data-wrangling code cells from raw notebooks
cd ./CoCoMine
python main_cocomine.py

Due to space constraints, we do not include all raw notebooks in this repo.
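The full mining logic lives in main_cocomine.py. Purely as a sketch of the first step, reading code cells out of raw notebooks (.ipynb files are plain JSON), and not CoCoMine's actual identification logic:

import json
from pathlib import Path

# List the code cells in each raw notebook; this does NOT reproduce
# CoCoMine's identification of data-wrangling examples.
for nb_path in Path("./raw_notebooks").glob("*.ipynb"):
    nb = json.loads(nb_path.read_text(encoding="utf-8"))
    code_cells = [c for c in nb.get("cells", []) if c["cell_type"] == "code"]
    print(nb_path.name, len(code_cells), "code cells")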
