Dataset and benchmark for the paper "TESTEVAL: Benchmarking Large Language Models for Test Case Generation".
category | data |
---|---|
total programs under test | 210 |
total target lines | 1340 |
total target branches | 983 |
total target paths | 854 |
field name | data type | description |
---|---|---|
task_num | int | Problem id in LeetCode |
task_title | string | LeetCode problem title |
difficulty | int | LeetCode problem difficulty: from 0 (easy) to 2 (hard) |
func_name | string | Default function name for the solution |
description | string | LeetCode problem description |
python_solution | string | LeetCode problem solution in Python (the program under test) |
blocks | list | The list of target branches
target_lines | list | The list of target lines
python_solution_instrumented | string | python_solution with instrumentation added to record execution paths
sampled_paths | list | The list of target paths; the format matches the execution paths collected from python_solution_instrumented
sampled_condition_paths | list | The list of target paths, formatted for use in prompts
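Each problem record carries the fields above. As a minimal sketch of how to load and inspect a record (assuming the dataset ships as a JSON Lines file; the path `data/leetcode-py.jsonl` is an assumption, not a confirmed file name):

```python
import json

# Assumed dataset path; adjust to the actual file shipped with this repository.
DATA_PATH = "data/leetcode-py.jsonl"

with open(DATA_PATH, "r") as f:
    problems = [json.loads(line) for line in f]

sample = problems[0]
print(sample["task_num"], sample["task_title"], sample["difficulty"])
print(sample["func_name"])
print(len(sample["target_lines"]), "target lines,", len(sample["blocks"]), "target branches")
print(len(sample["sampled_paths"]), "target paths")
```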
Requirements: Python>=3.10
Install dependencies
pip install -r requirements.txt
Create a folder to store generated tests
mkdir predictions
Set environment variables
OPENAI_API_KEY: OpenAI API key
GOOGLE_API_KEY: Gemini API key
HUGGINGFACE_TOKEN: Hugging Face access token
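The generation scripts read these keys from the environment. An illustrative check (not part of the repository) to confirm they are visible before running:

```python
import os

# Only the variable names come from this README; the check itself is illustrative.
for var in ("OPENAI_API_KEY", "GOOGLE_API_KEY", "HUGGINGFACE_TOKEN"):
    print(f"{var}: {'set' if os.environ.get(var) else 'MISSING'}")
```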
python generate_cov_{openai/gemini/hf}.py --model {model_name} --num_tests 20 #generate raw test cases
python format.py --mode overall --path {path_to_generated_tests} #reformat test cases
python eval_overall.py --path {path_to_formatted_generated_tests} #evaluate correctness and coverage metrics
python generate_targetcov_{openai/gemini/hf}.py --covmode line --model {model_name} #generate raw test cases
python format.py --mode line --path {path_to_generated_tests} #reformat test cases
python eval_linecov.py --path {path_to_formatted_generated_tests} #evaluate correctness and coverage metrics
python gen_linecov_cot_{openai/hf}.py --model {model_name} #generate reasoning steps and raw test cases
python format.py --mode line --path {path_to_generated_tests} #reformat test cases
python eval_linecov.py --path {path_to_formatted_generated_tests} #evaluate correctness and coverage metrics
python generate_targetcov_{openai/gemini/hf}.py --covmode branch --model {model_name} #generate raw test cases
python format.py --mode branch --path {path_to_generated_tests} #reformat test cases
python eval_branchcov.py --path {path_to_formatted_generated_tests} #evaluate correctness and coverage metrics
python generate_pathcov_{openai/gemini/hf}.py --model {model_name} #generate raw test cases
python format.py --mode overall --path {path_to_generated_tests} #reformat test cases
python eval_pathcov.py --path {path_to_formatted_generated_tests} #evaluate correctness and coverage metrics
python eval_base.py --path {path_to_formatted_generated_tests} #evaluate targeted line/branch coverage for the baselines, using the test cases generated in the overall coverage task
python eval_pathcov_base.py --path {path_to_formatted_generated_tests} #evaluate targeted path coverage for the baselines, using the test cases generated in the overall coverage task
We encourage researchers to use their own test case generation pipelines rather than only our prompting framework. If you run your own pipeline, your generated test case files should be formatted as follows:
Overall coverage:
{'task_num': LeetCode problem id, 'difficulty': LeetCode problem difficulty, 'func_name': solution function name, 'code': solution code, 'tests': list of generated test cases}
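For illustration only, a record in this format might look like the sketch below; the keys follow the description above, while the problem and the test bodies are invented placeholders:

```python
# Hypothetical example record (overall coverage); problem and tests are placeholders.
{
    "task_num": 1,
    "difficulty": 0,
    "func_name": "twoSum",
    "code": "class Solution:\n    def twoSum(self, nums, target):\n        ...",
    "tests": [
        "solution = Solution()\nassert solution.twoSum([2, 7, 11, 15], 9) == [0, 1]",
        "solution = Solution()\nassert solution.twoSum([3, 3], 6) == [0, 1]",
    ],
}
```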
Targeted line coverage:
{'task_num': LeetCode problem id, 'difficulty': LeetCode problem difficulty, 'func_name': solution function name, 'code': solution code, 'tests': {target line number: test case for target line}}
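Here `tests` maps each target line number to the test case generated for it. A hypothetical record with placeholder values (how the line numbers are serialized should match what eval_linecov.py expects):

```python
# Hypothetical example record (targeted line coverage); values are placeholders.
{
    "task_num": 1,
    "difficulty": 0,
    "func_name": "twoSum",
    "code": "class Solution:\n    def twoSum(self, nums, target):\n        ...",
    "tests": {
        4: "solution = Solution()\nassert solution.twoSum([2, 7, 11, 15], 9) == [0, 1]",
        7: "solution = Solution()\nassert solution.twoSum([3, 3], 6) == [0, 1]",
    },
}
```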
Targeted branch coverage:
{'task_num': LeetCode problem id, 'difficulty': LeetCode problem difficulty, 'func_name': solution function name, 'code': solution code, 'tests': [{"start": branch start line, "end": branch end line, "test": test case for target branch}]}
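For targeted branch coverage, `tests` is a list of entries pairing each branch (identified by its start and end lines) with a test case. A hypothetical record with placeholder values:

```python
# Hypothetical example record (targeted branch coverage); values are placeholders.
{
    "task_num": 1,
    "difficulty": 0,
    "func_name": "twoSum",
    "code": "class Solution:\n    def twoSum(self, nums, target):\n        ...",
    "tests": [
        {"start": 3, "end": 5,
         "test": "solution = Solution()\nassert solution.twoSum([2, 7, 11, 15], 9) == [0, 1]"},
        {"start": 6, "end": 8,
         "test": "solution = Solution()\nassert solution.twoSum([3, 3], 6) == [0, 1]"},
    ],
}
```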
Targeted path coverage:
{'task_num': LeetCode problem id, 'difficulty': LeetCode problem difficulty, 'func_name': solution function name, 'code': solution code, 'tests': list of generated test cases for each target path}
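For targeted path coverage, `tests` is again a list of test case strings, one entry per target path. A hypothetical record with placeholder values:

```python
# Hypothetical example record (targeted path coverage); values are placeholders.
{
    "task_num": 1,
    "difficulty": 0,
    "func_name": "twoSum",
    "code": "class Solution:\n    def twoSum(self, nums, target):\n        ...",
    "tests": [
        # one test case string per target path
        "solution = Solution()\nassert solution.twoSum([2, 7, 11, 15], 9) == [0, 1]",
        "solution = Solution()\nassert solution.twoSum([3, 3], 6) == [0, 1]",
    ],
}
```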