
CodeLLM Evaluator

Evaluate CodeLLMs easily, with fast inference settings.

Overview

CodeLLM Evaluator provides fast and efficient evaluation of code generation tasks. Inspired by lm-evaluation-harness and bigcode-evaluation-harness, the framework is designed for multiple use cases and makes it easy to add new metrics and custom tasks.

Features:

  • Implemented HumanEval and MBPP benchmarks for coding LLMs.
  • Support for models loaded via transformers and DeepSpeed.
  • Support for evaluating adapters (e.g. LoRA) through HuggingFace's PEFT library.
  • Support for distributed inference with native transformers or fast inference with the vLLM backend.
  • Easy support for custom prompts, tasks, and metrics, as sketched below.
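
As a rough sketch of what a custom task can look like, the snippet below defines a task-like object with a prompt builder and a metric hook. The class shape and method names are illustrative assumptions, not the package's confirmed interface; see the code_eval.task module for the actual base class to extend.

class MyTask:
    """Hypothetical custom task. The method names below are assumptions
    for illustration, not the package's confirmed interface."""

    def __init__(self, problems):
        # problems: e.g. a list of {"prompt": ..., "tests": ...} records
        self.problems = problems

    def prepare_prompt(self, problem):
        # Build the text sent to the model for one problem.
        return problem["prompt"]

    def compute_metrics(self, generations):
        # Score the generations however the task requires
        # (string match, unit-test execution, ...).
        return {"num_generations": len(generations)}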

Setup

Install the code-eval package from the GitHub repository via pip:

$ git clone https://github.com/FSoft-AI4Code/code-llm-evaluator.git
$ cd code-llm-evaluator
$ pip install -e .
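
To confirm the editable install worked, a quick import check is enough; the code-eval console script used in the examples below should also be on your PATH afterwards:

$ python -c "import code_eval; print(code_eval.__name__)"
code_eval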

Quick-start

To evaluate a supported task in Python, load code_eval.Evaluator to generate completions and compute evaluation metrics in one run.

from code_eval import Evaluator
from code_eval.task import HumanEval

# Pick a built-in benchmark and wrap it in the evaluator.
task = HumanEval()
evaluator = Evaluator(task=task)

# Sample 3 completions per problem, with a generation batch size of 16.
output = evaluator.generate(num_return_sequences=3,
                            batch_size=16,
                            temperature=0.9)
# Score the completions and compute the task's metrics.
result = evaluator.evaluate(output)
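
Drawing several samples per problem (num_return_sequences=3 above) is what makes pass@k-style estimates possible, since they need more than one candidate per problem. The exact structure of result depends on the task; once evaluation finishes it can simply be inspected:

# The metric keys (e.g. pass@1) depend on the task implementation.
print(result)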

CLI Usage

Inference with Transformers

Load a model and generate answers using native transformers (hf), passing a local model path or a HuggingFace Hub name as --model_name. transformers is the default backend, but you can pass --backend hf to select it explicitly:

$ code-eval --model_name microsoft/phi-1 \
    --task humaneval \
    --batch_size 8 \
    --backend hf

Tip

Load LoRA adapters by adding the --peft_model argument. The --model_name must still point to the full base model architecture.

$ code-eval --model_name microsoft/phi-1 \
    --peft_model <adapters-name> \
    --task humaneval \
    --batch_size 8 \
    --backend hf

Inference with vLLM engine

We recommend the vLLM engine for fast inference. vLLM supports tensor parallelism, data parallelism, or a combination of both; refer to the vLLM documentation for more detail.
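
For reference, tensor parallelism in vLLM's own Python API looks like the snippet below. This shows vLLM's documented interface directly and is not a claim about which options code-eval forwards to the engine.

# Plain vLLM usage: shard the model across 2 GPUs via tensor parallelism.
from vllm import LLM, SamplingParams

llm = LLM(model="microsoft/phi-1", tensor_parallel_size=2)
params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["def fibonacci(n):"], params)
print(outputs[0].outputs[0].text)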

To use code-eval with the vLLM engine, refer to the vLLM documentation to install it.

Note

You can install vLLM using pip:

$ pip install vllm

With a model supported by vLLM (see the vLLM supported models list), run:

$ code-eval --model_name microsoft/phi-1 \
    --task humaneval \
    --batch_size 8 \
    --backend vllm

Tip

You can use LoRA adapters with the same syntax:

$ code-eval --model_name microsoft/phi-1 \
    --peft_model <adapters-name> \
    --task humaneval \
    --batch_size 8 \
    --backend vllm

Cite as

@misc{code-eval,
    author       = {Dung Nguyen Manh},
    title        = {A framework for easy evaluation of code generation models},
    month        = 3,
    year         = 2024,
    publisher    = {github},
    version      = {v0.0.1},
    url          = {https://github.com/FSoft-AI4Code/code-llm-evaluator}
}