Best of n sampling (Value Ranking)

Best-of-n sampling, also known as value ranking, incorporates a value function into the process of selecting an action or completion from an LLM. In a given situation, n candidate completions are generated and ranked by a value function, and only the completion with the highest predicted value is chosen.
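The ranking step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not this package's API: rank_by_value and toy_value are hypothetical names, and the toy value function (which simply prefers shorter strings) stands in for a learned value head.

```python
from typing import Callable, List, Tuple


def rank_by_value(
    completions: List[str], value_fn: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Score every completion and return (completion, value) pairs, best first."""
    scored = [(completion, value_fn(completion)) for completion in completions]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_value(completion: str) -> float:
    # Hypothetical stand-in for a value head: shorter completions score higher.
    return -float(len(completion))


candidates = ["move pawn to e4", "resign", "castle kingside"]
ranking = rank_by_value(candidates, toy_value)

# Only the top-ranked completion is actually used.
best_completion, best_value = ranking[0]
print(best_completion, best_value)
```

In a real setup the value function would be a model forward pass, and the n candidates would be scored in a batch rather than one at a time.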

In settings such as games, where LLM generations should be optimized towards the outcome of the game, best-of-n sampling is commonly chosen over reinforcement learning because of its much lower implementation and computational footprint.

While other implementations exist (e.g. trl), this implementation works natively with multiple GPUs (using accelerate) and builds on transformer_heads.

Installation

Clone this repository.
From the repository root, run pip install -e .

Usage

Rank a list of actions/completions:
- Requires a transformer_heads model with a value head
- from best_of_n import value_rank_completions
- Check tests/test_mock_value_rank.py for a usage example.
Generate and rank completions:
- Requires either a single transformer_heads model with both a language modelling head and a value head, or two transformer_heads models that each handle one of the two tasks.
- from best_of_n import sample_best_of_n
- Check tests/test_real_model.py for a usage example.
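To illustrate the generate-and-rank flow with separate generation and value components, here is a rough sketch using toy stand-ins. It does not call sample_best_of_n, whose signature is not shown here; best_of_n, toy_generate, toy_value and MOVES are all hypothetical names for illustration, and the real usage is in the test files referenced above.

```python
import random
from typing import Callable, List


def best_of_n(
    prompt: str,
    generate_fn: Callable[[str], str],
    value_fn: Callable[[str, str], float],
    n: int,
) -> str:
    """Draw n completions for the prompt and return the one with the highest value."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[str] = [generate_fn(prompt) for _ in range(n)]
    return max(candidates, key=lambda completion: value_fn(prompt, completion))


# Toy stand-ins (assumptions): a random "generator" and a lookup-table "value model".
rng = random.Random(0)
MOVES = ["cooperate", "defect", "pass"]


def toy_generate(prompt: str) -> str:
    return rng.choice(MOVES)


def toy_value(prompt: str, completion: str) -> float:
    return {"cooperate": 1.0, "defect": 0.2, "pass": 0.0}[completion]


chosen = best_of_n("Your move:", toy_generate, toy_value, n=8)
print(chosen)
```

Keeping the generator and the value function as separate callables mirrors the two-model option; with a single model that has both heads, both callables would wrap the same model.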

center-for-humans-and-machines/best-of-n-sampling
