Best-of-n sampling, also known as value ranking, incorporates a value function into the process of selecting an action or completion from an LLM. In a given situation, n candidate completions are generated and ranked using a value function. Only the completion with the highest predicted value is then actually chosen.
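The procedure described above can be sketched in a few lines of plain Python. This is only a conceptual illustration, not this repository's implementation: `best_of_n`, `toy_generate`, and `toy_value` are hypothetical names, and the generator and value function are toy stand-ins for an LLM and a value head.

```python
import random
from typing import Callable, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],
    value_fn: Callable[[str, str], float],
    n: int = 4,
) -> Tuple[str, float]:
    """Sample n completions and return the one with the highest predicted value.

    `generate` and `value_fn` are hypothetical callables standing in for
    an LLM sampler and a value head; they are not part of this repository.
    """
    # Draw n candidate completions for the same prompt.
    candidates = [generate(prompt) for _ in range(n)]
    # Score every candidate with the value function.
    scored = [(value_fn(prompt, c), c) for c in candidates]
    # Keep only the candidate with the highest predicted value.
    best_value, best = max(scored, key=lambda pair: pair[0])
    return best, best_value


# Toy stand-ins (assumptions for illustration only).
TOY_VALUES = {"e4": 0.6, "d4": 0.55, "Nf3": 0.5, "a3": 0.2}


def toy_generate(prompt: str) -> str:
    # Pretend the "LLM" proposes a random opening move.
    return random.choice(list(TOY_VALUES))


def toy_value(prompt: str, completion: str) -> float:
    # Pretend the "value head" predicts the chance of winning after this move.
    return TOY_VALUES[completion]


best_move, predicted_value = best_of_n("Opening position", toy_generate, toy_value, n=3)
print(best_move, predicted_value)
```

Because the value function is only applied to already generated candidates, the underlying language model needs no additional training, which is where the lower cost compared to reinforcement learning comes from.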
In settings such as games, where LLM generations should be optimized towards the outcome of the game, best-of-n sampling is commonly chosen over reinforcement learning because of its much lower implementation and computational footprint.
While other implementations exist (e.g. trl), this implementation works natively with multiple GPUs (using accelerate) and builds on transformer_heads.
- Clone this repository.
- From the repository root, run:
pip install -e .
- Rank a list of actions/completions:
- Requires a transformer_heads model with a value head
from best_of_n import value_rank_completions
- Check tests/test_mock_value_rank.py for a usage example.
- Generate and rank completions:
- Requires either a single transformer_heads model with both a language modelling head and a value head, or two transformer_heads models, one for each of the two tasks.
from best_of_n import sample_best_of_n
- Check tests/test_real_model.py for a usage example.