The official implementation of Self-Play Preference Optimization (SPPO)
Updated Jun 30, 2024 - Python
Unify Efficient Fine-Tuning of 100+ LLMs
Official release of InternLM2 7B and 20B base and chat models, with 200K context support.
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Argilla is a collaboration platform for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency.
⚗️ distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall efficiency.
Python client library for improving your LLM app accuracy
Implemented the Proximal Policy Optimization (PPO) algorithm to fine-tune a large language model for generating consistently positive reviews
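The core of PPO fine-tuning is the clipped surrogate objective, which limits how far each update can move the policy from the one that collected the samples. A minimal per-sample sketch (an illustration of the general PPO objective, not code from the repository above):

```python
def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate objective.

    ratio:     pi_new(a|s) / pi_old(a|s), the probability ratio between
               the current policy and the policy that sampled the action.
    advantage: estimated advantage of the sampled action.
    eps:       clip range; 0.2 is the value from the original PPO paper.
    """
    # Clamp the ratio to [1 - eps, 1 + eps] so large advantages cannot
    # push the policy arbitrarily far in a single update.
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    # Taking the min makes the objective a pessimistic (lower) bound.
    return min(ratio * advantage, clipped * advantage)
```

In training, this objective is maximized (or its negative minimized) averaged over a batch of sampled responses; for LLM fine-tuning the "action" is typically a generated token and the advantage comes from a reward model plus a KL penalty against the reference model.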
Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In Proceedings of the 5th International Conference on Conversational User Interfaces. doi:10.1145/3571884.3604316.
SimPO: Simple Preference Optimization with a Reference-Free Reward
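SimPO drops the reference model entirely: its implicit reward is the length-normalized log-probability of a response under the current policy, compared across a preference pair with a target margin. A minimal sketch of the pairwise loss (hyperparameter values are illustrative, not the paper's tuned settings):

```python
import math

def simpo_loss(logp_chosen: list[float], logp_rejected: list[float],
               beta: float = 2.0, gamma: float = 0.5) -> float:
    """SimPO loss for one preference pair.

    logp_chosen / logp_rejected: per-token log-probs of the chosen and
    rejected responses under the *current* policy (no reference model).
    """
    # Reference-free implicit reward: beta * average per-token log-prob.
    r_w = beta * sum(logp_chosen) / len(logp_chosen)
    r_l = beta * sum(logp_rejected) / len(logp_rejected)
    # Bradley-Terry-style loss with a target reward margin gamma.
    margin = r_w - r_l - gamma
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Length normalization is the key design choice: without it, the summed log-prob reward would systematically favor shorter responses.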
Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization (ICML 2024)
RewardBench: the first evaluation tool for reward models.
A curated list of reinforcement learning with human feedback resources (continually updated)
A Comparison of LLM Chat Bot Implementation Methods with Travel Use Case
AI research lab 🔬: implementations of AI papers and theoretical research, including InstructGPT, llama, transformers, diffusion models, and RLHF.
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
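The "implicit rewards" here come from the DPO objective itself: a DPO-trained policy implicitly defines a reward as the scaled log-ratio between the policy and the reference model, which can then be reused to score new preference pairs. A minimal sketch of that reward and the resulting pairwise loss (sequence-level log-probs assumed for brevity):

```python
import math

def dpo_implicit_reward(logp_policy: float, logp_ref: float,
                        beta: float = 0.1) -> float:
    # DPO's implicit reward: beta * log(pi_theta(y|x) / pi_ref(y|x)),
    # computed here from sequence-level log-probabilities.
    return beta * (logp_policy - logp_ref)

def dpo_loss(logp_w: float, logp_l: float,
             ref_w: float, ref_l: float, beta: float = 0.1) -> float:
    """DPO loss for one pair: chosen (w) vs. rejected (l) response."""
    margin = (dpo_implicit_reward(logp_w, ref_w, beta)
              - dpo_implicit_reward(logp_l, ref_l, beta))
    # Negative log-sigmoid of the implicit-reward margin.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The bootstrapping idea in the repo above, as the title suggests, is to use this implicit reward from one round of DPO to label preference data for the next round.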