The official implementation of Self-Play Preference Optimization (SPPO)
Updated Jun 30, 2024 - Python
Unify Efficient Fine-Tuning of 100+ LLMs
Official release of InternLM2 7B and 20B base and chat models, with 200K context support.
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Argilla is a collaboration platform for AI engineers and domain experts that require high-quality outputs, full data ownership, and overall efficiency.
⚗️ distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall efficiency.
Python client library for improving your LLM app accuracy
Implemented the Proximal Policy Optimization (PPO) algorithm to fine-tune a large language model for generating consistently positive reviews
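The core of PPO fine-tuning is the clipped surrogate objective, which limits how far each update can move the policy from the one that collected the samples. A minimal per-sample sketch (an illustration of the general PPO objective, not code from the repository above):

```python
def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate objective.

    ratio:     pi_new(a|s) / pi_old(a|s), the probability ratio between
               the current policy and the policy that sampled the action.
    advantage: estimated advantage of the sampled action.
    eps:       clip range; 0.2 is the value from the original PPO paper.
    """
    # Clamp the ratio to [1 - eps, 1 + eps] so large advantages cannot
    # push the policy arbitrarily far in a single update.
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    # Taking the min makes the objective a pessimistic (lower) bound.
    return min(ratio * advantage, clipped * advantage)
```

In training, this objective is maximized (or its negative minimized) averaged over a batch of sampled responses; for LLM fine-tuning the "action" is typically a generated token and the advantage comes from a reward model plus a KL penalty against the reference model.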
Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators.” In Proceedings of the 5th International Conference on Conversational User Interfaces. doi:10.1145/3571884.3604316.
SimPO: Simple Preference Optimization with a Reference-Free Reward
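SimPO drops the reference model entirely: its implicit reward is the length-normalized log-probability of a response under the current policy, compared across a preference pair with a target margin. A minimal sketch of the pairwise loss (hyperparameter values are illustrative, not the paper's tuned settings):

```python
import math

def simpo_loss(logp_chosen: list[float], logp_rejected: list[float],
               beta: float = 2.0, gamma: float = 0.5) -> float:
    """SimPO loss for one preference pair.

    logp_chosen / logp_rejected: per-token log-probs of the chosen and
    rejected responses under the *current* policy (no reference model).
    """
    # Reference-free implicit reward: beta * average per-token log-prob.
    r_w = beta * sum(logp_chosen) / len(logp_chosen)
    r_l = beta * sum(logp_rejected) / len(logp_rejected)
    # Bradley-Terry-style loss with a target reward margin gamma.
    margin = r_w - r_l - gamma
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Length normalization is the key design choice: without it, the summed log-prob reward would systematically favor shorter responses.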
Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization (ICML 2024)
RewardBench: the first evaluation tool for reward models.
A curated list of reinforcement learning with human feedback resources (continually updated)
A Comparison of LLM Chat Bot Implementation Methods with Travel Use Case
AI research lab 🔬: implementations of AI papers and theoretical research, including InstructGPT, llama, transformers, diffusion models, and RLHF.
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
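The "implicit rewards" here come from the DPO objective itself: a DPO-trained policy implicitly defines a reward as the scaled log-ratio between the policy and the reference model, which can then be reused to score new preference pairs. A minimal sketch of that reward and the resulting pairwise loss (sequence-level log-probs assumed for brevity):

```python
import math

def dpo_implicit_reward(logp_policy: float, logp_ref: float,
                        beta: float = 0.1) -> float:
    # DPO's implicit reward: beta * log(pi_theta(y|x) / pi_ref(y|x)),
    # computed here from sequence-level log-probabilities.
    return beta * (logp_policy - logp_ref)

def dpo_loss(logp_w: float, logp_l: float,
             ref_w: float, ref_l: float, beta: float = 0.1) -> float:
    """DPO loss for one pair: chosen (w) vs. rejected (l) response."""
    margin = (dpo_implicit_reward(logp_w, ref_w, beta)
              - dpo_implicit_reward(logp_l, ref_l, beta))
    # Negative log-sigmoid of the implicit-reward margin.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The bootstrapping idea in the repo above, as the title suggests, is to use this implicit reward from one round of DPO to label preference data for the next round.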