rlhf

Introducing Filtered Direct Preference Optimization (fDPO), which enhances language model alignment with human preferences by discarding samples of lower quality than those generated by the learning model.

alignment dpo rlhf

Updated Apr 25, 2024
Python

himanshuvnm / Foundation-Model-Large-Language-Model-FM-LLM

This repository was committed while carrying out key tasks on which modern Generative AI concepts are built. In particular, we focused on three coding tasks with Large Language Models. Additional necessary details are given in the README.md file.

aws python3 pytorch lora rnn-pytorch attention-is-all-you-need fine-tuning hate-speech-detection huggingface huggingface-transformers foundation-models large-language-models generative-ai rlhf flan-t5 peft-fine-tuning-llm ml-m5-2xlarge low-rank-ada

Updated Mar 28, 2024
Jupyter Notebook

akain0 / Reinforcement-Learning-

Projects and models built in Python with PyTorch, implementing reinforcement learning algorithms for reward-based tasks.

reinforcement-learning reinforcement-learning-algorithms a3c lstm-neural-networks bellman-equation rlhf

Updated May 7, 2024
Jupyter Notebook

AMfeta99 / NLP_LLM

This repository collects small projects and some theoretical material that I used to learn NLP and LLMs in a practical and efficient way.

Updated Jun 14, 2024
Jupyter Notebook

kyryl-opens-ml / rlfh-dagster-modal

Re-usable & scalable RLHF training pipeline with Dagster and Modal.

modal dpo dagster llm rlhf

Updated Jun 11, 2024
Python

OctopusMind / DPO

DPO algorithm implementation

lora dpo rlhf qwen

Updated Jun 12, 2024
Python

ssbuild / t5_rlhf

chatyuan_rlhf_training

lora reward ppo t5 rlhf adalora qlora

Updated Sep 19, 2023
Python

vicgalle / awesome-rlaif

A curated and updated list of relevant articles and repositories on Reinforcement Learning from AI Feedback (RLAIF)

awesome research language-model llm rlhf rlaif

Updated Jan 24, 2024

alexisrozhkov / llm-calib

Improving LLM truthfulness via reporting confidence

alignment truthfulness llm rlhf

Updated Jun 9, 2024
Python

phonism / llm4cp

Large Language Model for Competitive Programming

competitive-programming llama ppo large-language-models rlhf

Updated Apr 28, 2023
Python

log10-io / log10js

JavaScript client library for managing your LLM data in one place

javascript debugging ai monitoring logging artificial-intelligence openai autonomous-agents openai-api langchain rlhf llmops langchain-js

Updated May 3, 2023
JavaScript

jeremy-collins / robot-rlhf

Robot Learning from Human Feedback. Inspired by advancements in NLP, we train a robot policy via reinforcement learning using a reward function learned exclusively from human preferences.

reinforcement-learning robotics alignment chatgpt rlhf

Updated Apr 16, 2023
Python

Here are 130 public repositories matching this topic...

MOONLAPSED / cognOS

OctopusMind / RLHF_PPO

ChukwumaChukwuma / enyimba2_ai

BARUDA-AI / Awesome-Preference-Optimization

10mudassir007 / AI-CHATBOT

shreyansh26 / LLM-Activation-Steering-Experiments

saschaschramm / tiny-chatgpt

OpenRL-Lab / RL_Tutorial

CyberAgentAILab / filtered-dpo
