huggingface / open-r1 Public



Issues: huggingface/open-r1

How to contribute

#23 opened Jan 25, 2025 by lewtun

Open 11



257 Open 93 Closed


Issues list

Unreasonable design of the code of accuracy reward

#628 opened Apr 27, 2025 by andyclsr

how can I get the prediction using the provided evaluation script?

#625 opened Apr 25, 2025 by CurryxIaoHu

Is "solution" necessary for the grpo dataset?

#624 opened Apr 25, 2025 by yy9996

Requiring the recipe for training the GRPO model of OlympicCoder

#623 opened Apr 24, 2025 by sluxsr

OpenR1-Qwen-7B achieves 47.40 on AIME24, better than reported!

#622 opened Apr 24, 2025 by Hasuer

GRPO stuck with NCCL error

#620 opened Apr 24, 2025 by JoeyXuquant11

A strange format reward issue

#616 opened Apr 18, 2025 by MiracleLin001

An efficient GRPO loss kernel in Triton, reducing memory by 46G

#615 opened Apr 18, 2025 by mdy666

clip grad not working

#609 opened Apr 17, 2025 by jiangix-paper

The difference between these scripts

#606 opened Apr 15, 2025 by Alan-D-Chen

unsatisfactory result and strange reward

#605 opened Apr 15, 2025 by qianfantianyuzhouzhou

Is vllm==0.8.3 causing some incompatible problems

#602 opened Apr 15, 2025 by roaminwind

Does the Qwen-2.5-VL model in the GRPO project currently support multi-image input?

#601 opened Apr 14, 2025 by zby1218

model.generate produces right-padded completions, causing incompatibility with Flash Attention 2

#599 opened Apr 14, 2025 by PolarisHsu

src/open_r1/evaluate.py Missing

#598 opened Apr 13, 2025 by Zoeyyao27

weird....why the new version become worse?????

#594 opened Apr 11, 2025 by yanghu819

GRPO config for finetuning Qwen-7B-Math-Instruct on OpenR1-Math-220k

#589 opened Apr 9, 2025 by toslali-ibm

GRPO with a lora model after SFT

#588 opened Apr 9, 2025 by Pandasea

grpo inference error

#587 opened Apr 9, 2025 by jiangyuan1018

what is next for this project?

#586 opened Apr 7, 2025 by Mnaik2

vllm generate n responses, some responses stop after generating </answer>, some can not stop.

#582 opened Apr 6, 2025 by LaoWangGB

understanding GRPO code pipeline. is this fully online learning in code?

#581 opened Apr 5, 2025 by dongje

Is possible release intermediate checkpoints and wandb logs?

#580 opened Apr 5, 2025 by Qinghao-Hu

Sequence length problem

#579 opened Apr 5, 2025 by zhangtianhong-1998

Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

#578 opened Apr 4, 2025 by BiNLP
