Hello all,
I want to raise an issue mostly to hear everyone's comments.
Previously there were "4 rules" (kinda) that applied to the PRs:
- The difference had to be at least 0.0005 to be accepted, otherwise it could be noise or seed difference.
- Just because a PR is submitted it does not mean it will get accepted. There is a focus on more creative submissions and the "presentation" so to say.
- The PRs will be evaluated in order. I.e. if PR300 and PR400 have the same scores, PR300 would get approved first as it was submitted before PR400.
- Finally, there will be a record and a non-record submission setup.
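
To make the first and third rules concrete, here is a rough sketch of how they could interact. It is not the maintainers' actual review process: the `Submission` class, the `review` function, and the example scores are all hypothetical, and it assumes a lower bpb is better and that a lower PR number means an earlier submission.

```python
from dataclasses import dataclass

MIN_IMPROVEMENT = 0.0005  # threshold from the first rule


@dataclass
class Submission:
    pr_number: int  # assumption: lower number = submitted earlier
    bpb: float      # assumption: lower is better


def review(submissions, current_best):
    """Return the PR numbers accepted as records, reviewed in submission order."""
    accepted = []
    for sub in sorted(submissions, key=lambda s: s.pr_number):
        # Rule 1: the gain must be at least the threshold, otherwise it may be noise.
        if current_best - sub.bpb >= MIN_IMPROVEMENT:
            accepted.append(sub.pr_number)
            current_best = sub.bpb  # later PRs now have to beat this score
    return accepted


# Made-up scores: PR300 and PR400 tie, PR350 is worse than PR300.
example = [Submission(400, 1.1990), Submission(300, 1.1990), Submission(350, 1.1998)]
accepted_prs = review(example, current_best=1.2000)
```

With these made-up numbers only PR300 is accepted: PR400 has the same score but comes later, so it no longer clears the threshold against the new best.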
A few points to discuss.
- The first rule makes complete sense. I'd say it should have been even higher, at 0.0010, as we've seen some seeds make a large difference. But this is already set, and that is fine.
- The 2nd rule makes sense.
- The 3rd rule makes sense too; however, there is an issue in specific scenarios. If PR300 only tuned the hyperparameters of the previous SOTA while PR400 is a genuinely new approach, it makes sense for PR400 to be accepted before PR300. But that causes a lot of issues: depending on the time between reviews (100 PRs reviewed once per week, for example), this would open a can of worms. Maybe a good rule here would be "if a new architecture was never explored before, and it achieves the same score as a previously submitted SOTA, it will be accepted nonetheless". That way everyone's happy, and both the better-tuned previous SOTA and the new architecture are showcased. Another column in the README would show "is this based on a previous SOTA or new" (simple Yes/No).
- The 4th rule has a few issues. A lot of people use the non-record submission not for the 2nd leaderboard, but simply to submit something for the compute grant and then come back and update it later. However, the team may reject it in the meantime while the author is still working on it, or it may get skipped as "Work in progress" and never be evaluated again.
  This is a bit finicky as an approach. In my opinion, simply submitting a personal repo instead of a PR to showcase the work would solve this and reduce the number of PRs. Keep the repo hidden until the grant application, and make it hidden again after the grant is given, if so. I personally never submitted anything until it was fully finished, but that meant using a lot of my personal resources, which is not possible for everyone.
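
The exception suggested for the 3rd rule could be sketched roughly like this. Everything here is hypothetical: the `new_architecture` flag (which would feed the proposed Yes/No README column), the `accept` function, and the reading of "same score" as "within the 0.0005 noise threshold" are my assumptions, not an existing policy.

```python
from dataclasses import dataclass

MIN_IMPROVEMENT = 0.0005  # threshold from the first rule


@dataclass
class Submission:
    pr_number: int
    bpb: float               # assumption: lower is better
    new_architecture: bool   # hypothetical flag for the proposed README column


def accept(sub, current_best):
    """Accept a clear improvement, or a never-explored architecture that ties the SOTA."""
    improves = current_best - sub.bpb >= MIN_IMPROVEMENT
    # Assumption: "same score" means the gap is smaller than the noise threshold.
    ties = abs(sub.bpb - current_best) < MIN_IMPROVEMENT
    return improves or (sub.new_architecture and ties)


# Made-up scores against a current best of 1.1990.
novel_tie = accept(Submission(400, 1.1990, new_architecture=True), 1.1990)
tuned_tie = accept(Submission(401, 1.1990, new_architecture=False), 1.1990)
```

Under these assumptions a tuned variant that merely ties is still rejected, while a new architecture with the same score gets showcased alongside the existing record.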
What does everyone think? I'm starting this as a place to collate everyone's opinions on these rules and the general issues that can arise from them. Also because I noticed that the latest accepted PRs were all from April, while the many pushed earlier, in March, were ignored. Almost all my submissions are for the 2nd leaderboard with the infinite compute (albeit I did 10 mins too, but they are only "worth it" when trained for way longer) and unique architectures (XNOR, Binary Bitnet, Ternary Bitnet, JEPA LeWorldModel Mamba2), so it does not affect me (in the sense that there is no rush to accept mine immediately or later on; they are not focused on lowering the bpb per se). But I believe it is unfair to the many people who did complete their work and submitted it sooner, and who should rightly be accepted to the 1st leaderboard before those that came later.
The stage is yours: