refactor: refactor env and data processor & add nemotron super 49b recipes #1506
Conversation
Signed-off-by: Yuki Huang <[email protected]> Signed-off-by: ruit <[email protected]>
… processors. Added raw_dataset.py and path.py for improved dataset processing. Updated project-includes in pyrefly.toml and modified grpo.md to reflect new task-dataset mapping. Cleaned up unused code and configurations in various YAML files. Signed-off-by: ruit <[email protected]>
…or handling
- Introduced documentation for the new Code Jaccard Environment, detailing its functionality, usage, and configuration.
- Updated RawDataset class to provide a default processor if none is specified in the data configuration.
- Enhanced test coverage for the helpsteer3 data processor to ensure correct functionality and output.
Signed-off-by: ruit <[email protected]>
Signed-off-by: ruit <[email protected]>
- Updated CLEVRCoGenTDataset, OpenAIFormatDataset, and SquadDataset to inherit from the RawDataset class for improved dataset handling.
- Added necessary imports for RawDataset in the respective files.

Signed-off-by: ruit <[email protected]>
…up for vlm grpo
- Added `env_name` to `vlm_grpo_3B_megatron.yaml` and `vlm_grpo_3B.yaml` for environment specification.
- Modified the `setup_data` function in `run_vlm_grpo.py` to use `env_name` for environment configuration, enhancing flexibility in dataset processing.

Signed-off-by: ruit <[email protected]>
…tion Signed-off-by: ruit <[email protected]>
Signed-off-by: ruit <[email protected]>
terrykong left a comment:
Thanks @RayenTian. Left comments, but I didn't fully finish reviewing; I need a little more time to give feedback on the task spec/dataset change. One high-level piece of feedback: it does seem a little complicated at first glance, since we now have task_names plumbed throughout, and we allow some flexibility that I'm not sure we want to allow:

`task_name = data.task_name if hasattr(data, "task_name") else task_spec.task_name`
did you intend to commit this?
This is to enhance understandability and compatibility with future multi-dataset and multi-env support. More details are here: #1506 (comment).
- def test_nightly_compute_stays_below_1100_hours(nightly_test_suite, tracker):
+ def test_nightly_compute_stays_below_1300_hours(nightly_test_suite, tracker):
this is a pretty big jump: ~20% more compute to do every night. is it possible to get the same signal with fewer steps if we need to test nightly? alternatively we could test only on release, but first would like to see if we can: (a) shorten the test (b) reduce the model size (c) scale down the experiment
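The budget guard being discussed can be sketched as a simple aggregation over per-test cost estimates. This is a minimal illustration, not the repo's actual test; the suite contents and GPU-hour figures are hypothetical.

```python
# Sketch of a nightly compute-budget guard like the one in the diff above.
# The suite contents and per-test GPU-hour estimates are hypothetical.

def total_gpu_hours(suite: dict) -> float:
    """Sum the estimated GPU-hours of every job in the nightly suite."""
    return sum(suite.values())

def check_nightly_compute_budget(suite: dict, budget: float = 1300.0) -> float:
    """Fail loudly if the nightly suite exceeds the hour budget."""
    hours = total_gpu_hours(suite)
    assert hours < budget, f"nightly suite needs {hours} GPU-hours, budget is {budget}"
    return hours

# Example with three hypothetical nightly jobs.
suite = {"grpo_math_8b": 400.0, "grpo_math_49b": 700.0, "vlm_grpo_3b": 150.0}
check_nightly_compute_budget(suite)  # passes: 1250 < 1300
```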
uv run tests/json_dump_tb_logs.py $LOG_DIR --output_path $JSON_METRICS
uv run tests/check_metrics.py $JSON_METRICS \
    'max(data["train/token_mult_prob_error"]) < 1.05'
could we switch to gen_kl_error?
from typing import Any

def import_class_from_path(name: str) -> Any:
isn't there a "get_class" offered by hydra? could we use that instead?
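For reference, Hydra does ship `hydra.utils.get_class`, which resolves a dotted path to a class object. A dependency-free equivalent of such a helper can be sketched with `importlib`; this is a minimal sketch, not the PR's actual implementation.

```python
import importlib
from typing import Any

def import_class_from_path(name: str) -> Any:
    """Resolve a dotted path like "collections.OrderedDict" to the class object.

    A stdlib-only sketch of what hydra.utils.get_class does.
    """
    module_name, _, class_name = name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)

# Example: resolve a stdlib class by its dotted path.
cls = import_class_from_path("collections.OrderedDict")
```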
    return chunks

def get_env(env_name: str, env_configs: dict) -> EnvironmentInterface:
it's probably better to rename to create_env since it conveys that you're creating remotes
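The naming suggestion can be sketched with a plain registry-backed factory. All names here are illustrative, and this sketch deliberately omits the Ray-remote construction the actual function performs.

```python
# Illustrative sketch only: `create_env` (vs. `get_env`) signals that a new
# instance is constructed on each call rather than a cached one returned.
from typing import Callable

ENV_REGISTRY: dict = {}

def register_env(env_name: str, ctor: Callable) -> None:
    """Associate an environment name with a constructor."""
    ENV_REGISTRY[env_name] = ctor

def create_env(env_name: str, env_configs: dict):
    """Construct a fresh environment from its registered constructor."""
    if env_name not in ENV_REGISTRY:
        raise KeyError(f"unknown env: {env_name}")
    return ENV_REGISTRY[env_name](env_configs.get(env_name, {}))

class MathEnv:  # hypothetical stand-in for a real environment class
    def __init__(self, cfg: dict):
        self.cfg = cfg

register_env("math", MathEnv)
env = create_env("math", {"math": {"num_workers": 2}})
```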
    return env

def register_env(env_name: str, actor_class_fqn: str) -> None:
i didn't see this used anywhere. is that intentional?
        max_seq_length // len(message_log), len(chat_message["token_ids"])
    )
]
loss_multiplier = 0.1  # Reduce loss for truncated sequences
this deviates from what we usually do where we just set to 0. what's the reason to set to 0.1 for this processor?
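The two conventions being compared here can be made concrete in a toy sketch (field names and values are illustrative, not the processor's actual code):

```python
def loss_multiplier_for(truncated: bool, truncated_weight: float = 0.0) -> float:
    """Weight a sample's contribution to the loss.

    The usual convention drops truncated sequences entirely (weight 0.0);
    the processor in this diff instead down-weights them to 0.1.
    """
    return truncated_weight if truncated else 1.0

assert loss_multiplier_for(False) == 1.0        # full-length sample
assert loss_multiplier_for(True) == 0.0         # usual convention: excluded
assert loss_multiplier_for(True, 0.1) == 0.1    # this processor's choice
```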
# load dataset
data: Any = load_response_dataset(data_config, seed)
task_spec = data.task_spec
task_name = data.task_name if hasattr(data, "task_name") else task_spec.task_name
is there a reason to have this fallback as opposed to just asserting where to find the task_name in all cases?
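The strict alternative raised here could look like the following sketch, which contrasts the fallback in the diff with an assert-everywhere version. Attribute names come from the snippet above; the rest is illustrative.

```python
from typing import Any

def resolve_task_name(data: Any) -> str:
    """Fallback version from the diff: silently falls back to the task spec."""
    if hasattr(data, "task_name"):
        return data.task_name
    return data.task_spec.task_name

def resolve_task_name_strict(data: Any) -> str:
    """Strict version: require every dataset to declare task_name explicitly,
    so a missing attribute is a loud error instead of a silent fallback."""
    assert hasattr(data, "task_name"), (
        f"{type(data).__name__} must define task_name"
    )
    return data.task_name
```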
Follow up of #1472. Thanks @nv-mmanohara for adding this!
This PR refactors `run_grpo.py`; `run_grpo_math.py` and `run_grpo_rm.py` will be cleaned up in a subsequent PR ([Refactor] Clear `run_grpo_math.py` and `run_grpo_rm.py` #1572).

Test Result
grpo math before and after refactor
nemotron 49B
Known Issue
Design explanation
Purpose of `task_name = data.task_name if hasattr(data, "task_name") else task_spec.task_name` (answer to #1506 (review))
Related documentation is added to [docs/guides/grpo.md](docs/guides/grpo.md).
1.1 Enhanced Understandability
- In `run_grpo_math.py`, the environment was hard-coded. The file only supported one math environment, and the `task_name` of all datasets used was uniformly set to "math".
- `task_name`, `task_data_processors`, and `env` were in a strict one-to-one binding. For example, the `task_name` of `openmathinstruct2` was hard-coded as "math", the `task_data_processors` for the math task was bound to `math_hf_data_processor`, and the environment was bound to `math_env`.
- One can view the task of `run_grpo_math.py` as "math", and the task of `run_grpo_rm.py` as "reward model".
- In `run_grpo.py`, the environment is no longer hard-coded but is specified via configuration. This makes the binding between datasets, environments, and processors more flexible. For instance, `openmathinstruct2` can use either the math environment or the reward model environment.
- In that case, uniformly hard-coding `task_name` to "math" for all environments would cause confusion.
- Now each dataset declares its own `task_name`, and the task corresponding to the dataset can specify its own environment and processor.

1.2 Compatibility with Future Multi-Dataset and Multi-Environment Support
- Suppose we train with two datasets: `openmathinstruct2` and `dapo_math`. Both are math-related datasets.
- `openmathinstruct2` (see [openmathinstruct2.py#L38](RL/nemo_rl/data/datasets/response_datasets/openmathinstruct2.py), line 38 in 859a89a)
- `dapo_math` (see [dapo_math.py#L37](RL/nemo_rl/data/datasets/response_datasets/dapo_math.py), line 37 in 859a89a)
- `task_to_env` (see [run_grpo_math.py#L123](RL/examples/run_grpo_math.py), line 123 in 859a89a)
- Because the `task_name` for both datasets is hard-coded as `"task_name": "math"` in the code, this multi-environment configuration cannot be implemented.
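The multi-environment setup described above can be sketched as a per-dataset mapping, in contrast to the hard-coded `"task_name": "math"`. All environment and processor names below are illustrative, not the repo's actual identifiers.

```python
# Illustrative only: per-dataset task names let each task bind its own
# environment and processor, instead of every dataset collapsing to "math".
task_to_env = {
    "openmathinstruct2": "math_env",
    "dapo_math": "math_env_with_different_reward",  # hypothetical second env
}
task_to_processor = {
    "openmathinstruct2": "math_hf_data_processor",
    "dapo_math": "dapo_math_data_processor",        # hypothetical processor
}

def env_for(task_name: str) -> str:
    """Look up the environment bound to a task name."""
    return task_to_env[task_name]

# With distinct task names, the two math datasets can route to different
# environments; with a shared hard-coded "math", they could not.
assert env_for("openmathinstruct2") != env_for("dapo_math")
```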