
ConnectionError: Couldn't reach 'synthetic_data_llama-3-8b-instruct-sppo-iter3_score' on the Hub (ConnectionError) #2

Open
xinghuang2050 opened this issue Jun 27, 2024 · 2 comments


@xinghuang2050

Great work!
I commented out all the `push_to_hub` calls in the code. Is the `synthetic_data_llama-3-8b-instruct-sppo-iter3_score` dataset generated by PairRM?

[rank4]: Traceback (most recent call last):
[rank4]: File "/training-data/huangxing/software/SPPO/sppo/run_dpo.py", line 249, in <module>
[rank4]: main()
[rank4]: File "/training-data/huangxing/software/SPPO/sppo/run_dpo.py", line 43, in main
[rank4]: main_inner(model_args, data_args, training_args)
[rank4]: File "/training-data/huangxing/software/SPPO/sppo/run_dpo.py", line 78, in main_inner
[rank4]: raw_datasets = get_datasets(data_args, splits=["train"])
[rank4]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]: File "/training-data/huangxing/software/SPPO/sppo/alignment/data.py", line 164, in get_datasets
[rank4]: raw_datasets = mix_datasets(dataset_mixer, splits=splits, shuffle=shuffle)
[rank4]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]: File "/training-data/huangxing/software/SPPO/sppo/alignment/data.py", line 189, in mix_datasets
[rank4]: dataset = load_dataset(ds, split=split)
[rank4]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank4]: File "/training-data/software/miniconda3/envs/mcts/lib/python3.11/site-packages/datasets/load.py", line 2129, in load_dataset
[rank4]: builder_instance = load_dataset_builder(
[rank4]: ^^^^^^^^^^^^^^^^^^^^^
[rank4]: File "/training-data/software/miniconda3/envs/mcts/lib/python3.11/site-packages/datasets/load.py", line 1815, in load_dataset_builder
[rank4]: dataset_module = dataset_module_factory(
[rank4]: ^^^^^^^^^^^^^^^^^^^^^^^
[rank4]: File "/training-data/software/miniconda3/envs/mcts/lib/python3.11/site-packages/datasets/load.py", line 1512, in dataset_module_factory
[rank4]: raise e1 from None
[rank4]: File "/training-data/software/miniconda3/envs/mcts/lib/python3.11/site-packages/datasets/load.py", line 1468, in dataset_module_factory
[rank4]: raise ConnectionError(f"Couldn't reach '{path}' on the Hub ({type(e).__name__})")
[rank4]: ConnectionError: Couldn't reach 'synthetic_data_llama-3-8b-instruct-sppo-iter3_score' on the Hub (ConnectionError)

@angelahzyuan
Collaborator

Hi,

This dataset should appear in a local folder (under the directory where you launched the script) once the generation pipeline has run successfully. Please check the generation step for errors.
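As a workaround while debugging, one could check for the locally generated folder before training and pass its path to `load_dataset`, which avoids the Hub lookup that raised the `ConnectionError`. This is a sketch only: the folder name is taken from the error message, and the `resolve_dataset` helper and its fallback behavior are my assumptions, not part of the SPPO codebase.

```python
from pathlib import Path

def resolve_dataset(name: str) -> str:
    """Return an absolute local path if the pipeline's output folder
    exists on disk; otherwise fall back to treating `name` as a
    Hugging Face Hub dataset id (hypothetical helper)."""
    path = Path(name)
    if path.is_dir():
        # datasets' load_dataset() accepts a local directory of data
        # files, so pointing it at the pipeline output skips the Hub.
        return str(path.resolve())
    return name

# Folder name as it appears in the traceback.
target = resolve_dataset("synthetic_data_llama-3-8b-instruct-sppo-iter3_score")
# Then, e.g.:  dataset = load_dataset(target, split="train")
```

If the folder is missing entirely, that suggests the generation step (vLLM sampling plus PairRM scoring) did not complete, which matches the maintainers' advice to re-check that stage for errors.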

@angelahzyuan
Collaborator

Yes. It is generated by vLLM and PairRM, and is produced automatically by our pipeline.
