add data rank split #167

samsja · 2024-12-03T21:50:18Z

What this PR does:

Allows splitting the data locally by data rank, to handle local experiments after the new datasets PR
Adds an easy-to-use install script

The graph shows the old and the new behavior: the data is no longer duplicated locally.
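
For readers unfamiliar with the idea, here is a minimal sketch of splitting by data rank. It is not the code from this PR: the function name `split_by_data_rank` and the parameters `data_rank` and `data_world_size` are hypothetical, and a simple round-robin assignment over sorted file names is assumed. The point is that each local process keeps only its own slice of the shard list instead of a full copy.

```python
from typing import List, Sequence


def split_by_data_rank(
    files: Sequence[str], data_rank: int, data_world_size: int
) -> List[str]:
    """Return the files owned by one data rank (hypothetical helper).

    Assumes a round-robin assignment; the real project may split differently.
    """
    if data_world_size <= 0:
        raise ValueError("data_world_size must be positive")
    if not 0 <= data_rank < data_world_size:
        raise ValueError("data_rank must be in [0, data_world_size)")
    # Sort first so every process derives the same assignment
    # regardless of the order in which it listed the files.
    return sorted(files)[data_rank::data_world_size]


if __name__ == "__main__":
    # Illustrative shard names, not real files from the repository.
    shards = [f"shard_{i:03d}.parquet" for i in range(8)]
    for rank in range(2):
        print(rank, split_by_data_rank(shards, rank, 2))
```

With this scheme the slices are disjoint and together cover every file, so no shard is read by two ranks on the same machine.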

Jackmin801

lgtm. should we make the code download the data instead of the script in a future PR? I think the HF dataset way of using it is nice where you can just specify dataset repo / path

samsja · 2024-12-04T22:54:47Z

> lgtm. should we make the code download the data instead of the script in a future PR? I think the HF dataset way of using it is nice where you can just specify dataset repo / path

Yeah, I am planning on refactoring the dataset part to also support streaming from an HF repo, so I might as well add the download option at the same time.

samsja force-pushed the add-back-data-rank-split branch 6 times, most recently from fec3ba0 to e2a22dc Compare December 3, 2024 23:28

samsja requested review from Jackmin801 and JohannesHa December 3, 2024 23:34

samsja force-pushed the add-back-data-rank-split branch from 3144e13 to 894b281 Compare December 3, 2024 23:47

samsja added 3 commits December 4, 2024 01:22

add data rank split

c8b2664

add install cript

6e5c511

update readme

42b2ab0

samsja force-pushed the add-back-data-rank-split branch from 894b281 to 42b2ab0 Compare December 4, 2024 01:22

Jackmin801 approved these changes Dec 4, 2024

samsja merged commit 63efaf0 into main Dec 4, 2024
2 checks passed

samsja deleted the add-back-data-rank-split branch December 4, 2024 22:54