I want to split my data before training the model. The repo has a function that can do this:
def load_dataset(
    datasets: Union[List[str], str],
    *,
    split_dataset_ratio: float = 0.,
    seed: Union[int, np.random.RandomState, None] = None,
    num_proc: int = 1,
    streaming: bool = False,
    use_hf: Optional[bool] = None,
    hub_token: Optional[str] = None,
    strict: bool = False,
    download_mode: Literal['force_redownload', 'reuse_dataset_if_exists'] = 'reuse_dataset_if_exists',
    columns: Optional[Dict[str, str]] = None,
    remove_unused_columns: bool = True,
    # self-cognition
    model_name: Union[Tuple[str, str], List[str], None] = None,  # zh, en
    model_author: Union[Tuple[str, str], List[str], None] = None,
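I am using a custom dataset, and for it this load path uses AutoPreprocessor(), so I cannot apply my own custom preprocess.
My local datasets also change dynamically, so how can I register datasets dynamically?
Any guidance would be appreciated.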
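
To make the request concrete, here is a minimal sketch of the behavior I am after, written in plain Python. It does not use the repo's API: `REGISTRY`, `register_local_dataset`, `register_directory`, `load_and_split`, and `my_preprocess` are hypothetical names, the JSONL layout with `question`/`answer` fields is an assumption, and `split_ratio` and `seed` only mirror `split_dataset_ratio` and `seed` from the signature above.

```python
import json
import random
from pathlib import Path
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Preprocess = Callable[[Row], Row]

# Hypothetical in-memory registry: dataset name -> (file path, preprocess function).
REGISTRY: Dict[str, Tuple[Path, Preprocess]] = {}


def register_local_dataset(name: str, path: Path, preprocess: Preprocess) -> None:
    """Register (or re-register) one local JSONL file at runtime."""
    REGISTRY[name] = (Path(path), preprocess)


def register_directory(root: Path, preprocess: Preprocess) -> List[str]:
    """Register every *.jsonl file currently under root, keyed by file stem."""
    names = []
    for file in sorted(Path(root).glob("*.jsonl")):
        register_local_dataset(file.stem, file, preprocess)
        names.append(file.stem)
    return names


def load_and_split(name: str, split_ratio: float = 0.0,
                   seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Load a registered dataset, apply its preprocess, return (train, val)."""
    path, preprocess = REGISTRY[name]
    with path.open(encoding="utf-8") as f:
        rows = [preprocess(json.loads(line)) for line in f if line.strip()]
    # Seeded shuffle so the same seed always yields the same split.
    random.Random(seed).shuffle(rows)
    n_val = int(len(rows) * split_ratio)
    return rows[n_val:], rows[:n_val]


def my_preprocess(row: Row) -> Row:
    """Hypothetical custom preprocess: rename question/answer to query/response."""
    return {"query": row["question"], "response": row["answer"]}
```

Calling `register_directory` again before each training run would pick up files that were added since the last run. What I am looking for is the equivalent of this inside the repo's own `load_dataset`.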