2023-11-29 11:13:15 | INFO | data_juicer.core.executor:107 - Processing data...
2023-11-29 11:13:15 | ERROR | data_juicer.core.executor:165 - An error occurred during Op [language_id_score_filter].
Traceback (most recent call last):
  File "/home/wzp/code/LLMData/open_source/data-juicer/data_juicer/core/executor.py", line 131, in run
    dataset = dataset.add_column(name=Fields.stats,
  File "/home/wzp/code/LLMData/open_source/data-juicer/data_juicer/core/data.py", line 255, in add_column
    return NestedDataset(super().add_column(*args, **kargs))
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 528, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/site-packages/datasets/fingerprint.py", line 511, in wrapper
    out = func(dataset, *args, **kwargs)
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 5446, in add_column
    dataset = self.flatten_indices() if self._indices is not None else self
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 528, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/site-packages/datasets/fingerprint.py", line 511, in wrapper
    out = func(dataset, *args, **kwargs)
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 3592, in flatten_indices
    return self.map(
  File "/home/wzp/code/LLMData/open_source/data-juicer/data_juicer/core/data.py", line 180, in map
    new_ds = NestedDataset(super().map(*args, **kargs))
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 563, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 528, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 3004, in map
    for rank, done, content in Dataset._map_single(**dataset_kwargs):
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 3392, in _map_single
    buf_writer, writer, tmp_file = init_buffer_and_writer()
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 3326, in init_buffer_and_writer
    tmp_file = tempfile.NamedTemporaryFile("wb", dir=os.path.dirname(cache_file_name), delete=False)
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/tempfile.py", line 541, in NamedTemporaryFile
    (fd, name) = _mkstemp_inner(dir, prefix, suffix, flags, output_type)
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/tempfile.py", line 250, in _mkstemp_inner
    fd = _os.open(file, flags, 0o600)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp1pvg2dc5/tmpbf_w1ds6'
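The last frames show `tempfile.NamedTemporaryFile` being called with `dir=os.path.dirname(cache_file_name)`, and the error path points inside `/tmp/tmp1pvg2dc5`. A minimal sketch of how that call fails when its parent directory is missing is given below. It is an illustration only: the helper name `temp_file_in_removed_dir` is hypothetical, and the idea that the cache directory was removed before the write is an assumption, not something the log confirms.

```python
import os
import tempfile


def temp_file_in_removed_dir():
    """Try to create a temp file inside a directory that no longer exists."""
    # Stand-in for a temporary cache directory such as /tmp/tmpXXXX.
    parent = tempfile.mkdtemp()
    # Assumption for illustration: the directory disappears before the write.
    os.rmdir(parent)
    try:
        # Same call shape as in datasets' init_buffer_and_writer in the traceback.
        handle = tempfile.NamedTemporaryFile("wb", dir=parent, delete=False)
    except FileNotFoundError as exc:
        return exc
    handle.close()
    return None


error = temp_file_in_removed_dir()
print(type(error).__name__, error)
```

If this matches what happens in the pipeline, checking whether the directory behind `cache_file_name` still exists when the op runs would be a reasonable first step.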
Desktop (please complete the following information):
OS: Ubuntu 18.04
Version: latest
Describe the bug
modelscope/data-juicer#104
To Reproduce
Screenshots