No such file or directory: '/tmp/tmp1pvg2dc5/tmpbf_w1ds6' #731

Open
simplew2011 opened this issue Nov 30, 2023 · 1 comment

Describe the bug

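Running Data-Juicer's tools/process_data.py under Scalene fails with a FileNotFoundError for a temporary file during the language_id_score_filter op. Originally reported at modelscope/data-juicer#104.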

To Reproduce

pip install -U scalene
pip install py-data-juicer
git clone https://github.com/alibaba/data-juicer
cd data-juicer
scalene tools/process_data.py --config configs/demo/process.yaml

Screenshots

2023-11-29 11:13:15 | INFO | data_juicer.core.executor:107 - Processing data...
2023-11-29 11:13:15 | ERROR | data_juicer.core.executor:165 - An error occurred during Op [language_id_score_filter].
Traceback (most recent call last):
  File "/home/wzp/code/LLMData/open_source/data-juicer/data_juicer/core/executor.py", line 131, in run
    dataset = dataset.add_column(name=Fields.stats,
  File "/home/wzp/code/LLMData/open_source/data-juicer/data_juicer/core/data.py", line 255, in add_column
    return NestedDataset(super().add_column(*args, **kargs))
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 528, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/site-packages/datasets/fingerprint.py", line 511, in wrapper
    out = func(dataset, *args, **kwargs)
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 5446, in add_column
    dataset = self.flatten_indices() if self._indices is not None else self
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 528, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/site-packages/datasets/fingerprint.py", line 511, in wrapper
    out = func(dataset, *args, **kwargs)
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 3592, in flatten_indices
    return self.map(
  File "/home/wzp/code/LLMData/open_source/data-juicer/data_juicer/core/data.py", line 180, in map
    new_ds = NestedDataset(super().map(*args, **kargs))
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 563, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 528, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 3004, in map
    for rank, done, content in Dataset._map_single(**dataset_kwargs):
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 3392, in _map_single
    buf_writer, writer, tmp_file = init_buffer_and_writer()
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 3326, in init_buffer_and_writer
    tmp_file = tempfile.NamedTemporaryFile("wb", dir=os.path.dirname(cache_file_name), delete=False)
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/tempfile.py", line 541, in NamedTemporaryFile
    (fd, name) = _mkstemp_inner(dir, prefix, suffix, flags, output_type)
  File "/home/wzp/anaconda3/envs/python3.8/lib/python3.8/tempfile.py", line 250, in _mkstemp_inner
    fd = _os.open(file, flags, 0o600)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp1pvg2dc5/tmpbf_w1ds6'
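
The last frames show tempfile.NamedTemporaryFile being called with dir set to the parent of cache_file_name and failing with errno 2. A minimal sketch of that failure mode, outside of Scalene and datasets, is below. It assumes the parent directory (here /tmp/tmp1pvg2dc5) had been removed or was never created by the time the writer was initialised; the traceback does not establish why the directory is missing under Scalene. The function name try_create_in_removed_dir is hypothetical and only used for illustration.

```python
import os
import shutil
import tempfile


def try_create_in_removed_dir():
    """Return the exception raised when the target directory no longer exists."""
    # Assumption for illustration: the cache directory disappeared before
    # the temporary file was created. Simulate that by creating a scratch
    # directory and deleting it straight away.
    parent = tempfile.mkdtemp()
    shutil.rmtree(parent)

    try:
        # Same call shape as the failing frame in datasets/arrow_dataset.py.
        tmp_file = tempfile.NamedTemporaryFile("wb", dir=parent, delete=False)
    except FileNotFoundError as exc:
        return exc

    # Not expected to be reached; clean up if the file was created anyway.
    tmp_file.close()
    os.unlink(tmp_file.name)
    return None


err = try_create_in_removed_dir()
print(type(err).__name__, err.errno)
```

If this reproduces the same error class locally, it suggests checking whether the temporary directory used for the datasets cache still exists at the point where the op runs under Scalene.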


Desktop (please complete the following information):

  • OS: Ubuntu 18.04
  • Version: latest

sarahec (Contributor) commented Feb 19, 2024

I'm working on a related problem -- is this still an issue for you?
