Describe the bug
feat[field].fillna(value=feat[field].mean(), inplace=True)
12 Jun 10:59 INFO Saving filtered dataset into [saved/bert4recbole-SequentialDataset.pth]
12 Jun 10:59 INFO bert4recbole
The number of users: 93328
Average actions of users: 2185.097806636879
The number of items: 93329
Average actions of items: 2185.1212202387333
The number of inters: 203928623
The sparsity of the dataset: 97.65874016271815%
Remain Fields: ['user_id', 'item_id', 'timestamp', 'area_id']
12 Jun 11:45 INFO Saving split dataloaders into: [saved/bert4recbole-for-BERT4Rec-dataloader.pth]
Traceback (most recent call last):
File "/data1/bert4rec/bert4rec-main/venv/lib/python3.10/site-packages/torch/serialization.py", line 632, in save
_legacy_save(obj, opened_file, pickle_module, pickle_protocol)
File "/data1/bert4rec/bert4rec-main/venv/lib/python3.10/site-packages/torch/serialization.py", line 776, in _legacy_save
storage._write_file(f, _should_read_directly(f), True, torch._utils._element_size(dtype))
MemoryError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data1/bert4rec/bert4rec-main/scripts/bole/run.py", line 7, in
run_recbole(model='BERT4Rec', dataset=r'bert4recbole',
File "/data1/bert4rec/bert4rec-main/venv/lib/python3.10/site-packages/recbole/quick_start/quick_start.py", line 133, in run_recbole
train_data, valid_data, test_data = data_preparation(config, dataset)
File "/data1/bert4rec/bert4rec-main/venv/lib/python3.10/site-packages/recbole/data/utils.py", line 194, in data_preparation
save_split_dataloaders(
File "/data1/bert4rec/bert4rec-main/venv/lib/python3.10/site-packages/recbole/data/utils.py", line 99, in save_split_dataloaders
pickle.dump(Serialization_dataloaders, f)
File "/data1/bert4rec/bert4rec-main/venv/lib/python3.10/site-packages/torch/storage.py", line 951, in reduce
torch.save(self, b, _use_new_zipfile_serialization=False)
File "/data1/bert4rec/bert4rec-main/venv/lib/python3.10/site-packages/torch/serialization.py", line 631, in save
with _open_file_like(f, 'wb') as opened_file:
File "/data1/bert4rec/bert4rec-main/venv/lib/python3.10/site-packages/torch/serialization.py", line 439, in exit
self.file_like.flush()
ValueError: I/O operation on closed file.
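Reading the traceback: pickling the split dataloaders routes every torch storage through the legacy serializer, which first writes the whole storage into an in-memory buffer. With roughly 204 million interactions that buffer appears to exhaust RAM and raises the MemoryError; while that exception unwinds, the serializer's context manager flushes a buffer that has already been closed, producing the secondary ValueError. A tiny stand-alone illustration of the same serialization path (this is not RecBole code, and the tensor is a trivial stand-in):

import io
import pickle
import torch

t = torch.zeros(10)  # stand-in for the ~204M-interaction tensors in the dataloaders
buf = io.BytesIO()
pickle.dump(t, buf)  # per the traceback, this goes through Storage.__reduce__,
                     # which calls torch.save into an in-memory BytesIO buffer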
How to reproduce
Steps to reproduce the bug:
YAML file:
gpu_id: '0,1,2,3'
worker: 0
# model config
n_layers: 2 # (int) The number of transformer layers in transformer encoder.
n_heads: 2 # (int) The number of attention heads for multi-head attention layer.
hidden_size: 64 # (int) The number of features in the hidden state.
inner_size: 256 # (int) The inner hidden size in feed-forward layer.
hidden_dropout_prob: 0.2 # (float) The probability of an element to be zeroed.
attn_dropout_prob: 0.2 # (float) The probability of an attention score to be zeroed.
hidden_act: 'gelu' # (str) The activation function in feed-forward layer.
layer_norm_eps: 1e-12 # (float) A value added to the denominator for numerical stability.
initializer_range: 0.02 # (float) The standard deviation for normal initialization.
mask_ratio: 0.2 # (float) The probability of an item being replaced by the MASK token.
loss_type: 'CE' # (str) The type of loss function.
transform: mask_itemseq # (str) The transform operation for batch data process.
ft_ratio: 0.5 # (float) The probability of generating fine-tuning samples
# dataset config
field_separator: "," # column separator in the dataset files
seq_separator: " " # separator inside token_seq / float_seq fields
USER_ID_FIELD: user_id # name of the user-ID field
ITEM_ID_FIELD: item_id # name of the item-ID field
TIME_FIELD: timestamp # name of the timestamp field
MAX_ITEM_LIST_LENGTH: 50 # maximum item-sequence length
save_dataset: True # save the processed dataset to disk
save_dataloaders: True # save the split dataloaders to disk
# Which columns to load from which file: user_id, item_id and timestamp from the .inter file,
# and item_id and area_id from the .item file (sample files are sketched after this config)
load_col:
inter: [user_id, item_id, timestamp]
item: [item_id, area_id]
# training settings
epochs: 500 # maximum number of training epochs
train_batch_size: 128 # training batch size
learner: adam # built-in PyTorch optimizer to use
learning_rate: 0.001 # learning rate
training_neg_sample_num: 0 # number of negative samples
eval_step: 1 # run evaluation after every training epoch
stopping_step: 10 # early-stopping patience: stop if the validation metric has not improved within this many evaluations
# evaluation settings
eval_setting: TO_LS,full # sort by time, leave-one-out split, full ranking over all items (see the note after this config)
metrics: ["Recall", "MRR", "NDCG", "Hit", "Precision"] # evaluation metrics
valid_metric: MRR@10 # metric used as the early-stopping criterion
eval_batch_size: 8 # evaluation batch size
show_progress: True
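Note that eval_setting: TO_LS,full is the legacy pre-1.0 syntax. On newer RecBole versions the rough equivalent, sketched here as an assumption rather than a verified mapping, is:

eval_args:
  split: {'LS': 'valid_and_test'}
  order: TO
  mode: full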
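For reference, the load_col section above implies RecBole atomic files shaped like the following, using the "," field separator. The type annotations after each colon follow RecBole's atomic-file convention; the sample rows and the area_id type are assumptions:

bert4recbole.inter:
user_id:token,item_id:token,timestamp:float
1,1193,978300760

bert4recbole.item:
item_id:token,area_id:token
1193,12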
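A minimal launcher matching the scripts/bole/run.py call shown in the traceback, assuming the configuration above is saved as bert4rec.yaml (the filename is an assumption):

from recbole.quick_start import run_recbole

if __name__ == "__main__":
    run_recbole(
        model="BERT4Rec",
        dataset="bert4recbole",
        config_file_list=["bert4rec.yaml"],  # assumed filename for the config above
    )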
Expected behavior
The "I/O operation on closed file" error shown above occurs.
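Until the underlying MemoryError is addressed, one possible workaround, assuming the cached dataloaders are not needed between runs, is to turn off save_dataloaders so pickle never has to buffer the full dataset in memory:

from recbole.quick_start import run_recbole

run_recbole(
    model="BERT4Rec",
    dataset="bert4recbole",
    config_dict={"save_dataloaders": False},  # overrides the YAML setting above
)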
Environment (please complete the following information):