I'm new to petastorm and I'm facing some issues.
I need to iterate over a dataset and get three identical batches, so that I can transform two of them to extract some information.
The dataset consists of users' movie ratings (like the MovieLens dataset). I need three batches with the same ratings (rows) so that I can extract each user (a user can appear more than once in the ratings) and each rated movie. I wrote code that creates a fake dataset and a Spark converter, then gets three batches from the same converter (hoping they are the same).
The question is: why are the batches different in some groups, for example in Epoch 1, Group of Batches 2?
The expected behavior is that all batches are always the same, as in Epoch 1, Groups of Batches 3, 4 and 5.
It's likely due to a race between workers. We currently don't have a reordering buffer after the reader (which launches multiple threads or processes to parallelize the work). To test this hypothesis, please pass workers_count=1 to the make_tf_dataset call.
Obviously, workers_count=1 will degrade the reading speed.
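To illustrate the hypothesis, here is a minimal sketch in plain Python. It is not petastorm's implementation: `load_row_group`, `read_unordered` and `read_reordered` are hypothetical names, and a thread pool with random delays stands in for the reader's workers. It shows why results collected as workers finish can arrive in a different order on each run, and what a reordering step would change.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def load_row_group(index):
    """Hypothetical stand-in for a worker that reads one chunk of rows."""
    time.sleep(random.uniform(0.0, 0.01))  # simulate variable I/O latency
    return index


def read_unordered(n_chunks, workers):
    """Collect chunks as workers finish; the order depends on timing."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(load_row_group, i) for i in range(n_chunks)]
        return [f.result() for f in as_completed(futures)]


def read_reordered(n_chunks, workers):
    """Same parallel work, but results are returned in submission order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_row_group, range(n_chunks)))


if __name__ == "__main__":
    print("unordered:", read_unordered(8, workers=4))
    print("reordered:", read_reordered(8, workers=4))
```

Two independent readers that behave like `read_unordered` see the same chunks but not necessarily in the same sequence, so batches cut from them at the same position can contain different rows.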
I am not sure I understand, though: why do you use three different dataset instances to read different columns? In other words, what prevents you from doing something like this?
    with conv_train.make_tf_dataset(batch_size=batch_size, num_epochs=epochs, seed=1) as train:
        epoch_eval = True
        for i, b in enumerate(train):
            # b should have all three fields: uid_dec, mid_dec and eval...
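As a rough sketch of the single-pass idea, the snippet below derives every view from one batch, so the views cannot disagree about which rows they cover. It uses plain Python rather than TensorFlow, and it assumes a batch can be treated as a dict of equal-length lists keyed by the field names above. `unique_in_order` and `process_batch` are hypothetical helpers, and the sample values are made up for illustration.

```python
def unique_in_order(values):
    """Return distinct values, keeping first-seen order."""
    return list(dict.fromkeys(values))


def process_batch(batch):
    """Derive every per-batch view from the same rows.

    `batch` is assumed to be a dict of equal-length lists keyed by
    uid_dec, mid_dec and eval.
    """
    users = unique_in_order(batch["uid_dec"])   # each user once, even if repeated
    movies = unique_in_order(batch["mid_dec"])  # each rated movie once
    return batch, users, movies


# Hypothetical batch of four ratings
batch = {
    "uid_dec": [1, 2, 1, 3],
    "mid_dec": [10, 10, 20, 30],
    "eval": [4.0, 3.5, 5.0, 2.0],
}
ratings, users, movies = process_batch(batch)
```

The same pattern applies inside the loop above: read each batch once, then apply each transformation to that batch instead of opening a second and third reader.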
I'm not trying to read different columns. What I want is to apply different transformations to each batch, but I need to apply them to the same data. That's why I'm trying to get three identical batches, and why I'm calling make_tf_dataset three times on the same dataset: each make_tf_dataset call needs a different TransformSpec object.
Could I achieve the desired result in another way?
Thanks for your attention