
Datasets seem to have 404 errors. #13

Open
Coriana opened this issue Jul 19, 2022 · 6 comments
Labels
bug Something isn't working

Comments


Coriana commented Jul 19, 2022

Hi, I have been downloading chunks from https://openaipublic.blob.core.windows.net/minecraft-rl/snapshots/all_7xx_Apr_6.json and have noticed that each username is missing chunks of data.

Attached is what is coming up as 404 for the user hazy-thistle-chipmunk, but all users seem to be missing chunks (both mp4 and jsonl).

hazy-thistle-chipmunk-404.txt

@Miffyli Miffyli added the bug Something isn't working label Jul 19, 2022
Collaborator

Miffyli commented Jul 19, 2022

Hey, thanks for compiling a list of missing things! Indeed, some things are missing (and some of that was expected, as not all contractor data was uploaded to the servers). Do you have a rough estimate of how much data is 404'ing (as a percentage)? If it is more than expected, @brandonhoughton could take a look at it :)

Author

Coriana commented Jul 21, 2022

Apologies, still trying to get a real handle on that answer. If I go by hazy-thistle-chipmunk it's about 20% missing; if I go by woozy-ruby-ostrich it's more like 2-5% so far.

So I have yet to get a proper answer for that, but I think I had a tool that checks status codes without downloading, which might be able to 'ping' every file and give a 100% answer across all datasets. Just... trying to find that program.
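
A HEAD request returns the status code without transferring the file body, so a check like the one described here can be sketched in a few lines of Python. This is a minimal sketch, not the tool mentioned above: `head_status` and `missing_percentage` are hypothetical helper names, the list of URLs is assumed to have been built from the index file already, and the server is assumed to answer HEAD requests.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def head_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url using a HEAD request (no body download)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status code is on the exception.
        return err.code


def missing_percentage(
    urls: Iterable[str],
    status_of: Callable[[str], int] = head_status,
) -> float:
    """Return the percentage of urls whose status code is 404."""
    url_list = list(urls)
    if not url_list:
        return 0.0
    missing = sum(1 for url in url_list if status_of(url) == 404)
    return 100.0 * missing / len(url_list)
```

Passing `status_of` as a parameter makes it possible to try the counting logic on a handful of URLs, or with a stub, before pointing it at every file in an index.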

@brandonhoughton
Collaborator

Hi! This was an issue with my indexing code. I checked whether both files exist and then ignored the result and included all files =P
I will try to get new indexes out that only include complete data, but for now, know that you should have all of the data that we have!
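
The intended behaviour, keeping a recording only when both of its files exist, could look roughly like the sketch below. This is not the actual indexing code: `complete_recordings` is a hypothetical helper, and it assumes each recording is a pair of paths that share a stem and differ only in the `.mp4` / `.jsonl` extension.

```python
from typing import Iterable, List


def complete_recordings(existing_paths: Iterable[str]) -> List[str]:
    """Return the stems that have both an .mp4 and a .jsonl file."""
    mp4_stems = set()
    jsonl_stems = set()
    for path in existing_paths:
        if path.endswith(".mp4"):
            mp4_stems.add(path[: -len(".mp4")])
        elif path.endswith(".jsonl"):
            jsonl_stems.add(path[: -len(".jsonl")])
    # Only stems present in both sets describe complete recordings.
    return sorted(mp4_stems & jsonl_stems)
```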

Author

Coriana commented Aug 9, 2022

Still working on this. However, can someone verify that there is a good copy of https://openaipublic.blob.core.windows.net/minecraft-rl/data/10.0/thirsty-lavender-koala-f153ac423f61-20220414-104227.jsonl? I keep getting an invalid output that ends partway through line 442.
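
One way to confirm where a downloaded file breaks is to parse it line by line and report the first line that is not valid JSON. The sketch below is only an illustration: `first_invalid_line` is a hypothetical helper, and it assumes the file holds one JSON value per line.

```python
import json
from typing import Iterable, Optional


def first_invalid_line(lines: Iterable[str]) -> Optional[int]:
    """Return the 1-based number of the first line that is not valid JSON, or None."""
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, such as a trailing newline
        try:
            json.loads(line)
        except json.JSONDecodeError:
            return number
    return None
```

It can be run on a local copy with `with open(path, encoding="utf-8") as f: print(first_invalid_line(f))`, where `path` is wherever the file was saved.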

Collaborator

Miffyli commented Aug 9, 2022

@brandonhoughton I wonder if something happened in the upload process where these files broke halfway through the upload?
