Currently Missing Datasets #40
Comments
Dataset 148 currently only links to a zip archive with no data but one empty folder called |
Thank you for informing us. This should be rectified now. I can find 4 files in the downloaded zip file. Please let us know if this isn't the case for you, thanks! Link to the dataset page: https://archive.ics.uci.edu/dataset/148/statlog+shuttle |
Hey, thanks so much for the fast response! Yes, all 4 files are there. I wonder if it is intentional that the training data is still compressed after unzipping the downloaded archive while the test data is not? One can get the original data by running Edit: Ah, just saw that the index file also lists the training data as a compressed file, disregard then :) |
Same issue with Census Income (#20)—the zip only contains a "Graphics" folder |
Hi Markelle, the abstract of the Census Income dataset says that it is the same as the Adult dataset. We can either copy the Adult files to the Census Income dataset, or remove Census Income altogether. How should we handle this? |
Since this dataset is well-known under both names, let's have the data available under both for now (i.e., go ahead and copy the Adult files)—we can discuss combining the two later. thanks! |
Dataset 341 is also missing: https://archive.ics.uci.edu/dataset/341/smartphone+based+recognition+of+human+activities+and+postural+transitions |
@maxxu05 Fixed, thanks for letting us know. |
There is missing data from Dataset 301 "Parkinson Speech Dataset with Multiple Types of Sound Recordings": it used to include a .rar file that contained the audio files (~20 MB), but now it only includes a couple of text files. For example, this snapshot from 2015 shows the full dataset: |
Dataset 28 - Japanese Credit Screening at https://archive.ics.uci.edu/dataset/28/japanese+credit+screening appears to be missing the dataset; the download contains only an empty Graphics folder. |
Dataset 84 [Prodigy] currently only links to a zip archive with no data but one empty folder called Graphics. |
Dataset 157 [Dodgers Loop Sensor] currently only links to a zip archive with no data, only a folder called Graphics containing two images (the images for the dataset). Just to mention that the file https://archive.ics.uci.edu/static/public/156/calit2+building+people+counts.zip contains 6 files. I think two of them belong to the [Dodgers Loop Sensor] dataset, which are:
|
Dataset 75 [Musk (Version 2)] currently only links to a zip archive with no data but one empty folder called Graphics. Just to mention that the file https://archive.ics.uci.edu/static/public/74/musk+version+1.zip contains 7 files. I think three of them belong to the [Musk (Version 2)] dataset, which are:
|
Dataset 91 [Soybean (Small)] currently only links to a zip archive with no data but one empty folder called Graphics. Just to mention that the file https://archive.ics.uci.edu/static/public/90/soybean+large.zip contains 12 files. I think two of them belong to the [Soybean (Small)] dataset, which are:
|
Dataset 96 [SPECTF Heart] currently only links to a zip archive with no data but one empty folder called Graphics. Just to mention that the file https://archive.ics.uci.edu/static/public/95/spect+heart.zip contains 8 files. I think two of them belong to the [SPECTF Heart] dataset, which are:
|
Another question please, |
When datasets are donated, they have to be approved by admins. There are currently 657 approved datasets, and 892 datasets in total including pending & rejected datasets. |
Hello, Thanks. |
Also missing: Connectionist Bench (Sonar, Mines vs. Rocks) |
We used to have the PIMA Indians dataset (many other websites, e.g., Kaggle, attribute it to us); not sure what happened to it. |
@markellekelly The owners of the PIMA dataset replaced the files with a note.txt that says "Thank you for your interest in the Pima Indians Diabetes dataset. The dataset is no longer available due to permission restrictions." |
I also cannot access my dataset and get "DatasetNotFoundError: Error reading data csv file for "Cirrhosis Patient Survival Prediction" dataset (id=878)." |
A list of dataset files we believe are missing. Will be updated as they're reported / found. Feel free to comment to report additional ones.
Version 1, don't know where to find version 2
Popular dataset
Only has a Graphics folder in its zip file.
Located with the original breast-cancer-wisconsin dataset files, prefixed with wdbc
Duplicate of Adult dataset
Datasets 150-155 found under machine-learning-databases/undocumented
Folder exists directly under dataweb2/ml, but I don't have permission to access it
Found at dataweb2/ml/files
Recovered from Kaggle
Recovered from Kaggle
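For anyone who wants to check a downloaded archive before reporting it here, the following is a minimal sketch of the symptom described throughout this thread (a zip whose only content is a Graphics folder). It is not part of the repository's tooling. The function names `data_members` and `looks_empty` are hypothetical, and the assumption that everything under a top-level `Graphics/` folder is images rather than data comes only from the reports above.

```python
import io
import zipfile


def data_members(archive, ignored_dirs=("Graphics",)):
    """List files in a zip that are not inside an ignored top-level folder.

    `archive` may be a path or a file-like object. Directory entries
    (such as an empty "Graphics/" folder) are never counted as data.
    """
    ignored = {d.lower() for d in ignored_dirs}
    names = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip bare folder entries
            top, sep, _rest = info.filename.partition("/")
            if sep and top.lower() in ignored:
                continue  # skip images stored under Graphics/
            names.append(info.filename)
    return names


def looks_empty(archive):
    """True when the archive holds nothing except ignored folders."""
    return not data_members(archive)


if __name__ == "__main__":
    # Build a small in-memory archive that mimics the reported symptom.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("Graphics/", "")
    buf.seek(0)
    print(looks_empty(buf))
```

To check a real download, pass the path of the saved zip file to `looks_empty` instead of the in-memory buffer.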