A python util to fetch datasets from different databases.
Currently supported databases are:
- LIBSVM (libsvm)
Getting design matrix and target variable is as easy as:
from libsvmdata import fetch_dataset X, y = fetch_dataset("news20.binary")
Currently supported datasets are in libsvmdata.supported
and can be displayed as:
from libsvmdata import print_supported_datasets print_supported_datasets()
There is no need to specify the database name.
Files are saved under DATA_HOME/<database_name>
, where the value of DATA_HOME
is:
- the environment variable
LIBSVMDATA_HOME
if it exists, - else, the environment variable
XDG_DATA_HOME
if it exists, - else,
$HOME/data
.