skforecast-datasets

This repository contains datasets used in the skforecast library. It also contains datasets used in related tutorials.

All included datasets come with a short description and a reference to the original source. They can be downloaded directly from the repository or loaded with the fetch_dataset() function from the skforecast library.

from skforecast.datasets import fetch_dataset
data = fetch_dataset(name="h2o")
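
For a direct download, the url fields listed under the datasets below appear to share a common prefix. The following is a minimal sketch that rebuilds such a URL from a file name. The helper name raw_url is hypothetical, and the assumption that every file sits under data/ on the main branch is inferred from those entries rather than stated by the repository.

```python
# Hypothetical helper: the URL pattern is inferred from the `url` fields
# listed in this README, not from any documented API.
BASE = "https://raw.githubusercontent.com/skforecast/skforecast-datasets"


def raw_url(filename: str, branch: str = "main") -> str:
    """Build the raw-download URL for a file in the repository's data/ folder."""
    return f"{BASE}/{branch}/data/{filename}"


print(raw_url("australia_tourism.csv"))
```

File names do not always match dataset names (bike_sharing is stored as bike_sharing_dataset_clean.csv), so the url field of each entry remains the reliable reference.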

Datasets

h2o

h2o_exog

fuel_consumption

items_sales

air_quality_valencia

air_quality_valencia_no_missing

website_visits

bike_sharing

  • url: https://raw.githubusercontent.com/skforecast/skforecast-datasets/main/data/bike_sharing_dataset_clean.csv
  • sep: ','
  • index_col: date_time
  • date_format: %Y-%m-%d %H:%M:%S
  • freq: H
  • file_type: csv
  • description: Hourly usage of the bike share system in the city of Washington, D.C. during the years 2011 and 2012. In addition to the number of users per hour, information about weather conditions and holidays is available. The following modifications have been applied to the original data: columns renamed with more descriptive names; categories of the weather variables renamed; the 'heavy_rain' category merged into 'rain'; temperature, humidity and wind variables denormalized; 'date_time' variable created and set as index; missing values imputed by forward filling.
  • source: Fanaee-T, Hadi. (2013). Bike Sharing Dataset. UCI Machine Learning Repository. https://doi.org/10.24432/C5W894.
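
The sep, index_col and date_format fields describe how the file is meant to be parsed. As a rough illustration, the sketch below applies them with the Python standard library only. The sample rows and the users column name are made up for the example and are not taken from the real file. In pandas 2.0 or later, the same three fields correspond to the sep, index_col and date_format arguments of read_csv.

```python
import csv
import io
from datetime import datetime

# Metadata fields copied from the bike_sharing entry.
META = {"sep": ",", "index_col": "date_time", "date_format": "%Y-%m-%d %H:%M:%S"}


def read_indexed(text: str, meta: dict) -> dict:
    """Parse CSV text into {timestamp: row} using the metadata fields."""
    reader = csv.DictReader(io.StringIO(text), delimiter=meta["sep"])
    rows = {}
    for row in reader:
        # Remove the index column from the row and parse it as a timestamp.
        stamp = datetime.strptime(row.pop(meta["index_col"]), meta["date_format"])
        rows[stamp] = row
    return rows


# Made-up sample; the `users` column name is an assumption.
SAMPLE = "date_time,users\n2011-01-01 00:00:00,16\n2011-01-01 01:00:00,40\n"
rows = read_indexed(SAMPLE, META)
print(sorted(rows))
```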

bike_sharing_extended_features

  • url: https://raw.githubusercontent.com/skforecast/skforecast-datasets/main/data/bike_sharing_extended_features.csv
  • sep: ','
  • index_col: date_time
  • date_format: %Y-%m-%d %H:%M:%S
  • freq: H
  • file_type: csv
  • description: Hourly usage of the bike share system in the city of Washington, D.C. during the years 2011 and 2012. In addition to the number of users per hour, information about weather conditions and holidays is available. The following modifications have been applied to the original data: columns renamed with more descriptive names; categories of the weather variables renamed; the 'heavy_rain' category merged into 'rain'; temperature, humidity and wind variables denormalized; 'date_time' variable created and set as index; missing values imputed by forward filling. The dataset was then enriched with supplementary features: calendar-based variables (day of the week, hour of the day, month, etc.), indicators for sunlight, rolling temperature averages, and polynomial features generated from variable pairs. All cyclic variables are encoded using sine and cosine transformations.
  • source: Fanaee-T, Hadi. (2013). Bike Sharing Dataset. UCI Machine Learning Repository. https://doi.org/10.24432/C5W894.
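
The sine and cosine encoding of cyclic variables mentioned in the description can be sketched as follows. This shows the general technique only. The function name cyclic_encode is hypothetical, and the exact periods and column names used to build the dataset are not stated here.

```python
import math


def cyclic_encode(value: float, period: float) -> tuple[float, float]:
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


# Hours 23 and 0 are far apart as integers but adjacent on the circle,
# while hours 12 and 0 sit on opposite sides of it.
h0 = cyclic_encode(0, 24)
h12 = cyclic_encode(12, 24)
h23 = cyclic_encode(23, 24)
print(math.dist(h23, h0) < math.dist(h12, h0))
```

Encoding each variable as a pair keeps the end of a cycle close to its start, which a single integer column cannot express.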

australia_tourism

  • url: https://raw.githubusercontent.com/skforecast/skforecast-datasets/main/data/australia_tourism.csv
  • sep: ','
  • index_col: date_time
  • date_format: %Y-%m-%d
  • freq: Q
  • file_type: csv
  • description: Quarterly overnight trips (in thousands) from 1998 Q1 to 2016 Q4 across Australia. The tourism regions are formed through the aggregation of Statistical Local Areas (SLAs) which are defined by the various State and Territory tourism authorities according to their research and marketing needs.
  • source: Wang, E, D Cook, and RJ Hyndman (2020). A new tidy data structure to support exploration and modeling of temporal data, Journal of Computational and Graphical Statistics, 29:3, 466-478, doi:10.1080/10618600.2019.1695624.
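
As a quick check of the period covered, the span from 1998 Q1 to 2016 Q4 can be counted in quarters. The sketch below assumes each quarter is stamped with the first day of its first month, which the date_format alone does not confirm. The helper names are hypothetical.

```python
from datetime import date


def quarter_label(d: date) -> str:
    """Return a label such as '1998 Q1' for the quarter containing d."""
    return f"{d.year} Q{(d.month - 1) // 3 + 1}"


def quarters_between(first: date, last: date) -> int:
    """Count quarters from the one containing `first` to the one containing `last`, inclusive."""
    q_first = first.year * 4 + (first.month - 1) // 3
    q_last = last.year * 4 + (last.month - 1) // 3
    return q_last - q_first + 1


# Assumed stamps for the first and last quarters in the description.
first, last = date(1998, 1, 1), date(2016, 10, 1)
n_quarters = quarters_between(first, last)
print(quarter_label(first), quarter_label(last), n_quarters)
```

Under that assumption, each regional series would hold 19 years × 4 quarters = 76 observations if it has no gaps.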

uk_daily_flights

wikipedia_visits

vic_electricity

store_sales

bicimad

m4_daily

m4_hourly

ashrae_daily

bdg2_daily

bdg2_hourly

m5

ett_m1

  • url: https://raw.githubusercontent.com/skforecast/skforecast-datasets/refs/heads/main/data/ETTm1.csv
  • sep: ','
  • index_col: date
  • date_format: %Y-%m-%d %H:%M:%S
  • freq: 15min
  • file_type: csv
  • description: Data collected from an electricity transformer station between July 2016 and July 2018 (2 years × 365 days × 24 hours × 4 intervals per hour = 70,080 data points). Each data point consists of 8 features: the date of the point, the prediction target "Oil Temperature (OT)", and 6 different types of external power load features: High UseFul Load (HUFL), High UseLess Load (HULL), Middle UseFul Load (MUFL), Middle UseLess Load (MULL), Low UseFul Load (LUFL), Low UseLess Load (LULL).
  • source: Zhou, Haoyi & Zhang, Shanghang & Peng, Jieqi & Zhang, Shuai & Li, Jianxin & Xiong, Hui & Zhang, Wancai. (2020). Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. 10.48550/arXiv.2012.07436. https://github.com/zhouhaoyi/ETDataset
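
The point count quoted in the description follows from the 15-minute frequency. The sketch below reproduces the arithmetic and derives the timestamp of the last point. The start of 2016-07-01 00:00 is an assumption, since the description only says "July 2016", and the number of rows in the actual file may differ.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=15)  # freq: 15min

# 2 years × 365 days × 24 hours × 4 intervals per hour
n_points = 2 * 365 * 24 * 4

# Assumed start timestamp; not stated in the description.
start = datetime(2016, 7, 1, 0, 0)
last = start + (n_points - 1) * STEP
print(n_points, last)
```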

ett_m2

  • url: https://raw.githubusercontent.com/skforecast/skforecast-datasets/main/data/ETTm2.csv
  • sep: ','
  • index_col: date
  • date_format: %Y-%m-%d %H:%M:%S
  • freq: 15min
  • file_type: csv
  • description: Data collected from an electricity transformer station between July 2016 and July 2018 (2 years × 365 days × 24 hours × 4 intervals per hour = 70,080 data points). Each data point consists of 8 features: the date of the point, the prediction target "Oil Temperature (OT)", and 6 different types of external power load features: High UseFul Load (HUFL), High UseLess Load (HULL), Middle UseFul Load (MUFL), Middle UseLess Load (MULL), Low UseFul Load (LUFL), Low UseLess Load (LULL).
  • source: Zhou, Haoyi & Zhang, Shanghang & Peng, Jieqi & Zhang, Shuai & Li, Jianxin & Xiong, Hui & Zhang, Wancai. (2020). Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. 10.48550/arXiv.2012.07436. https://github.com/zhouhaoyi/ETDataset

expenditures_australia

public_transport_madrid
