
efor

The goal of EasyFORcasting, or efor for short, is to make it easier to forecast multiple time series. The package helps you create forecasts with methods implemented in the forecast, forecastHybrid, smooth and prophet packages. It also provides functions for evaluating these forecasts.

The following sections show a possible workflow:

Installation

You can install efor from GitHub with:

# install.packages("devtools")
devtools::install_github("flostracke/efor")

Example

Setup

First, we load the packages needed for this example. The efor package contains fictional time series data for demonstrating the approaches:

library(tidyverse)
library(tsibble) # for creating nicer representation of monthly data
library(efor)
library(furrr) # for running the forecasting in parallel
library(forecast) # provides forecasting methods such as auto.arima
library(prophet)  # provides the prophet forecasting method


sales_data <- sales_monthly %>% 
  mutate(date = yearmonth(date))

sales_data
#> # A tibble: 216 x 3
#>        date     y iterate  
#>       <mth> <dbl> <chr>    
#>  1 2012 Jan  1179 Article_A
#>  2 2012 Feb   516 Article_A
#>  3 2012 Mar   381 Article_A
#>  4 2012 Apr   171 Article_A
#>  5 2012 May   264 Article_A
#>  6 2012 Jun   135 Article_A
#>  7 2012 Jul   225 Article_A
#>  8 2012 Aug    66 Article_A
#>  9 2012 Sep   123 Article_A
#> 10 2012 Oct    54 Article_A
#> # ... with 206 more rows

We have monthly sales data for four articles and want to create forecasts for all of them. The efor package makes this easy because it can create forecasts for multiple articles with a single function call.

The idea is that all the data has to be organised in one data frame with the following columns:

  • date: A date column
  • iterate: the grouping variable. In this example it is the article number
  • y: the value you want to forecast
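To make the expected layout concrete, here is a small toy data frame in this long format, built with base R. The articles and values are invented for illustration; with real data the date column would typically hold tsibble::yearmonth() values, as in sales_data.

```r
# Toy long-format data: one row per (date, article) pair.
# The values are invented for illustration only.
dates <- seq(as.Date("2012-01-01"), by = "month", length.out = 3)

toy_data <- data.frame(
  date    = rep(dates, times = 2),                       # date column
  iterate = rep(c("Article_A", "Article_B"), each = 3),  # grouping variable
  y       = c(10, 12, 9, 20, 18, 25)                     # value to forecast
)

toy_data
```

Each article contributes its own block of rows, so adding a fifth article means appending rows rather than adding columns.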

The data frame sales_data already meets these requirements. You can verify the structure with the following function call, which raises an error if the structure is incorrect:

check_input_data(sales_data)
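The kind of structural check such a function performs can be sketched as follows. check_long_format() is a hypothetical helper written for this example, not part of efor, and it only tests for the three required columns and a numeric y; the real check_input_data() may test more or different conditions.

```r
# Hypothetical sketch of a structural check; efor's check_input_data()
# may test more or different conditions.
check_long_format <- function(df) {
  required <- c("date", "iterate", "y")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (!is.numeric(df$y)) {
    stop("column 'y' must be numeric")
  }
  invisible(TRUE)
}

ok_df  <- data.frame(date = as.Date("2012-01-01"), iterate = "Article_A", y = 1179)
bad_df <- data.frame(date = as.Date("2012-01-01"), value = 1179)

check_long_format(ok_df)        # passes silently
try(check_long_format(bad_df))  # reports: missing column(s): iterate, y
```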

Before we start forecasting, let’s quickly plot the four articles we want to forecast:

ggplot(sales_data, aes(x = date, y = y)) +
  geom_line() +
  geom_point() +
  facet_wrap(~iterate) +
  ggtitle("The original series") +
  theme_minimal() 

We split the dataset into a training and a test set. All observations from 2016 go into the test set. We want to forecast the six months of the test set (January to June 2016) and evaluate the performance of different methods.

train_data <- sales_data %>% 
  filter(date < yearmonth("2016 Jan"))

test_data <- sales_data %>% 
  filter(date >= yearmonth("2016 Jan"))
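The split arithmetic can be checked on a single toy series spanning the same period (January 2012 to June 2016). This sketch uses base R Date values instead of yearmonth() so it runs without tsibble; that substitution is an assumption for portability. Each article then contributes 48 training months and 6 test months, which matches the horizon n_pred = 6 used below (and 4 articles × 54 months = 216 rows in sales_data).

```r
# Sketch of the train/test split on a toy monthly series, using base R
# Date values instead of tsibble::yearmonth (an assumption for portability).
dates <- seq(as.Date("2012-01-01"), as.Date("2016-06-01"), by = "month")
toy   <- data.frame(date = dates, iterate = "Article_A", y = seq_along(dates))

cutoff <- as.Date("2016-01-01")
train  <- toy[toy$date <  cutoff, ]  # 2012 Jan .. 2015 Dec
test   <- toy[toy$date >= cutoff, ]  # 2016 Jan .. 2016 Jun

c(total = nrow(toy), train = nrow(train), test = nrow(test))  # 54, 48, 6
```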

Model Fitting

Now we can apply the auto.arima function to the training data and create the forecasts. All methods from the forecast package can be run in parallel.

forecasts_ar <- tf_grouped_forecasts(
  train_data,        # used training dataset
  n_pred = 6,        # number of predictions
  func = auto.arima, # used forecasting method
  parallel = TRUE    # for running in parallel
)

forecasts_ar
#> # A tibble: 24 x 6
#>        date iterate   key            y y_lo_95 y_hi_95
#>       <mth> <chr>     <chr>      <dbl>   <dbl>   <dbl>
#>  1 2016 Jan Article_A auto.arima 1266.  1051.    1481.
#>  2 2016 Feb Article_A auto.arima  425.   210.     640.
#>  3 2016 Mar Article_A auto.arima  476.   260.     691.
#>  4 2016 Apr Article_A auto.arima  367.   152.     582.
#>  5 2016 May Article_A auto.arima  319.   103.     534.
#>  6 2016 Jun Article_A auto.arima  255.    39.8    470.
#>  7 2016 Jan Article_B auto.arima  723    500.     946.
#>  8 2016 Feb Article_B auto.arima  501    278.     724.
#>  9 2016 Mar Article_B auto.arima  642    419.     865.
#> 10 2016 Apr Article_B auto.arima  306     83.0    529.
#> # ... with 14 more rows
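Conceptually, forecasting many series with one call follows a split-apply-combine pattern: split the data by iterate, fit a model to each series, and stack the predictions. The sketch below illustrates that idea in base R. grouped_forecast_sketch() and seasonal_naive() are hypothetical helpers, the seasonal naive method stands in for auto.arima, and tf_grouped_forecasts() is not necessarily implemented this way.

```r
# Stand-in forecaster: repeat the last observed seasonal cycle.
seasonal_naive <- function(y, n_pred, period = 12) {
  last_cycle <- tail(y, period)
  rep(last_cycle, length.out = n_pred)
}

# Hypothetical split-apply-combine wrapper. Assumes rows within each
# article are already sorted in time order.
grouped_forecast_sketch <- function(df, n_pred, func) {
  groups <- split(df, df$iterate)  # one data frame per article
  out <- lapply(names(groups), function(g) {
    data.frame(
      step    = seq_len(n_pred),   # forecast horizon index
      iterate = g,
      y       = func(groups[[g]]$y, n_pred)
    )
  })
  do.call(rbind, out)              # stack results into one long table
}

toy <- data.frame(
  iterate = rep(c("Article_A", "Article_B"), each = 24),
  y       = c(1:24, 101:124)
)
fc <- grouped_forecast_sketch(toy, n_pred = 6, func = seasonal_naive)
head(fc, 3)
```

Because every group returns the same columns, the combined result stays in the same long format as the input, which is what makes comparing methods later straightforward.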

With the same syntax you can create forecasts using the prophet package. Note that we disable parallel execution for prophet here because of a bug at the time of writing.

forecasts_prophet <- tf_grouped_forecasts(
  train_data,      # used training dataset
  n_pred = 6,      # number of predictions
  func = prophet,  # used forecasting method
  parallel = FALSE # disable parallel execution for prophet
)

forecasts_prophet
#> # A tibble: 24 x 6
#>        date iterate   key         y y_lo_95 y_hi_95
#>       <mth> <chr>     <chr>   <dbl>   <dbl>   <dbl>
#>  1 2016 Jan Article_A prophet 1224.   1072.   1365.
#>  2 2016 Feb Article_A prophet  589.    453.    744.
#>  3 2016 Mar Article_A prophet  545.    409.    679.
#>  4 2016 Apr Article_A prophet  334.    196.    470.
#>  5 2016 May Article_A prophet  384.    240.    518.
#>  6 2016 Jun Article_A prophet  258.    114.    399.
#>  7 2016 Jan Article_B prophet  835.    691.    972.
#>  8 2016 Feb Article_B prophet  635.    494.    771.
#>  9 2016 Mar Article_B prophet  759.    618.    899.
#> 10 2016 Apr Article_B prophet  355.    216.    496.
#> # ... with 14 more rows
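The parallel argument toggles between running the per-article fits concurrently or sequentially. efor loads furrr for this; the sketch below uses base R's parallel package instead, so map_groups() is a hypothetical helper and mclapply() is a swapped-in technique. Because mclapply() relies on forking, the sketch falls back to a single core on Windows.

```r
library(parallel)

# Hypothetical helper: run f over each group, in parallel or sequentially.
# This swaps in parallel::mclapply() for the furrr approach efor loads.
map_groups <- function(groups, f, parallel = TRUE) {
  if (parallel) {
    # mclapply() forks worker processes; forking is not available on
    # Windows, so use a single core there (which runs sequentially).
    cores <- if (.Platform$OS.type == "windows") 1L else 2L
    mclapply(groups, f, mc.cores = cores)
  } else {
    lapply(groups, f)  # plain sequential loop
  }
}

# Toy data: two "articles" as a named list of numeric series
groups <- split(c(1:24, 101:124), rep(c("Article_A", "Article_B"), each = 24))

means_par <- map_groups(groups, mean, parallel = TRUE)
means_seq <- map_groups(groups, mean, parallel = FALSE)

# Both modes should give the same per-group results
all.equal(means_par, means_seq)
```

Setting parallel = FALSE, as done for prophet above, simply routes the work through the sequential branch, which is a convenient escape hatch when a model does not behave well in worker processes.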

Evaluation

To create plots and evaluate the performance, we combine the forecasts into one dataset.

forecasts <- bind_rows(forecasts_ar, forecasts_prophet) %>% 
  mutate(date = yearmonth(date)) #reformat the date because of a bug in bind_rows

The package also provides a function that makes it easy to assess the performance of all forecasting methods in the passed prediction data frame (currently MAE, MAPE, MASE, RMSE and R-squared are calculated):

tf_calc_metrics(forecasts, test_data) %>% 
  spread(metric, value)
#> # A tibble: 2 x 6
#>   key          mae  mape  mase  rmse   rsq
#>   <chr>      <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 auto.arima  129.  25.4 0.469  192. 0.865
#> 2 prophet     135.  31.4 0.491  188. 0.869
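For reference, several of these metrics are simple to compute by hand. The functions below are illustrative re-implementations, not efor's code. Treating rsq as the squared Pearson correlation between actual and predicted values is an assumption (it matches yardstick::rsq), and MASE is left out because its scaling depends on a naive in-sample benchmark that is not documented here.

```r
# Illustrative metric definitions (not efor's implementation).
mae  <- function(actual, pred) mean(abs(actual - pred))
rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))
mape <- function(actual, pred) 100 * mean(abs((actual - pred) / actual))
rsq  <- function(actual, pred) cor(actual, pred)^2  # assumption: squared correlation

actual <- c(100, 200, 300, 400)
pred   <- c(110, 190, 330, 360)

c(mae  = mae(actual, pred),   # (10 + 10 + 30 + 40) / 4 = 22.5
  rmse = rmse(actual, pred),  # sqrt((100 + 100 + 900 + 1600) / 4) = sqrt(675), about 25.98
  mape = mape(actual, pred),  # 100 * (0.1 + 0.05 + 0.1 + 0.1) / 4 = 8.75
  rsq  = rsq(actual, pred))
```

RMSE penalises the single large miss (40) more heavily than MAE does, which is one reason the two metrics can rank methods differently, as in the table above.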

It is also possible to assess the performance for each article:

tf_calc_metrics(forecasts, test_data, detailed = TRUE)
#> # A tibble: 40 x 4
#>    key        metric value iterate  
#>    <chr>      <chr>  <dbl> <chr>    
#>  1 auto.arima mae     60   Article_B
#>  2 prophet    mae     83.8 Article_C
#>  3 prophet    mae     84.5 Article_B
#>  4 auto.arima mae    115.  Article_C
#>  5 auto.arima mae    154.  Article_A
#>  6 prophet    mae    186.  Article_D
#>  7 prophet    mae    187.  Article_A
#>  8 auto.arima mae    189.  Article_D
#>  9 auto.arima rmse    73.8 Article_B
#> 10 prophet    rmse    94.0 Article_C
#> # ... with 30 more rows
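The per-article breakdown amounts to grouping by iterate before computing each metric. A minimal base R sketch for the MAE, with invented values (this is an illustration of the idea, not efor's code):

```r
# Invented actual/predicted values for two articles
eval_df <- data.frame(
  iterate = rep(c("Article_A", "Article_B"), each = 3),
  actual  = c(100, 120, 90, 50, 60, 55),
  pred    = c(110, 115, 100, 45, 70, 55)
)

# Split by article, then compute the MAE within each group
mae_by_article <- sapply(
  split(eval_df, eval_df$iterate),
  function(d) mean(abs(d$actual - d$pred))
)
mae_by_article  # Article_A: 25/3 (about 8.33), Article_B: 5
```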

Finally, we create a quick graph of the auto.arima forecasts alongside the training and test data.

train_data_plot <- train_data %>% 
  mutate(key = "train")

test_data_plot <- test_data %>% 
  mutate(key = "test")

bind_rows(train_data_plot, test_data_plot) %>% 
  bind_rows(forecasts) %>% 
  filter(key %in% c("auto.arima", "train", "test")) %>% 
  mutate(date = yearmonth(date)) %>% 
  ggplot(aes(x = date, y = y, color = key)) +
  geom_point() +
  geom_line() +
  facet_wrap(~iterate) +
  ggtitle("Forecasted values for each article") +
  ylab("Sales amount") +
  theme_minimal()