Question about future exogenous variables #1264

Closed
StorywithLove opened this issue Feb 5, 2025 · 5 comments

@StorywithLove

Description

Code: the original RNN example code, shown under "Use case" below.

Requirement: I want to predict pollutants for the next 24 hours. I have historical pollutant data for the previous year, weather forecast data for the next 24 hours, and historical meteorological data up to 10 days ago; the historical meteorological data for the intervening 10 days is missing.

With this data I can't use the library code to make a prediction. When I remove the historical values of the future exogenous variable (the `y_[lag12]` column) from the training data, the following error is reported:

Exception                                 Traceback (most recent call last)
Cell In[4], line 34
     13 fcst = NeuralForecast(
     14     models=[RNN(h=12,
     15                 input_size=-1,
   (...)
     30     freq='M'
     31 )
     33 Y_train_df = Y_train_df.drop(['y_[lag12]'], axis = 1)
---> 34 fcst.fit(df=Y_train_df, static_df=AirPassengersStatic, val_size=12)
     35 forecasts = fcst.predict(futr_df=Y_test_df)
     37 Y_hat_df = forecasts.reset_index(drop=False).drop(columns=['unique_id','ds'])

File g:\miniconda3\envs\forecast\lib\site-packages\neuralforecast\core.py:576, in NeuralForecast.fit(self, df, static_df, val_size, sort_df, use_init_models, verbose, id_col, time_col, target_col, distributed_config, prediction_intervals)
    573     self._reset_models()
    575 for i, model in enumerate(self.models):
--> 576     self.models[i] = model.fit(
    577         self.dataset, val_size=val_size, distributed_config=distributed_config
    578     )
    580 self._fitted = True

File g:\miniconda3\envs\forecast\lib\site-packages\neuralforecast\common\_base_recurrent.py:537, in BaseRecurrent.fit(self, dataset, val_size, test_size, random_seed, distributed_config)
    508 def fit(
    509     self,
...
    209     raise Exception(
    210         f"{missing_stat} static exogenous variables not found in input dataset"
    211     )

Exception: {'y_[lag12]'} future exogenous variables not found in input dataset

With meteorological data missing in the middle (the 10-day gap), is there still a way to make predictions using exogenous variables?

Use case

import pandas as pd
import matplotlib.pyplot as plt

from neuralforecast import NeuralForecast
from neuralforecast.models import RNN
from neuralforecast.losses.pytorch import MQLoss, DistributionLoss
from neuralforecast.utils import AirPassengersPanel, AirPassengersStatic

Y_train_df = AirPassengersPanel[AirPassengersPanel.ds<AirPassengersPanel['ds'].values[-12]] # 132 train
Y_test_df = AirPassengersPanel[AirPassengersPanel.ds>=AirPassengersPanel['ds'].values[-12]].reset_index(drop=True) # 12 test

fcst = NeuralForecast(
    models=[RNN(h=12,
                input_size=-1,
                inference_input_size=24,
                loss=MQLoss(level=[80, 90]),
                scaler_type='robust',
                encoder_n_layers=2,
                encoder_hidden_size=128,
                context_size=10,
                decoder_hidden_size=128,
                decoder_layers=2,
                max_steps=300,
                futr_exog_list=['y_[lag12]'],
                #hist_exog_list=['y_[lag12]'],
                stat_exog_list=['airline1'],
                )
    ],
    freq='M'
)

# Y_train_df = Y_train_df.drop(['y_[lag12]'], axis = 1)
fcst.fit(df=Y_train_df, static_df=AirPassengersStatic, val_size=12)
forecasts = fcst.predict(futr_df=Y_test_df)

Y_hat_df = forecasts.reset_index(drop=False).drop(columns=['unique_id','ds'])
plot_df = pd.concat([Y_test_df, Y_hat_df], axis=1)
plot_df = pd.concat([Y_train_df, plot_df])

plot_df = plot_df[plot_df.unique_id=='Airline1'].drop('unique_id', axis=1)
plt.plot(plot_df['ds'], plot_df['y'], c='black', label='True')
plt.plot(plot_df['ds'], plot_df['RNN-median'], c='blue', label='median')
plt.fill_between(x=plot_df['ds'][-12:], 
                 y1=plot_df['RNN-lo-90'][-12:].values, 
                 y2=plot_df['RNN-hi-90'][-12:].values,
                 alpha=0.4, label='level 90')
plt.legend()
plt.grid()
plt.plot()
@StorywithLove
Author

By the way, when I try to ask a question (open an issue), I can't log in with Gmail as usual. When I try other ways, the system says that I need help from the community administrator. What should I do?

@marcopeix marcopeix changed the title [<Library component: Models|Core|etc...>] about future exogenous variables Question about future exogenous variables Feb 5, 2025
@marcopeix
Contributor

Hello!

I am confused by what you report and the code that you show.

The code you share is the sample usage of RNN from the documentation, which works fine. There are no errors, no matter if we include hist_exog or not.

Now, it seems that you have missing values for your exogenous features. This is not supported in neuralforecast. There cannot be any missing values in your target series or in your features.

Let me know if this helps!
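
For readers who want to check their own data against this requirement, here is a minimal sketch using only pandas. It assumes the long format with `unique_id`, `ds`, and `y` columns used in the thread; the helper `find_gaps`, the `temperature` column, and the toy values are hypothetical and not part of neuralforecast.

```python
import pandas as pd

def find_gaps(df, freq, id_col="unique_id", time_col="ds"):
    """Return, per series, the timestamps absent from a regular time grid.

    Hypothetical helper for illustration; not a neuralforecast function.
    """
    missing = {}
    for uid, grp in df.groupby(id_col):
        # Every timestamp the series should contain at the given frequency.
        full = pd.date_range(grp[time_col].min(), grp[time_col].max(), freq=freq)
        absent = full.difference(pd.DatetimeIndex(grp[time_col]))
        if len(absent) > 0:
            missing[uid] = absent
    return missing

# Toy daily data: one NaN in a feature and one missing row (2025-01-04).
df = pd.DataFrame({
    "unique_id": ["station_1"] * 5,
    "ds": pd.to_datetime(["2025-01-01", "2025-01-02", "2025-01-03",
                          "2025-01-05", "2025-01-06"]),
    "y": [10.0, 11.0, 12.0, 14.0, 15.0],
    "temperature": [1.0, None, 1.5, 2.0, 2.5],
})

nan_counts = df.isna().sum()     # NaN count per column
gaps = find_gaps(df, freq="D")   # absent timestamps per series
print(nan_counts)
print(gaps)
```

Both checks matter: a feature can be missing as a NaN inside an existing row, or because the whole row for that timestamp is absent.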

@StorywithLove
Author

> Hello!
>
> I am confused by what you report and the code that you show.
>
> The code you share is the sample usage of RNN from the documentation, which works fine. There are no errors, no matter if we include hist_exog or not.
>
> Now, it seems that you have missing values for your exogenous features. This is not supported in neuralforecast. There cannot be any missing values in your target series or in your features.
>
> Let me know if this helps!

Thanks for your reply. You are right, there is no problem with the original code.
Background: in practice, it is difficult to keep features continuous over long periods of time.

If the features cannot contain missing values, my idea is the following: for a given sample or prediction point, use the historical values of its historical exogenous variables together with the forecast values of its future exogenous variables (excluding the values of the future exogenous variables prior to the prediction point) for both training and prediction. Can this be done by setting parameters in the library code?

Finally, thanks for the great work!

@marcopeix
Contributor

No, that's not possible. Historical features must have values for all time steps, and future exogenous variables must have values for all past and future time steps, including the horizon. You can read more about exogenous features here: https://nixtlaverse.nixtla.io/neuralforecast/docs/capabilities/exogenous_variables.html#3-training-with-exogenous-variables.
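
The requirement above can be sketched as a pre-flight check in plain pandas: each future exogenous column has to be present and complete both in the training frame (past time steps) and in the frame passed as `futr_df` (the horizon). The helper `check_futr_exog`, the `wind_forecast` column, and the toy frames are hypothetical illustrations; this is not how neuralforecast validates its inputs internally.

```python
import pandas as pd

def check_futr_exog(train_df, futr_df, futr_exog_list, h, id_col="unique_id"):
    """List problems with future exogenous columns (hypothetical helper)."""
    problems = []
    for col in futr_exog_list:
        for name, frame in (("train_df", train_df), ("futr_df", futr_df)):
            if col not in frame.columns:
                problems.append(f"{col} missing from {name}")
            elif frame[col].isna().any():
                problems.append(f"{col} has NaN in {name}")
    # Each series needs at least h future rows to cover the horizon.
    rows_per_series = futr_df.groupby(id_col).size()
    for uid in train_df[id_col].unique():
        if rows_per_series.get(uid, 0) < h:
            problems.append(f"{uid} has fewer than {h} future rows")
    return problems

train_df = pd.DataFrame({
    "unique_id": ["station_1"] * 4,
    "ds": pd.date_range("2025-01-01", periods=4, freq="D"),
    "y": [10.0, 11.0, 12.0, 13.0],
    "wind_forecast": [3.0, 3.5, 4.0, 4.5],
})
futr_df = pd.DataFrame({
    "unique_id": ["station_1"] * 2,
    "ds": pd.date_range("2025-01-05", periods=2, freq="D"),
    "wind_forecast": [5.0, 5.5],
})

ok = check_futr_exog(train_df, futr_df, ["wind_forecast"], h=2)
# Dropping the column from the training frame mirrors the situation
# that produced the exception reported at the top of this issue.
broken = check_futr_exog(train_df.drop(columns=["wind_forecast"]),
                         futr_df, ["wind_forecast"], h=2)
print(ok)
print(broken)
```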

@StorywithLove
Author

StorywithLove commented Feb 7, 2025

> No, that's not possible. Historical features must have values for all time steps, and future exogenous variables must have values for all past and future time steps, including the horizon. You can read more about exogenous features here: https://nixtlaverse.nixtla.io/neuralforecast/docs/capabilities/exogenous_variables.html#3-training-with-exogenous-variables.

Thanks again for your reply.

I am not quite sure about the details of the model implementation. By analysing the code, I found out that the RNN is implemented to generate 24-hour forecasts for each time step, so for this model, future exogenous variables need to be configured for each time step. I think I know what to do next: fill in the missing future variables.

Thanks for your help and have a nice day.
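
One way to fill in the missing values, sketched with pandas only. It assumes every timestamp already has a row and only the feature values are NaN; the helper `fill_exog_gaps` and the `temperature` column are hypothetical. Linear interpolation is just one possible choice, and whether it suits meteorological data is a modelling decision that the thread does not settle.

```python
import pandas as pd

def fill_exog_gaps(df, exog_cols, id_col="unique_id", time_col="ds"):
    """Fill NaNs in exogenous columns, series by series (hypothetical helper)."""
    out = df.sort_values([id_col, time_col]).copy()
    out[exog_cols] = out.groupby(id_col)[exog_cols].transform(
        # Interpolate interior gaps, then pad any remaining NaNs
        # at the start or end of each series.
        lambda s: s.interpolate().ffill().bfill()
    )
    return out

df = pd.DataFrame({
    "unique_id": ["station_1"] * 4,
    "ds": pd.date_range("2025-01-01", periods=4, freq="D"),
    "y": [10.0, 11.0, 12.0, 13.0],
    "temperature": [1.0, None, None, 4.0],
})

filled = fill_exog_gaps(df, ["temperature"])
print(filled)
```

The target column `y` is left untouched here; only the listed feature columns are filled.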

@marcopeix marcopeix self-assigned this Feb 7, 2025