NHITS does not give consistent results on GPU #1217
Comments
It's a known issue that completely deterministic results are not possible when using CUDA. See here: https://pytorch.org/docs/stable/notes/randomness.html. You could try prepending the following (i.e., start your script with this code):
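The snippet itself isn't shown here; based on the linked PyTorch reproducibility notes it was presumably the cuDNN settings, something along these lines (an assumption, not the exact code from the comment):

```python
import torch

# Disable the cuDNN autotuner and ask cuDNN for deterministic kernels.
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
```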
However, that still results in a small difference, albeit a slightly smaller one. It negatively impacts speed, though, so I'd definitely not recommend it unless it's an academic exercise where reproducibility matters more than speed. Furthermore, you can enforce deterministic behaviour even more strictly by doing the following:
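Again, the code isn't shown here; presumably it refers to PyTorch's deterministic-algorithms switch, roughly like this (assumed, not the comment's exact code):

```python
import os
import torch

# Required by cuBLAS for deterministic matrix multiplications on CUDA.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

# Make PyTorch raise an error whenever an op has no deterministic implementation.
torch.use_deterministic_algorithms(True)
```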
NHITS, however, uses a function that (apparently) doesn't have a deterministic CUDA implementation (on my machine & PyTorch version), so using the latter will error. Ultimately I believe this is the underlying issue that causes these discrepancies. I'd personally not worry about this too much: getting completely deterministic results is more or less impossible in any computing environment that uses floating-point numbers with limited precision and a high degree of parallelization. For example, if you ran your algorithm on a different machine tomorrow, you'd get different results; it's near impossible to prevent that. The delta should be small up to a tolerance, but exact matching is basically impossible. So, look at the actual differences that you measure. Are they within a small tolerance? Then I'd not worry about it too much.
The NHITS function that gives the non-deterministic behavior is
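The specific function isn't shown in this excerpt, and neither is the suggested workaround. Presumably the suggestion was to switch to the warn-only mode of torch.use_deterministic_algorithms, which emits a warning instead of raising for ops that lack a deterministic CUDA implementation — a sketch of that assumption:

```python
import torch

# Warn (rather than error) when an op has no deterministic implementation,
# so training still runs even though NHITS hits such an op.
torch.use_deterministic_algorithms(True, warn_only=True)
```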
which should give no errors.
Thanks for your comments! It works just fine! I'll close the issue.
What happened + What you expected to happen [EDIT: does not work either with 2.0.1]
Hello, I was tuning an NHITS model when I discovered that I was getting inconsistent results from the same experiments.
I replicated the bug using utilsforecast to generate series.
What I do in my code is generate the series once, then create the exact same model twice.
Predictions are always different when using a GPU on Databricks.
The bug never seems to occur when using a CPU on Databricks. I also tested with neuralforecast versions 1.7.3, 1.7.5 and 2.0.1.
Predictions are inconsistent regardless of the hyperparameter values, even with an identical random_seed.
PS: I am also using NBEATSx and TFT and have no problem with either!
Thanks in advance, Appreciate your help!
Versions / Dependencies
Within Databricks environment.
Using neuralforecast 1.7.3, 1.7.5, and 2.0.1
Bug reproducible on GPU but not on CPU
Reproduction script [EDIT: simplified the code but still does not work on GPU]
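The simplified script isn't included in this excerpt; a minimal sketch of the reproduction described above (assumed API: utilsforecast.data.generate_series together with neuralforecast's NHITS and NeuralForecast, and hypothetical parameter values):

```python
# Hypothetical reproduction sketch: generate series once, build the exact
# same NHITS model twice, and compare predictions. On GPU the two runs
# differ slightly; on CPU they match.
from utilsforecast.data import generate_series
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

df = generate_series(n_series=5, freq="D", min_length=200, max_length=200)

preds = []
for _ in range(2):
    model = NHITS(h=7, input_size=28, max_steps=100, random_seed=1)
    nf = NeuralForecast(models=[model], freq="D")
    nf.fit(df=df)
    preds.append(nf.predict())

# Maximum absolute difference between the two runs' forecasts.
print((preds[0]["NHITS"] - preds[1]["NHITS"]).abs().max())
```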
Issue Severity
High: It blocks me from completing my task.