NHITS does not give consistent results on GPU #1217
Comments
It's a known issue that completely deterministic results are not possible when using CUDA. See here: https://pytorch.org/docs/stable/notes/randomness.html. You could try prepending the following (i.e., start your script with this code):
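The snippet itself isn't shown here; based on the linked PyTorch reproducibility notes it was presumably the cuDNN settings, something along these lines (an assumption, not the exact code from the comment):

```python
import torch

# Disable the cuDNN autotuner and ask cuDNN for deterministic kernels.
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True
```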
However, that still results in a small difference, albeit a slightly smaller one. It negatively impacts speed, though, so I'd definitely not recommend it unless it's an academic exercise where reproducibility matters more than speed. Furthermore, you can enforce deterministic behaviour even more strictly by doing the following:
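Again, the code isn't shown here; presumably it refers to PyTorch's deterministic-algorithms switch, roughly like this (assumed, not the comment's exact code):

```python
import os
import torch

# Required by cuBLAS for deterministic matrix multiplications on CUDA.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

# Make PyTorch raise an error whenever an op has no deterministic implementation.
torch.use_deterministic_algorithms(True)
```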
NHITS, however, uses a function that (apparently) doesn't have a deterministic CUDA implementation (on my machine & PyTorch version), so using the latter will error. Ultimately I believe this is the underlying issue that causes these discrepancies. I'd personally not worry about this too much: getting completely deterministic results is more or less impossible in any computing environment that uses floating-point numbers with limited precision and a high degree of parallelization. For example, if you ran your algorithm on a different machine tomorrow, you'd get different results; it's near impossible to prevent that. The delta should be small up to a tolerance, but exact matching is basically impossible. So, look at the actual differences that you measure. Are they within a small tolerance? Then I'd not worry about it too much.
The NHITS function that gives the non-deterministic behavior is
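The specific function isn't shown in this excerpt, and neither is the suggested workaround. Presumably the suggestion was to switch to the warn-only mode of torch.use_deterministic_algorithms, which emits a warning instead of raising for ops that lack a deterministic CUDA implementation — a sketch of that assumption:

```python
import torch

# Warn (rather than error) when an op has no deterministic implementation,
# so training still runs even though NHITS hits such an op.
torch.use_deterministic_algorithms(True, warn_only=True)
```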
which should give no errors.
Thanks for your comments! It works just fine! I'll close the issue.
What happened + What you expected to happen [EDIT: does not work either with 2.0.1]
Hello, I was tuning an NHITS model when I discovered that I was getting inconsistent results from the same experiments.
I replicated the bug using utilsforecast to generate series.
What I do in my code is generate the series once, then create the exact same model twice.
Predictions are always different when using a GPU on Databricks.
The bug never seems to occur when using a CPU on Databricks. I also tested with neuralforecast versions 1.7.3, 1.7.5 and 2.0.1.
Predictions are inconsistent regardless of the hyperparameter values, even with an identical random_seed.
PS: I am also using NBEATSx and TFT and have no problem with either!
Thanks in advance, Appreciate your help!
Versions / Dependencies
Within Databricks environment.
Using neuralforecast 1.7.3, 1.7.5, and 2.0.1
Bug reproducible on GPU but not on CPU
Reproduction script [EDIT: simplified the code but still does not work on GPU]
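The simplified script isn't included in this excerpt; a minimal sketch of the reproduction described above (assumed API: utilsforecast.data.generate_series together with neuralforecast's NHITS and NeuralForecast, and hypothetical parameter values):

```python
# Hypothetical reproduction sketch: generate series once, build the exact
# same NHITS model twice, and compare predictions. On GPU the two runs
# differ slightly; on CPU they match.
from utilsforecast.data import generate_series
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

df = generate_series(n_series=5, freq="D", min_length=200, max_length=200)

preds = []
for _ in range(2):
    model = NHITS(h=7, input_size=28, max_steps=100, random_seed=1)
    nf = NeuralForecast(models=[model], freq="D")
    nf.fit(df=df)
    preds.append(nf.predict())

# Maximum absolute difference between the two runs' forecasts.
print((preds[0]["NHITS"] - preds[1]["NHITS"]).abs().max())
```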
Issue Severity
High: It blocks me from completing my task.