Dear Developer,
I'm going to use OC20 for an HPC benchmark. I used the NVIDIA implementation of OC20, which can be found here: https://github.com/mlcommons/hpc_results_v3.0/tree/dfea490b1e7b6545eb5c8b7a4ff9b05021cba098/NVIDIA/benchmarks/oc20/implementations/pytorch
I downloaded the data following the instructions and tried to train with this implementation. After a few epochs, the model output contains NaN values.
I normalized the data, so I don't understand why NaN appears in the output.
Could you please tell me whether something has changed in the dataset?
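One way to narrow down a NaN like this is to record the exact step at which the loss first stops being finite, and to rule out the normalization itself as the source (a zero standard deviation turns a division into NaN or inf). The sketch below is a minimal, framework-agnostic illustration; the helper names `first_nonfinite_step` and `safe_standardize` are hypothetical and are not part of the NVIDIA implementation or fairchem. In a PyTorch loop you would feed it `loss.item()` at each step; PyTorch itself also offers `torch.isfinite` and `torch.autograd.set_detect_anomaly(True)` for locating the operation that produced the NaN.

```python
import math
from typing import Iterable, List, Optional


def first_nonfinite_step(losses: Iterable[float]) -> Optional[int]:
    """Return the index of the first NaN/inf loss value, or None if all are finite."""
    for step, value in enumerate(losses):
        if not math.isfinite(value):
            return step
    return None


def safe_standardize(values: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize to zero mean and unit std.

    Hypothetical helper: eps in the denominator keeps a constant column
    (std == 0) from producing NaN or inf after normalization.
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / (std + eps) for v in values]


if __name__ == "__main__":
    losses = [0.9, 0.7, float("nan"), 0.5]
    print(first_nonfinite_step(losses))       # index of the first bad step
    print(safe_standardize([3.0, 3.0, 3.0]))  # constant input stays finite
```

If the first non-finite step is consistent across runs, it points to a specific batch (bad input data); if it moves around, a learning-rate or mixed-precision instability is a more likely suspect.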
Hi @javak87,
Can you provide more information on the data you are training on, the model, your system (OS, Python, NVIDIA driver and CUDA versions, etc.) and any relevant plots? NaN can be the result of many different errors.
Additionally, we don't directly maintain this version of the code from NVIDIA, so our help might be limited. Perhaps you can try to train OC20 in the fairchem repo to see if you can reproduce the error?
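The system information requested above can be collected in one go. The following is a minimal sketch (the function name `collect_system_info` is hypothetical); it uses only the standard library plus PyTorch if it is installed. PyTorch also ships a more complete report via `python -m torch.utils.collect_env`.

```python
import platform
import sys


def collect_system_info() -> dict:
    """Gather OS, Python, and (if installed) PyTorch/CUDA versions for a bug report."""
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
    }
    try:
        import torch  # optional: reported only when PyTorch is installed

        info["torch"] = torch.__version__
        info["cuda"] = torch.version.cuda  # None for CPU-only builds
        info["cuda_available"] = torch.cuda.is_available()
        info["cudnn"] = torch.backends.cudnn.version()  # None if cuDNN is unavailable
    except ImportError:
        info["torch"] = None
    return info


if __name__ == "__main__":
    for key, value in collect_system_info().items():
        print(f"{key}: {value}")
```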