On page 3 of the paper, Algorithm 1 states that if (x_i, y_i) is not in the current batch, then we add (N/B)*g_t_i to the previous step's moment gradient scaled by the momentum and the previous step's hyper gradient scaled by the regularization coefficient.
However, when I look at the HydraHook class, I see that the instance gradient (N/B)*g_t_i is included if the index is part of the current batch and is not included if it is not. This seems to be the opposite of what Algorithm 1 suggests, and I hope you can help me figure out what is going on.
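To make the two readings concrete, here is a minimal sketch of the branch condition as I understand it. This is not the repository's code or the paper's exact update: the function name, the scalar arguments, the default values, and the sign of each term are all assumptions for illustration. The only thing it is meant to show is which branch adds the (N/B)*g_t_i term.

```python
def moment_update(prev_moment, prev_hyper, inst_grad, in_batch, *,
                  add_when_in_batch, momentum=0.9, reg=1e-4, n=100, b=10):
    """One scalar update step for a single training instance i.

    Hypothetical helper: names, defaults, and signs are assumptions.
    add_when_in_batch=False -> my reading of Algorithm 1 in the paper
    add_when_in_batch=True  -> my reading of the HydraHook class
    """
    # Terms applied in both cases: previous moment gradient scaled by the
    # momentum, plus previous hyper gradient scaled by the regularization
    # coefficient.
    update = momentum * prev_moment + reg * prev_hyper

    # The instance-gradient term (N/B) * g_t_i is only added in one branch;
    # the two readings disagree on which branch that is.
    if in_batch == add_when_in_batch:
        update += (n / b) * inst_grad
    return update


# Same instance, same step, instance is in the current batch:
paper_reading = moment_update(0.0, 0.0, 1.0, in_batch=True, add_when_in_batch=False)
code_reading = moment_update(0.0, 0.0, 1.0, in_batch=True, add_when_in_batch=True)
```

With these inputs the instance-gradient term is skipped under the first reading and added under the second, which is the discrepancy I am asking about.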
Thanks!