Hypergradient Calculation Different from the Paper? #2

JohnnyC08 · 2022-01-19T22:29:58Z

On page 3 of the paper, Algorithm 1 states that if (x_i, y_i) is not in the current batch, we add (N/B)*g_t_i to the previous step's momentum gradient scaled by the momentum and the previous step's hypergradient scaled by the regularization coefficient.

However, when I look at the HydraHook class, I see that it includes the instance gradient (N/B)*g_t_i if the index is part of the current batch and omits it if it is not. This seems to be the opposite of what Algorithm 1 suggests, and I hope you can help me figure out what is going on.

Thanks!

cyyever · 2023-11-27T13:57:11Z

@JohnnyC08 There is an error in the pseudocode. The condition should be "if (x_i, y_i) is in the current batch", which matches what HydraHook does. I only just noticed this issue.
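
For anyone reading later, here is a minimal sketch of the corrected condition as described in this thread. It is not the actual HydraHook code: the function name, the argument names, and the use of plain Python lists for vectors are all assumptions made for illustration, and the exact scaling and sign conventions should be checked against the paper and the repository.

```python
def update_momentum_term(prev_momentum_term, prev_hypergradient, instance_gradient,
                         in_batch, momentum, weight_decay, dataset_size, batch_size):
    """Hypothetical one-step update of the per-sample momentum term.

    All names here are illustrative and do not come from the repository.
    Vectors are plain lists of floats to keep the sketch self-contained.
    """
    # These two terms apply to every tracked sample, whether or not it is
    # in the current batch: the previous momentum term scaled by the
    # momentum, plus the previous hypergradient scaled by the
    # regularization coefficient.
    new_term = [momentum * m + weight_decay * h
                for m, h in zip(prev_momentum_term, prev_hypergradient)]

    # Corrected condition: add (N/B) * g_t_i only when (x_i, y_i) IS in
    # the current batch (the pseudocode in the paper had this inverted).
    if in_batch:
        scale = dataset_size / batch_size
        new_term = [t + scale * g for t, g in zip(new_term, instance_gradient)]
    return new_term


if __name__ == "__main__":
    common = dict(prev_momentum_term=[2.0], prev_hypergradient=[4.0],
                  instance_gradient=[1.0], momentum=0.5, weight_decay=0.25,
                  dataset_size=8, batch_size=4)
    print(update_momentum_term(in_batch=False, **common))
    print(update_momentum_term(in_batch=True, **common))
```

With these made-up numbers, the only difference between the two calls is the extra (N/B)*g_t_i contribution for the sample that is in the batch.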
