
Add SVRG parameter updater #7

Open
ahwillia opened this issue Oct 8, 2016 · 3 comments

Comments

@ahwillia
Contributor

ahwillia commented Oct 8, 2016

This is an interesting stochastic optimizer with some nice theoretical guarantees for convex problems. It would be interesting to compare it against the optimizers we have already implemented.

https://papers.nips.cc/paper/4937-accelerating-stochastic-gradient-descent-using-predictive-variance-reduction.pdf

@tbreloff
Member

tbreloff commented Oct 9, 2016

Just trying to make sense of the paper. It seems like they are proposing that the gradient update for weight w[i] is:

θ[i] -= lr * (∇[i] - V)

where V is the difference: E(grad(mean(theta))) - mean(E(grad(theta)))

Do you agree that this is their method?
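
As a sanity check on my reading, here is a throwaway least-squares script (nothing from our code; gradi and fullgrad are just local helpers I made up): averaged over all samples, the snapshot terms cancel, so the SVRG direction is an unbiased estimate of the full gradient at the current weights.

# Toy check: averaging gradi(w, i) - gradi(w_snap, i) + mu over all i recovers
# the full gradient at w, because the last two terms cancel on average.
n, d = 100, 5
X, y = randn(n, d), randn(n)

gradi(w, i) = (sum(X[i, :] .* w) - y[i]) * X[i, :]   # single-sample least-squares gradient
fullgrad(w) = sum(gradi(w, i) for i = 1:n) / n       # full-data gradient

w, w_snap = randn(d), randn(d)    # current weights and a stale snapshot
mu = fullgrad(w_snap)             # full gradient at the snapshot

svrg_dir = sum(gradi(w, i) - gradi(w_snap, i) + mu for i = 1:n) / n
@assert isapprox(svrg_dir, fullgrad(w))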

@ahwillia
Contributor Author

ahwillia commented Oct 9, 2016

This paper gives a slightly clearer summary of it: https://arxiv.org/pdf/1603.06160v2.pdf (see Algorithm 2).

I think you have the basic idea... It might be a bit harder to fit this into the ParameterUpdater framework since there is a nested loop; a rough sketch of one way to split up the state follows the pseudocode. Here is some pseudocode -- w holds the parameters of the model. (Please double-check that I got this right!)

function svrg_pseudocode(data, learnrate, iterations, epoch_length)
    # hold the current parameters and the snapshot from the last outer iteration
    w = initialize_weights()
    w_prev = similar(w)

    # outer loop: one snapshot per iteration
    for s = 1:iterations

        # store the snapshot weights
        copy!(w_prev, w)

        # full gradient at the snapshot, averaged over the whole dataset
        mu = mean([ grad(w_prev, target, output) for (target, output) in data ])

        # inner loop: cheap per-sample updates
        for t = 1:epoch_length
            (target, output) = rand_sample(data)

            # gradients at the current and snapshot weights
            ∇w = grad(w, target, output)
            ∇w_prev = grad(w_prev, target, output)

            # variance-reduced update
            w -= learnrate * (∇w - ∇w_prev + mu)
        end
    end
    return w
end
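
One possible way to flatten the nested loop (very rough, and none of these names are the real ParameterUpdater API -- grad is the same placeholder as above): treat the snapshot weights and the full gradient as state that gets refreshed every epoch_length samples, so the inner step looks like any other per-sample update.

# Rough sketch only: refresh_snapshot / svrg_update are made-up names, not the
# actual ParameterUpdater interface; grad is the same placeholder as above.

# run once every epoch_length samples: take a snapshot and a full-data gradient
function refresh_snapshot(w, data)
    w_prev = copy(w)
    mu = sum(grad(w_prev, target, output) for (target, output) in data) / length(data)
    return w_prev, mu
end

# cheap per-sample step, given the current snapshot state
function svrg_update(w, w_prev, mu, target, output, learnrate)
    ∇w = grad(w, target, output)
    ∇w_prev = grad(w_prev, target, output)
    return w - learnrate * (∇w - ∇w_prev + mu)
end

The driver would then call refresh_snapshot every epoch_length samples and svrg_update on each sample in between, which is the same computation as the pseudocode above but without the explicit nesting.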

@tbreloff
Member

Related? https://arxiv.org/abs/1604.07070v2

