
Add SVRG parameter updater #7

Open
ahwillia opened this issue Oct 8, 2016 · 3 comments

Comments

@ahwillia
Contributor

ahwillia commented Oct 8, 2016

This is an interesting stochastic optimizer with some nice theoretical guarantees for convex problems. It would be interesting to compare it against the optimizers we have already implemented.

https://papers.nips.cc/paper/4937-accelerating-stochastic-gradient-descent-using-predictive-variance-reduction.pdf

@tbreloff
Member

tbreloff commented Oct 9, 2016

Just trying to make sense of the paper. It seems like they are proposing that the gradient update for weight w[i] is:

θ[i] -= lr * (∇[i] - V)

where V is the difference: E(grad(mean(theta))) - mean(E(grad(theta)))

Do you agree that this is their method?
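
As a sanity check on my reading, here is a throwaway least-squares script (nothing from our code; gradi and fullgrad are just local helpers I made up): averaged over all samples, the snapshot terms cancel, so the SVRG direction is an unbiased estimate of the full gradient at the current weights.

# Toy check: averaging gradi(w, i) - gradi(w_snap, i) + mu over all i recovers
# the full gradient at w, because the last two terms cancel on average.
n, d = 100, 5
X, y = randn(n, d), randn(n)

gradi(w, i) = (sum(X[i, :] .* w) - y[i]) * X[i, :]   # single-sample least-squares gradient
fullgrad(w) = sum(gradi(w, i) for i = 1:n) / n       # full-data gradient

w, w_snap = randn(d), randn(d)    # current weights and a stale snapshot
mu = fullgrad(w_snap)             # full gradient at the snapshot

svrg_dir = sum(gradi(w, i) - gradi(w_snap, i) + mu for i = 1:n) / n
@assert isapprox(svrg_dir, fullgrad(w))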

@ahwillia
Contributor Author

ahwillia commented Oct 9, 2016

This paper gives a slightly clearer summary of it: https://arxiv.org/pdf/1603.06160v2.pdf (see Algorithm 2).

I think you have the basic idea... It might be a bit harder to fit this into the ParameterUpdater framework since there is a nested loop; a rough sketch of one way to split up the state follows the pseudocode. Here is some pseudocode -- w holds the parameters of the model. (Please double-check that I got this right!)

function svrg_pseudocode(data, learnrate, iterations, epoch_length)
    # hold the current parameters and the snapshot from the last outer iteration
    w = initialize_weights()
    w_prev = similar(w)

    # outer loop: one snapshot per iteration
    for s = 1:iterations

        # store the snapshot weights
        copy!(w_prev, w)

        # full gradient at the snapshot, averaged over the whole dataset
        mu = mean([ grad(w_prev, target, output) for (target, output) in data ])

        # inner loop: cheap per-sample updates
        for t = 1:epoch_length
            (target, output) = rand_sample(data)

            # gradients at the current and snapshot weights
            ∇w = grad(w, target, output)
            ∇w_prev = grad(w_prev, target, output)

            # variance-reduced update
            w -= learnrate * (∇w - ∇w_prev + mu)
        end
    end
    return w
end
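
One possible way to flatten the nested loop (very rough, and none of these names are the real ParameterUpdater API -- grad is the same placeholder as above): treat the snapshot weights and the full gradient as state that gets refreshed every epoch_length samples, so the inner step looks like any other per-sample update.

# Rough sketch only: refresh_snapshot / svrg_update are made-up names, not the
# actual ParameterUpdater interface; grad is the same placeholder as above.

# run once every epoch_length samples: take a snapshot and a full-data gradient
function refresh_snapshot(w, data)
    w_prev = copy(w)
    mu = sum(grad(w_prev, target, output) for (target, output) in data) / length(data)
    return w_prev, mu
end

# cheap per-sample step, given the current snapshot state
function svrg_update(w, w_prev, mu, target, output, learnrate)
    ∇w = grad(w, target, output)
    ∇w_prev = grad(w_prev, target, output)
    return w - learnrate * (∇w - ∇w_prev + mu)
end

The driver would then call refresh_snapshot every epoch_length samples and svrg_update on each sample in between, which is the same computation as the pseudocode above but without the explicit nesting.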

@tbreloff
Member

Related? https://arxiv.org/abs/1604.07070v2

