Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of tall Jacobians #100

Open
MaximilianJHuber opened this issue Apr 6, 2020 · 4 comments
Open

Performance of tall Jacobians #100

MaximilianJHuber opened this issue Apr 6, 2020 · 4 comments

Comments

@MaximilianJHuber
Copy link

MaximilianJHuber commented Apr 6, 2020

Let H be a matrix, x a vector, and f! a matrix-valued function. To make this fit the interface, I allow H to be a vector and reshape H :

using SparseArrays, SparseDiffTools, BenchmarkTools, ProfileView
function f!(H, x)
    if length(size(H)) == 1
        H = reshape(H, 500000, 4)
    end
    M = size(H)[1]
    N = size(H)[2]

    for m in 1:M
        for n in 1:N
            if isodd(n)
                H[m, n] = 2*x[1]
            else
                H[m, n] = 3*x[2]
            end
        end
    end
end
H = zeros(500000, 4)
x = 5.0 * ones(200)
@views f!(H[:], x) # same as f!(H, x)

Of the 200 variables only the first two are relevant, but even they collapse to one color.

pattern = sparse([1, 3, 2, 4], [1, 1, 2, 2], 1, size(H, 2), length(x))
jac = Float64.(sparse(pattern))
colors = matrix_colors(jac)
pattern = kron(pattern, ones(size(H, 1)))
jac = Float64.(sparse(pattern))
cache = ForwardColorJacCache(f!, x, dx=H, colorvec=colors ,sparsity=pattern);
@btime forwarddiff_color_jacobian!(jac, f!, x, cache);

108.402 ms (4 allocations: 160 bytes)

It is still much faster than full ForwardDiff:

full_jac = Matrix(jac);
full_jac .= 0;
@time ForwardDiff.jacobian!(full_jac, f!, H, x);

2.270 s (6 allocations: 198.39 MiB)

but @profview forwarddiff_color_jacobian!(jac, f!, x, cache); shows that most time is spent in FiniteDiff._colorediteration!.

@YingboMa
Copy link
Member

YingboMa commented Apr 6, 2020

Here are some thoughts on optimizing FiniteDiff._colorediteration!. Currently, our iteration order is to linearly traverse the compressed Jacobian and to find the appropriate column in the uncompressed Jacobian to copy to. This could bad for cache locality because when the matrix is large and sparse, we want to minimize the number of times that we traverse the large uncompressed Jacobian. So, I suggest to linearly traverse the uncompressed Jacobian and to find the appropriate in compressed Jacobian to copy from.

@MaximilianJHuber
Copy link
Author

MaximilianJHuber commented Apr 7, 2020

But isn't traversing the compressed Jacobian slower than traversing the uncompressed?

I have no sense how cache locality works for sparse matrices, but to find an element in the sparse matrix a heuristic may be good before starting to traverse the rowval: the next element in the compressed Jacobian is likely to be the next element in the uncompressed, at least for the sparsity pattern I have.

@MaximilianJHuber
Copy link
Author

I read a comment from @j-fu on this post. His ExtendableSparse.jl and the way he uses it to extract the Jacobian out of ForwardDiff into a sparse matrix may be relevant for this issue.

@sjdaines
Copy link

sjdaines commented Jul 18, 2021

Partial fix in PR #146

This adds a fast path that avoids slow sparse indexing if J and sparsity are both SparseMatrixCSC with the same sparsity pattern.

Updated timing (Julia 1.6.2)

SparseDiffTools.jl v1.14.0:

julia> @btime forwarddiff_color_jacobian!($jac, $f!, $x, $cache);
  107.003 ms (4 allocations: 160 bytes)

With PR #146:

julia> @btime forwarddiff_color_jacobian!($jac, $f!, $x, $cache);
  10.153 ms (4 allocations: 160 bytes)

and @profview forwarddiff_color_jacobian!(jac, f!, x, cache); shows that the specialized
SparseDiffTools. _colorediteration! is now approx 1/4 the total.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants