~5x speedup over SparseArrays #28
more detailed benchmark comparing with ThreadedSparseCSR (each entry corresponds to n = 4, 6, ..., 20 in the script below, i.e. a 2^n x 2^n matrix; values below 1 mean slower than SparseArrays):

```julia
julia> speedup = map(reports["base"], reports["this"]) do br, r
           minimum(br).time / minimum(r).time
       end
9-element Vector{Float64}:
 0.0055334472724891184
 0.010322679308762808
 0.04274428084218395
 0.14642857142857144
 0.47042220219007225
 2.1794871794871793
 4.394308194754103
 5.724467113647089
 4.775544347485181

julia> speedup = map(reports["base"], reports["tmul"]) do br, r
           minimum(br).time / minimum(r).time
       end
9-element Vector{Float64}:
 0.006753095374756376
 0.03454683929931455
 0.09163389771566433
 0.5019588638589618
 0.9172189246816113
 3.053892215568862
 6.099349844322258
 5.354817480719794
 5.343557759590914

julia> speedup = map(reports["base"], reports["bmul"]) do br, r
           minimum(br).time / minimum(r).time
       end
9-element Vector{Float64}:
 0.014777543308632282
 0.03007957559681698
 0.11305128801015392
 0.4823529411764706
 1.7014805708482548
 7.927461139896373
 7.44011544011544
 5.3025629910750975
 5.277065594192646
```

with the following script:
```julia
using BenchmarkTools
using ParallelMergeCSR
using SparseArrays
using SparseMatricesCSR
using ThreadedSparseCSR

ThreadedSparseCSR.get_num_threads()

# this package (ParallelMergeCSR), acting on the transpose of a CSC matrix
function benchmark(n::Int)
    A = transpose(sprand(2^n, 2^n, 1e-4))
    x = rand(2^n)
    y = similar(x)
    return @benchmark ParallelMergeCSR.mul!($y, $A, $x, 1.0, 0.0)
end

# baseline: single-threaded SparseArrays
function benchmark_base(n::Int)
    A = transpose(sprand(2^n, 2^n, 1e-4))
    x = rand(2^n)
    y = similar(x)
    return @benchmark SparseArrays.mul!($y, $A, $x, 1.0, 0.0)
end

# ThreadedSparseCSR, tmul! variant
function benchmark_tmul(n::Int)
    x = rand(2^n)
    y = similar(x)
    csrA = SparseMatrixCSR(transpose(sprand(2^n, 2^n, 1e-4)))
    return @benchmark tmul!($y, $csrA, $x, 1.0, 0.0)
end

# ThreadedSparseCSR, bmul! variant
function benchmark_bmul(n::Int)
    x = rand(2^n)
    y = similar(x)
    csrA = SparseMatrixCSR(transpose(sprand(2^n, 2^n, 1e-4)))
    return @benchmark bmul!($y, $csrA, $x, 1.0, 0.0)
end

reports = Dict(
    "this" => [],
    "base" => [],
    "tmul" => [],
    "bmul" => [],
)

for n in 4:2:20
    @info "benchmarking" n
    push!(reports["this"], benchmark(n))
    push!(reports["base"], benchmark_base(n))
    push!(reports["tmul"], benchmark_tmul(n))
    push!(reports["bmul"], benchmark_bmul(n))
end

speedup = map(reports["base"], reports["this"]) do br, r
    minimum(br).time / minimum(r).time
end

speedup = map(reports["base"], reports["tmul"]) do br, r
    minimum(br).time / minimum(r).time
end

speedup = map(reports["base"], reports["bmul"]) do br, r
    minimum(br).time / minimum(r).time
end
```
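Note: the parallel variants only show a speedup when Julia is started with multiple threads (e.g. `julia -t 8`). A quick sanity check before benchmarking:

```julia
using ThreadedSparseCSR

Threads.nthreads()                   # Julia threads this session was started with
ThreadedSparseCSR.get_num_threads()  # threads ThreadedSparseCSR is set to use
```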
These results really don't make sense to me; why does the speedup seem to decrease with larger matrices?
Also, one issue I see is that the matrices are not the same between the benchmarks, so it's not an entirely fair comparison for smaller matrices, where the matrix-to-matrix fluctuations in the number of non-zero elements are larger. A fairer setup is sketched below.
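One way to address this (a sketch, assuming the same packages as in the script above; `benchmark_all` is a hypothetical name) is to draw a single random matrix per size and share it across all four methods:

```julia
using BenchmarkTools, SparseArrays, SparseMatricesCSR
using ParallelMergeCSR, ThreadedSparseCSR

# Sketch of a fair-comparison variant: one matrix per size, shared by all methods.
function benchmark_all(n::Int)
    At = transpose(sprand(2^n, 2^n, 1e-4))  # one matrix, reused everywhere
    csrA = SparseMatrixCSR(At)              # CSR copy of the same matrix
    x = rand(2^n)
    y = similar(x)
    return Dict(
        "base" => @benchmark(SparseArrays.mul!($y, $At, $x, 1.0, 0.0)),
        "this" => @benchmark(ParallelMergeCSR.mul!($y, $At, $x, 1.0, 0.0)),
        "tmul" => @benchmark(tmul!($y, $csrA, $x, 1.0, 0.0)),
        "bmul" => @benchmark(bmul!($y, $csrA, $x, 1.0, 0.0)),
    )
end
```

With this, the per-size ratios compare timings of exactly the same matrix, so small-n fluctuations in the number of non-zeros no longer bias the speedups.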
conclusion: this package seems slightly slower than the existing ones, but faster at some of the sizes; bmul is the fastest one. I'm looking forward to seeing how much @batch can improve on this.

not sure why, but this is currently 8x slower than SparseArrays; the following is the comparison at different sizes

I just have a naive benchmark, currently getting a 5x speedup at 2^20 x 2^20
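For context, a minimal row-parallel CSR matvec built on Polyester's `@batch` could look like the sketch below (`batch_mul!` is a hypothetical name, not ThreadedSparseCSR's actual implementation); each row's dot product is independent, so rows can be batched across threads:

```julia
using Polyester, SparseArrays, SparseMatricesCSR

# Sketch: row-parallel y = alpha*A*x + beta*y for a CSR matrix using @batch.
function batch_mul!(y::AbstractVector, A::SparseMatrixCSR,
                    x::AbstractVector, alpha::Number, beta::Number)
    colv = colvals(A)   # column index of each stored entry
    nzv  = nonzeros(A)  # stored values
    @batch for row in 1:size(A, 1)
        acc = zero(eltype(y))
        for k in nzrange(A, row)  # stored entries of this row
            acc += nzv[k] * x[colv[k]]
        end
        y[row] = alpha * acc + beta * y[row]
    end
    return y
end
```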
version info (8 threads on M1 Pro)

the script to reproduce