Fix getindex #108

Merged: @quinnj merged 1 commit into JuliaData:main on Nov 2, 2024

@devmotion commented Nov 1, 2024

Fixes #107 and fixes #109 by using copyto! instead of unsafe_copyto!.

Additionally, I fixed the type stability of getindex and its support for empty ChainedVectors.
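For context, here is a minimal sketch of the difference between the two Base functions. It uses plain `Vector`s rather than `ChainedVector`, and the arrays are illustrative; it is not a reproduction of #107 or #109. `copyto!` validates the requested ranges and throws on a bad request, whereas `unsafe_copyto!` performs no validation, so an out-of-bounds request may corrupt memory or crash.

```julia
src = [1, 2, 3, 4]
dest = zeros(Int, 2)

# Checked copy: copy 2 elements starting at src[2] into dest[1:2].
copyto!(dest, 1, src, 2, 2)

# Requesting more elements than fit in `dest` raises an error
# instead of writing past the end of the array.
err = try
    copyto!(dest, 1, src, 1, 4)
    nothing
catch e
    e
end

# The unchecked variant gives the same result when the request is in bounds.
# With an out-of-bounds request it would NOT throw, so that case is not run here.
dest2 = zeros(Int, 2)
unsafe_copyto!(dest2, 1, src, 2, 2)
```

After the failed `copyto!` call, `dest` still holds the values from the first, valid copy.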

Currently on the main branch:

julia> x = ChainedVector([[1, 2], [3, 4]]);

julia> x[Int[]]
0-element ChainedVector{Int64, Vector{Int64}}

julia> x[2:1]
0-element ChainedVector{Int64, Vector{Int64}}

julia> x[[2,3]]
2-element Vector{Int64}:
 2
 3

julia> x[2:3]
2-element Vector{Int64}:
 2
 3

julia> x = ChainedVector([[]]);

julia> x[Int[]]
0-element ChainedVector{Any, Vector{Any}}

julia> x[2:1]
0-element ChainedVector{Any, Vector{Any}}

With this PR:

julia> x = ChainedVector([[1, 2], [3, 4]]);

julia> x[Int[]]
Int64[]

julia> x[2:1]
Int64[]

julia> x[[2,3]]
2-element Vector{Int64}:
 2
 3

julia> x[2:3]
2-element Vector{Int64}:
 2
 3

julia> x = ChainedVector([[]]);

julia> x[Int[]]
Any[]

julia> x[2:1]
Any[]
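The outputs above show the point of the type stability fix: empty and non-empty selections now return the same container type. The following is a rough sketch of how such an instability can be detected. It uses a hypothetical `Wrapped` type as a stand-in, not the actual `ChainedVector` implementation, and `Base.return_types`, a reflection helper whose exact output may vary between Julia versions.

```julia
# Hypothetical wrapper type, standing in for a vector type that wraps other arrays.
struct Wrapped{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(w::Wrapped) = size(w.data)
Base.getindex(w::Wrapped, i::Int) = w.data[i]

# Unstable: the returned container type depends on a runtime value
# (whether `inds` is empty), so inference cannot pick one concrete type.
unstable_subset(w::Wrapped, inds) = isempty(inds) ? Wrapped(eltype(w)[]) : w.data[inds]

# Stable: always returns a plain Vector, also for empty selections.
stable_subset(w::Wrapped, inds) = w.data[inds]

argtypes = (Wrapped{Int}, UnitRange{Int})
unstable_type = first(Base.return_types(unstable_subset, argtypes))
stable_type = first(Base.return_types(stable_subset, argtypes))

w = Wrapped([1, 2, 3, 4])
empty_result = stable_subset(w, 2:1)
nonempty_result = stable_subset(w, 2:3)
```

In a test suite, `Test.@inferred` serves the same purpose: it throws if the inferred return type differs from the type of the value actually returned.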

I don't see any performance regression with this PR in the benchmark from #105. On the contrary, the PR reduces the number of allocations, probably due to the type inference fixes:

Currently on the main branch:

julia> @b f($vinputs, $(1:95000))
10.083 μs (9 allocs: 372.125 KiB)

julia> @b g($vinputs, $(1:95000))
18.750 μs (8 allocs: 904.703 KiB)

With this PR:

julia> @b f($vinputs, $(1:95000))
8.875 μs (5 allocs: 372.016 KiB)

julia> @b g($vinputs, $(1:95000))
19.042 μs (8 allocs: 897.328 KiB)

@devmotion

@quinnj would you be able to review the PR? We hit #107, the bug fixed in this PR, with a simple df[1:1522, :] subsetting operation on a DataFrame with ChainedVector columns that was created by CSV.read (Julia running with multiple threads). Given that this seems a rather standard use of DataFrames + CSV, I think it would be good to fix #107.

@quinnj left a comment

Thanks @devmotion! This is great!

@quinnj quinnj merged commit a2976cb into JuliaData:main Nov 2, 2024
4 checks passed