Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Functional non-mutating methods. #46453

Open
wants to merge 19 commits into
base: master
Choose a base branch
from

Conversation

Tokazama
Copy link
Contributor

Her I've implemented what I think are some of the more straightforward non-mutating counterparts to mutating methods.
I borrowed liberally from @tkf's PR #33495 to get Base.setindex working here.
Should also resolve type piracy in ArrayInterface on Base.setindex JuliaArrays/ArrayInterface.jl#305.
Also implemented delete, deleteat, and insert borrowing from work from ArrayInterface.
I also took a bit from #24836, but I thoughtinsert was a more clear method name that can be used to accomplish what the non-mutating push/pushfirst methods (and a bit more).
I think some issues with functional collections can could really benefit from some basic tools like this (#34478, https://discourse.julialang.org/t/functional-implementation-of-collect/15177).
Currently delete, deleteat, and insert are in the export list but if that's too intrusive I can change that.

base/array.jl Outdated

function setindex(xs::AbstractArray, v, I...)
@_propagate_inbounds_meta
T = promote_type(eltype(xs), typeof(v))
Copy link
Contributor

@mcabbott mcabbott Aug 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means setindex(rand(5), ones(2), 3:4) makes a Vector{Any}.

I see that the pirate methods in ArrayInterfaceCore don't seem to make this mistake. Do they have tests, of all the many weird ways to index arrays?

julia> @which Base.setindex(rand(5), ones(2), 3:4)
setindex(x::AbstractArray, v, i...)
     @ ArrayInterfaceCore ~/.julia/packages/ArrayInterfaceCore/7kMjZ/src/ArrayInterfaceCore.jl:248

julia> Base.setindex(rand(5), ones(2), 3:4)
5-element Vector{Float64}:
 0.9187440365487348
 0.5816199534074197
 1.0
 1.0
 0.1473879772541169

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have tests but I thought that tkf's branch was discussed and agreed upon so I moved most of it over to what's done there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about a combined version of that accounts for non-scalar indexing and accommodates new types

function setindex(x::AbstractArray, v, i...)
    @_propagate_inbounds_meta
    inds = to_indices(x, i)
    if inds isa Tuple{Vararg{Integer}}
        T = promote_type(eltype(x), typeof(v))
    else
        T = promote_type(eltype(x), eltype(v))
    end
    y = similar(x, T)
    copy!(y, x)
    y[inds...] = v
    return y
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this definition along with the corresponding tests from ArrayInterface

Copy link
Contributor

@jw3126 jw3126 Aug 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially this definition was discovered in other places as well.

@@ -38,7 +38,7 @@ get(f::Callable, t::Tuple, i::Integer) = i in 1:length(t) ? getindex(t, i) : f()
# returns new tuple; N.B.: becomes no-op if `i` is out-of-bounds

"""
setindex(c::Tuple, v, i::Integer)
setindex(x::Tuple, v, i::Integer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relatedly, note that this doesn't allow non-scalar indexing at all:

julia> getindex(Tuple("julia"), 3:5)
('l', 'i', 'a')

julia> Base.setindex(Tuple("julia"), ('l', 'i', 'o'), 3:5)
ERROR: MethodError:

And the pirate methods don't alter this.

Copy link
Contributor Author

@Tokazama Tokazama Aug 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if we wanted to do multiple values at once for Tuple and NamedTuple but I personally have no problem with it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to support setting multiple values what do we do with repeated indices? setindex! just does another scalar setindex! at the repeated position when iterating through the new values. Do we just write it so the behavior appears the same to users, or do we throw an error for repeated values like deleteat! does?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest addition supports non-scalar setindex for both Tuple and NamedTuple. These probably could be more efficient but it's a bit difficult to accomplish this without traits that could tell us if the indices have a known size at compile time to help do things in place (instead of acting on a temporary vector and then transforming to a tuple).

@Tokazama
Copy link
Contributor Author

It may be good to get #46500 in first so that cases where it's more efficient to perform the mutating counterpart on a copy of the provided array.

@Tokazama
Copy link
Contributor Author

Tokazama commented Sep 3, 2022

@mbauman I saw that you reviewed the previous effort for setindex, so I thought I'd ping you in case you're interested.

merge(nt, (; idx => v))
setindex(nt::NamedTuple, v, idx::Symbol) = merge(nt, (; idx => v))
function setindex(nt::NamedTuple, vs, idxs::AbstractVector{Symbol})
merge(nt, (; [i => v for (i,v) in zip(idxs, vs)]...))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be a little footgun hidden here, as zip does not check lengths:

julia> zip([:a,:c], [1,2,3])
zip([:a, :c], [1, 2, 3])

julia> collect(zip([:a,:c], [1,2,3]))
2-element Vector{Tuple{Symbol, Int64}}:
 (:a, 1)
 (:c, 2)

So I think there should be a length check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also allows duplicates, which getindex does not. Is this indended?

julia> _setindex((x=1, y=2, z=3), [4, 5, 6, 7], [:x, :y, :x])
(x = 6, y = 5, z = 3)

julia> (x=1, y=2, z=3)[[:x, :y, :x]]
ERROR: duplicate field name

Only one test https://github.com/JuliaLang/julia/pull/46453/files#diff-a4ae685ea77cc80977dc4214e7dbb9d96fed55bdb918396d3453f840da0a695bR308

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured this was more like the behavior of setindex! where a value can be overwritten multiple times.

Copy link
Contributor

@mcabbott mcabbott Sep 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But get/set both allow repeats for arrays. I'm not sure whether getindex disallowing repeats here is a feature or a bug, either. Do you know?

For arrays, the types must match: setindex!(rand(10), (3, 4), 5:6) is an error, rather than itererating the tuple, because getindex(rand(10), 5:6) is a vector. Is there an argument for why this method should accept & iterate anything?

julia> rand(10)[5:6]
2-element Vector{Float64}:
 0.46482934129714726
 0.06656695848727301

julia> (x=1, y=2, z=3)[[:x, :y]]
(x = 1, y = 2)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But get/set both allow repeats for arrays. I'm not sure whether getindex disallowing repeats here is a feature or a bug, either. Do you know?

I think it's just because you can't have repeated fields in a NamedTuple, so (x=1, y=2, z=3)[[:x, :y, :x]] would have to make up a new symbol for the last :x in order for it to work.

For arrays, the types must match: setindex!(rand(10), (3, 4), 5:6) is an error, rather than itererating the tuple, because getindex(rand(10), 5:6) is a vector. Is there an argument for why this method should accept & iterate anything?

I'm not completely sure what rules should carry over. You also can't index arrays with anything but integers and arrays of integers. I don't know if this means that anytime setindex sets multiple values the collection should be an array, or if the type should match the destination.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I guess the lack of repeats in getindex is indeed forced on you by the return type.

The others I don't know, I just think they need careful thought.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative definition could be

function setindex(nt::NamedTuple, vs, idxs::Union{AbstractVector{Symbol},Tuple{Vararg{Symbol}}})
    merge(nt, NamedTuple{Tuple(idxs)}(vs))
end

which requires vs have a conversion to Tuple and matches the definition of getindex more.

@jw3126
Copy link
Contributor

jw3126 commented Sep 3, 2022

pinging @aplavin who might also be excited about this PR.

base/array.jl Outdated Show resolved Hide resolved
delete(nt::NamedTuple, key::Integer)

Constructs a new `NamedTuple` with the field corresponding to `key` removed.
If no field corresponds to `key`, `nt` is returned.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This differs from the error for tuples with out-of-bounds index. The logic is that setindex lets you add arbitrary keys? But not for integers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following the behavior of delete which doesn't throw an error if key isn't found. deleteat! and deleteat do throw errors if the field/index isn't found.

base/tuple.jl Outdated Show resolved Hide resolved
merge(nt, (; idx => v))
setindex(nt::NamedTuple, v, idx::Symbol) = merge(nt, (; idx => v))
function setindex(nt::NamedTuple, vs, idxs::AbstractVector{Symbol})
merge(nt, (; [i => v for (i,v) in zip(idxs, vs)]...))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I guess the lack of repeats in getindex is indeed forced on you by the return type.

The others I don't know, I just think they need careful thought.

@jw3126
Copy link
Contributor

jw3126 commented Sep 23, 2022

What is the status of this? I would love to get this merged, especially setindex.

@Tokazama
Copy link
Contributor Author

What is the status of this? I would love to get this merged, especially setindex.

I'd love to get this finalized. Are there any issues that I still need to address here so we can move forward.

@jw3126
Copy link
Contributor

jw3126 commented Sep 27, 2022

What is the status of this? I would love to get this merged, especially setindex.

I'd love to get this finalized. Are there any issues that I still need to address here so we can move forward.

I don't see any open issues.

@Tokazama
Copy link
Contributor Author

What is the status of this? I would love to get this merged, especially setindex.

I'd love to get this finalized. Are there any issues that I still need to address here so we can move forward.

I don't see any open issues.

I tried to use established discussion from previous PRs, rather than start from scratch. Hopefully, that gets us closer to a final product here.

It might be worth noting that I didn't incorporate these methods on IOContext, as was brought up in a previous PR. IIRC, some people didn't like that idea as much. I figured that interested parties could more easily address that after syntax and implementation for supporting data structures was in place.

@jw3126
Copy link
Contributor

jw3126 commented Sep 27, 2022

It might be worth noting that I didn't incorporate these methods on IOContext, as was brought up in a previous PR. IIRC, some people didn't like that idea as much. I figured that interested parties could more easily address that after syntax and implementation for supporting data structures was in place.

I also think it is good to keep this initial PR as minimal as possible. And I think some persistent nagging will be required to get this merged. I think there is no real technical or design issue why #33495 and other previous attempts did not get merged. Just that core devs have limited time and other priorities.

@aplavin
Copy link
Contributor

aplavin commented Nov 9, 2022

If we could reference multiple keys at once

Same as how setindex! can assign multiple values to multiple indices at once?

  setindex!(A, X, inds...)
  A[inds...] = X
  Store values from array X within some subset of A as specified by inds.

@jw3126
Copy link
Contributor

jw3126 commented Nov 9, 2022

An O(n) setindex for vectors and dictionaries seems like a performance footgun. This sort of method needs to be backed by a data structure that supports more efficient operations.

This is a very good point. We should probably document this. I think a key strength of julia is, that it allows both of the following:

  • Write very generic code
  • Dive into hotspots and optimize them to peak performance without switching the language

In my view julia is not a language that compromises on composability and genericness in order to force you to write fast code.
For instance, I pretty often use x in xs where xs::Vector. This is also O(n), does it mean we should delete that method and tell people to use e.g. Set instead?

Mathematically it is not possible to have a data structure for which every operation is fast. But for writing generic code it is critical to support everything and not force the user to write fast code.

@LilithHafner
Copy link
Member

LilithHafner commented Nov 10, 2022

If setindex were already used in base with data structures that support it well, I find that reasoning convincing, but as is, I am not aware of any data structures in base that work well with this operation. I would propose defining this function in a package with some such data structures and then once the interface is settled and the package is in widespread use one could make the case to move those functions to base for convenience.

@Tokazama
Copy link
Contributor Author

If setindex were already used in base with data structures that support it well, I would find that reasoning convincing, but as is, I am not aware of any data structures in base that work well with this operation.

It is already in use in base with Tuple and NamedTuple. This PR implements setindex for ImmutableDict with the same description you provided and linked to a haskell example earlier.

I would propose defining this function in a package with some such data structures and then once the interface is settled and the package is in widespread use one could make the case to move those functions to base for convenience.

The point of adding this to Base is that this is already part of the API. The interface that I previously referenced being ill-defined is the dictionary interface, not this one.

@LilithHafner
Copy link
Member

Thank you for the correction, my mistake. I support adding more general methods to the existing Base.setindex (with a performance note).

@Tokazama
Copy link
Contributor Author

Thank you for the correction, my mistake. I support adding more general methods to the existing Base.setindex (with a performance note).

Thank for the suggestion. Documentation now includes a performance note.

base/array.jl Show resolved Hide resolved
base/array.jl Outdated
end
end
end
function setindex(t::Tuple, v, inds::AbstractVector{<:Integer})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For vectors with statically known size (eg SVectors), the following alternative performs better and without allocations:

	function setindex(t::Tuple, v, inds::AbstractVector{<:Integer})
	    Base.@_propagate_inbounds_meta
		foldl(ntuple(identity, length(inds)); init=t) do acc, i
			Base.setindex(acc, v[i], inds[i])
		end
	end

Maybe, this should be used instead? And/or, add a similar method with inds::NTuple{Integer}?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the snippet! it now replaces the old commented code.

Copy link
Contributor

@aplavin aplavin Mar 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I must've gone crazy with that fold!
This works the same for static arrays, better for regular arrays, and even seems efficient for tuples:

function setindex(t::Tuple, v, inds::AbstractVector{<:Integer})
    ntuple(length(t)) do i
        ix = findfirst(==(i), inds)
        isnothing(ix) ? t[i] : v[ix]
    end
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and for unitranges

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we want to allow having tuples of indices be permitted as inds here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah maybe best without tuples.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's correct.

julia> let x = zeros(3)
         setindex!(x, [4,5,6], [1,2,1])
         x
       end
3-element Vector{Float64}:
 6.0
 5.0
 0.0

julia> function _setindex(t::Tuple, v, inds::AbstractVector{<:Integer})
           ntuple(length(t)) do i
               ix = findfirst(==(i), inds)
               isnothing(ix) ? t[i] : v[ix]
           end
       end
_setindex (generic function with 1 method)

julia> let x = Tuple(zeros(3))
         _setindex(x, [4,5,6], [1,2,1])
       end
(4, 5, 0.0)

Copy link
Contributor

@aplavin aplavin Mar 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, findlast instead of findfirst? Or throw?
Not clear which is the best semantics.

true
```
"""
function deleteat(src::AbstractVector, i::Integer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleteat(src::AbstractVector, i) = deleteat!(copy(src), i)

?
Instead of all these methods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see - it may work with setindex-able, but not resizeable arrays as of now...
Unfortunate that so much duplication is needed.

Copy link
Contributor Author

@Tokazama Tokazama Mar 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's a bit messy. I thought about putting # TODO refactor when immutable arrays are in base but I don't think we know how that's going to play out still. Also, I use "TODO" as a keyword in searches for my code bases, so I try not to pollute community projects with it unless it's clearly helpful for everyone.

@@ -381,8 +381,43 @@ julia> Base.setindex(nt, "a", :a)
(a = "a",)
```
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setindex(x::NamedTuple{names}, v, i::Integer) where {names} = NamedTuple{names}(setindex(values(x), v, i))

@Tokazama
Copy link
Contributor Author

setindex (not setindex!) isn't a great function name (#51066 (comment)). Maybe set for changing the value associated with an existing key/index?

@aplavin
Copy link
Contributor

aplavin commented Sep 1, 2023

What's wrong with setindex? It already exists, is used by many packages, and follows the recommended naming pattern of non-mutating f vs mutating f!. Would be strange to have the same operation to have totally different names for mutable/immutable versions.

@Tokazama
Copy link
Contributor Author

Tokazama commented Sep 1, 2023

What's wrong with setindex? It already exists, is used by many packages, and follows the recommended naming pattern of non-mutating f vs mutating f!. Would be strange to have the same operation to have totally different names for mutable/immutable versions.

It's discussed a bit here, but the gist of it is the order of the arguments is odd and is more of an artifact of translating obj[inds...] = val to setindex!(obj, val, inds...). The more natural translation of the syntax puts indices closer to the object so that obj.name = val -> setproperty!(obj, name, val).

Furthermore, the function means something different for dictionaries than arrays. It's add a new index/key for dicts and replacing an existing for other collections.

@aplavin
Copy link
Contributor

aplavin commented Sep 1, 2023

Argument order: yes, it's strange in setindex! compared to other functions, but still for consistency it would be nice to have setindex! and setindex with the same arguments and order, and also for other functions - setproperty! can have setproperty counterpart with the same arguments.
If setindex! argument order changes at some point (2.0?), then setindex should follow of course.

If there was a function set! in Base, then non-mutating set with the same arguments would be natural, but it isn't the case.
Like with insert: there is insert! in Base, and adding insert with the same behavior + arguments makes sense.

Furthermore, the function means something different for dictionaries than arrays.

Yeah, and this behavior is already familiar to everyone using this function. Making an immutable counterpart with the same meaning is only natural.

@Tokazama
Copy link
Contributor Author

Tokazama commented Sep 1, 2023

Argument order: yes, it's strange in setindex! compared to other functions, but still for consistency it would be nice to have setindex! and setindex with the same arguments and order, and also for other functions - setproperty! can have setproperty counterpart with the same arguments. If setindex! argument order changes at some point (2.0?), then setindex should follow of course.

I really doubt this is going to change in the future since the entire reason is to support splatting indices.

Furthermore, the function means something different for dictionaries than arrays.

Yeah, and this behavior is already familiar to everyone using this function. Making an immutable counterpart with the same meaning is only natural.

My point was that the similarity between the two functions when working with immutable functions is less clear given that setindex!(obj, val, inds...) is kind of just an implementation detail for users to have obj[inds...] = val. Whatever we call this new function will have the exact same syntax in implementation and for users.

Although I would prefer something that diverges from setindex! in name and argument order, facilitating a more self documenting design, I also understand that takes some more bike-shedding to move forward an API that has been in development for over a year. @LilithHafner and @vchuravy might feel more strongly though.

@aplavin
Copy link
Contributor

aplavin commented Sep 11, 2023

I really doubt this is going to change in the future since the entire reason is to support splatting indices.

Isn't the exact same concern (splatting indices) just as valid for setindex as it is for setindex!?

similarity between the two functions when working with immutable functions is less clear given that <...>

Nevertheless, much more people are familiar with setindex! than with (non-existing) set! :) That's why I'm saying it makes more sense to make new apis consistent with existing ones (setindex[!]) instead of creating new concepts (set).

Same as is done in this PR for insert and delete, mirroring existing ! functions.

Comment on lines +847 to +857
function delete(d::ImmutableDict{K,V}, key) where {K,V}
if isdefined(d, :parent)
if isequal(d.key, key)
d.parent
else
ImmutableDict{K,V}(delete(d.parent, key), d.key, d.value)
end
else
d
end
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation is not quite correct since I can current;y construct an ImmutableDict where there are multiple entries for key each shadowed.

Copy link
Member

@vtjnash vtjnash Sep 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect delete would be more easily done with adding a new key as a tombstone (with the same key, but val being #undef), rather than a bulk copy like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect delete would be more easily done with adding a new key as a tombstone (with the same key, but val being #undef), rather than a bulk copy like this.

I really like that idea, but I think we would need to have a new type since we can't rely on #undef for bits types.

@Tokazama
Copy link
Contributor Author

Isn't the exact same concern (splatting indices) just as valid for setindex as it is for setindex!?

That's a good point. I've mostly been trying to capture what I can remember from the triage where it was discussed. There may or may not be some things I'm leaving out that are important. We probably need to wait until relevant parties chime in.

@brenhinkeller brenhinkeller added the collections Data structures holding multiple items, e.g. sets label Sep 15, 2023
@jariji
Copy link
Contributor

jariji commented Sep 18, 2023

Should setindex support AbstractString?

@LilithHafner
Copy link
Member

setindex(::String, ::Char, ::Int) -> String seems like a major performance footgun.

@Tokazama
Copy link
Contributor Author

Should setindex support AbstractString?

I'm not against this but I fear I've already made the scope of this PR too big given the lack of consensus on many details herein.

@@ -95,6 +96,7 @@ Library changes
* `@time` now separates out % time spent recompiling invalidated methods ([#45015]).
* `eachslice` now works over multiple dimensions; `eachslice`, `eachrow` and `eachcol` return
a `Slices` object, which allows dispatching to provide more efficient methods ([#32310]).
* The non-mutationg `Base.setindex` function now has `AbstractDict` support ([#46453]).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mutationg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
collections Data structures holding multiple items, e.g. sets triage This should be discussed on a triage call
Projects
None yet
Development

Successfully merging this pull request may close these issues.