-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Throw an error when collecting from iterators with inconsistent length #29458
base: master
Are you sure you want to change the base?
Conversation
Check that HasLength and HasShape iterators return the same number of elements as indicated by length(itr). An error was already thrown in some situations when the number of elements was higher than the declared length, but not all.
base/array.jl
Outdated
len = length(dest) | ||
i = 0 | ||
for x in src | ||
i == len && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like copyto!
already does this check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but it doesn't do the one below, and it cannot be done with access to i
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, then we should add a _copyto_impl!
that takes a strict::Bool
argument, using the current code from copyto!
. Then it can be shared by both this code (passing true
for strict) and copyto!
(passing false
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do that. Though note that the error message will be less specific (like the current one in copyto!
), since copyto!
cannot assume that the problem comes from an invalid declared length. So I'm not sure it's really better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, either way. This does need to use eachindex
though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not if we keep this method, as it only applies to Array
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it can also be called on OffsetArray
and possibly on other arrays in corner cases. So I've taken the _copy_impl!
approach.
base/array.jl
Outdated
@@ -667,6 +687,10 @@ function collect_to!(dest::AbstractArray{T}, itr, offs, st) where T | |||
return collect_to!(new, itr, i+1, st) | |||
end | |||
end | |||
i-1 < length(dest) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should use lastindex
.
test/arrayops.jl
Outdated
@test_throws ErrorException collect(InvalidIter2()) | ||
@test_throws ErrorException collect(Any, InvalidIter2()) | ||
@test_throws ErrorException collect(Int, InvalidIter2()) | ||
@test_throws ErrorException [x for x in InvalidIter2()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can test this, since it will do an invalid memory write.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm, yeah, so the only way to catch this would be to drop the @inbounds
in collect_to!
, or put a i == len
check as in copyto!
/copyto_check_length!
. Not sure that's worth it, but it sounds inconsistent to have it in one place and not the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to add checks for some cases (when it can be efficient) than to have no checks at all, I'd say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But are there reasons to care less about performance in copyto!
than in collect_to!
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
collect_to!
is used by comprehensions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, does that mean it's OK for copyto!
to be slow? :-/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it doesn't mean it's OK for copyto!
to be slow, it just means we don't want regressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was actually an honest question: maybe we should remove the check from copyto!
to make it faster.
@JeffBezanson Could you suggest how to apply the same change to the parser (cf. issue description)? |
base/array.jl
Outdated
@@ -667,6 +687,10 @@ function collect_to!(dest::AbstractArray{T}, itr, offs, st) where T | |||
return collect_to!(new, itr, i+1, st) | |||
end | |||
end | |||
i-1 < length(dest) && | |||
throw(ErrorException("iterator returned fewer elements than its declared length")) | |||
i-1 > length(dest) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, the @inbounds dest[i]
above has likely corrupted something or segfaulted. Should probably replace the while true
with while i <= lastindex(dest)
and verify here that iterate(itr, st) === nothing
(or Base.isdone(itr, st) === true || iterate(itr, st) === nothing
?). Should then fix #29532.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory I agree, I don't like this @inbounds
call which isn't associated with an actual check of the index against the array. But it comes with an additional cost with hurts performance at lot since it disables SIMD (see discussion above. I've just tried it:
# After change
julia> x = Vector{Int}(undef, 1000);
julia> itr = 1:1000;
julia> @btime Base.collect_to!($x, $itr, 1, 0);
676.981 ns (0 allocations: 0 bytes)
julia> y = copy(x);
julia> @btime Base.collect_to!($x, $y, 1, 1);
847.618 ns (0 allocations: 0 bytes)
# Master
julia> x = Vector{Int}(undef, 1000);
julia> itr = 1:1000;
julia> @btime Base.collect_to!($x, $itr, 1, 0);
86.233 ns (0 allocations: 0 bytes)
julia> y = copy(x);
julia> @btime Base.collect_to!($x, $y, 1, 1);
90.980 ns (0 allocations: 0 bytes)
Maybe the compiler could be improved, but for now I guess we can't do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ouch, that's bad.
I've pushed a commit applying @JeffBezanson's suggestions. I think this could be merged as-is, and improved later to cover more cases. |
@@ -667,6 +677,11 @@ function collect_to!(dest::AbstractArray{T}, itr, offs, st) where T | |||
return collect_to!(new, itr, i+1, st) | |||
end | |||
end | |||
lastidx = lastindex(dest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should actually be last(LinearIndices(dest))
. Will fix in the next round of reviews.
@@ -720,8 +725,12 @@ end | |||
## copy between abstract arrays - generally more efficient | |||
## since a single index variable can be used. | |||
|
|||
copyto!(dest::AbstractArray, src::AbstractArray) = | |||
function _copyto_impl!(dest::AbstractArray, src::AbstractArray, allowshorter::Bool) | |||
if !allowshorter && length(src) < length(dest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check and the one below is actually not needed for this PR since we already allocate an array of the correct length, and the AbstractArray
interface relies on the fact that the length is correctly declared. But I added these for consistency, so that the allowshorter
argument (which needs to be accepted) isn't a no-op. I can drop it and add a comment explaining that if you want.
@JeffBezanson Are you happy with what the PR implements now? It doesn't fix all cases, but at least it catches more errors than before. |
@JeffBezanson Do you think this is worth reviving? |
Check that
HasLength
andHasShape
iterators return the same number of elements as indicated bylength(itr)
. An error was already thrown in some situations when the number of elements was higher than the declared length, but not all.Fixes JuliaData/DelimitedFiles.jl#3.
I'd appreciate if somebody with Scheme skills could indicate how to change the lowering of
T[x for x in itr]
here to throw an error ifidx != length(itr)
when!szunk
inconstruct-loops
. That would fix the two@test_broken
cases.