Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw an error when collecting from iterators with inconsistent length #29458

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

nalimilan
Copy link
Member

@nalimilan nalimilan commented Oct 1, 2018

Check that HasLength and HasShape iterators return the same number of elements as indicated by length(itr). An error was already thrown in some situations when the number of elements was higher than the declared length, but not all.

Fixes JuliaData/DelimitedFiles.jl#3.

I'd appreciate if somebody with Scheme skills could indicate how to change the lowering of T[x for x in itr] here to throw an error if idx != length(itr) when !szunk in construct-loops. That would fix the two @test_broken cases.

Check that HasLength and HasShape iterators return the same number of elements
as indicated by length(itr). An error was already thrown in some situations when
the number of elements was higher than the declared length, but not all.
base/array.jl Outdated
len = length(dest)
i = 0
for x in src
i == len &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like copyto! already does this check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but it doesn't do the one below, and it cannot be done with access to i.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then we should add a _copyto_impl! that takes a strict::Bool argument, using the current code from copyto!. Then it can be shared by both this code (passing true for strict) and copyto! (passing false).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that. Though note that the error message will be less specific (like the current one in copyto!), since copyto! cannot assume that the problem comes from an invalid declared length. So I'm not sure it's really better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, either way. This does need to use eachindex though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not if we keep this method, as it only applies to Array.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it can also be called on OffsetArray and possibly on other arrays in corner cases. So I've taken the _copy_impl! approach.

base/array.jl Outdated
@@ -667,6 +687,10 @@ function collect_to!(dest::AbstractArray{T}, itr, offs, st) where T
return collect_to!(new, itr, i+1, st)
end
end
i-1 < length(dest) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use lastindex.

test/arrayops.jl Outdated
@test_throws ErrorException collect(InvalidIter2())
@test_throws ErrorException collect(Any, InvalidIter2())
@test_throws ErrorException collect(Int, InvalidIter2())
@test_throws ErrorException [x for x in InvalidIter2()]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can test this, since it will do an invalid memory write.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm, yeah, so the only way to catch this would be to drop the @inbounds in collect_to!, or put a i == len check as in copyto!/copyto_check_length!. Not sure that's worth it, but it sounds inconsistent to have it in one place and not the other.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to add checks for some cases (when it can be efficient) than to have no checks at all, I'd say.

Copy link
Member Author

@nalimilan nalimilan Oct 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But are there reasons to care less about performance in copyto! than in collect_to!?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collect_to! is used by comprehensions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, does that mean it's OK for copyto! to be slow? :-/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it doesn't mean it's OK for copyto! to be slow, it just means we don't want regressions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was actually an honest question: maybe we should remove the check from copyto! to make it faster.

@nalimilan
Copy link
Member Author

@JeffBezanson Could you suggest how to apply the same change to the parser (cf. issue description)?

base/array.jl Outdated
@@ -667,6 +687,10 @@ function collect_to!(dest::AbstractArray{T}, itr, offs, st) where T
return collect_to!(new, itr, i+1, st)
end
end
i-1 < length(dest) &&
throw(ErrorException("iterator returned fewer elements than its declared length"))
i-1 > length(dest) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, the @inbounds dest[i] above has likely corrupted something or segfaulted. Should probably replace the while true with while i <= lastindex(dest) and verify here that iterate(itr, st) === nothing (or Base.isdone(itr, st) === true || iterate(itr, st) === nothing ?). Should then fix #29532.

Copy link
Member Author

@nalimilan nalimilan Oct 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory I agree, I don't like this @inbounds call which isn't associated with an actual check of the index against the array. But it comes with an additional cost with hurts performance at lot since it disables SIMD (see discussion above. I've just tried it:

# After change
julia> x = Vector{Int}(undef, 1000);

julia> itr = 1:1000;

julia> @btime Base.collect_to!($x, $itr, 1, 0);
  676.981 ns (0 allocations: 0 bytes)

julia> y = copy(x);

julia> @btime Base.collect_to!($x, $y, 1, 1);
  847.618 ns (0 allocations: 0 bytes)

# Master
julia> x = Vector{Int}(undef, 1000);

julia> itr = 1:1000;

julia> @btime Base.collect_to!($x, $itr, 1, 0);

  86.233 ns (0 allocations: 0 bytes)

julia> y = copy(x);

julia> @btime Base.collect_to!($x, $y, 1, 1);
  90.980 ns (0 allocations: 0 bytes)

Maybe the compiler could be improved, but for now I guess we can't do this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch, that's bad.

@nalimilan
Copy link
Member Author

I've pushed a commit applying @JeffBezanson's suggestions. I think this could be merged as-is, and improved later to cover more cases.

@@ -667,6 +677,11 @@ function collect_to!(dest::AbstractArray{T}, itr, offs, st) where T
return collect_to!(new, itr, i+1, st)
end
end
lastidx = lastindex(dest)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should actually be last(LinearIndices(dest)). Will fix in the next round of reviews.

@@ -720,8 +725,12 @@ end
## copy between abstract arrays - generally more efficient
## since a single index variable can be used.

copyto!(dest::AbstractArray, src::AbstractArray) =
function _copyto_impl!(dest::AbstractArray, src::AbstractArray, allowshorter::Bool)
if !allowshorter && length(src) < length(dest)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check and the one below is actually not needed for this PR since we already allocate an array of the correct length, and the AbstractArray interface relies on the fact that the length is correctly declared. But I added these for consistency, so that the allowshorter argument (which needs to be accepted) isn't a no-op. I can drop it and add a comment explaining that if you want.

@nalimilan
Copy link
Member Author

@JeffBezanson Are you happy with what the PR implements now? It doesn't fix all cases, but at least it catches more errors than before.

@nalimilan
Copy link
Member Author

@JeffBezanson Do you think this is worth reviving?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Custom iterator breaks on last row of a readdlm-loaded array
3 participants