Should Row objects be required to support iteration #75
Comments
As mentioned on Slack, I think a way forward would be to up the requirements for these |
The correct place to enforce this property is here, right? Tables.jl/src/iteratorwrapper.jl Line 37 in 0f75893
We can just check if |
So, the issue is that DataFrames doesn't do anything to implement row iteration, so it defaults to the fallback
JuliaDB returns a named tuple instead of a |
JuliaDB returns a NamedTuple because of the definitions in IndexedTables (and StructArrays). I don't think it's reasonable to require rows to be iterable. Especially when it's easy enough to write your own "property iterator" to accomplish what you're going for here:

    # Wraps any object and iterates over the values of its properties.
    struct PropertyIterator{T}
        source::T
    end

    # Entry point: refuse objects that expose no property names.
    function properties(x::T) where {T}
        props = propertynames(x)
        length(props) == 0 && throw(ArgumentError("$T isn't property-iterable because it has no propertynames"))
        return PropertyIterator(x)
    end

    Base.length(p::PropertyIterator) = length(propertynames(p.source))
    Base.IteratorEltype(::Type{<:PropertyIterator}) = Base.EltypeUnknown()

    # The state carries the property names and the index of the next one to visit.
    function Base.iterate(p::PropertyIterator, (props, i)=(propertynames(p.source), 1))
        i > length(props) && return nothing
        return getproperty(p.source, props[i]), (props, i + 1)
    end

Happy to put this definition in Tables.jl if you think it'll be helpful generally, but I think it's about as efficient as we could expect for the various |
Your proposed solution would be
Ideally, I would like the following (which work for
Or even
In my view, a |
@quinnj Why do you think some row types couldn't implement iteration? That sounds like an essential feature for this kind of object, and falling back to |
I feel like the argument here is essentially the same as "why aren't structs naturally iterable over their fields?" Answer: because they're fields. You access them by name. The core idea of the Tables.jl API is the dual nature of two key interfaces: PropertyAccessible and Iterable. It feels uncomfortable to me, therefore, to consider imposing additional requirements on what @nalimilan, I explicitly wrote the PropertyIterator example above so you would only need to call |
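To make the property-access side of that argument concrete, here is a minimal sketch using only Base Julia. `IncomeRow` and `sum_properties` are hypothetical names invented for this illustration, not part of Tables.jl or DataFrames. The sketch shows a row-like struct that has no iteration defined but can still be reduced generically through `propertynames` and `getproperty`.

```julia
# Hypothetical row type: values are reachable by name, but no iterate method exists.
struct IncomeRow
    wages::Float64
    dividends::Float64
end

# Hypothetical helper: sum every value of a property-accessible object,
# going through the property names instead of iterating the object itself.
sum_properties(row) = sum(getproperty(row, nm) for nm in propertynames(row))

# Works the same for a plain struct and for a NamedTuple.
struct_total = sum_properties(IncomeRow(10.0, 2.5))
tuple_total = sum_properties((wages=10.0, dividends=2.5))
```

The same function accepts both row representations, which is the convenience the requests for iteration are after, at the cost of one extra helper call.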
But for I understand the tension between feature completeness and keeping API requirements low, but IMHO iteration is still worth it. |
I just remembered that we also already provide the

    df.total_income = map(Tables.rows(df[income_sources])) do row
        sum(Tables.eachcolumn(row))
    end
One other thought on this: by requiring |
I just ran into this, and I also think it's weird that iteration over rows isn't supported. I don't think Jacob's solution is "easy enough" for someone who barely understands Tables.jl to start with. Edit: Is the solution really this simple? If so, then it does feel like it should be a part of Tables.jl, because it's essentially syntactic sugar:
instead of
|
So I've been thinking about this more recently and I think what I'm coming around to is requiring Requiring indexing could be a good update to then tag 1.0. Anyone have thoughts/concerns with this plan? |
Edit: I think I'm mangling concepts, nevermind

    Threads.@threads for i in 1:iter.n
        v[i] = somefunction(iter[i, :])
    end
@quinnj: so the proposal is to have two ways for row element iteration: I think that indexing in Julia should ideally be reserved for collections with homogeneous element types (i.e., things that support |
@tpapp IIUC what Jacob said, the idea is to require rows and columns to be indexable (which is generally the case already for most implementations), not tables. |
In #131, the Tables.jl API has been enhanced and clarified so that
While this streamlines and simplifies the interface, it was discussed that it potentially makes things less convenient in casual settings when users want/expect things like indexing, iteration, and property-access to work on rows or columns. With the new 1.0 API, my hope is that it helps clarify things that you just use |
Ok, one more thought I just had on this is that we could perhaps have something like the following defined in Tables.jl:

    struct Row{T} <: Tables.AbstractRow
        x::T
    end
    Tables.getcolumn(x::Row, i::Int) = Tables.getcolumn(x.x, i)
    Tables.getcolumn(x::Row, nm::Symbol) = Tables.getcolumn(x.x, nm)
    Tables.getcolumn(x::Row, ::Type{T}, i::Int, nm::Symbol) where {T} = Tables.getcolumn(x.x, T, i, nm)
    Tables.columnnames(x::Row) = Tables.columnnames(x.x)

which would then allow a user to do something like:

    x = 0.0
    for raw_row in Tables.rows(table)
        row = Tables.Row(raw_row)
        x += sum(row)
    end

i.e. the very simple Do people think it's worth defining something like that? Would it be useful for more casual users in order to not lose convenience? It's not a ton of code and it's not super complicated, so I'm inclined to do it, but I also just wonder if people would actually use it. |
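For readers who want to try the wrapper idea without depending on Tables.jl, the following is a rough Base-only analogue. `IterableRow` and `unwrap` are hypothetical names made up for this sketch, and it assumes the wrapped object only needs to support `propertynames` and `getproperty`. It is not the `Tables.Row` proposed above, just an illustration of how a thin wrapper can add iteration, `length`, and integer indexing on top of property access.

```julia
# Hypothetical wrapper, not part of Tables.jl: adds iteration, length and
# integer indexing to any object that supports propertynames/getproperty.
struct IterableRow{T}
    source::T
end

# getproperty is forwarded below, so use getfield to reach the wrapped object.
unwrap(r::IterableRow) = getfield(r, :source)

# Forward property access to the wrapped row.
Base.propertynames(r::IterableRow) = propertynames(unwrap(r))
Base.getproperty(r::IterableRow, nm::Symbol) = getproperty(unwrap(r), nm)

# Positional access and iteration are built on the property names.
Base.length(r::IterableRow) = length(propertynames(r))
Base.getindex(r::IterableRow, i::Int) = getproperty(r, propertynames(r)[i])
Base.IteratorEltype(::Type{<:IterableRow}) = Base.EltypeUnknown()

function Base.iterate(r::IterableRow, i::Int=1)
    i > length(r) && return nothing
    return r[i], i + 1
end

# Stand-in for the rows of a table: a vector of NamedTuples.
rows = [(a=1, b=2.0), (a=3, b=4.0)]
total = sum(sum(IterableRow(raw_row)) for raw_row in rows)
```

Because the wrapper is opt-in, row types themselves stay free of any iteration requirement, which matches the trade-off discussed in this thread.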
Sounds like it could be useful. And it would be clearer than having |
Just to wrap this issue up for those on this thread: for the 1.0 release, we added a |
It looks like row objects from Tables.rows have different iteration methods. Or, rather, the rows in JuliaDB allow for iteration and the ones in DataFrames do not.

This is despite that DataFrameRows were changed somewhat recently to allow iteration so they behave more like a NamedTuple.

Does Tables care about this? Or is it enough that they both implement Tables.rows and not that they return objects with the same behavior.