Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: trait isrowtable #134

Closed
tkf opened this issue Feb 6, 2020 · 3 comments
Closed

Feature request: trait isrowtable #134

tkf opened this issue Feb 6, 2020 · 3 comments

Comments

@tkf
Copy link
Contributor

tkf commented Feb 6, 2020

Can we have a new trait isrowtable to declare a certain table type implements "row table" interface? (Previous discussion: JuliaData/TypedTables.jl#57 cc @andyferris)

The motivation is for implementing append!(dest, src) and alike for table dest with column-oriented memory layout and row-wise container semantics (e.g., TypedTables.jl, StructArrays.jl). Ideally:

  1. If src is a table but not a container of rows (e.g., NamedTuple of Vectors), it is not desirable for append!(dest, src) to append rows. For example, append!([(a=1,)], (a=[1],)) should throw.
  2. If src is a row table but has column-oriented memory layout, it is desirable to append elements column-by-column as an optimization.

It is tempting to use Tabels.columnaccess(src) to satisfy point 2. However, Tabels.columnaccess(src) does not imply that src has rows as elements. So, it might violate point 1.

If we have isrowtable, we can satisfy those two points by

function append!(dest::MyContainer, src)
    if Tables.isrowtable(src) && Tables.columnaccess(src)
        return append_column_by_column!(dest, src)
    else
        return append_row_by_row!(dest, src)
    end
end

Suggested specification

A type RowTable satisfies the row table interface if:

  • Tables.rows(x::RowTable) === x
  • eltype(::RowTable) is a row (not just eltype(Tables.rows(::RowTable)))
  • If x :: RowTable is a mutable container:
    • push!(x, row) must work for any compatible row.
    • append!(x, rows) must work for compatible row table rows (i.e., Tables.isrowtable(rows)).
  • If x :: RowTable is a mutable and indexable:
    • x[i] = row must work for any compatible row.

By x and y are compatible I mean that row or table x and y satisfy Set(Tables.columnnames(x)) == Set(Tables.columnnames(y)) and the types are promotable.

Default implementation and derived interfaces

I suggest to add isrowtable with the default definition

isrowtable(::Type{T}) where {T} = false
isrowtable(::T) where {T} = isrowtable(T)

The default definition of other parts of Tables.jl can be derived from it

istable(::Type{T}) where {T} = isrowtable(T)
rowaccess(::Type{T}) where {T} = isrowtable(T)

function rows(x::T) where {T}
    isrowtable(x) && return x
    ...
end

This way, row tables like TypedTables.jl and StructArrays.jl can just declare

isrowtable(::Type{<:MyTable}) = true

and other interfaces would be derived from it.

@quinnj
Copy link
Member

quinnj commented Feb 8, 2020

Ok, thinking about this, I think we could add this for 1.0. The part I'm most worried about is if x :: RowTable is a mutable container:. I'm trying to imagine the use-case you mentioned of using this, and feeling like it would be annoying if the container wasn't mutable, yet was isrowtable and trying to do append!(rows, rows2) would just fail? It also feels a little off to have explicit usage of Tables.columnaccess, since that has usually indicated a code smell of sorts. But then again, I don't immediately have any other ideas on how you'd implement a generic append for tables.

Along those same lines, I wonder if it would make sense for us to have an official Tables.cat for 1.0 and how we'd implement it.

@tkf
Copy link
Contributor Author

tkf commented Feb 8, 2020

it would be annoying if the container wasn't mutable, yet was isrowtable and trying to do append!(rows, rows2) would just fail?

Do you mind explaining why it's not a desirable behavior? Isn't it like how append!(::SVector, ::Vector) fails?

explicit usage of Tables.columnaccess

AFAICT this is the only trait function out there for figuring out the memory layout of a table. I'm personally seeing it as something like Base.IndexStyle trait; you don't use it for everyday programming but it's occasionally very useful in library code.

an official Tables.cat for 1.0

Yes, I think this would be great. I think it'd make sense to add Tables.append! and Tables.push! as well. Those functions would always act as if rows are the elements, independent of the container semantics of the input tables.

@quinnj
Copy link
Member

quinnj commented Feb 8, 2020

@tkf, would you mind doing a PR for the isrowtable? I think it should be fairly minimal, and we can just add a note about it in the "Implementing" section of the docs, in case it's more convenient when implementing. We can add Tables.cat/append/push later since they're utility functions and not part of the interface definitions.

If you don't think you can get to isrowtable today, just let me know and I can do the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants