Error joining to distributed NDSparse #308

Open
grahamas opened this issue Oct 16, 2019 · 2 comments

Comments

@grahamas

using Distributed  # provides @everywhere
@everywhere using JuliaDB

indicesA = (S=[0.6, 0.7], T=[1,2.0])
indicesB = (S=[1.6, 1.7], T=[2,5.0])
valsAscalar = (u=[1, 2], t=[2, 3])
valsBscalar = (u=[30, 50], t=[4, 5])

Ascalar = ndsparse(indicesA, valsAscalar)
Bscalar = ndsparse(indicesB, valsBscalar)
ddb = distribute(Ascalar, 1)
Cscalar = join(ddb, Bscalar)

gives

ERROR: MethodError: Cannot `convert` an object of type Nullables.Nullable{Array{Pair{Dagger.OSProc,Int64},1}} to an object of type Array{Pair{Dagger.OSProc,Int64},1}
Closest candidates are:
  convert(::Type{Array{S,N}}, ::DataValues.DataValueArray{T,N}) where {S, T, N} at /home/grahams/.julia/packages/DataValues/XQWvG/src/array/primitives.jl:272
  convert(::Type{Array{S,N}}, ::DataValues.DataValueArray{T,N}, ::Any) where {S, T, N} at /home/grahams/.julia/packages/DataValues/XQWvG/src/array/primitives.jl:301
  convert(::Type{Array{S,N}}, ::PooledArrays.PooledArray{T,R,N,RA} where RA) where {S, T, R, N} at /home/grahams/.julia/packages/PooledArrays/ufJSl/src/PooledArrays.jl:288
  ...
Stacktrace:
 [1] convert(::Type{Union{Nothing, Array{Pair{Dagger.OSProc,Int64},1}}}, ::Nullables.Nullable{Array{Pair{Dagger.OSProc,Int64},1}}) at ./some.jl:34
 [2] setproperty!(::Dagger.Thunk, ::Symbol, ::Nullables.Nullable{Array{Pair{Dagger.OSProc,Int64},1}}) at ./Base.jl:21
 [3] #join#267(::Symbol, ::Tuple{Symbol,Symbol}, ::Tuple{Symbol,Symbol}, ::Tuple{Symbol,Symbol}, ::Tuple{Symbol,Symbol}, ::Symbol, ::Int64, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(join), ::typeof(IndexedTables.concat_tup), ::JuliaDB.DNDSparse{NamedTuple{(:S, :T),Tuple{Float64,Float64}},NamedTuple{(:u, :t),Tuple{Int64,Int64}}}, ::JuliaDB.DNDSparse{NamedTuple{(:S, :T),Tuple{Float64,Float64}},NamedTuple{(:u, :t),Tuple{Int64,Int64}}}) at /home/grahams/.julia/packages/JuliaDB/jDAlJ/src/join.jl:54
 [4] (::Base.var"#kw##join")(::NamedTuple{(:broadcast, :how),Tuple{Symbol,Symbol}}, ::typeof(join), ::Function, ::JuliaDB.DNDSparse{NamedTuple{(:S, :T),Tuple{Float64,Float64}},NamedTuple{(:u, :t),Tuple{Int64,Int64}}}, ::JuliaDB.DNDSparse{NamedTuple{(:S, :T),Tuple{Float64,Float64}},NamedTuple{(:u, :t),Tuple{Int64,Int64}}}) at ./none:0
 [5] #join#278(::Symbol, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(join), ::Function, ::JuliaDB.DNDSparse{NamedTuple{(:S, :T),Tuple{Float64,Float64}},NamedTuple{(:u, :t),Tuple{Int64,Int64}}}, ::NDSparse{NamedTuple{(:u, :t),Tuple{Int64,Int64}},Tuple{Float64,Float64},StructArrays.StructArray{NamedTuple{(:S, :T),Tuple{Float64,Float64}},1,NamedTuple{(:S, :T),Tuple{Array{Float64,1},Array{Float64,1}}},Int64},StructArrays.StructArray{NamedTuple{(:u, :t),Tuple{Int64,Int64}},1,NamedTuple{(:u, :t),Tuple{Array{Int64,1},Array{Int64,1}}},Int64}}) at /home/grahams/.julia/packages/JuliaDB/jDAlJ/src/join.jl:130
 [6] (::Base.var"#kw##join")(::NamedTuple{(:how,),Tuple{Symbol}}, ::typeof(join), ::Function, ::JuliaDB.DNDSparse{NamedTuple{(:S, :T),Tuple{Float64,Float64}},NamedTuple{(:u, :t),Tuple{Int64,Int64}}}, ::NDSparse{NamedTuple{(:u, :t),Tuple{Int64,Int64}},Tuple{Float64,Float64},StructArrays.StructArray{NamedTuple{(:S, :T),Tuple{Float64,Float64}},1,NamedTuple{(:S, :T),Tuple{Array{Float64,1},Array{Float64,1}}},Int64},StructArrays.StructArray{NamedTuple{(:u, :t),Tuple{Int64,Int64}},1,NamedTuple{(:u, :t),Tuple{Array{Int64,1},Array{Int64,1}}},Int64}}) at ./none:0
 [7] #join#273 at /home/grahams/.julia/packages/JuliaDB/jDAlJ/src/join.jl:117 [inlined]
 [8] join(::JuliaDB.DNDSparse{NamedTuple{(:S, :T),Tuple{Float64,Float64}},NamedTuple{(:u, :t),Tuple{Int64,Int64}}}, ::NDSparse{NamedTuple{(:u, :t),Tuple{Int64,Int64}},Tuple{Float64,Float64},StructArrays.StructArray{NamedTuple{(:S, :T),Tuple{Float64,Float64}},1,NamedTuple{(:S, :T),Tuple{Array{Float64,1},Array{Float64,1}}},Int64},StructArrays.StructArray{NamedTuple{(:u, :t),Tuple{Int64,Int64}},1,NamedTuple{(:u, :t),Tuple{Array{Int64,1},Array{Int64,1}}},Int64}}) at /home/grahams/.julia/packages/JuliaDB/jDAlJ/src/join.jl:116
 [9] top-level scope at REPL[21]:1

I tried the naive fix of defining Base.convert(::Type{T}, x::Nullable{T}) where T = x.value, but then the join returns an empty table.

zgornel commented Oct 25, 2019

Two issues:

  • there seems to be a bug in the inner join for DNDSparse; the NDSparse join works, which you can check with join(Ascalar, Bscalar)
  • for a join, the data columns should ideally have different names in the two ndsparse tables, unless you want the data values of Ascalar and Bscalar to replace one another, in which case the current names are correct

The example above works for join(ddb, Bscalar; how=:outer)
EDIT: after your fix, an empty table would be the correct result.
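
To see why an empty result is expected for the inner join, here is a minimal sketch in plain Julia. It does not use JuliaDB and is not how JuliaDB implements join; it only compares the (S, T) keys from the example, with `A`, `B`, `inner_keys` and `outer_keys` being names made up for this illustration.

```julia
# Keys are the (S, T) index pairs and values are the (u, t) data
# from the original report, stored in plain Dicts for illustration only.
A = Dict((0.6, 1.0) => (u = 1, t = 2),
         (0.7, 2.0) => (u = 2, t = 3))
B = Dict((1.6, 2.0) => (u = 30, t = 4),
         (1.7, 5.0) => (u = 50, t = 5))

# An inner join keeps only keys present in both tables.
inner_keys = intersect(Set(keys(A)), Set(keys(B)))

# An outer join keeps every key present in either table.
outer_keys = union(Set(keys(A)), Set(keys(B)))

println("inner join rows: ", length(inner_keys))
println("outer join rows: ", length(outer_keys))
```

No (S, T) pair occurs in both tables, so the inner join has nothing to keep, while the outer join keeps all four rows.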

grahamas commented Nov 21, 2019

@zgornel Thank you! I can't believe I didn't notice I was doing the wrong kind of join.

I do get the correct result with an outer join, with one caveat: no matter how many tables I join to the original distributed table, the number of chunks stays the same. I'm not sure what the default behavior should be, so this makes sense even though I didn't expect it. However, I don't know how to join to a distributed table and increase the number of chunks, which matters in my use case because each joined table individually approaches the memory limit of my machine. Does anyone know how to do this? Does it warrant its own issue? At the very least, I would expect this to be documented, and I'm happy to write that documentation once I understand how to do it.

I'll leave this issue open because of the remaining problem: join throws an uninformative error when it should return an empty distributed table.
