This is something I have discussed elsewhere as being useful, and it came up today on Slack.
Problem
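using Statistics  # mean and std

function standardize(x)
    mu = mean(x)
    sigma = std(x)
    return (x .- mu) ./ sigma
end

# mean and std propagate missing, so every element of the result is missing
standardize([1, 2, 3, missing])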
We can't use passmissing because this isn't an element-wise operation. We need an operation that applies skipmissing to the input, applies the function to what remains, and then "spreads" the function's result back into a vector of the same length as the input, leaving missing wherever the input was missing.
using Missings  # provides skipmissings

function skipmissing_then_collect(fun, args...)
    # Jointly skip every position where any argument is missing
    smargs = skipmissings(args...)
    res = fun(smargs...)
    # Assumes all args are vectors of the same length
    out = Union{eltype(res), Missing}[missing for i in 1:length(first(args))]
    res_counter = 1
    for i in eachindex(first(smargs))
        out[i] = res[res_counter] # can probably do fancy iteration stuff here
        res_counter += 1
    end
    out
end
This might solve a lot of problems in DataFrames as well.
Great, let us discuss it here. It would be much better to have a generic solution (if we can come up with a good recipe) than a custom patch in DataFrames.jl.
As noted when discussing this in DataFrames (JuliaData/DataFrames.jl#2258 (comment)), it could be better (when inputs are vectors) to first find the indices of complete observations, pass a SubArray view of the complete observations to the user-provided function, and use these indices to fill the returned vector. That would avoid going over the data twice to identify missing values, and that would be simpler for users since the function would be passed AbstractVectors rather than SkipMissings iterators.
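To make the suggestion concrete, here is a minimal sketch of the index-based approach, assuming all inputs are 1-based vectors of equal length and that the user function returns one value per complete observation. The name `apply_to_complete` is hypothetical and is not part of Missings.jl or DataFrames.jl.

```julia
using Statistics

# Hypothetical helper, for illustration only.
function apply_to_complete(fun, args::AbstractVector...)
    n = length(first(args))
    # eachindex throws a DimensionMismatch if the inputs differ in length.
    # A single pass collects the indices where no input is missing.
    complete = [i for i in eachindex(args...) if all(a -> !ismissing(a[i]), args)]
    # Pass SubArray views of the complete observations to the user function.
    res = fun(map(a -> view(a, complete), args)...)
    length(res) == length(complete) ||
        throw(DimensionMismatch("fun must return one value per complete observation"))
    # Fill the results back in; every other position stays missing.
    out = Vector{Union{eltype(res), Missing}}(missing, n)
    out[complete] = res
    return out
end

standardize(x) = (x .- mean(x)) ./ std(x)

apply_to_complete(standardize, [1, 2, 3, missing])
# → [-1.0, 0.0, 1.0, missing]
```

The views still have an element type that allows `Missing`, even though they contain no missing values, so a function that dispatches on the element type may need `disallowmissing` or a similar conversion first.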