-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How to compare to value? #363
Comments
is there any reason not to implement: function >(v::CategoricalValue{T,R},y::String) where {T,R}
catv_y = CategoricalArrays.pool(v)[get(CategoricalArrays.pool(v), y)]
v > catv_y
end
function <(v::CategoricalValue{T,R},y::String) where {T,R}
catv_y = CategoricalArrays.pool(v)[get(CategoricalArrays.pool(v), y)]
v < catv_y
end
function ==(v::CategoricalValue{T,R},y::String) where {T,R}
catv_y = CategoricalArrays.pool(v)[get(CategoricalArrays.pool(v), y)]
v < catv_y
end |
See #346 and use:
|
@bkamins thanks! That’s a bit more ergonomic but still fairly unwieldy—is there a reason to not do |
As explained in linked #346 doing it automatically leads to non transitivity of An example from R:
so should imply that
The problem is that, as opposed to R Julia exposes |
Hm, I'm not sure I agree--I think R is principled for the example you give. There are two categories-- a) co-product of String and levels=c("1", "2"), and b) co-product of String and levels=c("2,"1"). Transitivity only holds within a category. Edit: to clarify, I think it is also principled to only allow comparison within the same categorical pool. I just wanted to point out that from a category theory standpoint, |
Consider the following in Julia (this is not doable in R):
And now if you called:
under your rules you would get
Also if you called Now you might ask if things like Having said that, I keep the issue open as when doing the current design we were aware that it is inconvenient. However - design wise - it is better to be restrictive (as we are now) and then consider loosening the rules as this approach is non breaking. Let us discuss here the pros and cons. |
Thanks for the thoughtful discussion! I’ll reply to your comment here to consolidate. I completely agree with you on However, note that
As such, I think the contract in Base explicitly endorses |
This is an excellent point. Indeed the benefit of having #346 was implemented to support @nalimilan - what do you think? |
I'm glad to hear that some people would want to use That said, AFAICT the docstring for BTW, R is well-known for being full of dangerous corner cases (in particular in operations involving factors), so I wouldn't take it as a good reference for this kind of thing. |
@ablaom - could you please cross check/comment how breaking transitivity of |
I’m still a bit puzzled by the comments that such a change could break transitivity. Transitivity is maintained, it’s just now a partial order: if edit: I suppose our confusion arises from how we differ in our consideration of oops and yes, meant to use |
The transitivity issue that I'm referring to is this: julia> x = categorical(["a", "b", "c"], levels=["c", "b", "a"], ordered=true)
3-element CategoricalArray{String,1,UInt32}:
"a"
"b"
"c"
julia> x[3] < "b" && "b" < x[1]
true
julia> x[3] == "c" && x[1] == "a"
true
julia> "c" < "a"
false |
Ah I was editing my comment right as you posted that. Multiple dispatch is hiding that there are multiple methods at play there, transitivity is not broken for an individual method. So a question becomes, is transitivity supposed to be maintained at a method level or a function level? I think method level makes more sense—these are different relations represented by same symbol. We could rewrite to make this point more explicit that transitivity is not implied since these are different relations
|
I guess you meant to use different levels between the two arrays? If so yes, the decision to avoid making levels part of the type is due to performance (compilation cost) considerations, but also on practical considerations (you wouldn't be able to add, remove or reorder levels without first creating a new array).
Type theory is nice and all, but in practical terms what matters whether we think that people might use |
In Julia, all methods defined for a function are supposed to follow the same API contract. Otherwise it's impossible to write generic code reliably. Of course we can make some minor exceptions to this rule, but that requires careful consideration. Otherwise, we could perfectly introduce a different function and/or operator. |
Transitivity is broken on current implementation (although maybe a bug?) julia> x = categorical(["a", "b", "c"], levels=["c", "b", "a"], ordered=true)
julia> y = categorical(["a", "b", "c"], levels=["a","b","c"], ordered=true);
julia> x[1] == y[1] # y1 >= x1
true
julia> y[2] > y[1] # y2 >= y1
true
julia> y[2] > x[1] # y2 >= x1
ERROR: ArgumentError: CategoricalValue objects from unequal pools cannot be tested for order
Stacktrace:
[1] <(x::CategoricalValue{String, UInt32}, y::CategoricalValue{String, UInt32})
@ CategoricalArrays ~/.julia/packages/CategoricalArrays/rDwMt/src/value.jl:172
[2] >(x::CategoricalValue{String, UInt32}, y::CategoricalValue{String, UInt32})
@ Base ./operators.jl:305
[3] top-level scope
@ REPL[6]:1 |
Well yes, but at least it throws an error, so it's much less dangerous than silently returning a misleading result. There's no good solution here anyway: should |
I agree that the current implementation is reasonable! I'm more just pointing out the folly of aiming for transitivity for the comparison functions, across all methods. It's not currently the case in Julia nor is it reasonable to expect, imho. here's a silent "error" julia> x[1] == "a"
true
julia> x[2] == "b"
true
julia> x[2] < x[1]
true
julia> "b" < "a"
false |
Note that the full error message is relatively explicit after fixing the typo at #366: julia> x .> "Young"
ERROR: ArgumentError: cannot compare a `CategoricalValue` to value `v` of type `String`: wrap `v` using `CategoricalValue(v, catvalue)` or `CategoricalValue(v, catarray)` first |
I'm not sure how to do
x .> "Young"
Is there a way to makeYoung
aCategoricalValue
?The text was updated successfully, but these errors were encountered: