LabeledArray and CategoricalArray #4
Comments
Hi @nalimilan! I am glad to see you here, and thank you for your work on … The core idea behind … In some sense, what I did is simply let users decide for themselves how they want to proceed. I feel that this adds flexibility without introducing the complexity of handling every hard-to-imagine usage scenario.
@junyuan-chen, to help me better understand your design: what is the value added of …
Hi, @bkamins! That's a great question. You are correct that …

julia> lv = LabeledValue(1,Dict(1=>"a"))
1 => a

julia> lv==1
true

julia> lv=="a"
true

This is really just some syntactic sugar. When I initially thought about it, the whole point was to keep things simple so that there is no need to type more every time such a comparison is needed.
This was exactly the reason for my question. The point is that with this syntactic sugar the transitivity of … I understand that this is quite convenient in interactive use, but my fear was that some production code could assume transitivity of … What do you think about it? My question is guided by the following reasoning:
Thank you, @bkamins, for your comment! I really appreciate your effort. Yes, I agree that the behavior of …
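The transitivity concern can be reproduced with a minimal sketch. It assumes a hypothetical `Labeled` type, not the package's actual `LabeledValue`, with `==` defined so that a labeled value compares equal both to its underlying code and to its label:

```julia
# Hypothetical stand-in for a labeled value; this is NOT the package's
# actual `LabeledValue` type, just enough to reproduce the comparison sugar.
struct Labeled{T}
    value::T
    labels::Dict{T,String}
end

# Sugar, part 1: a labeled value compares equal to its underlying value...
Base.:(==)(x::Labeled, y) = x.value == y
# ...part 2: and also to its label (falling back to `string` if unlabeled).
Base.:(==)(x::Labeled, y::AbstractString) =
    get(x.labels, x.value, string(x.value)) == y

lv = Labeled(1, Dict(1 => "a"))
lv == 1     # true
lv == "a"   # true
1 == "a"    # false, so `==` is not transitive through `lv`
```

In this sketch, `1 == lv` also falls back to Julia's generic `===`-based `==` and returns `false`, so symmetry breaks as well unless mirrored methods are added.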
Thanks for the quick reply! Indeed, it would be safer to avoid defining …
I see the goal, though in practice what difference does it make compared with a …
You are right. If the original values are retained, then …
OK, great. Then I'll try to add that feature in a CategoricalArrays branch, and I'll let you know so we can see whether it would fit ReadStatTables's needs.
Awesome! @nalimilan
I am closing this issue due to the changes made here. The transitivity of …
Great!
I've just discovered this package; it's really cool! I was planning to implement something like this myself.
Something I've been wondering about for some time, based on my experience in R, is the relationship between `LabeledArray` and `CategoricalArray` (or `labelled` and `factor` in R). As I see it, value labels are just the way Stata, SAS and SPSS deal with categorical variables. Unfortunately, that's a leaky abstraction which doesn't really allow knowing whether a variable is supposed to be categorical or continuous (see e.g. this discussion). In R, I find the divide between `labelled` vectors and factors annoying, as it creates a schism which makes data handling even more complex than it needs to be. Maybe we can do better in Julia thanks to its powerful type system and by learning from previous experiences (since, unlike in R and Stata, the fundamental design isn't set in stone yet)?

Of course, `CategoricalArray` currently isn't able to store variables with value labels, as it only allows consecutive reference codes starting from 1, so it loses the underlying values if they do not fit that scheme. But it could store an additional field giving the mapping from each level to a custom code. Do you think this would allow merging `LabeledArray` and `CategoricalArray`?

I know it's possible in Stata to assign value labels to only some values. This does not seem too problematic, as a level could be generated automatically by calling `string` on the value. I've also read that some variables may have labels attached to some values even though they are truly continuous, but I couldn't find examples of this. Probably value labels are used for continuous variables only to attach labels to missing values, which makes more sense than attaching labels to arbitrary numeric values. Maybe Likert scales are an intermediate case which can be considered continuous under some assumptions, but treating them as categorical by default sounds safer.

Thanks in advance for your feedback! My experience designing CategoricalArrays is that it's super hard to get right (and issues keep being spotted from time to time), so I figure we'd better join forces if that's possible.
Cc: @bkamins, @pdeffebach