Constructing categorical vector of pure "missing" with prescribed levels #362

Open
ablaom opened this issue Aug 5, 2021 · 3 comments

@ablaom

ablaom commented Aug 5, 2021

It would be nice if the following worked, instead of throwing an error:

categorical(fill(missing, 4), levels=["inlier", "outlier"], ordered=true) 

Edit: corrected, as pointed out in the next comment.

My use case is a semi-supervised machine learning model (an outlier detector). The features are paired with labels indicating outlier/inlier, and some or all of the labels may be missing. Even when all labels are missing (the unsupervised case), I still want to extract the levels from the categorical vector, so that I know which labels the user prefers.

@bkamins
Member

bkamins commented Aug 5, 2021

Use a vector with the proper eltype (I guess there is a typo in your code and you meant fill(missing, 4), not fill("missing", 4)):

using Missings
categorical(missings(String, 4), levels=["inlier", "outlier"], ordered=true)

@ablaom
Author

ablaom commented Aug 5, 2021

Thanks! I didn't know about the missings method.

I still think the request has merit, but feel free to close.

@nalimilan
Member

It could make sense to take the eltype of levels into account when the eltype of the input is Missing. We could even always call promote_type on both the array eltype and the levels eltype. Hopefully this only requires adapting a handful of constructors. Feel free to make a PR.
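A minimal sketch of the promote_type idea, written in plain Base Julia rather than inside CategoricalArrays.jl. The helper name `promote_for_levels` is hypothetical and not part of any package. It only shows how widening the input's eltype to `promote_type(eltype(x), eltype(levels))` turns a `Vector{Missing}` into a vector that can also hold the level values.

```julia
# Hypothetical helper (NOT part of CategoricalArrays.jl): widen the element
# type of `x` so it can also hold values of the same type as `levels`.
function promote_for_levels(x::AbstractArray, levels::AbstractVector)
    # e.g. promote_type(Missing, String) == Union{Missing, String}
    T = promote_type(eltype(x), eltype(levels))
    # Copies into a new array with eltype T (returns `x` itself if already T)
    return convert(AbstractArray{T}, x)
end

lv = ["inlier", "outlier"]
x = fill(missing, 4)              # eltype(x) == Missing
y = promote_for_levels(x, lv)     # eltype(y) == Union{Missing, String}
```

With CategoricalArrays.jl loaded, `categorical(y, levels=lv, ordered=true)` should then behave like the `missings(String, 4)` workaround above. A constructor-level fix would presumably apply the same promotion internally before building the pool.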
