Open
Labels
Enhancement, ExtensionArray (Extending pandas with custom dtypes or arrays.), Needs Discussion (Requires discussion from core team before further action)
Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
A lot of people want to use UUID arrays, so we should provide first-class support.
If you want to play around with it, I prototyped this functionality here: https://pypi.org/project/pandas-uuid/
Feature Description
I suggest doing something similar to {Arrow,}StringArray, i.e. having a pair of ExtensionArray types, one backed by numpy and one by pyarrow.
They don’t need a lot of features except for comparison (self == other / self == elem) and membership tests (self.__contains__(elem)/elem in self and self.isin(other)).
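The operations listed above can be sketched on top of plain numpy. This is only an illustration, assuming the data is stored as a contiguous "V16" array; the helper names `to_v16`, `eq_elem`, `contains` and `isin` are hypothetical and not part of pandas or pandas-uuid.

```python
import uuid
import numpy as np

def to_v16(uuids):
    """Pack UUIDs into a contiguous 'V16' array (hypothetical helper)."""
    return np.frombuffer(b"".join(u.bytes for u in uuids), dtype="V16")

def eq_elem(arr, elem):
    """Element-wise `arr == elem`, done by comparing the raw 16 bytes."""
    lhs = arr.view(np.uint8).reshape(-1, 16)
    rhs = np.frombuffer(elem.bytes, dtype=np.uint8)
    return (lhs == rhs).all(axis=1)

def contains(arr, elem):
    """Membership test, i.e. `elem in arr`."""
    return bool(eq_elem(arr, elem).any())

def isin(arr, others):
    """Boolean mask of the elements of `arr` that occur in `others`."""
    wanted = {u.bytes for u in others}
    return np.array([item.tobytes() in wanted for item in arr], dtype=bool)

a, b, c = uuid.UUID(int=1), uuid.UUID(int=2), uuid.UUID(int=3)
arr = to_v16([a, b, a])
print(eq_elem(arr, a))
print(contains(arr, c))
print(isin(arr, [b, c]))
```

A real ExtensionArray would expose these through `__eq__`, `__contains__` and `isin`; the byte-wise comparison here is just one possible way to implement them.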
The numpy variant needs to be backed by a np.void(16)/"V16" array, since np.bytes_/"S" has special treatment of null bytes (trailing null bytes are stripped when an element is read back), whereas numeric types (like UUIDs) assign no special meaning to them.
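A small check makes the difference between the two dtypes concrete. This is a sketch with an arbitrarily chosen UUID whose byte representation ends in null bytes; the variable names are illustrative only.

```python
import uuid
import numpy as np

# A UUID whose 16-byte representation ends in four null bytes.
u = uuid.UUID("12345678-1234-5678-1234-567800000000")
raw = u.bytes

as_bytes = np.array([raw], dtype="S16")
as_void = np.frombuffer(raw, dtype="V16")

# "S" drops trailing null bytes when an element is read back,
# so the value no longer round-trips as 16 bytes.
print(len(as_bytes[0]))

# "V16" returns all 16 bytes unchanged.
print(as_void[0].tobytes() == raw)
print(uuid.UUID(bytes=as_void[0].tobytes()) == u)
```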
Alternative Solutions
- We could use MaskedArray as base for the numpy-backed variant to allow missing values in both cases.
- We could just force people to rely on pyarrow for this functionality, but I feel that wrapping np.void(16) is simple enough. People didn’t like adding the pyarrow dependency for basic functionality, so I assume you want to keep adding basic features that don’t rely on pyarrow.
- We could keep the pandas-uuid package around, see here for its limitations.
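To illustrate the first alternative: a masked layout keeps the "V16" data and a boolean mask side by side, with the mask marking missing entries. The sketch below uses plain numpy and does not touch pandas' internal MaskedArray classes; `pack_masked` and `unpack_masked` are hypothetical helper names.

```python
import uuid
import numpy as np

def pack_masked(values):
    """Return (data, mask); masked slots hold 16 zero bytes as filler."""
    mask = np.array([v is None for v in values], dtype=bool)
    buf = b"".join(bytes(16) if v is None else v.bytes for v in values)
    data = np.frombuffer(buf, dtype="V16")
    return data, mask

def unpack_masked(data, mask):
    """Convert back to a list of uuid.UUID objects and None."""
    return [
        None if missing else uuid.UUID(bytes=item.tobytes())
        for item, missing in zip(data, mask)
    ]

values = [uuid.UUID(int=1), None, uuid.UUID(int=3)]
data, mask = pack_masked(values)
print(mask)
print(unpack_masked(data, mask) == values)
```

The filler bytes in masked slots are never interpreted, so the all-zero (nil) UUID stays distinguishable from a missing value.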
Additional Context
- Turns out the following issue isn’t actually a blocker if we have the ExtensionDtype report its kind to be "O": BUG: Series constructor incorrectly assumes that any dtype with .kind == 'V...' is a “compound dtype” #54810
- We should test that things like this work by default: BUG: DataFrame to JSON failed when it with UUID #59132
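For the first point, a dtype reporting kind "O" could look roughly like this. It is a bare sketch: the class name `UuidDtype` and its `name` are assumptions, and a usable dtype would also need `construct_array_type` and a matching ExtensionArray.

```python
import uuid
from pandas.api.extensions import ExtensionDtype

class UuidDtype(ExtensionDtype):
    """Hypothetical dtype for UUID data stored in a 'V16' array."""

    name = "uuid"      # assumed dtype name
    type = uuid.UUID   # scalar type of the elements
    # Report "O" rather than "V", so that code checking `.kind` does not
    # mistake the dtype for a compound dtype.
    kind = "O"

dtype = UuidDtype()
print(dtype.kind)
```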