ENH: first-class UUID support #63511

@flying-sheep

Description

Feature Type

  • Adding new functionality to pandas
  • Changing existing functionality in pandas
  • Removing existing functionality in pandas

Problem Description

A lot of people want to use UUID arrays, so we should provide first-class support.

If you want to play around with it, I prototyped this functionality here: https://pypi.org/project/pandas-uuid/

Feature Description

I suggest doing something similar to {Arrow,}StringArray, i.e. providing a pair of ExtensionArray types, one backed by numpy and one by pyarrow.

They need few features beyond comparison (self == other / self == elem) and membership tests (self.__contains__(elem)/elem in self and self.isin(other)).
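
To make the required operations concrete, here is a minimal sketch of element comparison and membership on a raw "V16" array. The helper names (to_v16, eq_elem, isin) are hypothetical and not part of pandas or pandas-uuid; a real ExtensionArray would wrap this logic in its own methods. The comparison goes through a uint8 view so that it does not depend on how numpy compares void scalars.

```python
import uuid

import numpy as np


def to_v16(uuids):
    """Pack an iterable of uuid.UUID into a 1-D 'V16' array (hypothetical helper)."""
    return np.frombuffer(b"".join(u.bytes for u in uuids), dtype="V16")


def eq_elem(arr, elem):
    """Element-wise `arr == elem`, done byte-wise on an (n, 16) uint8 view."""
    rows = arr.view(np.uint8).reshape(-1, 16)
    return (rows == np.frombuffer(elem.bytes, dtype=np.uint8)).all(axis=1)


def isin(arr, others):
    """Boolean mask of the entries of `arr` that occur in `others`."""
    wanted = {u.bytes for u in others}
    return np.array([x.tobytes() in wanted for x in arr], dtype=bool)


a, b, c = uuid.UUID(int=1), uuid.UUID(int=2), uuid.UUID(int=3)
arr = to_v16([a, b, a])

print(eq_elem(arr, a))        # self == elem
print(eq_elem(arr, c).any())  # elem in self
print(isin(arr, [b, c]))      # self.isin(other)
```

The set-based isin is the simplest correct version; a production implementation would likely want a vectorized or hashed approach instead.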

The numpy variant needs to be backed by a np.void(16)/"V16" array, since np.bytes_/"S" treats trailing null bytes as padding and strips them, whereas UUIDs, being 128-bit numeric values, assign no special meaning to null bytes.
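
A small illustration of the null-byte problem, assuming only numpy and the standard-library uuid module. The UUID is chosen so that its last byte is zero; the variable names are for illustration only.

```python
import uuid

import numpy as np

# A UUID whose 16-byte representation ends in a null byte.
u = uuid.UUID(int=1 << 8)
raw = u.bytes

# Read the same 16 bytes through both dtypes.
as_s = np.frombuffer(raw, dtype="S16")[0]            # bytes scalar, trailing nulls dropped
as_v = np.frombuffer(raw, dtype="V16")[0].tobytes()  # all 16 bytes preserved

print(len(raw), len(as_s), len(as_v))
print(as_v == raw)
```

With "S16" the value read back is shorter than 16 bytes, so it can no longer be passed to uuid.UUID(bytes=...), while "V16" round-trips the bytes unchanged.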

Alternative Solutions

  • We could use MaskedArray as base for the numpy-backed variant to allow missing values in both cases.
  • We could just force people to rely on pyarrow for this functionality, but I feel that wrapping np.void(16) is simple enough. People didn’t like adding the pyarrow dependency for basic functionality, so I assume you want to keep adding basic features that don’t rely on pyarrow.
  • We could keep the pandas-uuid package around, see here for its limitations.
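
As a rough sketch of the first alternative above, missing values can be tracked with a boolean mask alongside the "V16" data, which keeps the nil UUID (all zero bytes) distinct from a missing entry. The function names pack_with_mask and unpack are hypothetical; this does not use pandas' actual masked-array base classes.

```python
import uuid

import numpy as np


def pack_with_mask(items):
    """Return (values, mask); mask is True where the input item was None."""
    mask = np.array([x is None for x in items], dtype=bool)
    # Missing slots get 16 zero bytes as filler; the mask marks them as missing.
    raw = b"".join(bytes(16) if x is None else x.bytes for x in items)
    return np.frombuffer(raw, dtype="V16"), mask


def unpack(values, mask):
    """Inverse of pack_with_mask: rebuild the list of UUIDs and None."""
    return [
        None if m else uuid.UUID(bytes=v.tobytes())
        for v, m in zip(values, mask)
    ]


items = [uuid.UUID(int=5), None, uuid.UUID(int=0)]
values, mask = pack_with_mask(items)
print(mask)
print(unpack(values, mask) == items)
```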

Additional Context

Labels

  • Enhancement
  • ExtensionArray: Extending pandas with custom dtypes or arrays.
  • Needs Discussion: Requires discussion from core team before further action
