Make output of label-based indexing (e.g. df.loc) jittable #1

@ehariri

Series.loc and DataFrame.loc are not currently jittable / type-stable because the output type is not consistent for a given set of input types. In particular, the output for a single label depends on whether that label matches one entry or more than one. Here is an example:

>>> df = pd.DataFrame({"A": [1, 2, 3, 1], "B": 2}, index=[0, 1, 1, 2])

>>> df.loc[1, :]
   A  B
1  2  2
1  3  2

>>> df.loc[0, :]
A    1
B    2
Name: 0, dtype: int64

>>> df.loc[0, "A"]
1

>>> df.loc[1, "A"]
1    2
1    3
Name: A, dtype: int64

The same behavior occurs with Series.loc.
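A minimal illustration of the Series case, assuming a made-up Series `s` (not taken from the example above) whose index contains the label 1 twice. The same call shape, `s.loc[<int>]`, yields a scalar for one label and a Series for another:

```python
import pandas as pd

# Illustrative data only: label 1 appears twice in the index.
s = pd.Series([10, 20, 30, 40], index=[0, 1, 1, 2])

unique_hit = s.loc[0]      # label 0 matches one entry  -> scalar
duplicate_hit = s.loc[1]   # label 1 matches two entries -> Series

# Same input types (Series, int), different output types.
print(isinstance(unique_hit, pd.Series))     # False
print(isinstance(duplicate_hit, pd.Series))  # True
```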

We are interested in helping to provide support for the Pandas library via JIT compilation. To do this, we require each operation to be jittable. The current label-based indexing approach (e.g. loc, Series getitem) is not jittable because it is not type stable: for the same input types there can be multiple divergent output types. In this case we have two possible return types (i.e. Series and scalar) for a single input type.

To make this type stable, Series.loc should always return a Series, even with a single element. If people really need the scalar value, there should be a separate API, e.g. loc_elem, available to them.
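As a rough user-level sketch of the proposed behavior (not a pandas API and not the suggested implementation), the two helpers below approximate it with existing calls. `loc_stable` is a hypothetical name, and the error handling in `loc_elem` is an assumption, since the proposal only names that function. The sketch relies on the fact that passing a list of labels to `.loc` returns a Series regardless of how many entries match:

```python
import pandas as pd


def loc_stable(s: pd.Series, label) -> pd.Series:
    """Hypothetical helper: single-label lookup that always returns a Series."""
    # Wrapping the label in a list selects the list-like path of .loc,
    # so the result is a Series whether one or several entries match.
    return s.loc[[label]]


def loc_elem(s: pd.Series, label):
    """Hypothetical scalar accessor: requires exactly one matching entry."""
    matches = s.loc[[label]]
    if len(matches) != 1:
        # Assumed behavior: refuse ambiguous lookups instead of changing type.
        raise ValueError(f"label {label!r} matches {len(matches)} entries")
    return matches.iloc[0]


# Illustrative data only: label 1 appears twice in the index.
s = pd.Series([10, 20, 30, 40], index=[0, 1, 1, 2])

single = loc_stable(s, 0)   # Series with one element
multi = loc_stable(s, 1)    # Series with two elements
value = loc_elem(s, 0)      # scalar
```

With this split, the return type depends only on which function is called, not on how many entries in the data carry the label.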
