Description
Series.loc and DataFrame.loc are not currently jittable (type-stable) because their output type is not determined by the input types alone. In particular, the result of indexing with a single label depends on whether that label occurs once or more than once in the index. For example:
```python
>>> df = pd.DataFrame({"A": [1, 2, 3, 1], "B": 2}, index=[0, 1, 1, 2])
>>> df.loc[1, :]
   A  B
1  2  2
1  3  2
>>> df.loc[0, :]
A    1
B    2
Name: 0, dtype: int64
>>> df.loc[0, "A"]
1
>>> df.loc[1, "A"]
1    2
1    3
Name: A, dtype: int64
```

The same problem also exists with Series.loc.
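A minimal sketch that makes the instability explicit by checking result types instead of reading printed output. It reuses the DataFrame from the session; the variable names are illustrative only.

```python
import pandas as pd

# Label 1 is duplicated in the index, label 0 is not.
df = pd.DataFrame({"A": [1, 2, 3, 1], "B": 2}, index=[0, 1, 1, 2])

# Both calls pass the same argument types (an int label and a str column),
# yet the result type depends on how many rows carry the label.
unique_result = df.loc[0, "A"]     # a NumPy integer scalar
duplicate_result = df.loc[1, "A"]  # a Series with two entries

# Series.loc behaves the same way.
s = df["A"]
series_unique = s.loc[0]      # scalar
series_duplicate = s.loc[1]   # Series

print(type(unique_result).__name__, type(duplicate_result).__name__)
print(type(series_unique).__name__, type(series_duplicate).__name__)
```

A JIT compiler has to pick one output type for a given set of input types, and here the correct choice is only known after looking at the index contents at run time.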
We are interested in helping to support the Pandas library through JIT compilation. This requires every operation to be jittable. The current label-based indexing approach (e.g. loc and Series getitem) is not jittable because it is not type stable: for the same input types it can produce different output types. In this case, a single input type maps to two possible return types (a Series or a scalar).
To make this type stable, Series.loc should always return a Series, even when only one element matches the label. Users who really need the scalar value could use a separate API, e.g. loc_elem.
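A rough sketch of the proposed split, written as plain helper functions on top of existing pandas behaviour. Both loc_series and loc_elem are hypothetical names used only for illustration; they are not pandas APIs. The sketch relies on the fact that selecting with a list of labels (s.loc[[label]]) always returns a Series, whether the label occurs once or several times.

```python
import pandas as pd


def loc_series(s: pd.Series, label) -> pd.Series:
    """Hypothetical helper: always return a Series for a single label.

    Wrapping the label in a list triggers list-based selection, which
    returns a Series of every match. A missing label raises KeyError.
    """
    return s.loc[[label]]


def loc_elem(s: pd.Series, label):
    """Hypothetical scalar accessor in the spirit of the proposed loc_elem.

    Returns the single value stored under `label` and raises ValueError
    when the label is duplicated, so the result is always a scalar.
    """
    matches = s.loc[[label]]
    if len(matches) != 1:
        raise ValueError(f"label {label!r} is not unique ({len(matches)} matches)")
    return matches.iloc[0]


s = pd.Series([1, 2, 3, 1], index=[0, 1, 1, 2], name="A")
print(loc_series(s, 0))  # one-element Series
print(loc_series(s, 1))  # two-element Series
print(loc_elem(s, 0))    # 1
```

With this split each function has a single return type, which is what a JIT compiler needs. The same list-wrapping approach works for DataFrame.loc: df.loc[[0], :] returns a DataFrame even though label 0 is unique.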