BUG: Pyarrow numeric dtype fillna filled not null entry #62878

@chwong-arini

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd


def test_show_double_pyarrow_has_issue_when_fillna():
    """
    Strange behavior observed when using the pyarrow 'double' dtype with a pandas Series.
    After `fillna(0.0)`, the second element, which should be left untouched, is also "filled".
    """
    for raw_ls in [
        [1, 2, 3, 4, 5, pd.NA],
        [1, 2, 3, 4, pd.NA, 6],
        [1, 2, pd.NA, 4, 5, 6],
        # [pd.NA, 2, 3, 4, 5, 6],       # this works fine when pd.NA is first
        # [1, 2, 3, 4, pd.NA],          # this works fine when there are five elements
        # [1, 2, 3, pd.NA, pd.NA, 6],   # this works fine when there are two pd.NA
    ]:
        def get_series(ls):
            return pd.Series(ls, dtype="double[pyarrow]")
        s = get_series(raw_ls)
        s_filled = s.fillna(0.0)
        zero_counts = (s_filled == 0.0).sum()
        assert zero_counts == 2
        assert raw_ls[1] != 0 and not(pd.isna(raw_ls[1])) and s_filled.iloc[1] == 0.0


def test_show_int_pyarrow_has_issue_when_fillna():
    """
    Similar to double dtype, very strange behavior observed when using pyarrow 'int64' dtype.
    """
    for raw_ls in [
        [1, 2, 3, 4, 5, pd.NA],
        [1, 2, 3, 4, pd.NA, 6],
        [1, 2, pd.NA, 4, 5, 6],
        # [pd.NA, 2, 3, 4, 5, 6],       # this works fine when pd.NA is first
        # [1, 2, 3, 4, pd.NA],          # this works fine when there are five elements
        # [1, 2, 3, pd.NA, pd.NA, 6],   # this works fine when there are two pd.NA
    ]:
        def get_series(ls):
            return pd.Series(ls, dtype="int64[pyarrow]")
        s = get_series(raw_ls)
        s_filled = s.fillna(0)
        zero_counts = (s_filled == 0).sum()
        assert zero_counts == 2
        assert raw_ls[1] != 0 and not(pd.isna(raw_ls[1])) and s_filled.iloc[1] == 0

test_show_double_pyarrow_has_issue_when_fillna()
test_show_int_pyarrow_has_issue_when_fillna()

Issue Description

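`fillna` misbehaves on pyarrow-backed numeric Series of length >= 6: it also fills a row that is not null and should not be "filled." In the examples above, the element at position 1 is replaced with the fill value even though it is not missing.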

Expected Behavior

Only the rows with pd.NA should be filled.
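
To make the expected behavior concrete, here is a minimal pure-Python sketch of the semantics `fillna` should follow, together with a small checker that lists positions where a result deviates from it. The helper names `reference_fillna` and `unexpected_fills` are hypothetical, and `None` stands in for `pd.NA`. None of this is part of pandas or pyarrow.

```python
from typing import Optional, Sequence


def reference_fillna(values: Sequence[Optional[float]], fill_value: float) -> list:
    """Replace only missing entries (None here) and leave every other value untouched."""
    return [fill_value if v is None else v for v in values]


def unexpected_fills(original, filled, fill_value) -> list:
    """Return the positions where `filled` differs from the reference result."""
    expected = reference_fillna(original, fill_value)
    return [i for i, (e, f) in enumerate(zip(expected, filled)) if e != f]


raw = [1, 2, 3, 4, 5, None]
print(reference_fillna(raw, 0))  # [1, 2, 3, 4, 5, 0]

# A result shaped like the one reported above, with position 1 wrongly zeroed.
print(unexpected_fills(raw, [1, 0, 3, 4, 5, 0], 0))  # [1]
```

To run the checker against an actual Series `s`, one option is to convert it first with `[None if pd.isna(v) else v for v in s]`, and likewise for `s.fillna(0)`.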

Installed Versions

INSTALLED VERSIONS

commit : 9c8bc3e
python : 3.13.3
python-bits : 64
OS : Windows
OS-release : 11

pandas : 2.3.3
numpy : 2.3.3
pyarrow : 21.0.0

Metadata

    Labels

    Arrow (pyarrow functionality), Bug, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), Upstream issue (Issue related to pandas dependency)
