I discovered pandera a couple of days ago and I am encountering the same kind of bug in my implementation. I provide the full working example below.
I use pandera to validate a DataFrame that I receive from an external data source. The size of the received DataFrame may vary. In my example, we are interested in the `SINSGA` column. In the DataFrame I receive (as a CSV file), empty values can be denoted as zero-length strings (`""`), whitespace-only strings (`" "`), or string representations of NaN (i.e. `"nan"`).
To standardize the representation of empty values, I first define my `DataFrameModel` with a DataFrame-wide parser that replaces these representations with `pd.NA`. As I understand from the documentation, DataFrame-wide parsers are applied first, before column parsers and checks, so my parser should run before the type coercion of the `SINSGA` column.
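To make the intent concrete, here is a minimal plain-pandas sketch of the normalization I expect to happen before coercion. It is not my actual model and does not use pandera at all: the helper name `normalize_empty`, the `EMPTY_MARKERS` set, and the sample values are made up for illustration, and the target dtype `Int64` is an assumption.

```python
import pandas as pd

# Hypothetical helper for illustration only; not taken from my real schema.
# After stripping whitespace and lower-casing, these strings count as "empty".
EMPTY_MARKERS = {"", "nan"}


def normalize_empty(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df where "", whitespace-only strings and "nan" become pd.NA."""

    def _clean(value):
        if isinstance(value, str) and value.strip().lower() in EMPTY_MARKERS:
            return pd.NA
        return value

    # Apply the element-wise cleanup to every column.
    return df.apply(lambda col: col.map(_clean))


# Made-up sample mimicking the three "empty" spellings found in the CSV.
raw = pd.DataFrame({"SINSGA": ["1", "", " ", "nan", "42"]})

# Step 1: normalize the empty markers (what the DataFrame-wide parser should do).
clean = normalize_empty(raw)

# Step 2: only then coerce to a nullable integer dtype (assumed target type).
sinsga = pd.to_numeric(clean["SINSGA"]).astype("Int64")
```

If the coercion in step 2 runs on `raw` instead of `clean`, the `""` and `" "` entries cannot be converted to integers, which matches the ordering problem described here.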
However, the coercion seems to be applied before my parser: the line "In parser" is never written to standard output, and the error below is thrown:
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandera.
- [x] (optional) I have confirmed this bug exists on the main branch of pandera.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
I expect the DataFrame-wide parser to be executed before column-level parsing, including data type coercion, as specified in the documentation:
> You can specify both dataframe- and column-level parsers, where dataframe-level parsers are performed before column-level parsers. Assuming that a schema contains parsers and checks, the validation process consists of the following steps:
>
> 1. dataframe-level parsing
> 2. column-level parsing
> 3. dataframe-level checks
> 4. column-level and index-level checks
Did I miss anything?
Desktop (please complete the following information):
OS: iOS
Python: 3.10.12 (within a poetry environment created with poetry version 1.7.1)
Code Sample, a copy-pastable example
Additional context
Error thrown: