

Discussion: how to implement vectorized validations on struct columns? #125

Open
dkapitan opened this issue Dec 18, 2024 · 0 comments


dkapitan commented Dec 18, 2024

We are looking for a solution for performant validation of healthcare data in the FHIR format. Details are described here:
beda-software/FHIRPathMappingLanguage#18

Patito looks very interesting, and I was wondering whether you could say something about vectorized validation of struct (nested) columns with Polars and Patito.

Basic idea

We want a performant solution for mapping tabular legacy data to FHIR, along the lines of the following (note: this code does not work):

```python
import pandera as pa
from pandera.engines.pandas_engine import PydanticModel # PydanticModel only available in pandas_engine
import polars as pl
from resources import Patient  # Pydantic model of the FHIR Patient resource

class PatientSchema(pa.DataFrameModel):
    """Pandera schema using the pydantic model."""

    class Config:
        """Config with dataframe-level data type."""

        dtype = PydanticModel(Patient)
        coerce = True  # this is required, otherwise a SchemaInitError is raised

patient = pl.read_json("general-person-example.json")
# Does not work: PydanticModel is a pandas-engine dtype, but `patient` is a polars DataFrame.
PatientSchema.validate(patient)
```

Current issues

Tagging @ir4y and @yannick-vinkesteijn.
