Feature stores aim to solve the data management problems that arise when building Machine Learning applications. However, data quality is typically something data teams need to integrate and handle as a separate component. This project joins both concepts by keeping data quality tightly coupled with data transformations: every feature requires at least a minimal data verification check, and the data and transformation checks can evolve together over the life of the project.
Because of that, qafs depends heavily on pandera to build the data validations.
- Pandas-like API
- Feature information stored in a database along with its metadata.
- Dask support to process large datasets in a cluster environment.
- Data is stored as timeseries in Parquet format, on a filesystem or in object storage services.
- Transformations stored as features.
Install the Python package through pip:
$ pip install qafs
Below is an example of using qafs in which we create a feature store, register a `numbers` feature, and add a `squared` feature transformation. First we import the packages and create the feature store; for this example we use an SQLite database and persist the features on the filesystem:
import qafs
import pandas as pd
import pandera as pa
from pandera import Check, Column, DataFrameSchema
from pandera import io
fs = qafs.FeatureStore(
    connection_string='sqlite:///test.sqlite',
    url='/tmp/featurestore/example'
)
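The `connection_string` and `url` together define where the feature registry and the data live. As a rough sketch (assuming SQLAlchemy-style connection strings and object-storage URLs are accepted, per the feature list above; the database credentials and bucket name below are hypothetical), the same store could point at a PostgreSQL registry and S3 storage:

fs_remote = qafs.FeatureStore(
    connection_string='postgresql://user:password@host:5432/featurestore',  # hypothetical registry database
    url='s3://my-bucket/featurestore/example'  # hypothetical object-storage location for the Parquet data
)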
Features can be stored in namespaces, which help organize the data. When creating `numbers` we specify the feature as `'example/numbers'` to place the feature `numbers` in the namespace `example`; the arguments `name='numbers', namespace='example'` work as well (an equivalent keyword-argument call is sketched after the example below). Then we specify the data validation using pandera, declaring that the feature is an integer and that its values should be greater than 0:
fs.create_namespace('example', description='Example datasets')
fs.create_feature(
    'example/numbers',
    description='Timeseries of numbers',
    check=Column(pa.Int, Check.greater_than(0))
)
dts = pd.date_range('2020-01-01', '2021-02-09')
df = pd.DataFrame({'time': dts, 'numbers': list(range(1, len(dts) + 1))})
fs.save_dataframe(df, name='numbers', namespace='example')
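As mentioned above, the `'example/numbers'` path and the explicit keyword arguments are interchangeable. A minimal sketch of the same registration using `name` and `namespace` (assuming `create_feature` accepts the keyword form shown here for `save_dataframe`):

fs.create_feature(
    name='numbers',
    namespace='example',
    description='Timeseries of numbers',
    check=Column(pa.Int, Check.greater_than(0))
)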
To register our `squared` transformation feature we use the `fs.transform` decorator, fetching the data from the `numbers` feature and applying the same data validation as `numbers`:
@fs.transform(
    'example/squared',
    from_features=['example/numbers'],
    check=Column(pa.Int, Check.greater_than(0))
)
def squared(df):
    return df ** 2
When we fetch our features we should see:
df_query = fs.load_dataframe(
    ['example/numbers', 'example/squared'],
    from_date='2021-01-01',
    to_date='2021-01-31'
)
print(df_query.tail(1))
##----
#              example/numbers  example/squared
# time
# 2021-01-31               397           157609
##----
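Because the pandera check is attached to the feature itself, writes that violate it should be rejected. A minimal sketch of that behaviour (assuming qafs surfaces the underlying pandera validation error from `save_dataframe`; the exact exception type is not pinned down here):

bad_df = pd.DataFrame({
    'time': pd.date_range('2021-02-10', periods=3),
    'numbers': [-1, 0, 5]  # -1 and 0 violate Check.greater_than(0)
})
try:
    fs.save_dataframe(bad_df, name='numbers', namespace='example')
except Exception as err:  # expected to fail the greater_than(0) check
    print(f'Validation failed: {err}')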
Please follow the Contributing guide.
This project was started using the bytehub feature store as a base and is under the same license.