Data versioning for Fraud Detection with lakeFS

Fraud detection is a critical part of any business. Discover how data management and versioning with lakeFS enable repeatable, version-controlled data sets through familiar workflows and processes, while reducing storage costs for generative and predictive AI applications.

Detailed description

The purpose of this AI quickstart is to highlight the benefits of data versioning, provided by lakeFS, in an AI/ML environment. lakeFS allows the data engineer to manage the lifecycle of data using the same Git-style workflow a developer uses to manage source code. This means that, like source code, data can be versioned, branched, merged, and pulled from a Git-like repository, while the data itself remains in backend object storage.
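The Git-like workflow described above can be sketched with lakectl, the lakeFS command-line client. This is a minimal illustration, not the quickstart's own procedure: the repository name fraud-data, the branch name update-training-data, and the file paths are hypothetical. By default the sketch only prints the commands, so it can be run without a lakeFS server.

```shell
#!/usr/bin/env bash
# Sketch of a Git-like data workflow with lakectl (the lakeFS CLI).
# ASSUMPTIONS: the repository "fraud-data", the branch "update-training-data",
# and the file paths are hypothetical names chosen for illustration.

REPO="lakefs://fraud-data"
BRANCH="update-training-data"

# Print each command instead of running it unless DRY_RUN=0 is set,
# so the sketch is safe to run without a lakeFS server.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

data_workflow() {
  # Branch from main; only metadata is created, no objects are copied.
  run lakectl branch create "$REPO/$BRANCH" --source "$REPO/main"
  # Upload a new version of the training data to the branch.
  run lakectl fs upload --source ./data/train.csv "$REPO/$BRANCH/data/train.csv"
  # Record the change as a commit, as with source code.
  run lakectl commit "$REPO/$BRANCH" -m "Update training data"
  # Merge the branch back into main once the data has been validated.
  run lakectl merge "$REPO/$BRANCH" "$REPO/main"
}

data_workflow
```

Setting DRY_RUN=0 would execute the commands against a configured lakeFS server instead of printing them.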

The quickstart allows a demonstrator to quickly deploy both object storage, using MinIO, and lakeFS, which serves as a Git-like gateway that data engineers interface with for data access. The quickstart covers the following steps:

Deploy MinIO for on-premises object storage, running on the local OpenShift cluster
Deploy an instance of lakeFS for git-like management of data and data versioning
Deploy fraud detection notebooks in OpenShift AI
Create and train a model using the notebooks and data
Serve the trained model
Perform fraud detection on sample transaction data
Update the training data and retrain the model using the new data version
Perform fraud detection on a new version of the sample transaction data
Show how OpenShift AI pipelines can be used to retrain and/or perform detection on new versions of training and sample data
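One way to picture retraining on a new data version: through the lakeFS S3-compatible gateway, the repository takes the place of the bucket and the branch (or other ref) is the first component of the object key. The sketch below builds such addresses. The repository name, branch names, and object path are hypothetical, and whether the quickstart notebooks read data this way is an assumption.

```shell
#!/usr/bin/env bash
# Build an S3-style address for an object at a given lakeFS ref.
# ASSUMPTIONS: "fraud-data", "update-training-data", and "data/train.csv"
# are hypothetical names used only for illustration.
lakefs_s3_uri() {
  local repo="$1" ref="$2" path="$3"
  # Strip one leading slash from the path so the URI has no empty component.
  printf 's3://%s/%s/%s\n' "$repo" "$ref" "${path#/}"
}

# The same file, addressed on two different branches.
lakefs_s3_uri fraud-data main data/train.csv
lakefs_s3_uri fraud-data update-training-data data/train.csv

# With an S3 client, such a URI would be used together with the lakeFS
# endpoint, for example:
#   aws s3 ls s3://fraud-data/main/ --endpoint-url https://<lakefs_route_host>
```

Under this scheme, a training job could switch data versions by changing only the ref portion of the address.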

See it in action

TODO: create an arcade?

Architecture diagrams

Requirements

This quickstart was developed and tested on an OpenShift cluster with the following components and resources. These can be considered the minimum requirements.

Minimum hardware requirements

Node Type	Qty	vCPU	Memory (GB)
Control Plane	3	8	16
Worker	3	8	16

Note: A GPU is not required for this quickstart.

Minimum software requirements

This quickstart was tested with the following software versions:

Software	Version
Red Hat OpenShift	4.20.5
Red Hat OpenShift Service Mesh	2.5.11-0
Red Hat OpenShift Serverless	1.37.0
Red Hat OpenShift AI	2.25
helm	3.17.1
lakeFS	1.73.0
MinIO	TBD
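To compare an installed version against the tested versions above, a small helper such as the following could be used. It is a sketch that relies on sort -V (available in GNU coreutils), and the "found" values passed in are sample inputs; in practice they would come from commands such as helm version --short or oc version.

```shell
#!/usr/bin/env bash
# Return success when version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort), which is an assumption about the
# local sort implementation.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether a found version is at least the tested version.
check_version() {
  local name="$1" found="$2" tested="$3"
  if version_ge "$found" "$tested"; then
    echo "$name $found: OK (tested with $tested)"
  else
    echo "$name $found: older than tested version $tested"
  fi
}

# Sample inputs for illustration only.
check_version helm 3.17.1 3.17.1
check_version lakeFS 1.70.0 1.73.0
```

An older version is not necessarily unsupported; the table only records what the quickstart was tested with.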

Required user permissions

The user performing this quickstart should have the ability to create a project in OpenShift and OpenShift AI. This requires the admin cluster role; cluster-admin is not required.

Deploy

Deployment consists of cloning the repository, logging in to the cluster, and running a single script. Follow the steps below.

Pre-requisites

The steps assume the following pre-requisite products and components are deployed and functional with required permissions on the cluster:

Red Hat OpenShift Container Platform
Red Hat OpenShift Service Mesh
Red Hat OpenShift Serverless
Red Hat OpenShift AI
User has admin permissions in the cluster

Deployment Steps

Clone this repo

$ git clone https://github.com/rh-ai-quickstart/Fraud-Detection-data-versioning-with-lakeFS.git

Change to the deploy directory:

$ cd Fraud-Detection-data-versioning-with-lakeFS/deploy

Log in to the OpenShift cluster:

$ oc login --token=<user_token> --server=https://api.<openshift_cluster_fqdn>:6443

Make sure deploy.sh is executable and run it, passing it the name of the project in which to install. It can be an existing or new project. In this example, it will deploy to the lakefs project.

# Make script executable
$ chmod +x deploy.sh

# Run script passing it the project in which to install
$ ./deploy.sh lakefs
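Because the project name becomes a Kubernetes namespace name, it must be a valid DNS label: at most 63 characters, lowercase letters, digits, and hyphens, starting and ending with a letter or digit. A wrapper along these lines could check the name before calling the script. This is a sketch that assumes deploy.sh does not already perform such a check; the wrapper only prints the command it would run.

```shell
#!/usr/bin/env bash
# Succeed when the argument is a valid Kubernetes namespace (DNS label) name.
valid_project_name() {
  local name="$1"
  local re='^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'
  [ "${#name}" -le 63 ] && [[ "$name" =~ $re ]]
}

# Hypothetical wrapper: validate the name, then show the deploy command.
deploy_to() {
  local project="$1"
  if ! valid_project_name "$project"; then
    echo "invalid project name: '$project'" >&2
    return 1
  fi
  echo "would run: ./deploy.sh $project"
}

deploy_to lakefs
```

Replacing the echo with the actual ./deploy.sh call would turn the sketch into a working guard.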

Access lakeFS UI

Use the lakeFS route created in the project to access the lakeFS browser-based UI.

Leave the username set to admin
Enter your email address (or a bogus email address)
Download the access_key_id and secret_access_key displayed on the new page, as they will not be accessible later on
Go back to the login page and log in using those credentials.
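Since the credentials are shown only once, it may help to store them right away in a lakectl configuration file. The sketch below writes the YAML layout that lakectl reads by default from ~/.lakectl.yaml. The endpoint URL and key values are placeholders, and the demonstration writes to a temporary file rather than to the real configuration path.

```shell
#!/usr/bin/env bash
# Write a lakectl configuration file.
# Arguments: target file, lakeFS endpoint URL, access key ID, secret access key.
write_lakectl_config() {
  local file="$1" endpoint="$2" key_id="$3" secret="$4"
  (
    # umask 077 so that a newly created file is readable only by its owner.
    umask 077
    cat > "$file" <<EOF
credentials:
  access_key_id: ${key_id}
  secret_access_key: ${secret}
server:
  endpoint_url: ${endpoint}
EOF
  )
}

# Demonstration with placeholder values (ASSUMPTIONS, not real credentials).
cfg="$(mktemp)"
write_lakectl_config "$cfg" "https://lakefs.example.com/api/v1" \
  "AKIAEXAMPLEKEYID" "example-secret-key"
cat "$cfg"
rm -f "$cfg"
```

To use it for real, the target would be "$HOME/.lakectl.yaml", the endpoint would be the lakeFS route host, and the keys would be the downloaded values.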

Delete

The project the apps were installed in can be deleted, which will delete all of the resources in it, including deployments, secrets, pods, configmaps, etc.

$ oc delete project lakefs

References

lakeFS documentation v1.73
OpenShift AI documentation v2.25
OpenShift AI Fraud Detection example
