Oracle Machine Learning Observability Insights Library (ML Insights)

ML Insights is a Python library for data scientists, ML engineers, and developers. It can be used to ingest data in different formats, apply row-based transformations, and monitor data and ML models from validation through production.

The ML Insights library also provides several ways to process and evaluate data and ML models: a low-code option for customisation, a pre-built application, and further extensibility through custom applications and custom components.

Installation

ML Insights can be installed in a Python 3.8 environment using:

pip install oracle-ml-insights

Several ML Insights dependencies are optional (for example, an execution engine) and can be installed with:

pip install oracle-ml-insights[option]

where "option" can be one of:

  • "dask", to run ML Insights on Dask Execution Engine

How it works

ML Insights helps evaluate and monitor data and ML models across the entire ML observability lifecycle.

Insights is component-based: each component has a specific responsibility, and a workflow manages the individual components.

Insights provides components for tasks such as data ingestion, row-level data transformation, metric calculation, and post-processing of metric output. These are covered in more detail in the Getting Started section.

In simple terms, you provide the location of the input data set to be processed, select any simple transformations needed on the input data (for example, converting an unstructured column to a structured one), and decide which metrics should be calculated for which features (columns of data). You can also define a post-action to be performed once all the metrics have been calculated.
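The four stages described above (ingest, transform rows, calculate metrics per feature, post-process) can be sketched in plain Python. This is a conceptual illustration only: it uses the standard library rather than the ML Insights API, and every name in it (`read_rows`, `expand_payload`, `column_metrics`, `post_action`, the sample data) is hypothetical.

```python
import csv
import io
import json
import statistics

# Hypothetical in-memory input; in practice this would be a file location.
RAW_CSV = """age,payload
34,score=0.9;label=cat
51,score=0.4;label=dog
,score=0.7;label=dog
"""


def read_rows(text):
    """Ingest: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def expand_payload(row):
    """Row-level transformation: turn the 'k=v;k=v' payload string into columns."""
    out = dict(row)
    pairs = dict(item.split("=") for item in out.pop("payload").split(";"))
    out["score"] = float(pairs["score"])
    out["label"] = pairs["label"]
    return out


def column_metrics(rows, column):
    """Metric calculation: count, missing count, and mean for one feature."""
    values = [r[column] for r in rows]
    present = [float(v) for v in values if v not in ("", None)]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present) if present else None,
    }


def post_action(results):
    """Post-processing: serialise the metric output, e.g. before writing it out."""
    return json.dumps(results, sort_keys=True)


rows = [expand_payload(r) for r in read_rows(RAW_CSV)]
results = {col: column_metrics(rows, col) for col in ("age", "score")}
report = post_action(results)
print(report)
```

Each function stands in for one kind of component, so swapping the reader or adding a metric does not touch the other stages.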

Insights provides a simple, declarative API and out-of-the-box components that cover the majority of common use cases. It also lets users author JSON-based configurations that define and customise all of its core features.
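To illustrate what a JSON-driven, declarative setup means in general, the sketch below maps metric names in a configuration to callables through a registry. The schema (`features`, `metrics`), the registry, and `run_from_config` are invented for this example and are not the ML Insights configuration format; the sample notebook "Run ML Insight Using Config File" shows the real one.

```python
import json

# Hypothetical configuration schema, invented for illustration only.
CONFIG_JSON = """
{
  "features": {
    "age":    {"metrics": ["min", "max"]},
    "income": {"metrics": ["mean"]}
  }
}
"""

# Registry mapping metric names used in the config to plain Python callables.
METRIC_REGISTRY = {
    "min": min,
    "max": max,
    "mean": lambda values: sum(values) / len(values),
}


def run_from_config(config_text, data):
    """Compute only the metrics that the config requests for each feature."""
    config = json.loads(config_text)
    results = {}
    for feature, spec in config["features"].items():
        values = data[feature]
        results[feature] = {
            name: METRIC_REGISTRY[name](values) for name in spec["metrics"]
        }
    return results


data = {"age": [34, 51, 27], "income": [40000.0, 52000.0, 61000.0]}
results = run_from_config(CONFIG_JSON, data)
print(results)
```

Because the behaviour lives in the configuration, changing which metrics run for a feature means editing JSON rather than code.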

  • Insights currently supports the CSV, JSON, and JSONL data formats.

  • It also supports major execution engines like Native Pandas, Dask, and Spark.

  • Insights provides metrics in different groups, such as:

    • Data Integrity
    • Data Quality / Summary
    • Feature and Prediction Drift Detection
    • Model Performance for both classification and regression models
  • Insights also supports integrations for writing metric data or connecting to the OCI Monitoring service.
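As a rough idea of what a feature drift metric measures, the sketch below computes the Population Stability Index (PSI) between a baseline sample and a current sample. PSI is one common drift measure, chosen here for illustration; the source does not state which drift metrics Insights implements, and the function name, bin count, and `eps` floor are assumptions.

```python
import math


def psi(baseline, current, bins=4, eps=1e-6):
    """Population Stability Index of `current` relative to `baseline`.

    Bin edges are derived from the baseline range; values outside that
    range are clamped into the first or last bin.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # Floor each proportion at eps so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    p = proportions(baseline)
    q = proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


baseline = list(range(100))
shifted = [v + 30 for v in baseline]

print("no drift:", psi(baseline, baseline))
print("shifted :", psi(baseline, shifted))
```

Identical distributions give a PSI of zero, and the value grows as the current data moves away from the baseline.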

Examples

Jupyter Notebook
Minimal ML Insight to calculate metrics
Run ML Insight Using Config File
Run ML Insight using APIs
ML Insights Data Reader & Data Source Example
ML Insights : Post Processor Component Example.
ML Insights : Data Quality & Data Integrity Metrics
ML Insights : Performance Metrics For Classification Models
ML Insights Conflict Metrics
ML Insights : Performance Metrics For Regression Models
ML Insights : Drift Metrics
ML Insights : Data Correlation Metrics
ML Insights run with Custom Metrics

Note

The files sum_divide_by_k_custom_metrics.py and sum_divide_by_two_custom_metrics.py are examples of how to implement custom metrics; their usage is demonstrated in the sample notebook ML Insights run with Custom Metrics.
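Judging only by the file names, these metrics sum a column and divide the total by a constant. The sketch below shows that idea as a standalone class; it is an assumption about what the files compute, and it does not use the base class or interface that ML Insights actually requires for custom metrics (see the files themselves for that). The class and method names are hypothetical.

```python
class SumDivideByK:
    """Hypothetical metric: the sum of all observed values divided by k."""

    def __init__(self, k):
        if k == 0:
            raise ValueError("k must be non-zero")
        self.k = k
        self.total = 0.0

    def update(self, values):
        """Accumulate one batch of values (for example, one data partition)."""
        self.total += sum(values)

    def merge(self, other):
        """Combine state from another instance that saw a different partition."""
        merged = SumDivideByK(self.k)
        merged.total = self.total + other.total
        return merged

    def result(self):
        """Return the final metric value."""
        return self.total / self.k


part_a = SumDivideByK(k=2)
part_a.update([1, 2, 3])
part_b = SumDivideByK(k=2)
part_b.update([4])

print(part_a.merge(part_b).result())
```

Keeping accumulation (`update`), combination (`merge`), and finalisation (`result`) separate is what lets a metric of this kind run over partitioned data on engines such as Dask or Spark.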