ML Insights is a Python library for data scientists, ML engineers, and developers. Insights can be used to ingest data in different formats, apply row-based transformations, and monitor data and ML models from validation to production.
The ML Insights library also provides several ways to process and evaluate data and ML models. The options include a low-code alternative for customisation, a pre-built application, and further extensibility through custom applications and custom components.
ML Insights can be installed in a Python 3.8 environment using:
pip install oracle-ml-insights
Several ML Insights dependencies are optional (for example, an execution engine) and can be installed with:
pip install oracle-ml-insights[option]
where "option" can be one of:
- "dask", to run ML Insights on the Dask execution engine
ML Insights helps evaluate and monitor data and ML models for the entirety of the ML observability lifecycle.
Insights is component-based: each component has a specific responsibility, and a workflow manages the individual components.
Insights provides components to carry out tasks such as data ingestion, row-level data transformation, metric calculation, and post-processing of metric output. More details on these are covered in the Getting Started section.
In simple terms, the user provides the location of the input data set to be processed, selects any additional transformations needed on the input data (for example, converting an unstructured column to a structured one), and decides which metrics should be calculated for which features (columns of data). The user can also define post-actions to be performed once all the metrics have been calculated.
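The steps described above can be pictured with a small, self-contained sketch. It uses only the Python standard library and does not use the ML Insights API: every function name here (`read_rows`, `split_location`, `mean_metric`, `count_metric`, `run_workflow`) and the sample data are hypothetical, chosen only to show how ingestion, row-level transformation, metric calculation, and post-actions fit together in one workflow.

```python
import csv
import io
from statistics import mean

# Hypothetical stand-ins for the kinds of components described above.
# None of these names come from the ML Insights API.

def read_rows(csv_text):
    """Ingestion: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def split_location(row):
    """Row-level transformation: turn an unstructured 'location' string
    such as 'Paris;FR' into structured 'city' and 'country' columns."""
    city, country = row["location"].split(";")
    return {**row, "city": city, "country": country}

def mean_metric(rows, feature):
    """Metric calculation: mean of one numeric feature."""
    return mean(float(r[feature]) for r in rows)

def count_metric(rows, feature):
    """Metric calculation: number of non-empty values in one feature."""
    return sum(1 for r in rows if r[feature] != "")

def run_workflow(csv_text, transformers, metrics, post_actions):
    """Workflow: run each component in order and return the metric results."""
    rows = read_rows(csv_text)
    for transform in transformers:
        rows = [transform(r) for r in rows]
    results = {
        f"{feature}.{fn.__name__}": fn(rows, feature)
        for feature, fns in metrics.items()
        for fn in fns
    }
    for action in post_actions:
        action(results)  # e.g. write the results somewhere
    return results

SAMPLE = "age,location\n30,Paris;FR\n40,Austin;US\n"
collected = []  # the post-action below simply collects the results
results = run_workflow(
    SAMPLE,
    transformers=[split_location],
    metrics={"age": [mean_metric, count_metric], "country": [count_metric]},
    post_actions=[collected.append],
)
print(results)
```

In this sketch the mapping from features to metric functions plays the role of the per-feature metric selection, and the post-action is just a callable that receives the finished results.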
Insights provides a simple, declarative API and out-of-the-box components that cover the majority of common use cases. Insights also lets users author JSON-based configurations that define and customise all of its core features.
- Insights currently supports the CSV, JSON, and JSONL data formats.
- It also supports major execution engines such as native Pandas, Dask, and Spark.
- Insights provides metrics in different groups, such as:
  - Data Integrity
  - Data Quality / Summary
  - Feature and Prediction Drift Detection
  - Model Performance for both classification and regression models
- Insights also supports integrations for writing metric data or connecting to the OCI Monitoring service.
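For readers unfamiliar with how the three supported input formats listed above differ, the following sketch loads the same kind of record from each using only the Python standard library. It does not show the ML Insights readers; the helper name `load_records` and the sample data are assumptions made for illustration.

```python
import csv
import io
import json

def load_records(text, fmt):
    """Return a list of dict records from CSV, JSON (an array of objects),
    or JSONL (one JSON object per line) text. Hypothetical helper."""
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "json":
        return json.loads(text)
    if fmt == "jsonl":
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported format: {fmt}")

csv_rows = load_records("age,city\n30,Paris\n", "csv")
json_rows = load_records('[{"age": 30, "city": "Paris"}]', "json")
jsonl_rows = load_records(
    '{"age": 30, "city": "Paris"}\n{"age": 40, "city": "Austin"}\n', "jsonl"
)
print(csv_rows, json_rows, jsonl_rows)
```

Note that the CSV reader yields every value as a string, while the JSON and JSONL loaders preserve numeric types, so numeric columns from CSV input need converting before metrics are computed on them.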
The files sum_divide_by_k_custom_metrics.py and sum_divide_by_two_custom_metrics.py are examples of how to implement custom metrics; their usage is demonstrated in the sample notebook ML Insights run with Custom Metrics.
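As a rough idea of what a "sum divided by k" metric computes, here is a minimal sketch in plain Python. It is not the content of the example files and does not follow the ML Insights custom-metric interface: the class name `SumDivideByK` and its `update`, `merge`, and `result` methods are assumptions. The `merge` step is included on the assumption that engines such as Dask or Spark may compute partial results on separate partitions of the data.

```python
# Hypothetical illustration only: the class name and methods below are
# assumptions, not the ML Insights custom-metric interface.
class SumDivideByK:
    """Accumulates a running sum of a numeric column and divides it by k."""

    def __init__(self, k=2):
        if k == 0:
            raise ValueError("k must be non-zero")
        self.k = k
        self.total = 0.0

    def update(self, values):
        """Fold one partition of values into the running sum."""
        self.total += sum(values)
        return self

    def merge(self, other):
        """Combine partial states computed on separate partitions."""
        merged = SumDivideByK(self.k)
        merged.total = self.total + other.total
        return merged

    def result(self):
        """Final metric value: the accumulated sum divided by k."""
        return self.total / self.k

left = SumDivideByK(k=2).update([1, 2, 3])
right = SumDivideByK(k=2).update([4])
combined = left.merge(right).result()
print(combined)  # (1 + 2 + 3 + 4) / 2 = 5.0
```

With k fixed at 2, the same sketch corresponds to a "sum divided by two" metric.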