Snowpark ML is a set of tools, including SDKs and underlying infrastructure, for building and deploying machine learning models. With Snowpark ML, you can pre-process data and train, manage, and deploy ML models, all within Snowflake and using a single SDK. You benefit from Snowflake’s proven performance, scalability, stability and governance at every stage of the machine learning workflow.
The Snowpark ML Python SDK provides a number of APIs to support each stage of an end-to-end Machine Learning development and deployment process, and includes two key components.
Snowpark ML Development provides a collection of Python APIs enabling efficient ML model development directly in Snowflake:
- Modeling API (snowflake.ml.modeling) for data preprocessing, feature engineering and model training in Snowflake. This includes the snowflake.ml.modeling.preprocessing module for scalable data transformations on large data sets utilizing the compute resources of underlying Snowpark Optimized High Memory Warehouses, and a large collection of ML model development classes based on sklearn, xgboost, and lightgbm.
- Framework Connectors: Optimized, secure and performant data provisioning for PyTorch and TensorFlow frameworks in their native data loader formats.
- FileSet API: FileSet provides a Python fsspec-compliant API for materializing data into a Snowflake internal stage from a query or Snowpark DataFrame, along with a number of convenience APIs.
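As a rough illustration of the Modeling API's preprocessing module, the sketch below standardizes a few columns of a Snowpark DataFrame with StandardScaler. The helper name scale_columns and the _SCALED column suffix are assumptions made for this example only, and the caller is assumed to already hold a Snowpark DataFrame from an active session.

```python
def scale_columns(df, columns):
    """Standardize `columns` of a Snowpark DataFrame (hypothetical helper).

    `df` is assumed to be a Snowpark DataFrame obtained from an active session.
    """
    # Imported lazily so this file can be loaded even where
    # snowflake-ml-python is not installed.
    from snowflake.ml.modeling.preprocessing import StandardScaler

    # Write results to new columns; the "_SCALED" suffix is an arbitrary choice.
    output_cols = [f"{name}_SCALED" for name in columns]
    scaler = StandardScaler(input_cols=list(columns), output_cols=output_cols)

    # Fit on the data, then return a new DataFrame with the scaled columns added.
    scaler.fit(df)
    return scaler.transform(df)
```

A call might look like scale_columns(df, ["AGE", "INCOME"]), where the column names are placeholders for columns in your own table.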
Snowflake MLOps contains a suite of tools and objects that support the ML development cycle. It complements the Snowpark ML Development API and provides an end-to-end path from development to deployment within Snowflake. Currently, the API consists of:
- Registry: A Python API that allows secure deployment and management of models in Snowflake, supporting models trained both inside and outside of Snowflake.
- Feature Store: A fully integrated solution for defining, managing, storing and discovering ML features derived from your data. The Snowflake Feature Store supports automated, incremental refresh from batch and streaming data sources, so that feature pipelines need be defined only once to be continuously updated with new data.
- Datasets: Datasets provide an immutable, versioned snapshot of your data suitable for ingestion by your machine learning models.
If you don't have a Snowflake account yet, you can sign up for a 30-day free trial account.
Follow the installation instructions in the Snowflake documentation.
Python versions 3.9 to 3.11 are supported. You can use miniconda or anaconda to create a Conda environment (recommended), or virtualenv to create a virtual environment.
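Before creating an environment, it can help to confirm that the interpreter falls inside the supported range. The following is a minimal sketch of such a check; the function name is_supported_python is an assumption for this example and is not part of the SDK.

```python
import sys

# Supported range as stated above: Python 3.9 through 3.11 inclusive.
MIN_SUPPORTED = (3, 9)
MAX_SUPPORTED = (3, 11)


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version is within the supported range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if is_supported_python() else "not supported"
    print(f"Python {current} is {status}")
```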
The Snowflake Conda Channel contains the official Snowpark ML package releases.
The recommended approach is to install snowflake-ml-python from this conda channel:
conda install \
-c https://repo.anaconda.com/pkgs/snowflake \
--override-channels \
snowflake-ml-python
See the developer guide for installation instructions.
The latest version of the snowflake-ml-python package is also published in a conda channel in this repository. Package versions in this channel may not yet be present in the official Snowflake conda channel.
Install snowflake-ml-python from this channel with the following (being sure to replace <version_specifier> with the desired version, e.g. 1.0.10):
conda install \
-c https://raw.githubusercontent.com/snowflakedb/snowflake-ml-python/conda/releases/ \
-c https://repo.anaconda.com/pkgs/snowflake \
--override-channels \
snowflake-ml-python==<version_specifier>
Note that until a snowflake-ml-python package version is available in the official Snowflake conda channel, there may be compatibility issues. Server-side functionality that snowflake-ml-python depends on may not yet be released.
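Because versions can differ between the two channels, it may be useful to confirm which version ended up in the active environment. The sketch below uses the standard library's importlib.metadata for that; the helper name installed_version is an assumption for this example, and it relies on the package having been installed with standard distribution metadata.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="snowflake-ml-python"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("snowflake-ml-python is not installed in this environment")
    else:
        print(f"snowflake-ml-python {found} is installed")
```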
- Install cosign. This example uses the Go installation: installing-cosign-with-go.
- Download the package file from a repository such as PyPI.
- Download the signature files from the release tag.
- Verify the signature on projects signed using the Jenkins job:
cosign verify-blob snowflake_ml_python-1.7.0.tar.gz --key snowflake-ml-python-1.7.0.pub --signature resources.linux.snowflake_ml_python-1.7.0.tar.gz.sig
NOTE: Version 1.7.0 is used as an example here. Please choose the latest version.
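Since the file names in the verification command all follow the version number, the command can be generated rather than edited by hand. The following sketch builds the same cosign verify-blob invocation for a given version and optionally runs it. The helper names are assumptions for this example, and the naming pattern is inferred from the 1.7.0 command above, so check it against the files attached to the release you download.

```python
import subprocess


def cosign_verify_command(version):
    """Build the cosign argument list for a given release version.

    File names follow the pattern of the 1.7.0 example; this is an
    assumption and may not hold for every release.
    """
    artifact = f"snowflake_ml_python-{version}.tar.gz"
    return [
        "cosign",
        "verify-blob",
        artifact,
        "--key",
        f"snowflake-ml-python-{version}.pub",
        "--signature",
        f"resources.linux.{artifact}.sig",
    ]


def verify_signature(version):
    """Run cosign in the current directory; True means verification succeeded."""
    # cosign must be on PATH, and the artifact, key and signature files
    # must already be downloaded into the working directory.
    result = subprocess.run(cosign_verify_command(version), check=False)
    return result.returncode == 0
```

For example, " ".join(cosign_verify_command("1.7.0")) reproduces the command shown above.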