Project import generated by Copybara. (#131)
GitOrigin-RevId: 376d560591c49a1cbb8de1922d03cb51867613b5

Co-authored-by: Snowflake Authors <[email protected]>
sfc-gh-anavalos and Snowflake Authors authored Nov 21, 2024
1 parent 38d2497 commit 7bc5f40
Showing 42 changed files with 856 additions and 575 deletions.
4 changes: 2 additions & 2 deletions .bazelrc
@@ -1,8 +1,8 @@
# Common Default

# Wrapper to make sure tests are run.
# Allow at most 3 hours for eternal tests.
test --run_under='//bazel:test_wrapper' --test_timeout=-1,-1,-1,10800
# Allow at most 4 hours for eternal tests.
test --run_under='//bazel:test_wrapper' --test_timeout=-1,-1,-1,14400
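The four comma-separated values passed to `--test_timeout` apply, in order, to Bazel's short, moderate, long, and eternal timeout categories, and `-1` keeps Bazel's default for a category. The sketch below shows how the old and new values follow from the hour counts in the comments; `build_test_timeout_flag` is a hypothetical helper name used only for illustration and is not part of this repository.

```python
from datetime import timedelta

# Order in which Bazel reads the four --test_timeout values.
CATEGORIES = ("short", "moderate", "long", "eternal")


def build_test_timeout_flag(eternal_hours: int) -> str:
    """Return a --test_timeout flag that overrides only the eternal category."""
    eternal_seconds = int(timedelta(hours=eternal_hours).total_seconds())
    # -1 leaves Bazel's default timeout in place for the other categories.
    values = [-1] * (len(CATEGORIES) - 1) + [eternal_seconds]
    return "--test_timeout=" + ",".join(str(v) for v in values)


print(build_test_timeout_flag(3))  # value before this commit
print(build_test_timeout_flag(4))  # value after this commit
```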

# Since integration tests are located in different packages than code under test,
# the default instrumentation filter would exclude the code under test. This
16 changes: 15 additions & 1 deletion CHANGELOG.md
@@ -1,6 +1,20 @@
# Release History

## 1.7.1
## 1.7.2

### Bug Fixes

- Model Explainability: Fix an issue where explain is enabled for a scikit-learn pipeline
whose task is UNKNOWN, which then fails when invoked.

### Behavior Changes

### New Features

- Registry: Support asynchronous model inference service creation with the `block` option
in `ModelVersion.create_service()` set to True by default.
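To illustrate what a `block` flag of this kind usually means, here is a generic sketch built on the standard library. It is not the Snowflake implementation: `create_service_sketch`, `_provision`, and the choice to return a `Future` when `block=False` are all assumptions made for this example, and the real `ModelVersion.create_service()` signature and return type should be taken from the official documentation.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Union

_executor = ThreadPoolExecutor(max_workers=1)


def _provision(service_name: str) -> str:
    """Hypothetical stand-in for slow server-side service provisioning."""
    time.sleep(0.1)
    return f"{service_name}: READY"


def create_service_sketch(service_name: str, block: bool = True) -> Union[str, Future]:
    """Start provisioning; wait for it only when block is True (the default)."""
    future = _executor.submit(_provision, service_name)
    if block:
        # Blocking mode: the caller gets the final status directly.
        return future.result()
    # Non-blocking mode: the caller gets a handle and can wait later.
    return future


status = create_service_sketch("my_service")
pending = create_service_sketch("my_service", block=False)
print(status)
print(pending.result())
```

With `block=True` the call returns only after provisioning finishes, while with `block=False` the caller can do other work before calling `result()`.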

## 1.7.1 (2024-11-05)

### Bug Fixes

35 changes: 29 additions & 6 deletions README.md
@@ -12,7 +12,7 @@ and deployment process, and includes two key components.

### Snowpark ML Development

[Snowpark ML Development](https://docs.snowflake.com/en/developer-guide/snowpark-ml/index#snowpark-ml-development)
[Snowpark ML Development](https://docs.snowflake.com/en/developer-guide/snowpark-ml/index#ml-modeling)
provides a collection of python APIs enabling efficient ML model development directly in Snowflake:

1. Modeling API (`snowflake.ml.modeling`) for data preprocessing, feature engineering and model training in Snowflake.
@@ -26,14 +26,21 @@ their native data loader formats.
1. FileSet API: FileSet provides a Python fsspec-compliant API for materializing data into a Snowflake internal stage
from a query or Snowpark Dataframe along with a number of convenience APIs.

### Snowpark Model Management [Public Preview]
### Snowflake MLOps

[Snowpark Model Management](https://docs.snowflake.com/en/developer-guide/snowpark-ml/index#snowpark-ml-ops) complements
the Snowpark ML Development API, and provides model management capabilities along with integrated deployment into Snowflake.
Snowflake MLOps contains a suite of tools and objects that support the ML development cycle. It complements
the Snowpark ML Development API and provides end-to-end development to deployment within Snowflake.
Currently, the API consists of:

1. Registry: A python API for managing models within Snowflake which also supports deployment of ML models into Snowflake
as native MODEL object running with Snowflake Warehouse.
1. [Registry](https://docs.snowflake.com/en/developer-guide/snowpark-ml/index#snowflake-model-registry): A Python API
that allows secure deployment and management of models in Snowflake, supporting models trained both inside and outside
of Snowflake.
2. [Feature Store](https://docs.snowflake.com/en/developer-guide/snowpark-ml/index#snowflake-feature-store): A fully
integrated solution for defining, managing, storing and discovering ML features derived from your data. The
Snowflake Feature Store supports automated, incremental refresh from batch and streaming data sources, so that
feature pipelines need be defined only once to be continuously updated with new data.
3. [Datasets](https://docs.snowflake.com/developer-guide/snowflake-ml/overview#snowflake-datasets): A Dataset provides an
immutable, versioned snapshot of your data suitable for ingestion by your machine learning models.

## Getting started

@@ -80,3 +87,19 @@ conda install \

Note that until a `snowflake-ml-python` package version is available in the official Snowflake conda channel, there may
be compatibility issues. Server-side functionality that `snowflake-ml-python` depends on may not yet be released.

### Verifying the package

1. Install cosign.
This example is using golang installation: [installing-cosign-with-go](https://edu.chainguard.dev/open-source/sigstore/cosign/how-to-install-cosign/#installing-cosign-with-go).
1. Download the file from the repository like [pypi](https://pypi.org/project/snowflake-ml-python/#files).
1. Download the signature files from the [release tag](https://github.com/snowflakedb/snowflake-ml-python/releases/tag/1.7.0).
1. Verify signature on projects signed using Jenkins job:

```sh
cosign verify-blob snowflake_ml_python-1.7.0.tar.gz --key snowflake-ml-python-1.7.0.pub --signature resources.linux.snowflake_ml_python-1.7.0.tar.gz.sig

cosign verify-blob snowflake_ml_python-1.7.0.tar.gz --key snowflake-ml-python-1.7.0.pub --signature resources.linux.snowflake_ml_python-1.7.0
```

NOTE: Version 1.7.0 is used as an example here. Please choose the latest version.
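Since the version number appears in every file name, a small helper can assemble the first verification command for whichever release was downloaded. This is a sketch only: `cosign_verify_command` is a hypothetical helper, and the file-name patterns are assumed from the 1.7.0 example above and may differ for other releases.

```python
import shlex
from typing import List


def cosign_verify_command(version: str) -> List[str]:
    """Build the cosign verify-blob argv for one release (file names assumed)."""
    artifact = f"snowflake_ml_python-{version}.tar.gz"
    return [
        "cosign",
        "verify-blob",
        artifact,
        "--key",
        f"snowflake-ml-python-{version}.pub",  # assumed key file name
        "--signature",
        f"resources.linux.{artifact}.sig",  # assumed signature file name
    ]


# Print the command so it can be reviewed before it is run in a shell.
print(shlex.join(cosign_verify_command("1.7.0")))
```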
1 change: 1 addition & 0 deletions bazel/environments/conda-env-snowflake.yml
@@ -36,6 +36,7 @@ dependencies:
- protobuf==3.20.3
- psutil==5.9.0
- pyarrow==10.0.1
- pyjwt==2.8.0
- pytest-rerunfailures==12.0
- pytest-xdist==3.5.0
- pytest==7.4.0
1 change: 1 addition & 0 deletions bazel/environments/conda-env.yml
@@ -36,6 +36,7 @@ dependencies:
- protobuf==3.20.3
- psutil==5.9.0
- pyarrow==10.0.1
- pyjwt==2.8.0
- pytest-rerunfailures==12.0
- pytest-xdist==3.5.0
- pytest==7.4.0
1 change: 1 addition & 0 deletions bazel/environments/conda-gpu-env.yml
@@ -37,6 +37,7 @@ dependencies:
- protobuf==3.20.3
- psutil==5.9.0
- pyarrow==10.0.1
- pyjwt==2.8.0
- pytest-rerunfailures==12.0
- pytest-xdist==3.5.0
- pytest==7.4.0
3 changes: 2 additions & 1 deletion ci/conda_recipe/meta.yaml
@@ -17,7 +17,7 @@ build:
noarch: python
package:
name: snowflake-ml-python
version: 1.7.1
version: 1.7.2
requirements:
build:
- python
@@ -35,6 +35,7 @@ requirements:
- packaging>=20.9,<25
- pandas>=1.0.0,<3
- pyarrow
- pyjwt>=2.0.0, <3
- pytimeparse>=1.1.8,<2
- pyyaml>=6.0,<7
- requests
1 change: 0 additions & 1 deletion ci/targets/quarantine/prod3.txt
@@ -5,4 +5,3 @@
//tests/integ/snowflake/ml/modeling/preprocessing:k_bins_discretizer_test
//tests/integ/snowflake/ml/modeling/linear_model:logistic_regression_test
//tests/integ/snowflake/ml/registry/model:registry_mlflow_model_test
//tests/integ/snowflake/ml/registry/services/...
3 changes: 2 additions & 1 deletion docs/README.md
@@ -32,4 +32,5 @@ The following files are in the `docs/source` directory:
- `index.rst`: ReStructuredText (RST) file that will be built as the index page.
It mainly serves as a landing point and indicates the subpackages to include in the API reference.
Currently these include the Modeling and FileSet/FileSystem APIs.
- `fileset.rst`, `modeling.rst`, `registry.rst`: RST files that direct Sphinx to include the specific classes in each submodule.
- RST files that direct Sphinx to include the specific classes in each submodule.
- `fileset.rst`, `modeling.rst`, `monitoring.rst`, `registry.rst`
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -32,4 +32,5 @@ Table of Contents
fileset
model
modeling
monitoring
registry
31 changes: 31 additions & 0 deletions docs/source/monitoring.rst
@@ -0,0 +1,31 @@
===========================
snowflake.ml.monitoring
===========================

.. automodule:: snowflake.ml.monitoring
    :noindex:

snowflake.ml.monitoring.model_monitor
-------------------------------------

.. currentmodule:: snowflake.ml.monitoring.model_monitor

.. rubric:: Classes

.. autosummary::
    :toctree: api/monitoring

    ModelMonitor

snowflake.ml.monitoring.entities
-------------------------------------

.. currentmodule:: snowflake.ml.monitoring.entities

.. rubric:: Classes

.. autosummary::
    :toctree: api/monitoring

    model_monitor_config.ModelMonitorConfig
    model_monitor_config.ModelMonitorSourceConfig
1 change: 1 addition & 0 deletions requirements.txt
@@ -32,6 +32,7 @@ peft==0.5.0
protobuf==3.20.3
psutil==5.9.0
pyarrow==10.0.1
pyjwt==2.8.0
pytest-rerunfailures==12.0
pytest-xdist==3.5.0
pytest==7.4.0
3 changes: 3 additions & 0 deletions requirements.yml
@@ -174,6 +174,9 @@
- name: pyarrow
dev_version: 10.0.1
version_requirements: ''
- name: pyjwt
dev_version: 2.8.0
version_requirements: '>=2.0.0, <3'
- name: pytest
dev_version: 7.4.0
tags:
5 changes: 5 additions & 0 deletions snowflake/ml/_internal/utils/BUILD.bazel
@@ -249,3 +249,8 @@ py_test(
"//snowflake/ml/test_utils:mock_session",
],
)

py_library(
name = "jwt_generator",
srcs = ["jwt_generator.py"],
)
141 changes: 141 additions & 0 deletions snowflake/ml/_internal/utils/jwt_generator.py
@@ -0,0 +1,141 @@
import base64
import hashlib
import logging
from datetime import datetime, timedelta, timezone
from typing import Optional

import jwt
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import types

logger = logging.getLogger(__name__)

ISSUER = "iss"
EXPIRE_TIME = "exp"
ISSUE_TIME = "iat"
SUBJECT = "sub"


class JWTGenerator:
    """
    Creates and signs a JWT with the specified private key file, username, and account identifier. The JWTGenerator
    keeps the generated token and only regenerates the token if a specified period of time has passed.
    """

    _DEFAULT_LIFETIME = timedelta(minutes=59)  # The tokens will have a 59-minute lifetime
    _DEFAULT_RENEWAL_DELTA = timedelta(minutes=54)  # Tokens will be renewed after 54 minutes
    ALGORITHM = "RS256"  # Tokens will be generated using RSA with SHA256

    def __init__(
        self,
        account: str,
        user: str,
        private_key: types.PRIVATE_KEY_TYPES,
        lifetime: Optional[timedelta] = None,
        renewal_delay: Optional[timedelta] = None,
    ) -> None:
        """
        Create a new JWTGenerator object.
        Args:
            account: The account identifier.
            user: The username.
            private_key: The private key used to sign the JWT.
            lifetime: The lifetime of the token.
            renewal_delay: The time before the token expires to renew it.
        """

        # Construct the fully qualified name of the user in uppercase.
        self.account = JWTGenerator._prepare_account_name_for_jwt(account)
        self.user = user.upper()
        self.qualified_username = self.account + "." + self.user
        self.private_key = private_key
        self.public_key_fp = JWTGenerator._calculate_public_key_fingerprint(self.private_key)

        self.issuer = self.qualified_username + "." + self.public_key_fp
        self.lifetime = lifetime or JWTGenerator._DEFAULT_LIFETIME
        self.renewal_delay = renewal_delay or JWTGenerator._DEFAULT_RENEWAL_DELTA
        self.renew_time = datetime.now(timezone.utc)
        self.token: Optional[str] = None

        logger.info(
            """Creating JWTGenerator with arguments
            account : %s, user : %s, lifetime : %s, renewal_delay : %s""",
            self.account,
            self.user,
            self.lifetime,
            self.renewal_delay,
        )

    @staticmethod
    def _prepare_account_name_for_jwt(raw_account: str) -> str:
        account = raw_account
        if ".global" not in account:
            # Handle the general case.
            idx = account.find(".")
            if idx > 0:
                account = account[0:idx]
        else:
            # Handle the replication case.
            idx = account.find("-")
            if idx > 0:
                account = account[0:idx]
        # Use uppercase for the account identifier.
        return account.upper()

    def get_token(self) -> str:
        now = datetime.now(timezone.utc)  # Fetch the current time
        if self.token is not None and self.renew_time > now:
            return self.token

        # If the token has expired or doesn't exist, regenerate the token.
        logger.info(
            "Generating a new token because the present time (%s) is later than the renewal time (%s)",
            now,
            self.renew_time,
        )
        # Calculate the next time we need to renew the token.
        self.renew_time = now + self.renewal_delay

        # Create our payload
        payload = {
            # Set the issuer to the fully qualified username concatenated with the public key fingerprint.
            ISSUER: self.issuer,
            # Set the subject to the fully qualified username.
            SUBJECT: self.qualified_username,
            # Set the issue time to now.
            ISSUE_TIME: now,
            # Set the expiration time, based on the lifetime specified for this object.
            EXPIRE_TIME: now + self.lifetime,
        }

        # Regenerate the actual token
        token = jwt.encode(payload, key=self.private_key, algorithm=JWTGenerator.ALGORITHM)
        # If you are using a version of PyJWT prior to 2.0, jwt.encode returns a byte string instead of a string.
        # If the token is a byte string, convert it to a string.
        if isinstance(token, bytes):
            token = token.decode("utf-8")
        self.token = token
        logger.info(
            "Generated a JWT with the following payload: %s",
            jwt.decode(self.token, key=self.private_key.public_key(), algorithms=[JWTGenerator.ALGORITHM]),
        )

        return token

    @staticmethod
    def _calculate_public_key_fingerprint(private_key: types.PRIVATE_KEY_TYPES) -> str:
        # Get the raw bytes of public key.
        public_key_raw = private_key.public_key().public_bytes(
            serialization.Encoding.DER, serialization.PublicFormat.SubjectPublicKeyInfo
        )

        # Get the sha256 hash of the raw bytes.
        sha256hash = hashlib.sha256()
        sha256hash.update(public_key_raw)

        # Base64-encode the value and prepend the prefix 'SHA256:'.
        public_key_fp = "SHA256:" + base64.b64encode(sha256hash.digest()).decode("utf-8")
        logger.info("Public key fingerprint is %s", public_key_fp)

        return public_key_fp
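
The way the `iss` and `sub` claims are assembled in the file above can be traced with a standard-library-only sketch. The function names, the sample account string, and the placeholder key bytes below are hypothetical and chosen for illustration; a real fingerprint is computed over the DER-encoded public key, and signing still requires PyJWT and `cryptography` as in the file itself.

```python
import base64
import hashlib
from typing import Dict


def prepare_account_name(raw_account: str) -> str:
    """Mirror the account normalization from the file above."""
    if ".global" not in raw_account:
        idx = raw_account.find(".")  # general case: drop region and cloud suffix
    else:
        idx = raw_account.find("-")  # replication case: drop everything after "-"
    account = raw_account[:idx] if idx > 0 else raw_account
    return account.upper()


def fingerprint(public_key_der: bytes) -> str:
    """SHA-256 digest of the key bytes, Base64-encoded, with a 'SHA256:' prefix."""
    digest = hashlib.sha256(public_key_der).digest()
    return "SHA256:" + base64.b64encode(digest).decode("utf-8")


def build_identity_claims(account: str, user: str, public_key_der: bytes) -> Dict[str, str]:
    """Return the iss and sub claims without signing anything."""
    qualified_username = prepare_account_name(account) + "." + user.upper()
    return {
        "iss": qualified_username + "." + fingerprint(public_key_der),
        "sub": qualified_username,
    }


# Placeholder bytes, NOT a real DER-encoded key; used only to show the shape.
claims = build_identity_claims("myorg-myacct.us-east-1.aws", "alice", b"placeholder-key-bytes")
print(claims["sub"])  # MYORG-MYACCT.ALICE
print(claims["iss"])
```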