Project import generated by Copybara. (#38)
GitOrigin-RevId: 58e65003b64918af74ece769567892c98a3f9fbd

Co-authored-by: Snowflake Authors <[email protected]>
snowflake-provisioner and Snowflake Authors authored Aug 31, 2023
1 parent f3a83fb commit 192f794
Showing 159 changed files with 12,013 additions and 3,282 deletions.
39 changes: 37 additions & 2 deletions CHANGELOG.md
@@ -1,6 +1,41 @@
# Release History

-## 1.0.5
+## 1.0.6

### New Features
- Model Registry: Added `create_if_not_exists` parameter to the constructor.
- Model Registry: Added `get_or_create_model_registry` API.
- Model Registry: Added support for using GPU inference when deploying XGBoost (`xgboost.XGBModel` and `xgboost.Booster`), PyTorch (`torch.nn.Module` and `torch.jit.ScriptModule`) and TensorFlow (`tensorflow.Module` and `tensorflow.keras.Model`) models to Snowpark Container Services.
- Model Registry: When inferring a model signature, a `Sequence` of built-in types, `numpy.ndarray`, `torch.Tensor`, or `tensorflow.Tensor` can now be used instead of only a `List` of them.
- Model Registry: Added `get_training_dataset` API.
- Model Development: The size of the metrics result can now exceed the previous 8 MB limit.
- Model Registry: Added support for saving/loading/deploying HuggingFace pipeline objects (`transformers.Pipeline`) and our wrapper (`snowflake.ml.model.models.huggingface_pipeline.HuggingFacePipelineModel`) for them. Use the wrapper to specify configurations; the model for the pipeline will then be loaded dynamically at deployment time. Currently, the following tasks can be logged without manually specifying model signatures (see the sketch after this task list):
- "conversational"
- "fill-mask"
- "question-answering"
- "summarization"
- "table-question-answering"
- "text2text-generation"
- "text-classification" (alias "sentiment-analysis" available)
- "text-generation"
- "token-classification" (alias "ner" available)
- "translation"
- "translation_xx_to_yy"
- "zero-shot-classification"

### Bug Fixes
- Model Development: Fixed a bug when using simple imputer with numpy >= 1.25.
- Model Development: Fixed a bug when inferring the type of label columns.

### Behavior Changes
- Model Registry: `log_model()` now returns a `ModelReference` object instead of a model ID.
- Model Registry: When deploying a model with only one target method, the `target_method` argument can be omitted.
- Model Registry: When using snowflake-ml-python with a version newer than what is available in the Snowflake Anaconda Channel, the `embed_local_ml_library` option is set to `True` automatically if not specified.
- Model Registry: When deploying a model to Snowpark Container Services with GPU, the default value of `num_workers` is now 1.
- Model Registry: `keep_order` and `output_with_input_features` have been removed from the deploy options. The behavior is now controlled by the type of the input passed to `model.predict()`: a `pandas.DataFrame` input behaves like the previous `keep_order=True` and `output_with_input_features=False`, while a `snowpark.DataFrame` input behaves like the previous `keep_order=False` and `output_with_input_features=True` (see the sketch below).
- Model Registry: When logging and deploying PyTorch (`torch.nn.Module` and `torch.jit.ScriptModule`) and TensorFlow (`tensorflow.Module` and `tensorflow.keras.Model`) models, we no longer accept models whose input and output are lists of tensors. Instead, we now accept models that take one or more tensors as positional arguments and return a tensor or a tuple of tensors. The input and output DataFrames used for prediction remain the same as before: every column is an array feature containing a tensor.
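A minimal sketch of the resulting flow (deployment and prediction parameter names are assumptions for illustration; `registry`, `sklearn_model`, and the test DataFrames are assumed to exist):

```python
# Hypothetical sketch: log_model() now returns a ModelReference, and predict()
# behavior follows the type of the input DataFrame.
model_ref = registry.log_model(model_name="my_model", model_version="2", model=sklearn_model)

# target_method can be omitted because the model exposes a single target method.
model_ref.deploy(deployment_name="my_model_predict", permanent=False)

# pandas.DataFrame input: row order preserved, output columns only
# (the former keep_order=True, output_with_input_features=False behavior).
pandas_predictions = model_ref.predict("my_model_predict", pandas_test_df)

# snowpark.DataFrame input: input features returned alongside the outputs,
# row order not guaranteed (the former keep_order=False, output_with_input_features=True behavior).
snowpark_predictions = model_ref.predict("my_model_predict", snowpark_test_df)
```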

## 1.0.5 (2023-08-17)

### New Features

@@ -13,7 +48,7 @@
- Model Registry: Fixed an issue where the UDF name created when deploying a model was not identical to the provided name and could not be correctly dropped when the deployment was dropped.
- connection_params.SnowflakeLoginOptions(): Added support for `private_key_path`.

-## 1.0.4
+## 1.0.4 (2023-07-28)

### New Features

2 changes: 1 addition & 1 deletion bazel/environments/conda-env-build.yml
@@ -14,5 +14,5 @@ dependencies:
- numpy==1.24.3
- packaging==23.0
- pyyaml==6.0
-  - scikit-learn==1.2.2
+  - scikit-learn==1.3.0
- xgboost==1.7.3
5 changes: 4 additions & 1 deletion bazel/environments/conda-env-snowflake.yml
@@ -9,6 +9,7 @@ dependencies:
- aiohttp==3.8.3
- anyio==3.5.0
- boto3==1.24.28
- cachetools==4.2.2
- cloudpickle==2.0.0
- conda-libmamba-solver==23.3.0
- coverage==6.3.2
@@ -23,6 +24,7 @@ dependencies:
- lightgbm==3.3.5
- mlflow==2.3.1
- moto==4.0.11
- multipledispatch==0.6.0
- mypy==0.981
- networkx==2.8.4
- numpy==1.24.3
@@ -36,13 +38,14 @@ dependencies:
- requests==2.29.0
- ruamel.yaml==0.17.21
- s3fs==2022.11.0
-  - scikit-learn==1.2.2
+  - scikit-learn==1.3.0
- scipy==1.9.3
- snowflake-connector-python==3.0.3
- snowflake-snowpark-python==1.5.1
- sqlparse==0.4.3
- tensorflow==2.10.0
- transformers==4.29.2
- types-protobuf==4.23.0.1
- types-requests==2.30.0.0
- typing-extensions==4.5.0
- xgboost==1.7.3
6 changes: 5 additions & 1 deletion bazel/environments/conda-env.yml
@@ -9,9 +9,11 @@ dependencies:
- aiohttp==3.8.3
- anyio==3.5.0
- boto3==1.24.28
- cachetools==4.2.2
- cloudpickle==2.0.0
- conda-forge::starlette==0.27.0
- conda-forge::types-PyYAML==6.0.12
- conda-forge::types-cachetools==4.2.2
- conda-libmamba-solver==23.3.0
- coverage==6.3.2
- cryptography==39.0.1
@@ -25,6 +27,7 @@ dependencies:
- lightgbm==3.3.5
- mlflow==2.3.1
- moto==4.0.11
- multipledispatch==0.6.0
- mypy==0.981
- networkx==2.8.4
- numpy==1.24.3
@@ -39,13 +42,14 @@ dependencies:
- requests==2.29.0
- ruamel.yaml==0.17.21
- s3fs==2022.11.0
-  - scikit-learn==1.2.2
+  - scikit-learn==1.3.0
- scipy==1.9.3
- snowflake-connector-python==3.0.3
- snowflake-snowpark-python==1.5.1
- sqlparse==0.4.3
- tensorflow==2.10.0
- transformers==4.29.2
- types-protobuf==4.23.0.1
- types-requests==2.30.0.0
- typing-extensions==4.5.0
- xgboost==1.7.3
7 changes: 4 additions & 3 deletions ci/conda_recipe/meta.yaml
@@ -17,7 +17,7 @@ build:
noarch: python
package:
name: snowflake-ml-python
-  version: 1.0.5
+  version: 1.0.6
requirements:
build:
- python
@@ -34,7 +34,7 @@ requirements:
- python
- pyyaml>=6.0,<7
- requests
-    - scikit-learn>=1.2.1,<1.3
+    - scikit-learn>=1.2.1,<1.4
- scipy>=1.9,<2
- snowflake-connector-python>=3.0.3,<4
- snowflake-snowpark-python>=1.5.1,<2
@@ -43,8 +43,9 @@ requirements:
- xgboost>=1.7.3,<2
run_constrained:
- lightgbm==3.3.5
-    - mlflow>=2.1.0,<3
+    - mlflow>=2.1.0,<2.4
- tensorflow>=2.9,<3
- torchdata>=0.4,<1
- transformers>=4.29.2,<5
source:
path: ../../
26 changes: 19 additions & 7 deletions codegen/sklearn_wrapper_template.py_template
@@ -25,6 +25,10 @@ from snowflake.snowpark import DataFrame, Session
from snowflake.snowpark.functions import pandas_udf, sproc
from snowflake.snowpark.types import PandasSeries
from snowflake.snowpark._internal.type_utils import convert_sp_to_sf_type
from snowflake.snowpark._internal.utils import (
TempObjectType,
random_name_for_temp_object,
)

from snowflake.ml.model.model_signature import (
DataType,
@@ -244,7 +248,7 @@ class {transform.original_class_name}(BaseTransformer):
cp.dump(self._sklearn_object, local_transform_file)

# Create temp stage to run fit.
-transform_stage_name = "SNOWML_TRANSFORM_{{safe_id}}".format(safe_id=self._get_rand_id())
+transform_stage_name = random_name_for_temp_object(TempObjectType.STAGE)
stage_creation_query = f"CREATE OR REPLACE TEMPORARY STAGE {{transform_stage_name}};"
SqlResultValidator(
session=session,
@@ -258,7 +262,7 @@ class {transform.original_class_name}(BaseTransformer):
stage_result_file_name = posixpath.join(transform_stage_name, os.path.basename(local_transform_file_name))
local_result_file_name = get_temp_file_path()

-fit_sproc_name = "SNOWML_FIT_{{safe_id}}".format(safe_id=self._get_rand_id())
+fit_sproc_name = random_name_for_temp_object(TempObjectType.PROCEDURE)
statement_params = telemetry.get_function_usage_statement_params(
project=_PROJECT,
subproject=_SUBPROJECT,
@@ -439,8 +443,7 @@ class {transform.original_class_name}(BaseTransformer):
pkg_versions=self._get_dependencies(), session=session, subproject=_SUBPROJECT)

# Register vectorized UDF for batch inference
-batch_inference_udf_name = "SNOWML_BATCH_INFERENCE_{{safe_id}}_{{method}}".format(
-    safe_id=self._get_rand_id(), method=inference_method)
+batch_inference_udf_name = random_name_for_temp_object(TempObjectType.FUNCTION)

# Need to do this since if we use self._sklearn_object directly in the UDF, Snowpark
# will try to pickle all of self which fails.
@@ -701,8 +704,17 @@ class {transform.original_class_name}(BaseTransformer):
expected_type_inferred = "{transform.udf_datatype}"
# when it is classifier, infer the datatype from label columns
if expected_type_inferred == "" and 'predict' in self.model_signatures:
# Batch inference takes a single expected output column type. Use the first columns type for now.
# TODO: Handle varying output column types.
label_cols_signatures = [row for row in self.model_signatures['predict'].outputs if row.name in self.output_cols]
if len(label_cols_signatures) == 0:
error_str = f"Output columns {{self.output_cols}} do not match model signatures {{self.model_signatures['predict'].outputs}}."
raise exceptions.SnowflakeMLException(
error_code=error_codes.INVALID_ATTRIBUTE,
original_exception=ValueError(error_str),
)
expected_type_inferred = convert_sp_to_sf_type(
-    self.model_signatures['predict'].outputs[0].as_snowpark_type()
+    label_cols_signatures[0].as_snowpark_type()
)

output_df = self._batch_inference(
@@ -955,7 +967,7 @@ class {transform.original_class_name}(BaseTransformer):
cp.dump(self._sklearn_object, local_score_file)

# Create temp stage to run score.
-score_stage_name = "SNOWML_SCORE_{{safe_id}}".format(safe_id=self._get_rand_id())
+score_stage_name = random_name_for_temp_object(TempObjectType.STAGE)
session = dataset._session
assert session is not None # keep mypy happy
stage_creation_query = f"CREATE OR REPLACE TEMPORARY STAGE {{score_stage_name}};"
@@ -968,7 +980,7 @@

# Use posixpath to construct stage paths
stage_score_file_name = posixpath.join(score_stage_name, os.path.basename(local_score_file_name))
-score_sproc_name = "SNOWML_SCORE_{{safe_id}}".format(safe_id=self._get_rand_id())
+score_sproc_name = random_name_for_temp_object(TempObjectType.PROCEDURE)
statement_params = telemetry.get_function_usage_statement_params(
project=_PROJECT,
subproject=_SUBPROJECT,
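The template changes above replace the hand-built "SNOWML_*_{{safe_id}}" names with Snowpark's internal temp-object naming helper. A small sketch of that pattern (the exact format of the generated names is an assumption):

```python
# Sketch: generate collision-safe names for temporary Snowflake objects with
# Snowpark's internal helper instead of hand-rolled format strings.
from snowflake.snowpark._internal.utils import TempObjectType, random_name_for_temp_object

stage_name = random_name_for_temp_object(TempObjectType.STAGE)      # e.g. "SNOWPARK_TEMP_STAGE_<random>" (format assumed)
sproc_name = random_name_for_temp_object(TempObjectType.PROCEDURE)
udf_name = random_name_for_temp_object(TempObjectType.FUNCTION)

# These names feed the generated "CREATE OR REPLACE TEMPORARY STAGE ..." statement
# and the sproc/UDF registrations shown in the diff above.
```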
21 changes: 18 additions & 3 deletions requirements.yml
@@ -68,6 +68,7 @@
version_requirements: ">=0.15,<2"
tags:
- build_essential
- deployment_core
# For fsspec[http] in conda
- name_conda: aiohttp
dev_version_conda: "3.8.3"
@@ -123,7 +124,7 @@
- build_essential
- name: mlflow
dev_version: "2.3.1"
version_requirements: ">=2.1.0,<3"
version_requirements: ">=2.1.0,<2.4"
requirements_extra_tags:
- mlflow
- name: moto
@@ -176,8 +177,8 @@
- name: s3fs
dev_version: "2022.11.0"
- name: scikit-learn
dev_version: "1.2.2"
version_requirements: ">=1.2.1,<1.3"
dev_version: "1.3.0"
version_requirements: ">=1.2.1,<1.4"
tags:
- build_essential
- name: scipy
@@ -211,6 +212,11 @@
- torch
- name: transformers
dev_version: "4.29.2"
version_requirements: ">=4.29.2,<5"
requirements_extra_tags:
- transformers
- name: types-requests
dev_version: "2.30.0.0"
- name: types-protobuf
dev_version: "4.23.0.1"
- name: types-PyYAML
@@ -226,3 +232,12 @@
version_requirements: ">=1.7.3,<2"
tags:
- build_essential
- name: types-cachetools
dev_version: "4.2.2"
from_channel: conda-forge
- name: cachetools
dev_version: "4.2.2"
# TODO: this will be a user side dep requirement
# enable when we are releasing FS.
- name: multipledispatch
dev_version: "0.6.0"
