Releases: snowflakedb/snowflake-ml-python

1.1.2

18 Dec 19:43
35d2b4f

Bug Fixes

  • Generic: Fix an issue where the stack trace was unexpectedly hidden by telemetry.
  • Model Development: Execute model signature inference without materializing the full DataFrame in memory.
  • Model Registry: Fix an occasional 'snowflake-ml-python library does not exist' error when deploying to SPCS.

Behavior Changes

  • Model Registry: When calling predict with a Snowpark DataFrame, both inferred and normalized column names are
    accepted (see the sketch after this list).
  • Model Registry: When logging a Snowpark ML Modeling Model, any sample input data or manually provided signature
    is ignored, since it is not needed.
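
A minimal sketch of the loosened column-name matching, assuming an existing Snowpark `session` and a registered model reference `model_ref` whose signature was inferred from a lowercase feature name (all names here are illustrative, not part of the release):

```python
# Assumed: `session` is an existing snowflake.snowpark.Session and `model_ref`
# is a registered model whose inferred signature has a feature "feature_a".
df_inferred = session.create_dataframe([[1.0]], schema=['"feature_a"'])  # inferred, case-sensitive name
df_normalized = session.create_dataframe([[1.0]], schema=["FEATURE_A"])  # normalized, uppercased name

# As of 1.1.2, predict() accepts either spelling:
model_ref.predict(df_inferred)
model_ref.predict(df_normalized)
```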

New Features

  • Model Development: Add a SQL implementation of the binary precision_score metric (sketched below).
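
A short sketch of the metric in use. The session and the `PREDICTIONS` table with binary `LABEL` and `PRED` columns are assumptions for illustration; the keyword-only df/y_true_col_names/y_pred_col_names style follows the snowflake.ml.modeling.metrics API:

```python
from snowflake.ml.modeling.metrics import precision_score

# Assumed: `session` is an existing snowflake.snowpark.Session.
df = session.table("PREDICTIONS")

# With this release, the binary precision is computed in SQL inside the
# warehouse instead of being calculated client-side.
score = precision_score(df=df, y_true_col_names="LABEL", y_pred_col_names="PRED")
print(score)
```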

1.1.1

06 Dec 05:36
7dd0738

Bug Fixes

  • Model Registry: The predict target method on registered models is now compatible with unsupervised estimators.
  • Model Development: Fix incorrect confusion_matrix results when the number of rows is not divisible by the batch
    size.

New Features

  • Introduced the passthrough_col param in the Modeling API. This new param is helpful in scenarios that require
    automatic input_cols inference but need to avoid using specific columns, such as index columns, during training
    or inference. A sketch follows this list.
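
A minimal sketch of the parameter on a Modeling API estimator. The estimator choice, the column names, and the plural spelling passthrough_cols used here are illustrative assumptions:

```python
from snowflake.ml.modeling.xgboost import XGBClassifier

# input_cols is left unset so it is inferred automatically; ROW_ID is kept
# out of the feature set but carried through to the output.
clf = XGBClassifier(
    label_cols=["LABEL"],
    output_cols=["PRED"],
    passthrough_cols=["ROW_ID"],
)
clf.fit(train_df)                   # train_df: an assumed Snowpark DataFrame
predictions = clf.predict(test_df)  # test_df: an assumed Snowpark DataFrame
```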

1.1.0

01 Dec 18:08
abe5b67

Bug Fixes

  • Model Registry: Fix pandas DataFrame input not handling the first row properly.
  • Model Development: OrdinalEncoder and LabelEncoder output_columns no longer need to be valid Snowflake
    identifiers. Previously, they would be excluded if the normalized name did not match the name specified in
    output_columns.

New Features

  • Model Registry: Add support for invoking a public endpoint on an SPCS service by providing the "enable_ingress"
    SPCS deployment option.
  • Model Development: Add support for distributed HPO - GridSearchCV and RandomizedSearchCV execution will be
    distributed on multi-node warehouses (see the sketch after this list).
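
A sketch of distributed HPO with the Modeling API's GridSearchCV. The estimator, parameter grid, and column names are illustrative assumptions:

```python
from snowflake.ml.modeling.model_selection import GridSearchCV
from snowflake.ml.modeling.xgboost import XGBClassifier

# On a multi-node warehouse, the candidate fits are distributed across nodes.
search = GridSearchCV(
    estimator=XGBClassifier(),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, 5]},
    input_cols=["F1", "F2"],
    label_cols=["LABEL"],
    output_cols=["PRED"],
)
search.fit(train_df)  # train_df: an assumed Snowpark DataFrame
```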

1.0.12

13 Nov 22:20
b938743

Bug Fixes

  • Model Registry: Fix a regression where container logs were not shown during model deployment to SPCS.
  • Model Development: Enhance the column capacity of OrdinalEncoder.
  • Model Registry: Fix an unbound `batch_size` error when deploying a model other than a Hugging Face Pipeline or
    an LLM with GPU on SPCS.

Behavior Changes

  • Model Registry: Raise an early error when deploying to SPCS with a database or schema whose name starts with an
    underscore.
  • Model Registry: The conda-forge channel is now automatically added to the channel list when deploying to SPCS.
  • Model Registry: relax_version no longer strips all version specifiers; instead, it relaxes an ==x.y.z specifier
    to >=x.y,<(x+1). An illustrative sketch of this rule follows the list.
  • Model Registry: A Python runtime with a different patch level but the same major and minor version no longer
    triggers a warning when loading a model via the Model Registry, and is considered usable when deploying to SPCS.
  • Model Registry: When logging a snowflake.ml.model.models.huggingface_pipeline.HuggingFacePipelineModel object,
    the versions of locally installed libraries are no longer picked up as model dependencies; instead, a set of
    pre-defined dependencies is used to improve the user experience.
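
An illustrative restatement of the relax_version rule using the packaging library (this is not the library's internal code, just the ==x.y.z to >=x.y,<(x+1) mapping described above):

```python
from packaging.requirements import Requirement

def relax(requirement: str) -> str:
    """Illustrative only: relax an ==x.y.z pin to >=x.y,<(x+1)."""
    req = Requirement(requirement)
    spec = next(iter(req.specifier))
    assert spec.operator == "=="
    major, minor = spec.version.split(".")[:2]
    return f"{req.name}>={major}.{minor},<{int(major) + 1}"

print(relax("xgboost==1.7.3"))  # xgboost>=1.7,<2
```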

New Features

  • Model Registry: Enable best-effort SPCS job/service log streaming when the logging level is set to INFO.

1.0.11

27 Oct 15:31
73c2cf0

New Features

  • Model Registry: Add log_artifact() public method.
  • Model Development: Add support for kneighbors.

Behavior Changes

  • Model Registry: Change log_model() argument from TrainingDataset to List of Artifact.
  • Model Registry: Change get_training_dataset() to get_artifact().

Bug Fixes

  • Model Development: Fix support for XGBoost and LightGBM models using SKLearn Grid Search and Randomized Search model selectors.
  • Model Development: DecimalType is now supported as a DataType.
  • Model Development: Fix metrics compatibility with Snowpark DataFrames that use Snowflake identifiers.

1.0.10

14 Oct 01:44
9130a0b

Behavior Changes

  • Model Development: precision_score, recall_score, f1_score, fbeta_score, precision_recall_fscore_support,
    mean_absolute_error, mean_squared_error, and mean_absolute_percentage_error metric calculations are now distributed.
  • Model Registry: deploy now returns a Deployment object containing deployment information.

New Features

  • Model Registry: When the model signature is auto-inferred, it will be printed to the log for reference.
  • Model Registry: For SPCS deployments, Deployment details will contain image_name, service_spec, and service_function_sql.

Bug Fixes

  • Model Development: Fix an issue leading to UTF-8 decoding errors when using modeling modules on Windows.
  • Model Development: Fix an issue where alias definitions caused SnowparkSQLUnexpectedAliasException during
    inference.
  • Model Registry: Fix an issue where signature inference could be incorrect when using a Snowpark DataFrame as
    sample input.
  • Model Registry: Fix overly strict data type validation when predicting. For example, if the signature specifies
    an INT8 feature and an INT64 dataframe is provided, prediction no longer fails as long as all values are within
    range.

1.0.9

29 Sep 00:39

Behavior Changes

  • Model Development: log_loss metric calculation is now distributed.

Bug Fixes

  • Model Registry: Fix an issue where building images failed with specific Docker setups.
  • Model Registry: Fix an issue where the local ML library could not be embedded when it is imported by zipimport.
  • Model Registry: Fix out-of-date documentation about the platform argument in the deploy function.
  • Model Registry: Fix an issue where a GPU-trained PyTorch model could not be deployed to a platform where no GPU
    is available.

1.0.8

15 Sep 15:58
76d191e

Bug Fixes

  • Model Development: OrdinalEncoder can be used with mixed input column types.
  • Model Registry: Fix an issue where an incorrect Docker executable was used when building images.
  • Model Registry: Fix an issue where specifying the token argument when using
    snowflake.ml.model.models.huggingface_pipeline.HuggingFacePipelineModel with transformers < 4.32.0 had no
    effect.
  • Model Registry: Fix an issue where an incorrect system function call was used when deploying to SPCS.
  • Model Registry: Fix an issue when using a transformers.pipeline that does not have a tokenizer.
  • Model Registry: Fix incorrectly-inferred image repository name during model deployment to SPCS.
  • Model Registry: Fix GPU resource retention issue caused by failed or stuck previous deployments in SPCS.

1.0.7

05 Sep 22:14
6f23e59

Bug Fixes

  • Model Development & Model Registry: Fix an error related to pandas.io.json.json_normalize.

1.0.6

01 Sep 19:43
f0326eb

New Features

  • Model Registry: Add the create_if_not_exists parameter to the constructor.
  • Model Registry: Added get_or_create_model_registry API.
  • Model Registry: Added support for using GPU inference when deploying XGBoost (xgboost.XGBModel and xgboost.Booster), PyTorch (torch.nn.Module and torch.jit.ScriptModule) and TensorFlow (tensorflow.Module and tensorflow.keras.Model) models to Snowpark Container Services.
  • Model Registry: When inferring a model signature, a Sequence of built-in types, Sequence of numpy.ndarray, Sequence of torch.Tensor, and Sequence of tensorflow.Tensor can be used instead of only a List of them.
  • Model Registry: Added get_training_dataset API.
  • Model Development: The size of metrics results can now exceed the previous 8 MB limit.
  • Model Registry: Added support for saving, loading, and deploying HuggingFace pipeline objects (transformers.Pipeline) and our wrapper (snowflake.ml.model.models.huggingface_pipeline.HuggingFacePipelineModel) for them. Use the wrapper to specify configurations; the model for the pipeline is then loaded dynamically at deployment time. Currently, the following tasks can be logged without manually specifying model signatures (a short sketch follows the task list):
    • "conversational"
    • "fill-mask"
    • "question-answering"
    • "summarization"
    • "table-question-answering"
    • "text2text-generation"
    • "text-classification" (alias "sentiment-analysis" available)
    • "text-generation"
    • "token-classification" (alias "ner" available)
    • "translation"
    • "translation_xx_to_yy"
    • "zero-shot-classification"

Bug Fixes

  • Model Development: Fixed a bug when using simple imputer with numpy >= 1.25.
  • Model Development: Fixed a bug when inferring the type of label columns.

Behavior Changes

  • Model Registry: log_model() now returns a ModelReference object instead of a model ID.
  • Model Registry: When deploying a model with only one target method, the target_method argument can be omitted.
  • Model Registry: When using a version of snowflake-ml-python newer than what is available in the Snowflake
    Anaconda Channel, the embed_local_ml_library option is automatically set to True if not specified.
  • Model Registry: When deploying a model to Snowpark Container Services and using GPU, the default value of
    num_workers will be 1.
  • Model Registry: keep_order and output_with_input_features in the deploy options have been removed. The behavior
    is now controlled by the type of the input when calling model.predict(): a pandas.DataFrame input behaves as
    keep_order=True and output_with_input_features=False did before, while a snowpark.DataFrame input behaves as
    keep_order=False and output_with_input_features=True did before.
  • Model Registry: When logging and deploying PyTorch (torch.nn.Module and torch.jit.ScriptModule) and TensorFlow
    (tensorflow.Module and tensorflow.keras.Model) models, models whose input is a list of tensors and whose output
    is a list of tensors are no longer accepted. Instead, a model must take one or more tensors as positional
    arguments and return a tensor or a tuple of tensors. The input and output dataframes at prediction time remain
    the same as before: every column is an array feature containing a tensor. A minimal sketch follows.
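
A minimal PyTorch sketch of the accepted model shape after this change:

```python
import torch

# Accepted: one or more tensors as positional arguments in, a tensor
# (or a tuple of tensors) out. List-in/list-out modules are no longer accepted.
class TwoInputModel(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(4, 1)

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return self.linear(x + y)

model = TwoInputModel()
out = model(torch.randn(2, 4), torch.randn(2, 4))  # single tensor output
```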