Releases: snowflakedb/snowflake-ml-python
1.7.2
Bug Fixes
- Model Explainability: Fix an issue where explainability was enabled for scikit-learn pipelines whose task is UNKNOWN, causing failures when invoked.
Behavior Changes
New Features
- Registry: Support asynchronous model inference service creation with the block option in ModelVersion.create_service() set to True by default.
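  A minimal sketch of the asynchronous path, assuming a model is already logged; the model, compute pool, image repository, and service names below are placeholders:

  from snowflake.ml.registry import Registry

  reg = Registry(session=session)  # `session` is an existing Snowpark session
  mv = reg.get_model("MY_MODEL").version("V1")

  # block=True (the default) waits until the service is ready;
  # block=False returns immediately and builds the service asynchronously.
  # Other create_service arguments (e.g. the image build compute pool) are omitted here.
  mv.create_service(
      service_name="MY_INFERENCE_SERVICE",
      service_compute_pool="MY_COMPUTE_POOL",
      image_repo="MY_IMAGE_REPO",
      block=False,
  )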
1.7.1
Bug Fixes
- Registry: Null values are now allowed in the dataframe used for model signature inference. Null values will be ignored and the remaining values will be used to infer the signature.
- Registry: Pandas extension dtypes (pandas.StringDType(), pandas.BooleanDType(), etc.) are now supported in model signature inference; see the sketch below.
- Registry: Null values are now allowed in the dataframe used to predict.
- Data: Fix missing snowflake.ml.data.* module exports in wheel.
- Dataset: Fix missing snowflake.ml.dataset.* module exports in wheel.
- Registry: Fix an issue where tf_keras.Model is not recognized as a Keras model when logging.
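  For illustration, a minimal sketch of signature inference over a frame mixing null values and pandas extension dtypes; the model object and names are placeholders:

  import pandas as pd
  from snowflake.ml.registry import Registry

  # Null entries are skipped during inference; the remaining values determine the
  # inferred types. Extension dtypes such as pd.StringDtype() and pd.BooleanDtype()
  # are supported as well.
  sample = pd.DataFrame({
      "text": pd.Series(["a", None, "c"], dtype=pd.StringDtype()),
      "flag": pd.Series([True, None, False], dtype=pd.BooleanDtype()),
  })

  reg = Registry(session=session)  # `session` is an existing Snowpark session
  reg.log_model(model, model_name="MY_MODEL", version_name="V1",
                sample_input_data=sample)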
Behavior Changes
New Features
- Registry: Option to enable_monitoring set to False by default. This will gate access to preview features of Model Monitoring.
- Model Monitoring: show_model_monitors Registry method. This feature is still in Private Preview.
- Registry: Support pd.Series in input and output data.
- Model Monitoring: add_monitor Registry method. This feature is still in Private Preview.
- Model Monitoring: resume and suspend ModelMonitor. This feature is still in Private Preview.
- Model Monitoring: get_monitor Registry method. This feature is still in Private Preview.
- Model Monitoring: delete_monitor Registry method. This feature is still in Private Preview.
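  A rough sketch of the monitor lifecycle using the Registry methods named above. These features are in Private Preview, and the argument names shown here are assumptions rather than documented signatures:

  from snowflake.ml.registry import Registry

  reg = Registry(session=session)

  reg.show_model_monitors()                      # list existing model monitors
  monitor = reg.get_monitor(name="MY_MONITOR")   # `name=` is an assumed keyword
  monitor.suspend()                              # pause the ModelMonitor
  monitor.resume()                               # resume the ModelMonitor
  reg.delete_monitor(name="MY_MONITOR")          # remove the monitor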
1.7.0
Behavior Change
- Generic: Require python >= 3.9.
- Data Connector: Update to_torch_dataset and to_torch_datapipe to add a dimension for scalar data.
  This allows for more seamless integration with PyTorch DataLoader, which creates batches by stacking inputs of each batch.
  Examples:

  ds = connector.to_torch_dataset(shuffle=False, batch_size=3)

  - Input: "col1": [10, 11, 12]
    - Previous batch: array([10., 11., 12.]) with shape (3,)
    - New batch: array([[10.], [11.], [12.]]) with shape (3, 1)
  - Input: "col2": [[0, 100], [1, 110], [2, 200]]
    - Previous batch: array([[ 0, 100], [ 1, 110], [ 2, 200]]) with shape (3, 2)
    - New batch: No change
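  A short sketch of how the new shape interacts with a DataLoader, assuming connector is an existing DataConnector over the "col1" data above:

  from torch.utils.data import DataLoader

  # Per-row samples (batch_size=None, see the New Features entry below) let the
  # DataLoader do the batching; scalar columns now collate to shape (3, 1).
  ds = connector.to_torch_dataset(shuffle=False, batch_size=None)
  for batch in DataLoader(ds, batch_size=3):
      print(batch["col1"].shape)  # torch.Size([3, 1]) instead of torch.Size([3])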
- Model Registry: External access integrations are optional when creating a model inference service in Snowflake >= 8.40.0.
- Model Registry: Deprecate build_external_access_integration in favor of build_external_access_integrations in ModelVersion.create_service().
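  A sketch of the rename, reusing the mv handle from the 1.7.2 example above; values are placeholders:

  # Deprecated (singular, single value):
  #   mv.create_service(..., build_external_access_integration="MY_EAI")
  # Preferred (plural, takes a list):
  mv.create_service(
      service_name="MY_INFERENCE_SERVICE",
      service_compute_pool="MY_COMPUTE_POOL",
      image_repo="MY_IMAGE_REPO",
      build_external_access_integrations=["MY_EAI"],
  )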
Bug Fixes
- Registry: Updated the log_model API to accept both signature and sample_input_data parameters.
- Feature Store: ExampleHelper uses a fully qualified path for the table name. Change weather features aggregation from 1d to 1h.
- Data Connector: Return a numpy array with the appropriate object type instead of a list for multi-dimensional data from to_torch_dataset and to_torch_datapipe.
- Model Explainability: Incompatibility between SHAP 0.42.1 and XGBoost 2.1.1 resolved by using the latest SHAP 0.46.0.
New Features
- Registry: Support passing a keyworded, variable-length list of arguments to the ModelContext class. Example usage:
  import json
  import pandas as pd
  from snowflake.ml.model import custom_model

  # model1 is an already-trained model object referenced by the context.
  mc = custom_model.ModelContext(
      config='local_model_dir/config.json',
      m1=model1,
  )

  class ExamplePipelineModel(custom_model.CustomModel):
      def __init__(self, context: custom_model.ModelContext) -> None:
          super().__init__(context)
          v = open(self.context['config']).read()
          self.bias = json.loads(v)['bias']

      @custom_model.inference_api
      def predict(self, input: pd.DataFrame) -> pd.DataFrame:
          model_output = self.context['m1'].predict(input)
          return pd.DataFrame({'output': model_output + self.bias})
- Model Development: Upgrade scikit-learn in the UDTF backend for the log_loss metric. As a result, the eps argument is now ignored.
- Data Connector: Add the option of passing a batch size of None to to_torch_dataset for better interoperability with PyTorch DataLoader.
- Model Registry: Support pandas.CategoricalDtype.
- Registry: It is now possible to pass signatures and sample_input_data at the same time to capture background data for explainability and data lineage.
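  A minimal sketch of supplying both at once; the feature names, model object, and background frame are placeholders, and reg is a Registry as in the examples above:

  from snowflake.ml.model.model_signature import DataType, FeatureSpec, ModelSignature

  sig = ModelSignature(
      inputs=[FeatureSpec(name="feature_1", dtype=DataType.DOUBLE)],
      outputs=[FeatureSpec(name="output", dtype=DataType.DOUBLE)],
  )

  # signatures fixes the model interface, while sample_input_data is kept as
  # background data for explainability and data lineage.
  reg.log_model(
      model,
      model_name="MY_MODEL",
      version_name="V2",
      signatures={"predict": sig},
      sample_input_data=background_df,
  )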
1.6.4
Bug Fixes
- Registry: Fix an issue that leads to an incident when using ModelVersion.run with a service.
1.6.3
- Model Registry (PrPr) has been removed.
Bug Fixes
- Registry: Fix a bug where an unexpected normalization happened when a package whose name does not follow PEP 508 was provided while logging the model.
- Registry: Fix a "not a valid remote uri" error when logging MLflow models.
- Registry: Fix a bug when ModelVersion.run is called in a nested way.
- Registry: Fix an issue that leads to log_model failure when the local package version contains parts other than the base version.
New Features
- Data: Improve DataConnector.to_pandas() performance when loading from Snowpark DataFrames.
- Model Registry: Allow users to set a model task while using log_model; see the sketch below.
- Feature Store: FeatureView supports ON_CREATE or ON_SCHEDULE initialize mode.
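  A minimal sketch of setting the task explicitly; the Task enum import path, the model object, and the data names are assumptions used only to illustrate the option:

  from snowflake.ml.model import type_hints

  reg.log_model(
      model,
      model_name="MY_MODEL",
      version_name="V1",
      sample_input_data=sample_df,
      task=type_hints.Task.TABULAR_BINARY_CLASSIFICATION,  # explicit model task
  )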
1.6.2
Bug Fixes
- Modeling: Support XGBoost versions greater than 2.
- Data: Fix multiple-epoch iteration over DataConnector.to_torch_datapipe() DataPipes.
- Generic: Fix a bug where an invalid name provided to an argument expecting a fully qualified name was parsed incorrectly. Now it raises an exception correctly.
- Model Explainability: Handle explanations for multiclass XGBoost classification models.
- Model Explainability: Workarounds and better error handling for XGBoost > 2.1.0 not working with SHAP 0.42.1.
New Features
- Data: Add top-level exports for DataConnector and DataSource to snowflake.ml.data.
- Data: Add native batching support via the batch_size and drop_last_batch arguments to DataConnector.to_torch_dataset(); see the sketch below.
- Feature Store: update_feature_view() supports taking a feature view object as argument.
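  A short sketch of the new batching arguments, assuming connector is an existing DataConnector; the column name is a placeholder:

  ds = connector.to_torch_dataset(batch_size=32, shuffle=True, drop_last_batch=True)
  for batch in ds:
      # Each batch maps column name -> array of up to 32 rows; the trailing
      # partial batch is dropped because drop_last_batch=True.
      print(batch["FEATURE_1"].shape)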
Behavior Changes
1.6.1 (2024-08-12)
Bug Fixes
- Feature Store: Support large metadata blobs when generating a dataset.
- Feature Store: Added a hidden knob in FeatureView as kwargs for setting a customized refresh_mode.
- Registry: Fix an error message in ModelVersion.run when function_name is not mentioned and the model has multiple target methods.
- Cortex inference: snowflake.cortex.Complete now only uses the REST API for streaming and the use_rest_api_experimental flag is no longer needed.
- Feature Store: Add a new API, FeatureView.list_columns(), which lists all column information.
- Data: Fix DataFrame ingestion with ArrowIngestor.
New Features
- Enable set_params to set the parameters of the underlying sklearn estimator if the snowflake-ml model has been fit.
- Data: Add top-level exports for DataConnector and DataSource to snowflake.ml.data.
- Data: Add the snowflake.ml.data.ingestor_utils module with utility functions helpful for DataIngestor implementations.
- Data: Add a new to_torch_dataset() connector to DataConnector to replace the deprecated DataPipe.
- Registry: Option to enable_explainability set to True by default for XGBoost, LightGBM and CatBoost as a PuPr feature; see the sketch below.
- Registry: Option to enable_explainability when registering SHAP-supported sklearn models.
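  A minimal sketch of toggling the option, assuming it is passed via the log_model options dict; the model and data names are placeholders:

  # enable_explainability defaults to True for XGBoost, LightGBM and CatBoost;
  # pass it through `options` to opt out, or to opt in for SHAP-supported
  # sklearn models.
  reg.log_model(
      xgb_model,
      model_name="MY_XGB_MODEL",
      version_name="V1",
      sample_input_data=background_df,
      options={"enable_explainability": False},
  )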
Behavior Changes
1.6.0
Bug Fixes
- Modeling: SimpleImputer can impute integer columns with integer values.
- Registry: Fix an issue when providing a pandas DataFrame whose index does not start from 0 as the input to ModelVersion.run.
New Features
- Feature Store: Add overloads to APIs to accept both an object and a name/version pair. Impacted APIs include read_feature_view(), refresh_feature_view(), get_refresh_history(), resume_feature_view(), suspend_feature_view(), and delete_feature_view(); see the sketch below.
- Feature Store: Add docstring inline examples for all public APIs.
- Feature Store: Add a new utility class ExampleHelper to help with loading source data, to simplify public notebooks.
- Registry: Option to enable_explainability when registering XGBoost models as a pre-PuPr feature.
- Feature Store: Add a new API update_entity().
- Registry: Option to enable_explainability when registering CatBoost models as a pre-PuPr feature.
- Feature Store: Add a new argument warehouse to the FeatureView constructor to overwrite the default warehouse. Also add a new column 'warehouse' to the output of list_feature_views().
- Registry: Add support for logging a model from a model version.
- Modeling: Distributed Hyperparameter Optimization now announces a GA refresh version. The latest memory-efficient version no longer has the 10GB training limitation on datasets. To turn it off, run:

  from snowflake.ml.modeling._internal.snowpark_implementations import (
      distributed_hpo_trainer,
  )
  distributed_hpo_trainer.ENABLE_EFFICIENT_MEMORY_USAGE = False

- Registry: Option to enable_explainability when registering LightGBM models as a pre-PuPr feature.
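  A short sketch of the object versus name/version overloads, with fs an existing FeatureStore and fv a registered FeatureView; names are placeholders:

  fs.refresh_feature_view(fv)                       # pass the FeatureView object
  fs.refresh_feature_view("MY_FEATURE_VIEW", "V1")  # or its name and version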
Behavior Changes
- Feature Store: Change some positional parameters to keyword arguments in the following APIs (see the sketch below):
- Entity(): desc.
- FeatureView(): timestamp_col, refresh_freq, desc.
- FeatureStore(): creation_mode.
- update_entity(): desc.
- register_feature_view(): block, overwrite.
- list_feature_views(): entity_name, feature_view_name.
- get_refresh_history(): verbose.
- retrieve_feature_values(): spine_timestamp_col, exclude_columns, include_feature_view_timestamp_col.
- generate_training_set(): save_as, spine_timestamp_col, spine_label_cols, exclude_columns, include_feature_view_timestamp_col.
- generate_dataset(): version, spine_timestamp_col, spine_label_cols, exclude_columns, include_feature_view_timestamp_col, desc, output_type.
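  A short sketch of the keyword-only style after this change; the entity, feature view, and column names are placeholders, and fs/fv are an existing FeatureStore and FeatureView:

  from snowflake.ml.feature_store import Entity

  e = Entity("CUSTOMER", ["CUSTOMER_ID"], desc="Customer entity")  # desc by keyword
  fs.register_feature_view(fv, version="V1", block=True, overwrite=False)
  fs.list_feature_views(entity_name="CUSTOMER")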
1.5.4
Bug Fixes
- Model Registry (PrPr): Fix a 401 Unauthorized issue when deploying a model to SPCS.
- Feature Store: Downgrade exceptions to warnings for a few property setters in feature view. Now you can set desc, refresh_freq and warehouse for draft feature views.
- Modeling: Fix an issue with calling OrdinalEncoder with categories as a dictionary and a pandas DataFrame.
- Modeling: Fix an issue with calling OneHotEncoder with categories as a dictionary and a pandas DataFrame.
New Features
- Registry: Allow overriding device_map and device when loading huggingface pipeline models.
- Registry: Add a set_alias method to ModelVersion instances to set an alias for a model version; see the sketch below.
- Registry: Add an unset_alias method to ModelVersion instances to unset an alias for a model version.
- Registry: Add partitioned_inference_api, allowing users to create partitioned inference functions in registered models. Enable model inference methods with table functions with vectorized process methods in registered models.
- Feature Store: Add 3 more columns: refresh_freq, refresh_mode and scheduling_state to the result of list_feature_views().
- Feature Store: update_feature_view() supports updating the description.
- Feature Store: Add a new API refresh_feature_view().
- Feature Store: Add a new API get_refresh_history().
- Feature Store: Add a generate_training_set() API for generating table-backed feature snapshots.
- Feature Store: Add a DeprecationWarning for generate_dataset(..., output_type="table").
- Model Development: OrdinalEncoder supports a list of array-likes for the categories argument.
- Model Development: OneHotEncoder supports a list of array-likes for the categories argument.
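  A minimal sketch of working with version aliases; the model and alias names are placeholders:

  from snowflake.ml.registry import Registry

  reg = Registry(session=session)
  mv = reg.get_model("MY_MODEL").version("V1")
  mv.set_alias("PRODUCTION")   # the alias now refers to this version
  # The alias can later be used in place of the version name, and
  # unset_alias removes it again.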
1.5.3
Bug Fixes
- Modeling: Fix an issue causing lineage information to be missing for Pipeline, GridSearchCV, SimpleImputer, and RandomizedSearchCV.
- Registry: Fix an issue that leads to incorrect results when using a pandas DataFrame with over 100,000 rows as the input of the ModelVersion.run method in a Stored Procedure.
New Features
- Registry: Add support for the TIMESTAMP_NTZ model signature data type, allowing timestamp input and output; see the sketch below.
- Dataset: Add DatasetVersion.label_cols and DatasetVersion.exclude_cols properties.
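  A minimal sketch of a signature using the new type; the feature names are placeholders:

  from snowflake.ml.model.model_signature import DataType, FeatureSpec, ModelSignature

  sig = ModelSignature(
      inputs=[FeatureSpec(name="event_ts", dtype=DataType.TIMESTAMP_NTZ)],
      outputs=[FeatureSpec(name="score", dtype=DataType.DOUBLE)],
  )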