Project import generated by Copybara. (#92)
GitOrigin-RevId: cd1cf14167a03d4d572a86fb6162ba2d9d9e8457

Co-authored-by: Snowflake Authors <[email protected]>
sfc-gh-sdas and Snowflake Authors authored Mar 12, 2024
1 parent de45707 commit 27431b2
Showing 108 changed files with 5,671 additions and 2,356 deletions.
55 changes: 37 additions & 18 deletions CHANGELOG.md
@@ -1,6 +1,33 @@
# Release History

-## 1.2.3
+## 1.3.0

+### Bug Fixes

+- Registry: Fix a bug where modules specified in `code_paths` could not be correctly imported when calling `log_model`.
+- Registry: Fix an incorrect error message when validating an input Snowpark DataFrame with an array feature.
+- Model Registry: Fix an issue where some files did not have proper permissions when deploying a model to SPCS.
+- Model Development: Relax package versions for all inference methods if the installed version
+  is not available in the Snowflake conda channel.

+### Behavior Changes

+- Registry: When running a method of a model, the value-range-based input validation that prevents inputs from
+  overflowing is now optional rather than enforced; this should improve performance and should not cause problems
+  for most kinds of models. To enable this check as before, specify `strict_input_validation=True` when calling
+  `run` (see the sketch after this list).
+- Registry: `relax_version=True` is now the default when logging a model, instead of pinning the exact local
+  dependency versions. This improves dependency versioning by using versions available in Snowflake. To switch back
+  to the previous behavior and use exact local dependency versions, specify `relax_version=False` when calling
+  `log_model`.
+- Model Development: The behavior of `fit_predict` has changed for all estimators.
+  First, it is now exposed on every estimator that implements this method;
+  second, the output is either a pandas DataFrame or a Snowpark DataFrame (a union of the two types).
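A minimal sketch of both switches above, assuming an existing Snowpark `session`, a Snowpark DataFrame `input_df`, and a fitted model `my_model` (all names illustrative):

```python
from snowflake.ml.registry import Registry

reg = Registry(session=session)  # `session`: an existing Snowpark session

# Re-enable strict value-range input validation (now optional by default).
mv = reg.get_model("MY_MODEL").version("V1")
output_df = mv.run(input_df, strict_input_validation=True)

# Pin exact local dependency versions (relax_version now defaults to True).
reg.log_model(
    my_model,
    model_name="MY_MODEL",
    version_name="V2",
    options={"relax_version": False},
)
```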

+### New Features

+- FileSet: `snowflake.ml.fileset.sfcfs.SFFileSystem` can now be serialized with `pickle`.
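A minimal sketch of the new `pickle` support, assuming an existing Snowflake connection `conn`; the stage path is hypothetical:

```python
import pickle

from snowflake.ml.fileset import sfcfs

sffs = sfcfs.SFFileSystem(sf_connection=conn)
# The file system object now survives a pickle round trip.
restored = pickle.loads(pickle.dumps(sffs))
print(restored.ls("@MY_DB.MY_SCHEMA.MY_STAGE"))  # hypothetical stage path
```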

+## 1.2.3 (2024-02-26)

### Bug Fixes

@@ -23,11 +50,7 @@
GridSearchCV, RandomizedSearchCV, PCA, IsolationForest, ...
- Registry: Support deleting a version of a model.

-## 1.2.2

-### Bug Fixes

-### Behavior Changes
+## 1.2.2 (2024-02-13)

### New Features

@@ -38,23 +61,21 @@
`snowflake.ml.model.models.huggingface_pipeline.HuggingFacePipelineModel` object, the following endpoints are required
to be allowed: huggingface.com:80, huggingface.com:443, huggingface.co:80, huggingface.co:443.

-## 1.2.1
+## 1.2.1 (2024-01-25)

### New Features

- Model Development: Infers output column data type for transformers when possible.
- Registry: `relax_version` option is available in the `options` argument when logging the model.

-## 1.2.0
+## 1.2.0 (2024-01-11)

### Bug Fixes

- Model Registry: Fix "XGBoost version not compiled with GPU support" error when running CPU inference against open-source
XGBoost models deployed to SPCS.
- Model Registry: Fix model deployment to SPCS on Windows machines.

-### Behavior Changes

### New Features

- Model Development: Introduced XGBoost external memory training feature. This feature enables training XGBoost models
@@ -72,7 +93,7 @@
`snowflake.ml.registry.Registry`, except when specifically required. The old model registry will be removed once all
its primary functionalities are fully integrated into the new registry.

-## 1.1.2
+## 1.1.2 (2023-12-18)

### Bug Fixes

@@ -90,7 +111,7 @@ its primary functionalities are fully integrated into the new registry.

- Model Development: SQL implementation of binary `precision_score` metric.
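A sketch of how this metric is invoked on a Snowpark DataFrame `df` (column names are illustrative, and the keyword names follow the `snowflake.ml.modeling.metrics` convention as understood here; check them against the installed version):

```python
from snowflake.ml.modeling.metrics import precision_score

# Precision is computed in SQL, without pulling `df` into local memory.
score = precision_score(
    df=df,  # an existing Snowpark DataFrame
    y_true_col_names="LABEL",
    y_pred_col_names="PREDICTION",
)
```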

-## 1.1.1
+## 1.1.1 (2023-12-05)

### Bug Fixes

@@ -103,24 +124,22 @@ its primary functionalities are fully integrated into the new registry.
requiring automatic input_cols inference, but need to avoid using specific
columns, like index columns, during training or inference.

-## 1.1.0
+## 1.1.0 (2023-12-01)

### Bug Fixes

- Model Registry: Fix pandas DataFrame input not handling the first row properly.
- Model Development: OrdinalEncoder and LabelEncoder output_columns do not need to be valid Snowflake identifiers. They
  would previously be excluded if the normalized name did not match the name specified in output_columns.

-### Behavior Changes

### New Features

- Model Registry: Add support for invoking a public endpoint on the SPCS service, by providing an "enable_ingress" SPCS
  deployment option.
- Model Development: Add support for distributed HPO - GridSearchCV and RandomizedSearchCV execution will be
  distributed on multi-node warehouses (see the sketch after this list).
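A sketch of the distributed HPO path, assuming a Snowpark DataFrame `train_df` and illustrative column names:

```python
from snowflake.ml.modeling.model_selection import GridSearchCV
from snowflake.ml.modeling.xgboost import XGBClassifier

# On a multi-node warehouse, the candidate fits are distributed across nodes.
search = GridSearchCV(
    estimator=XGBClassifier(),
    param_grid={"max_depth": [3, 5], "n_estimators": [50, 100]},
    input_cols=["FEATURE_1", "FEATURE_2"],
    label_cols=["LABEL"],
    output_cols=["PREDICTION"],
)
search.fit(train_df)
```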

-## 1.0.12
+## 1.0.12 (2023-11-13)

### Bug Fixes

@@ -145,7 +164,7 @@ its primary functionalities are fully integrated into the new registry.

- Model Registry: Enable best-effort SPCS job/service log streaming when logging level is set to INFO.

-## 1.0.11
+## 1.0.11 (2023-10-27)

### New Features

@@ -164,7 +183,7 @@ its primary functionalities are fully integrated into the new registry.
- Model Development: Fix metrics compatibility with Snowpark DataFrames that use Snowflake identifiers.
- Model Registry: Resolve 'delete_deployment' not deleting the SPCS service in certain cases.

-## 1.0.10
+## 1.0.10 (2023-10-13)

### Behavior Changes

4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -46,13 +46,13 @@ Note: You may need to configure your editor to run this on save.
To build the package, run:

```shell
-> bazel build //snowflake/ml:wheel
+> bazel build //:wheel
```

`bazel` can be run from anywhere under the monorepo, and it accepts either an absolute or a relative path. For example,

```sh
-snowflake/ml> bazel build :wheel
+snowml> bazel build :wheel
```

You can build an entire sub-tree as:
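A sketch using Bazel's standard `...` wildcard, which builds every target under a package (the package path here is illustrative):

```sh
snowml> bazel build //snowflake/ml/modeling/...
```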
4 changes: 2 additions & 2 deletions ci/conda_recipe/meta.yaml
@@ -17,7 +17,7 @@ build:
noarch: python
package:
name: snowflake-ml-python
-version: 1.2.3
+version: 1.3.0
requirements:
build:
- python
@@ -42,7 +42,7 @@ requirements:
- scikit-learn>=1.2.1,<1.4
- scipy>=1.9,<2
- snowflake-connector-python>=3.0.4,<4
-- snowflake-snowpark-python>=1.8.0,<2
+- snowflake-snowpark-python>=1.8.0,<2,!=1.12.0
- sqlparse>=0.4,<1
- typing-extensions>=4.1.0,<5
- xgboost>=1.7.3,<2
2 changes: 2 additions & 0 deletions codegen/codegen_rules.bzl
@@ -94,6 +94,8 @@ def autogen_estimators(module, estimator_info_list):
"//snowflake/ml/modeling/_internal:estimator_utils",
"//snowflake/ml/modeling/_internal:model_trainer",
"//snowflake/ml/modeling/_internal:model_trainer_builder",
"//snowflake/ml/modeling/_internal:transformer_protocols",
"//snowflake/ml/modeling/_internal:model_transformer_builder",
],
)

54 changes: 32 additions & 22 deletions codegen/sklearn_wrapper_generator.py
@@ -154,18 +154,6 @@ def _is_classifier_obj(class_object: Tuple[str, type]) -> bool:
"""
return WrapperGeneratorFactory._is_class_of_type(class_object[1], "ClassifierMixin")

-@staticmethod
-def _is_cluster_obj(class_object: Tuple[str, type]) -> bool:
-"""Check if the given estimator object can cluster features and conduct fit_predict methods.
-Args:
-class_object: Meta class object which needs to be checked.
-Returns:
-True if the class inherits from ClusterMixin, otherwise False.
-"""
-return WrapperGeneratorFactory._is_class_of_type(class_object[1], "ClusterMixin")

@staticmethod
def _is_meta_estimator_obj(class_object: Tuple[str, type]) -> bool:
"""Check if the given estimator object requires an `estimator` parameter.
@@ -277,6 +265,33 @@ def _is_xgboost(module_name: str) -> bool:
"""
return module_name.split(".")[0] == "xgboost"

+@staticmethod
+def _is_deterministic(class_object: Tuple[str, type]) -> bool:
+"""Checks if the given estimator class is deterministic.
+Args:
+class_object: Meta class object which needs to be checked.
+Returns:
+True if the class is deterministic, otherwise False.
+"""
+return not (
+WrapperGeneratorFactory._is_class_of_type(class_object[1], "LinearDiscriminantAnalysis")
+or WrapperGeneratorFactory._is_class_of_type(class_object[1], "BernoulliRBM")
+)

+@staticmethod
+def _is_deterministic_cross_platform(class_object: Tuple[str, type]) -> bool:
+"""Checks if the given estimator class is deterministic across different platforms.
+Args:
+class_object: Meta class object which needs to be checked.
+Returns:
+True if the class is deterministic across different platforms, otherwise False.
+"""
+return not (WrapperGeneratorFactory._is_class_of_type(class_object[1], "Isomap"))

@staticmethod
def _is_lightgbm(module_name: str) -> bool:
"""Checks if the given module belongs to LightGBM package.
@@ -604,7 +619,6 @@ def __init__(self, module_name: str, class_object: Tuple[str, type]) -> None:
self.test_estimator_imports_list: List[str] = []

# Optional function support
-self.fit_predict_cluster_function_support = False
self.fit_transform_manifold_function_support = False

# Dependencies
@@ -654,7 +668,6 @@ def _populate_flags(self) -> None:
self._is_multioutput_estimator = WrapperGeneratorFactory._is_multioutput_estimator_obj(self.class_object)
self._is_k_neighbors = WrapperGeneratorFactory._is_k_neighbors_obj(self.class_object)
self._is_heterogeneous_ensemble = WrapperGeneratorFactory._is_heterogeneous_ensemble_obj(self.class_object)
-self._is_cluster = WrapperGeneratorFactory._is_cluster_obj(self.class_object)
self._is_stacking_ensemble = WrapperGeneratorFactory._is_stacking_ensemble_obj(self.class_object)
self._is_voting_ensemble = WrapperGeneratorFactory._is_voting_ensemble_obj(self.class_object)
self._is_chain_multioutput = WrapperGeneratorFactory._is_chain_multioutput_obj(self.class_object)
@@ -668,6 +681,10 @@ def _populate_flags(self) -> None:
self._is_randomized_search_cv = WrapperGeneratorFactory._is_randomized_search_cv(self.class_object)
self._is_iterative_imputer = WrapperGeneratorFactory._is_iterative_imputer(self.class_object)
self._is_xgboost = WrapperGeneratorFactory._is_xgboost(self.module_name)
+self._is_deterministic = WrapperGeneratorFactory._is_deterministic(self.class_object)
+self._is_deterministic_cross_platform = WrapperGeneratorFactory._is_deterministic_cross_platform(
+self.class_object
+)

def _populate_import_statements(self) -> None:
self.estimator_imports_list.append("import numpy")
@@ -984,11 +1001,6 @@ def generate(self) -> "SklearnWrapperGenerator":
]
self.test_estimator_input_args_list.append(f"dictionary={dictionary}")

-if self._is_cluster:
-self.fit_predict_cluster_function_support = True
-if self._is_manifold:
-self.fit_transform_manifold_function_support = True

if self._is_manifold:
self.fit_transform_manifold_function_support = True

@@ -998,12 +1010,10 @@

if "n_components" in self.original_init_signature.parameters.keys():
if WrapperGeneratorFactory._is_class_of_type(self.class_object[1], "SpectralBiclustering"):
-# For spectral bi clustering, set number of sigular vertors to consider to number of input cols and
+# For spectral bi clustering, set number of singular vectors to consider to number of input cols and
# num best vector to select to half the number of input cols.
self.test_estimator_input_args_list.append("n_components=len(cols)")
self.test_estimator_input_args_list.append("n_best=int(len(cols)/2)")
-else:
-self.test_estimator_input_args_list.append("n_components=1")

if self._is_heterogeneous_ensemble:
if self._is_regressor: