Commit 72c6c24

sfc-gh-kdama and Snowflake Authors authored
Project import generated by Copybara. (#73)
GitOrigin-RevId: 6af23f3594aae1b0f36177bfe8c706ef07c9350c Co-authored-by: Snowflake Authors <[email protected]>
1 parent abe5b67 commit 72c6c24

50 files changed (+1222 -504 lines changed)

.github/workflows/jira_close.yml

+38
@@ -0,0 +1,38 @@
1+
---
2+
name: Jira closure
3+
4+
on:
5+
issues:
6+
types:
7+
- closed
8+
- deleted
9+
10+
jobs:
11+
close-issue:
12+
runs-on: ubuntu-latest
13+
steps:
14+
- name: Checkout
15+
uses: actions/checkout@v2
16+
with:
17+
repository: snowflakedb/gh-actions
18+
ref: jira_v1
19+
token: ${{ secrets.SNOWFLAKE_GITHUB_TOKEN }} # stored in GitHub secrets
20+
path: .
21+
- name: Jira login
22+
uses: atlassian/gajira-login@master
23+
env:
24+
JIRA_API_TOKEN: ${{ secrets.JIRA_API_TOKEN }}
25+
JIRA_BASE_URL: ${{ secrets.JIRA_BASE_URL }}
26+
JIRA_USER_EMAIL: ${{ secrets.JIRA_USER_EMAIL }}
27+
- name: Extract issue from title
28+
id: extract
29+
env:
30+
TITLE: ${{ github.event.issue.title }}
31+
run: |
32+
jira=$(printf '%s' "$TITLE" | awk '{print $1}' | sed -e 's/://')
33+
echo ::set-output name=jira::$jira
34+
- name: Close issue
35+
uses: ./jira/gajira-close
36+
if: startsWith(steps.extract.outputs.jira, 'SNOW-')
37+
with:
38+
issue: ${{ steps.extract.outputs.jira }}
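
For reference, the `Extract issue from title` step in this workflow can be mirrored in Python. This is a minimal sketch and not part of the commit: `extract_jira_key` and `should_sync` are hypothetical names, and the sketch assumes a single-line issue title. GitHub's `startsWith` expression ignores case, so the guard below does too.

```python
def extract_jira_key(title: str) -> str:
    """Return the first whitespace-delimited token of the title, minus its first colon."""
    tokens = title.split()              # like awk '{print $1}': first field only
    first = tokens[0] if tokens else ""
    return first.replace(":", "", 1)    # like sed 's/://': drop only the first colon


def should_sync(title: str) -> bool:
    """Mirror the `startsWith(steps.extract.outputs.jira, 'SNOW-')` guard (case-insensitive)."""
    return extract_jira_key(title).upper().startswith("SNOW-")
```

Under these assumptions, a title such as `SNOW-123: Fix thing` yields the key `SNOW-123`, while a title that only mentions a key later in the text is skipped.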

.github/workflows/jira_comment.yml

+33
@@ -0,0 +1,33 @@
1+
---
2+
name: Jira comment
3+
4+
on:
5+
issue_comment:
6+
types:
7+
- created
8+
9+
jobs:
10+
comment-issue:
11+
if: ${{ !github.event.issue.pull_request }}
12+
runs-on: ubuntu-latest
13+
steps:
14+
- name: Jira login
15+
uses: atlassian/gajira-login@master
16+
env:
17+
JIRA_API_TOKEN: ${{ secrets.JIRA_API_TOKEN }}
18+
JIRA_BASE_URL: ${{ secrets.JIRA_BASE_URL }}
19+
JIRA_USER_EMAIL: ${{ secrets.JIRA_USER_EMAIL }}
20+
- name: Extract issue from title
21+
id: extract
22+
env:
23+
TITLE: ${{ github.event.issue.title }}
24+
run: |
25+
jira=$(printf '%s' "$TITLE" | awk '{print $1}' | sed -e 's/://')
26+
echo ::set-output name=jira::$jira
27+
- name: Comment on issue
28+
uses: atlassian/gajira-comment@master
29+
if: startsWith(steps.extract.outputs.jira, 'SNOW-')
30+
with:
31+
issue: ${{ steps.extract.outputs.jira }}
32+
comment: "${{ github.event.comment.user.login }} commented:\n\n${{ github.event.comment.body }}\n\n${{ github.event.comment.html_url\
33+
\ }}"
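
The `comment` input above is a YAML double-quoted string folded across two lines, so its final shape can be hard to read from the diff. A minimal Python sketch of the resulting text follows; `build_jira_comment` is a hypothetical helper used only for illustration and does not exist in the workflow or in the Jira actions.

```python
def build_jira_comment(login: str, body: str, html_url: str) -> str:
    """Assemble the comment text: author line, blank line, body, blank line, link."""
    return f"{login} commented:\n\n{body}\n\n{html_url}"
```

The author line, the comment body, and the link back to GitHub are each separated by one blank line.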

.github/workflows/jira_issue.yml

+54
@@ -0,0 +1,54 @@
1+
---
2+
name: Jira creation
3+
4+
on:
5+
issues:
6+
types:
7+
- opened
8+
issue_comment:
9+
types:
10+
- created
11+
12+
jobs:
13+
create-issue:
14+
runs-on: ubuntu-latest
15+
permissions:
16+
issues: write
17+
if: (github.event_name == 'issues' && github.event.pull_request.user.login != 'whitesource-for-github-com[bot]')
18+
steps:
19+
- name: Checkout
20+
uses: actions/checkout@v2
21+
with:
22+
repository: snowflakedb/gh-actions
23+
ref: jira_v1
24+
token: ${{ secrets.SNOWFLAKE_GITHUB_TOKEN }} # stored in GitHub secrets
25+
path: .
26+
27+
- name: Login
28+
uses: atlassian/[email protected]
29+
env:
30+
JIRA_BASE_URL: ${{ secrets.JIRA_BASE_URL }}
31+
JIRA_USER_EMAIL: ${{ secrets.JIRA_USER_EMAIL }}
32+
JIRA_API_TOKEN: ${{ secrets.JIRA_API_TOKEN }}
33+
34+
- name: Create JIRA Ticket
35+
id: create
36+
uses: atlassian/[email protected]
37+
with:
38+
project: SNOW
39+
issuetype: Bug
40+
summary: ${{ github.event.issue.title }}
41+
description: |
42+
${{ github.event.issue.body }} \\ \\ _Created from GitHub Action_ for ${{ github.event.issue.html_url }}
43+
# Assign triage-ml-platform-dl and set "Data Platform: ML Engineering" component.
44+
fields: '{"customfield_11401":{"id":"14538"}, "assignee":{"id":"639020ab3c26ca7fa0d6eb3f"},"components":[{"id":"16520"}]}'
45+
46+
- name: Update GitHub Issue
47+
uses: ./jira/gajira-issue-update
48+
env:
49+
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
50+
with:
51+
issue_number: '{{ event.issue.id }}'
52+
owner: '{{ event.repository.owner.login }}'
53+
name: '{{ event.repository.name }}'
54+
jira: ${{ steps.create.outputs.issue }}

CHANGELOG.md

+15
@@ -1,5 +1,20 @@
11
# Release History
22

3+
## 1.1.1
4+
5+
### Bug Fixes
6+
7+
- Model Registry: The `predict` target method on registered models is now compatible with unsupervised estimators.
8+
- Model Development: Fixed incorrect `confusion_matrix` results when the number of rows is not evenly divisible by the batch size.
9+
10+
### Behavior Changes
11+
12+
### New Features
13+
14+
- Introduced the `passthrough_cols` param in the Modeling API. This new param is helpful in scenarios
15+
that require automatic input_cols inference but need to exclude specific
16+
columns, like index columns, from training or inference.
17+
318
## 1.1.0
419

520
### Bug Fixes

codegen/sklearn_wrapper_generator.py

+10-3
@@ -21,8 +21,8 @@
2121
input_cols: Optional[Union[str, List[str]]]
2222
A string or list of strings representing column names that contain features.
2323
If this parameter is not specified, all columns in the input DataFrame except
24-
the columns specified by label_cols and sample_weight_col parameters are
25-
considered input columns.
24+
the columns specified by label_cols, sample_weight_col, and passthrough_cols
25+
parameters are considered input columns.
2626
2727
label_cols: Optional[Union[str, List[str]]]
2828
A string or list of strings representing column names that contain labels.
@@ -44,6 +44,13 @@
4444
A string representing the column name containing the sample weights.
4545
This argument is only required when working with weighted datasets.
4646
47+
passthrough_cols: Optional[Union[str, List[str]]]
48+
A string or a list of strings indicating column names to be excluded from any
49+
operations (such as train, transform, or inference). These specified column(s)
50+
will remain untouched throughout the process. This option is helpful in scenarios
51+
that require automatic input_cols inference but need to exclude specific
52+
columns, like index columns, from training or inference.
53+
4754
drop_input_cols: Optional[bool], default=False
4855
If set, the response of predict(), transform() methods will not contain input columns.
4956
"""
@@ -743,7 +750,7 @@ def _populate_function_names_and_signatures(self) -> None:
743750
signature_lines.append(v.name)
744751
sklearn_init_args_dict_list.append(f"'{v.name}':({v.name}, None, True)")
745752

746-
for arg in ["input_cols", "output_cols", "label_cols"]:
753+
for arg in ["input_cols", "output_cols", "label_cols", "passthrough_cols"]:
747754
signature_lines.append(f"{arg}: Optional[Union[str, Iterable[str]]] = None")
748755
init_member_args.append(f"self.set_{arg}({arg})")
749756

codegen/sklearn_wrapper_template.py_template

+10-18
@@ -83,24 +83,6 @@ class {transform.original_class_name}(BaseTransformer):
8383
"""
8484
return str(uuid4()).replace("-", "_").upper()
8585

86-
def _infer_input_output_cols(self, dataset: Union[DataFrame, pd.DataFrame]) -> None:
87-
"""
88-
Infer `self.input_cols` and `self.output_cols` if they are not explicitly set.
89-
90-
Args:
91-
dataset: Input dataset.
92-
"""
93-
if not self.input_cols:
94-
cols = [
95-
c for c in dataset.columns
96-
if c not in self.get_label_cols() and c != self.sample_weight_col
97-
]
98-
self.set_input_cols(input_cols=cols)
99-
100-
if not self.output_cols:
101-
cols = [identifier.concat_names(ids=['OUTPUT_', c]) for c in self.label_cols]
102-
self.set_output_cols(output_cols=cols)
103-
10486
def set_input_cols(self, input_cols: Optional[Union[str, Iterable[str]]]) -> "{transform.original_class_name}":
10587
"""
10688
Input columns setter.
@@ -737,12 +719,22 @@ class {transform.original_class_name}(BaseTransformer):
737719
self._model_signature_dict["predict"] = ModelSignature(inputs,
738720
([] if self._drop_input_cols else inputs)
739721
+ outputs)
722+
# For mixture models that use the density mixin, `predict` returns the argmax of the log prob.
723+
# For outlier models, returns -1 for outliers and 1 for inliers.
724+
# Clusterer returns int64 cluster labels.
725+
elif self._sklearn_object._estimator_type in ["DensityEstimator", "clusterer", "outlier_detector"]:
726+
outputs = [FeatureSpec(dtype=DataType.INT64, name=c) for c in self.output_cols]
727+
self._model_signature_dict["predict"] = ModelSignature(inputs,
728+
([] if self._drop_input_cols else inputs)
729+
+ outputs)
730+
740731
# For regressor, the type of predict is float64
741732
elif self._sklearn_object._estimator_type == 'regressor':
742733
outputs = [FeatureSpec(dtype=DataType.DOUBLE, name=c) for c in self.output_cols]
743734
self._model_signature_dict["predict"] = ModelSignature(inputs,
744735
([] if self._drop_input_cols else inputs)
745736
+ outputs)
737+
746738
for prob_func in PROB_FUNCTIONS:
747739
if hasattr(self, prob_func):
748740
output_cols_prefix: str = f"{{prob_func}}_"
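
The second hunk above adds a `predict` signature branch for unsupervised estimators. The branching visible in this hunk can be summarized in a small standalone sketch. The names `predict_output_dtype` and `signature_columns` are hypothetical, plain strings stand in for the `DataType` enum members, and estimator types handled by branches outside this hunk are deliberately left as `None`.

```python
from typing import List, Optional

# Estimator types whose `predict` returns integer labels, per the comments in the hunk:
# argmax of log prob (density mixin), cluster labels, or -1/1 for outliers/inliers.
INT64_ESTIMATOR_TYPES = ("DensityEstimator", "clusterer", "outlier_detector")


def predict_output_dtype(estimator_type: Optional[str]) -> Optional[str]:
    """Return the name of the output dtype for the branches shown in this hunk."""
    if estimator_type in INT64_ESTIMATOR_TYPES:
        return "INT64"
    if estimator_type == "regressor":
        return "DOUBLE"
    return None  # handled elsewhere, outside the lines shown in this diff


def signature_columns(inputs: List[str], outputs: List[str], drop_input_cols: bool) -> List[str]:
    """Mirror `([] if self._drop_input_cols else inputs) + outputs`."""
    return ([] if drop_input_cols else list(inputs)) + list(outputs)
```

In this sketch, a clusterer with inputs `["F1", "F2"]` and output `["OUTPUT_LABEL"]` reports all three columns unless `drop_input_cols` is set, in which case only the output column remains.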

codegen/transformer_autogen_test_template.py_template

+2
@@ -127,6 +127,8 @@ class {transform.test_class_name}(TestCase):
127127
inference_methods = ["transform", "predict"]
128128
for m in inference_methods:
129129
if callable(getattr(sklearn_reg, m, None)):
130+
if m == 'predict':
131+
self.assertTrue(m in reg.model_signatures)
130132

131133
if inference_with_udf:
132134
output_df = getattr(reg, m)(input_df)
