Project import generated by Copybara. (#31)
snowflake-provisioner authored Jul 28, 2023
1 parent 091fb6c commit 9eec61f
Showing 150 changed files with 8,551 additions and 4,620 deletions.
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,17 @@
# Release History

## 1.0.4

### New Features
- Model Registry: Added support to save/load/deploy TensorFlow models (`tensorflow.Module`).
- Model Registry: Added support to save/load/deploy MLflow PyFunc models (`mlflow.pyfunc.PyFuncModel`).
- Model Development: Input dataframes can now be joined against data loaded from staged files.
- Model Development: Added support for non-English languages.

### Bug Fixes

- Model Registry: Fixed an issue where model dependencies were incorrectly reported as unresolvable on certain platforms.

## 1.0.3 (2023-07-14)

### Behavior Changes
8 changes: 7 additions & 1 deletion README.md
@@ -3,6 +3,7 @@
Snowpark ML is a set of tools including SDKs and underlying infrastructure to build and deploy machine learning models. With Snowpark ML, you can pre-process data, train, manage and deploy ML models all within Snowflake, using a single SDK, and benefit from Snowflake’s proven performance, scalability, stability and governance at every stage of the Machine Learning workflow.

## Key Components of Snowpark ML

The Snowpark ML Python SDK provides a number of APIs to support each stage of an end-to-end Machine Learning development and deployment process, and includes two key components.

### Snowpark ML Development [Public Preview]
@@ -16,6 +17,7 @@ A collection of Python APIs to enable efficient model development directly in Snowflake.
### Snowpark ML Ops [Private Preview]

Snowpark MLOps complements the Snowpark ML Development API and provides model management capabilities along with integrated deployment into Snowflake. Currently, the API consists of:

1. FileSet API: FileSet provides a Python fsspec-compliant API for materializing data into a Snowflake internal stage from a query or Snowpark DataFrame, along with a number of convenience APIs.

1. Model Registry: A Python API for managing models within Snowflake, which also supports deployment of ML models into Snowflake warehouses as vectorized UDFs.
@@ -25,15 +27,19 @@ During PrPr, we are iterating on the API without backward compatibility guarantees.
- [Documentation](https://docs.snowflake.com/developer-guide/snowpark-ml)

## Getting started

### Have your Snowflake account ready

If you don't have a Snowflake account yet, you can [sign up for a 30-day free trial account](https://signup.snowflake.com/).

### Create a Python virtual environment
Python 3.8 is required. You can use [miniconda](https://docs.conda.io/en/latest/miniconda.html), [anaconda](https://www.anaconda.com/), or [virtualenv](https://docs.python.org/3/tutorial/venv.html) to create a Python 3.8 virtual environment.

Python versions 3.8, 3.9, and 3.10 are supported. You can use [miniconda](https://docs.conda.io/en/latest/miniconda.html), [anaconda](https://www.anaconda.com/), or [virtualenv](https://docs.python.org/3/tutorial/venv.html) to create a virtual environment.

To have the best experience when using this library, [creating a local conda environment with the Snowflake channel](https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-packages.html#local-development-and-testing) is recommended.
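A minimal sketch of that setup (the environment name is illustrative; the channel URL is the Snowflake conda channel described in the linked docs):

```
conda create --name snowpark-ml python=3.8
conda activate snowpark-ml

# Prefer packages from the Snowflake conda channel so that locally installed
# versions match what is available inside Snowflake.
conda config --env --add channels https://repo.anaconda.com/pkgs/snowflake
```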

### Install the library to the Python virtual environment

```
pip install snowflake-ml-python
```
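As a quick smoke test of the install (assuming the package exposes the `snowflake.ml` namespace used throughout this repo):

```
python -c "import snowflake.ml; print('snowflake-ml-python imported OK')"
```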
6 changes: 4 additions & 2 deletions bazel/get_affected_targets.sh
@@ -28,8 +28,10 @@ help() {
echo "Running ${PROG}"

bazel="bazel"
current_revision=$(git rev-parse HEAD)
pr_revision=${current_revision}
current_revision=$(git symbolic-ref --short -q HEAD \
    || git describe --tags --exact-match 2> /dev/null \
    || git rev-parse --short HEAD)
pr_revision=$(git rev-parse HEAD)
output_path="/tmp/affected_targets/targets"
workspace_path=$(pwd)

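For context, the three commands in the new fallback chain resolve a revision name in order of preference (example outputs are hypothetical):

```
git symbolic-ref --short -q HEAD               # on a branch: prints its name, e.g. "main"
git describe --tags --exact-match 2> /dev/null # detached exactly on a tag: prints it, e.g. "v1.0.4"
git rev-parse --short HEAD                     # otherwise: prints the abbreviated commit hash
```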
3 changes: 3 additions & 0 deletions bazel/mypy/CREDITS.md
@@ -0,0 +1,3 @@
Special thanks to [bazel-mypy-integration](https://github.com/bazel-contrib/bazel-mypy-integration).

This package has been forked from that repo and modified to cater to the specific needs of this Snowflake repo.
263 changes: 138 additions & 125 deletions bazel/mypy/mypy.bzl
@@ -1,54 +1,52 @@
"Public API"

load("@bazel_skylib//lib:shell.bzl", "shell")
load("@bazel_skylib//lib:sets.bzl", "sets")
load("//bazel/mypy:rules.bzl", "MyPyStubsInfo")

MyPyAspectInfo = provider(
"TODO: documentation",
fields = {
"out": "mypy output.",
"cache": "cache generated by mypy.",
"exe": "Used to pass the rule implementation built exe back to calling aspect.",
"out": "Used to pass the dummy output file back to calling aspect.",
},
)

# We don't support stubs (pyi) yet.
PY_EXTENSIONS = ["py"]
PY_RULES = ["py_binary", "py_library", "py_test", "py_wheel", "py_package"]
# Switch to True only during debugging and development.
# All releases should have this as False.
DEBUG = False

VALID_EXTENSIONS = ["py", "pyi"]

DEFAULT_ATTRS = {
"_mypy_sh": attr.label(
"_template": attr.label(
default = Label("//bazel/mypy:mypy.sh.tpl"),
allow_single_file = True,
),
"_mypy": attr.label(
"_mypy_cli": attr.label(
default = Label("//bazel/mypy:mypy"),
executable = True,
cfg = "host",
cfg = "exec",
),
"_mypy_config": attr.label(
default = Label("//:mypy.ini"),
allow_single_file = True,
),
"_debug": attr.bool(
default = False,
)
}

# See https://github.com/python/mypy/pull/4759 for what `cache_map_triples` mean.
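# Each triple is (source file, its mypy .meta.json cache file, its .data.json
# cache file), flattened into a single argument list, e.g. (hypothetical paths):
#   src/foo.py src/foo.py.meta.json src/foo.py.data.json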
def _sources_to_cache_map_triples(cache_files, dep_cache_files):
def _sources_to_cache_map_triples(srcs):
    triples_as_flat_list = []
    for d in (cache_files, dep_cache_files):
        for src, (meta, data) in d.items():
            triples_as_flat_list.extend([
                shell.quote(src.path),
                shell.quote(meta.path),
                shell.quote(data.path),
            ])
    for f in srcs:
        f_path = f.path
        triples_as_flat_list.extend([
            shell.quote(f_path),
            shell.quote("{}.meta.json".format(f_path)),
            shell.quote("{}.data.json".format(f_path)),
        ])
    return triples_as_flat_list

def _flatten_cache_dict(cache_files):
    result = []
    for meta, data in cache_files.values():
        result.append(meta)
        result.append(data)
    return result
def _is_external_dep(dep):
    return dep.label.workspace_root.startswith("external/")

def _is_external_src(src_file):
    return src_file.path.startswith("external/")
@@ -57,127 +55,142 @@ def _extract_srcs(srcs):
    direct_src_files = []
    for src in srcs:
        for f in src.files.to_list():
            if f.extension in PY_EXTENSIONS and not _is_external_src(f):
            if f.extension in VALID_EXTENSIONS:
                direct_src_files.append(f)
    return direct_src_files

# Overview
# This aspect does the following:
# - Create an action to run mypy against the sources of `target`
#   - input of this action:
#     - source files of `target` and source files of all its deps.
#     - cache files produced by checking its deps.
#   - output of this action:
#     - mypy stderr+stdout in a file
#     - cache files produced by checking the source files of `target`
#   - this action depends on actions created for the deps, so that it always
#     has access to cache files produced by those actions.
# - Propagate the output of this action along the `deps` edge of the build graph.
# - Produce an OutputGroup which contains the output of all the actions created
#   along the build graph, so that one can use the bazel command line to mark all
#   those actions as required and make them run.
def _mypy_aspect_impl(target, ctx):
    if (ctx.rule.kind not in PY_RULES or
        ctx.label.workspace_root.startswith("external")):
        return []
def _extract_transitive_deps(deps):
    transitive_deps = []
    for dep in deps:
        if MyPyStubsInfo not in dep and PyInfo in dep and not _is_external_dep(dep):
            transitive_deps.append(dep[PyInfo].transitive_sources)
    return transitive_deps

def _extract_stub_deps(deps):
    # Need to add the .py files AND the .pyi files that are
    # deps of the rule
    stub_files = []
    for dep in deps:
        if MyPyStubsInfo in dep:
            for stub_srcs_target in dep[MyPyStubsInfo].srcs:
                for src_f in stub_srcs_target.files.to_list():
                    if src_f.extension == "pyi":
                        stub_files.append(src_f)
    return stub_files

def _extract_imports(imports, label):
    # NOTE: Bazel's implementation of this for py_binary, py_test is at
    # src/main/java/com/google/devtools/build/lib/bazel/rules/python/BazelPythonSemantics.java
    mypypath_parts = []
    for import_ in imports:
        if import_.startswith("/"):
            # buildifier: disable=print
            print("ignoring invalid absolute path '{}'".format(import_))
        elif import_ in ["", "."]:
            mypypath_parts.append(label.package)
        else:
            mypypath_parts.append("{}/{}".format(label.package, import_))
    return mypypath_parts

def _mypy_rule_impl(ctx):
    base_rule = ctx.rule
    debug = ctx.attr._debug
    mypy_config_file = ctx.file._mypy_config

    # Get the cache files generated by running mypy against the deps.
    dep_cache_files = {}
    for dep in ctx.rule.attr.deps:
        if MyPyAspectInfo in dep:
            dep_cache_files.update(dep[MyPyAspectInfo].cache)
    mypy_config_file = ctx.file._mypy_config

    mypypath_parts = []
    direct_src_files = []
    transitive_srcs_depsets = []
    stub_files = []

    if hasattr(base_rule.attr, "srcs"):
        direct_src_files = _extract_srcs(base_rule.attr.srcs)

    # It's possible that this target does not have srcs (py_wheel, for example).
    # However, if the user requests to type check a py_wheel, we should make sure
    # its python transitive deps get checked.
    if direct_src_files:
        # There are source files in this target to check. The check will result in
        # cache files. Request bazel to allocate those files now.
        cache_files = {}
        for src in direct_src_files:
            meta_file = ctx.actions.declare_file("{}.meta.json".format(src.basename))
            data_file = ctx.actions.declare_file("{}.data.json".format(src.basename))
            cache_files[src] = (meta_file, data_file)

        # The mypy stdout, which is expected to be produced by mypy_script.
        mypy_out = ctx.actions.declare_file("%s_mypy_out" % ctx.rule.attr.name)
        # The script to invoke mypy against this target.
        mypy_script = ctx.actions.declare_file(
            "%s_mypy_script" % ctx.rule.attr.name,
        )

        # Generated files are located in a different root dir than source files.
        # Thus we need to let mypy know where to find both kinds, in case both
        # kinds are present in one analysis.
        src_root_paths = sets.to_list(
            sets.make(
                [f.root.path for f in dep_cache_files.keys()] +
                [f.root.path for f in cache_files.keys()]),
        )

        all_src_files = direct_src_files + list(dep_cache_files.keys())
    if hasattr(base_rule.attr, "deps"):
        transitive_srcs_depsets = _extract_transitive_deps(base_rule.attr.deps)
        stub_files = _extract_stub_deps(base_rule.attr.deps)

    if hasattr(base_rule.attr, "imports"):
        mypypath_parts = _extract_imports(base_rule.attr.imports, ctx.label)

    final_srcs_depset = depset(transitive = transitive_srcs_depsets +
                               [depset(direct = direct_src_files)])
    src_files = [f for f in final_srcs_depset.to_list() if not _is_external_src(f)]
    if not src_files:
        return None

    mypypath_parts += [src_f.dirname for src_f in stub_files]
    mypypath = ":".join(mypypath_parts)

    out = ctx.actions.declare_file("%s_dummy_out" % ctx.rule.attr.name)
    exe = ctx.actions.declare_file(
        "%s_mypy_exe" % ctx.rule.attr.name,
    )

    # Compose a list of the files needed for use. Note that aspect rules can use
    # the project version of mypy; however, other rules should fall back on their
    # relative runfiles.
    runfiles = ctx.runfiles(files = src_files + stub_files + [mypy_config_file])

    src_root_paths = sets.to_list(
        sets.make([f.root.path for f in src_files]),
    )

    ctx.actions.expand_template(
        template = ctx.file._template,
        output = exe,
        substitutions = {
            "{MYPY_BIN}": ctx.executable._mypy.path,
            "{CACHE_MAP_TRIPLES}": " ".join(_sources_to_cache_map_triples(cache_files, dep_cache_files)),
            "{MYPY_EXE}": ctx.executable._mypy_cli.path,
            "{MYPY_ROOT}": ctx.executable._mypy_cli.root.path,
            "{CACHE_MAP_TRIPLES}": " ".join(_sources_to_cache_map_triples(src_files)),
            "{PACKAGE_ROOTS}": " ".join([
                "--package-root " + shell.quote(path or ".")
                for path in src_root_paths
            ]),
            "{SRCS}": " ".join([
                shell.quote(f.path)
                for f in all_src_files
                for f in src_files
            ]),
            "{VERBOSE_OPT}": "--verbose" if debug else "",
            "{VERBOSE_BASH}": "set -x" if debug else "",
            "{OUTPUT}": mypy_out.path,
            "{ADDITIONAL_MYPYPATH}": ":".join([p for p in src_root_paths if p]),
            "{MYPY_INI}": mypy_config_file.path,
        }
    ctx.actions.expand_template(
        template = ctx.file._mypy_sh,
        output = mypy_script,
        substitutions = substitutions,
        is_executable = True,
    )

    # We want mypy to follow imports, so all the source files of the dependencies
    # are needed to check this target.
    ctx.actions.run(
        outputs = [mypy_out] + _flatten_cache_dict(cache_files),
        inputs = depset(
            all_src_files +
            [mypy_config_file] +
            _flatten_cache_dict(dep_cache_files)  # cache generated by analyzing deps
        ),
        tools = [ctx.executable._mypy],
        executable = mypy_script,
        mnemonic = "MyPy",
        progress_message = "Type-checking %s" % ctx.label,
        use_default_shell_env = True,
    )
    dep_cache_files.update(cache_files)
    transitive_mypy_outs = []
    for dep in ctx.rule.attr.deps:
        if OutputGroupInfo in dep:
            if hasattr(dep[OutputGroupInfo], "mypy"):
                transitive_mypy_outs.append(dep[OutputGroupInfo].mypy)
            "{VERBOSE_OPT}": "--verbose" if DEBUG else "",
            "{VERBOSE_BASH}": "set -x" if DEBUG else "",
            "{OUTPUT}": out.path if out else "",
            "{MYPYPATH_PATH}": mypypath if mypypath else "",
            "{MYPY_INI_PATH}": mypy_config_file.path,
        },
        is_executable = True,
    )

    return [
        DefaultInfo(executable = exe, runfiles = runfiles),
        MyPyAspectInfo(exe = exe, out = out),
    ]

def _mypy_aspect_impl(_, ctx):
    if (ctx.rule.kind not in ["py_binary", "py_library", "py_test", "mypy_test"] or
        ctx.label.workspace_root.startswith("external")):
        return []

    providers = _mypy_rule_impl(
        ctx
    )
    if not providers:
        return []

    info = providers[0]
    aspect_info = providers[1]

    ctx.actions.run(
        outputs = [aspect_info.out],
        inputs = info.default_runfiles.files,
        tools = [ctx.executable._mypy_cli],
        executable = aspect_info.exe,
        mnemonic = "MyPy",
        progress_message = "Type-checking %s" % ctx.label,
        use_default_shell_env = True,
    )
    return [
        OutputGroupInfo(
            # We may not need to run mypy against this target, but we request
            # all its dependencies to be checked, recursively, by demanding the
            # output of those checks.
            mypy = depset([mypy_out] if direct_src_files else [], transitive = transitive_mypy_outs),
            mypy = depset([aspect_info.out]),
        ),
        MyPyAspectInfo(out = mypy_out if direct_src_files else None, cache = dep_cache_files),
    ]

mypy_aspect = aspect(

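The reworked aspect publishes its results through the `mypy` output group, so it can be requested ad hoc from the Bazel command line. A hedged sketch (the target pattern is illustrative):

```
# Type-check a set of targets and everything they depend on by registering the
# aspect and demanding its "mypy" output group.
bazel build //snowflake/ml/... \
    --aspects=//bazel/mypy:mypy.bzl%mypy_aspect \
    --output_groups=mypy
```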