We use `bazel` as the build system. Install `bazel` via `bazelisk` if you have not already done so:

```bash
# This installs bazelisk in ~/go/bin/bazelisk
go install github.com/bazelbuild/bazelisk@latest
```
Add a shortcut in your `~/.bashrc` (or equivalent):

```bash
if [ -f ~/go/bin/bazelisk ]; then
  alias bazel=~/go/bin/bazelisk
fi
```
`buildifier` auto-formats `BUILD.bazel` files. Installation is similar:

```bash
go install github.com/bazelbuild/buildtools/buildifier@latest
```
Add a shortcut in your `~/.bashrc` (or equivalent):

```bash
if [ -f ~/go/bin/buildifier ]; then
  alias buildifier=~/go/bin/buildifier
fi
```
Note: You may need to configure your editor to run `buildifier` on save.
To build the package, run:

```bash
> bazel build //:wheel
```
`bazel` can be run from anywhere under the monorepo, and it accepts both absolute and relative target paths. For example:

```bash
snowml> bazel build :wheel
```
You can build an entire sub-tree as:

```bash
> bazel build //snowflake/...
```
- Instead of using the `py_binary`, `py_library`, and `py_test` rules from `bazel`, use those from `bazel/py_rules.bzl`. For example, instead of

  ```python
  py_library(
      name = "my_lib",
      srcs = ["my_lib.py"],
  )
  ```

  use:

  ```python
  load("//bazel:py_rules.bzl", "py_library")

  py_library(
      name = "my_lib",
      srcs = ["my_lib.py"],
  )
  ```
- When using a `genrule` whose tool is a `py_binary`, use `py_genrule` from `bazel/py_rules.bzl` instead. For example, instead of

  ```python
  py_binary(
      name = "my_tool",
      srcs = ["my_tool.py"],
  )

  genrule(
      name = "generate_something",
      cmd = "$(location :my_tool)",
      tools = [":my_tool"],
  )
  ```

  use:

  ```python
  load("//bazel:py_rules.bzl", "py_binary", "py_genrule")

  py_binary(
      name = "my_tool",
      srcs = ["my_tool.py"],
  )

  py_genrule(
      name = "generate_something",
      cmd = "$(location :my_tool)",
      tools = [":my_tool"],
  )
  ```
- If the visibility of your target is not `//visibility:public`, you need to make sure your target is visible to `//bazel:snowml_public_common` so that CI type checking works.
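  For example, a minimal sketch (the library name is illustrative; `//bazel:snowml_public_common` is the package group mentioned above):

  ```python
  load("//bazel:py_rules.bzl", "py_library")

  py_library(
      name = "my_lib",
      srcs = ["my_lib.py"],
      visibility = ["//bazel:snowml_public_common"],
  )
  ```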
We use `mypy` to type-check our Python source files. `mypy` is integrated into our `bazel` environment. The version of `mypy` is specified in `conda-env.yml`, just like the other `conda` packages we depend on. To type-check a target, run:

```bash
bazel build --config=typecheck <your target>
```
Or you could run:

```bash
./ci/type_check/type_check.sh -b <path_to_bazel>
```

You only need to specify `-b <path_to_bazel>` if your `bazel` is not in `$PATH` or is an alias.
Similar to `bazel build`, `bazel test` can test any target, as long as the target is a test target. It runs the target and reports whether it `PASSED` or `FAILED`. It essentially `build`s the target and then `run`s it; you can also build and run separately.
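For example, to run a single test target (the target label below is illustrative):

```bash
bazel test //tests/integ/snowflake/ml/_internal:file_utils_integ_test
```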
TIP: If a test fails, `bazel` prints the path to a log file, and that log file is executable. You do not need to open it via `less` or an editor; you can paste the path directly into the command line and run it.
Integration tests are configured to run against an existing Snowflake account. To run tests locally, make sure that you have configured a SnowSQL config file in `<HOME_DIR>/.snowsql/config` (see the Snowflake documentation for configuration options).
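A minimal config might look like the following (all values are placeholders for your own account details):

```
[connections]
accountname = <account_identifier>
username = <user_name>
password = <password>
```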
For example, to run all autogenerated tests locally:

```bash
# Then run all autogenerated tests
bazel test //... --test_tag_filters=autogen
```
An `lcov` coverage report can be generated by running:

```bash
bazel coverage --combined_report=lcov <target pattern>
```

To get a human-readable report:

```bash
lcov --list $(bazel info output_path)/_coverage/_coverage_report.dat
```

To get an HTML report:

```bash
genhtml --output <output_dir> "$(bazel info output_path)/_coverage/_coverage_report.dat"
```
Both `lcov` and `genhtml` are part of the `lcov` project. To install it on macOS:

```bash
brew install lcov
```
The unit test coverage report is generated periodically by a GitHub workflow. You can download the report in the artifacts generated by the action runs.
Another useful command is `bazel run`, which builds the target and then runs it directly. This is useful for debugging binaries.
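For example (the binary target below is illustrative):

```bash
bazel run //snowflake/ml/some_pkg:my_tool
```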
`bazel` is pretty powerful and has lots of other commands; read more in the `bazel` documentation.
To introduce a third-party Python dependency, first check if it is available as a package in the Snowflake conda channel. Then modify `requirements.yml`, and run the following to re-generate all requirements files, including `conda-env.yml`:

```bash
bazel run --config=pre_build //bazel/requirements:sync_requirements
```

Then, your code can use the package as if it were "installed" in the Python environment.
Please provide the following fields when adding a new record:
- `name`: The name of the package. Set this if the package is available with the same name and is required in both PyPI and conda.
- `name_pypi`: The name of the package in PyPI. Set this only to indicate that it is available in PyPI only. You can also set this along with `name_conda` if the package has different names in PyPI and conda.
- `name_conda`: The name of the package in conda. Set this only to indicate that it is available in conda only. You can also set this along with `name_pypi` if the package has different names in PyPI and conda.

(At least one of these three fields should be set.)
- `dev_version`: The version of the package to be pinned in the dev environment. Set this if the package is available with the same version and is required in both PyPI and conda.
- `dev_version_pypi`: The version from PyPI to be pinned in the dev environment. Set this only to indicate that it is available in PyPI only. You can also set this along with `dev_version_conda` if the package has different versions in PyPI and conda.
- `dev_version_conda`: The version from conda to be pinned in the dev environment. Set this only to indicate that it is available in conda only. You can also set this along with `dev_version_pypi` if the package has different versions in PyPI and conda.

(At least one of these three fields should be set.)
- `require_gpu`: Set this to true if the package is only a requirement for the environment with GPUs.
- `from_channel`: Set this if the package is not available in the Snowflake Anaconda Channel (https://repo.anaconda.com/pkgs/snowflake).
- `version_requirements`: The version requirements specifiers when this requirement is a dependency of the `snowflake-ml-python` release. Set this if the package is available with the same name and required in both PyPI and conda.
- `version_requirements_pypi`: The version requirements specifiers when this requirement is a dependency of the `snowflake-ml-python` release via PyPI. Set this only to indicate that it is required by the PyPI release only. You can also set this along with `version_requirements_conda` if the package has different versions in PyPI and conda.
- `version_requirements_conda`: The version requirements specifiers when this requirement is a dependency of the `snowflake-ml-python` release via conda. Set this only to indicate that it is required by the conda release only. You can also set this along with `version_requirements_pypi` if the package has different versions in PyPI and conda.

(At least one of these three fields must be set to indicate that this package is a dependency of the release. If you don't want to constrain the version, set the field to an empty string.)
- `requirements_extra_tags`: Set this to indicate that the package is an extras dependency of `snowflake-ml-python`. This requirement will be added to all extras tags specified here, and an `all` extras tag will be auto-generated to include all extras requirements. All extras requirements will be labeled as `run_constrained` in conda's `meta.yaml`.
- `tags`: Set tags to filter some of the requirements in specific cases. The current valid tags include:
  - `deployment_core`: Used by model deployment to indicate dependencies required to execute model deployment code on the server-side.
  - `build_essential`: Used to indicate the packages composing the build environment.
Example:

```yaml
- name: pandas
  name_pypi: pandas-pypi-name
  dev_version: 1.2.0
  dev_version_pypi: 1.2.0-pypi
  version_requirements: ">=1.0.0"
  version_requirements_pypi: ">=1.0.0"
  from_channel: "conda-forge"
  requirements_extra_tags:
    - pandas
  tags:
    - deployment_core
    - build_essential
```
Write Python `unittest`-style unit tests. `pytest` is allowed, but not recommended. Use `absl.testing.absltest` as a drop-in replacement for `unittest`.
For example:

```python
# instead of
# import unittest
from absl.testing import absltest

# instead of
# from unittest import TestCase, main
from absl.testing.absltest import TestCase, main

# Call main.
if __name__ == '__main__':
    absltest.main()
```
`absltest` provides better `bazel` integration, which produces a more detailed XML test report. The test report is picked up by a GitHub workflow to provide a nice UI for test results.
Make each unit test file its own runnable `py_test` target and use the `main()` function provided by `snowflake.ml.test_utils.pytest_driver`.
For example:

```python
from snowflake.ml.utils import pytest_driver


def test_case():
    assert some_feature()


if __name__ == "__main__":
    pytest_driver.main()
```
`pytest_driver` contains `bazel` integration that allows `pytest` to produce an XML test report.
When you add a new test file, always ensure it contains an `if __name__ == "__main__":` block; otherwise, the test file will not be executed by `bazel`. We have a test wrapper to ensure that the test will fail if you forget that part.
To easily test whether your code works inside a stored procedure, you can base your test on `CommonTestBase` in `tests/integ/snowflake/ml/test_utils/common_test_base.py`. An example of such a test can be found in `tests/integ/snowflake/ml/_internal/file_utils_integ_test.py`.
To write such a test, you need to:

- Ensure your test does not have a parameter called `_sproc_test_mode`.
- Let your test case inherit from `common_test_base.CommonTestBase`.
- Remove all Snowpark Session creation in your test, and use `self.session` to access the session if needed.
- If you write your own `setUp` and `tearDown` methods, remember to call `super().setUp()` or `super().tearDown()`.
- Decorate your test method with `common_test_base.CommonTestBase.sproc_test()`, as shown in the sketch after this list. If you want your test to run only in a stored procedure, rather than both locally and in a stored procedure, set `local=False`. If you don't want to test with caller's rights, set `test_callers_rights=False`. (Owner's rights stored procedures are always tested.)
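A minimal sketch of such a test (the import path and the query are illustrative; the base class and decorator are the ones described above):

```python
from absl.testing import absltest

from tests.integ.snowflake.ml.test_utils import common_test_base


class MyFeatureSprocTest(common_test_base.CommonTestBase):
    @common_test_base.CommonTestBase.sproc_test()
    def test_my_feature(self) -> None:
        # self.session is provided by CommonTestBase; do not create your own session.
        result = self.session.sql("SELECT 1").collect()
        self.assertEqual(result[0][0], 1)


if __name__ == "__main__":
    absltest.main()
```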
To easily test whether your code is compatible with previous versions, you can base your test on `CommonTestBase` in `tests/integ/snowflake/ml/test_utils/common_test_base.py`. An example of such a test can be found in `tests/integ/snowflake/ml/registry/model_registry_compat_test.py`.
To write such a test, you need to:

- Ensure your test does not have a parameter called `_snowml_pkg_ver`.
- Let your test case inherit from `common_test_base.CommonTestBase`.
- Remove all Snowpark Session creation in your test, and use `self.session` to access the session if needed.
- If you write your own `setUp` and `tearDown` methods, remember to call `super().setUp()` or `super().tearDown()`.
- Write a factory method in your test class that returns a tuple of a function and its parameters (as a tuple). The function will be run as a stored procedure in an environment with a previous version of the library.
  - Note: Since the function will be created as a stored procedure, its first argument must be a Snowpark Session. The arguments tuple you provide via the factory method does not need to include the session object.
  - Note: To avoid objects from the current environment affecting the result, the function is written out as a Python file and registered as a stored procedure instead of being pickled with `cloudpickle`. This means you cannot use any object defined outside of the function, and if you want to import anything, you need to import it inside the function definition. So it helps to keep your prepare function as simple as possible.
- Decorate your test method with `common_test_base.CommonTestBase.compatibility_test`, providing the factory method you created in the above step, an optional version range to test against, and any additional package requirements. See the sketch after this list.
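A hedged sketch of the shape such a test can take (the import paths, table name, and keyword argument names are assumptions for illustration; check `common_test_base.py` for the exact signature of `compatibility_test`):

```python
from typing import Callable, Tuple

from absl.testing import absltest

from snowflake.snowpark import Session
from tests.integ.snowflake.ml.test_utils import common_test_base


class MyCompatTest(common_test_base.CommonTestBase):
    def _prepare_fn_factory(self) -> Tuple[Callable[[Session, str], None], Tuple[str]]:
        def prepare(session: Session, table_name: str) -> None:
            # Everything used here must be imported/defined inside the function,
            # because it is shipped as a standalone Python file, not pickled.
            session.sql(f"CREATE TABLE {table_name} (c1 INT)").collect()

        return prepare, ("MY_COMPAT_TABLE",)

    @common_test_base.CommonTestBase.compatibility_test(
        prepare_fn_factory=_prepare_fn_factory,  # keyword names here are assumptions
        version_range=">=1.0.0",
    )
    def test_table_still_readable(self) -> None:
        rows = self.session.sql("SELECT COUNT(*) FROM MY_COMPAT_TABLE").collect()
        self.assertEqual(rows[0][0], 0)
        self.session.sql("DROP TABLE MY_COMPAT_TABLE").collect()


if __name__ == "__main__":
    absltest.main()
```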
Pull requests against the main branch are subject to `pre-commit` checks, which enforce the code style. You can make sure the checks pass by installing the `pre-commit` hooks into your local repo (instructions). Those hooks are invoked when you commit locally, and they fix style violations in place. The minimum `pre-commit` version required is `3.4.0`.
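To install the hooks, run the following from the repo root (assuming `pre-commit` itself is already installed in your environment):

```bash
pre-commit install
```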
Tip: if you want to isolate those automated fixes, avoid the `-a` option in `git commit`. This way the automated changes will remain as unstaged changes.
The darglint pre-commit hook lints docstrings to make sure they conform to the Google style guide for docstrings. Function docstrings must contain an "Args" section describing the input values, a "Returns" section describing the output, and a "Raises" section enumerating the exceptions that the function can raise. Darglint ensures that all input args are present in the docstring and is sensitive to whitespace (e.g., args should be indented the correct number of spaces). Refer to the list of darglint error codes for guidance.
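For example, a docstring in the shape darglint expects (the function itself is only an illustration):

```python
def divide(numerator: float, denominator: float) -> float:
    """Divide one number by another.

    Args:
        numerator: The value to be divided.
        denominator: The value to divide by.

    Returns:
        The quotient of the two values.

    Raises:
        ZeroDivisionError: If `denominator` is zero.
    """
    if denominator == 0:
        raise ZeroDivisionError("denominator must not be zero.")
    return numerator / denominator
```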
Here are a few good plugins to use:

- Python
  - Pylance static checking
- Bazel
  - You need to configure `buildifier` in settings for auto-formatting `BUILD.bazel` files
- Black Python Formatter
- Flake8 Linter