Contributing Guide

Build system

We are using bazel as the build system.

Installation

Bazel

Install bazel, if not already done:

# This installs bazelisk in ~/go/bin/bazelisk
go install github.com/bazelbuild/bazelisk@latest

Add shortcut in your ~/.bashrc (or equivalent):

if [ -f ~/go/bin/bazelisk ]; then
  alias bazel=~/go/bin/bazelisk
fi

Buildifier

This tool auto-formats BUILD.bazel files. Installation is similar:

go install github.com/bazelbuild/buildtools/buildifier@latest

Add shortcut in your ~/.bashrc (or equivalent):

if [ -f ~/go/bin/buildifier ]; then
  alias buildifier=~/go/bin/buildifier
fi

Note: You may need to configure your editor to run this on save.

Build

To build the package, run:

> bazel build //snowflake/ml:wheel

bazel can be run from anywhere under the monorepo, and it accepts either an absolute path or a relative path. For example:

snowflake/ml> bazel build :wheel

You can build an entire sub-tree as:

> bazel build //snowflake/...

Notes when you add a new target in a `BUILD.bazel` file

Instead of using the py_binary, py_library, and py_test rules from bazel, use those from bazel/py_rules.bzl. For example, instead of

py_library(
    name = "my_lib",
    srcs = ["my_lib.py"],
)

use the following:

load("//bazel:py_rules.bzl", "py_library")

py_library(
    name = "my_lib",
    srcs = ["my_lib.py"],
)

When using a genrule whose tool is a py_binary, use py_genrule from bazel/py_rules.bzl instead. For example, instead of

py_binary(
    name = "my_tool",
    srcs = ["my_tool.py"],
)

genrule(
    name = "generate_something",
    cmd = "$(location :my_tool)",
    tools = [":my_tool"],
)

use the following:

load("//bazel:py_rules.bzl", "py_binary", "py_genrule")

py_binary(
    name = "my_tool",
    srcs = ["my_tool.py"],
)

py_genrule(
    name = "generate_something",
    cmd = "$(location :my_tool)",
    tools = [":my_tool"],
)

Type-check

mypy

We use mypy to type-check our Python source files. mypy is integrated into our bazel environment.

The version of mypy is specified in conda-env.yml, just like the other conda packages we depend on.

Invoke mypy locally

bazel build --config=typecheck <your target>

Or you could run

./ci/type_check.sh -b <path_to_bazel>

You only need to specify -b <path_to_bazel> if your bazel is not in $PATH or is an alias.
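The rule above can be sketched in Python. This is a minimal illustration, not part of the repository: type_check_command is a hypothetical helper, and the only assumption beyond the text is that the standard library's shutil.which is an adequate stand-in for "bazel is in $PATH". Shell aliases are invisible to that lookup, which matches the case where -b is required.

```python
import shutil
from typing import List, Optional


def type_check_command(bazel_path: Optional[str] = None) -> List[str]:
    """Build the argument list for ./ci/type_check.sh (hypothetical helper)."""
    command = ["./ci/type_check.sh"]
    if bazel_path is not None:
        # An explicit path always wins and is passed through -b.
        command += ["-b", bazel_path]
    elif shutil.which("bazel") is None:
        # shutil.which only sees real executables in $PATH, not shell aliases,
        # so an aliased bazel ends up here as well.
        raise ValueError("bazel is not in $PATH; pass its path so -b can be set")
    return command


print(type_check_command("/path/to/bazelisk"))
```

The returned list can be handed to subprocess.run from the repository root.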

Test

Similar to bazel build, bazel test can test any target, as long as the target is a test target. It builds the target, runs it, and reports whether it PASSED or FAILED. You can also build and run separately.

TIP: If a test fails, there will be a log file, which is executable. You do not need to open it with less or an editor; you can paste its path directly into the command line.

Integration tests are configured to run against an existing Snowflake account. To run tests locally, make sure that you have configured a SnowSQL config file in <HOME_DIR>/.snowsql/config (see Snowflake documentation for configuration options).

For example, to run all autogenerated tests locally:

# Run all autogenerated tests
bazel test //... --test_tag_filters=autogen

Coverage

An lcov coverage report can be generated by running

bazel coverage --combined_report=lcov <target pattern>

To get a human-readable report:

lcov --list "$(bazel info output_path)/_coverage/_coverage_report.dat"

To get an HTML report:

genhtml --output <output_dir> "$(bazel info output_path)/_coverage/_coverage_report.dat"

Both lcov and genhtml are part of the lcov project. To install it on macOS:

brew install lcov

The unit test coverage report is generated periodically by a GitHub workflow. You can download the report in the artifacts generated by the action runs.
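For a quick look at a coverage report without installing lcov, the tracefile can also be read directly. The sketch below is an illustration only: summarize_tracefile is a hypothetical helper, the sample file path is made up, and it handles just the SF, DA, and end_of_record records of the lcov tracefile format.

```python
from typing import Dict, Optional, Tuple


def summarize_tracefile(text: str) -> Dict[str, Tuple[int, int]]:
    """Map each source file to (lines hit, lines instrumented)."""
    summary: Dict[str, Tuple[int, int]] = {}
    current: Optional[str] = None
    hit = found = 0
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("SF:"):
            # SF:<path> opens the record for one source file.
            current, hit, found = line[3:], 0, 0
        elif line.startswith("DA:") and current is not None:
            # DA:<line number>,<execution count>[,<checksum>]
            count = int(line[3:].split(",")[1])
            found += 1
            if count > 0:
                hit += 1
        elif line == "end_of_record" and current is not None:
            summary[current] = (hit, found)
            current = None
    return summary


# Made-up sample; a real report lives under $(bazel info output_path)/_coverage/.
sample = """SF:snowflake/ml/example.py
DA:1,1
DA:2,0
DA:3,4
end_of_record
"""
print(summarize_tracefile(sample))  # {'snowflake/ml/example.py': (2, 3)}
```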

Run

Another useful command is bazel run. It builds the target and then runs it directly, which is useful for binaries while debugging.

Other commands

bazel is powerful and has many other commands. Read more in the Bazel documentation.

Python dependencies

To introduce a third-party Python dependency, first check if it is available as a package in the Snowflake conda channel. Then modify requirements.yml, and run the following to re-generate all requirements files, including conda-env.yml:

bazel run --config=pre_build //bazel/requirements:sync_requirements

Then, your code can use the package as if it were "installed" in the Python environment.

Adding a new dependency

Please provide the following fields when adding a new record:

Package Name Fields

name: The name of the package. Set this if the package is available with the same name and is required in both PyPI and conda.

name_pypi: The name of the package in PyPI. Set this only to indicate that it is available in PyPI only. You can also set this along with name_conda if the package has different names in PyPI and conda.

name_conda: The name of the package in conda. Set this only to indicate that it is available in conda only. You can also set this along with name_pypi if the package has different names in PyPI and conda.

(At least one of these three fields should be set.)

Development Version Fields

dev_version: The version of the package to be pinned in the dev environment. Set this if the package is available with the same version and is required in both PyPI and conda.

dev_version_pypi: The version from PyPI to be pinned in the dev environment. Set this only to indicate that it is available in PyPI only. You can also set this along with dev_version_conda if the package has different versions in PyPI and conda.

dev_version_conda: The version from conda to be pinned in the dev environment. Set this only to indicate that it is available in conda only. You can also set this along with dev_version_pypi if the package has different versions in PyPI and conda.

(At least one of these three fields should be set.)

Snowflake Anaconda Channel

from_channel: Set this if the package is not available in the Snowflake Anaconda Channel (https://repo.anaconda.com/pkgs/snowflake).

Version Requirements Fields (for `snowflake-ml-python` release)

version_requirements: The version requirements specifiers when this requirement is a dependency of the snowflake-ml-python release. Set this if the package is available with the same name and required in both PyPI and conda.

version_requirements_pypi: The version requirements specifiers when this requirement is a dependency of the snowflake-ml-python release via PyPI. Set this only to indicate that it is required by the PyPI release only. You can also set this along with version_requirements_conda if the package has different versions in PyPI and conda.

version_requirements_conda: The version requirements specifiers when this requirement is a dependency of the snowflake-ml-python release via conda. Set this only to indicate that it is required by the conda release only. You can also set this along with version_requirements_pypi if the package has different versions in PyPI and conda.

(At least one of these three fields must be set to indicate that this package is a dependency of the release. If you don't want to constrain the version, set the field to an empty string.)
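The "at least one of" rules for the three field groups above can be expressed as a small check. This is a sketch under stated assumptions, not the repository's actual validation: missing_groups and is_release_dependency are hypothetical names, and a record is modeled as a plain dict as it would look after loading requirements.yml.

```python
from typing import Any, Dict, List

# Field groups as described above.
NAME_FIELDS = ("name", "name_pypi", "name_conda")
DEV_VERSION_FIELDS = ("dev_version", "dev_version_pypi", "dev_version_conda")
VERSION_REQUIREMENTS_FIELDS = (
    "version_requirements",
    "version_requirements_pypi",
    "version_requirements_conda",
)


def missing_groups(record: Dict[str, Any]) -> List[str]:
    """Return the required field groups that have no field set in the record."""
    required = {"package name": NAME_FIELDS, "development version": DEV_VERSION_FIELDS}
    return [
        label
        for label, fields in required.items()
        if not any(field in record for field in fields)
    ]


def is_release_dependency(record: Dict[str, Any]) -> bool:
    """True when any version requirements field is present.

    Presence is tested rather than truthiness, because an empty string is a
    valid value meaning "no version constraint".
    """
    return any(field in record for field in VERSION_REQUIREMENTS_FIELDS)


record = {"name": "pandas", "dev_version": "1.2.0", "version_requirements": ""}
print(missing_groups(record))         # []
print(is_release_dependency(record))  # True
```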

Extras Tags and Tags

requirements_extra_tags: Set this to indicate that the package is an extras dependency of snowflake-ml-python. This requirement will be added to all extras tags specified here, and an all extras tag will be auto-generated to include all extras requirements. All extras requirements will be labeled as run_constrained in conda's meta.yaml.

tags: Set tags to filter some of the requirements in specific cases. The current valid tags include:

deployment_core: Used by model deployment to indicate dependencies required to execute model deployment code on the server-side.
build_essential: Used to indicate the packages composing the build environment.

Example:

- name: pandas
  name_pypi: pandas-pypi-name
  dev_version: 1.2.0
  dev_version_pypi: 1.2.0-pypi
  version_requirements: ">=1.0.0"
  version_requirements_pypi: ">=1.0.0"
  from_channel: "conda-forge"
  requirements_extra_tags:
    - pandas
  tags:
    - deployment_core
    - build_essential
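To make the effect of requirements_extra_tags and tags concrete, the following sketch groups records into extras and filters them by tag. It is a hypothetical illustration of the behavior described above, not the repository's implementation: build_extras, filter_by_tag, and the example-pkg record are made up, and picking the first available name field is a simplification.

```python
from typing import Any, Dict, List


def build_extras(records: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group package names by extras tag and add the auto-generated "all" tag."""
    extras: Dict[str, List[str]] = {}
    for record in records:
        # Simplification: use whichever name field is set first.
        name = record.get("name") or record.get("name_pypi") or record.get("name_conda")
        for tag in record.get("requirements_extra_tags", []):
            extras.setdefault(tag, []).append(name)
    all_names = sorted({name for names in extras.values() for name in names})
    if all_names:
        # "all" collects every extras requirement.
        extras["all"] = all_names
    return extras


def filter_by_tag(records: List[Dict[str, Any]], tag: str) -> List[Dict[str, Any]]:
    """Keep only the records that carry the given tag."""
    return [record for record in records if tag in record.get("tags", [])]


records = [
    {
        "name": "pandas",
        "requirements_extra_tags": ["pandas"],
        "tags": ["deployment_core", "build_essential"],
    },
    {"name": "example-pkg", "requirements_extra_tags": ["example"]},
]
print(build_extras(records))
print([r["name"] for r in filter_by_tag(records, "deployment_core")])  # ['pandas']
```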

Unit Testing

Write pytest or Python unittest style unit tests.

`unittest`

Use absl.testing.absltest as a drop-in replacement for unittest.

For example:

# instead of
# import unittest
from absl.testing import absltest

# instead of
# from unittest import TestCase, main
from absl.testing.absltest import TestCase, main

absltest provides better bazel integration, which produces a more detailed XML test report. The test report is picked up by a GitHub workflow to provide a nice UI for test results.

`pytest`

Make each unit test file its own runnable py_test target and use the main() function provided by snowflake.ml.test_utils.pytest_driver.

For example:

from snowflake.ml.test_utils import pytest_driver


def test_case():
    # some_feature is a placeholder for the code under test.
    assert some_feature()


if __name__ == "__main__":
    pytest_driver.main()

pytest_driver contains bazel integration that allows pytest to produce an XML test report.

Important Notes

When you add a new test file, always make sure it contains an if __name__ == "__main__": block; otherwise, bazel will not run the tests in that file. We have a test wrapper that makes the test fail if you forget that block.

`pre-commit`

Pull requests against the main branch are subject to pre-commit checks. Those checks enforce the code style.

You can make sure the checks pass by installing the pre-commit hooks in your local repo (see the pre-commit installation instructions). Those hooks are invoked when you commit locally, and they fix style violations in place. The minimum required pre-commit version is 3.4.0.

Tip: if you want to isolate those fixes, avoid the -a option of git commit. This way the automated changes remain unstaged.

Darglint

The darglint pre-commit hook lints docstrings to make sure they conform to the Google style guide for docstrings. Function docstrings must contain an "Args" section with input value descriptions, a "Returns" section describing the output, and a "Raises" section enumerating the exceptions that the function can raise. Darglint ensures that all input args are present in the docstring and is sensitive to whitespace (e.g., args should be indented by the correct number of spaces). Refer to the list of darglint error codes for guidance.
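As an illustration of the three sections, here is a small function with a Google-style docstring. safe_divide is a made-up example, not code from this repository, and whether it passes every configured darglint check depends on the hook settings.

```python
def safe_divide(numerator: float, denominator: float) -> float:
    """Divide one number by another.

    Args:
        numerator: The value to divide.
        denominator: The value to divide by. Must not be zero.

    Returns:
        The quotient of numerator and denominator.

    Raises:
        ValueError: If denominator is zero.
    """
    if denominator == 0:
        raise ValueError("denominator must not be zero")
    return numerator / denominator


print(safe_divide(6, 3))  # 2.0
```

Every parameter appears under Args with consistent indentation, and the exception raised in the body is listed under Raises.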

Editors

VSCode

Here are a few good plugins to use:

- Python
- Pylance static checking
- Bazel
  - You need to configure buildifier in settings for auto-formatting BUILD.bazel files
- Black Python Formatter
- Flake8 Linter
