This document describes the development guidelines for contributing to the KFP-Tekton SDK/compiler. Details about the required contributor license agreement (CLA) and the code review process can be found in the CONTRIBUTING.md document. A quick-start guide with general setup instructions, a troubleshooting guide, and technical limitations can be found in the SDK README.
- Development Prerequisites
- Origins of the KFP-Tekton Compiler Code
- Adding New Code
- Coding Style
- Testing
- License Headers
- Removed Features
- Python: version 3.8 or later (new code must maintain compatibility with 3.8)
- Kubernetes Cluster: version 1.25 (required by Kubeflow and Tekton 0.50.1)
- kubectl CLI: required to deploy Tekton pipelines to a Kubernetes cluster
- Tekton Deployment: version 0.50.1 or greater, required for end-to-end testing
- tkn CLI: version 0.30.1 or greater, required for end-to-end testing of Tekton pipelines
- Kubeflow Pipelines Deployment: required for some end-to-end tests
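To confirm your local tooling meets these requirements, you can print the installed versions (standard CLI commands, listed here for convenience):

python3 --version
kubectl version --client
tkn version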
A working Tekton cluster deployment is required to perform end-to-end tests of the pipelines generated
by the kfp_tekton
compiler. The Tekton CLI is useful to start a pipeline and analyze the pipeline logs.
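For example, a typical CLI session to start a compiled pipeline and follow its logs (the pipeline name is a placeholder):

tkn pipeline start <pipeline-name> --showlog
tkn pipelinerun logs --last -f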
Follow the instructions listed here or simply run:
kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/previous/v0.47.0/release.yaml
Note: if your container runtime does not support image-reference:tag@digest (like CRI-O used in OpenShift 4.x), use release.notags.yaml instead.
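That is, assuming the same v0.47.0 release location:

kubectl apply -f https://storage.googleapis.com/tekton-releases/pipeline/previous/v0.47.0/release.notags.yaml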
Optionally, for convenience, set the default namespace to tekton-pipelines
:
kubectl config set-context --current --namespace=tekton-pipelines
In order to use the latest features and functions of the kfp-tekton compiler, it may be necessary to install Tekton from a nightly build or to build it from the master branch. Currently there are no features that require a special build.
Follow the instructions here.
macOS users can install the Tekton CLI using the Homebrew formula:
brew tap tektoncd/tools
brew install tektoncd/tools/tektoncd-cli
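To verify the installation, print the CLI version:

tkn version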
Follow the installation instructions here, i.e.:
kubectl apply --filename https://storage.googleapis.com/tekton-releases/dashboard/previous/v0.35.0/release.yaml
The Tekton Dashboard can be accessed through its ClusterIP service by running kubectl proxy, or the service can be patched to expose a public NodePort:
kubectl patch svc tekton-dashboard -n tekton-pipelines --type='json' -p '[{"op":"replace","path":"/spec/type","value":"NodePort"}]'
To open the dashboard, run (on Linux, replace open with xdg-open):
TKN_DASHBOARD_SVC_PORT=$(kubectl -n tekton-pipelines get service tekton-dashboard -o jsonpath='{.spec.ports[0].nodePort}')
PUBLIC_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="ExternalIP")].address}')
open "http://${PUBLIC_IP}:${TKN_DASHBOARD_SVC_PORT}/#/pipelineruns"
The source code of the kfp-tekton
compiler was created as an extension of the Kubeflow Pipelines SDK Compiler
.
This approach allowed us to leverage much of the existing Kubeflow Pipelines Python SDK code, like the DSL and components packages, while "overriding" or "replacing" those parts of the compiler code required to generate Tekton YAML instead of Argo YAML. Since the KFP SDK was not designed to be easily extended, monkey-patching was used to replace non-class methods and functions at runtime.
In order for the monkey patch to work properly, the kfp-tekton compiler source code has to be aligned with a specific version of the kfp SDK compiler. As of now, the kfp-tekton SDK version is 1.8.0, which is aligned with KFP SDK version 1.8.22.
The Python package structure as well as the module names and method signatures closely mirror those of the
Kubeflow Pipelines Python SDK
.
This helps keep track of all the code that had to be modified, and it will make it easier to merge (some of) the code back into KFP or to identify pieces of code that need to be refactored in KFP in order to accommodate various execution platforms.
When it is necessary to bring further methods from the kfp compiler package into the kfp-tekton compiler package, keep the original method names and signatures as well as their position inside their respective Python modules, as illustrated below.
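For illustration, a ported helper keeps its kfp module path, name, and signature, and only the body changes to emit Tekton YAML. A minimal sketch (the body is a placeholder, not the actual implementation):

# kfp/compiler/_op_to_template.py defines:  def _op_to_template(op): ...
# kfp_tekton/compiler/_op_to_template.py keeps the same name and signature:
def _op_to_template(op):
    """Generate a Tekton Task template for `op` instead of an Argo template."""
    ...  # placeholder body; the real implementation builds the Tekton resource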
Be sure to run make verify
before committing your code changes and creating a pull request:
$ make verify
check_license: OK
lint: OK
unit_test: OK
report: OK
verify: OK
Most of the functions provided by kfp.compiler.compiler.Compiler are instance-based and can be overridden in kfp_tekton.compiler.compiler.TektonCompiler. Static Compiler methods may also need to be added to the monkey patch described in the next section, unless they are only used by other methods that are already overridden in TektonCompiler.
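A minimal sketch of the instance-method case (the method name appears in the monkey-patch listing in the Removed Features section; the body is illustrative):

from kfp.compiler.compiler import Compiler

class TektonCompiler(Compiler):
    def _write_workflow(self, *args, **kwargs):
        # Plain inheritance suffices for instance methods: kfp code calling
        # self._write_workflow(...) on a TektonCompiler instance picks up
        # this Tekton implementation through normal method resolution.
        ...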
Be careful not to mix inheritance and monkey patching. A method whose body calls its super() implementation must not be added to the list of methods that get dynamically replaced via the monkey patch.
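A self-contained toy example of the hazard: once Base.run is monkey-patched with Sub.run, the super().run() call inside Sub.run dispatches back to the patched implementation and recurses forever.

class Base:
    def run(self):
        print("base")

class Sub(Base):
    def run(self):
        super().run()  # resolved against Base at call time
        print("sub")

Base.run = Sub.run  # the monkey patch replaces the super() target
Sub().run()         # super().run() now calls Sub.run again -> RecursionError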
When code changes are required to static helper methods in kfp.compiler, the "overridden" methods should be added to their respective modules in kfp_tekton.compiler and added to the monkey patch, which dynamically replaces the code in the kfp SDK at runtime.
As of May 2020, the monkey patch was no longer needed and was removed, since all of the patched methods are now invoked directly (and exclusively) by other code implemented in the kfp_tekton compiler. Details on how to implement a monkey patch can be found in the Removed Features section, should it become necessary to reintroduce the monkey patch for any methods we need to "override" that are not exclusively called by other methods we already implemented in the kfp_tekton compiler.
The Python code in this project follows the Google Python style guide. You can make use of a yapf configuration file to auto-format Python code and adopt the Google Python style. We encourage linting Python docstrings using docformatter. Our CI/CD integration with Travis uses Flake8, and the current set of enforced rules can be found in the Makefile:
.PHONY: lint
lint: venv ## Check Python code style compliance
@which flake8 > /dev/null || pip install flake8
flake8 sdk/python --count --show-source --statistics \
--select=E9,E2,E3,E5,F63,F7,F82,F4,F841,W291,W292 \
--per-file-ignores sdk/python/tests/compiler/testdata/*:F841 \
--max-line-length=140 && echo OK
Make sure to run make lint
before you create a pull request (PR) that includes changes to Python source files.
Before committing code changes to the compiler, make sure to run the compiler unit tests described below.
Ideally, whenever a code change to the compiler results in modified YAML, an end-to-end test should be run on a Tekton cluster.
Any new functionality being added to the kfp_tekton.compiler should be accompanied by a new unit test in sdk/python/tests/compiler/compiler_tests.py.
Typically a test case comes with a minimal Python DSL script and a "golden" YAML file in sdk/python/tests/compiler/testdata
.
The "golden" YAML file contains the expected compiler output. The unit tests use the "golden" YAML files to compare
the current compiler output with the previously expected compiler output.
make unit_test
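Conceptually, each test case boils down to the following comparison (a minimal sketch; the helper name and file handling are illustrative, not the actual test harness):

import yaml

def assert_matches_golden(compiled_yaml_path, golden_yaml_path):
    # Compare the parsed structures rather than the raw text, so the test
    # is insensitive to cosmetic differences such as key ordering.
    with open(compiled_yaml_path) as f:
        actual = yaml.safe_load(f)
    with open(golden_yaml_path) as f:
        expected = yaml.safe_load(f)
    assert actual == expected, "compiler output differs from the 'golden' YAML"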
If the pipeline script compiles but does not match the "golden" YAML, then the unit test should fail. If the change in the output YAML is desired, then the "golden" YAML needs to be regenerated, e.g. by temporarily enabling the GENERATE_GOLDEN_YAML flag in compiler_tests.py, or by using the environment variable:
make test GENERATE_GOLDEN_YAML="True"
The unit tests are designed to verify the YAML produced by the compiler matches the expected, previously generated "golden" YAML. End-to-end (E2E) tests are necessary to verify that the generated Tekton YAML is syntactically valid and that the pipeline can be executed successfully on a Tekton cluster.
A manual E2E test can be performed in the following manner:
kubectl apply -f <pipeline.yaml>
tkn pipeline start <pipeline-name> --showlog
Some E2E tests require a Kubernetes cluster with Kubeflow Pipelines installed in order to make use of the
artifact storage provided by Minio and need to run in the kubeflow
namespace in order to
access secrets:
kubectl apply -f <pipeline.yaml> -n kubeflow
tkn pipeline start <pipeline-name> --showlog -n kubeflow
You can also run the dynamically generated end-to-end test suite, which takes all of the "golden" YAML files from the compiler testdata directory and runs them on a Kubernetes cluster. This requires that the environment variable KUBECONFIG is set and that the K8s cluster has both Kubeflow Pipelines and Tekton Pipelines installed:
make e2e_test
After adding a new E2E test case or modifying existing ones, regenerate the "golden" log files:
make e2e_test GENERATE_GOLDEN_E2E_LOGS=True
The goal of the first phase of the KFP-Tekton project was to ensure that most or all of the KFP compiler features are
working for Tekton. That is, the kfp_tekton
compiler can compile all Python DSL test scripts in the KFP compiler
testdata
folder.
To update the "Compiler Status Report" use the output of this command:
make report
All source files should have the following license header. Adjust the years to reflect the year the file was added and the year it was last modified:
# Copyright 2019-2023 kubeflow.org
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Use the check_license
Make target to verify all Python and YAML files contain the license header:
make check_license
As of May 2020, the monkey patch was no longer needed and was removed, since all of the patched methods were by then invoked directly (and exclusively) by other code implemented in the kfp_tekton compiler. However, it may become necessary again in the future to reintroduce the monkey patch for any methods we need to "override" that are not exclusively called by other methods we already implemented in the kfp_tekton compiler.
sdk/python/kfp_tekton/compiler/__init__.py:
import sys
import traceback


def monkey_patch():
    """
    Overriding (replacing) selected methods/functions in the KFP SDK compiler package.
    This is a temporary hack during early development of the KFP-Tekton compiler.
    """
    import kfp
    from kfp.compiler._data_passing_rewriter import fix_big_data_passing
    from kfp.compiler._k8s_helper import convert_k8s_obj_to_json
    from kfp.compiler._op_to_template import _op_to_template, _process_base_ops
    from kfp.compiler.compiler import Compiler as KFPCompiler

    from ._data_passing_rewriter import fix_big_data_passing as tekton_fix_big_data_passing
    from ._k8s_helper import convert_k8s_obj_to_json as tekton_convert_k8s_obj_to_json
    from ._op_to_template import _op_to_template as tekton_op_to_template
    from ._op_to_template import _process_base_ops as tekton_process_base_ops
    from .compiler import TektonCompiler

    # replace module-level functions in the kfp compiler package ...
    kfp.compiler._data_passing_rewriter.fix_big_data_passing = tekton_fix_big_data_passing
    kfp.compiler._k8s_helper.convert_k8s_obj_to_json = tekton_convert_k8s_obj_to_json
    kfp.compiler._op_to_template._op_to_template = tekton_op_to_template
    kfp.compiler._op_to_template._process_base_ops = tekton_process_base_ops

    # ... and replace selected Compiler methods with their Tekton counterparts
    KFPCompiler._resolve_value_or_reference = TektonCompiler._resolve_value_or_reference
    KFPCompiler._create_dag_templates = TektonCompiler._create_dag_templates
    KFPCompiler._create_and_write_workflow = TektonCompiler._create_and_write_workflow
    KFPCompiler._create_pipeline_workflow = TektonCompiler._create_pipeline_workflow
    KFPCompiler._create_workflow = TektonCompiler._create_workflow
    KFPCompiler._group_to_dag_template = TektonCompiler._group_to_dag_template
    KFPCompiler._write_workflow = TektonCompiler._write_workflow


try:
    print("Applying KFP-Tekton compiler patch")
    monkey_patch()
except Exception:
    traceback.print_exc()
    print("Failed to apply KFP-Tekton compiler patch")
    sys.exit(1)
Note: Since the monkey patch gets triggered by importing any member of the kfp_tekton.compiler module, we try to avoid top-level imports of any members of kfp_tekton.compiler in pipeline DSL scripts. Instead, use local imports to avoid triggering the monkey patch when the original KFP compiler is used to compile a pipeline DSL script with KFP's dsl-compile --py <DSL script> command:
if __name__ == '__main__':
# don't use top-level import of TektonCompiler to prevent monkey-patching KFP compiler when using KFP's dsl-compile
from kfp_tekton.compiler import TektonCompiler
TektonCompiler().compile(pipeline_func, __file__.replace('.py', '.yaml'))