State machines in Kubernetes (and coming soon, Flux)! 🐦🔥
The Kubernetes operator provided here works with the Python library of the same name, which can be used on bare metal (without the operator) to orchestrate jobs in Flux. Both are state machines and event-driven.
- go version v1.22.0+.
- docker version 17.03+.
- kubectl version v1.11.3+.
- Access to a Kubernetes v1.11.3+ cluster.
You can create a cluster locally (if your computer is chonky and can handle it) or use AWS. Here is the local option:
kind create cluster --config ./examples/kind-config.yaml
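If you want a sense of what that config contains, here is a minimal, illustrative kind config; the actual ./examples/kind-config.yaml in this repository may differ:

```yaml
# Minimal illustrative kind config (the repository's
# ./examples/kind-config.yaml may differ).
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
```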
We provide two examples: one using the operator, and one manual, for those who want to create the various objects themselves and understand how the state machine operator (and corresponding Python library) work. For the manual example, see the README in examples. We will continue here with the operator example.
The operator is built via its manifest in dist. For development:
# Install and load into general cluster
make test-deploy-recreate
# Install and load into kind
make test-deploy-kind
For non-development:
kubectl apply -f examples/dist/state-machine-operator.yaml
Then apply the custom resource to create the state machine. For interactive work, remember to set spec->workflow->interactive (or the same for any job under jobs) to true.
kubectl apply -f examples/state-machine.yaml
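As a rough illustration of where those fields live, here is a hedged sketch of the spec. Only workflow->interactive, workflow->prefix, and the jobs listing come from the text above; any other field names are assumptions, so check examples/state-machine.yaml for the real schema:

```yaml
# Illustrative spec fragment only; see examples/state-machine.yaml for
# the real schema. Field names beyond workflow->interactive,
# workflow->prefix, and jobs are assumptions.
spec:
  workflow:
    interactive: true   # enable interactive work for the whole workflow
    prefix: job_        # prefix used for the job identifier (jobid)
  jobs:
    - name: step-a
      interactive: true # the same flag can be set for any job under jobs
```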
For the Mummi example (all code is private), see examples/mummi.
For each job script section, the following environment variables are provided for your application:
- jobid: the job identifier, which defaults to `job_` and can be set under the state machine workflow->prefix
- outpath: defaults to /tmp/out and is where your working directory will be, and where output is expected to be written
- registry: the registry where your artifact will be pushed
- pull_tag: the pull tag to use (if the workflow is pulling)
- push_tag: the push tag to use (if the workflow is pushing)
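To make the list above concrete, here is a hypothetical job script that reads those variables. Only the environment variable names come from the list above; the surrounding YAML fields (name, script) are assumptions, and the actual job layout is in examples/state-machine.yaml:

```yaml
# Hypothetical job entry; only the environment variables come from the
# list above -- the surrounding field names are assumptions.
jobs:
  - name: simulate
    script: |
      #!/bin/bash
      echo "Running ${jobid} with output in ${outpath}"
      cd ${outpath}
      # If this step pulls, the incoming artifact reference is:
      echo "pull artifact: ${registry}:${jobid}:${pull_tag}"
      # ... run the application, writing results to ${outpath} ...
      # If this step pushes, the outgoing artifact reference is:
      echo "push artifact: ${registry}:${jobid}:${push_tag}"
```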
Take a look at the simple example examples/state-machine.yaml to see how push/pull is defined between steps. Given that these are found (with a tag), your artifact will be named <registry>:<jobid>:<tag> and moved between steps.
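As a sketch of the idea (not the operator's exact schema), two consecutive steps might hand off an artifact like this, where the field and tag names are assumptions; check examples/state-machine.yaml for how push/pull is actually spelled:

```yaml
# Hypothetical push/pull between steps; field and tag names are
# assumptions, not the operator's actual schema.
jobs:
  - name: generate
    push: raw       # push <registry>:<jobid>:raw after this step
  - name: analyze
    pull: raw       # pull <registry>:<jobid>:raw before this step
```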
These are some design decisions I've made (of course open to discussion):
- The workflow model is a state machine - state is derived from Kubernetes, always
- The state machine manager manages units of job sequences (each a state machine) and each state machine orchestrates the logic of the jobs within it.
- No application code (the jobs) is tangled with the state machine or manager
- We assume jobs don't need to be paused / resumed / reclaimed like on HPC
- Jobs are modular units with a config that the manager knows how to parse, and the rest is provided to them.
- We likely want to test with a real registry OR allow a volume bind (existing data) to the registry.
- Otherwise, artifacts are deleted on cleanup. We could also add an option to keep the ephemeral registry.
- Under what conditions do we cancel / cleanup jobs?
- I haven't tested a failure yet (or the need to clean up / delete)
- We might want to do other cleanup (e.g., config maps)
HPCIC DevTools is distributed under the terms of the MIT license. All new contributions must be made under this license.
See LICENSE, COPYRIGHT, and NOTICE for details.
SPDX-License-Identifier: (MIT)
LLNL-CODE-842614