This repository was archived by the owner on Jan 6, 2023. It is now read-only.

Commit 7742a64

Jeffwan authored and facebook-github-bot committed
Implement TorchElastic Controller for Kubernetes (#63)
Summary:
Please help review on the TorchElastic operator. This is based on design here #36 (needs update). API has been changed a little bit, idea is still the same.

Here's the code structure

```
.
├── CONTRIBUTING.md
├── Dockerfile
├── INSTRUCTION.md
├── Makefile
├── PROJECT
├── README.md
├── api
│   └── v1alpha1
│       ├── constants.go
│       ├── elasticjob_types.go ---> API Definition
│       ├── groupversion_info.go
│       └── zz_generated.deepcopy.go
├── bin
│   └── README.md
├── classy-vision.yaml
├── config. ---> K8s manifest, some of them are not being used yet, come from skeleton
│   ├── certmanager
│   ├── crd
│   ├── default
│   ├── manager
│   ├── prometheus
│   ├── rbac
│   ├── samples
│   └── webhook
├── controllers
│   ├── elasticjob_controller.go. --> controller code
│   ├── expectation.go
│   ├── job.go
│   ├── pod.go
│   ├── service.go
│   ├── suite_test.go
│   └── util.go
├── go.mod
├── go.sum
├── hack
│   └── boilerplate.go.txt
└── main.go
```

A few main files to review.

1. ElasticJob API - api/v1alpha1/elasticjob_types.go
2. Controller codes - controllers/
3. Docs - CONTRIBUTING.md and INSTRUCTION.md
4. Naming. The api group is `elasticjobs.elastic.pytorch.org`.
5. License - use Apache2 as header. Author is `The PyTorch Elastic Authors`

Pull Request resolved: #63

Reviewed By: drdarshan

Differential Revision: D20658020

Pulled By: kiukchung

fbshipit-source-id: e8e7372f48351fb1ae60934bc6fc3a4c1ce83984
1 parent a79d3ce commit 7742a64

39 files changed, with 2989 additions and 0 deletions

.gitignore

Lines changed: 21 additions & 0 deletions
@@ -2,3 +2,24 @@ docs/src
 docs/build
 docs/torchelastic.docset
 **/__pycache__
+
+# Mac OS X files
+.DS_Store
+
+# Binaries for programs and plugins
+*.exe
+*.dll
+*.so
+*.dylib
+
+# Test binary, build with `go test -c`
+*.test
+
+# Output of the go coverage tool, specifically when used with LiteIDE
+*.out
+
+# IDE
+**/.idea/
+
+# Operator Binary
+kubernetes//bin/manager

kubernetes/DEVELOPMENT.md

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+## Development Guidance
+
+This project uses Go modules. We suggest using Go 1.13.x for development; make sure `GO111MODULE` is enabled.
+
+### Setup development environment
+
+Fork [PyTorch/Elastic](https://github.com/pytorch/elastic)
+
+```shell
+mkdir -p ${GOPATH}/src/github.com/pytorch/
+cd ${GOPATH}/src/github.com/pytorch
+git clone git@github.com:${GITHUB_USER}/elastic.git
+
+# The operator code is under the kubernetes directory
+cd elastic/kubernetes
+```
+
+### Download dependencies
+
+```shell
+go mod download
+```
+
+### Build the binary locally
+
+```shell
+make manager
+```
+
+### Test Binaries locally
+
+```shell
+./bin/manager
+```
+
+### Run Tests
+
+```shell
+go test ./... -coverprofile cover.out
+```
+
+### Build container image
+
+```shell
+# This requires building the binary locally first.
+docker build -t ${your_dockerhub_username}/torch-elastic-operator:latest .
+```

kubernetes/Dockerfile

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+# Build the manager binary
+FROM golang:1.13 as builder
+
+WORKDIR /workspace
+# Copy the Go Modules manifests
+COPY go.mod go.mod
+COPY go.sum go.sum
+# cache deps before building and copying source so that we don't need to re-download as much
+# and so that source changes don't invalidate our downloaded layer
+RUN go mod download
+
+# Copy the go source
+COPY main.go main.go
+COPY api/ api/
+COPY controllers/ controllers/
+
+# Build
+RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 GO111MODULE=on go build -a -o manager main.go
+
+# Use distroless as minimal base image to package the manager binary
+# Refer to https://github.com/GoogleContainerTools/distroless for more details
+FROM gcr.io/distroless/static:nonroot
+WORKDIR /
+COPY --from=builder /workspace/manager .
+USER nonroot:nonroot
+
+ENTRYPOINT ["/manager"]

kubernetes/Makefile

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
+
+# Image URL to use all building/pushing image targets
+IMG ?= controller:latest
+# Produce CRDs that work back to Kubernetes 1.11 (no version conversion)
+CRD_OPTIONS ?= "crd:trivialVersions=true"
+
+# Get the currently used golang install path (in GOPATH/bin, unless GOBIN is set)
+ifeq (,$(shell go env GOBIN))
+GOBIN=$(shell go env GOPATH)/bin
+else
+GOBIN=$(shell go env GOBIN)
+endif
+
+all: manager
+
+# Run tests
+test: generate fmt vet manifests
+	go test ./... -coverprofile cover.out
+
+# Build manager binary
+manager: generate fmt vet
+	go build -o bin/manager main.go
+
+# Run against the configured Kubernetes cluster in ~/.kube/config
+run: generate fmt vet manifests
+	go run ./main.go
+
+# Install CRDs into a cluster
+install: manifests
+	kustomize build config/crd | kubectl apply -f -
+
+# Uninstall CRDs from a cluster
+uninstall: manifests
+	kustomize build config/crd | kubectl delete -f -
+
+# Deploy controller in the configured Kubernetes cluster in ~/.kube/config
+deploy: manifests
+	cd config/manager && kustomize edit set image controller=${IMG}
+	kustomize build config/default | kubectl apply -f -
+
+# Generate manifests e.g. CRD, RBAC etc.
+manifests: controller-gen
+	$(CONTROLLER_GEN) $(CRD_OPTIONS) rbac:roleName=manager-role webhook paths="./..." output:crd:artifacts:config=config/crd/bases
+
+# Run go fmt against code
+fmt:
+	go fmt ./...
+
+# Run go vet against code
+vet:
+	go vet ./...
+
+# Generate code
+generate: controller-gen
+	$(CONTROLLER_GEN) object:headerFile=./hack/boilerplate.go.txt paths="./..."
+
+# Build the docker image
+docker-build: test
+	docker build . -t ${IMG}
+
+# Push the docker image
+docker-push:
+	docker push ${IMG}
+
+# find or download controller-gen
+# download controller-gen if necessary
+controller-gen:
+ifeq (, $(shell which controller-gen))
+	@{ \
+	set -e ;\
+	CONTROLLER_GEN_TMP_DIR=$$(mktemp -d) ;\
+	cd $$CONTROLLER_GEN_TMP_DIR ;\
+	go mod init tmp ;\
+	go get sigs.k8s.io/controller-tools/cmd/[email protected] ;\
+	rm -rf $$CONTROLLER_GEN_TMP_DIR ;\
+	}
+CONTROLLER_GEN=$(GOBIN)/controller-gen
+else
+CONTROLLER_GEN=$(shell which controller-gen)
+endif

kubernetes/PROJECT

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+domain: pytorch.org
+repo: github.com/pytorch/elastic/kubernetes
+resources:
+- group: elastic
+  kind: ElasticJob
+  version: v1alpha1
+version: "2"
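
The `domain`, `group`, and `version` fields in this PROJECT file determine the names a user sees in manifests. As a minimal sketch, assuming the usual kubebuilder convention that the full API group is `<group>.<domain>` and that a CRD is named `<plural>.<group>`, the strings compose as shown below. The helper names are hypothetical, and the plural `elasticjobs` is inferred from the commit message rather than read from the generated CRD.

```go
package main

import "fmt"

// apiGroup joins the resource group and the project domain from PROJECT.
func apiGroup(group, domain string) string {
	return group + "." + domain
}

// apiVersion builds the value used in a manifest's apiVersion field.
func apiVersion(fullGroup, version string) string {
	return fullGroup + "/" + version
}

// crdName builds the CustomResourceDefinition name: <plural>.<group>.
func crdName(plural, fullGroup string) string {
	return plural + "." + fullGroup
}

func main() {
	group := apiGroup("elastic", "pytorch.org")
	fmt.Println(apiVersion(group, "v1alpha1")) // elastic.pytorch.org/v1alpha1
	fmt.Println(crdName("elasticjobs", group)) // elasticjobs.elastic.pytorch.org
}
```

Under that convention, the `elasticjobs.elastic.pytorch.org` string quoted in the commit message corresponds to the CRD name, while the API group itself is `elastic.pytorch.org`.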
