Commit edbeedb

jlewi authored and k8s-ci-robot committed
Create a script to build the jupyter-web-app image and create a PR to update it (kubeflow#3066)
* Create a script to auto build the Jupyter web app image and update the prototype

* The script works as follows
  * Use git to determine the last commit at which the source to the Jupyter web app changed
  * Look for an image tagged with that commit
  * If no such image exists build a new image
  * Update the ksonnet prototype to use the new image
  * Push the commit to git; we will use the kubeflow-bot account
  * Use the hub CLI to create the pull request if one doesn't already exist

* Create a Makefile to build the jupyter web app
  * Add a git label to the image so we can compare against the current image.

* Put the new python code in python package kubeflow/kubeflow
  * We now have namespace packaging working

* Provide a K8s job to run it. In a follow-on PR we will turn this into a cron job

Miscellaneous changes:

Specify the build and publish projects separately.

* Update pylintrc file to always do no-self-use
* Fix some lint issues
* Update the README.
* Address comments
* Replace regex parsing (which is not very readable) with simpler parsing.
* Resolve conflicts.
* Fix some bugs.
* Fix bug with the image.
* Latest.
* Restore changes to the Jupyter web app image.
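
The last step of the workflow above, creating a pull request only if one doesn't already exist, comes down to comparing the intended PR title against the titles of the open PRs. A minimal sketch of that check, assuming the `url;title` line shape that the script requests from `hub pr list` with `--format=%U;%t\n`; `parse_pr_list` and `pr_exists` are hypothetical helper names, and the sample URLs and commit hashes are made up for illustration:

```python
def parse_pr_list(output):
  """Map PR title -> PR URL from 'url;title' lines (hypothetical helper)."""
  prs = {}
  for line in output.splitlines():
    # Skip blank or malformed lines instead of failing on unpacking.
    if ";" not in line:
      continue
    # Split only on the first ';' so titles may themselves contain ';'.
    url, title = line.split(";", 1)
    prs[title] = url
  return prs


def pr_exists(output, title):
  """Return True if an open PR with exactly this title is listed."""
  return title in parse_pr_list(output)


# Made-up sample output, for illustration only.
sample = (
  "https://github.com/example/repo/pull/1;"
  "[auto PR] Update the jupyter-web-app image to abc1234\n"
  "\n"
  "https://github.com/example/repo/pull/2;Fix docs; add examples\n"
)

print(pr_exists(sample, "[auto PR] Update the jupyter-web-app image to abc1234"))
print(pr_exists(sample, "[auto PR] Update the jupyter-web-app image to def5678"))
```

Because the title embeds the commit hash, a new source commit yields a new title and therefore a new PR, while re-running on the same commit is a no-op.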
1 parent 308b79f commit edbeedb

13 files changed, +550 -2 lines changed

.gitignore

-1
@@ -50,5 +50,4 @@ components/gcp-click-to-deploy/src/user_config/**
 
 # This is generated by bootstrap
 **/reg_tmp
-
 scripts/gke/build/**

.pylintrc

+1-1
@@ -56,7 +56,7 @@ confidence=
 # --enable=similarities". If you want to run only the classes checker, but have
 # no Warning level messages displayed, use"--disable=all --enable=classes
 # --disable=W"
-disable=import-star-module-level,old-octal-literal,oct-method,print-statement,unpacking-in-except,parameter-unpacking,backtick,old-raise-syntax,old-ne-operator,long-suffix,dict-view-method,dict-iter-method,metaclass-assignment,next-method-called,raising-string,indexing-exception,raw_input-builtin,long-builtin,file-builtin,execfile-builtin,coerce-builtin,cmp-builtin,buffer-builtin,basestring-builtin,apply-builtin,filter-builtin-not-iterating,using-cmp-argument,useless-suppression,range-builtin-not-iterating,suppressed-message,missing-docstring,no-absolute-import,old-division,cmp-method,reload-builtin,zip-builtin-not-iterating,intern-builtin,unichr-builtin,reduce-builtin,standarderror-builtin,unicode-builtin,xrange-builtin,coerce-method,delslice-method,getslice-method,setslice-method,input-builtin,round-builtin,hex-method,nonzero-method,map-builtin-not-iterating,relative-import,invalid-name,bad-continuation,no-member,locally-disabled,fixme,import-error,too-many-locals
+disable=import-star-module-level,old-octal-literal,oct-method,print-statement,unpacking-in-except,parameter-unpacking,backtick,old-raise-syntax,old-ne-operator,long-suffix,dict-view-method,dict-iter-method,metaclass-assignment,next-method-called,raising-string,indexing-exception,raw_input-builtin,long-builtin,file-builtin,execfile-builtin,coerce-builtin,cmp-builtin,buffer-builtin,basestring-builtin,apply-builtin,filter-builtin-not-iterating,using-cmp-argument,useless-suppression,range-builtin-not-iterating,suppressed-message,missing-docstring,no-absolute-import,old-division,cmp-method,reload-builtin,zip-builtin-not-iterating,intern-builtin,unichr-builtin,reduce-builtin,standarderror-builtin,unicode-builtin,xrange-builtin,coerce-method,delslice-method,getslice-method,setslice-method,input-builtin,round-builtin,hex-method,nonzero-method,map-builtin-not-iterating,relative-import,invalid-name,bad-continuation,no-member,locally-disabled,fixme,import-error,too-many-locals,no-self-use
 
 
 [REPORTS]

components/jupyter-web-app/Makefile

+34
@@ -0,0 +1,34 @@
+# Copyright 2017 The Kubernetes Authors.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Project used with GCB
+PROJECT ?= kubeflow-dev
+# Registry where the image should be published
+REGISTRY_PROJECT ?= kubeflow-dev
+OUTPUT ?= output.yaml
+
+# We want the git tag to be the last commit to this directory so we don't
+# bump the image on unrelated changes.
+GIT_TAG ?= $(shell git log -n 1 --pretty=format:"%h" ./)
+
+info:
+	echo image: \"gcr.io/$(REGISTRY_PROJECT)/jupyter-web-app:$(GIT_TAG)\" > $(OUTPUT)
+
+build-gcb: info
+	gcloud --project=$(PROJECT) \
+	  builds submit \
+	  --machine-type=n1-highcpu-32 \
+	  --substitutions=_GIT_VERSION=$(GIT_TAG),_REGISTRY=$(REGISTRY_PROJECT) \
+	  --config=cloudbuild.yaml .
+16
@@ -0,0 +1,16 @@
+#cloudbuild.yaml
+steps:
+- name: 'gcr.io/cloud-builders/docker'
+  args:
+  - 'build'
+  - '-t'
+  - 'gcr.io/${_REGISTRY}/jupyter-web-app:${_GIT_VERSION}'
+  - '--label=git-version=${_GIT_VERSION}'
+  - '.'
+- name: 'gcr.io/cloud-builders/docker'
+  args:
+  - 'tag'
+  - 'gcr.io/${_REGISTRY}/jupyter-web-app:${_GIT_VERSION}'
+  - 'gcr.io/${_REGISTRY}/jupyter-web-app:latest'
+images: ['gcr.io/${_REGISTRY}/jupyter-web-app:${_GIT_VERSION}',
+         'gcr.io/${_REGISTRY}/jupyter-web-app:latest']
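
To see which image names the build config above resolves to, the `${_REGISTRY}` and `${_GIT_VERSION}` placeholders can be expanded by hand. A rough sketch using Python's `string.Template`, which happens to share the `${NAME}` syntax; it is only a stand-in for Cloud Build's own substitution mechanism, `resolve_images` is a hypothetical helper, and the tag `abc1234` is a made-up example value (the registry value is the Makefile default):

```python
from string import Template

# Image name patterns copied from the build config above.
IMAGE_TEMPLATES = [
  "gcr.io/${_REGISTRY}/jupyter-web-app:${_GIT_VERSION}",
  "gcr.io/${_REGISTRY}/jupyter-web-app:latest",
]


def resolve_images(registry, git_version):
  """Expand the placeholders the way --substitutions would supply them."""
  subs = {"_REGISTRY": registry, "_GIT_VERSION": git_version}
  return [Template(t).substitute(subs) for t in IMAGE_TEMPLATES]


images = resolve_images("kubeflow-dev", "abc1234")
for image in images:
  print(image)
```

Both names point at the same build: one is pinned to the commit, the other is the moving `latest` tag.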
+1
@@ -0,0 +1 @@
+image: "gcr.io/kubeflow-dev/jupyter-web-app:v0-43-g810b0b46"

py/kubeflow/__init__.py

+1
@@ -0,0 +1 @@
+__path__ = __import__('pkgutil').extend_path(__path__, __name__)
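
The one-line `__init__.py` above is the `pkgutil`-style namespace package idiom: it lets several source trees each contribute subpackages under the same top-level name. A small self-contained sketch of the effect, assuming two throwaway directory trees built in temp dirs; the namespace `demo_ns`, the subpackages `alpha` and `beta`, and `make_distribution` are all made up for the demo so nothing clashes with a real package:

```python
import importlib
import os
import sys
import tempfile

INIT_LINE = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"


def make_distribution(root, namespace, subpackage, value):
  """Create <root>/<namespace>/<subpackage>/__init__.py defining VALUE."""
  ns_dir = os.path.join(root, namespace)
  sub_dir = os.path.join(ns_dir, subpackage)
  os.makedirs(sub_dir)
  # Every tree carries the same namespace __init__.py.
  with open(os.path.join(ns_dir, "__init__.py"), "w") as f:
    f.write(INIT_LINE)
  with open(os.path.join(sub_dir, "__init__.py"), "w") as f:
    f.write("VALUE = {!r}\n".format(value))


# Two separate source trees that both contribute to the namespace.
tree_a = tempfile.mkdtemp()
tree_b = tempfile.mkdtemp()
make_distribution(tree_a, "demo_ns", "alpha", "from tree A")
make_distribution(tree_b, "demo_ns", "beta", "from tree B")
sys.path[:0] = [tree_a, tree_b]
importlib.invalidate_caches()

# Both subpackages import even though they live in different trees.
alpha = importlib.import_module("demo_ns.alpha")
beta = importlib.import_module("demo_ns.beta")
print(alpha.VALUE, "|", beta.VALUE)
```

Without the `extend_path` line, only the first `demo_ns` directory found on `sys.path` would be searched, and importing `demo_ns.beta` would fail.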

py/kubeflow/kubeflow/__init__.py

Whitespace-only changes.

py/kubeflow/kubeflow/ci/__init__.py

Whitespace-only changes.
@@ -0,0 +1,315 @@
+"""Script to build and update the Jupyter WebApp image.
+
+Requires python3
+
+hub CLI depends on an OAuth token with repo permissions:
+https://hub.github.com/hub.1.html
+  * It will look for environment variable GITHUB_TOKEN
+"""
+
+import logging
+import os
+import re
+import tempfile
+import yaml
+
+import fire
+import git
+import httplib2
+
+from kubeflow.testing import util # pylint: disable=no-name-in-module
+
+from containerregistry.client import docker_creds
+from containerregistry.client import docker_name
+from containerregistry.client.v2_2 import docker_http
+from containerregistry.client.v2_2 import docker_image as v2_2_image
+from containerregistry.transport import transport_pool
+
+class WebAppUpdater(object): # pylint: disable=useless-object-inheritance
+  def __init__(self):
+    self._last_commit = None
+
+  def build_image(self, build_project, registry_project):
+    """Build the image.
+
+    Args:
+      build_project: GCP project used to build the image.
+      registry_project: GCP project used to host the image.
+    """
+    env = dict()
+    env.update(os.environ)
+    env["PROJECT"] = build_project
+    env["REGISTRY_PROJECT"] = registry_project
+    env["GIT_TAG"] = self.last_commit
+
+    with tempfile.NamedTemporaryFile() as hf:
+      name = hf.name
+    env["OUTPUT"] = name
+    web_dir = self._component_dir()
+    util.run(["make", "build-gcb"], env=env, cwd=web_dir)
+
+    # TODO(jlewi): We want to get the actual image produced by GCB. Right
+    # now this is a bit brittle because we have multiple layers of substitution
+    # e.g. in the Makefile and then the GCB YAML.
+    # It might be better to parse the stdout of make-build-gcb to get the
+    # GCB job name and then fetch the GCB info specifying the images.
+    with open(name) as hf:
+      data = yaml.safe_load(hf)
+
+    return data["image"]
+
+  def _replace_parameters(self, lines, values):
+    """Replace parameters in ksonnet text.
+
+    Args:
+      lines: Lines of text
+      values: A dictionary containing the names of parameters and the values
+        to set them to.
+
+    Returns:
+      lines: Modified lines
+      old: Dictionary of old values for these parameters
+    """
+    old = {}
+    for i, line in enumerate(lines):
+      # Split the line on white space
+      pieces = re.findall(r'\S+', line)
+
+      # Check if this line is a parameter
+      # // @optionalParam image string gcr.io/myimage Some image
+      if len(pieces) < 5:
+        continue
+
+      if pieces[0] != "//" or pieces[1] != "@optionalParam":
+        continue
+
+      param_name = pieces[2]
+      if param_name not in values:
+        continue
+
+      old[param_name] = pieces[4]
+      logging.info("Changing param %s from %s to %s", param_name, pieces[4],
+                   values[param_name])
+      pieces[4] = values[param_name]
+
+      lines[i] = " ".join(pieces)
+
+    return lines, old
+
+  def update_prototype(self, image):
+    """Update the prototype file.
+
+    Args:
+      image: New image to set
+
+    Returns:
+      prototype_file: The modified prototype file or None if the image is
+        already up to date.
+    """
+    values = {"image": image}
+
+
+    prototype_file = os.path.join(self._root_dir(),
+                                  "kubeflow/jupyter/prototypes",
+                                  "jupyter-web-app.jsonnet")
+    with open(prototype_file) as f:
+      prototype = f.read().split("\n")
+
+    new_lines, old_values = self._replace_parameters(prototype, values)
+
+    if old_values["image"] == image:
+      logging.info("Existing image was already the current image; %s", image)
+      return None
+    temp_file = prototype_file + ".tmp"
+    with open(temp_file, "w") as w:
+      w.write("\n".join(new_lines))
+    os.rename(temp_file, prototype_file)
+
+    return prototype_file
+
+  @property
+  def last_commit(self):
+    """Get the last commit of a change to the source for the jupyter-web-app."""
+    if not self._last_commit:
+      # Get the hash of the last commit to modify the source for the Jupyter web
+      # app image
+      self._last_commit = util.run(["git", "log", "-n", "1",
+                                    "--pretty=format:\"%h\"",
+                                    "components/jupyter-web-app"],
+                                   cwd=self._root_dir()).strip("\"")
+
+    return self._last_commit
+
+  def _find_remote_repo(self, repo, remote_url): # pylint: disable=no-self-use
+    """Find the remote repo if it has already been added.
+
+    Args:
+      repo: The git python repo object.
+      remote_url: The URL of the remote repo e.g.
+        git@github.com:jlewi/kubeflow.git
+
+    Returns:
+      remote: git-python object representing the remote repo or none if it
+        isn't present.
+    """
+    for r in repo.remotes:
+      for u in r.urls:
+        if remote_url == u:
+          return r
+
+    return None
+
+  def all(self, build_project, registry_project, remote_fork,
+          add_github_host=False): # pylint: disable=too-many-statements,too-many-branches
+    """Build the latest image and update the prototype.
+
+    Args:
+      build_project: GCP project used to build the image.
+      registry_project: GCP project used to host the image.
+      remote_fork: Url of the remote fork.
+        The remote fork used to create the PR;
+        e.g. git@github.com:jlewi/kubeflow.git. Currently only ssh is
+        supported.
+      add_github_host: If true will add the github ssh host to known ssh hosts.
+    """
+    repo = git.Repo(self._root_dir())
+    util.maybe_activate_service_account()
+    last_commit = self.last_commit
+
+    # Ensure github.com is in the known hosts
+    if add_github_host:
+      output = util.run(["ssh-keyscan", "github.com"])
+      with open(os.path.join(os.getenv("HOME"), ".ssh", "known_hosts"),
+                mode='a') as hf:
+        hf.write(output)
+
+    if not remote_fork.startswith("git@github.com"):
+      raise ValueError("Remote fork currently only supports ssh")
+
+    remote_repo = self._find_remote_repo(repo, remote_fork)
+
+    if not remote_repo:
+      fork_name = remote_fork.split(":", 1)[-1].split("/", 1)[0]
+      logging.info("Adding remote %s=%s", fork_name, remote_fork)
+      remote_repo = repo.create_remote(fork_name, remote_fork)
+
+    logging.info("Last change to components-jupyter-web-app was %s", last_commit)
+
+    base = "gcr.io/{0}/jupyter-web-app".format(registry_project)
+
+    # Check if there is already an image tagged with this commit.
+    image = base + ":" + self.last_commit
+    transport = transport_pool.Http(httplib2.Http)
+    src = docker_name.from_string(image)
+    creds = docker_creds.DefaultKeychain.Resolve(src)
+
+    image_exists = False
+    try:
+      with v2_2_image.FromRegistry(src, creds, transport) as src_image:
+        logging.info("Image %s exists; digest: %s", image,
+                     src_image.digest())
+        image_exists = True
+    except docker_http.V2DiagnosticException as e:
+      if e.status == 404:
+        logging.info("%s doesn't exist", image)
+      else:
+        raise
+
+    if not image_exists:
+      logging.info("Building the image")
+      image = self.build_image(build_project, registry_project)
+      logging.info("Created image: %s", image)
+    else:
+      logging.info("Image %s already exists", image)
+
+    # We should check what the current image is and not update it
+    # if it's the existing image
+    prototype_file = self.update_prototype(image)
+
+    if not prototype_file:
+      logging.info("Prototype not updated so not creating a PR.")
+      return
+
+    branch_name = "update_jupyter_{0}".format(last_commit)
+
+    if repo.active_branch.name != branch_name:
+      logging.info("Creating branch %s", branch_name)
+
+      branch_names = [b.name for b in repo.branches]
+      if branch_name in branch_names:
+        logging.info("Branch %s exists", branch_name)
+        util.run(["git", "checkout", branch_name], cwd=self._root_dir())
+      else:
+        util.run(["git", "checkout", "-b", branch_name], cwd=self._root_dir())
+
+    logging.info("Add file %s to repo", prototype_file)
+    repo.index.add([prototype_file])
+    repo.index.commit("Update the jupyter web app image to {0}".format(image))
+
+    util.run(["git", "push", "-f", remote_repo.name], cwd=self._root_dir())
+
+    self.create_pull_request(commit=last_commit)
+
+  def create_pull_request(self, base="kubeflow:master", commit=None):
+    """Create a pull request.
+
+    Args:
+      base: The base to use. Defaults to "kubeflow:master". This should be
+        in the form <GitHub OWNER>:<branch>
+    """
+    # TODO(jlewi): Modeled on
+    # https://github.com/kubeflow/examples/blob/master/code_search/docker/ks/update_index.sh
+    # TODO(jlewi): We should use the GitHub API and check if there is an
+    # existing open pull request. Or potentially just use the hub CLI.
+
+    if not commit:
+      commit = self.last_commit
+      logging.info("No commit specified defaulting to %s", commit)
+
+    pr_title = "[auto PR] Update the jupyter-web-app image to {0}".format(commit)
+
+    # See hub conventions:
+    # https://hub.github.com/hub.1.html
+    # The GitHub repository is determined automatically based on the name
+    # of remote repositories
+    output = util.run(["hub", "pr", "list", "--format=%U;%t\n"],
+                      cwd=self._root_dir())
+
+
+    lines = output.splitlines()
+
+    prs = {}
+    for l in lines:
+      n, t = l.split(";", 1)
+      prs[t] = n
+
+    if pr_title in prs:
+      logging.info("PR %s already exists to update the Jupyter web app image "
+                   "to %s", prs[pr_title], commit)
+      return
+
+    with tempfile.NamedTemporaryFile(delete=False) as hf:
+      hf.write(pr_title.encode())
+      message_file = hf.name
+
+    # TODO(jlewi): -f creates the pull requests even if there are local changes
+    # this was useful during development but we may want to drop it.
+    util.run(["hub", "pull-request", "-f", "--base=" + base, "-F",
+              message_file],
+             cwd=self._root_dir())
+
+  def _root_dir(self):
+    this_dir = os.path.dirname(__file__)
+    return os.path.abspath(os.path.join(this_dir, "..", "..", "..", ".."))
+
+  def _component_dir(self):
+    return os.path.join(self._root_dir(), "components", "jupyter-web-app")
+
+if __name__ == '__main__':
+  logging.basicConfig(level=logging.INFO,
+                      format=('%(levelname)s|%(asctime)s'
+                              '|%(pathname)s|%(lineno)d| %(message)s'),
+                      datefmt='%Y-%m-%dT%H:%M:%S',
+                      )
+  logging.getLogger().setLevel(logging.INFO)
+  fire.Fire(WebAppUpdater)
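
The parameter rewrite done by `_replace_parameters` can be tried in isolation. A minimal standalone sketch of the same whitespace-splitting approach, assuming ksonnet comment lines of the form `// @optionalParam <name> <type> <default> <description>` as in the script's own example; `replace_param_defaults` is a hypothetical name, the sample prototype lines are made up, and unlike the method in the diff this version returns a new list rather than modifying its input:

```python
def replace_param_defaults(lines, values):
  """Return (new_lines, old_values) with matching param defaults replaced."""
  new_lines = []
  old = {}
  for line in lines:
    pieces = line.split()
    is_param = (len(pieces) >= 5 and pieces[0] == "//"
                and pieces[1] == "@optionalParam")
    if is_param and pieces[2] in values:
      name = pieces[2]
      old[name] = pieces[4]
      pieces[4] = values[name]
      # As in the script, rejoining collapses runs of whitespace.
      line = " ".join(pieces)
    new_lines.append(line)
  return new_lines, old


# Made-up prototype header lines, for illustration only.
prototype = [
  "// @name jupyter-web-app",
  "// @optionalParam image string gcr.io/myimage:old Some image",
  "// @optionalParam ui string default Which UI to use",
]

new_lines, old = replace_param_defaults(prototype, {"image": "gcr.io/myimage:new"})
print(new_lines[1])
print(old)
```

Returning the old values is what lets the caller detect that the prototype already points at the desired image and skip the commit and PR.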
