Remote Desktop Dev Notes

frazs edited this page Sep 19, 2020 · 1 revision

PRELIMINARY DOCKER TIPS

Layers

Docker images are built in layers. Each Dockerfile instruction, such as USER, COPY, and RUN, creates a new layer. And everything in a particular instruction (e.g. RUN foo && \ bar && \ baz) is part of the same layer.

Layers are cached during the build process. After the first time a Docker image is built, only the layer that changed and every layer after it will be rebuilt, saving development time. These cached layers can even be used in other images, up to the point of the first different layer. It is possible to circumvent this by either clearing the build cache or building with the --no-cache argument.
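
As an illustrative sketch (the file names here are hypothetical, not taken from the Remote Desktop Dockerfiles), editing requirements.txt below invalidates the cache at the COPY layer, so that layer and every layer after it rebuild, while the apt-get layer above them stays cached:

```dockerfile
FROM ubuntu:18.04

# Cached after the first build; only rebuilds if this instruction
# (or anything above it) changes.
RUN apt-get update && \
    apt-get install -y python3-pip

# Editing requirements.txt changes the content of this layer,
# invalidating the cache from here on...
COPY requirements.txt /tmp/requirements.txt

# ...so this layer rebuilds as well.
RUN pip3 install -r /tmp/requirements.txt
```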

Layers and image size

Layers can be thought of as a record of changes. This allows them to be cached and shared, but there is a downside. If a file is created on one layer, and then changed on another layer, the total image size increases by the size of that file for each change on a different layer. This increase occurs even if only the file permissions are changed, and, perhaps counterintuitively, even if the change is the deletion of the file.

However, only the final state of the layer matters. If a file is changed and reverted, or created and deleted in the same layer, the image size will not increase.

The most common way to get around this limitation is to therefore make sure that as little as possible changes between layers. This includes, for example, clearing installation caches and other temporary files at the end of each layer where something is installed, so that these caches are never ultimately added to the image. That is often done in a reusable general-purpose layer cleaning script. Another common script is one that fixes permissions, to keep them as consistent as possible between layers.
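
A hedged sketch of that pattern (the package is an arbitrary example): the apt package index is deleted in the same RUN instruction that created it, so it never becomes part of any layer. This is the kind of cleanup a reusable clean-layer.sh script typically wraps:

```dockerfile
# Install and clean up in one RUN, so the apt cache never lands in the image.
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# By contrast, cleaning in a separate later instruction would NOT shrink
# the image: the cache files would still exist in the earlier layer.
# RUN rm -rf /var/lib/apt/lists/*
```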

There are two other common methods, but neither seems appropriate for the scale and complexity of the Remote Desktop images. One is to use an experimental docker daemon to build with the --squash argument, which combines every layer into a single layer. But this removes the advantages of multiple layers. Layers could no longer be shared, increasing the size that images take up in the Container Registry. And every build would start from scratch, even if an error occurs at the very end, which would make debugging and updating unfeasibly time consuming.

Another option appropriate to simpler images is to use multi-stage builds, where part or all of the final file system of one image is copied into another image. However, Remote Desktop is built for a typical Linux desktop experience, where some files have permissions only for root, and others have permissions for non-root users, as well as users and groups designed for systems and software. To the extent of my experimentation, these necessary permissions were unfortunately lost during the copy process. Restoring them manually afterwards is both very complicated and introduces the same image size increasing changes that this process was meant to avoid.

Testing changes

How can we best take advantage of layers to spend the least time building and the most time testing our changes? By making these changes as late in the Dockerfile as possible, so that the most cached layers are used and the fewest are rebuilt. This process can involve temporarily replacing something built on an early layer.

For example, let’s say we want to change the contents of a script.sh in a 100-layer image where layer 2 is COPY example/local/dir/script.sh example/container/dir/script.sh and layer 87 is RUN example/container/dir/script.sh.

If we edit example/local/dir/script.sh directly, layers 2-100 will rebuild for each change.

What we can do instead is make our changes in a new file, newscript.sh, and use it to overwrite the original. If we insert COPY newscript.sh example/container/dir/script.sh right before layer 87, only layers 87 and onward will rebuild.

We can do even better by moving both COPY newscript.sh example/container/dir/script.sh and RUN example/container/dir/script.sh as near to the end of the Dockerfile as possible (or right before the first layer that depends on them, and seeing if that layer can also be moved).
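
Putting the steps above together, the temporary development state of the hypothetical 100-layer Dockerfile might look like this (paths and layer numbers come from the example above, not from the real Remote Desktop Dockerfile):

```dockerfile
# Layer 2: the original COPY, left untouched so layers 2-86 stay cached.
COPY example/local/dir/script.sh example/container/dir/script.sh

# ... layers 3-86 unchanged ...

# Temporary override moved to the end: only these layers rebuild per change.
COPY newscript.sh example/container/dir/script.sh
RUN example/container/dir/script.sh
```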

Similarly, instead of changing an installation (such as apt-get install, conda install...) early in the Dockerfile, we can add a RUN with the corresponding uninstall as close to the end of the Dockerfile as possible, and proceed to test the absence of that package or its replacement by another.

These are all temporary changes to streamline the development process: when a working solution is identified, these experimental overwrites can be removed and their changes applied to the original layers. Now we can do a final verification by rebuilding from the original layer, ideally only needing to do that for the ultimate solution instead of for each attempted change.

Script build and run

Building images and running containers can become repetitive and error-prone across many quick, small changes. Consider creating a script to automate the process for you. How best to do so will vary depending on your operating system. For rapid testing of Remote Desktop images in a Linux environment, I have the following in my ~/.bashrc:

dr() {
    docker run --rm -it -p 8888:8888 $(docker build -q .)
}

With this, when I am in a terminal in the folder of any Dockerfile that runs on port 8888 (appropriate for the Remote Desktop images; the function can be changed for others), I only have to enter dr to build the image and run it in a container that will be removed after it exits (--rm).

Shell debug

docker run --rm -it [your image id] /bin/bash opens an interactive terminal into the container, which is convenient for quick debugging of certain issues. You can also add -u root after docker run to debug with administrative privileges.

Free up space

Building a lot of images and using a lot of containers can quickly use up a lot of hard drive space.

Use --rm when running containers to make sure that they are removed after they are stopped.

Use docker tag [image id] [name:tag] to give names to images you want to keep, and clear the rest with docker rmi $(docker images | grep "^<none>" | awk '{print $3}') or an equivalent for your operating system.
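
One quoting detail worth noting in the command above: the awk program should be in single quotes, because inside double quotes the shell expands $3 before awk ever sees it. A quick demonstration:

```shell
# With single quotes, awk itself receives {print $3} and prints field 3.
printf 'a b c\n' | awk '{print $3}'
# prints: c

# With double quotes, the shell expands $3 first (usually empty), so awk
# receives {print } and prints the whole line instead.
printf 'a b c\n' | awk "{print $3}"
# prints: a b c
```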

For an all-purpose deep clean, use docker system prune, but note that this will also remove your build caches.

REMOTE DESKTOP BACKGROUND

Remote Desktop is based on https://github.com/ml-tooling/ml-workspace and adapted to our project’s needs.

The most significant difference is that while ML Workspace is built for one “root” user with full administrative privileges over the operating system in the container, for security purposes Remote Desktop runs as a regular user. At the time of writing this documentation, this user is always "jovyan".

Root is still used in the Dockerfiles to perform installations, and sudo access has been granted to run three specific Netdata, Rsyslogd, and Cron commands, as I did not find a non-privileged way of running them in the time I had to prioritise that task (though there might still be one).

Remote Desktop is also much smaller than ML Workspace, removing tools, packages, and libraries that are not required by typical users of this project, as long as anyone who still needs them can install them themselves (through e.g. pip, conda, npm).

We also have two official image extensions: r and geomatics. These add commonly, but not universally, requested software that requires administrative privileges to install, but is too complex (long build time) and large (image size increase) to warrant including in the base Dockerfile. r extends the base image to install RStudio and various R libraries, while geomatics extends r to add QGIS and various geomatics libraries. Note that while most other software installation processes use checksums, QGIS is validated with its GPG key.

The Github Actions workflow defines a CI process such that when a change is accepted into the master repository, these images are sequentially built, tagged with the SHA of their commit to master, scanned for vulnerabilities, and pushed to an Azure Container Registry that users of our Kubeflow instance can pull from.

This workflow does not automatically populate the “Create Notebook Server” dropdown. To do that, submit a pull request to update the SHAs in this ConfigMap. Note that the base image is not offered in the dropdown: its purpose is to streamline development.

Github Actions have an image size limit: around 14 GB. In the future, a GPU version of Remote Desktop may be created, and it might not be possible to make it smaller than 14 GB. In that case, it would have to be pushed manually (akin to the manual build instructions below), or a different CI process would need to be established.

There is presently also one unofficial experimental image. It does not have a CI process, both due to its nature as a temporary placeholder, and because it exceeds the Github Actions image size limit of around 14 GB.

Manual build for testing

  1. Install the Azure CLI if you have not done so before.
  2. Log in to the ACR with the command az acr login -n k8scc01covidacr using your cloud account. Sometimes the connection is refused: try again until it goes through (it should only take 2-4 tries unless something abnormal is going on).
  3. If you are testing an image that is based on parent images where you have made changes, build them first in the appropriate sequence and tag them locally with the master tag, but do not push them. For example, if you have changes in base that you want to test in geomatics:
    1. Build the modified base image.
    2. Tag the modified base image with docker tag [image id] k8scc01covidacr.azurecr.io/remote-desktop-base:master
    3. Build the r image.
    4. Tag the r image with docker tag [image id] k8scc01covidacr.azurecr.io/remote-desktop-r:master
  4. Build the image you want to test.
  5. Tag the image you want to test with docker tag [image id] k8scc01covidacr.azurecr.io/remote-desktop-test:[add a descriptive tag]
  6. docker push k8scc01covidacr.azurecr.io/remote-desktop-test:[your tag]
  7. Create a notebook server on Kubeflow with k8scc01covidacr.azurecr.io/remote-desktop-test:[your tag] as a custom image.

Note: If step 6 fails, ask your supervisor to confirm that your cloud account has been granted the appropriate permissions for the Azure Container Registry k8scc01covidacr.

Jupyter: future removal

Jupyter Notebook is available as a vestigial interface. When possible, Jupyter users are directed to use one of the JupyterLab images, which are individually supported and far more fully featured. However, it is not yet possible to connect a Remote Desktop image and a different image (or any two different images) to the same Persistent Volume storage at the same time. In the future, when ReadWriteMany PVCs (or any alternative such as Minio buckets) are supported and confirmed working for this purpose, Jupyter will be removed from Remote Desktop.

Alternatively, this may be enacted earlier if the experimental image that combines JupyterLab and Remote Desktop is deemed to be a sufficient placeholder. However, that image would need to be kept up to date.

The eventual removal of Jupyter includes at least the following:

  • resources/branding
  • resources/home/.workspace
  • resources/jupyter
  • resources/tutorials
  • jupyter-related configuration in nginx.conf
  • jupyter-related code in base Dockerfile
  • possibly docker-entrypoint.py which would require at least a minor refactor
  • reviewing resources/scripts

Other review/remove/update candidates

  • resources/icons other than netdata (consider defining a netdata icon from elsewhere, or, if present, using one available somewhere in netdata’s install directory, and then removing all of resources/icons)
  • resources/licenses (or update)
  • resources/reports (if resources/tests is completely removed; requires minor refactor)
  • resources/ssh - to confirm that it doesn’t interfere with VNC functionality
  • reviewing resources/scripts (for unused non-Jupyter scripts)
  • resources/tests

BASE IMAGE STRUCTURE

Resources Folder

The resources folder of the base image defines a lot of configurations, most of which are accessed by the base Dockerfile.

branding

Unchanged from original.

This folder contains branding assets for ML Workspace. The last of these that are still in use are in the Jupyter Notebook Tree View: when that is obsolete, this directory can be deleted.

config

Unchanged from original.

Contains a couple of configurations for apt-get installations and xrdp (graphical remote login).

home

This directory gets copied into the home directory of the container. In the present single-user state, the home directory of the container is /home/jovyan; it can also be referred to by the potentially more dynamic forms $HOME and /home/$NB_USER.

resources/home contains the following subdirectories:

.config

Miscellaneous configurations, including VS Code configurations and default application associations.

.workspace/tools

JSON corresponding to the “Open Tools” dropdown of the Jupyter Notebook Tree View: when that is obsolete, this subdirectory can be deleted.

Desktop

Ensures that an empty “Desktop” directory is copied into the user’s home directory, to be populated by application shortcuts added during installation processes. An empty .dockerignore (a file type that is not built into the container) is in place because a fully empty directory would otherwise disappear from Git tracking.

icons

Unchanged from original.

Only netdata-icon.png is in use as the icon for the desktop shortcut to netdata.

jupyter

Unchanged from original beyond renaming an instance of the term “Workspace” to “Remote Desktop”.

Various configurations for the Jupyter Notebook Tree View: when that is obsolete, this directory can be deleted.

landing_page

The assets for the custom landing page for Remote Desktop. They require corresponding server configuration entries in nginx/nginx.conf to be displayed.

licenses

Unchanged from original.

Contains various license information, perhaps automatically generated with some sort of tool, for the original ML Workspace image: they are not up-to-date for Remote Desktop. Consider removing or finding a way to update them.

nginx

Various configurations for nginx, which handles making the container accessible through a web browser via reverse proxies. It provides an in-browser path, such as through Access Port and/or with an alias such as tools/vscode, to what is running inside the container on localhost:[port number]. Modifications have been made to nginx.conf, and further modifications will be warranted for Jupyter removal.
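
As a hedged illustration of the reverse-proxy idea (the port and header handling here are assumptions, not the exact entries in nginx.conf), a location block of this general shape maps a browser path such as tools/vscode to something running on an internal port:

```nginx
# Hypothetical sketch: expose localhost:8300 under the /tools/vscode/ alias.
location /tools/vscode/ {
    proxy_pass http://localhost:8300/;
    proxy_set_header Host $host;
}
```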

Note that the WORKSPACE_BASE_URL is a dynamic variable that corresponds to the relative path for connecting to the container on Kubeflow. It is of the pattern /notebook/[namespace]/[notebook server name].

The value for WORKSPACE_BASE_URL is passed by [init.sh](#init-sh). When there is no such value, which occurs when the Remote Desktop container is run locally, the placeholder value /workspace is used.

novnc

Configurations for the VNC client, such as the lefthand sidebar for the Desktop GUI and clipboard behavior.

Note that direct clipboard sharing (ctrl+c and ctrl+v, or the command equivalent on Mac, working without use of the sidebar) is presently only supported on Chrome. Firefox does not yet permit clipboard sharing in both directions, and other browsers have had limited testing.

reports

A directory for holding reports generated by resources/tests, which may no longer be appropriate for Remote Desktop (unverified). This directory itself is empty, but was kept because something in the code expects it to exist. If resources/tests is completely removed, then resources/reports should be refactored out.

scripts

Unchanged from original beyond renaming an instance of the term “Workspace” to “Remote Desktop”.

Various scripts. clean-layer.sh and fix-permissions.sh are used throughout the base Dockerfile, while run_workspace.py, start-vnc-server.sh, and configure-nginx.py are critical to the appropriate functioning of the container. The rest warrant review: while some may be important, others may not be in use, or may become irrelevant after Jupyter is removed.

ssh

Should be removed if it does not interfere with the VNC / RDP setup (graphically connecting to the desktop environment).

supervisor

Supervisor ensures that certain programs, as configured in the conf.d subdirectory, run at startup on specified ports, restart when they crash, and output status logs (stdout locally, pod logs on Kubeflow).
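
A minimal sketch of what a conf.d entry looks like (the program name matches one the document mentions, but the command path and options here are illustrative assumptions, not copied from the actual configuration):

```ini
; Hypothetical conf.d entry: keep netdata running and send logs to stdout.
[program:netdata]
command=/usr/sbin/netdata -D
autorestart=true
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
```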

tests

Unchanged from original.

Various tests that have not been verified and may no longer be compatible with Remote Desktop. To review. If they are all removed, resources/reports should also be removed (that requires a minor refactor: its existence is expected somewhere).

tools

Various installation scripts referenced by the base Dockerfile. To make the base Dockerfile easier to read, consider converting sections of it into additional .sh files here, keeping in mind the difference in syntax between Dockerfile layers and plain bash.

tutorials

Unchanged from original.

Jupyter notebook tutorials to be deleted when Jupyter is removed.

5xx.html

Unchanged from original.

A simple loading page that refreshes itself until a resource is ready. Might make sense to move to resources/landing_page. A future design task could be to rebrand it to Remote Desktop. Note that it does not make use of resources/branding.

docker-entrypoint.py

Unchanged from original.

Runs directly after [init.sh](#init-sh) and sets various configurations, but mostly for components which are not in Remote Desktop. After Jupyter is removed, see if it is possible to remove this file and go directly to scripts/run_workspace.py instead.

Dockerfile notes

The current image is based on Ubuntu 18.04.

At the beginning of the base Dockerfile, as well as the extension Dockerfiles, the user is set to root, and at the end the user is set to the environmental variable $NB_USER (presently always ‘jovyan’).

Environmental variables are defined at the beginning, with the exception of ones that are part of an installation process.

The resources described above are brought in with COPY instructions. One present inconsistency is that some software is installed from bash scripts in resources/tools, while other software is installed directly in the Dockerfile. A potential refactor would be to separate at least the more complex of these installs in the Dockerfile out into bash scripts, which could make the Dockerfile easier to read, navigate, and track changes in.

Checksums are used throughout the Dockerfile, preventing the build from proceeding when the SHA-256 sum of an installation file no longer matches its expected value. This could occur due to malicious interception or, more commonly because the file has been updated, removed, or replaced. When such software is updated, a new SHA-256 sum must be generated for it, which can be done on a development machine (and, when possible, verified against an official source). How to do so varies by operating system: a Linux approach is sha256sum.
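
For example, on Linux the expected sum can be generated with sha256sum and then verified against the file (the file contents here are a stand-in; in a Dockerfile the verification step is typically the echo ... | sha256sum -c - pattern shown at the end):

```shell
# Generate a SHA-256 sum for a downloaded file...
echo "example installer contents" > installer.sh
sha256sum installer.sh | awk '{print $1}' > expected.sha256

# ...and later verify that the file still matches it. The build should
# stop if this check fails.
echo "$(cat expected.sha256)  installer.sh" | sha256sum -c -
# prints: installer.sh: OK
```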

In order to allow the three privileged supervisor commands to run with sudo access, they are added to the sudoers.d file for the user with NOPASSWD, so that there will be no password prompt (no root password is set to begin with). These are logging commands managed automatically by supervisor without interaction from the user.
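
Such a sudoers.d entry takes this general shape (the user name is from the document, but the command paths are hypothetical examples, not the actual three commands):

```
# Hypothetical /etc/sudoers.d/ entry: allow specific commands as root,
# with no password prompt.
jovyan ALL=(root) NOPASSWD: /usr/sbin/rsyslogd, /usr/sbin/cron
```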

An unusual instruction, mv $HOME/.config $HOME/.config2 && \ mv $HOME/.config2 $HOME/.config, is a workaround for a strange bug. Without this workaround, sometimes the permissions on $HOME/.config break: they are present when logging in to the shell as root and checking with ls -a, but return question marks when doing so as the standard user jovyan. These broken permissions render the files in that directory, and in turn the GUI, inaccessible. At the time of writing, I had neither been able to trace why this bug would (sometimes, but not always) occur, nor why moving $HOME/.config back and forth prevents it.

Another important workaround is the workspace override hotfix. When a Remote Desktop image is used to create a Notebook Server on Kubeflow, and a Persistent Volume is selected or created, that Persistent Volume mounts to the home directory (presently always /home/jovyan) and replaces that home directory completely. Everything in the home directory that has been set up throughout the Dockerfile, including critical configurations, is thus lost. Therefore, at the end of the Dockerfile, the contents of the home directory are copied to another folder, home_nbuser_default, to preserve them. init.sh, described below, will copy the contents of the original home directory back into the newly overwritten home directory (now a Persistent Volume) at container runtime.

Tini is used as the entrypoint which runs the CMD, executing init.sh.

init.sh

Because this script runs after the point at which a persistent volume is attached (if one is specified), it completes the workspace override workaround by copying home_nbuser_default back into the home directory that would have now been overwritten by the persistent volume. --no-clobber is specified in order to not overwrite any existing data on the persistent volume.
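
The effect of --no-clobber can be sketched as follows (the directory names are stand-ins for home_nbuser_default and the mounted home directory):

```shell
# Simulate the defaults saved at build time and a persistent volume
# that already holds user data.
mkdir -p defaults volume
echo "default settings" > defaults/config.txt
echo "user data"        > volume/config.txt
echo "default only"     > defaults/extra.txt

# Copy the defaults back in, but never overwrite existing files.
# (Some coreutils versions return nonzero when files are skipped.)
cp -r --no-clobber defaults/. volume/ || true

cat volume/config.txt   # still "user data": the volume's file was kept
cat volume/extra.txt    # "default only": the missing file was restored
```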

The script then exports the variables WORKSPACE_BASE_URL (using the path received as NB_PREFIX from Kubeflow, or if none exists (such as during a local docker run) then using the placeholder "/workspace") and NB_USER (presently always jovyan) so that they are accessible elsewhere, and concludes by running docker-entrypoint.py.