-environment.yml, renv**, ..., these tools try to solve the following problems:
+Tools like **Conda, Anaconda, pip, virtualenv, Pipenv, pyenv, Poetry, renv** and files to record dependencies like **requirements.txt** and **environment.yml** try to solve the following problems:
 
 - **Defining a specific set of dependencies**
 - **Installing those dependencies** mostly automatically
 - **Recording the versions** for all dependencies
 - **Isolate environments**
-- On your computer for projects so they can use different software
+- On your computer for projects, so they can use different software
 - Isolate environments on computers with many users (and allow self-installations)
-- Using **different package versions** per project (also e.g. Python/R versions)
+- Using **different package versions** per project (also, e.g., Python/R versions)
 - Provide tools and services to **share packages**
 
 Isolated environments are also useful because they help you make sure
@@ -73,7 +72,7 @@ more reproducible it is.
 
 ---
 
-## Demo
+## Exercise / Demo
 
 ``````{challenge} Dependencies-1: Time-capsule of dependencies
 Situation: 5 students (A, B, C, D, E) wrote code that depends on a couple of libraries.
@@ -247,17 +246,17 @@ Answer in the collaborative document:
 **A**: It will be tedious to collect the dependencies one by one. And after
 the tedious process you will still not know which versions they have used.
 
-**B**: If there is no standard file to look for and look at and it might
-become very difficult for to create the software environment required to
-run the software. But at least we know the list of libraries. But we don't
+**B**: If there is no standard file to look for and look at, it might
+become very difficult to create the software environment required to
+run the software. At least we know the list of libraries, but we don't
 know the versions.
 
 **C**: Having a standard file listing dependencies is definitely better
 than nothing. However, if the versions are not specified, you or someone
 else might run into problems with dependencies, deprecated features,
 changes in package APIs, etc.
 
-**D** and **E**: In both these cases exact versions of all dependencies are
+**D** and **E**: In both of these cases exact versions of all dependencies are
 specified and one can recreate the software environment required for the
 project. One problem with the dependencies that come from GitHub is that
 they might have disappeared (what if their authors deleted these
@@ -277,7 +276,7 @@ information?
 `````{tabs}
 ````{group-tab} Conda
 We start from an existing conda environment. Try this either with your own project or inside the "coderefinery" conda
-environment. For demonstration puprposes, you can also create an environment with:
+environment. For demonstration purposes, you can also create an environment with:
 
 ```console
 $ conda env create -f myenv.yml
@@ -375,6 +374,6 @@ information?
 ``````
 
 ```{keypoints}
-- Recording dependencies with versions can make it easier for the next person to execute your code
-- There are many tools to record dependencies and separate environments
+- Recording dependencies with versions can make it easier for the next person to execute your code.
+- There are many tools to record dependencies and separate environments.
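As a rough illustration of the "recording the versions" point in this file, the following Python sketch lists every package installed in the current environment together with its exact version, in the `name==version` form used by requirements.txt. The function name `pinned_requirements` is hypothetical; the output is similar in spirit to what `pip freeze` prints, and it is meant as a minimal sketch rather than a replacement for the tools discussed in the lesson.

```python
from importlib.metadata import distributions


def pinned_requirements() -> list[str]:
    """Return sorted 'name==version' lines for every installed distribution."""
    pins = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that lack a name
            pins.add(f"{name}=={dist.version}")
    # Sort case-insensitively so the list is stable and easy to compare.
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    print("\n".join(pinned_requirements()))
```

Saving this output next to the code gives the next person both the list of libraries and the versions that were actually used.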
content/environments.md: 18 additions & 18 deletions
@@ -14,7 +14,7 @@
 ## What is a container?
 
 Imagine if you didn't have to install things yourself, but instead you could
-get a computer with the exact software for a task pre-installed? Containers
+get a computer with the exact software for a task pre-installed. Containers
 effectively do that, with various advantages and disadvantages. They are
 **like an entire operating system with software installed, all in one file**.
 
@@ -30,7 +30,7 @@ From [reddit](https://www.reddit.com/r/ProgrammerHumor/comments/cw58z7/it_works_
 - Container definition files <-> like a blueprint to build a kitchen with all
 utensils in which the recipe can be prepared.
 - Container images <-> showroom kitchens
-- Containers <-> A real connected kitchen
+- Containers <-> a real connected kitchen
 
 Just for fun: which operating systems do the following example kitchens represent?
 `````{tabs}
@@ -69,15 +69,15 @@ Just for fun: which operating systems do the following example kitchens represen
 - A container image is like a piece of paper with all the operating system on it. When you run it,
 a transparent sheet is placed on top to form a container. The container runs and writes only on
 that transparent sheet (and what other mounts have been layered on top). When you are done,
-transparency is thrown away. It can be repeated as often as you want, and base is always the same.
-- Definition files (e.g. Dockerfile or Singularity definition file) are text
+the transparent sheet is thrown away. This can be repeated as often as you want, and the base is always the same.
+- Definition files (e.g., Dockerfile or Singularity definition file) are text
 files that contain a series of instructions to build container images.
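The "transparent sheet" picture can be mimicked in a few lines of Python. This is only an analogy, assuming a hypothetical image represented as a read-only mapping from file paths to contents; real container runtimes use layered file systems, not dictionaries. `collections.ChainMap` sends all writes to its first mapping, which plays the role of the sheet.

```python
from collections import ChainMap
from types import MappingProxyType

# Hypothetical, tiny "image": a read-only mapping of file paths to contents.
image = MappingProxyType({"/etc/os-release": "Ubuntu", "/usr/bin/tool": "v1"})


def run_container(image):
    """Place a fresh writable layer (the 'transparent sheet') over the image."""
    return ChainMap({}, image)


container = run_container(image)
container["/tmp/output.txt"] = "results"  # written to the top layer only
container["/usr/bin/tool"] = "patched"    # shadows the image entry, does not change it

print(container["/usr/bin/tool"])  # patched
print(image["/usr/bin/tool"])      # v1
```

Calling `run_container(image)` again yields a new, empty top layer, so every container starts from the same base.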
 
 ## You may have use for containers in different ways
 
 - **Installing a certain software is tricky**, or not supported for your operating system? - Check if an image is available and run the software from a container instead!
 - You want to make sure your colleagues are using the **same environment** for running your code? - Provide them an image of your container!
-- If this does not work, because they are using a different architecture than you do? - Provide a definition file for them to **build the image suitable to their computers**. This does not create the exact environment as you have, but in most cases similar enough.
+- If this does not work, because they are using a different architecture than you do? - Provide a definition file for them to **build the image suitable for their computers**. This does not create the exact environment you have, but in most cases a similar enough one.
 
 ## The container recipe
 
@@ -127,20 +127,20 @@ important problems:
 - A mechanism to "send the computer to the data" when the **dataset is too large** to transfer.
 - **Installing software into a file** instead of into your computer (removing
 a file is often easier than uninstalling software if you suddenly regret an
-installation)
+installation).
 
 However, containers may also have some drawbacks:
 
 - Can be used to hide away software installation problems and thereby
 **discourage good software development practices**.
 - Instead of "works on my machine" problem: **"works only in this container"** problem?
-- They can be **difficult to modify**
-- Container **images can become large**
+- They can be **difficult to modify**.
+- Container **images can become large**.
 
 ```{danger}
 Use only **official and trusted images**! Not all images can be trusted! There
-have been examples of contaminated images so investigate before using images
-blindly. Apply same caution as installing software packages from untrusted
+have been examples of contaminated images, so investigate before using images
+blindly. Apply the same caution as when installing software packages from untrusted
 package repositories.
 ```
 
@@ -228,14 +228,14 @@ package repositories.
 ```
 
 ```{solution}
-- Line 2: "ubuntu:latest" will mean something different 3 years in future.
+- Line 2: "ubuntu:latest" will mean something different 3 years into the future.
 - Lines 11-12: The compiler gcc and the library libgomp1 will have evolved.
 - Line 30: The container uses requirements.txt to build the virtual environment but we don't see
 here what libraries the code depends on.
 - Line 33: Data is copied in from the hard disk of the person who created it. Hopefully we can find the data somewhere.
 - Line 35: The library fancylib has been built outside the container and copied in but we don't see here how it was done.
-- Python version will be different then and hopefully the code still runs then.
-- Singularity/Apptainer will have also evolved by then. Hopefully this definition file then still works.
+- The Python version will be different and hopefully the code still runs.
+- Singularity/Apptainer will have also evolved by then. Hopefully this definition file still works.
 - No contact address to ask more questions about this file.
 - (Can you find more? Please contribute more points.)
 ```
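The first point in the solution ("ubuntu:latest" will mean something different later) can be checked mechanically. The sketch below is a simplified, hypothetical helper, not part of Apptainer or Docker: it treats an image reference as "floating" when it has no tag or uses the `latest` tag, and as fixed when it carries an explicit tag or a digest. The parsing is deliberately minimal and may not cover every valid reference format.

```python
def has_floating_tag(image_ref: str) -> bool:
    """Return True if the reference has no tag, or uses the 'latest' tag.

    A reference pinned by digest (contains '@') is treated as fixed.
    """
    if "@" in image_ref:
        return False
    # Only look at the last path component so a registry "host:port" is ignored.
    last_component = image_ref.rsplit("/", 1)[-1]
    name, sep, tag = last_component.partition(":")
    return not sep or tag == "latest"


for ref in ["ubuntu:latest", "ubuntu", "ubuntu:22.04"]:
    print(ref, has_floating_tag(ref))
```

Running such a check over the base image line of a definition file is one small way to catch a dependency that will silently change over time.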
@@ -251,7 +251,7 @@ package repositories.
 ````{exercise} (optional) Containers-2: Installing the impossible.
 
 When you are missing privileges for installing certain software tools, containers can come in handy.
-Here we build a Singularity/Apptainer container for installing `cowsay` and `lolcat` Linux programs.
+Here we build a Singularity/Apptainer container for installing the `cowsay` and `lolcat` Linux programs.
 
 1. Make sure you have apptainer installed:
 ```console
@@ -266,12 +266,12 @@ Here we build a Singularity/Apptainer container for installing `cowsay` and `lol
 $ export APPTAINER_TMPDIR="./temp/"
 ```
 
-3. Build the container from the following definition file above.
+3. Build the container from the container recipe file introduced above.
 ```console
 apptainer build cowsay.sif cowsay.def
 ```
 
-4. Let's test the container by entering into it with a shell terminal
+4. Let's test the container by entering into it with a shell terminal:
 ```console
 $ apptainer shell cowsay.sif
 ```
@@ -317,6 +317,6 @@ the Docker containers through Singularity/Apptainer.
 - [Carpentries incubator lesson on Singularity/Apptainer](https://carpentries-incubator.github.io/singularity-introduction/)
 
 ```{keypoints}
-- Containers can be helpful if complex setups are needed to running a specific software
-- They can also be helpful for prototyping without "messing up" your own computing environment, or for running software that requires a different operating system than your own
+- Containers can be helpful if complex setups are needed to run a specific software.
+- They can also be helpful for prototyping without "messing up" your own computing environment, or for running software that requires a different operating system than your own.
content/intro.md: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ This lesson on general **Reproducibility**: Preparing code to be usable by you a
 
 This includes organizing your projects on your own computer and recording your computational steps, dependencies and computing environment.
 
-We will also mention a few tools and platforms for sharing data (**"Here is my data"**) and research outputs(**"Here are my results"**) in the **social coding** lesson, but they are not the focus of this workshop.
+We will also mention a few tools and platforms for sharing data (**"Here is my data"**) and research outputs(**"Here are my results"**) in the **social coding** lesson, but they are not the focus of this workshop.
content/where-to-go.md: 5 additions & 5 deletions
@@ -17,7 +17,7 @@ However, you will not always need all of them. As with so many things, it again
 - You will want to consider workflow tools:
 - When processing many files with many steps
 - Steps or files may change
-- Your main script, connecting your steps gets very long
+- Your main script, connecting your steps, gets very long
 - You are still collecting your input data
 - ...
 
@@ -34,10 +34,10 @@ However, you will not always need all of them. As with so many things, it again
 
 ## Important for every project
 
-- Clear file structure for your project
+- A clear directory/file structure for your project.
 - Record your workflow and write it down in a script file.
-- Create a dependency list and keep it updated, optimally in an environment file
-- At least consider the possibility that someone, maybe you may want to reproduce your work
+- Create a dependency list and keep it updated, optimally in an environment file.
+- At least consider the possibility that someone, maybe you, may want to reproduce your work:
 - Can you do something (small) to make it easier?
 - If you have ideas, but no time: add an issue to your repository; maybe someone else wants to help.
 
@@ -52,6 +52,6 @@ Do you want to practice your reproducibility skills and get inspired by working
 ```
 
 ```{keypoints}
-- Not everything in this lesson might be useful right now, but it is good to know that these things exist if you ever get in a situation that would require such solution.
+- Not everything in this lesson might be useful right now, but it is good to know that these things exist if you ever get in a situation that would require such solutions.
 - Caring about reproducibility makes work easier for the next person working on the project - and that might be you in a few years!
 We can also use the cloud service [Binder](https://mybinder.org/) to make sure
 we all have the same computing environment. This is interesting from a
 reproducible research point of view and it's explained further in the [Jupyter
 lesson](https://coderefinery.github.io/jupyter/sharing/) how this is even
 possible.
 - Go to <https://github.com/coderefinery/word-count> and click on the "launch binder" badge in the README.
-- Once it get started, you can open a new Terminal from the **new** menu (top right) and select **Terminal**.
+- Once it gets started, you can open a new **Terminal** from the Launcher or via **File > New > Terminal**.
 ````
 
 ````{exercise} Workflow-1: Workflow solution using Snakemake
@@ -223,10 +224,10 @@ Rules that have yet to be completed are indicated with solid outlines, while alr
 - **Cross-platform** (Windows, macOS, Linux) and compatible with all High Performance Computing (HPC) schedulers:
 same workflow works without modification and scales appropriately whether on a laptop or cluster.
 - If several workflow steps are independent of each other, and you have multiple cores available, Snakemake can run them **in parallel**.
-- Is is possible to define **isolated software environments** per rule, e.g. by adding `conda: 'environment.yml'` to a rule.
-- Also possible to run workflows in Docker or Apptainer **containers** e.g. by adding `container: 'docker://some-org/some-tool#2.3.1'` to a rule.
+- It is possible to define **isolated software environments** per rule, e.g. by adding `conda: 'environment.yml'` to a rule.
+- It is also possible to run workflows in Docker or Apptainer **containers**, e.g. by adding `container: 'docker://some-org/some-tool#2.3.1'` to a rule.
 - [Heavily used in bioinformatics](https://twitter.com/carl_witt/status/1103951128046301185), but is **completely general**.
-- Nice functionality for archiving the workflow, see: [the official documentation](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#sustainable-and-reproducible-archiving)
+- Nice functionality for archiving the workflow, see: [the official Snakemake documentation](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#sustainable-and-reproducible-archiving)
 
 Tools like Snakemake help us with **reproducibility** by supporting us with **automation**, **scalability** and **portability** of our workflows.
 
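The idea that independent workflow steps can run side by side can be sketched in plain Python. This is not how Snakemake schedules jobs; it is a minimal illustration with hypothetical inputs (`texts`) and a hypothetical step (`count_words`), using threads from the standard library for brevity.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical inputs standing in for the text files of a word-count project.
texts = {
    "book_a": "the quick brown fox jumps over the lazy dog",
    "book_b": "to be or not to be",
}


def count_words(text: str) -> int:
    """One independent workflow step: count whitespace-separated words."""
    return len(text.split())


def run_all(inputs: dict[str, str]) -> dict[str, int]:
    """Run the same step for every input; the steps do not depend on each other."""
    with ThreadPoolExecutor() as pool:
        counts = pool.map(count_words, inputs.values())
        return dict(zip(inputs.keys(), counts))


print(run_all(texts))  # {'book_a': 9, 'book_b': 6}
```

A workflow tool adds what this sketch lacks: it works out from the declared inputs and outputs which steps are independent, and it reruns only the steps whose inputs have changed.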
@@ -241,6 +242,6 @@ Tools like Snakemake help us with **reproducibility** by supporting us with **au
 - [{targets} R package - make-like pipeline tool for R](https://books.ropensci.org/targets/)
 
 ```{keypoints}
-- Computational steps can be recorded in many ways
-- Workflow tools can help, if there are many steps to be executed and/or many datasets to be processed
+- Computational steps can be recorded in many ways.
+- Workflow tools can help if there are many steps to be executed and/or many datasets to be processed.