Commit e88b79f

Merge pull request #281 from ipohner/ip_rr-prep
Small lesson updates / fixes
2 parents 5c80e73 + 1062148 commit e88b79f

8 files changed: +52 -51 lines changed

content/dependencies.md

Lines changed: 12 additions & 13 deletions
@@ -12,7 +12,7 @@

 ```{instructor-note}
 - 10 min teaching
-- 10 min demo
+- 10 min exercise/demo
 ```

 Our codes often depend on other codes that in turn depend on other codes ...
@@ -52,16 +52,15 @@ When we create recipes, we often use tools created by others (libraries) [Midjou

 ## Dependency and environment management

-**Conda, Anaconda, pip, virtualenv, Pipenv, pyenv, Poetry, requirements.txt,
-environment.yml, renv**, ..., these tools try to solve the following problems:
+Tools like **Conda, Anaconda, pip, virtualenv, Pipenv, pyenv, Poetry, renv** and files to record dependencies like **requirements.txt** and **environment.yml** try to solve the following problems:

 - **Defining a specific set of dependencies**
 - **Installing those dependencies** mostly automatically
 - **Recording the versions** for all dependencies
 - **Isolate environments**
-- On your computer for projects so they can use different software
+- On your computer for projects, so they can use different software
 - Isolate environments on computers with many users (and allow self-installations)
-- Using **different package versions** per project (also e.g. Python/R versions)
+- Using **different package versions** per project (also, e.g., Python/R versions)
 - Provide tools and services to **share packages**

 Isolated environments are also useful because they help you make sure
@@ -73,7 +72,7 @@ more reproducible it is.

 ---

-## Demo
+## Exercise / Demo

 ``````{challenge} Dependencies-1: Time-capsule of dependencies
 Situation: 5 students (A, B, C, D, E) wrote a code that depends on a couple of libraries.
@@ -247,17 +246,17 @@ Answer in the collaborative document:
 **A**: It will be tedious to collect the dependencies one by one. And after
 the tedious process you will still not know which versions they have used.

-**B**: If there is no standard file to look for and look at and it might
-become very difficult for to create the software environment required to
-run the software. But at least we know the list of libraries. But we don't
+**B**: If there is no standard file to look for and look at, it might
+become very difficult to create the software environment required to
+run the software. At least we know the list of libraries, but we don't
 know the versions.

 **C**: Having a standard file listing dependencies is definitely better
 than nothing. However, if the versions are not specified, you or someone
 else might run into problems with dependencies, deprecated features,
 changes in package APIs, etc.

-**D** and **E**: In both these cases exact versions of all dependencies are
+**D** and **E**: In both of these cases exact versions of all dependencies are
 specified and one can recreate the software environment required for the
 project. One problem with the dependencies that come from GitHub is that
 they might have disappeared (what if their authors deleted these
@@ -277,7 +276,7 @@ information?
 `````{tabs}
 ````{group-tab} Conda
 We start from an existing conda environment. Try this either with your own project or inside the "coderefinery" conda
-environment. For demonstration puprposes, you can also create an environment with:
+environment. For demonstration purposes, you can also create an environment with:

 ```console
 $ conda env create -f myenv.yml
@@ -375,6 +374,6 @@ information?
 ``````

 ```{keypoints}
-- Recording dependencies with versions can make it easier for the next person to execute your code
-- There are many tools to record dependencies and separate environments
+- Recording dependencies with versions can make it easier for the next person to execute your code.
+- There are many tools to record dependencies and separate environments.
 ```

content/environments.md

Lines changed: 18 additions & 18 deletions
@@ -14,7 +14,7 @@
 ## What is a container?

 Imagine if you didn't have to install things yourself, but instead you could
-get a computer with the exact software for a task pre-installed? Containers
+get a computer with the exact software for a task pre-installed. Containers
 effectively do that, with various advantages and disadvantages. They are
 **like an entire operating system with software installed, all in one file**.

@@ -30,7 +30,7 @@ From [reddit](https://www.reddit.com/r/ProgrammerHumor/comments/cw58z7/it_works_
 - Container definition files <-> like a blueprint to build a kitchen with all
 utensils in which the recipe can be prepared.
 - Container images <-> showroom kitchens
-- Containers <-> A real connected kitchen
+- Containers <-> a real connected kitchen

 Just for fun: which operating systems do the following example kitchens represent?
 `````{tabs}
@@ -69,15 +69,15 @@ Just for fun: which operating systems do the following example kitchens represen
 - A container image is like a piece of paper with all the operating system on it. When you run it,
 a transparent sheet is placed on top to form a container. The container runs and writes only on
 that transparent sheet (and what other mounts have been layered on top). When you are done,
-transparency is thrown away. It can be repeated as often as you want, and base is always the same.
-- Definition files (e.g. Dockerfile or Singularity definition file) are text
+the transparent sheet is thrown away. This can be repeated as often as you want, and base is always the same.
+- Definition files (e.g., Dockerfile or Singularity definition file) are text
 files that contain a series of instructions to build container images.

 ## You may have use for containers in different ways

 - **Installing a certain software is tricky**, or not supported for your operating system? - Check if an image is available and run the software from a container instead!
 - You want to make sure your colleagues are using the **same environment** for running your code? - Provide them an image of your container!
-- If this does not work, because they are using a different architecture than you do? - Provide a definition file for them to **build the image suitable to their computers**. This does not create the exact environment as you have, but in most cases similar enough.
+- If this does not work, because they are using a different architecture than you do? - Provide a definition file for them to **build the image suitable for their computers**. This does not create the exact environment you have, but in most cases a similar enough one.

 ## The container recipe

@@ -127,20 +127,20 @@ important problems:
 - A mechanism to "send the computer to the data" when the **dataset is too large** to transfer.
 - **Installing software into a file** instead of into your computer (removing
 a file is often easier than uninstalling software if you suddenly regret an
-installation)
+installation).

 However, containers may also have some drawbacks:

 - Can be used to hide away software installation problems and thereby
 **discourage good software development practices**.
 - Instead of "works on my machine" problem: **"works only in this container"** problem?
-- They can be **difficult to modify**
-- Container **images can become large**
+- They can be **difficult to modify**.
+- Container **images can become large**.

 ```{danger}
 Use only **official and trusted images**! Not all images can be trusted! There
-have been examples of contaminated images so investigate before using images
-blindly. Apply same caution as installing software packages from untrusted
+have been examples of contaminated images, so investigate before using images
+blindly. Apply the same caution as when installing software packages from untrusted
 package repositories.
 ```

@@ -228,14 +228,14 @@ package repositories.
 ```

 ```{solution}
-- Line 2: "ubuntu:latest" will mean something different 3 years in future.
+- Line 2: "ubuntu:latest" will mean something different 3 years into the future.
 - Lines 11-12: The compiler gcc and the library libgomp1 will have evolved.
 - Line 30: The container uses requirements.txt to build the virtual environment but we don't see
 here what libraries the code depends on.
 - Line 33: Data is copied in from the hard disk of the person who created it. Hopefully we can find the data somewhere.
 - Line 35: The library fancylib has been built outside the container and copied in but we don't see here how it was done.
-- Python version will be different then and hopefully the code still runs then.
-- Singularity/Apptainer will have also evolved by then. Hopefully this definition file then still works.
+- The Python version will be different and hopefully the code still runs.
+- Singularity/Apptainer will have also evolved by then. Hopefully this definition file still works.
 - No contact address to ask more questions about this file.
 - (Can you find more? Please contribute more points.)
 ```
@@ -251,7 +251,7 @@ package repositories.
 ````{exercise} (optional) Containers-2: Installing the impossible.

 When you are missing privileges for installing certain software tools, containers can come handy.
-Here we build a Singularity/Apptainer container for installing `cowsay` and `lolcat` Linux programs.
+Here we build a Singularity/Apptainer container for installing the `cowsay` and `lolcat` Linux programs.

 1. Make sure you have apptainer installed:
 ```console
@@ -266,12 +266,12 @@ Here we build a Singularity/Apptainer container for installing `cowsay` and `lol
 $ export APPTAINER_TMPDIR="./temp/"
 ```

-3. Build the container from the following definition file above.
+3. Build the container from the container recipe file introduced above.
 ```console
 apptainer build cowsay.sif cowsay.def
 ```

-4. Let's test the container by entering into it with a shell terminal
+4. Let's test the container by entering into it with a shell terminal:
 ```console
 $ apptainer shell cowsay.sif
 ```
@@ -317,6 +317,6 @@ the Docker containers through Singularity/Apptainer.
 - [Carpentries incubator lesson on Singularity/Apptainer](https://carpentries-incubator.github.io/singularity-introduction/)

 ```{keypoints}
-- Containers can be helpful if complex setups are needed to running a specific software
-- They can also be helpful for prototyping without "messing up" your own computing environment, or for running software that requires a different operating system than your own
+- Containers can be helpful if complex setups are needed to run a specific software.
+- They can also be helpful for prototyping without "messing up" your own computing environment, or for running software that requires a different operating system than your own.
 ```

content/index.rst

Lines changed: 2 additions & 1 deletion
@@ -24,7 +24,8 @@ reproducible environments and computational steps** for our future selves and ot
 .. prereq::

 You need to install
-`Git, Python, and Snakemake <https://coderefinery.github.io/installation/>`__.
+`Git, Python, and Snakemake <https://coderefinery.github.io/installation/>`__
+(part of CodeRefinery Conda environment).

 If you wish to follow in the terminal and are new to the command line, we
 recorded a `short shell crash course <https://youtu.be/xbTTDLA3txI>`__.

content/intro.md

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ This lesson on general **Reproducibility**: Preparing code to be usable by you a

 This includes organizing your projects on your own computer and recording your computational steps, dependencies and computing environment.

-We will also mention a few tools and platforms for sharing data (**"Here is my data"**) and research outputs(**"Here are my results"**) in the **social coding** lesson, but they are not the focus of this workshop.
+We will also mention a few tools and platforms for sharing data (**"Here is my data"**) and research outputs (**"Here are my results"**) in the **social coding** lesson, but they are not the focus of this workshop.

 ## Small steps towards reproducible research

content/motivation.md

Lines changed: 2 additions & 2 deletions
@@ -44,7 +44,7 @@ smaller.
 ## Levels of reproducibility

 A published article is like the top of a pyramid. It rests on multiple
-levels that each contributes to its reproducibility.
+levels, each contributing to its reproducibility.

 ```{figure} img/repro-pyramid.png
 :alt: Reproducibility pyramid
@@ -74,5 +74,5 @@ This also means that you can think about it from the beginning of your research
 ````

 ```{keypoints}
-- Without reproducibility in scientific computing, everyone would have to start a new project / code from scratch
+- Without reproducibility in scientific computing, everyone would have to start a new project / code from scratch.
 ```

content/organizing-projects.md

Lines changed: 2 additions & 2 deletions
@@ -18,12 +18,12 @@ Let's go over some of the basic things which people have found to work (and not
 - Project files in a **single directory**
 - **Different projects** should have **separate directories**
 - Use **consistent and informative directory structure**
-- Avoid spaces in directory and file names – use `-`, `_` or CamelCase instead (nicer for computers to handle).
+- Avoid spaces in directory and file names – use `-`, `_` or CamelCase instead (nicer for computers to handle)
 - If you need to separate public/private directories,
 - put them separately in public and private Git repositories, or
 - use `.gitignore` to exclude the private information from being tracked
 - Add a **README file** to describe the project and instructions on reproducing the results
-- If you want to use the **same code in multiple projects**, host it on GitHub (or similar) and clone it into each of your project directories.
+- If you want to use the **same code in multiple projects**, host it on GitHub (or similar) and clone it into each of your project directories

 A project directory can look something like this:

content/where-to-go.md

Lines changed: 5 additions & 5 deletions
@@ -17,7 +17,7 @@ However, you will not always need all of them. As with so many things, it again
 - You will want to consider workflow tools:
 - When processing many files with many steps
 - Steps or files may change
-- Your main script, connecting your steps gets very long
+- Your main script, connecting your steps, gets very long
 - You are still collecting your input data
 - ...

@@ -34,10 +34,10 @@ However, you will not always need all of them. As with so many things, it again

 ## Important for every project

-- Clear file structure for your project
+- A Clear directory/file structure for your project.
 - Record your workflow and write it down in a script file.
-- Create a dependency list and keep it updated, optimally in an environment file
-- At least consider the possibility that someone, maybe you may want to reproduce your work
+- Create a dependency list and keep it updated, optimally in an environment file.
+- At least consider the possibility that someone, maybe you, may want to reproduce your work:
 - Can you do something (small) to make it easier?
 - If you have ideas, but no time: add an issue to your repository; maybe someone else wants to help.

@@ -52,6 +52,6 @@ Do you want to practice your reproducibility skills and get inspired by working
 ```

 ```{keypoints}
-- Not everything in this lesson might be useful right now, but it is good to know that these things exist if you ever get in a situation that would require such solution.
+- Not everything in this lesson might be useful right now, but it is good to know that these things exist if you ever get in a situation that would require such solutions.
 - Caring about reproducibility makes work easier for the next person working on the project - and that might be you in a few years!
 ```

content/workflow-management.md

Lines changed: 10 additions & 9 deletions
@@ -13,7 +13,7 @@

 ```{instructor-note}
 - 5 min teaching
-- 15 min demo
+- 15 min exercise/demo
 ```


@@ -69,7 +69,7 @@ steps in precisely this order, as we would run them manually, one after another.
 The advantage of this solution compared to processing one by one is more automation: We can generate all.
 This is not only easier, it is also less error-prone.

-Yes, the scripted solution can be reproducible. But could you easily run it e.g. on a Windows computer?
+Yes, the scripted solution can be reproducible. But could you easily run it, e.g., on a Windows computer?

 If we had more steps and once steps start to be time-consuming, a limitation of
 a scripted solution is that it tries to run all steps always. Rerunning only
@@ -102,20 +102,21 @@ cloud service:

 **On your own computer**:
 - Install the necessary tools
-- Activate the [coderefinery conda environment](https://coderefinery.github.io/installation/conda-environment/) with `conda activate coderefinery`.
+- Activate the [CodeRefinery Conda environment](https://coderefinery.github.io/installation/conda/) with `conda activate coderefinery`.
 - Clone the word-count repository:
 ```console
 $ git clone https://github.com/coderefinery/word-count.git
 ```

 **On Binder**:
+
 We can also use the cloud service [Binder](https://mybinder.org/) to make sure
 we all have the same computing environment. This is interesting from a
 reproducible research point of view and it's explained further in the [Jupyter
 lesson](https://coderefinery.github.io/jupyter/sharing/) how this is even
 possible.
 - Go to <https://github.com/coderefinery/word-count> and click on the "launch binder" badge in the README.
-- Once it get started, you can open a new Terminal from the **new** menu (top right) and select **Terminal**.
+- Once it gets started, you can open a new **Terminal** from the Launcher or via **File > New > Terminal**.
 ````

 ````{exercise} Workflow-1: Workflow solution using Snakemake
@@ -223,10 +224,10 @@ Rules that have yet to be completed are indicated with solid outlines, while alr
 - **Cross-platform** (Windows, MacOS, Linux) and compatible with all High Performance Computing (HPC) schedulers:
 same workflow works without modification and scales appropriately whether on a laptop or cluster.
 - If several workflow steps are independent of each other, and you have multiple cores available, Snakemake can run them **in parallel**.
-- Is is possible to define **isolated software environments** per rule, e.g. by adding `conda: 'environment.yml'` to a rule.
-- Also possible to run workflows in Docker or Apptainer **containers** e.g. by adding `container: 'docker://some-org/some-tool#2.3.1'` to a rule.
+- It is possible to define **isolated software environments** per rule, e.g. by adding `conda: 'environment.yml'` to a rule.
+- It is also possible to run workflows in Docker or Apptainer **containers**, e.g. by adding `container: 'docker://some-org/some-tool#2.3.1'` to a rule.
 - [Heavily used in bioinformatics](https://twitter.com/carl_witt/status/1103951128046301185), but is **completely general**.
-- Nice functionality for archiving the workflow, see: [the official documentation](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#sustainable-and-reproducible-archiving)
+- Nice functionality for archiving the workflow, see: [the official Snakemake documentation](https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#sustainable-and-reproducible-archiving)

 Tools like Snakemake help us with **reproducibility** by supporting us with **automation**, **scalability** and **portability** of our workflows.

@@ -241,6 +242,6 @@ Tools like Snakemake help us with **reproducibility** by supporting us with **au
 - [{targets} R package - make-like pipeline tool for R](https://books.ropensci.org/targets/)

 ```{keypoints}
-- Computational steps can be recorded in many ways
-- Workflow tools can help, if there are many steps to be executed and/or many datasets to be processed
+- Computational steps can be recorded in many ways.
+- Workflow tools can help if there are many steps to be executed and/or many datasets to be processed.
 ```
