Merge pull request #112 from bobturneruk/review-suggestions
Review suggestions
ggrimes authored Apr 23, 2024
2 parents 8e1c865 + 4afbe08 commit b415e2a
Showing 13 changed files with 35 additions and 42 deletions.
13 changes: 5 additions & 8 deletions episodes/01-getting-started-with-nextflow.md
@@ -27,7 +27,7 @@ exercises: 10

## Workflows

Analysing data involves a sequence of tasks, including gathering, cleaning, and processing data. These sequence of tasks are called a workflow or a pipeline. These workflows typically require executing multiple software packages, sometimes running on different computing environments, such as a desktop or a compute cluster. Traditionally these workflows have been joined together in scripts using general purpose programming languages such as Bash or Python.
Analysing data involves a sequence of tasks, including gathering, cleaning, and processing data. This sequence of tasks is called a workflow or a pipeline. These workflows typically require executing multiple software packages, sometimes running on different computing environments, such as a desktop or a compute cluster. Traditionally these workflows have been joined together in scripts using general purpose programming languages such as Bash or Python.

<br>
<center>
@@ -244,19 +244,16 @@ This is a Nextflow script, which contains the following:

1. An optional interpreter directive ("Shebang") line, specifying the location of the Nextflow interpreter.
2. `nextflow.enable.dsl=2` to enable DSL2 syntax.
3. A multi-line Nextflow comment, written using C style block comments, followed by a single line comment.
3. A multi-line Nextflow comment, written using C style block comments; there are more comments later in the file.
4. A pipeline parameter `params.input` which is given a default value, of the relative path to the location of a compressed fastq file, as a string.
5. An unnamed `workflow` execution block, which is the default workflow to run.
6. A Nextflow channel used to read in data to the workflow.
5. A Nextflow channel `input_ch` used to read in data to the workflow.
6. An unnamed `workflow` execution block, which is the default workflow to run.
7. A call to the process `NUM_LINES`.
8. An operation on the process output, using the channel operator `.view()`.
8. A Nextflow process block named `NUM_LINES`, which defines what the process does.
9. An `input` definition block that assigns the `input` to the variable `read`, and declares that it should be interpreted as a file path.
10. An `output` definition block that uses the Linux/Unix standard output stream `stdout` from the script block.
11. A script block that contains the bash commands `printf '${read}'` and `gunzip -c ${read} | wc -l`.
12. A Nextflow channel `input_ch` used to read in data to the workflow.
13. An unnamed `workflow` execution block, which is the default workflow to run.
14. A call to the process `NUM_LINES` with input channel `input_ch`.
15. An operation on the process output, using the channel operator `.view()`.
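
Taken together, the numbered items describe the lesson's `wc.nf` example. A minimal script consistent with that description is sketched below; the default input path and the two shell commands come from the surrounding text, while the overall layout is illustrative:

```groovy
#!/usr/bin/env nextflow
// Interpreter directive ("shebang") above; enable DSL2 syntax.
nextflow.enable.dsl = 2

/*
 * A multi-line C style block comment.
 */

// Pipeline parameter with a default value: a relative path to a compressed fastq file.
params.input = "data/yeast/reads/ref1_1.fq.gz"

// Channel used to read data into the workflow.
input_ch = Channel.fromPath(params.input)

// Unnamed workflow block: the default workflow to run.
workflow {
    // Call the process with the input channel, then view its output.
    NUM_LINES(input_ch)
    NUM_LINES.out.view()
}

// Process block defining what NUM_LINES does.
process NUM_LINES {
    input:
    path read        // input interpreted as a file path, assigned to `read`

    output:
    stdout           // output taken from the script's standard output stream

    script:
    """
    printf '${read} '
    gunzip -c ${read} | wc -l
    """
}
```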

## Running Nextflow scripts

9 changes: 6 additions & 3 deletions episodes/02-workflow_parameters.md
@@ -203,7 +203,8 @@ params.sleep=2
```groovy
script:
"""
sleep ${params.sleep} > printf '${read} '
sleep ${params.sleep}
printf '${read}\\t'
gunzip -c ${read} | wc -l
"""
```
@@ -217,7 +218,7 @@ The input file would be `data/yeast/reads/ref1_1.fq.gz` as this is the default.
To run all input files we could add the param
`--input 'data/yeast/reads/*.fq.gz'`
```bash
$ nextflow run wc-params.nf --sleep 1 --input 'data/yeast/reads/\*.fq.gz'
$ nextflow run wc-params.nf --sleep 1 --input 'data/yeast/reads/*.fq.gz'
```

:::::::::::::::::::::::::
@@ -244,6 +245,7 @@ and `input` in JSON format.
}
```

Create a file called `wc-params.json` with the above contents.

To run the `wc-params.nf` script using these parameters we add the
option `-params-file` and pass the file `wc-params.json`:
@@ -284,7 +286,8 @@ parameter file, specifying:
{
"sleep": 10,
"input": "data/yeast/reads/ref3_1.fq.gz"

}
```
```bash
$ nextflow run wc-params.nf -params-file params.json
```
10 changes: 5 additions & 5 deletions episodes/03-channels.md
@@ -168,11 +168,11 @@ GRCh38

Queue (consumable) channels can be created using the following channel factory methods.

- Channel.of
- Channel.fromList
- Channel.fromPath
- Channel.fromFilePairs
- Channel.fromSRA
- `Channel.of`
- `Channel.fromList`
- `Channel.fromPath`
- `Channel.fromFilePairs`
- `Channel.fromSRA`
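
As a rough illustration of these factories (the globs and the SRA accession below are placeholder values, not part of the lesson):

```groovy
ch_values = Channel.of(1, 2, 3)                            // emit the given values
ch_list   = Channel.fromList(['ref1', 'ref2'])             // emit each element of a list
ch_files  = Channel.fromPath('data/yeast/reads/*.fq.gz')   // emit file paths matching a glob
ch_pairs  = Channel.fromFilePairs('data/yeast/reads/*_{1,2}.fq.gz') // group paired-end reads
ch_sra    = Channel.fromSRA('SRP043510')                   // fetch fastq records from NCBI SRA
```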

### The **of** Channel factory

1 change: 1 addition & 0 deletions episodes/04-processes-part1.md
@@ -797,6 +797,7 @@ When a process declares an input file, the corresponding channel elements must b
::::::::::::::::::::::::::::::::::::::: challenge
## Add input channel
For the script `process_exercise_input.nf`:

1. Define a Channel using `fromPath` for the transcriptome `params.transcriptome`.
2. Add an input channel that takes the transcriptome channel as a file input.
3. Replace `params.transcriptome` in the `script:` block with the input variable you defined in the `input:` definition.
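
One possible shape for such a solution, sketched with an illustrative process name and script command (not the lesson's official answer):

```groovy
// 1. Channel from the transcriptome parameter.
transcriptome_ch = Channel.fromPath(params.transcriptome)

process EXAMPLE {
    // 2. File input taking the transcriptome channel.
    input:
    path transcriptome

    // 3. The input variable replaces params.transcriptome in the script block.
    script:
    """
    wc -l ${transcriptome}
    """
}

workflow {
    EXAMPLE(transcriptome_ch)
}
```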
3 changes: 2 additions & 1 deletion episodes/06-workflow.md
@@ -307,7 +307,8 @@ If you only have two lines it might mean that you did not use `collect()` operat

- A Nextflow workflow is defined by invoking `processes` inside the `workflow` scope.
- A process is invoked like a function inside the `workflow` scope passing any required input parameters as arguments. e.g. `FASTQC(reads_ch)`.
- Process outputs can be accessed using the `out` attribute for the respective `process` object or assigning the output to a Nextflow variable. - Multiple outputs from a single process can be accessed using the list syntax `[]` and it's index or by referencing the a named process output .
- Process outputs can be accessed using the `out` attribute of the respective `process` object or by assigning the output to a Nextflow variable.
- Multiple outputs from a single process can be accessed using the list syntax `[]` and its index, or by referencing a named process output.

::::::::::::::::::::::::::::::::::::::::::::::::::
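
A short sketch of the access styles named in the keypoints (the process and channel names are illustrative, and the named output assumes an `emit:` declaration in the process):

```groovy
workflow {
    FASTQC(reads_ch)

    // Access output via the process object's `out` attribute.
    FASTQC.out.view()

    // With multiple outputs: list syntax and an index ...
    FASTQC.out[0].view()

    // ... or a name declared with `emit: html` in the output block.
    FASTQC.out.html.view()
}
```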

6 changes: 3 additions & 3 deletions episodes/07-operators.md
@@ -29,8 +29,8 @@ In the Channels episode we learnt how to create Nextflow channels to enable us t
- **Filtering** operators: reduce the number of elements in a channel.
- **Transforming** operators: transform the value/data in a channel.
- **Splitting** operators: split items in a channel into smaller chunks.
- **Combining** operators: join channel together.
- **Maths** operators: apply simple math function on channels.
- **Combining** operators: join channels together.
- **Maths** operators: apply simple math functions on channels.
- **Other**: such as the view operator.
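
A few of these categories can be sketched in one short example (the values are arbitrary):

```groovy
// Transforming and filtering: double each value, then keep values over 2.
Channel.of(1, 2, 3)
    .map { it * 2 }
    .filter { it > 2 }
    .view()

// Combining: mix two channels into one.
Channel.of(1, 2).mix(Channel.of(3, 4)).view()

// Maths: sum the values in a channel.
Channel.of(1, 2, 3).sum().view()
```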

In this episode you will see examples, and get to use different types of operators.
@@ -226,7 +226,7 @@ channel

### Closures

In the above example we could remove the brackets around the filter condition e.g. `filter{ it<5}`, since it specifies a closure as the operator's argument. This is language short for `filter({ it<5})`
In the above example we have removed the brackets around the filter condition, e.g. `filter{ it<5 }`, since it specifies a closure as the operator's argument. This is language shorthand for `filter({ it<5 })`
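
Both spellings behave identically; for example:

```groovy
// The closure { it < 5 } is the operator's single argument,
// so the parentheses are optional.
Channel.of(1, 2, 3, 4, 5)
    .filter { it < 5 }      // shorthand
    .view()

Channel.of(1, 2, 3, 4, 5)
    .filter({ it < 5 })     // explicit parentheses, same behaviour
    .view()
```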


::::::::::::::::::::::::::::::::::::::::::::::::::
6 changes: 1 addition & 5 deletions episodes/08-reporting.md
@@ -207,8 +207,6 @@ name, hash, process and status

:::::::::::::: solution

## Solution

Example solution using run name `elegant_descartes`.

```bash
@@ -219,11 +217,9 @@ $ nextflow log elegant_descartes -f name,hash,process,status

## Filter pipeline run log

:::::::::::::: solution

Use the `-F` option and a regular expression to filter for a specific process, e.g. multiqc.

## Solution
:::::::::::::: solution

```bash
$ nextflow log elegant_descartes -f name,hash,process,status -F 'process =~ /multiqc/'
2 changes: 1 addition & 1 deletion episodes/09-configuration.md
@@ -201,7 +201,7 @@ What is the outcome of the following commands?
1. `nextflow run print_message.nf`
2. `nextflow run print_message.nf --message '¿Que tal?'`
3. `nextflow run print_message.nf -c print_message.config`
4. `nextflow run print_message.nf -c pring_message.config --message '¿Que tal?'`
4. `nextflow run print_message.nf -c print_message.config --message '¿Que tal?'`

::::::::::::::: solution

8 changes: 4 additions & 4 deletions episodes/10-workflow_checkpoint_caching.md
@@ -53,7 +53,7 @@ and the parameter `--input 'data/yeast/reads/temp33*'`:

## Solution

```
```bash
$ nextflow run wc.nf --input 'data/yeast/reads/temp33*' -resume
```

@@ -122,12 +122,11 @@ $ touch data/yeast/reads/temp33_3_2.fq.gz

Run the command below.

```
```bash
$ nextflow run wc.nf --input 'data/yeast/reads/temp33*' -resume
```

How many processes will be cached and how many will run?
{: .language-bash}

::::::::::::::: solution

@@ -340,7 +339,8 @@ $ nextflow clean nauseous_leavitt -f

- Nextflow automatically keeps track of all the processes executed in your pipeline via checkpointing.
- Nextflow caches intermediate data in task directories within the work directory.
- Nextflow caching and checkpointing allows re-entrancy into a workflow after a pipeline error or using new data, skipping steps that have been successfully executed. - Re-entrancy is enabled using the `-resume` option.
- Nextflow caching and checkpointing allows re-entrancy into a workflow after a pipeline error or using new data, skipping steps that have been successfully executed.
- Re-entrancy is enabled using the `-resume` option.

::::::::::::::::::::::::::::::::::::::::::::::::::

8 changes: 3 additions & 5 deletions episodes/11-Simple_Rna-Seq_pipeline.md
@@ -10,15 +10,15 @@ exercises: 40
- Use the `log.info` function to print all the pipeline parameters.
- Print a confirmation message when the pipeline completes.
- Use a conda `environment.yml` file to install the pipeline's software requirement.
- Produce an execution report and generates run metrics from a pipeline run.
- Produce an execution report and generate run metrics from a pipeline run.

::::::::::::::::::::::::::::::::::::::::::::::::::

:::::::::::::::::::::::::::::::::::::::: questions

- How can I create a Nextflow pipeline from a series of unix commands and input data?
- How do I log my pipeline's parameters?
- How can I manage my pipeline software requirement?
- How can I manage my pipeline software requirements?
- How do I know when my pipeline has finished?
- How do I see how many resources my pipeline has used?

@@ -84,12 +84,10 @@ println "reads: $params.reads"

Run it by using the following command:

```
```bash
$ nextflow run script1.nf
```

{: language-bash}

We can specify a different input parameter using the `--<params>` option, for example :

```groovy
4 changes: 0 additions & 4 deletions episodes/12-nfcore.md
@@ -582,10 +582,6 @@ The pipeline does next-generation sequencing-based Human Leukocyte Antigen (HLA)
### Solution
```
$ nextflow run nf-core/hlatyping -r 1.2.0 -profile test,conda --max_memory 3G
```
```output
N E X T F L O W ~ version 21.04.0
Launching `nf-core/hlatyping` [pedantic_engelbart] - revision: 6998794795 [1.2.0]
2 changes: 1 addition & 1 deletion index.md
@@ -32,7 +32,7 @@ for building and sharing reproducible data science workflows.
## Prerequisites

This is an intermediate lesson and assumes familiarity with the core materials covered in the
[Software Carpentry Lessons] [swc-lessons]. In particular learners need to be familiar with
[Software Carpentry Lessons](https://software-carpentry.org/lessons/). In particular learners need to be familiar with
material covered in [The Unix Shell](https://swcarpentry.github.io/shell-novice).
It is helpful to be familiar with using another programming language, to the level of
[Plotting and Programming in Python](https://swcarpentry.github.io/python-novice-gapminder) or
5 changes: 3 additions & 2 deletions learners/setup.md
@@ -44,6 +44,9 @@ A list of software with versions required for this training is listed below:

The simplest way to install the software for this course is using conda.


To install conda see [here](https://carpentries-incubator.github.io/introduction-to-conda-for-data-scientists/setup/).

An environment file is provided here [environment.yml](https://raw.githubusercontent.com/carpentries-incubator/workflows-nextflow/main/episodes/data/environment.yml)

```bash
@@ -54,8 +57,6 @@ wget https://raw.githubusercontent.com/carpentries-incubator/workflows-nextflow/
curl -L -o environment.yml https://raw.githubusercontent.com/carpentries-incubator/workflows-nextflow/main/episodes/data/environment.yml
```

To install conda see [here](https://carpentries-incubator.github.io/introduction-to-conda-for-data-scientists/setup/).

To create the training environment run:

```bash
