Merge pull request #112 from bobturneruk/review-suggestions
Review suggestions
ggrimes authored Apr 23, 2024
2 parents 8e1c865 + 4afbe08 commit b415e2a
Showing 13 changed files with 35 additions and 42 deletions.
13 changes: 5 additions & 8 deletions episodes/01-getting-started-with-nextflow.md
@@ -27,7 +27,7 @@ exercises: 10

## Workflows

Analysing data involves a sequence of tasks, including gathering, cleaning, and processing data. These sequence of tasks are called a workflow or a pipeline. These workflows typically require executing multiple software packages, sometimes running on different computing environments, such as a desktop or a compute cluster. Traditionally these workflows have been joined together in scripts using general purpose programming languages such as Bash or Python.
Analysing data involves a sequence of tasks, including gathering, cleaning, and processing data. This sequence of tasks is called a workflow or a pipeline. These workflows typically require executing multiple software packages, sometimes running on different computing environments, such as a desktop or a compute cluster. Traditionally these workflows have been joined together in scripts using general purpose programming languages such as Bash or Python.

<br>
<center>
@@ -244,19 +244,16 @@ This is a Nextflow script, which contains the following:

1. An optional interpreter directive ("Shebang") line, specifying the location of the Nextflow interpreter.
2. `nextflow.enable.dsl=2` to enable DSL2 syntax.
3. A multi-line Nextflow comment, written using C style block comments, followed by a single line comment.
3. A multi-line Nextflow comment, written using C style block comments; there are more comments later in the file.
4. A pipeline parameter `params.input` which is given a default value, of the relative path to the location of a compressed fastq file, as a string.
5. An unnamed `workflow` execution block, which is the default workflow to run.
6. A Nextflow channel used to read in data to the workflow.
5. A Nextflow channel `input_ch` used to read in data to the workflow.
6. An unnamed `workflow` execution block, which is the default workflow to run.
7. A call to the process `NUM_LINES`.
8. An operation on the process output, using the channel operator `.view()`.
8. A Nextflow process block named `NUM_LINES`, which defines what the process does.
9. An `input` definition block that assigns the `input` to the variable `read`, and declares that it should be interpreted as a file path.
10. An `output` definition block that uses the Linux/Unix standard output stream `stdout` from the script block.
11. A script block that contains the bash commands `printf '${read}'` and `gunzip -c ${read} | wc -l`.
12. A Nextflow channel `input_ch` used to read in data to the workflow.
13. An unnamed `workflow` execution block, which is the default workflow to run.
14. A call to the process `NUM_LINES` with input channel `input_ch`.
15. An operation on the process output, using the channel operator `.view()`.
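
Taken together, the numbered items describe the lesson's `wc.nf` example. A minimal script consistent with that description is sketched below; the default input path and the two shell commands come from the surrounding text, while the overall layout is illustrative:

```groovy
#!/usr/bin/env nextflow
// Interpreter directive ("shebang") above; enable DSL2 syntax.
nextflow.enable.dsl = 2

/*
 * A multi-line C style block comment.
 */

// Pipeline parameter with a default value: a relative path to a compressed fastq file.
params.input = "data/yeast/reads/ref1_1.fq.gz"

// Channel used to read data into the workflow.
input_ch = Channel.fromPath(params.input)

// Unnamed workflow block: the default workflow to run.
workflow {
    // Call the process with the input channel, then view its output.
    NUM_LINES(input_ch)
    NUM_LINES.out.view()
}

// Process block defining what NUM_LINES does.
process NUM_LINES {
    input:
    path read        // input interpreted as a file path, assigned to `read`

    output:
    stdout           // output taken from the script's standard output stream

    script:
    """
    printf '${read} '
    gunzip -c ${read} | wc -l
    """
}
```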

## Running Nextflow scripts

9 changes: 6 additions & 3 deletions episodes/02-workflow_parameters.md
@@ -203,7 +203,8 @@ params.sleep=2
```groovy
script:
"""
sleep ${params.sleep} > printf '${read} '
sleep ${params.sleep}
printf '${read}\\t'
gunzip -c ${read} | wc -l
"""
```
@@ -217,7 +218,7 @@ The input file would be `data/yeast/reads/ref1_1.fq.gz` as this is the default.
To run all input files we could add the param
`--input 'data/yeast/reads/*.fq.gz'`
```bash
$ nextflow run wc-params.nf --sleep 1 --input 'data/yeast/reads/\*.fq.gz'
$ nextflow run wc-params.nf --sleep 1 --input 'data/yeast/reads/*.fq.gz'
```

:::::::::::::::::::::::::
@@ -244,6 +245,7 @@ and `input` in JSON format.
}
```

Create a file called `wc-params.json` with the above contents.

To run the `wc-params.nf` script using these parameters we add the
option `-params-file` and pass the file `wc-params.json`:
@@ -284,7 +286,8 @@ parameter file, specifying:
{
"sleep": 10,
"input": "data/yeast/reads/ref3_1.fq.gz"

}
```
```bash
$ nextflow run wc-params.nf -params-file params.json
```
10 changes: 5 additions & 5 deletions episodes/03-channels.md
@@ -168,11 +168,11 @@ GRCh38

Queue (consumable) channels can be created using the following channel factory methods.

- Channel.of
- Channel.fromList
- Channel.fromPath
- Channel.fromFilePairs
- Channel.fromSRA
- `Channel.of`
- `Channel.fromList`
- `Channel.fromPath`
- `Channel.fromFilePairs`
- `Channel.fromSRA`
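
As a rough illustration of these factories (the globs and the SRA accession below are placeholder values, not part of the lesson):

```groovy
ch_values = Channel.of(1, 2, 3)                            // emit the given values
ch_list   = Channel.fromList(['ref1', 'ref2'])             // emit each element of a list
ch_files  = Channel.fromPath('data/yeast/reads/*.fq.gz')   // emit file paths matching a glob
ch_pairs  = Channel.fromFilePairs('data/yeast/reads/*_{1,2}.fq.gz') // group paired-end reads
ch_sra    = Channel.fromSRA('SRP043510')                   // fetch fastq records from NCBI SRA
```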

### The **of** Channel factory

1 change: 1 addition & 0 deletions episodes/04-processes-part1.md
@@ -797,6 +797,7 @@ When a process declares an input file, the corresponding channel elements must b
::::::::::::::::::::::::::::::::::::::: challenge
## Add input channel
For the script `process_exercise_input.nf`:

1. Define a Channel using `fromPath` for the transcriptome `params.transcriptome`.
2. Add an input channel that takes the transcriptome channel as a file input.
3. Replace `params.transcriptome` in the `script:` block with the input variable you defined in the `input:` definition.
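
One possible shape for such a solution, sketched with an illustrative process name and script command (not the lesson's official answer):

```groovy
// 1. Channel from the transcriptome parameter.
transcriptome_ch = Channel.fromPath(params.transcriptome)

process EXAMPLE {
    // 2. File input taking the transcriptome channel.
    input:
    path transcriptome

    // 3. The input variable replaces params.transcriptome in the script block.
    script:
    """
    wc -l ${transcriptome}
    """
}

workflow {
    EXAMPLE(transcriptome_ch)
}
```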
3 changes: 2 additions & 1 deletion episodes/06-workflow.md
@@ -307,7 +307,8 @@ If you only have two lines it might mean that you did not use `collect()` operat

- A Nextflow workflow is defined by invoking `processes` inside the `workflow` scope.
- A process is invoked like a function inside the `workflow` scope passing any required input parameters as arguments. e.g. `FASTQC(reads_ch)`.
- Process outputs can be accessed using the `out` attribute for the respective `process` object or assigning the output to a Nextflow variable. - Multiple outputs from a single process can be accessed using the list syntax `[]` and it's index or by referencing the a named process output .
- Process outputs can be accessed using the `out` attribute of the respective `process` object or by assigning the output to a Nextflow variable.
- Multiple outputs from a single process can be accessed using the list syntax `[]` and its index, or by referencing a named process output.

::::::::::::::::::::::::::::::::::::::::::::::::::
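
A short sketch of the access styles named in the keypoints (the process and channel names are illustrative, and the named output assumes an `emit:` declaration in the process):

```groovy
workflow {
    FASTQC(reads_ch)

    // Access output via the process object's `out` attribute.
    FASTQC.out.view()

    // With multiple outputs: list syntax and an index ...
    FASTQC.out[0].view()

    // ... or a name declared with `emit: html` in the output block.
    FASTQC.out.html.view()
}
```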

6 changes: 3 additions & 3 deletions episodes/07-operators.md
@@ -29,8 +29,8 @@ In the Channels episode we learnt how to create Nextflow channels to enable us t
- **Filtering** operators: reduce the number of elements in a channel.
- **Transforming** operators: transform the value/data in a channel.
- **Splitting** operators: split items in a channel into smaller chunks.
- **Combining** operators: join channel together.
- **Maths** operators: apply simple math function on channels.
- **Combining** operators: join channels together.
- **Maths** operators: apply simple math functions on channels.
- **Other**: such as the view operator.
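
A few of these categories can be sketched in one short example (the values are arbitrary):

```groovy
// Transforming and filtering: double each value, then keep values over 2.
Channel.of(1, 2, 3)
    .map { it * 2 }
    .filter { it > 2 }
    .view()

// Combining: mix two channels into one.
Channel.of(1, 2).mix(Channel.of(3, 4)).view()

// Maths: sum the values in a channel.
Channel.of(1, 2, 3).sum().view()
```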

In this episode you will see examples, and get to use different types of operators.
@@ -226,7 +226,7 @@ channel

### Closures

In the above example we could remove the brackets around the filter condition e.g. `filter{ it<5}`, since it specifies a closure as the operator's argument. This is language short for `filter({ it<5})`
In the above example we have removed the brackets around the filter condition, e.g. `filter{ it<5 }`, since it specifies a closure as the operator's argument. This is language shorthand for `filter({ it<5 })`
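
Both spellings behave identically; for example:

```groovy
// The closure { it < 5 } is the operator's single argument,
// so the parentheses are optional.
Channel.of(1, 2, 3, 4, 5)
    .filter { it < 5 }      // shorthand
    .view()

Channel.of(1, 2, 3, 4, 5)
    .filter({ it < 5 })     // explicit parentheses, same behaviour
    .view()
```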


::::::::::::::::::::::::::::::::::::::::::::::::::
6 changes: 1 addition & 5 deletions episodes/08-reporting.md
@@ -207,8 +207,6 @@ name, hash, process and status

:::::::::::::: solution

## Solution

Example solution using run name `elegant_descartes`.

```bash
@@ -219,11 +217,9 @@ $ nextflow log elegant_descartes -f name,hash,process,status

## Filter pipeline run log

:::::::::::::: solution

Use the `-F` option and a regular expression to filter for a specific process, e.g. multiqc.

## Solution
:::::::::::::: solution

```bash
$ nextflow log elegant_descartes -f name,hash,process,status -F 'process =~ /multiqc/'
2 changes: 1 addition & 1 deletion episodes/09-configuration.md
@@ -201,7 +201,7 @@ What is the outcome of the following commands?
1. `nextflow run print_message.nf`
2. `nextflow run print_message.nf --message '¿Que tal?'`
3. `nextflow run print_message.nf -c print_message.config`
4. `nextflow run print_message.nf -c pring_message.config --message '¿Que tal?'`
4. `nextflow run print_message.nf -c print_message.config --message '¿Que tal?'`

::::::::::::::: solution

8 changes: 4 additions & 4 deletions episodes/10-workflow_checkpoint_caching.md
@@ -53,7 +53,7 @@ and the parameter `--input 'data/yeast/reads/temp33*'`:

## Solution

```
```bash
$ nextflow run wc.nf --input 'data/yeast/reads/temp33*' -resume
```

@@ -122,12 +122,11 @@ $ touch data/yeast/reads/temp33_3_2.fq.gz

Run the command below.

```
```bash
$ nextflow run wc.nf --input 'data/yeast/reads/temp33*' -resume
```

How many processes will be cached and how many will run?
{: .language-bash}

::::::::::::::: solution

@@ -340,7 +339,8 @@ $ nextflow clean nauseous_leavitt -f

- Nextflow automatically keeps track of all the processes executed in your pipeline via checkpointing.
- Nextflow caches intermediate data in task directories within the work directory.
- Nextflow caching and checkpointing allows re-entrancy into a workflow after a pipeline error or using new data, skipping steps that have been successfully executed. - Re-entrancy is enabled using the `-resume` option.
- Nextflow caching and checkpointing allows re-entrancy into a workflow after a pipeline error or using new data, skipping steps that have been successfully executed.
- Re-entrancy is enabled using the `-resume` option.

::::::::::::::::::::::::::::::::::::::::::::::::::

8 changes: 3 additions & 5 deletions episodes/11-Simple_Rna-Seq_pipeline.md
@@ -10,15 +10,15 @@ exercises: 40
- Use the `log.info` function to print all the pipeline parameters.
- Print a confirmation message when the pipeline completes.
- Use a conda `environment.yml` file to install the pipeline's software requirement.
- Produce an execution report and generates run metrics from a pipeline run.
- Produce an execution report and generate run metrics from a pipeline run.

::::::::::::::::::::::::::::::::::::::::::::::::::

:::::::::::::::::::::::::::::::::::::::: questions

- How can I create a Nextflow pipeline from a series of unix commands and input data?
- How do I log my pipeline's parameters?
- How can I manage my pipeline software requirement?
- How can I manage my pipeline software requirements?
- How do I know when my pipeline has finished?
- How do I see how many resources my pipeline has used?

@@ -84,12 +84,10 @@ println "reads: $params.reads"

Run it by using the following command:

```
```bash
$ nextflow run script1.nf
```

{: language-bash}

We can specify a different input parameter using the `--<params>` option, for example :

```groovy
4 changes: 0 additions & 4 deletions episodes/12-nfcore.md
@@ -582,10 +582,6 @@ The pipeline does next-generation sequencing-based Human Leukocyte Antigen (HLA)
### Solution
```
$ nextflow run nf-core/hlatyping -r 1.2.0 -profile test,conda --max_memory 3G
```
```output
N E X T F L O W ~ version 21.04.0
Launching `nf-core/hlatyping` [pedantic_engelbart] - revision: 6998794795 [1.2.0]
2 changes: 1 addition & 1 deletion index.md
@@ -32,7 +32,7 @@ for building and sharing reproducible data science workflows.
## Prerequisites

This is an intermediate lesson and assumes familiarity with the core materials covered in the
[Software Carpentry Lessons] [swc-lessons]. In particular learners need to be familiar with
[Software Carpentry Lessons](https://software-carpentry.org/lessons/). In particular learners need to be familiar with
material covered in [The Unix Shell](https://swcarpentry.github.io/shell-novice).
It is helpful to be familiar with using another programming language, to the level of
[Plotting and Programming in Python](https://swcarpentry.github.io/python-novice-gapminder) or
5 changes: 3 additions & 2 deletions learners/setup.md
@@ -44,6 +44,9 @@ A list of software with versions required for this training is listed below:

The simplest way to install the software for this course is using conda.


To install conda see [here](https://carpentries-incubator.github.io/introduction-to-conda-for-data-scientists/setup/).

An environment file is provided here [environment.yml](https://raw.githubusercontent.com/carpentries-incubator/workflows-nextflow/main/episodes/data/environment.yml)

```bash
@@ -54,8 +57,6 @@ wget https://raw.githubusercontent.com/carpentries-incubator/workflows-nextflow/
curl -L -o environment.yml https://raw.githubusercontent.com/carpentries-incubator/workflows-nextflow/main/episodes/data/environment.yml
```

To install conda see [here](https://carpentries-incubator.github.io/introduction-to-conda-for-data-scientists/setup/).

To create the training environment run:

```bash
