Fixes several small spelling mistakes #104

Merged
10 changes: 5 additions & 5 deletions _episodes/02-data_management.md
@@ -14,7 +14,7 @@ objectives:
- "Identify problems with data management practices"
- "Understand what raw data is"
- "Understand what backing up data means and why it is important to back up in more than one location"
-- "Be able to decide on appropiate file names and identifiers"
+- "Be able to decide on appropriate file names and identifiers"
- "Be able to create analysis ready datasets"
- "Understand the importance of documenting your process"
- "Understand what a DOI is and its usefulness"
@@ -70,8 +70,8 @@ Our recommendations have two main themes. One is to work towards ready-to-analyz
> > * In-house cloud service: this is a good way to back up your data (usually). You have local support. It is probably compliant with funders and data security guidelines.
> > * USB pen drive: definitely not! Pen-drives are prone to dying (and your data with it). It also raises data security issues and they can be easily lost.
> > * External hard-drive: see above.
-> > * My laptop: it is good as a temporal storage solution for your active data. However, you should back it up appropiately.
-> > * My workstation's hard-disk: it is good as a temporal storage solution for your active data. However, you should back it up appropiately.
+> > * My laptop: it is good as a temporal storage solution for your active data. However, you should back it up appropriately.
+> > * My workstation's hard-disk: it is good as a temporal storage solution for your active data. However, you should back it up appropriately.
> > * Network drive: this is a good way to back up your data (usually). You have local support. It is probably compliant with funders and data security guidelines.
> {: .solution}
{: .challenge}
@@ -264,7 +264,7 @@ and write a good README file for the humans

## Data management plans

-Many UK universities and funders require researchers to complete a data management plan (DMP). A DMP is a document which outlines information about your research data and how it will be processed. Many funders provide basic templates for writing a DMP, along with guidelines on what information should be included but the main compoments of a DMP are:
+Many UK universities and funders require researchers to complete a data management plan (DMP). A DMP is a document which outlines information about your research data and how it will be processed. Many funders provide basic templates for writing a DMP, along with guidelines on what information should be included but the main components of a DMP are:
* Information about your data
* Information about your metadata and data formats
* Information on how data can be accessed, shared and re-used
@@ -285,7 +285,7 @@ Many UK universities and funders require researchers to complete a data manageme

Writing your first data management plan can be a daunting task but your future self will thank you in the end.
It's best to speak to other members of your lab about any existing lab group or grant data management plans.
-If you lab group doesn't have a data management plan, it may be helpful to work on it together to identify any major considerations.
+If your lab group doesn't have a data management plan, it may be helpful to work on it together to identify any major considerations.

More resources on data management plans are available at [DMP online](https://dmponline.dcc.ac.uk).

2 changes: 1 addition & 1 deletion _episodes/03-software.md
@@ -245,7 +245,7 @@ Also look for well-maintained libraries that already do what you're
trying to do. All programming languages have libraries that you can
import and use in your code. This is code that people have already
written and made available for distribution that have a particular
-function. For instances there are libraries for statistics,
+function. For instance, there are libraries for statistics,
modeling, mapping and many more. Many languages catalog the
libraries in a centralized source, for instance R has
CRAN, Python has PyPI, and so on. So
4 changes: 2 additions & 2 deletions _episodes/04-collaboration.md
@@ -147,7 +147,7 @@ newcomers.

Make explicit
decisions about (and publicize where appropriate) how members of the
-project will communicate with each other and with externals users /
+project will communicate with each other and with external users /
collaborators. This includes the location and technology for email
lists, chat channels, voice / video conferencing, documentation, and
meeting notes, as well as which of these channels will be public or
@@ -157,7 +157,7 @@ private.
## Working with sensitive data

It is important to identify whether your project will work with sensitive data - by which we might mean:
-* Research data including personal data or identifiers (this might include names and addresses, or potentially identifyable genetic data or health information, or confidential information)
+* Research data including personal data or identifiers (this might include names and addresses, or potentially identifiable genetic data or health information, or confidential information)
* Commercially sensitive data or information (this might include intellectual property, or data generated or used within a restrictive commercial research funding agreement)
* Data which may cause harm or adverse affects if released or made public (for example data relating to rare or endangered species which could cause poaching or fuel illegal trading)

6 changes: 3 additions & 3 deletions _episodes/05-project_organization.md
@@ -57,7 +57,7 @@ The below recommendations on how you can structure data,
code, analysis outputs and other files, are drawn primarily
from [[noble2009](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000424), [gentzkow2014](https://web.stanford.edu/~gentzkow/research/CodeAndData.pdf)].

-The important concepts are that is useful to organize the project in
+The important concepts are that it is useful to organize the project in
modules by the types of files and that consistent planning and good
names help you effectively find and use things later.

@@ -105,7 +105,7 @@ cleaning or statistical analyses. These files can be thought of as
the "scientific guts" of the project.

The second type of file in `src` is controller or driver scripts
-that that contains all the analysis steps for the entire project
+that contains all the analysis steps for the entire project
from start to finish, with particular parameters and data
input/output commands. A controller script for a simple project, for
example, may read a raw data table, import and apply several cleanup
@@ -166,7 +166,7 @@ Projects that do not have any will not require `bin`.

For example, use names
such as `bird_count_table.csv`, `manuscript.md`, or
-`sightings_analysis.py`. Do *not* using sequential numbers (e.g.,
+`sightings_analysis.py`. Do *not* use sequential numbers (e.g.,
`result1.csv`, `result2.csv`) or a location in a final manuscript
(e.g., `fig_3_a.png`), since those numbers will almost certainly
change as the project evolves.
12 changes: 6 additions & 6 deletions _episodes/06-track_changes.md
@@ -90,7 +90,7 @@ regular basis. Do not allow individual investigator's versions of
the project repository to drift apart, as the effort required to
merge differences goes up faster than the size of the difference.
This is particularly important for the manual versioning procedure
-describe below, which does not provide any assistance for merging
+described below, which does not provide any assistance for merging
simultaneous, possibly conflicting, changes.


@@ -132,12 +132,12 @@ moment a laptop is stolen or its hard drive fails.
>
> * Reverted to the previous version of the abstract text as the manuscript reached word limits
>
-> * Cleaned the strain inventory: Recent freezer cleaning and ordering indicated a lot of problem with the strains data. The missing physical samples were removed from the table, the duplicated ids are marked for checking with PCR. The antibiotic resistence were moved from phenotype description to its own column.
+> * Cleaned the strain inventory: Recent freezer cleaning and ordering indicated a lot of problem with the strains data. The missing physical samples were removed from the table, the duplicated ids are marked for checking with PCR. The antibiotic resistance were moved from phenotype description to its own column.
>
> * New regulation heatmap: As suggested by Will I used the normalization and variance stabilization procedure from Hafemeister et al prior to clustering and heatmap generation
>
-> The largest the project (measured either in: collaborators, file numbers, or workflow complexity) the more detailed the change description should be.
-> While your personal project can get away with one liner descrptions, the largest projects should always contain inforamtion about motivation behind the change and
+> The larger the project (measured either in: collaborators, file numbers, or workflow complexity) the more detailed the change description should be.
+> While your personal project can get away with one liner descriptions, the largest projects should always contain information about motivation behind the change and
> what are the consequences.
>
{: .callout}
@@ -264,12 +264,12 @@ and thereby require less self-discipline for more reliable results.

> ## Changelog in action
>
-> Have a look at one of the example github repositories and how they track changes*:
+> Have a look at one of the example github repositories and how they track changes:
> * [data from E.R. Ballou et al. 2020](https://github.com/ewallace/pseudonuclease_evolution_2020/commits/master)
> * [data from I. Boehm et al. 2020](https://github.com/BioRDM/nmj-pig/commits/main)
>
> Give examples of:
-> * what makes them good changelog
+> * what makes their changelogs good
> * what could be improved
>
> Think what would be the most difficult feature to replicate with manual version control?
10 changes: 5 additions & 5 deletions _episodes/07-manuscripts.md
@@ -29,7 +29,7 @@ is essential, just like other collaborations.
> ## Discussion (3 mins)
>
> Whether or not you have written a scientific manuscript before,
-> you probably have experience of group work or writing
+> you probably have experience of group work or writing.
> Discuss on the collaborative document:
>
> * What tools have you used before for group writing?
@@ -46,7 +46,7 @@ is essential, just like other collaborations.

We suggest having a meeting (or online thread) of all authors at the
beginning of the writing process. Ask everyone how they would prefer to
-write a manuscript. The agree a decision and process, and put the outcome
+write a manuscript. Then agree on a decision and process, and put the outcome
in writing. If co-authors are learning new tools, ask someone
familiar with those tools to support them!

@@ -111,7 +111,7 @@ Our first alternative will already be familiar to many researchers:

1. ***Write manuscripts using online tools with rich
formatting, change tracking, and reference
-management (6a)***, such as Google Docs or MS OneDrive.
+management***, such as Google Docs or MS OneDrive.
With the document online, everyone's changes are in one place, and
hence don't need to be merged manually.

@@ -165,15 +165,15 @@ e.g. through [Rmarkdown](https://rmarkdown.rstudio.com/).
|----------------------------------------------|----------------------|----------------------|----------------------------------|
| Previous user experience/comfort | High | Medium | Low |
| Visible tracking of changes | Low | Variable | High |
-| Institutional support | Low | High* | Low |
+| Institutional support | Low | High | Low |
| Ease of merging changes and suggestions | Low | Medium | High |
| Distributed control | Low | High | High |
| Ease of formatting changes for re-submission | Low | Low | High |

While we feel that text-based version control is a superior method,
the barriers to entry may be too high for many users.
The single master online approach is a good compromise.
-If your instution has invested in an environment (Google Docs / MS Office),
+If your institution has invested in an environment (Google Docs / MS Office),
users can stay within their familiar desktop GUI applications while still
taking advantage of automatic file versioning and shared editing.

4 changes: 2 additions & 2 deletions _episodes/08-what_next.md
@@ -50,7 +50,7 @@ Thinking through your work from a collaborator's point of view is helpful, and u

Learning good practices is a long-term process that never stops.
We [left out many good practices](/_extras/what-we-left-out.md) that, although useful,
-have more niche application.
+have more niche applications.
We recommend the paper [Best Practices in Scientific Computing](https://doi.org/10.1371/journal.pbio.1001745),
especially for those gaining more experience with coding.
There are [many other useful papers and resources that we have selected](/_extras/resources).
@@ -77,7 +77,7 @@ Progress in computational good practices comes from different places in the scie
- PIs and lab heads can require that lab members share code and data, and make it easy for them to do so.
- Lab members can organise "data curation days" and training sessions to share good practices.
- Self-organised groups led by students and postdocs can share ideas and train each other.
-- Global organisations like [The Carpentries](https://carpentries.org) can co-ordinate training and suport training materials.
+- Global organisations like [The Carpentries](https://carpentries.org) can co-ordinate training and support training materials.
- Professional societies can help to organise training.

It takes time to learn good practices, and time to train others.