In this practical, you will set up a working environment for reproducible data analysis using the version control system git, the environment manager renv, and RMarkdown/Quarto to create reproducible documents.
- R and RStudio, or a Posit Cloud account
- Start RStudio or log in to Posit Cloud
- Select "New Project" -> "New RStudio Project"
After a while, you should see the RStudio IDE with your project.
- To set up git versioning for your project, click on "Tools > Version Control > Project Setup".
In the window that opens, under Git/SVN, select Git as the version control system and save the change.
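For reference, enabling Git in the Project Setup dialog does roughly what git init does on the command line. The following is a minimal sketch, run in a throwaway directory so it does not touch your project; demo_dir is a hypothetical name used only for this illustration.

```shell
# Create a scratch directory; demo_dir is a hypothetical name used only here
demo_dir="$(mktemp -d)"
# Turn it into a git repository, roughly what the RStudio dialog does for your project
git init --quiet "$demo_dir"
# The hidden .git directory now holds the repository data
ls -a "$demo_dir"
```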
If you are using RStudio and Git is not offered as a version control option, you need to install it on your machine by following the steps below. Posit Cloud users can skip the installation and go straight to the git configuration steps.
For Windows
Please download Git Bash from Git download.
For macOS
Please install it (recommended) following instructions here: http://git-scm.com/downloads.
For GNU/Linux
Please run in the terminal:
sudo apt-get install git
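On any of the three systems, a quick way to confirm that the installation worked is to ask git for its version in the terminal/Bash. A small sketch; the fallback message is only illustrative.

```shell
# Print the installed git version, or a note if git is not on the PATH
if command -v git >/dev/null 2>&1; then
  git --version
else
  echo "git is not installed yet"
fi
```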
Git must be configured on every machine, for both RStudio and Posit Cloud users, with the mandatory user information.
Type the following in the terminal/Bash, replacing the name and email with your own:
git config --global user.name "Firstname Lastname"
git config --global user.email "you@example.com"
Check that the configuration was successful by running:
git config --list
You should be able to see your user.name and user.email set accordingly.
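If the full listing is long, the two values can also be read back individually. This is a small sketch using the --get option of git config; the fallback messages are illustrative and appear only when a value has not been set.

```shell
# --get prints a single value and exits non-zero when the key is unset
git config --global --get user.name  || echo "user.name is not set"
git config --global --get user.email || echo "user.email is not set"
```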
Your data analysis will require multiple packages. To use renv, first run install.packages("renv") in the console. To start recording the packages used in the project library, initialize renv by running renv::init(). Inspect the renv.lock file.
The project environment and all the packages it uses can be reinstalled on any other system by running renv::restore() in the console tab.
- In your project directory, create two directories called data and R.
- The data directory is where your ChIP-seq data from the previous practicals should be placed.
- Download the ChIP-seq data into the data directory and name it TC1-ST2-D0.12_peaks.narrowPeak. The location and the name of your data file are important for analysis-code.R to work!
- The R directory is where you should create a new R script and copy in the code from analysis-code.R in the given repository.
- Now that you have analysis-code.R, which uses the tidyverse package, you need to install that package by running install.packages("tidyverse"). To add it to the list of packages used in your project, run renv::snapshot() to update the renv.lock file. Take a look at the renv.lock file again and notice the difference.
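The two directories from the first step above can also be created from the Terminal tab. A minimal sketch, assuming the terminal's working directory is the project root:

```shell
# Run from the project root: create both directories (no error if they already exist)
mkdir -p data R
# Confirm that both directories are present
ls -d data R
# Optional check that the peak file sits where analysis-code.R expects it
[ -f data/TC1-ST2-D0.12_peaks.narrowPeak ] || echo "peak file not in place yet"
```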
- Inspect the Git tab and see the list of changes.
- Create the first-paper.qmd file. Copy the content of the first-paper.qmd into this file and save it. The file should appear in the Git tab list.
- Stage the changes for this file by checking the checkbox in the Staged column. The green icon means it was added.
- Commit the addition of this file by clicking on Commit. You are prompted to review your changes. Add a commit message and hit Commit.
The file is no longer listed in the Git tab, naturally: it was committed.
- Update the list of authors in the first section (add your name ;)) and save it.
- Repeat the previous two steps. Stage the change to the file. The blue icon means it was modified. Now commit the change. You are prompted to review your changes. Add a commit message and hit Commit.
- Inspect the history of the repository. Your commits should be the first. Inspect the metadata available for each commit: commit message, your name, date, and commit hash (unique identifier).
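The stage, commit, and history steps above have direct command-line counterparts. The sketch below replays them in a throwaway repository so that it does not interfere with your project; demo_dir, the file contents, and the commit messages are hypothetical placeholders.

```shell
# Scratch repository (demo_dir is a hypothetical name), separate from your project
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
# Local identity so the sketch also works where no global identity is configured
git config user.name "Firstname Lastname"
git config user.email "you@example.com"

# First commit: add a new file
echo "placeholder content" > first-paper.qmd
git status --short                 # lists the file as untracked (??)
git add first-paper.qmd            # terminal counterpart of ticking the checkbox
git commit --quiet -m "Add first-paper.qmd"

# Second commit: modify the file
echo "one more line" >> first-paper.qmd
git status --short                 # lists the file as modified (M)
git add first-paper.qmd
git commit --quiet -m "Update first-paper.qmd"

# History: one line per commit with its abbreviated hash and message
git log --oneline
```

Running git log without --oneline additionally shows the author and date of each commit, the same metadata that the History view in RStudio displays.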
At this point, you may want to push your changes to a remote repository (GitHub or GitLab) to share the code with others for further development. This is not covered in this practical.
- Open the first-paper.qmd file saved in your project and hit Render.
- Review the resulting HTML file. Update the content of the Quarto document.
- Switch the output format to Word.
- Describe the statistics of the length of the peaks in a table.
- Discuss the distribution of signal values and p-values in one sentence that contains the actual numbers.
- Include a citation.
You can add new code chunks, update the text, or add new pieces of code available in the R/analysis-code.R file.
- Render the document again.
Download the generated .docx file and send it to the trainer.
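Rendering can also be done outside the RStudio Render button. The sketch below assumes the Quarto command-line tool is available on your PATH and that first-paper.qmd is in the current directory; if either assumption does not hold, it only prints a note.

```shell
# Render the document to Word from the terminal, if the quarto CLI and the file are available
if command -v quarto >/dev/null 2>&1 && [ -f first-paper.qmd ]; then
  quarto render first-paper.qmd --to docx
else
  echo "quarto CLI or first-paper.qmd not found; use the Render button in RStudio instead"
fi
```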