In this Elements of Style for Workflow Creation and Maintenance course, users learn how to ask scientific questions of these data using cloud platforms and workflows. Users will learn how to build and share processes that ensure reproducibility and repurposability regardless of the computational environment. While many approaches are possible, users will be oriented toward working in a modular, testable fashion.
The Gabriella Miller Kids First Pediatric Research Program (Kids First) is a trans-NIH Common Fund program whose goal is to help researchers uncover new insights into the biology of childhood cancer and structural birth defects, including the discovery of shared genetic pathways between these disorders. To achieve this goal, the program has developed the Gabriella Miller Kids First Data Resource, a cloud-based platform which publicly shares genetic and clinical data from childhood cancer and structural birth defect cohorts, and includes the Gabriella Miller Kids First portal and other tools to foster analysis and collaboration.
Pre-Training Workshop Chat 2022 August 19
Time (UTC) | Programme |
---|---|
11:00 - 11:10 | Welcome Address, Motivation, Platform as a service Presentation, Tutorial Agenda |
11:10 - 11:20 | GitHub Account Creation |
11:20 - 11:30 | Zenodo Account Creation |
11:30 - 11:50 | Kids First and CAVATICA Registrations |
11:50 - 12:00 | Wrap up and overview of next day's topics |
Day 1 Workshop Chat 2022 August 22
Day 1 Workshop Recording 2022 August 22. Passcode required (mailed to participants). If you are not registered, register and request a passcode.
Time (UTC) | Programme |
---|---|
11:00 - 11:30 | Welcome Address, Motivation, Cloud Credit Program, Platform as a service Presentation |
11:30 - 11:45 | A few simple rules for easier workflow maintenance and reuse |
11:45 - 12:00 | Let's Dive In, Create an Account, Start a JupyterLab Notebook |
12:00 - 12:10 | Short Break |
12:10 - 12:25 | Introduction to the Command-line |
12:25 - 12:50 | Example Volcano Plot on CAVATICA |
12:50 - 13:00 | Wrap up and overview of next day's topics |
- NIH Kids First Cloud Credit Program Overview
- How to apply for NIH Kids First Cloud Credits
- Monthly User Support Office Hours
The Jupyter Text - do you like working in RStudio with R Markdown? You can easily convert JupyterLab notebooks with R kernels to R scripts or R Markdown. Explore the concept here:
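To make the notebook-to-script idea concrete, here is a minimal sketch in Python that uses only the standard library. It assumes a notebook has already been loaded as a dictionary in the nbformat 4 layout (a `cells` list whose entries carry `cell_type` and `source`). The function name `notebook_to_r_script`, the `#'` comment prefix for Markdown text and the in-memory `example` notebook are illustrative choices made here, not the output format of any particular conversion tool.

```python
def notebook_to_r_script(notebook: dict) -> str:
    """Flatten an nbformat-4 style notebook dict into R script text.

    Code cells are copied as-is; Markdown cells become comment lines.
    """
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # nbformat stores cell source either as one string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if not text.strip():
            continue  # skip empty cells
        if cell.get("cell_type") == "code":
            chunks.append(text.rstrip("\n"))
        elif cell.get("cell_type") == "markdown":
            commented = [("#' " + line) if line else "#'" for line in text.splitlines()]
            chunks.append("\n".join(commented))
    return "\n\n".join(chunks) + "\n"


# Hypothetical in-memory notebook with an R kernel (illustration only).
example = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Volcano plot\n", "Load the data first."]},
        {"cell_type": "code", "source": ["x <- c(1, 2, 3)\n", "mean(x)"]},
    ]
}

r_script = notebook_to_r_script(example)
print(r_script)
```

For a notebook on disk, the dictionary could be obtained with `json.load` on the `.ipynb` file. A dedicated converter handles many details this sketch ignores, such as cell metadata and outputs.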
Day 2 Workshop Chat 2022 August 23
Day 2 Workshop Recording 2022 August 23. Passcode required (mailed to participants). If you are not registered, register and request a passcode.
Time (UTC) | Programme |
---|---|
11:00 - 11:10 | Workspace set up and agenda for the day |
11:10 - 11:30 | 1. Why Git and GitHub? Motivation and set up in the JupyterLab workspace |
11:30 - 12:00 | 2. Git Routine 1: Reusing an available repository with fork and how to keep in sync with the parent project |
12:00 - 12:10 | Short break |
12:10 - 12:30 | 3. Git Routine 2: Extend your current code and use Git and GitHub to keep track of changes and contribute |
12:30 - 12:45 | 4. Git Routine 3: Generate GitHub Personal Access Tokens |
12:45 - 12:50 | 5. GitHub Auth Login |
12:50 - 13:00 | Wrap up and overview of next day's topics |
Day 3 Recording 2022 August 24
Passcode required (mailed to participants). If you are not registered, register and request a passcode.
Time (UTC) | Programme |
---|---|
11:00 - 11:10 | Workspace set up and agenda for the day |
11:10 - 12:00 | Creating a conda environment |
12:00 - 12:10 | Short break |
12:10 - 12:50 | Building Dockerfiles |
12:50 - 13:00 | Wrap up and overview of next day's topics |
- Anaconda Packages Search
- Anaconda Gallery
- Anaconda Open Source
- Anaconda Open Data Science
- Who is Anaconda
- Conda
- Docker File Documentation
Day 4 Recording 2022 August 25
Passcode required (mailed to participants). If you are not registered, register and request a passcode.
Time (UTC) | Programme |
---|---|
11:00 - 11:30 | Welcome and Preamble to Building our Workflow |
11:10 - 12:00 | Building A Nextflow Script |
12:00 - 12:10 | Short break |
12:10 - 12:40 | Building A CWL Script |
12:40 - 12:50 | Shared elements across workflow languages |
12:50 - 13:00 | Wrap up and overview of next day's topics |
- Nextflow
- Nextflow Basic pipeline with detailed description of the file
- Nextflow Mixing Programming Languages In The Workflow
- Nextflow Blast Example
- Nextflow Community Based Pipelines
- Common Workflow Language
- Common Workflow Language How-to
Day 5 Workshop Chat 2022 August 26
Day 5 Recording 2022 August 26
Passcode required (mailed to participants). If you are not registered, register and request a passcode.
Time (UTC) | Programme |
---|---|
11:00 - 11:20 | Recap of the week so far |
11:20 - 12:00 | Working with Apps on the CAVATICA |
12:00 - 12:10 | Short break |
12:30 - 12:45 | GitHub Actions to build, test and deposit container images |
12:45 - 13:00 | Cloud Credits Review, Course Survey and Wrap up |
Over five days, at only two hours a day, learners worked through the elements of style for constructing and containerizing small, single-function processes that make workflows easier to create, run and repurpose. This hands-on tutorial was delivered as a webinar using the Kids First Data Resource Center. This repository was used in the course and contains self-study material to support that work. It also shows how these processes can be kept up to date, and how their creator can be alerted to their functional state (working or failing), using a GitHub feature called GitHub Actions.

The course uses a small, simple example to provide the structure, philosophy and approach for achieving this outcome, breaking down each construction step and walking the learner through it. It seeks to demystify, and make accessible, powerful methods for achieving platform independence and platform interoperability. Learners are introduced to Conda, Docker, GitHub and the workflow language Nextflow. If time permits, we also show how these containerized processes can be represented in a second workflow language (e.g. Common Workflow Language or WDL).

By the end of the course, the learner will understand these Elements of Style and will know how Conda, Docker, GitHub, Zenodo and Nextflow enable repurposable research. Moreover, the steps remain on GitHub for the learner to return to and reproduce after the course ends. The course also shows the power of JupyterLab notebooks for literate programming. Through their participation, learners come to understand FAIR (findability, accessibility, interoperability and reusability) best practices.
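As a rough illustration of a small single-function process whose working-or-failing state can be checked automatically, consider the following Python sketch. It is not taken from the course material: the function `log2_fold_change`, the `smoke_test` helper and the chosen inputs are hypothetical, picked only because fold changes are the kind of quantity a volcano plot displays.

```python
import math


def log2_fold_change(treated_mean: float, control_mean: float) -> float:
    """Single-purpose step: log2 ratio of two positive group means."""
    if treated_mean <= 0 or control_mean <= 0:
        raise ValueError("means must be positive")
    return math.log2(treated_mean / control_mean)


def smoke_test() -> bool:
    """Return True when the step gives the expected result on known inputs."""
    checks = [
        math.isclose(log2_fold_change(8.0, 2.0), 2.0),   # 4-fold up
        math.isclose(log2_fold_change(1.0, 4.0), -2.0),  # 4-fold down
        math.isclose(log2_fold_change(3.0, 3.0), 0.0, abs_tol=1e-12),  # no change
    ]
    return all(checks)


# An automated job could run this check on every change and report the state.
status = "working" if smoke_test() else "failing"
print(status)
```

Because the step does one thing and ships with its own check, it can be packaged in a container and reused in different workflow languages, and an automated run of the check is enough to tell its maintainer whether it still works.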
We ask all participants to create GitHub, Zenodo and ORCID accounts prior to the course. We ask only for minimal background knowledge of the command line and of simple commands in the shell environment; the repository includes some self-study material to help participants acquire this knowledge.

This work was powered by CAVATICA and the Kids First Data Resource Center.
CAVATICA is a joint development between Seven Bridges and the Children's Hospital of Philadelphia
Seven Bridges supports multiple workflow languages in its application development, including CWL, Nextflow and soon WDL
Nextflow workflow information and guidance were gratefully received from Phil Palmer, through his classes at the Jackson Laboratory while he was at Lifebit.
Christina Chatzipantsiou (@cgpu) has been my infallible guide, and she taught the Dry Bench Skills for the Researchers and Elements of Style class with me at the ISCB Academy.
Common Workflow Language script guidance was gratefully received from Miguel Brown at Children's Hospital of Philadelphia.