pbshetty12/Kids-First-Elements-of-Style-Workflow-Creation-Maintenance


Kids First Elements of Style Workflow Creation Maintenance

In this Elements of Style for Workflow Creation and Maintenance course, users learn how to ask scientific questions of Kids First data using cloud platforms and workflows. Users will learn how to build and share processes that ensure reproducibility and repurposability regardless of the computational environment. While many approaches are possible, the course orients users toward working in a modular, testable fashion.

The Gabriella Miller Kids First Pediatric Research Program (Kids First) is a trans-NIH Common Fund program whose goal is to help researchers uncover new insights into the biology of childhood cancer and structural birth defects, including the discovery of shared genetic pathways between these disorders. To achieve this goal, the program has developed the Gabriella Miller Kids First Data Resource, a cloud-based platform which publicly shares genetic and clinical data from childhood cancer and structural birth defect cohorts, and includes the Gabriella Miller Kids First portal and other tools to foster analysis and collaboration.

Course Overview:



Agenda for Pre-Training Workshop



Time (UTC) Programme
11:00 - 11:10 👋 Welcome Address, Motivation, Platform as a Service Presentation, Tutorial Agenda
11:10 - 11:20 GitHub Account Creation
11:20 - 11:30 Zenodo Account Creation
11:30 - 11:50 Kids First and CAVATICA Registrations
11:50 - 12:00 ✋ Wrap up and overview of next day's topics

Additional resources:



Agenda for the Day 1: Reasoning



Time (UTC) Programme
11:00 - 11:30 👋 Welcome Address, Motivation, Cloud Credit Program, Platform as a Service Presentation
11:30 - 11:45 A few simple rules for easier workflow maintenance and reuse
11:45 - 12:00 Let's Dive In, Create an Account, Start a JupyterLab Notebook
12:00 - 12:10 ☕ Short break
12:10 - 12:25 Introduction to the Command-line
12:25 - 12:50 Example Volcano Plot on CAVATICA
12:50 - 13:00 ✋ Wrap up and overview of next day's topics

Additional Resources

JupyterLab Notebook Conversions

Jupytext: do you like working in RStudio with R Markdown? You can easily convert JupyterLab notebooks with R kernels to an R script or to R Markdown. Explore the concept here:
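
As a rough illustration of what such a conversion involves, the sketch below turns a notebook's JSON into R Markdown text using only the Python standard library. It is not Jupytext and does not use its API; Jupytext also handles metadata, cell options and round-tripping, all of which this sketch ignores. The function name `notebook_to_rmd` and the two-cell example notebook are hypothetical and exist only for illustration.

```python
import json

# Built at runtime so this example can itself sit inside a fenced block.
FENCE = "`" * 3


def notebook_to_rmd(notebook_json: str) -> str:
    """Render the cells of a Jupyter notebook (JSON text) as R Markdown text."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # A cell's source may be stored as one string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        text = text.rstrip("\n")
        if cell.get("cell_type") == "code":
            # Code cells become R chunks.
            chunks.append(f"{FENCE}{{r}}\n{text}\n{FENCE}")
        elif cell.get("cell_type") == "markdown":
            # Markdown cells are copied through unchanged.
            chunks.append(text)
        # Other cell types (for example raw cells) are skipped in this sketch.
    return "\n\n".join(chunks) + "\n"


# Hypothetical two-cell notebook with an R kernel, for illustration only.
example_notebook = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Volcano plot\n", "Load the data."]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None,
         "source": ["x <- c(1, 2, 3)\n", "mean(x)"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
})

rmd_text = notebook_to_rmd(example_notebook)
print(rmd_text)
```

For real notebooks, the Jupytext package is the better choice; the point here is only that a notebook is structured text that maps naturally onto a script or an R Markdown document.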



Agenda for the Day 2: Code Versioning



Time (UTC) Programme
11:00 - 11:10 👋 Workspace set up and agenda for the day
11:10 - 11:30 1. Why Git and GitHub? Motivation and set up in the JupyterLab workspace
11:30 - 12:00 2. Git Routine 1: Reusing an available repository with fork and how to keep in sync with the parent project
12:00 - 12:10 ☕ Short break
12:10 - 12:30 3. Git Routine 2: Extend your current code and use Git and GitHub to keep track of changes and contribute
12:30 - 12:45 4. Git Routine 3: Generate GitHub Personal Access Tokens
12:45 - 12:50 5. GitHub Auth Login
12:50 - 13:00 ✋ Wrap up and overview of next day's topics

Additional resources:



Agenda for the Day 3: Containerization with Environment Control



Time (UTC) Programme
11:00 - 11:10 👋 Workspace set up and agenda for the day
11:10 - 12:00 Creating a conda environment
12:00 - 12:10 ☕ Short break
12:10 - 12:50 Building Dockerfiles
12:50 - 13:00 ✋ Wrap up and overview of next day's topics



Agenda for the Day 4: Workflow Development



Time (UTC) Programme
11:00 - 11:10 👋 Workspace set up and agenda for the day
11:10 - 12:00 Building a Nextflow Script
12:00 - 12:10 ☕ Short break
12:10 - 12:40 Building a CWL Script
12:40 - 12:50 Shared elements across workflow languages
12:50 - 13:00 ✋ Wrap up and overview of next day's topics



Agenda for the Day 5: Workflow Execution



Time (UTC) Programme
11:00 - 11:10 👋 Workspace set up and agenda for the day
11:10 - 12:00 Working with Apps on CAVATICA
12:00 - 12:10 ☕ Short break
12:30 - 12:45 GitHub Actions to build, test and deposit container images
12:45 - 13:00 👋 End of Course Survey and Wrap up


Additional resources:

Background Information and other Topics of Interest

Anaconda Package Jupytext CAVATICA Create Developer Token CAVATICA Add samtools to Docker Repository Conda Create env and install GitHub CLI
CAVATICA DataCruncher JupyterLab Startup Generate GitHub Personal Access Tokens GitHub Auth Login GitHub Clone FHIR Exercises
INCLUDE DataHub Login with ORCID CAVATICA Login GitHub Actions with STAR Anaconda Search GitHub CLI
Shell Google Cloud

About

Over a period of 5 days, at only two hours a day, learners study elements of style in the construction and containerization of small, single-function processes that facilitate repurposable workflow creation and execution. This hands-on tutorial was given as a webinar using the Kids First Data Resource Center. This repository was used in the course and contains self-learning material to support the work. It also shows how these processes may be kept up to date and how GitHub Actions, a feature of GitHub, can alert the creator to the functional state of these processes (working or failing).

The course uses a small example to provide the structure, philosophy and approach for achieving this outcome. It seeks to demystify and make accessible powerful methods for achieving platform independence and platform interoperability. Using the example to demonstrate these techniques, we break down each construction step and walk the learner through it. Learners are introduced to Conda, Docker, GitHub and the workflow language Nextflow. If time permits, we also show how these containerized processes can be represented in a second workflow language (e.g. Common Workflow Language or WDL).

By the end of the course, the learner will understand these Elements of Style and will know how Conda, Docker, GitHub, Zenodo, and Nextflow enable repurposable research. Moreover, these steps remain on GitHub for the learner to return to and reproduce after the end of the course. The learner is also shown the power of JupyterLab notebooks to facilitate literate programming. Through their participation in the class, learners come to understand FAIR (findability, accessibility, interoperability and reusability) best practices.

We ask all participants to create GitHub, Zenodo and ORCID accounts prior to the course. Only minimal background knowledge of the command line and of simple shell commands is required; the repository supports some self-learning to help acquire this knowledge. This work was powered by CAVATICA and the Kids First Data Resource Center.
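
To make the "working or failing" idea concrete, here is a minimal sketch of the kind of check a continuous-integration job could run against a single-function process. It is an illustration under stated assumptions, not the course's actual test setup: the function name `process_is_working` is hypothetical, and a Python one-liner stands in for a real containerized tool. The one fact relied on is that a CI step is reported as failed when its command exits with a non-zero status.

```python
import subprocess
import sys
from typing import Sequence


def process_is_working(command: Sequence[str], expected: str) -> bool:
    """Run a command; report True if it exits cleanly and prints the expected text."""
    result = subprocess.run(list(command), capture_output=True, text=True)
    return result.returncode == 0 and expected in result.stdout


def main() -> int:
    # Stand-in for a containerized process (assumption): the Python
    # interpreter computing and printing a known value.
    command = [sys.executable, "-c", "print(2 + 2)"]
    ok = process_is_working(command, expected="4")
    print("working" if ok else "failing")
    # 0 means success; any other value marks the check as failed.
    return 0 if ok else 1


exit_code = main()
# In a standalone check script, finish with sys.exit(exit_code) so the
# calling CI step sees the result.
```

Keeping each process small and single-purpose is what makes a check like this cheap to write: one known input, one expected output, one exit status.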

Acknowledgements

CAVATICA is a joint development between Seven Bridges and the Children's Hospital of Philadelphia.

Seven Bridges supports multiple workflow languages in its application development, including CWL, Nextflow and, soon, WDL.

Nextflow workflow information and guidance were gratefully received from Phil Palmer, from his classes at the Jackson Laboratory while he was at Lifebit.

@cgpu Christina Chatzipantsiou has been my infallible guide and taught the first Elements of Style class with me at the ISCB Academy.

Common Workflow Language (CWL) script guidance was gratefully received from Miguel Brown at the Children's Hospital of Philadelphia.
