[Add] basic pipelines documentation #480

Open · wants to merge 1 commit into `develop`
29 changes: 29 additions & 0 deletions docs/Pipelines/BranchingStrategy.md

Ideally, your feature branch should be forked off `master`. Suppose you are creating a new pipeline named `pipeline_007` for Jira ticket TKT-162; you would create a branch like this:

``` bash
git checkout master
git checkout -b TKT-162/ftr/pipeline_007_dev
```
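
As a sanity check, you can verify a branch name against this convention with a small shell test. The regex below is an illustrative assumption, not a rule enforced by the repository:

``` bash
# Illustrative pattern: <JIRA-TICKET>/ftr/<pipeline_name>_dev
branch="TKT-162/ftr/pipeline_007_dev"
if [[ "$branch" =~ ^[A-Z]+-[0-9]+/ftr/[A-Za-z0-9_]+_dev$ ]]; then
  echo "branch name follows the convention"
fi
```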
<br>

There are three main branches in the pipelines repository: `develop`, `staging`, and `master`. Your branch should first be merged into `develop`, then into `staging`, and finally, when it is production-ready, a last PR should be raised to `master`. Each PR merge into one of the three main branches, as well as each commit to a `*_dev` branch, triggers a deployment to specific environment(s). Check the following table to understand the deployment triggers:

| Branch | Devpolly | Testpolly | Polly | Description |
| --------- | ----------------- | ----------------- | ----------------- | ----------------------------------------------------------------------------- |
| `*_dev`   | :material-check:  | :material-close:  | :material-check:  | Commits to branches ending with `_dev` are deployed to devpolly and polly      |
| `develop` | :material-check:  | :material-close:  | :material-check:  | PR merges to `develop` are deployed to devpolly and polly                      |
| `staging` | :material-close:  | :material-check:  | :material-check:  | PR merges to `staging` are deployed to testpolly and polly                     |
| `master`  | :material-close:  | :material-close:  | :material-check:  | PR merges to `master` are deployed only to polly                               |
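
The trigger table can also be read as a simple branch-to-environments mapping. The helper below is purely illustrative (it is not a utility that exists in this repository):

``` python
def deployment_targets(branch: str) -> set:
    """Illustrative mapping of the deployment-trigger table above."""
    if branch == "master":
        return {"polly"}
    if branch == "staging":
        return {"testpolly", "polly"}
    if branch == "develop" or branch.endswith("_dev"):
        return {"devpolly", "polly"}
    return set()  # no automatic deployment for other branches

print(sorted(deployment_targets("TKT-162/ftr/pipeline_007_dev")))  # ['devpolly', 'polly']
```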


!!! note "Important Note"
    As you can see, all four branch types (`*_dev`, `develop`, `staging`, `master`) are deployed to production along with other environments. This approach ensures that pipeline developers can test their pipelines without relying solely on the stability of the devpolly and testpolly environments. In the production environment, each pipeline is assigned a 'stage' attribute that reflects its maturity level: the `*_dev` and `develop` branches map to the 'dev' stage, the `staging` branch to the 'test' stage, and the `master` branch to the 'prod' (production) stage.

<br>
<br>
<br>
<br>
<br>
<br>

25 changes: 25 additions & 0 deletions docs/Pipelines/GettingStarted.md


Welcome to Polly Pipelines, a powerful workflow orchestration framework designed to simplify the process of building, managing, and executing complex pipelines. With Polly Pipelines, users can focus on running their pipelines without worrying about the underlying infrastructure.


## Why use Polly Pipelines?
- **Multi-language support:** Write pipelines in either Nextflow or Polly Workflow Language (PWL). Support for Snakemake is planned!
- **GUI and programmatic interface:** User-friendly GUI and polly-python interfaces to monitor and execute pipelines.
- **No infrastructure management:** Abstracts away the complexities of infrastructure management and deployment. As a user, you just need to focus on writing pipelines!
- **Cloud and on-prem execution:** Execute your pipelines on the cloud or on-prem, which helps you save costs. For now, this feature is available only for Nextflow pipelines.


If you are unsure which language to choose for writing pipelines, please [check these guidelines](NextflowVsPWL.md).


<br>

To learn how to write pipelines, please check the following quick start guides:

<div class="grid cards" markdown>

- :material-arrow-right: [__Nextflow__ Quick Start Guide](WritingPipelines/Nextflow/QuickStartNextflow.md)
- :material-arrow-right: [__PWL__ Quick Start Guide](WritingPipelines/PWL/QuickStartPWL.md)

</div>
23 changes: 23 additions & 0 deletions docs/Pipelines/NextflowVsPWL.md


#### Choose Nextflow if:
- You like to use [nf-core](https://nf-co.re/) community pipelines
- You primarily work in bioinformatics and scientific workflows
- You need data-parallelism capabilities
- You're comfortable with learning a new syntax

#### Choose PWL if:
- You're a Python developer comfortable with functional programming
- You need a highly scalable and flexible framework for diverse workflows, including data science
- You prioritize ease of use, user-friendliness, and a rich feature set

<br>

<div class="grid cards" markdown>

- :material-arrow-right: [__Nextflow__ Quick Start Guide](WritingPipelines/Nextflow/QuickStartNextflow.md)
- :material-arrow-right: [__PWL__ Quick Start Guide](WritingPipelines/PWL/QuickStartPWL.md)

</div>


140 changes: 140 additions & 0 deletions docs/Pipelines/WritingPipelines/Nextflow/QuickStartNextflow.md

Welcome to Nextflow quick start guide!

## Setting up the environment

Start by cloning the repository, assuming you have your ElucidataInc GitHub SSH key set up:
``` bash
git clone [email protected]:ElucidataInc/pipelines.git
```

Create a Python virtual environment ([refer to this doc](https://www.freecodecamp.org/news/how-to-setup-virtual-environments-in-python/)) and activate it:

``` bash
cd pipelines
# Activate your virtual env here
```

Install the basic requirements:

``` bash
pip install -r requirements.txt
```

Install pre-commit hooks, which run basic formatting checks on each commit:

``` bash
pre-commit install
```

<hr>


## Understanding the structure of pipelines repo

Let’s go over the structure of the repository in brief. The following schematic shows some important root-level files and folders and their purposes:

``` hl_lines="7 8 9 10 11 12 13 14"
pipelines # the repository
├── .circleci/ # config for CI/CD
├── deployment/ # deployment scripts and utilities
├── orchestration/ # utilities for enabling pipeline development
├── pipelines/
│ │
│ ├── nextflow/ # All Nextflow pipelines
│ │ ├── pipeline_1/
│ │ └── pipeline_2/
│ │
│ └── pwl/ # All PWL pipelines
│ └── pipeline_3/
├── ...
├── requirements.txt # dependencies
├── ...
└── scripts/ # common scripts
```

!!! info
    As a pipeline developer, you only need to care about the `pipelines` directory (highlighted above). It contains both Nextflow and PWL pipelines.

<hr>

Each pipeline follows a specific directory structure. To see this in practice, let's explore the directory structure of a demo pipeline.

```
toy/ # nextflow pipeline named "toy"
├── __init__.py
├── build
│ ├── Dockerfile # For building docker image (must)
│ └── environment.yml # dependencies for pipeline (must)
├── config
│ ├── dev.json # config for devpolly
│ ├── test.json # config for testpolly
│ └── prod.json # config for polly
├── src # Source code
│ ├── main.nf
│ ├── Makefile
│ └── nextflow.config
└── parameter_schema.json # Defines pipeline's parameters (must)

```
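
Before committing a new pipeline, it can help to confirm that the files marked "(must)" above exist. A minimal sketch, using a throwaway `/tmp` directory as a stand-in for a real pipeline folder (the loop is illustrative, not repository tooling):

``` bash
# Illustrative check for the required files of a Nextflow pipeline
pipeline_dir="/tmp/toy_check"   # stand-in for pipelines/nextflow/<name>
mkdir -p "$pipeline_dir/build"
touch "$pipeline_dir/build/Dockerfile" \
      "$pipeline_dir/build/environment.yml" \
      "$pipeline_dir/parameter_schema.json"

for f in build/Dockerfile build/environment.yml parameter_schema.json; do
  [ -f "$pipeline_dir/$f" ] && echo "found $f"
done
```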


## Let's create your first pipeline

1. We will start by forking a branch from `master`:

``` bash
git checkout master
git checkout -b <add_your_branch_name>_dev
# Make sure your branch name ends with _dev.
```

The pipelines repository employs a branching strategy. For more details please refer to [this page](../../BranchingStrategy.md).


2. Instead of creating a pipeline from scratch, let's copy an example pipeline and play with it:

``` bash
cp -r pipelines/nextflow/toy pipelines/nextflow/<name_your_pipeline>
```
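
The copied files still reference the `toy` pipeline in a few places, such as the `COPY` path in the Dockerfile. A hedged sketch of rewriting such a reference with `sed` (the `/tmp` paths below are stand-ins, not real repo files):

``` bash
# Example only: rewrite the old pipeline path inside a copied Dockerfile
mkdir -p /tmp/my_pipeline/build
printf 'COPY pipelines/nextflow/toy/build/environment.yml .\n' \
  > /tmp/my_pipeline/build/Dockerfile
sed -i 's#nextflow/toy/#nextflow/my_pipeline/#g' /tmp/my_pipeline/build/Dockerfile
cat /tmp/my_pipeline/build/Dockerfile
# COPY pipelines/nextflow/my_pipeline/build/environment.yml .
```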

3. Go to `build/Dockerfile` and change the pipeline path in the highlighted `COPY` command

``` hl_lines="4"
FROM nfcore/base:2.1

# Install the conda environment
COPY pipelines/nextflow/toy/build/environment.yml .
RUN pip3 --no-cache-dir install --upgrade awscli

CMD ["bash", "-c", "echo 'ECS_IMAGE_PULL_BEHAVIOR=once' >> /etc/ecs/ecs.config"]
```

4. After all the above changes are done, push your pipeline:

``` bash
git add .
git commit -m 'First pipeline'
git push origin <name_of_your_branch>
```

5. Go to [CircleCI](https://app.circleci.com/pipelines/github/ElucidataInc/pipelines) and approve the hold to deploy your pipeline


Congrats! You have deployed your first pipeline. Once the CircleCI jobs are completed, go to [Polly](https://polly.elucidata.io/manage/pipelines), click on your pipeline, pass in the parameters, and initiate your first run.


<br>
<br>
<br>
<br>
<br>
<br>
157 changes: 157 additions & 0 deletions docs/Pipelines/WritingPipelines/PWL/QuickStartPWL.md

Welcome to PWL quick start guide!

## Setting up the environment

Start by cloning the repository, assuming you have your ElucidataInc GitHub SSH key set up:
``` bash
git clone [email protected]:ElucidataInc/pipelines.git
```

Create a Python virtual environment ([refer to this doc](https://www.freecodecamp.org/news/how-to-setup-virtual-environments-in-python/)) and activate it:

``` bash
cd pipelines
# Activate your virtual env here
```

Install the basic requirements:

``` bash
pip install -r requirements.txt
```

Install pre-commit hooks, which run basic formatting checks on each commit:

``` bash
pre-commit install
```

<hr>


## Understanding the structure of pipelines repo

Let’s go over the structure of the repository in brief. The following schematic shows some important root-level files and folders and their purposes:

``` hl_lines="7 8 9 10 11 12 13 14"
pipelines # the repository
├── .circleci/ # config for CI/CD
├── deployment/ # deployment scripts and utilities
├── orchestration/ # utilities for enabling pipeline development
├── pipelines/
│ │
│ ├── nextflow/ # All Nextflow pipelines
│ │ ├── pipeline_1/
│ │ └── pipeline_2/
│ │
│ └── pwl/ # All PWL pipelines
│ └── pipeline_3/
├── ...
├── requirements.txt # dependencies
├── ...
└── scripts/ # common scripts
```

!!! info
    As a pipeline developer, you only need to care about the `pipelines` directory (highlighted above). It contains both Nextflow and PWL pipelines.

<hr>

Each pipeline follows a specific directory structure. To see this in practice, let's explore the directory structure of a demo pipeline.

```
demo_protein_processing/ # pwl pipeline named "demo_protein_processing"
├── __init__.py
├── build
│ ├── Dockerfile # For building docker image (must be present)
│ └── requirements.txt # dependencies for pipeline (must be present)
├── config
│ ├── dev.json # config for devpolly
│ ├── test.json # config for testpolly
│ └── prod.json # config for polly
├── src # Source code
│ ├── __init__.py
│ └── main.py
└── parameter_schema.json # Defines pipeline's parameters (must be present)
```
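
The per-environment files (`dev.json`, `test.json`, `prod.json`) are plain JSON. Below is a hedged sketch of how pipeline code might load one; the loader is illustrative and not a utility provided by the repository:

``` python
import json
import os
import tempfile

def load_config(config_dir: str, env: str = "dev") -> dict:
    """Load config/<env>.json, mirroring the dev/test/prod layout above."""
    path = os.path.join(config_dir, f"{env}.json")
    with open(path) as f:
        return json.load(f)

# Demonstrate with a throwaway config directory
config_dir = tempfile.mkdtemp()
with open(os.path.join(config_dir, "dev.json"), "w") as f:
    json.dump({"env": "devpolly"}, f)

print(load_config(config_dir, "dev"))  # {'env': 'devpolly'}
```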


## Let's create your first pipeline

1. We will start by forking a branch from `master`:

``` bash
git checkout master
git checkout -b <add_your_branch_name>_dev
# Make sure your branch name ends with _dev.
```

The pipelines repository employs a branching strategy. For more details please refer to [this page](../../BranchingStrategy.md).


2. Instead of creating a pipeline from scratch, let's copy an example pipeline and play with it:

``` bash
cp -r pipelines/pwl/demo_protein_processing pipelines/pwl/<name_your_pipeline>
```

3. Go to `build/Dockerfile` and change the pipeline path in the highlighted `COPY` command

``` hl_lines="3"
FROM mithoopolly/workflows-base:python3.9

COPY pipelines/pwl/demo_protein_processing/build/requirements.txt .

RUN pip install -r requirements.txt

```

4. Change the entrypoint function name in `main.py` to match your pipeline's name. This is important!

``` python hl_lines="2 12"
@workflow(result_serialization=Serialization.JSON)
def demo_protein_processing(exp_id: str = "exp1", pre_process: bool = False):
secret_key = "MY_SECRET_KEY"
secret_value = Secrets.get(secret_key)
Logger.info(f"My secret value: {secret_value}")

##
##
##

if __name__ == "__main__":
demo_protein_processing("exp1.data", True)
```
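
Before pushing, you can sanity-check that the entrypoint was actually renamed. A small standard-library sketch (this check is an illustration, not part of the repository's tooling):

``` python
import ast

def has_entrypoint(main_py_source: str, pipeline_name: str) -> bool:
    """Return True if a top-level function named after the pipeline exists."""
    tree = ast.parse(main_py_source)
    return any(
        isinstance(node, ast.FunctionDef) and node.name == pipeline_name
        for node in tree.body
    )

source = "def demo_protein_processing(exp_id='exp1', pre_process=False):\n    pass\n"
print(has_entrypoint(source, "demo_protein_processing"))  # True
```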


5. After all the above changes are done, push your pipeline:

``` bash
git add .
git commit -m 'First pipeline'
git push origin <name_of_your_branch>
```

6. Go to [CircleCI](https://app.circleci.com/pipelines/github/ElucidataInc/pipelines) and approve the hold to deploy your pipeline


Congrats! You have deployed your first pipeline. Once the CircleCI jobs are completed, go to [Polly](https://polly.elucidata.io/manage/pipelines), click on your pipeline, pass in the parameters, and initiate your first run.


Now that you have deployed your first PWL pipeline, let's take a deeper dive into creating pipelines from scratch: [check this page](UnderstandingTheSyntax.md).

<br>
<br>
<br>
<br>
<br>
<br>