diff --git a/filtering.html b/filtering.html
index 6440380..f54b569 100644
--- a/filtering.html
+++ b/filtering.html
@@ -2,7 +2,7 @@
-
+
@@ -40,10 +40,11 @@
-
+
-
+
@@ -77,10 +78,13 @@

Filtering Roadmap

-
A["Results Viewer"] --> B
-B["Build Filter in GUI"] --> C["Save JSON filter"]
-C --> D["Apply filter to New Set"]
-C --> E["Apply Filter on CLI"]
+
flowchart TD
+  A["Results Viewer"] --> B
+  B["Build Filter in GUI"] 
+  B --> F["Explore Filtered Data"]
+  F --> G["Save JSON filter"]
+  G --> D["Apply filter to New Set"]
+  G --> E["Apply Filter on CLI"]
 

@@ -101,28 +105,193 @@

Filtering in the GUI

Samples

You can remove variants associated with a set of sample IDs by unselecting them in the checkboxes here.

+
+
+
+

+
+
graph TD
+  A["All Variants\n(n=1,738)"] --"Sample:mother"--> B
+  B["Mother Variants\n(n=385)"]
+
+
+

+
+
+
+

For example, we can filter for

Genes

-

Gene-level filtering can be done here. You can dinput a list of genes, separated by line-breaks

+

Gene-level filtering can be done here. You can input a list of genes, separated by line breaks.

+
+
+
+

+
+
graph TD
+  A["All Variants\n(n=1,738)"] --"Gene:(BRCA1,BRCA2)"--> B
+  B["BRCA1/BRCA2 Genes\n(n=63)"]
+
+
+

+
+
+
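As a rough illustration of what a gene-level filter does, here is a minimal Python sketch. The record layout (`id`, `gene`) and both helper names are hypothetical, and upper-casing the symbols is only a convenience chosen for this example, not documented behavior of the results viewer.

```python
def parse_gene_list(text):
    """Split a line-break-separated gene list into a set of upper-case symbols."""
    return {line.strip().upper() for line in text.splitlines() if line.strip()}


def filter_by_genes(variants, genes):
    """Keep only the variants whose 'gene' field is in the requested set."""
    return [v for v in variants if v["gene"] in genes]


# Hypothetical toy records; real result rows carry many more columns.
variants = [
    {"id": 1, "gene": "BRCA1"},
    {"id": 2, "gene": "TP53"},
    {"id": 3, "gene": "BRCA2"},
]

# One gene per line, as in the gene filter box; blank lines are ignored.
genes = parse_gene_list("BRCA1\nbrca2\n\n")
kept = filter_by_genes(variants, genes)
print([v["id"] for v in kept])
```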

Sample Properties

Variant Properties

+

In variant properties, you can filter by variant type based on your annotations. For example, let’s filter to missense variants.

+

Under Variant Filter:

+
  +
  1. Click on “Query Builder”.
  +
  2. Move the mouse to the bottom-left corner of the query builder window and click the “+” sign.
  +
+

+

Make your filter by selecting the following dropdown boxes:

+
  +
  1. Variant Annotation
  +
  2. Sequence Ontology
  +
  3. One of
  +
  4. Missense checkbox
  +
+

Finally, click the “Apply Filter” Button:

+

+

You will be left with 299 variants. Here’s a visual summary of what we did:

+
+
+
+

+
+
graph TD
+  A["All Variants\n(n=1,738)"] --"Sequence Ontology:\nMissense"--> B["Missense Variants\n(n=299)"]
+
+
+
+

+
+
+

Boolean Operations

You can build more sophisticated operations by combining each filter step using Boolean logic.

-

For example, we might want variants that …

+
+

AND logic

+

By default, filters are combined using AND logic, which is more restrictive because it requires variants to meet both filters.

+

Here’s an example of using AND logic. Here we are combining two filters: Missense Variants (from Variant Annotation –> Sequence Ontology) and Pathogenic variants (from ClinVar –> Clinical Significance).

+

When we apply the filter, we get 9 variants that meet both criteria. Here’s a visual summary of the filtering:

+
+
+
+

+
+
graph TD
+  A["All Variants\n(n=1,738)"] --"Sequence Ontology:\nmissense"--> B["Missense Variants\n(n=299)"]
+  A --"Clinical Significance:\nPathogenic"--> C["Pathogenic Variants\n(n=10)"]
+  B --"AND"--> E["Pathogenic AND Missense\n(n=9)"]
+  C --"AND"--> E
+
+
+

+
+
+
+
+
+

OR Logic

+

These filters can also be combined using OR logic, which is more permissive (that is, it returns at least as many variants as AND logic). For example, we might want variants that are either missense OR pathogenic.

+

We can do this by clicking the “and” that links our two filters, which will switch it to an “or”:

+

When we apply the filter, we get 300 variants. The breakdown is below.

+
+
+
+

+
+
graph TD
+  A["All Variants\n(n=1,738)"] --"Sequence Ontology:\nmissense"--> B["Missense Variants\n(n=299)"]
+  A --"Clinical Significance:\nPathogenic"--> C["Pathogenic Variants\n(n=10)"]
+  B --"OR"--> E["Pathogenic OR Missense\n(n=300)"]
+  C --"OR"--> E
+
+
+

+
+
+
+
+
+
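The AND and OR counts are consistent with ordinary set arithmetic: AND behaves like an intersection and OR like a union. The sketch below uses synthetic variant IDs (an assumption made only to reproduce the counts quoted in this section) and checks the inclusion-exclusion relation 299 + 10 - 9 = 300.

```python
# Synthetic IDs chosen only to match the counts reported in this section.
missense = set(range(299))            # 299 missense variants
pathogenic = set(range(9)) | {1000}   # 10 pathogenic variants, 9 of them missense

both = missense & pathogenic          # AND: variants that meet both filters
either = missense | pathogenic        # OR: variants that meet at least one filter

print(len(both), len(either))

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
assert len(either) == len(missense) + len(pathogenic) - len(both)
```

The single extra variant gained by OR is the one pathogenic variant that is not missense.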

Combining AND / OR logic

+

By default, when you click the “and” / “or” of one set of filters, all filters will be changed. If you want to combine AND / OR logic, you can group one of the logic operations using parentheses.

+

For example, say we want the above OR subset combined as an AND with those variants that have PS1 evidence. We can

+

This is what our final filter looks like:

+

Here’s a visual breakdown of this complex filter:

+
+
+
+

+
+
graph TD
+  A["All Variants\n(n=1,738)"]--"Sequence Ontology:\nmissense"--> B
+  A --"Clinical Significance:\nPathogenic"--> C
+  subgraph OR
+  B["Missense Variants\n(n=299)"]
+  C["Pathogenic Variants\n(n=10)"]
+  B --"OR"--> E["Pathogenic OR Missense\n(n=300)"]
+  C --"OR"--> E
+  end
+  subgraph PS1
+  F["PS1 Variants\n(n=27)"]
+  end
+  E --"AND"--> G["PS1 Variants AND\n(Missense OR Pathogenic)\n(n=27)"]
+  A --"ClinVar ACMG\nPS1 variants"--> F
+  F --"AND"--> G
+  
+
+
+

+
+
+
+
+
+
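To see why the grouping matters, here is a small sketch with synthetic ID sets (assumed only so that the counts match the diagram above). It compares the grouped expression with an ungrouped one under Python's own operator precedence; how the query builder itself treats ungrouped mixes is not something this example claims to show.

```python
# Synthetic IDs, assumed only to reproduce the counts in the diagram.
missense = set(range(299))            # n=299
pathogenic = set(range(9)) | {1000}   # n=10
ps1 = set(range(100, 127))            # n=27, all of them missense here

# Grouped with parentheses: (missense OR pathogenic) AND PS1
grouped = (missense | pathogenic) & ps1

# Without parentheses Python applies & before |, so this means
# missense OR (pathogenic AND PS1), which is a different question.
ungrouped = missense | pathogenic & ps1

print(len(grouped), len(ungrouped))
```

With these sets the grouped form keeps 27 variants while the ungrouped form keeps 299, so the placement of the parentheses changes the answer.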

NOT Logic

+
+
+

Deleting a Filter

+
+
+

Case Study: Filtering Pathogenic Variants

+
+
+
+

+
+
graph TD
+  A["All Variants\n(n=1,738)"] --"PS1 ID\nhas data"--> B
+  B["PS1 variants\n(n=36)"]
+
+
+

+
+
+
+
+
+
+

Exporting Filters as JSON

+

Filters can be exported and saved as JSON files for further reuse. They can be applied to a new set of variants in the GUI, or can be applied to result SQLite files on the command line.
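To make the idea of a reusable JSON filter concrete, the sketch below saves a nested AND/OR filter as JSON text, reads it back, and applies it to a few toy records. The schema (`operator`, `rules`, `column`, `values`), the column names, and the `matches` helper are all hypothetical and are not the tool's actual export format; inspect a filter exported from the GUI to see the real structure.

```python
import json

# Hypothetical schema: group nodes hold "operator" and "rules",
# leaf nodes hold "column" and the accepted "values".
filter_spec = {
    "operator": "and",
    "rules": [
        {"column": "ps1", "values": [True]},
        {
            "operator": "or",
            "rules": [
                {"column": "so", "values": ["missense"]},
                {"column": "clinsig", "values": ["Pathogenic"]},
            ],
        },
    ],
}


def matches(node, variant):
    """Recursively evaluate a filter node against one variant record."""
    if "rules" in node:
        results = [matches(rule, variant) for rule in node["rules"]]
        return all(results) if node["operator"] == "and" else any(results)
    return variant.get(node["column"]) in node["values"]


text = json.dumps(filter_spec, indent=2)  # what would be written to a file
loaded = json.loads(text)                 # what would be read back later

# Hypothetical toy records.
variants = [
    {"id": 1, "so": "missense", "clinsig": "Benign", "ps1": True},
    {"id": 2, "so": "synonymous", "clinsig": "Pathogenic", "ps1": False},
    {"id": 3, "so": "synonymous", "clinsig": "Benign", "ps1": True},
]
kept = [v["id"] for v in variants if matches(loaded, v)]
print(kept)
```

Because the filter is plain data, the same saved file can be applied to any result set that has the same columns.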

-
-

Exporting Filters

+
+

Applying JSON filters in the GUI

-
-

Applying filters on the command line

+
+

Applying JSON filters on the command line

+

JSON filters can also be applied on the command-line using the oc util filtersqlite command. More information is here.

diff --git a/filtering.qmd b/filtering.qmd
index 904dc2f..9b9b807 100644
--- a/filtering.qmd
+++ b/filtering.qmd
@@ -1,6 +1,8 @@
 ---
 title: "Filtering"
-format: html
+format:
+  rst: default
+  html: default
 ---
 
 # Why Filter?
@@ -12,11 +14,15 @@ A secondary purpose of filtering is when you want to view your results in the re
 
 # Filtering Roadmap
 
+
 ```{mermaid}
-A["Results Viewer"] --> B
-B["Build Filter in GUI"] --> C["Save JSON filter"]
-C --> D["Apply filter to New Set"]
-C --> E["Apply Filter on CLI"]
+flowchart TD
+  A["Results Viewer"] --> B
+  B["Build Filter in GUI"]
+  B --> F["Explore Filtered Data"]
+  F --> G["Save JSON filter"]
+  G --> D["Apply filter to New Set"]
+  G --> E["Apply Filter on CLI"]
 ```
 
 # Filtering in the GUI
@@ -34,31 +40,147 @@ We will go through each of these filters and their functionality.
 
 You can remove variants associated with a set of sample IDs by unselecting them in the checkboxes here.
 
+```{mermaid}
+graph TD
+  A["All Variants\n(n=1,738)"] --"Sample:mother"--> B
+  B["Mother Variants\n(n=385)"]
+```
+
+For example, we can filter for
+
 ## Genes
 
-Gene-level filtering can be done here. You can dinput a list of genes, separated by line-breaks
+Gene-level filtering can be done here. You can input a list of genes, separated by line breaks.
+
+```{mermaid}
+graph TD
+  A["All Variants\n(n=1,738)"] --"Gene:(BRCA1,BRCA2)"--> B
+  B["BRCA1/BRCA2 Genes\n(n=63)"]
+```
 
 ## Sample Properties
 
 ## Variant Properties
 
+In variant properties, you can filter by variant type based on your annotations. For example, let's filter to missense variants.
+
+Under Variant Filter:
+
+1. Click on "Query Builder".
+2. Move the mouse to the bottom-left corner of the query builder window and click the "+" sign.
+
+![](images/variant-filter.png)
+
+Make your filter by selecting the following dropdown boxes:
+
+1. Variant Annotation
+2. Sequence Ontology
+3. One of
+4. Missense checkbox
+
+![](images/variant-filter2.png)
+Finally, click the "Apply Filter" Button:
+
+![](images/variant-apply-filter.png)
+
+You will be left with 299 variants. Here's a visual summary of what we did:
+
+```{mermaid}
+graph TD
+  A["All Variants\n(n=1,738)"] --"Sequence Ontology:\nMissense"--> B["Missense Variants\n(n=299)"]
+
+```
 
 # Boolean Operations
 
 You can build more sophisticated operations by combining each filter step using Boolean logic.
 
-By default, the filters are combined using **AND** logic, which are more restrictive. For example ...
+## AND logic
+
+By default, filters are combined using **AND** logic, which is more restrictive because it requires variants to meet both filters.
+
+Here's an example of using **AND** logic. Here we are combining two filters: Missense Variants (from Variant Annotation --> Sequence Ontology) and Pathogenic variants (from ClinVar --> Clinical Significance).
 
-These filters can also be combined using **OR** logic, which is more permissive (that is, these filters will return a greater number than the **AND** logic) we might want variants that ...
+![](images/and-filter.png)
+When we apply the filter, we get 9 variants that meet both criteria. Here's a visual summary of the filtering:
+
+```{mermaid}
+graph TD
+  A["All Variants\n(n=1,738)"] --"Sequence Ontology:\nmissense"--> B["Missense Variants\n(n=299)"]
+  A --"Clinical Significance:\nPathogenic"--> C["Pathogenic Variants\n(n=10)"]
+  B --"AND"--> E["Pathogenic AND Missense\n(n=9)"]
+  C --"AND"--> E
+```
+## OR Logic
+
+These filters can also be combined using **OR** logic, which is more permissive (that is, it returns at least as many variants as **AND** logic). For example, we might want variants that are either missense **OR** pathogenic.
+
+We can do this by clicking the "and" that links our two filters, which will switch it to an "or":
+
+![](images/or-filter.png)
+When we apply the filter, we get 300 variants. The breakdown is below.
+
+```{mermaid}
+graph TD
+  A["All Variants\n(n=1,738)"] --"Sequence Ontology:\nmissense"--> B["Missense Variants\n(n=299)"]
+  A --"Clinical Significance:\nPathogenic"--> C["Pathogenic Variants\n(n=10)"]
+  B --"OR"--> E["Pathogenic OR Missense\n(n=300)"]
+  C --"OR"--> E
+```
+
+## Combining AND / OR logic
+
+By default, when you click the "and" / "or" of one set of filters, all filters will be changed. If you want to combine **AND** / **OR** logic, you can group one of the logic operations using parentheses.
+
+For example, say we want the above **OR** subset combined as an **AND** with those variants that have PS1 evidence. We can
+
+This is what our final filter looks like:
+
+![](images/and-or-filter.png)
+Here's a visual breakdown of this complex filter:
+
+```{mermaid}
+graph TD
+  A["All Variants\n(n=1,738)"]--"Sequence Ontology:\nmissense"--> B
+  A --"Clinical Significance:\nPathogenic"--> C
+  subgraph OR
+  B["Missense Variants\n(n=299)"]
+  C["Pathogenic Variants\n(n=10)"]
+  B --"OR"--> E["Pathogenic OR Missense\n(n=300)"]
+  C --"OR"--> E
+  end
+  subgraph PS1
+  F["PS1 Variants\n(n=27)"]
+  end
+  E --"AND"--> G["PS1 Variants AND\n(Missense OR Pathogenic)\n(n=27)"]
+  A --"ClinVar ACMG\nPS1 variants"--> F
+  F --"AND"--> G
+```
+
+## NOT Logic
+
+
+
+## Deleting a Filter
+
+
+
+## Case Study: Filtering Pathogenic Variants
+
+```{mermaid}
+graph TD
+  A["All Variants\n(n=1,738)"] --"PS1 ID\nhas data"--> B
+  B["PS1 variants\n(n=36)"]
+```
 
-# Exporting Filters
+# Exporting Filters as JSON
 
 Filters can be exported and saved as JSON files for further reuse. They can be applied to a new set of variants in the GUI, or can be applied to result SQLite files on the command line.
 
-# Applying filters in the GUI
+# Applying JSON filters in the GUI
 
-# Applying filters on the command line
+# Applying JSON filters on the command line
 
-JSON filters can also be applied on the command-line
\ No newline at end of file
+JSON filters can also be applied on the command-line using the `oc util filtersqlite` command. [More information is here](https://open-cravat.readthedocs.io/en/latest/Filter-And-Merge-SQLite.html).
\ No newline at end of file
diff --git a/filtering.rst b/filtering.rst
new file mode 100644
index 0000000..11ced0b
--- /dev/null
+++ b/filtering.rst
@@ -0,0 +1,89 @@
+=========
+Filtering
+=========
+
+
+Why Filter?
+===========
+
+Filtering your annotated variants lets you query interesting subsets of
+your variants.
+
+A secondary purpose of filtering is when you want to view your results
+in the results viewer and you have more than 100K variants.
+
+Filtering Roadmap
+=================
+
+.. container:: cell
+
+   .. container:: cell-output-display
+
+      .. container::
+
+         .. container::
+
+            |image1|
+
+Filtering in the GUI
+====================
+
+Once you have your annotated results, you can filter variants in the
+results viewer. There are 4 kinds of filters:
+
+- Samples
+- Genes
+- Sample Properties
+- Variant Properties
+
+We will go through each of these filters and their functionality.
+
+Samples
+-------
+
+You can remove variants associated with a set of sample IDs by
+unselecting them in the checkboxes here.
+
+Genes
+-----
+
+Gene-level filtering can be done here. You can input a list of genes,
+separated by line-breaks
+
+Sample Properties
+-----------------
+
+Variant Properties
+------------------
+
+Boolean Operations
+==================
+
+You can build more sophisticated operations by combining each filter
+step using Boolean logic.
+
+By default, the filters are combined using **AND** logic, which is more
+restrictive. For example …
+
+These filters can also be combined using **OR** logic, which is more
+permissive (that is, these filters will return a greater number than the
+**AND** logic) we might want variants that …
+
+Exporting Filters
+=================
+
+Filters can be exported and saved as JSON files for further reuse. They
+can be applied to a new set of variants in the GUI, or can be applied to
+result SQLite files on the command line.
+
+Applying filters in the GUI
+===========================
+
+Applying filters on the command line
+====================================
+
+JSON filters can also be applied on the command-line
+
+.. |image1| image:: filtering_files/figure-rst/mermaid-figure-1.png
+   :width: 9.5in
+   :height: 5.33in
diff --git a/images/and-filter.png b/images/and-filter.png
new file mode 100644
index 0000000..7a6abbe
Binary files /dev/null and b/images/and-filter.png differ
diff --git a/images/and-or-filter.png b/images/and-or-filter.png
new file mode 100644
index 0000000..4f5cc6f
Binary files /dev/null and b/images/and-or-filter.png differ
diff --git a/images/or-filter.png b/images/or-filter.png
new file mode 100644
index 0000000..d5b9fbf
Binary files /dev/null and b/images/or-filter.png differ
diff --git a/images/variant-apply-filter.png b/images/variant-apply-filter.png
new file mode 100644
index 0000000..7e72f8f
Binary files /dev/null and b/images/variant-apply-filter.png differ
diff --git a/images/variant-filter.png b/images/variant-filter.png
new file mode 100644
index 0000000..e1b1841
Binary files /dev/null and b/images/variant-filter.png differ
diff --git a/images/variant-filter2.png b/images/variant-filter2.png
new file mode 100644
index 0000000..b47bbf2
Binary files /dev/null and b/images/variant-filter2.png differ
diff --git a/making_annotator_modules.html b/making_annotator_modules.html
index 8393557..98b13aa 100644
--- a/making_annotator_modules.html
+++ b/making_annotator_modules.html
@@ -2,7 +2,7 @@
-
+
@@ -33,7 +33,7 @@
 }
 @media print {
 pre > code.sourceCode { white-space: pre-wrap; }
-pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
+pre > code.sourceCode > span { display: inline-block; text-indent: -5em; padding-left: 5em; }
 }
 pre.numberSource code
   { counter-reset: source-line 0; }
@@ -74,10 +74,11 @@
-
+
-
+
@@ -99,17 +100,37 @@

Making an Annotator Module

-
-

Annotator Basic Structure

+
+

Annotator Overview

+
+
+
+

+
+
flowchart TD
+  A[Initialize Module] --> B
+  click A "#initializing-an-annotator-module"
+  B[Load Annotation\ninto Database] --> C
+  click B "#loading-annotations-as-a-sqlite-file"
+  C[Map Annotations\nin Python] -->  D
+  click C "#mapping-our-annotator-file"
+  D[Customize Output/\nDisplay] 
+
+
+

+
+
+

Creating an annotator module requires the following:

  -
  1. Creating an new annotator skeleton using oc new annotator <modulename>
  +
  1. Initializing a new annotator skeleton using oc new annotator <modulename>
  2. Loading an annotator file into a SQLite database (<modulename>.sqlite) using sqlite3
  3. Mapping the annotator sqlite file in the <modulename>.py file
  4. Customizing the output using the <modulename>.yml file
+
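Steps 2 and 3 above (load annotations into SQLite, then map them in Python) can be sketched with the standard-library `sqlite3` module. This is only an illustration under stated assumptions: the column names, the in-memory database, and the free-standing `annotate` function are hypothetical and do not reproduce the module skeleton that `oc new annotator` generates.

```python
import sqlite3

# Step 2 (sketch): load annotation rows into a SQLite table.
# An in-memory database stands in for data/sift.sqlite; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sift (chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, score REAL)"
)
conn.execute("INSERT INTO sift VALUES ('chr1', 100, 'A', 'T', 0.02)")
conn.commit()
cursor = conn.cursor()


# Step 3 (sketch): map one input variant to its annotation by querying the table.
def annotate(cursor, variant):
    """Return a dict of annotation values for the variant, or None if no row matches."""
    cursor.execute(
        "SELECT score FROM sift WHERE chrom = ? AND pos = ? AND ref = ? AND alt = ?",
        (variant["chrom"], variant["pos"], variant["ref"], variant["alt"]),
    )
    row = cursor.fetchone()
    return {"score": row[0]} if row else None


hit = annotate(cursor, {"chrom": "chr1", "pos": 100, "ref": "A", "alt": "T"})
miss = annotate(cursor, {"chrom": "chr1", "pos": 101, "ref": "A", "alt": "T"})
print(hit, miss)
```

Parameterized queries (the `?` placeholders) keep the lookup safe regardless of what the input variant contains.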

This is a quick review of the basic structure of an annotator module.

@@ -126,7 +147,6 @@

Annotator Basic

-

This is a quick review of the basic structure of an annotator module.

/Users/Shared/open-cravat/modules/annotators/sift
 ├── data
 │   └── sift.sqlite               ## contains annotations in sqlite format
@@ -292,8 +312,8 @@ 

Creating our Table

-
-

Fill out sift.py

+
+

Mapping our annotator file

Now that our data is loaded into our .sqlite file, we need to set up our mapping. If we look in sift.py, we’ll see there are stubs for three methods: setup(), annotate(), and cleanup():

@@ -579,18 +599,7 @@

+```{mermaid}
+flowchart TD
+  A[Initialize Module] --> B
+  click A "#initializing-an-annotator-module"
+  B[Load Annotation\ninto Database] --> C
+  click B "#loading-annotations-as-a-sqlite-file"
+  C[Map Annotations\nin Python] --> D
+  click C "#mapping-our-annotator-file"
+  D[Customize Output/\nDisplay]
+```
+
+
 Creating an annotator module requires the following:
 
-1. Creating an new annotator skeleton using `oc new annotator <modulename>`
+1. Initializing a new annotator skeleton using `oc new annotator <modulename>`
 2. Loading an annotator file into a SQLite database (`<modulename>.sqlite`) using `sqlite3`
 3. Mapping the annotator sqlite file in the `<modulename>.py` file
 4. Customizing the output using the `<modulename>.yml` file
+
+This is a quick review of the basic structure of an annotator module.
+
 ```{mermaid}
 flowchart LR
   A[oc new annotator sift] --> B
@@ -22,8 +39,6 @@ flowchart LR
   A --> D["data/sift.sqlite\n(contains annotation table)"]
 ```
 
-This is a quick review of the basic structure of an annotator module.
-
 ```
 /Users/Shared/open-cravat/modules/annotators/sift
 ├── data
@@ -298,7 +313,7 @@ Finally, now that we're satisfied, we can `.exit`:
 
-## Fill out `sift.py`
+## Mapping our annotator file