Merge branch 'release/1.2.7'
dputhier committed Oct 15, 2020
2 parents 654a5a5 + bc0ab5c commit 1fe4f15
Showing 93 changed files with 279 additions and 421 deletions.
Binary file modified docs/_images/example_01.png
Binary file modified docs/_images/example_02.png
Binary file modified docs/_images/example_05.png
Binary file modified docs/_images/example_06.png
Binary file modified docs/_images/example_06b.png
Binary file modified docs/_images/example_07.png
Binary file modified docs/_images/example_08.png
Binary file modified docs/_images/example_13.png
23 changes: 14 additions & 9 deletions docs/_sources/ologram.rst.txt
@@ -175,6 +175,7 @@ For statistical reasons, we recommend shuffling across a relevant subsection of

**Exact combinations:** By default, OLOGRAM will compute "inexact" combinations, meaning that when encountering an overlap of [Query + A + B + C] it will count towards [A + B + ...]. For exact intersections (ie. [Query + A + B + nothing else]), set the --multiple-overlap-target-combi-size flag to the number of --more-bed plus one. You will know if the combinations are computed as inexact by the '...' in their name in the result file. Intersections not including the query file are discarded.

With inexact combinations, if A+B is very enriched and C is depleted, A+B+C will still be enriched. In that case it is more interesting to look at C's contribution to the enrichment. Relatedly, longer combinations are usually more enriched since they involve more theoretically independent sets, so only combinations of similar order should be compared with each other.
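
The difference between inexact and exact counting can be illustrated with a minimal sketch. This is not OLOGRAM's implementation: the `flags` matrix and the `count_combination` helper are hypothetical, and rows are counted here whereas OLOGRAM works on basepairs.

```python
import numpy as np

# Hypothetical overlap flags: one row per overlap,
# columns are [Query, A, B, C]; 1 means the set is present.
flags = np.array([
    [1, 1, 1, 0],   # Query + A + B
    [1, 1, 1, 1],   # Query + A + B + C
    [1, 1, 0, 0],   # Query + A
    [0, 1, 1, 0],   # A + B without the query
])

def count_combination(flags, combi, exact):
    """Count the rows matching `combi` (illustrative helper, an assumption).

    Inexact: every set flagged in `combi` must be present, extras are allowed.
    Exact: the row must be identical to `combi`.
    """
    combi = np.asarray(combi)
    if exact:
        matches = np.all(flags == combi, axis=1)
    else:
        matches = np.all(flags[:, combi == 1] == 1, axis=1)
    return int(matches.sum())

query_a_b = [1, 1, 1, 0]
inexact_count = count_combination(flags, query_a_b, exact=False)  # [Query + A + B + ...]
exact_count = count_combination(flags, query_a_b, exact=True)     # [Query + A + B] only
print(inexact_count, exact_count)
```

In this toy matrix the overlap [Query + A + B + C] counts towards [Query + A + B + ...] in the inexact case only, and the row without the query never matches.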


**Simple example:**
@@ -226,8 +227,8 @@ As the computation of multiple overlaps can be RAM-intensive, if you have a very



Details
-----------------
Itemset mining details
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


In broad strokes, the custom itemset algorithm MODL (Multiple Overlap Dictionary Learning) will perform many matrix factorizations on the matrix of true overlaps to identify relevant correlation groups of genomic regions. Then a greedy algorithm, based on how much these words improve the reconstruction, will select the best words. MODL is only used to filter the output of OLOGRAM: once it returns a list of interesting combinations, OLOGRAM will compute their enrichment as usual, but for them only. Each combination is of the form [Query + A + B + C] where A, B and C are BED files given as --more-bed. You can also manually specify the combinations to be studied with the format defined in OLOGRAM notes (below).
@@ -244,7 +245,7 @@ This itemset mining algorithm is a work-in-progress. Whether you use MODL will n

This can work on any type of data, biological or not, that respects the conventional formatting for lists of transactions: the data needs to be a matrix with one line per transaction and one column per element. For example, if you have three possible elements A, B and C, a line of [1,0,1] means a transaction containing A and C.
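
As a minimal sketch of this transaction format (the element names and transactions below are made up for illustration), a list of transactions can be turned into such a matrix with NumPy:

```python
import numpy as np

# Hypothetical elements and transactions, for illustration only.
elements = ["A", "B", "C"]
transactions = [{"A", "C"}, {"A", "B", "C"}, {"B"}]

# One line per transaction, one column per element: 1 if present, else 0.
matrix = np.array([[1 if e in t else 0 for e in elements]
                   for t in transactions])
print(matrix)
```

The first line of `matrix` is [1, 0, 1], that is, a transaction containing A and C.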

For a factor allowance of k and n final queried words, the matrix will be rebuilt with k*n words in step 1. MODL will discard combinations rarer than 1/10000 occurrences to reduce computing times. It will also reduce the abundance of all unique lines in the matrix to their square roots to reduce the emphasis on the most frequent elements. However, the latter can magnify the impact of the noise as well and can be disabled when using the manual API. To de-emphasize longer words, which can help in this case, we can also normalize words by their summed square in step 2.
For a factor allowance of k and n final queried words, the matrix will be rebuilt with k*n words in step 1. MODL will discard combinations rarer than 1/10000 occurrences to reduce computing times. It will also reduce the abundance of all unique lines in the matrix to their square roots to reduce the emphasis on the most frequent elements. However, the latter can magnify the impact of the noise as well and can be disabled when using the manual API. To de-emphasize longer words, which can help in this case, we normalize words by their summed square in step 2.
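
The rare-combination filter and the square-root reduction of abundance can be sketched as follows. This is an illustration under assumptions, not MODL's code: the function name `reduce_abundance`, the rounding with `ceil`, and the interpretation of the 1/10000 threshold as a frequency among all lines are choices made for this example. The normalization of words in step 2 is not covered.

```python
import numpy as np

def reduce_abundance(matrix, min_frequency=1e-4):
    """Illustrative sketch (assumption): drop rare unique lines,
    then keep about sqrt(count) copies of each remaining line."""
    matrix = np.asarray(matrix)
    unique_rows, counts = np.unique(matrix, axis=0, return_counts=True)

    # Discard lines rarer than min_frequency among all lines.
    keep = counts / counts.sum() >= min_frequency
    unique_rows, counts = unique_rows[keep], counts[keep]

    # Reduce each abundance to its square root (rounded up, an assumption).
    new_counts = np.ceil(np.sqrt(counts)).astype(int)
    return np.repeat(unique_rows, new_counts, axis=0)

# 16 copies of [1, 0, 1] and 4 copies of [0, 1, 0].
example = np.array([[1, 0, 1]] * 16 + [[0, 1, 0]] * 4)
reduced = reduce_abundance(example)
print(reduced.shape)
```

With this example the 4:1 ratio between the two lines becomes 2:1, which is how the emphasis on the most frequent line is reduced.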

If you are passing a custom error function, it must have the signature error_function(X_true, X_rebuilt, code). X_true is the real data, X_rebuilt is the reconstruction to evaluate, and code is the encoded version, which in our case is used to assess sparsity. All are NumPy matrices.
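
A custom error function with this signature might look like the sketch below. Only the signature comes from the text above. The formula (mean squared error plus a sparsity penalty), the `sparsity_weight` value, and the assumption that lower values are better are illustrative choices, not MODL's default.

```python
import numpy as np

def error_function(X_true, X_rebuilt, code):
    """Hypothetical error: reconstruction error plus a penalty on dense codes."""
    sparsity_weight = 0.1  # arbitrary value chosen for this illustration

    # How far the reconstruction is from the real data.
    reconstruction_error = np.mean((X_true - X_rebuilt) ** 2)

    # Fraction of non-zero entries in the code: lower means sparser.
    density = np.mean(code != 0)

    return float(reconstruction_error + sparsity_weight * density)

X = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
perfect = error_function(X, X, np.zeros((2, 4)))
worse = error_function(X, np.zeros_like(X), np.ones((2, 4)))
print(perfect, worse)
```

Such a function would then be given through the error_function argument, which is set to None in the example of the manual API.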

@@ -267,7 +268,8 @@ Here is an example:
step_1_factor_allowance = 2, # How many words to ask for in each step 1 rebuilding, as a multiplier of multiple_overlap_max_number_of_combinations
error_function = None, # Custom error function in step 2
smother = True, # Should the smothering (quadratic reduction of abundance) be applied ?
normalize_words = False) # Normalize words by their summed squared in step 2 ?
normalize_words = True, # Normalize words by their summed squared in step 2 ?
step_2_alpha = None) # Override the alpha (sparsity control) used in step 2
interesting_combis = combi_miner.find_interesting_combinations()
@@ -300,6 +302,7 @@ The resulting flags_matrix is a NumPy array that can be edited, and on which MOD

Since the results of MODL only depend on the true intersections and not on the shuffles, you can run MODL with 1 shuffle or on a manually computed matrix as above to pre-select interesting combinations, and then run the full analysis on many shuffles. We then recommend selecting the combinations that interest you in the resulting tsv file, using MODL's selection as a starting point and adding or removing some combinations based on your own needs (eg. adding all the highest fold changes, or all particular combinations containing the Transcription Factor X that you are studying).

It is also possible to run any itemset miner you wish on this matrix. An implementation of apriori is provided in the `pygtftk.stats.intersect.modl.apriori.Apriori` class.
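
As an illustration of running another miner on such a matrix, here is a brute-force frequent itemset count. It deliberately does not use the `Apriori` class, whose interface is not described here, and it does not implement apriori's pruning. The function name, the `names` argument and the `min_support` threshold are assumptions made for this sketch.

```python
from itertools import combinations

import numpy as np

def frequent_itemsets(flags_matrix, names, min_support):
    """Return {itemset: support} for every itemset found in at least
    `min_support` (a fraction) of the lines. Brute force, for small inputs."""
    flags_matrix = np.asarray(flags_matrix, dtype=bool)
    n_cols = flags_matrix.shape[1]
    result = {}
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            # Inexact counting: the line must contain all elements of the itemset.
            support = flags_matrix[:, list(cols)].all(axis=1).mean()
            if support >= min_support:
                result[tuple(names[c] for c in cols)] = float(support)
    return result

toy_matrix = [[1, 0, 1], [1, 1, 1], [0, 1, 0], [1, 0, 1]]
itemsets = frequent_itemsets(toy_matrix, ["A", "B", "C"], min_support=0.5)
print(itemsets)
```

The cost grows exponentially with the number of columns, so this is only suitable for a handful of sets.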


ologram_merge_stats
@@ -329,24 +332,28 @@ ologram_merge_stats

This also works with OLOGRAM-MODL results, since they follow the same basic format of one element/combination per line.

Cases without a p-value diamond mean the p-value was NaN. This usually means the combination was too rare to be encountered in the shuffles.

One use case for this tool is to compare different cell lines. Another is to slop (extend) your query regions by different lengths and compare the enrichments, to find the average distance at which several sets lie from each other.

**Arguments:**

.. command-output:: gtftk ologram_merge_stats -h
:shell:




ologram_modl_treeify
~~~~~~~~~~~~~~~~~~~~~~

**Description:** Visualize n-wise enrichment results (OLOGRAM-MODL) as a tree of combinations. Works on the result (tsv file) of an OLOGRAM analysis called with --more-bed-multiple-overlap. On the graph, S designates the total number of basepairs on which this combination is encountered in the real data. Fold change gives the ratio with the number of basepairs in the shuffles, with the associated Negative Binomial p-value.

This recommended representation is useful to find master regulators: it shows which additions to a combination increase its enrichment, and lets you see whether overlaps that contain the element X also contain the element Y (by looking at how much a child combination accounts for the S of its parent in an inexact counting).

The tsv result file can be edited before passing it to the command, for example by keeping only the combinations you are interested in, such as all combinations containing the Transcription Factor you are studying. We recommend running MODL to make a pre-selection.
P-values of NaN (-1 in the original tsv) are due to poor fitting. They are mostly present in high-order combinations that were so rare that they were not encountered in the shuffles even once. We also recommend discarding the rarest combinations, found on so few basepairs that they are unlikely to be biologically significant. This is mostly relevant when you have many sets (k >= 5) since longer combinations will often be enriched through sheer unlikelihood. To that effect, there is a parameter to display only the combinations with the highest S.

We also recommend discarding the rarest combinations, found on so few basepairs that they are unlikely to be biologically significant. This is mostly relevant when you have many sets (k >= 5) since longer combinations will often be enriched through sheer unlikelihood.
The tsv result file can be edited before passing it to the command, for example by keeping only the combinations you are interested in.
You can either (1) run OLOGRAM-MODL with no filtering and get a tree of all combinations, (2) use MODL to get a pre-selection that can be tailored, or (3) take the run with all combinations from option 1 and use the -t argument to keep only the most frequent combinations.

.. command-output:: gtftk ologram_modl_treeify -i multiple_overlap_trivial_ologram_stats.tsv -o treeified.pdf -l ThisWasTheNameOfTheQuery
:shell:
@@ -369,8 +376,6 @@ We also recommend discarding the rarest combinations found on such a very small
:shell:




ologram_merge_runs
~~~~~~~~~~~~~~~~~~~~~~

2 changes: 1 addition & 1 deletion docs/_static/documentation_options.js
@@ -1,6 +1,6 @@
var DOCUMENTATION_OPTIONS = {
URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'),
VERSION: '1.2.6',
VERSION: '1.2.7',
LANGUAGE: 'en',
COLLAPSE_INDEX: false,
BUILDER: 'html',
Binary file modified docs/_static/example_01.png
Binary file modified docs/_static/example_02.png
Binary file modified docs/_static/example_05.png
Binary file modified docs/_static/example_06.png
Binary file modified docs/_static/example_06b.png
Binary file modified docs/_static/example_07.png
Binary file modified docs/_static/example_08.png
Binary file modified docs/_static/example_13.png
Binary file modified docs/_static/example_pa_01.pdf
Binary file not shown.
Binary file modified docs/_static/example_pa_02.pdf
Binary file not shown.
Binary file modified docs/_static/example_pa_03.pdf
Binary file not shown.
Binary file modified docs/_static/example_pa_04.pdf
Binary file not shown.
Binary file modified docs/_static/merge_ologram_stats_01.pdf
Binary file not shown.
Binary file modified docs/_static/treeified.pdf
Binary file not shown.
8 changes: 4 additions & 4 deletions docs/about.html
@@ -16,7 +16,7 @@
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
<title>Warning about supported GTF file formats &#8212; gtftk 1.2.6 documentation</title>
<title>Warning about supported GTF file formats &#8212; gtftk 1.2.7 documentation</title>
<link rel="stylesheet" href="_static/nature.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
@@ -45,7 +45,7 @@ <h3>Navigation</h3>
<li class="right" >
<a href="index.html" title="Welcome to pygtftk documentation page"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">gtftk 1.2.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">gtftk 1.2.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Warning about supported GTF file formats</a></li>
</ul>
</div>
@@ -202,13 +202,13 @@ <h3>Navigation</h3>
<li class="right" >
<a href="index.html" title="Welcome to pygtftk documentation page"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">gtftk 1.2.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">gtftk 1.2.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Warning about supported GTF file formats</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2018, F. Lopez and D. Puthier.
Last updated on Oct 09, 2020.
Last updated on Oct 15, 2020.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>
8 changes: 4 additions & 4 deletions docs/annotation.html
@@ -16,7 +16,7 @@
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
<title>Commands from section ‘annotation’ &#8212; gtftk 1.2.6 documentation</title>
<title>Commands from section ‘annotation’ &#8212; gtftk 1.2.7 documentation</title>
<link rel="stylesheet" href="_static/nature.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
@@ -45,7 +45,7 @@ <h3>Navigation</h3>
<li class="right" >
<a href="conversion.html" title="Commands from section ‘conversion’"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">gtftk 1.2.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">gtftk 1.2.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Commands from section ‘annotation’</a></li>
</ul>
</div>
@@ -477,13 +477,13 @@ <h3>Navigation</h3>
<li class="right" >
<a href="conversion.html" title="Commands from section ‘conversion’"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">gtftk 1.2.6 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">gtftk 1.2.7 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Commands from section ‘annotation’</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2018, F. Lopez and D. Puthier.
Last updated on Oct 09, 2020.
Last updated on Oct 15, 2020.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 3.2.1.
</div>
</body>