-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Logoplots #534
base: main
Are you sure you want to change the base?
Logoplots #534
Changes from 3 commits
d7a78d7
2e9cde0
8c9da93
eb6d480
e21db85
a14d42b
acdd892
6bfd819
f1c6390
ffcc707
2b7a0ea
d30416c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,177 @@ | ||
from collections.abc import Sequence | ||
from typing import Literal, Union | ||
|
||
import palmotif as palm | ||
import pandas as pd | ||
from IPython.display import SVG | ||
|
||
from scirpy.get import airr as get_airr | ||
from scirpy.util import DataHandler | ||
|
||
|
||
@DataHandler.inject_param_docs() | ||
def logoplot_cdr3_motif( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. please add the function to the API documentation: https://github.com/scverse/scirpy/blob/main/docs/api.rst |
||
adata: DataHandler.TYPE, | ||
chains: Union[ | ||
Literal["VJ_1", "VDJ_1", "VJ_2", "VDJ_2"], | ||
Sequence[Literal["VJ_1", "VDJ_1", "VJ_2", "VDJ_2"]], | ||
] = "VDJ_1", | ||
airr_mod="airr", | ||
airr_key="airr", | ||
chain_idx_key="chain_indices", | ||
cdr3_col: str = "junction_aa", | ||
*, | ||
by: Sequence[Literal["gene_segment", "clonotype", "length"]] = "length", | ||
target_col: Union[None, str] = None, | ||
gene_annotation: Union[None, list] = None, | ||
clonotype_id: Union[None, list] = None, | ||
clonotype_key: Union[None, str] = None, | ||
cdr_len: int, | ||
plot: bool = True, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we don't need that option, better to always use matplotlib. If the user wants to save a logo, they can do that through matplotlib. |
||
color_scheme: Sequence[ | ||
Literal["nucleotide", "base_pairing", "hydrophobicity", "chemistry", "charge", "taylor", "logojs", "shapely"] | ||
] = "taylor", | ||
): | ||
""" | ||
A user friendly wrapper function for the palmotif python package. | ||
Enables the analysis of potential amino acid motifs by displaying logo plots. | ||
|
||
Parameters | ||
---------- | ||
{adata} | ||
chains | ||
One or multiple chains from which to use CDR3 sequences | ||
MKanetscheider marked this conversation as resolved.
Show resolved
Hide resolved
|
||
{airr_mod} | ||
{airr_key} | ||
{chain_idx_key} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We typicalle allow the user to specify an fig, ax =plt.subplots()
ir.pl.something (..., ax=ax) Do you think this is possible with logomaker? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see. |
||
cdr3_col | ||
key inside awkward array to retrieve junction information (aa) | ||
by | ||
Three options for convenient customisation: | ||
length -- compares all sequences that match the selected length | ||
clonotype -- compares all sequences that match the selected clonotype cluster(s) | ||
-> need to define `clonotype_id` and `clonotype_key` | ||
gene_segment -- compares all sequences that match the selected gene segment(s) | ||
-> need to define `gene_annotation` and `target_col` | ||
target_col | ||
key inside awkward array to retrieve gene annotation information e.g. v_call, j_call | ||
gene_annotation | ||
a list of predefined genes deemed interesting to include in a logoplot | ||
clonotype_id | ||
predefined clonotype cluster id to investigate as a logoplot | ||
clonotype_key | ||
key inside .obs column under which clonotype cluster ids are stored | ||
cdr_len | ||
Specify one exact sequence length t compute sequence motifs | ||
plot | ||
defaults to true to return a SVG logoplot for direct investigation | ||
set to false to retrieve the raw sequence motif for customised use | ||
color_scheme | ||
different color schemes used by palmotif. see https://github.com/agartland/palmotif/blob/master/palmotif/aacolors.py for more details | ||
|
||
Returns | ||
------- | ||
Depending on `plot` either returns a SVG object or the calculated sequence motif as a pd.DataFrame | ||
""" | ||
params = DataHandler(adata, airr_mod, airr_key, chain_idx_key) | ||
|
||
if by == "length": | ||
airr_df = get_airr(params, [cdr3_col], chains) | ||
if type(chains) is list: | ||
if len(chains) > 2: | ||
raise Exception("Only two different chains are allowed e.g. VDJ_1 and VDJ_2") | ||
|
||
else: | ||
cdr3_list = airr_df[airr_df[chains[0] + "_" + cdr3_col].str.len() == cdr_len][ | ||
chains[0] + "_" + cdr3_col | ||
].to_list() | ||
cdr3_list += airr_df[airr_df[chains[1] + "_" + cdr3_col].str.len() == cdr_len][ | ||
chains[1] + "_" + cdr3_col | ||
].to_list() | ||
motif = palm.compute_motif(cdr3_list) | ||
else: | ||
motif = palm.compute_motif( | ||
airr_df[airr_df[chains + "_" + cdr3_col].str.len() == cdr_len][chains + "_" + cdr3_col].to_list() | ||
) | ||
|
||
if plot: | ||
return SVG(palm.svg_logo(motif, return_str=False, color_scheme=color_scheme)) | ||
else: | ||
return motif | ||
|
||
if by == "gene_segment": | ||
if target_col is None or gene_annotation is None: | ||
raise Exception( | ||
"Please specify where the gene information is stored (`target_col`) and which genes to include (`gene_annotation`) as a list" | ||
) | ||
if type(gene_annotation) is not list: | ||
gene_annotation = list(gene_annotation.split(" ")) | ||
|
||
airr_df = get_airr(params, [cdr3_col, target_col], chains) | ||
if type(chains) is list: | ||
if len(chains) > 2: | ||
raise Exception("Only two different chains are allowed e.g. VDJ_1 and VDJ_2") | ||
|
||
cdr3_list = airr_df[ | ||
(airr_df[chains[0] + "_" + target_col].isin(gene_annotation)) | ||
& (airr_df[chains[0] + "_" + cdr3_col].str.len() == cdr_len) | ||
][chains[0] + "_" + cdr3_col].to_list() | ||
cdr3_list += airr_df[ | ||
(airr_df[chains[1] + "_" + target_col].isin(gene_annotation)) | ||
& (airr_df[chains[1] + "_" + cdr3_col].str.len() == cdr_len) | ||
][chains[1] + "_" + cdr3_col].to_list() | ||
motif = palm.compute_motif(cdr3_list) | ||
|
||
else: | ||
motif = palm.compute_motif( | ||
airr_df[ | ||
(airr_df[chains + "_" + target_col].isin(gene_annotation)) | ||
& (airr_df[chains + "_" + cdr3_col].str.len() == cdr_len) | ||
][chains + "_" + cdr3_col].to_list() | ||
) | ||
|
||
if plot: | ||
return SVG(palm.svg_logo(motif, return_str=False, color_scheme=color_scheme)) | ||
else: | ||
return motif | ||
|
||
if by == "clonotype": | ||
if clonotype_id is None or clonotype_key is None: | ||
raise Exception( | ||
"Please select desired clonotype cluster and the name of the column where this information is stored!" | ||
) | ||
|
||
if type(clonotype_id) is not list: | ||
clonotype_id = list(clonotype_id.split(" ")) | ||
|
||
if type(chains) is list: | ||
airr_df = get_airr(params, [cdr3_col], chains) | ||
else: | ||
airr_df = get_airr(params, [cdr3_col], [chains]) | ||
airr_df = pd.concat([airr_df, params.get_obs(clonotype_key)]) | ||
airr_df = airr_df.loc[params.get_obs(clonotype_key).isin(clonotype_id)] | ||
|
||
if type(chains) is list: | ||
if len(chains) > 2: | ||
raise Exception("Only two different chains are allowed e.g. VDJ_1 and VDJ_2") | ||
|
||
else: | ||
cdr3_list = airr_df[airr_df[chains[0] + "_" + cdr3_col].str.len() == cdr_len][ | ||
chains[0] + "_" + cdr3_col | ||
].to_list() | ||
cdr3_list += airr_df[airr_df[chains[1] + "_" + cdr3_col].str.len() == cdr_len][ | ||
chains[1] + "_" + cdr3_col | ||
].to_list() | ||
motif = palm.compute_motif(cdr3_list) | ||
else: | ||
motif = palm.compute_motif( | ||
airr_df[airr_df[chains + "_" + cdr3_col].str.len() == cdr_len][chains + "_" + cdr3_col].to_list() | ||
) | ||
|
||
if plot: | ||
return SVG(palm.svg_logo(motif, return_str=False, color_scheme=color_scheme)) | ||
else: | ||
return motif | ||
|
||
else: | ||
raise Exception("Invalid input for parameter `by`!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the IPython dependency necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Palmotif returns a SVG object, which is by default saved as such accordingly. What I did here was that I used the IPython.display's function SVG to directly display the SVG in my notebook and not always have to save it as a file. This was a workaround that I have almost forgotten, and I am not sure that this is even necessary...although I like to be able to directly investigate plots inside the notebook. However, I noticed that this is probably not the smartest way to do as I do not offer the possibility to save the SVG to a file any more and I might adapt this accordingly. Either way, I am not sure if SVG is a handy format for users to deal with... do you know if there is a possibility to save it as a png file or any other format that might be more convenient to deal with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like palmotif can use matplotlib as a backend:
https://github.com/agartland/palmotif/blob/e228c2a9772acf1e4a2a0f3e15782b8096704cec/palmotif/mpl_plot.py#L30
This should be supported by jupyter notebooks natively, and the user can save it to any format they like as with any other matplotlib plot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will have a look into this, but sounds promising :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am just having a hard time to manipulate the plot as plt.show() is called inside mpl_plot and it returns three objects, which are somewhat confusing to me...
Do you know if a object such as
[[(Text(0, 0, 'C'), 178.8135375281251)],
[(Text(0, 0, 'A'), 178.8135375281251)],
[(Text(0, 0, 'S'), 178.8135375281251)],
[(Text(0, 0, 'S'), 178.8135375281251)],
[(Text(0, 0, 'E'), 11.495650387247952),
(Text(0, 0, 'H'), 11.495650387247952),
(Text(0, 0, 'L'), 11.495650387247952),
(Text(0, 0, 'T'), 11.495650387247952),
(Text(0, 0, 'Y'), 11.495650387247958),
(Text(0, 0, 'P'), 22.991300774495915)],
[(Text(0, 0, 'G'), 13.08748712194183),
(Text(0, 0, 'S'), 13.08748712194183),
(Text(0, 0, 'Y'), 13.08748712194183),
(Text(0, 0, 'P'), 26.17497424388366),
(Text(0, 0, 'V'), 26.17497424388366)],
[(Text(0, 0, 'G'), 11.495650387247952),
(Text(0, 0, 'I'), 11.495650387247952),
(Text(0, 0, 'L'), 11.495650387247952),
(Text(0, 0, 'S'), 11.495650387247952),
(Text(0, 0, 'V'), 11.495650387247958),
(Text(0, 0, 'W'), 22.991300774495915)],
[(Text(0, 0, 'F'), 17.862997326023464),
(Text(0, 0, 'A'), 35.72599465204693),
(Text(0, 0, 'G'), 71.45198930409386)],
[(Text(0, 0, 'A'), 13.688315950194387),
(Text(0, 0, 'E'), 13.688315950194387),
(Text(0, 0, 'L'), 13.688315950194387),
(Text(0, 0, 'V'), 13.688315950194387),
(Text(0, 0, 'G'), 41.06494785058316)],
[(Text(0, 0, 'A'), 11.49565038724795),
(Text(0, 0, 'E'), 11.49565038724795),
(Text(0, 0, 'L'), 11.49565038724795),
(Text(0, 0, 'P'), 11.49565038724795),
(Text(0, 0, 'V'), 11.495650387247956),
(Text(0, 0, 'G'), 22.9913007744959)],
[(Text(0, 0, 'T'), 14.679323856635706),
(Text(0, 0, 'G'), 29.358647713271413),
(Text(0, 0, 'L'), 29.358647713271413),
(Text(0, 0, 'P'), 29.358647713271402)],
[(Text(0, 0, 'D'), 13.08748712194183),
(Text(0, 0, 'I'), 13.08748712194183),
(Text(0, 0, 'L'), 13.08748712194183),
(Text(0, 0, 'G'), 26.17497424388366),
(Text(0, 0, 'S'), 26.17497424388366)],
[(Text(0, 0, 'N'), 15.280152684888265),
(Text(0, 0, 'I'), 15.280152684888265),
(Text(0, 0, 'L'), 30.56030536977653),
(Text(0, 0, 'S'), 45.840458054664815)],
[(Text(0, 0, 'L'), 17.472818247834695),
(Text(0, 0, 'G'), 52.41845474350409),
(Text(0, 0, 'S'), 52.41845474350409)],
[(Text(0, 0, 'N'), 13.688315950194387),
(Text(0, 0, 'Q'), 13.688315950194387),
(Text(0, 0, 'S'), 13.688315950194387),
(Text(0, 0, 'T'), 13.688315950194387),
(Text(0, 0, 'A'), 41.06494785058316)],
[(Text(0, 0, 'D'), 13.688315950194387),
(Text(0, 0, 'Q'), 13.688315950194387),
(Text(0, 0, 'E'), 13.688315950194387),
(Text(0, 0, 'Y'), 13.688315950194387),
(Text(0, 0, 'N'), 41.06494785058316)],
[(Text(0, 0, 'E'), 15.280152684888265),
(Text(0, 0, 'P'), 15.280152684888265),
(Text(0, 0, 'T'), 30.56030536977653),
(Text(0, 0, 'V'), 45.840458054664815)],
[(Text(0, 0, 'L'), 60.16698866690969), (Text(0, 0, 'Q'), 80.2226515558796)],
[(Text(0, 0, 'H'), 17.472818247834695),
(Text(0, 0, 'T'), 52.41845474350409),
(Text(0, 0, 'Y'), 52.41845474350409)],
[(Text(0, 0, 'F'), 178.8135375281251)]]
can be easily transformed into a matplotlib figure or if it can be used to customize the returned matplotlib plot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's annoying.
When choosing palmotif, did you also take a look at logomaker?
https://logomaker.readthedocs.io/en/latest/
Neither palmotif nor logomaker seem very actively maintained, but logomaker seems much more popular (according to github stars). From a first glance at the docs, logomaker seems more customizable and I also like that it doesn't have parasail as a hard dependency which I'd like to get rid of soon (see #450). But I'm not sure if it has other limitations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I am not sure why I have chosen palmotif...probably because it was used in the single-cell best practice book as well. I will have a look on logomaker, but if it's even preferable to use logomaker over palmotif (to get rid of parasail) I would be happy to adapt my code accordingly unless it is for whatever reason impossible to use logomaker in our case...