Logoplots #534
base: main
The conda tests are failing because the dependencies for the conda tests need to be declared separately here: Line 23 in 6160f37
If you add palmotif to that list, it should be OK.
pyproject.toml
@@ -42,6 +42,8 @@ dependencies = [
'pooch>=1.7.0',
'pycairo>=1.20; sys_platform == "win32"',
'joblib>=1.3.1',
'palmotif',
'IPython',
Why is the IPython dependency necessary?
Palmotif returns an SVG object, which by default is saved as such. What I did here was use the SVG function from IPython.display to display the SVG directly in my notebook, so I don't always have to save it as a file. This was a workaround I had almost forgotten about, and I am not sure it is even necessary, although I like being able to inspect plots directly inside the notebook. However, I noticed that this is probably not the smartest approach, as I no longer offer the possibility to save the SVG to a file, and I might adapt this accordingly. Either way, I am not sure SVG is a convenient format for users to deal with... do you know if there is a way to save it as a PNG file or in another, more convenient format?
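A minimal sketch of offering both options (save or show inline) while keeping IPython out of the hard dependencies, assuming the logo is available as an SVG markup string. The helper name show_or_save_svg is hypothetical and not part of palmotif or scirpy; IPython.display.SVG and display are the real IPython calls mentioned above.

```python
from pathlib import Path
from typing import Optional, Union


def show_or_save_svg(svg: str, path: Optional[Union[str, Path]] = None) -> Optional[Path]:
    """Hypothetical helper: write SVG markup to *path*, or display it inline if no path is given."""
    if path is not None:
        out = Path(path)
        out.write_text(svg, encoding="utf-8")
        return out
    try:
        # Imported lazily so IPython is only needed for inline display,
        # not as a hard package dependency.
        from IPython.display import SVG, display
    except ImportError:
        raise RuntimeError("IPython is required for inline display; pass a path to save instead") from None
    display(SVG(data=svg))
    return None
```

Converting to PNG would still need an extra rendering library, which is one argument for a matplotlib-based backend instead.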
Looks like palmotif can use matplotlib as a backend:
https://github.com/agartland/palmotif/blob/e228c2a9772acf1e4a2a0f3e15782b8096704cec/palmotif/mpl_plot.py#L30
This should be supported by jupyter notebooks natively, and the user can save it to any format they like as with any other matplotlib plot.
Will have a look into this, but sounds promising :D
I am just having a hard time manipulating the plot, as plt.show() is called inside mpl_plot and it returns three objects, which are somewhat confusing to me...
Do you know if an object such as
[[(Text(0, 0, 'C'), 178.8135375281251)],
[(Text(0, 0, 'A'), 178.8135375281251)],
[(Text(0, 0, 'S'), 178.8135375281251)],
[(Text(0, 0, 'S'), 178.8135375281251)],
[(Text(0, 0, 'E'), 11.495650387247952),
(Text(0, 0, 'H'), 11.495650387247952),
(Text(0, 0, 'L'), 11.495650387247952),
(Text(0, 0, 'T'), 11.495650387247952),
(Text(0, 0, 'Y'), 11.495650387247958),
(Text(0, 0, 'P'), 22.991300774495915)],
[(Text(0, 0, 'G'), 13.08748712194183),
(Text(0, 0, 'S'), 13.08748712194183),
(Text(0, 0, 'Y'), 13.08748712194183),
(Text(0, 0, 'P'), 26.17497424388366),
(Text(0, 0, 'V'), 26.17497424388366)],
[(Text(0, 0, 'G'), 11.495650387247952),
(Text(0, 0, 'I'), 11.495650387247952),
(Text(0, 0, 'L'), 11.495650387247952),
(Text(0, 0, 'S'), 11.495650387247952),
(Text(0, 0, 'V'), 11.495650387247958),
(Text(0, 0, 'W'), 22.991300774495915)],
[(Text(0, 0, 'F'), 17.862997326023464),
(Text(0, 0, 'A'), 35.72599465204693),
(Text(0, 0, 'G'), 71.45198930409386)],
[(Text(0, 0, 'A'), 13.688315950194387),
(Text(0, 0, 'E'), 13.688315950194387),
(Text(0, 0, 'L'), 13.688315950194387),
(Text(0, 0, 'V'), 13.688315950194387),
(Text(0, 0, 'G'), 41.06494785058316)],
[(Text(0, 0, 'A'), 11.49565038724795),
(Text(0, 0, 'E'), 11.49565038724795),
(Text(0, 0, 'L'), 11.49565038724795),
(Text(0, 0, 'P'), 11.49565038724795),
(Text(0, 0, 'V'), 11.495650387247956),
(Text(0, 0, 'G'), 22.9913007744959)],
[(Text(0, 0, 'T'), 14.679323856635706),
(Text(0, 0, 'G'), 29.358647713271413),
(Text(0, 0, 'L'), 29.358647713271413),
(Text(0, 0, 'P'), 29.358647713271402)],
[(Text(0, 0, 'D'), 13.08748712194183),
(Text(0, 0, 'I'), 13.08748712194183),
(Text(0, 0, 'L'), 13.08748712194183),
(Text(0, 0, 'G'), 26.17497424388366),
(Text(0, 0, 'S'), 26.17497424388366)],
[(Text(0, 0, 'N'), 15.280152684888265),
(Text(0, 0, 'I'), 15.280152684888265),
(Text(0, 0, 'L'), 30.56030536977653),
(Text(0, 0, 'S'), 45.840458054664815)],
[(Text(0, 0, 'L'), 17.472818247834695),
(Text(0, 0, 'G'), 52.41845474350409),
(Text(0, 0, 'S'), 52.41845474350409)],
[(Text(0, 0, 'N'), 13.688315950194387),
(Text(0, 0, 'Q'), 13.688315950194387),
(Text(0, 0, 'S'), 13.688315950194387),
(Text(0, 0, 'T'), 13.688315950194387),
(Text(0, 0, 'A'), 41.06494785058316)],
[(Text(0, 0, 'D'), 13.688315950194387),
(Text(0, 0, 'Q'), 13.688315950194387),
(Text(0, 0, 'E'), 13.688315950194387),
(Text(0, 0, 'Y'), 13.688315950194387),
(Text(0, 0, 'N'), 41.06494785058316)],
[(Text(0, 0, 'E'), 15.280152684888265),
(Text(0, 0, 'P'), 15.280152684888265),
(Text(0, 0, 'T'), 30.56030536977653),
(Text(0, 0, 'V'), 45.840458054664815)],
[(Text(0, 0, 'L'), 60.16698866690969), (Text(0, 0, 'Q'), 80.2226515558796)],
[(Text(0, 0, 'H'), 17.472818247834695),
(Text(0, 0, 'T'), 52.41845474350409),
(Text(0, 0, 'Y'), 52.41845474350409)],
[(Text(0, 0, 'F'), 178.8135375281251)]]
can easily be transformed into a matplotlib figure, or whether it can be used to customize the returned matplotlib plot?
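The structure above appears to be one list per motif position, each holding (Text, height) pairs, where the height looks like a scaled glyph height (fully conserved positions get the largest value). A minimal sketch, under that assumption, that flattens it into one {letter: height} dict per position; glyph_heights is a hypothetical name, and it only relies on the get_text() method that matplotlib.text.Text provides.

```python
from typing import Any, Dict, List, Sequence, Tuple


def glyph_heights(glyphs: Sequence[Sequence[Tuple[Any, float]]]) -> List[Dict[str, float]]:
    """Convert [[(Text, height), ...], ...] into one {letter: height} dict per position.

    Each label may be a plain string or any object with a matplotlib-style
    get_text() method, such as matplotlib.text.Text.
    """
    rows: List[Dict[str, float]] = []
    for position in glyphs:
        row: Dict[str, float] = {}
        for label, height in position:
            letter = label.get_text() if hasattr(label, "get_text") else str(label)
            # Sum in case the same letter occurs twice at one position.
            row[letter] = row.get(letter, 0.0) + float(height)
        rows.append(row)
    return rows
```

Passing the result through pandas.DataFrame(rows).fillna(0) gives a position-by-letter table, which is the shape logomaker plots from.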
Regarding "as plt.show() is called inside mpl_plot": that's annoying.
When choosing palmotif, did you also take a look at logomaker?
https://logomaker.readthedocs.io/en/latest/
Neither palmotif nor logomaker seem very actively maintained, but logomaker seems much more popular (according to github stars). From a first glance at the docs, logomaker seems more customizable and I also like that it doesn't have parasail as a hard dependency which I'd like to get rid of soon (see #450). But I'm not sure if it has other limitations.
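logomaker plots from a position-by-character table (a pandas DataFrame with one row per position), and its alignment_to_matrix helper builds such a table from sequences that are already aligned. As a dependency-free sketch of what that counting step amounts to, assuming equal-length CDR3 sequences and performing no alignment at all (count_matrix is a hypothetical name):

```python
from collections import Counter
from typing import Dict, List, Sequence


def count_matrix(sequences: Sequence[str]) -> List[Dict[str, int]]:
    """Per-position residue counts for a set of equal-length (pre-aligned) sequences."""
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length (i.e. be aligned)")
    # zip(*sequences) yields one tuple of residues per position.
    return [dict(Counter(column)) for column in zip(*sequences)]
```

Since nothing here aligns sequences, no alignment library such as parasail is needed, as long as the input is restricted to one length.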
To be honest, I am not sure why I chose palmotif... probably because it is also used in the single-cell best practices book. I will have a look at logomaker, and if it is preferable to use logomaker over palmotif (to get rid of parasail), I would be happy to adapt my code accordingly, unless it is for whatever reason impossible to use logomaker in our case...
Regarding the function signature, I'm not a big fan of one function that tries to do different things by accepting different, mutually exclusive parameters.
I'd prefer to split it up into
logoplot_cdr3_motif_length
logoplot_cdr3_motif_gene_segment
logoplot_cdr3_motif_clonotype
where each has only the corresponding parameters. If you want to reuse code between those functions, you can do so by factoring it out into a helper function.
Alternatively, the function could just make a motif of all sequences in the anndata object and we could leave filtering to the user entirely, e.g.
# clonotype
logoplot_cdr3_motif(mdata[mdata.obs["clone_id"] == "42", :])
# length
logoplot_cdr3_motif(mdata[ir.get.airr(mdata, "junction_aa", "VJ_1").str.len() == 15, :])
...
We could also have a combination of the two, where one implementation makes a plot with all sequences in the AnnData object, and the length, clonotype and gene_segment versions are wrappers around it that do the filtering for the user.
What do you think?
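A minimal sketch of the "one core implementation plus filtering wrappers" layout, under simplifying assumptions: records are plain dicts with hypothetical junction_aa and clone_id keys instead of AnnData/MuData, and the functions return per-position counts instead of drawing a logo. All names (motif_counts, motif_counts_by_length, motif_counts_by_clonotype, _select) are hypothetical and not scirpy API; a gene-segment wrapper would follow the same pattern.

```python
from typing import Callable, Dict, List, Mapping, Sequence

# Hypothetical stand-in for one cell's AIRR data (e.g. one row of obs).
Record = Mapping[str, str]


def motif_counts(sequences: Sequence[str]) -> List[Dict[str, int]]:
    """Core implementation: per-position counts over *all* sequences it is given."""
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        raise ValueError(f"expected equal-length sequences, got lengths {sorted(lengths)}")
    counts: List[Dict[str, int]] = []
    for column in zip(*sequences):  # one tuple of residues per position
        col: Dict[str, int] = {}
        for residue in column:
            col[residue] = col.get(residue, 0) + 1
        counts.append(col)
    return counts


def _select(records: Sequence[Record], keep: Callable[[Record], bool]) -> List[str]:
    """Shared helper: CDR3 sequences of all records passing *keep*."""
    return [r["junction_aa"] for r in records if keep(r)]


def motif_counts_by_length(records: Sequence[Record], length: int) -> List[Dict[str, int]]:
    """Wrapper: filter by CDR3 length, then delegate to the core."""
    return motif_counts(_select(records, lambda r: len(r["junction_aa"]) == length))


def motif_counts_by_clonotype(records: Sequence[Record], clone_id: str) -> List[Dict[str, int]]:
    """Wrapper: filter by clonotype id, then delegate to the core."""
    return motif_counts(_select(records, lambda r: r["clone_id"] == clone_id))
```

Each wrapper only owns its filter parameter, so no function has to accept mutually exclusive arguments.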
@DataHandler.inject_param_docs()
def logoplot_cdr3_motif(
Please add the function to the API documentation: https://github.com/scverse/scirpy/blob/main/docs/api.rst
src/scirpy/pl/_logoplots.py
clonotype_id: Union[None, list] = None,
clonotype_key: Union[None, str] = None,
cdr_len: int,
plot: bool = True,
I think we don't need that option, better to always use matplotlib. If the user wants to save a logo, they can do that through matplotlib.
Actually, I was also not a big fan of my solution here, but I wasn't sure whether several similar functions would be preferable. I will adapt it accordingly once I have rewritten the code so that it properly returns matplotlib plots instead of SVGs. I will experiment a bit, but I am afraid it is not possible to plot a logo of all sequences, as they need to be aligned, and that is only the case for junction sequences of the same length.
Unless you use the Hamming distance, this wouldn't even be guaranteed for a clonotype, I think? I see two options
This might be something that palmotif does for us (while logomaker does not), and the reason why it requires parasail.
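One way to sidestep alignment entirely is to bucket sequences by length and build one logo per bucket, since same-length CDR3s can be stacked position by position. A minimal sketch, assuming plain CDR3 strings as input; group_by_length is a hypothetical helper, not part of scirpy or logomaker.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_length(sequences: Iterable[str]) -> Dict[int, List[str]]:
    """Bucket CDR3 sequences by length, ordered by length.

    Sequences within one bucket can be stacked position by position without
    alignment; sequences of different lengths cannot.
    """
    groups: Dict[int, List[str]] = defaultdict(list)
    for seq in sequences:
        groups[len(seq)].append(seq)
    return dict(sorted(groups.items()))
```

This matches the Hamming-distance case noted above: clonotypes defined by Hamming distance only ever contain sequences of one length, whereas other metrics can mix lengths within a clonotype.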
…Removed AnnData filter functionality
This will become an issue with future pandas releases. There's even an issue here: Unfortunately, logomaker seems quite unmaintained (last commit 5 years ago), but I couldn't find any better alternatives. If the package really breaks eventually, we can still consider copying the code to scirpy, or forking the repo to scverse or something like that.
pseudocount: float = 0,
background=None,
center_weights: bool = False,
plot_default=True,
I would just set defaults and allow the user to override them via kwargs. This could be done via dict.update().
So if I understand correctly, you want me to remove them from the function signature (where they are now) and define them in the function body, with the possibility to override them via kwargs?
One or multiple chains from which to use CDR3 sequences
{airr_mod}
{airr_key}
{chain_idx_key}
We typically allow the user to specify an ax object into which the plot is added. This allows the user to easily compose multi-panel plots, e.g.
fig, ax = plt.subplots()
ir.pl.something(..., ax=ax)
Do you think this is possible with logomaker?
I see.
I will have a look at your other plotting functions, but as far as I can tell from the logomaker documentation (which has working examples), this should be possible. So the idea is that the user can access the ax object after calling the function, right? So it has to be returned at some point in the function call, right?
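A minimal sketch of the usual ax convention: accept an optional Axes, create one only if none is given, draw into it without calling plt.show(), and return it. A bar chart stands in for the logo here; logomaker.Logo also documents an ax argument, so the real function could pass ax through (worth verifying against the logomaker docs). The name plot_into is hypothetical.

```python
from typing import Dict, Optional

import matplotlib

matplotlib.use("Agg")  # headless backend for this demo only; library code should not set it
import matplotlib.pyplot as plt
from matplotlib.axes import Axes


def plot_into(heights: Dict[str, float], ax: Optional[Axes] = None) -> Axes:
    """Draw into a user-supplied Axes (or a fresh one) and return it; never calls plt.show()."""
    if ax is None:
        _, ax = plt.subplots()
    ax.bar(list(heights), list(heights.values()))
    ax.set_ylabel("height")
    return ax
```

Because the Axes is returned, the user can keep customizing it or save it in any format via ax.figure.savefig("logo.png").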
I've seen one example in a paper where they
See #12 (comment). What do you think of these ideas?
I have also encountered this many times and I think it would be a nice functionality, as the information usually complements each other quite well. However, I am not sure if and how it could work with our current logomaker implementation.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #534      +/-   ##
==========================================
- Coverage   81.43%   81.38%   -0.06%
==========================================
  Files          49       50       +1
  Lines        4213     4367     +154
==========================================
+ Hits         3431     3554     +123
- Misses        782      813      +31
Closes #12
Added the file for sequence motif analysis via logoplots. It works as a wrapper function around the palmotif package. Also added palmotif to the dependencies, together with IPython; the latter is used to help with SVG visualization.