Star or not star? #5

Open
ThomasDesvignes opened this issue Mar 29, 2016 · 4 comments

@ThomasDesvignes
Member

Here I'd like to open the discussion on the use of the "star" symbol.

Originally, when we thought that only one arm of the hairpin was functional, the star symbol (*) was used to convey that this strand was a non-functional by-product of the biogenesis of the functional miRNA. But the star strand denomination '*' is no longer approved by any nomenclature consortium, including miRBase since April 2011 (Release 18), because many miRNA genes were then shown to produce mature miRNAs from both arms of the hairpin, and because the denomination can fluctuate with expression levels, as it relies heavily on sequencing depth and on the nature of the studied tissue, stage, etc. But in some cases of extreme differences in expression levels, this additional symbol can convey potentially useful functional information.

So, should we simply follow the gene nomenclature consortia and not support this symbol? Or try to find an agreement and define rules for using this symbol to make this symbol consistent and trustworthy?

For example, at what level of arm selection can we say that a strand is likely only a by-product? Fromm et al (2015) propose a one-fold change. To me this does not appear to be a strong enough difference to call the second strand the star strand, given that the complete expressed miRNAome of an organism is never fully represented and given the sequence bias known in miRNA-Seq library preparation. Personally, I would be more confident calling one strand the star strand with a 10-fold change and a good representation of the tissue types of the organism considered.
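To make the threshold question concrete, here is a minimal Python sketch of how such a call could be made from the read counts of the two arms. The function name, the pseudocount, and the example counts are all hypothetical, and the 10-fold default is only the value suggested above, not an agreed rule.

```python
def classify_minor_arm(count_5p, count_3p, min_fold=10.0, pseudocount=1.0):
    """Label the less abundant arm of a hairpin as 'star' or 'co-mature'.

    count_5p, count_3p: read counts for the two arms (hypothetical inputs).
    min_fold: fold difference required to call the minor arm a star
              (assumption: 10-fold, as suggested in the discussion).
    pseudocount: added to both counts to avoid division by zero.
    Returns (dominant_arm, fold, label_of_minor_arm).
    """
    a = count_5p + pseudocount
    b = count_3p + pseudocount
    if a >= b:
        dominant, fold = "5p", a / b
    else:
        dominant, fold = "3p", b / a
    label = "star" if fold >= min_fold else "co-mature"
    return dominant, fold, label
```

With this sketch, a hairpin with 999 reads on 5p and 99 on 3p reaches exactly 10-fold and its 3p arm would be labelled "star", while 200 versus 100 reads would give "co-mature". The call depends entirely on which samples the counts come from, which is the core of the concern raised here.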

If you have any comments or propositions to try to clarify this situation, please participate!

@lpantano
Contributor

Nice idea!

I think we can add the annotation, maybe not in the name itself, but as another label together with the variant information. In terms of file format, maybe we can start with something miRBase-compatible, and then add other naming with this kind of information.

@BastianFromm
Collaborator

I think this is a fair point to discuss and my opinion is that star-sequences and in general all passenger-strand reads have to be characterized, too.

Mind you that there are many such sequences that are not annotated in miRBase.

Whether a 2-fold or a 10-fold change is used for calling a star is always going to be subjective, but before setting such a measure it is more important to think about how to name the passenger strands when the ratio does not justify a star. We proposed "co-mature" as a designation, where by definition 5' would be mature and 3' co.

@ThomasDesvignes
Member Author

ThomasDesvignes commented Apr 12, 2017

Great to have some input!
I think we have two questions here:
1- Characterize and annotate all "passenger" strands, which comes down to annotating all 5p strands and 3p strands
2- Decide whether the "star" additional symbol is meaningful and robust and should be promoted.
The purpose of this Issue is mostly to discuss the second question.

For 1, indeed, many strands are missing in miRBase and other databases. I have personally updated the annotations for my species of interest (some actinopterygian species) but haven't done it for any species for which I don't have sequencing data. There should certainly be an encouragement to annotate all strands when possible (it makes me think of this paper about plants: do it right or not at all! http://onlinelibrary.wiley.com/doi/10.1002/bies.201600113/abstract). And maybe we should eventually think of a way to share up-to-date annotation files?

For 2, that's where I think there's some thinking to be done:

  • How can we be sure that the strand is really the only mature? Or can they be co-matures?
  • What (arbitrary) threshold to select and call a strand star or co-mature?
  • Is that always stable among all cells or can it be reversed in different situations/different organs/different cells (== is it robust to more sequencing or is it possible that the "star" annotation will change at some point when we'll have more sequencing data)? (Which was the reason why miRBase removed that annotation symbol in the first place)
  • A question that may sound stupid, but, do we really care to know that one mature strand is the star strand or not, or a co-mature?
  • Do we need to have this information in the name of a mature miRNA to be able to study it and refer to it without ambiguity?
  • Could annotating a mature miRNA as "passenger" bias our approach and lead us not to pay as much attention to its expression data as we should?
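The stability question in the list above can be checked directly on count data. The following is only a sketch under stated assumptions: the function names, the minimum read cutoff, and the sample names are hypothetical, and a real analysis would need normalization and replicates.

```python
def dominant_arm_by_sample(counts, min_reads=10):
    """Call the dominant arm in each sample.

    counts: dict mapping sample name -> (count_5p, count_3p).
    min_reads: samples with fewer total reads get no call (None).
    """
    calls = {}
    for sample, (c5, c3) in counts.items():
        if c5 + c3 < min_reads:
            calls[sample] = None  # too few reads to make a call
        elif c5 > c3:
            calls[sample] = "5p"
        elif c3 > c5:
            calls[sample] = "3p"
        else:
            calls[sample] = "tie"
    return calls


def is_arm_call_stable(counts, min_reads=10):
    """True if every sample with enough reads gives the same call."""
    calls = dominant_arm_by_sample(counts, min_reads)
    distinct = {c for c in calls.values() if c is not None}
    return len(distinct) <= 1
```

If `is_arm_call_stable` returns False for a hairpin, for example 5p dominant in one tissue and 3p dominant in another, then any "star" label attached to its name would be wrong in at least one of those contexts.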

My vision in short: I actually personally don't really care about the "star" symbol and don't pay attention to it. What I look for in my data (with all 5p and 3p strands annotated) is what is differentially expressed between my samples/conditions, no matter the strand they come from or whether they are the dominant strands. Maybe the non-dominant strand is actually the one that in my physiological situation is making the difference, who knows? Then I look at the absolute abundance, the abundance of one strand compared to the complementary one, their isomiRs, their predicted targets, etc. Knowing that a given mature sequence is 'considered' the star sequence, or "co-mature", or not, won't change the way I analyze my sequencing data. I think it tries to bring non-robust functional information into a naming system that needs to be robust to anything.

@gurgese
Collaborator

gurgese commented May 11, 2017

I completely agree with Thomas's vision: it is better to have a robust naming system based on concrete and unchangeable miRNA characteristics. Other information, like the fold ratio of expression between the two strands or whether a miR is canonical or not (using the Fromm notation), can be included and maintained in other complementary files or DBs (like MirGeneDB, miRGator, and others). If available, this info can be valuable during the analysis, provided that it is contextualised with the tissue, age, and conditions of the sample type.
In miRBase, the introduction of a unique identifier such as MIMAT is a reliable way to point to a given sequence. However, this index system does not carry any biological info that can suggest to the user the miRNA family and the known gene targets.
In my opinion, the miRBase annotation system can be pruned of redundant info like the species and the miR name, while the MIMAT identifier can be kept. A supplementary string can be included that gives info about the seed, the miR family, the number of family components for the species, the number of genes producing this miR sequence, and other stable miRNA characteristics.
In the mature.fa file the hsa-let-7a-5p entry can be changed from the following

hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p
UGAGGUAGUAGGUUGUAUAGUU

to this one

hsa-let-7a-5p MIMAT0000062 GAGGUA_LET-7_12_3_...
UGAGGUAGUAGGUUGUAUAGUU

Some of the available info can be used to access other databases like MirGeneDB by query composition (check the other component of LET-7 family -> http://mirgenedb.org/browse?org=hsa&query=LET-7 )
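As a rough illustration of the proposed header, here is a small Python sketch that composes the supplementary string from stable characteristics. The function names are hypothetical, and reading the example fields as seed (positions 2-7), family, number of family members, and number of genes is an assumption based on the order given above. The fields elided with "..." in the example are not reproduced.

```python
def seed_2_7(sequence):
    """Return nucleotides 2-7 of a mature miRNA sequence (1-based positions)."""
    return sequence[1:7]


def build_header(name, accession, sequence, family, n_family_members, n_genes):
    """Compose a header line in the style of the proposed mature.fa entry.

    family, n_family_members and n_genes must be supplied by the caller,
    e.g. from a family database; they are not derived here.
    """
    fields = [seed_2_7(sequence), family, str(n_family_members), str(n_genes)]
    return f"{name} {accession} " + "_".join(fields)


header = build_header(
    "hsa-let-7a-5p", "MIMAT0000062", "UGAGGUAGUAGGUUGUAUAGUU",
    family="LET-7", n_family_members=12, n_genes=3,
)
```

Here `header` holds `hsa-let-7a-5p MIMAT0000062 GAGGUA_LET-7_12_3`, which matches the visible part of the example entry. Only the seed is computed from the sequence itself, so it is the one field guaranteed to stay consistent with the record.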
