GFF3::source | GFF3::type #13

Open
lpantano opened this issue Jun 13, 2017 · 22 comments

Comments

@lpantano
Contributor

Hi all again!

cc: @lpantano @gurgese @ThomasDesvignes @mhalushka @mlhack @keilbeck @BastianFromm @ivlachos @TJU-CMC

I propose putting the database used by the tool in the second column: source

I propose to use these labels for the type column (3rd):

  • hairpin: this could be the parent
  • annotated: this could be the sequence annotated in the database; it is a child of hairpin. I am trying to avoid canonical, as we have discussed before. Maybe reference is another option, since the problem is similar to the one we have for SNPs, where the reference was simply designated by the first genomes sequenced, which doesn't mean it is the most abundant. miRNA is another label we could use, I guess.
  • isomiR/variant: this could be the detected sequence; it is a child of the previous one
  • other types of miRNAs?

Please contribute more options or any thoughts you have about it! Thanks!
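
As a rough illustration of the proposal above, the sketch below writes GFF3 lines with the database name in column 2 and one of the proposed labels in column 3. The function name `gff3_line`, the `exampleDB_v1` source label, the coordinates and the IDs are all made up for the example, and the type labels are the ones floated in this comment, not agreed terms.

```python
def gff3_line(seqid, source, ftype, start, end, strand, attributes):
    """Format one GFF3 feature line (nine tab-separated columns)."""
    attr_text = ";".join(f"{key}={value}" for key, value in attributes.items())
    # Columns: seqid, source, type, start, end, score, strand, phase, attributes
    columns = [seqid, source, ftype, str(start), str(end), ".", strand, ".", attr_text]
    return "\t".join(columns)


# Hypothetical records: the hairpin is the parent, the detected sequence a child.
SOURCE = "exampleDB_v1"  # assumed label for the database used by the tool
lines = [
    gff3_line("chr1", SOURCE, "hairpin", 100, 180, "+", {"ID": "hp1"}),
    gff3_line("chr1", SOURCE, "isomiR", 110, 131, "+", {"ID": "iso1", "Parent": "hp1"}),
]
print("\n".join(lines))
```

Score and phase are left as `.` here because nothing in the proposal assigns them a meaning yet.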

@keilbeck

Column 3 needs to be a term from the Sequence Ontology.
If the right terms are not there to describe your feature, we need to add them to the ontology.
There are 25 miRNA terms in the SO currently
http://www.sequenceontology.org/browser/obob.cgi

@lpantano
Contributor Author

lpantano commented Jun 13, 2017 via email

@ivlachos
Collaborator

I really like the idea of "reference" instead of canonical, since it's very close to reality.
Kudos!

@ThomasDesvignes
Member

ThomasDesvignes commented Jun 13, 2017

I am all for a "reference" miRNA instead of a "canonical" miRNA. In our TiG paper we were actually proposing the creation of a "RefSeq miRNA sequence" as an unchangeable standard, while the most expressed isomiR could change among sample/tissue/etc...

For column 3, the parent could then be "pre_miRNA" to match the Sequence Ontology. I think the SO database has most if not all terms covered as of now, except the isomiRs, which may be considered children of the RefSeq miRNA.

@ThomasDesvignes
Member

For column two ("source"), do you mean putting "miRBase_v.XX", "MirGeneDB_v.X" or "personal annotation"? All of that is fine with me. In my experience (on fish), I usually do my own annotation and make it public with the publication. For example, neither what I've done on zebrafish nor what I've done on spotted gar has ever been incorporated into any database. How would we deal with that? I am thinking of putting my annotation files on a GitHub/Zenodo page (because I'll continue annotating more species, and I know people won't dig into the supplemental files of my publication to retrieve an annotation...), so maybe in column 2 we could have something like "Zenodo_doi..."? Basically something traceable...

@BastianFromm
Collaborator

BastianFromm commented Jun 13, 2017 via email

@ThomasDesvignes
Member

That's awesome Bastian! How many more fish out of the 20? (I'm a fish person ;) )
However, the problem I have with MirGeneDB is that the criteria for being in the DB are way too strict for the way I study miRNAs. There are many non-canonical miRNAs that are not in MirGeneDB and that I want to continue studying because they are functional (cf. previous discussions on canonical miRNAs...). So I guess that, at least for my studies, I'll continue using my own annotation files, which will remain larger than what is in MirGeneDB, and I need a way to make them publicly available. That's why I ask for an alternative "source" of annotation. But we're moving away from the original question here...

@lpantano
Contributor Author

lpantano commented Jun 13, 2017

Thanks all for the discussion! And it's awesome that we'll have zebrafish there.

Thomas, I think that is OK; you can name it as you want, as long as it doesn't overlap with an official name.

I think we can ask for a line like this in the header of the file:

##source-ontology LINK TO DATABASE

or something like that, to make sure it is traceable.

PS: The idea of uploading it to GitHub seems really good.
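
A header line like the one suggested above could be read back with a few lines of Python. This is only a sketch: the directive name `##source-ontology` is the tentative one from this comment, while the helper `read_source_header` and the `example.org` link are placeholders invented for the illustration.

```python
def read_source_header(lines, directive="##source-ontology"):
    """Return the text that follows the directive in the header, or None."""
    for line in lines:
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        parts = line.rstrip("\n").split(maxsplit=1)
        if parts and parts[0] == directive:
            return parts[1] if len(parts) > 1 else None
    return None


example = [
    "##gff-version 3",
    "##source-ontology https://example.org/my-annotation",  # placeholder link
    "chr1\tmyAnnotation\thairpin\t100\t180\t.\t+\t.\tID=hp1",
]
source_link = read_source_header(example)
print(source_link)
```

Returning `None` when the directive is absent lets a caller decide whether a missing link is an error or just a warning.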

@lpantano
Contributor Author

Hi @keilbeck and all,
I looked at the SO. I think we need something like ref_miRNA and edit_miRNA, or isomiR directly. Do you think it is possible to add those to the database?

Let me know your thoughts.

@keilbeck

Send me the definitions.

@ThomasDesvignes
Member

Hi Karen, I'm not sure we've reached a consensus yet on the "ref_miRNA" and "isomiR" definitions (I think isomiR is better than edit_miRNA, btw), but in our paper together we proposed these definitions, which people can maybe comment on and embellish:

  • Ref_miRNA: A Ref_miRNA sequence is assigned at the creation of a new mature miRNA entry in a database. The Ref_miRNA sequence designation remains unchanged even if a different isomiR is later shown to be expressed at a higher level. A ref_miRNA can be produced by one or multiple pre-miRNA.
  • IsomiRs: IsomiRs are all the bona fide variants of a mature product. IsomiRs should be connected to the Ref_miRNA it is most likely to be the variant of. Some isomiRs can be variations of one or multiple Ref_miRNA.
    (Directly taken from Fig.1 in the Trends in Genetics miRNA Nomenclature paper)

@lpantano
Contributor Author

lpantano commented Jun 22, 2017 via email

@keilbeck

OK, just trying to get my head around this.

Is a ref_miRNA a genomic feature or a transcript feature?
I think isomiR is a transcript feature, right?

@ThomasDesvignes
Member

From my point of view:

  • ref_miRNA is a transcript feature: it's the mature reference product of a miRNA gene expression. By analogy, it's like the RefSeq transcript of a protein coding gene
  • isomiR is also a transcript feature. With the same analogy, it's a splicing variant of a protein coding gene.

@keilbeck

Brilliant.
We will add these

@nicoleruiz

SO:0002166 ref_miRNA and SO:0002167 isomiR have been added as children of miRNA.
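
With those two terms in place, a tool could check column 3 against the agreed vocabulary. The sketch below assumes a small allow-list made of the two accessions reported here plus `pre_miRNA`, which was mentioned earlier in the thread by name only; the function `unknown_types` and the example records are hypothetical.

```python
# Accessions as reported in this thread; pre_miRNA is listed by name only.
SO_ACCESSIONS = {"ref_miRNA": "SO:0002166", "isomiR": "SO:0002167"}
ALLOWED_TYPES = set(SO_ACCESSIONS) | {"pre_miRNA"}


def unknown_types(gff3_lines, allowed=ALLOWED_TYPES):
    """Return (line_number, type) for features whose type is not allowed."""
    problems = []
    for number, line in enumerate(gff3_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        columns = line.rstrip("\n").split("\t")
        ftype = columns[2] if len(columns) >= 3 else None
        if ftype not in allowed:
            problems.append((number, ftype))
    return problems


example = [
    "##gff-version 3",
    "chr1\texampleDB_v1\tpre_miRNA\t100\t180\t.\t+\t.\tID=pre1",
    "chr1\texampleDB_v1\tref_miRNA\t110\t131\t.\t+\t.\tID=ref1;Parent=pre1",
    "chr1\texampleDB_v1\tcanonical\t110\t131\t.\t+\t.\tID=x1;Parent=pre1",
]
problems = unknown_types(example)
print(problems)  # [(4, 'canonical')]
```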

@mhalushka
Collaborator

Just to probe this further, what if a more abundant isomiR with a change at the 5' end of a ref_miRNA is encountered? This would change the seed sequence and could change the genes to which the miRNA could bind. Would you still keep the original ref_miRNA? Would you consider updating it with a "version change" or similar method?

@ThomasDesvignes
Member

That's a good thought! From my end I still consider it as an isomiR of the ref-miRNA and I usually call it a "seed-shifted isomiR". It will theoretically have a different function/targets due to having a different seed but it still is an alternative product of the same gene/pre-miRNA.
Then, if the ref_miRNA has actually been annotated with the "wrong" seed, that would probably need to be fixed, I guess, so it all relies on the quality of the sequencing and analysis of the first dataset that leads to the annotation...

@lpantano
Contributor Author

lpantano commented Jun 22, 2017 via email

@ivlachos
Collaborator

ivlachos commented Jun 22, 2017

I like the approach of the ref miRNA and isomiRs. It's a convention, it's clear and extensible.
Many isomiRs can have different functionalities despite having the same seed (e.g. different localization), but targeting could certainly be drastically affected by a 5' shift. I agree that it's an isomiR compared to the reference, and it's our job to find out what changes and what remains the same, as we are doing for genes.
I also support avoiding "edited", since it brings ADAR to mind.

lpantano added a commit that referenced this issue Aug 9, 2017
@phillipeloher
Collaborator

phillipeloher commented Jul 16, 2018

From my perspective, a Ref_miRNA is an abstraction of a series of surrounding isomiRs. The problems with a ref_miRNA include: (a) different ref endpoints between databases (e.g. miRBase vs miRCarta) for the same locus; (b) as folks already mentioned, the isomiR seeds (e.g. isomiRs with different 5p starting points) won't necessarily match the reference; (c) the most abundant isomiR (from which many ref miRNA annotations were populated) can differ between tissue state and cell type.

In many cases, the isomiR sequence corresponding exactly to the ref_miRNA sequence (a 0|5p, 0|3p isomiR) is expressed.

Instead of making an isomiR a child of a ref_miRNA, if we made the Ref_miRNA an abstract_property of an isomiR sequence, it would place (I think rightfully) less emphasis on a somewhat arbitrary Ref_miRNA and more emphasis on the transcriptional products.

@lpantano
Contributor Author

Hi @phillipeloher,

Thanks for the comment.

We don't consider isomiR to be a child of miRNA_Ref but a child of the precursor.

It is true that the Variant attribute is relative to the miRNA_Ref, but I think this is the same problem as in any other database, where a reference has to be chosen somehow. I think the universal ID lets the data be mapped to any other database, and if we allow a cross-mapping tool in the API (miRBase to MirGeneDB, etc.), then we solve this problem to some extent. What do you think? I'll open an issue with this request.

It is true that we can remove miRNA_ref from there and set the variants to NA, meaning that with that database there are no variants. I'll open a discussion for this specific issue. Thanks! Great idea!
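
To make the hierarchy described above concrete, here is a minimal sketch that groups features under their precursor through the standard GFF3 `Parent` attribute, so that the reference and the isomiR sit side by side as siblings. The helper names, IDs and coordinates are invented for the example, and nothing here reflects how any existing tool stores the Variant attribute.

```python
from collections import defaultdict


def parse_attributes(text):
    """Turn 'ID=a;Parent=b' into {'ID': 'a', 'Parent': 'b'}."""
    pairs = (item.split("=", 1) for item in text.split(";") if "=" in item)
    return {key: value for key, value in pairs}


def children_by_parent(gff3_lines):
    """Map each Parent ID to the (type, ID) pairs of its direct children."""
    tree = defaultdict(list)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature line
        attrs = parse_attributes(columns[8])
        if "Parent" in attrs:
            tree[attrs["Parent"]].append((columns[2], attrs.get("ID")))
    return dict(tree)


example = [
    "chr1\texampleDB_v1\tpre_miRNA\t100\t180\t.\t+\t.\tID=pre1",
    "chr1\texampleDB_v1\tref_miRNA\t110\t131\t.\t+\t.\tID=ref1;Parent=pre1",
    "chr1\texampleDB_v1\tisomiR\t111\t131\t.\t+\t.\tID=iso1;Parent=pre1",
]
tree = children_by_parent(example)
print(tree)
```

Because both children point at `pre1`, dropping the reference line would leave the isomiR record intact, which is in the spirit of the last paragraph above.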

8 participants