GFF3::attributes #14

Open
lpantano opened this issue Jun 15, 2017 · 6 comments

@lpantano
Contributor

cc: @lpantano @gurgese @ThomasDesvignes @mhalushka @mlhack @keilbeck @BastianFromm @ivlachos @TJU-CMC

I'd like to discuss the last column (attributes), since it will probably need more time, and I'd like a chance to get your thoughts before everybody leaves for holidays, trips, conferences, etc. :)

  • ID: unique ID based on the sequence, like the one MINTmap has for tRNA: prefix-22-BZBZOS4Y1 (https://github.com/TJU-CMC-Org/MINTmap/tree/master/MINTplates). A good way to cross-map between different naming schemes or future name changes.
  • Name: miRNA name used in the database
  • Parent: hairpin precursor name
  • Alias or Dbxref: names from other databases, such as miRBase or miRgeneDB
  • Expression: raw counts separated by commas
  • Normalized_expression: values normalized by the tool, if any. Same format as Expression.
  • Filter: PASS or REJECT (this allows keeping all the data and selecting only the features you really consider valid)
  • Variant: CIGAR-like string showing the differences from the ref_miRNA
  • Target: to add other genomic positions where the sequence also maps?
  • Seed: nucleotides 2-8 of the sequence
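
To make the proposal concrete, here is a minimal sketch of how these attributes could be written to and read back from a GFF3 attributes column. It assumes the usual GFF3 conventions (`tag=value` pairs joined by `;`, multiple values joined by `,`, reserved characters percent-encoded). The function names `seed`, `format_attributes` and `parse_attributes` are hypothetical, and the record values are illustrative only; the ID is the placeholder from the list above, not one computed from the sequence.

```python
from urllib.parse import quote, unquote


def seed(sequence: str) -> str:
    """Return nucleotides 2-8 (1-based, inclusive) of the sequence."""
    return sequence[1:8]


def format_attributes(attrs: dict) -> str:
    """Serialize a dict into a GFF3-style attributes column.

    List or tuple values are joined with ','; reserved characters
    such as ';', '=', ',' and '%' are percent-encoded.
    """
    parts = []
    for key, value in attrs.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        encoded = ",".join(quote(str(v), safe=":") for v in values)
        parts.append(f"{key}={encoded}")
    return ";".join(parts)


def parse_attributes(column: str) -> dict:
    """Parse an attributes column into a dict of tag -> list of strings."""
    attrs = {}
    for field in column.strip().split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        attrs[key] = [unquote(v) for v in value.split(",")]
    return attrs


# Illustrative record only: the names and counts are made up for the example.
read = "TGAGGTAGTAGGTTGTATAGTT"
record = {
    "ID": "prefix-22-BZBZOS4Y1",   # placeholder ID copied from the proposal
    "Name": "hsa-let-7a-5p",
    "Parent": "hsa-let-7a-1",
    "Expression": [10, 25, 0],      # raw counts, one per sample
    "Filter": "PASS",
    "Seed": seed(read),
}
column = format_attributes(record)
print(column)
print(parse_attributes(column)["Expression"])
```

Note that everything comes back as a list of strings after parsing, so a tool reading Expression would still need to convert the counts to numbers itself.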

Any other attribute you normally use or would like to have?

@ivlachos
Collaborator

I would also include the database version or the actual reference sequence, since these might change over time.
I would avoid the "Target" keyword for this: since we're talking about miRNAs, it might be a confusing term.
What do you think?

@lpantano
Contributor Author

lpantano commented Jun 15, 2017 via email

@Bastami
Collaborator

Bastami commented Jun 23, 2017

Hi!
I think Dbxref is important, but I noticed that neither miRBase nor miRgeneDB is listed in the authoritative list of databases (ftp://ftp.geneontology.org/pub/go/doc/GO.xrf_abbs), which contains the DBTAGs and the URL transformation rules that can be used to fetch objects given their IDs.
Do you think we should add them?
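
As a rough illustration of what such a registration would enable, here is a sketch that turns a `DBTAG:ID` Dbxref value into a URL using a per-database template. The templates, the `[example_id]` placeholder convention as used here, the accession, and the function name `resolve_dbxref` are all assumptions made for the example; they are not actual entries from GO.xrf_abbs.

```python
# Hypothetical URL templates: placeholders, NOT real entries from GO.xrf_abbs.
URL_TEMPLATES = {
    "miRBase": "https://example.org/mirbase/[example_id]",
    "miRgeneDB": "https://example.org/mirgenedb/[example_id]",
}


def resolve_dbxref(dbxref, templates=URL_TEMPLATES):
    """Map a 'DBTAG:ID' string to a URL, or None if the DBTAG is unknown."""
    dbtag, sep, object_id = dbxref.partition(":")
    if not sep or not dbtag or not object_id:
        raise ValueError(f"expected DBTAG:ID, got {dbxref!r}")
    template = templates.get(dbtag)
    if template is None:
        return None  # database not registered, nothing to resolve
    return template.replace("[example_id]", object_id)


# Placeholder accession, used only to show the substitution.
print(resolve_dbxref("miRBase:MIMAT0000000"))
print(resolve_dbxref("SomeOtherDB:123"))
```

Without a registered DBTAG, a consumer of the file has no shared rule for the lookup, which is the gap described above.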

@lpantano
Contributor Author

That's a good point. @keilbeck, do you know who can do this, or what the requirements are? Thanks!

@gurgese
Collaborator

gurgese commented Jun 28, 2017

@lpantano
In my opinion, an additional attribute should be included to hold high-level labels that classify how the read maps to the mature sequence.
This field can be useful for filtering particular classes of reads of interest.
For example, all reads mapped with an insertion at the 5p end of the mature form can be easily identified if a label is assigned to them.

If others like this idea, we can discuss in more depth how it could be supported in the new format.

@ThomasDesvignes
Member

All those comments are great!
One additional piece of information that I think needs to be recorded is the aligner used. For isomiRs that have no edited sites or untemplated additions, the aligner doesn't really matter, but as soon as we start looking into untemplated nucleotides and edited miRNAs, different aligners can give different answers and return different CIGAR strings.

lpantano added a commit that referenced this issue Aug 9, 2017