Marker files conversion #164

shawntanzk · 2021-08-16T20:07:08Z

I have the NS-Forest marker files from Brian; however, they are not in the right format.

https://github.com/obophenotype/brain_data_standards_ontologies/tree/dosdp_based_pipeline/src/markers/raw

We need to convert the gene names into Ensembl IDs - @hkir-dev could you help write something that can do this?

Currently the files are all keyed by cluster name too, and these need to be matched to taxonomy IDs - happy to do this manually if it is not easily scriptable, but I think the cluster names match the pref label in the csv files perfectly, so it should be possible to write some R code that does this too.
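The cluster-name matching described above could be sketched roughly as follows. This is a minimal illustration in Python rather than R, and it assumes the taxonomy csv has a preferred-label column and an accession-ID column; the column names `pref_label` and `accession_id` and the example rows are hypothetical, not taken from the actual files.

```python
import csv
import io


def build_label_index(taxonomy_csv_text, label_col="pref_label", id_col="accession_id"):
    """Map each preferred label to its accession ID.

    The column names are assumptions; adjust them to the real csv header.
    """
    reader = csv.DictReader(io.StringIO(taxonomy_csv_text))
    return {row[label_col].strip(): row[id_col].strip() for row in reader}


def match_clusters(cluster_names, label_index):
    """Return (matched name -> ID, list of names with no exact label match)."""
    matched, unmatched = {}, []
    for name in cluster_names:
        key = name.strip()
        if key in label_index:
            matched[key] = label_index[key]
        else:
            unmatched.append(key)
    return matched, unmatched


# Made-up rows for illustration only.
example_csv = "accession_id,pref_label\nEX_001,Example cluster A\nEX_002,Example cluster B\n"
index = build_label_index(example_csv)
matched, unmatched = match_clusters(["Example cluster A", "Example cluster C"], index)
```

Anything that ends up in `unmatched` would still need a manual look, which keeps the "match perfectly" assumption checkable rather than silent.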

hkir-dev · 2021-08-17T07:59:32Z

Some markers (such as LOC100392984, FAM19A1) don't exist in our marker file (src/patterns/data/bds/ensmusg_data.tsv). Can we use one of the REST services from https://rest.ensembl.org/ to automatically search for these markers?
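A rough sketch of what such a lookup might look like, using the Ensembl REST `xrefs/symbol/:species/:symbol` endpoint. The helper names are made up for illustration, the species strings (e.g. `homo_sapiens`, `callithrix_jacchus`) are the caller's choice, and the public server answers against its current release, so this would not by itself pin the annotation version.

```python
import json
import urllib.parse
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"


def xref_symbol_url(species, symbol):
    """Build the xrefs/symbol URL for one species and gene symbol."""
    species_q = urllib.parse.quote(species)
    symbol_q = urllib.parse.quote(symbol)
    return f"{ENSEMBL_REST}/xrefs/symbol/{species_q}/{symbol_q}?content-type=application/json"


def gene_ids_from_response(payload):
    """Keep only gene-level IDs from the decoded JSON list the endpoint returns."""
    return [entry["id"] for entry in payload if entry.get("type") == "gene"]


def lookup_symbol(species, symbol, timeout=30):
    """Query the service; an empty list means the symbol was not found."""
    with urllib.request.urlopen(xref_symbol_url(species, symbol), timeout=timeout) as resp:
        return gene_ids_from_response(json.load(resp))
```

For example, `lookup_symbol("homo_sapiens", "FAM19A1")` would return a list of IDs, which may be empty or contain more than one entry, so the caller has to decide how to treat both cases.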

shawntanzk · 2021-08-17T08:04:35Z

Yeah, I think that's what Brian was asking about with the release version and all, see issue #163.
But yeah, as long as we get an Ensembl ID that can resolve, it should be OK.

shawntanzk · 2021-08-17T10:17:56Z

@BAevermann - I think we briefly talked about this yesterday; I was wondering which version of Ensembl we should look at to match your dataset?
I've tried a few versions and lots of symbols still come up as NA.

Thanks

shawntanzk · 2021-08-17T11:05:08Z

So I just learnt that a gene symbol can have multiple Ensembl IDs (I tried using EnsDb.Hsapiens.v86 & biomaRt to convert and kept getting extra lines), so I'm not sure how to handle the conversion from symbols to Ensembl IDs accurately.

We are currently basing it off our marker file (src/patterns/data/bds/ensmusg_data.tsv) but we aren't sure where this came from. Also, I'm not sure how the mouse marker file was generated. Was it you guys @jeremymiller? If so, would it be possible to help us with human and marmoset too?

@hkir-dev already got the accession IDs from the cluster names, so it's just converting gene symbols to Ensembl IDs now.
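One way to make the one-to-many problem visible before converting anything is to group the mapping by symbol and set aside every symbol that resolves to more than one ID. The sketch below assumes the mapping has already been exported as (symbol, Ensembl ID) pairs; the function names and the placeholder pairs are illustrative only.

```python
from collections import defaultdict


def group_ids_by_symbol(pairs):
    """Collect the set of distinct IDs seen for each symbol."""
    ids = defaultdict(set)
    for symbol, ensembl_id in pairs:
        ids[symbol].add(ensembl_id)
    return ids


def split_unique_and_ambiguous(pairs):
    """Return (symbol -> single ID, symbol -> sorted list of competing IDs)."""
    unique, ambiguous = {}, {}
    for symbol, id_set in group_ids_by_symbol(pairs).items():
        if len(id_set) == 1:
            unique[symbol] = next(iter(id_set))
        else:
            ambiguous[symbol] = sorted(id_set)
    return unique, ambiguous


# Placeholder pairs, not real symbols or IDs.
pairs = [("GENE_A", "ID_1"), ("GENE_B", "ID_2"), ("GENE_B", "ID_3"), ("GENE_A", "ID_1")]
unique, ambiguous = split_unique_and_ambiguous(pairs)
```

Only the `unique` mapping would be applied automatically; the `ambiguous` entries are the ones that need a decision (or the original reference file) to resolve.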

shawntanzk · 2021-08-17T13:36:25Z

I have downloaded the dataset from BioMart, hope it is correct:
1fb5bc0
Currently I don't think some of the LOC terms are in Ensembl, which is a bit of an issue.

shawntanzk · 2021-08-17T14:06:22Z

Items missing in the db:

For marmoset: {'LOC103788553', 'LOC108588895', 'GABRG1', 'CALN1', 'CCDC129', 'FAM19A1', 'LOC100401328', 'PCDH11X', 'LOC103789268', 'LOC103793569', 'LOC100392984', 'LOC100384959', 'SEPP1', 'LOC103791740', 'LOC103789461', 'LOC108588539', 'LOC108587679', 'LOC103788313', 'FAM179A', 'LOC108593203', 'MOBP', 'LOC103787232', 'FYB', 'LOC108588071', 'LOC103793609', 'LOC100403193', 'LOC108589948', 'LOC103795407', 'LOC100405319', 'LOC103795617', 'LOC108588801', 'LOC100406856', 'LOC100408486', 'MYO5B', 'LOC108588466', 'LOC103788721'}

For human: {'LOC105376081', 'LOC105369818', 'LOC105369890', 'LOC105378334', 'LOC100506497', 'LOC284825', 'LOC105376372', 'LOC105373642', 'LOC100128108', 'LOC101928964', 'LOC105374392', 'LOC101927281', 'ZFPM2_AS1', 'FAM19A1', 'LOC105370315', 'LOC100996671', 'LOC101927874', 'FER1L6_AS2', 'LOC101928114', 'LOC100132891', 'LOC105370019', 'LOC105373893', 'NPSR1_AS1', 'LOC105374971', 'FAM150B', 'LOC105374524', 'LOC105377183', 'LOC101929028', 'LOC105376917', 'LOC101927389', 'LOC105378486', 'LOC105373454', 'LOC105371310', 'LOC100128497', 'LOC105379168', 'LOC105370456', 'LOC101927459', 'LOC101927843', 'LOC105379064', 'LOC101927745', 'LOC101927668', 'LOC105377209', 'LOC101929680', 'LOC105370610', 'LOC105378031', 'LOC101928278', 'LOC105371832', 'SOX2_OT', 'LOC105376457', 'LOC101927286', 'RNF219_AS1', 'LOC101927078', 'LOC105379003', 'LOC105377703', 'LOC101927439', 'LOC105377862', 'LOC105376987', 'LOC105374973', 'LOC105373592', 'LOC101928842', 'LOC401134', 'LOC101926942', 'ADD3_AS1', 'LOC101927199', 'LOC105371663', 'LOC100507562'}
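For reference, lists like the two above can be produced with a plain set difference between the marker symbols and the symbols in the downloaded table. The sketch below also separates the `LOC` entries (NCBI placeholder symbols) from named genes, since the two groups probably need different handling; the function names and the tiny reference set are hypothetical.

```python
def missing_symbols(marker_symbols, reference_symbols):
    """Symbols used in the marker file that the reference table does not contain."""
    return set(marker_symbols) - set(reference_symbols)


def split_loc(symbols):
    """Separate LOC-prefixed placeholder symbols from named genes, both sorted."""
    loc = sorted(s for s in symbols if s.startswith("LOC"))
    named = sorted(s for s in symbols if not s.startswith("LOC"))
    return loc, named


# A few symbols from the lists above, checked against a made-up reference set.
markers = ["FAM19A1", "LOC100392984", "MOBP", "EXAMPLE_PRESENT"]
reference = {"EXAMPLE_PRESENT"}
missing = missing_symbols(markers, reference)
loc, named = split_loc(missing)
```

Sorting the output makes the report stable between runs, which is easier to diff than a raw set.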

shawntanzk · 2021-08-17T14:30:11Z

Oligo L3-6 OPALIN LRP4-AS1 exists in the NS-Forest marker file, but not anywhere else. @BAevermann - need some clarification here, thanks

jeremymiller · 2021-08-17T16:05:06Z

Hi Shawn. I think Brian is the best person to answer all of the above questions. BioMart is an appropriate place to do gene conversions, but is not the ONLY place, so Brian may need to weigh in. For all of the Allen Institute generated data sets (including human M1), you can download the reference transcriptome file here: https://portal.brain-map.org/atlases-and-data/rnaseq/reference-genome-files, which includes the NCBI Entrez Gene IDs, but not Ensembl IDs.

BAevermann · 2021-08-17T17:10:40Z

Hey Jeremy!

So 2 of the 3 links on the GTF page appear dead. The mouse SMART-seq one worked though. The file contains Ensembl and HAVANA (so GENCODE, I presume?) annotation; unfortunately, it doesn't seem to track the overall version of the annotation. Any chance that it's still referred to somewhere in the data processing scripts?

b.

jeremymiller · 2021-08-17T17:15:52Z

All three links work for me. Try hard-reloading your browser? The mouse SMART-seq file probably does not match what was done in the M1 mouse paper. Unfortunately, I don't know the answers to your other questions. Zizhen might know.

BAevermann · 2021-08-17T17:25:16Z

Ok. I'll try.

Anyway, we will need to set up infrastructure... perhaps just a metadata field ... that captures the annotation version used. We are moving onto the Human/Marmoset datasets, would Zizhen also know about them?

jeremymiller · 2021-08-17T17:27:13Z

You should ask Nik Jorstad about human and marmoset

shawntanzk · 2021-08-19T07:28:30Z

I noticed that antisense genes (e.g. FER1L6_AS2) use a - instead of a _ in the database (i.e. FER1L6-AS2).
I will change this accordingly in the NS-Forest file from Brian.
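The rename could be scripted along these lines. This is a minimal sketch that only rewrites a trailing `_AS<number>` suffix, which is the case noted here; whether other suffixes in the marker files (such as `_OT`) follow the same pattern is not confirmed, so they are deliberately left untouched. The function name is made up.

```python
import re

# Matches a trailing antisense suffix such as "_AS1" or "_AS2".
_ANTISENSE_SUFFIX = re.compile(r"_AS(\d+)$")


def normalise_antisense(symbol):
    """Rewrite e.g. 'FER1L6_AS2' as 'FER1L6-AS2'; leave other symbols unchanged."""
    return _ANTISENSE_SUFFIX.sub(r"-AS\1", symbol)


renamed = [normalise_antisense(s) for s in ["FER1L6_AS2", "ZFPM2_AS1", "MOBP", "SOX2_OT"]]
```

Anchoring the pattern at the end of the symbol avoids touching underscores that appear elsewhere in a name.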

dosumis · 2021-08-25T14:37:37Z

@jeremymiller - IIRC we got the actual Ensembl reference file used for the mouse analysis. Can we get the same for human and marmoset? Is Nik the right person to ask for these? This should be shared with @BAevermann too as ref gene names and IDs should come from the same genome build/Ensembl release.

jeremymiller · 2021-08-25T16:05:28Z

@dosumis: there is an email thread going around about this, which I'll add you to. I'm not sure of the status at the moment, and most of our references use NCBI gene IDs rather than Ensembl, so the conversion might still be an issue. Let's resolve this in the email thread.

shawntanzk · 2021-10-12T10:29:28Z

I think we have a plan for standardisation. I am going to close this ticket because the name is confusing. I will look through the tickets to see if there is an open one about using the same reference gene files; if not, I will write one up with the stage we are at now.

shawntanzk assigned hkir-dev and shawntanzk Aug 16, 2021

shawntanzk assigned BAevermann Aug 17, 2021

shawntanzk assigned dosumis Aug 17, 2021

shawntanzk mentioned this issue Aug 18, 2021 in Marker files for Marmoset and Human Taxonomies #162 (closed, 2 tasks)

shawntanzk closed this as completed Oct 12, 2021
