Could someone please explain how to get the annotation from GRCh38 2020A and convert it to a GRanges object?
library(ensembldb)
# Note: a hyphen is not valid in an R object name, so the result is stored as GRCh38_2020A
GRCh38_2020A <- ensDbFromGtf(gtf = "http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_32/gencode.v32.primary_assembly.annotation.gtf.gz",
                             path = 'C:/Users/danie/Desktop/Seurat Objects/snATAC seq preliminary analysis/ref.genome/',
                             organism = "Homo_sapiens",
                             genomeVersion = 'GRCh38',
                             version = 98)
Importing GTF file ... trying URL 'http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_32/gencode.v32.primary_assembly.annotation.gtf.gz'
Content type 'application/octet-stream' length 43107903 bytes (41.1 MB)
downloaded 41.1 MB
OK
Processing metadata ... OK
Processing genes ...
Attribute availability:
o gene_id ... OK
o gene_name ... OK
o entrezid ... Nope
o gene_biotype ... Nope
OK
Processing transcripts ...
Attribute availability:
o transcript_id ... OK
o gene_id ... OK
o source ... OK
OK
Processing exons ... OK
Processing chromosomes ... Fetch seqlengths from ensembl ... OK
Generating index ... Error: UNIQUE constraint failed: exon.exon_id
In addition: Warning messages:
1: In readLines(gtf, n = 10) : line 1 appears to contain an embedded nul
2: In readLines(gtf, n = 10) : line 2 appears to contain an embedded nul
3: In readLines(gtf, n = 10) : line 3 appears to contain an embedded nul
4: In readLines(gtf, n = 10) : line 6 appears to contain an embedded nul
5: In ensDbFromGRanges(GTF, outfile = outfile, path = path, organism = organism, :
I'm missing column(s): 'entrezid','gene_biotype'. The corresponding database column(s) will be empty!
6: In .getSeqlengthsFromMysqlFolder(organism = organism, ensembl = ensemblVersion, :
Could not determine length for all seqnames.
Why?
According to the error message, the exon identifiers in the GTF file are not unique, and there is not much we can do about that. In general, creating EnsDb objects/databases from GTF files is tricky because the GTF format is not well standardized. Creating databases from Ensembl GTF files should work; for the ones from Gencode I don't know.
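To check whether the GTF really reuses an exon identifier at more than one location, something like the following could be used. This is only a sketch: it assumes the `GenomicRanges` package is installed and that the GTF has been read into a GRanges with `type` and `exon_id` metadata columns (as `rtracklayer::import()` produces for GTF files). The helper name `conflicting_exon_ids` is made up for illustration and is not part of ensembldb, and it is an assumption that such identifiers are what triggers the `UNIQUE constraint failed` error.

```r
library(GenomicRanges)

# Hypothetical helper: return the exon_ids that occur at more than one
# distinct genomic location. `gtf` is a GRanges with metadata columns
# `type` and `exon_id`.
conflicting_exon_ids <- function(gtf) {
  # Keep only exon records
  exons <- gtf[!is.na(gtf$type) & gtf$type == "exon"]
  # Build one location key per exon record
  loc <- paste(as.character(seqnames(exons)), start(exons), end(exons),
               as.character(strand(exons)), sep = ":")
  # Collapse exons shared between transcripts (same id, same location)
  pairs <- unique(data.frame(exon_id = exons$exon_id, loc = loc,
                             stringsAsFactors = FALSE))
  # Any id still present more than once maps to several locations
  unique(pairs$exon_id[duplicated(pairs$exon_id)])
}
```

With a locally downloaded copy of the file, usage might look like `gtf <- rtracklayer::import("gencode.v32.primary_assembly.annotation.gtf.gz")` followed by `head(conflicting_exon_ids(gtf))`. The object returned by `rtracklayer::import()` is itself a GRanges, so if only the ranges are needed, no EnsDb has to be built from the GTF at all.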
Note that there are pre-built annotation resources for all Ensembl releases:
> library(AnnotationHub)
> ah <- AnnotationHub()
snapshotDate(): 2020-11-02
> query(ah, "EnsDb.Hsapiens.v98")
AnnotationHub with 1 record
# snapshotDate(): 2020-11-02
# names(): AH75011
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: EnsDb
# $rdatadateadded: 2019-05-02
# $title: Ensembl 98 EnsDb for Homo sapiens
# $description: Gene and protein annotations for Homo sapiens based on Ensem...
# $taxonomyid: 9606
# $genome: GRCh38
# $sourcetype: ensembl
# $sourceurl: http://www.ensembl.org
# $sourcesize: NA
# $tags: c("98", "AHEnsDbs", "Annotation", "EnsDb", "Ensembl", "Gene",
#   "Protein", "Transcript")
# retrieve record with 'object[["AH75011"]]'
Since Gencode 32 is based on Ensembl 98, would this work for you?
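To get from that AnnotationHub record to a GRanges object, a minimal sketch could look like this. It assumes the `AnnotationHub` and `ensembldb` packages are installed and that the record `AH75011` from the query output above is still available in the hub. The wrapper name `ensdb_genes` is hypothetical and only there for illustration.

```r
library(AnnotationHub)
library(ensembldb)

# Hypothetical wrapper: fetch a pre-built EnsDb from AnnotationHub and
# return its gene annotations as a GRanges object.
# `record_id` defaults to AH75011, the Ensembl 98 record for Homo sapiens.
ensdb_genes <- function(record_id = "AH75011") {
  ah <- AnnotationHub()
  edb <- ah[[record_id]]   # downloads the EnsDb on first use, then caches it
  genes(edb)               # GRanges with one range per gene
}
```

Calling `gr <- ensdb_genes()` should then give the gene ranges. `transcripts(edb)` and `exons(edb)` return GRanges in the same way if transcript or exon level ranges are needed instead.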