
Extract 3'UTR, 5'UTR, CDS, Promoter, Genes, Introns, Exons from GTF files


ach-in/gencode_regions

gencode_regions

Extract 3'UTR, 5'UTR, CDS, Promoter, Genes from GTF files.
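This README does not say how the promoter region is defined. A common convention is a fixed window upstream of the transcription start site (TSS), and that window has to be computed with the strand in mind. The sketch below assumes BED-style 0-based, half-open coordinates. The function name `promoter_interval` and the 1000 bp upstream / 0 bp downstream defaults are hypothetical choices, not values taken from this repository.

```python
def promoter_interval(chrom, tx_start, tx_end, strand, upstream=1000, downstream=0):
    """Return (chrom, start, end, strand) for a promoter window around the TSS.

    Coordinates are BED-style: 0-based, half-open. The upstream/downstream
    defaults are hypothetical and not taken from this repository.
    """
    if strand == "+":
        # The TSS is the first base of the transcript (tx_start).
        start, end = tx_start - upstream, tx_start + downstream
    elif strand == "-":
        # Transcription runs right to left: the TSS is the base just before tx_end,
        # and "upstream" means higher coordinates.
        start, end = tx_end - downstream, tx_end + upstream
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    # Clip at the chromosome start; clipping at the end would need chromosome sizes.
    return chrom, max(0, start), end, strand
```

For a transcript spanning 5000 to 8000, the plus-strand window is 4000 to 5000, while on the minus strand it is 8000 to 9000 because the TSS sits at the right-hand end.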

Data

If you only need the final output, the BED files are hosted on riboraptor, organized by genome build and GTF version.

Using Python

Dependencies

Notebooks

The gzipped BED files produced by the notebooks are in the data directory.
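A minimal sketch of loading one of these gzipped BED files with only the Python standard library, assuming the files follow the standard BED column order (chrom, start, end, name, score, strand). The helper name `read_bed` is hypothetical.

```python
import gzip


def read_bed(path):
    """Yield (chrom, start, end, name, score, strand) tuples from a BED file.

    Files ending in .gz are decompressed on the fly; missing optional
    columns are filled with ".".
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip blank lines and track/browser/comment header lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.rstrip("\n").split("\t")
            chrom, start, end = fields[0], int(fields[1]), int(fields[2])
            name = fields[3] if len(fields) > 3 else "."
            score = fields[4] if len(fields) > 4 else "."
            strand = fields[5] if len(fields) > 5 else "."
            yield chrom, start, end, name, score, strand
```

Usage: `for chrom, start, end, name, score, strand in read_bed(path): ...`, where `path` points to one of the `.bed.gz` files in the data directory.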

Using R

Dependencies

Run

./create_regions_from_gencode.R <path_to_GFF/GTF> <path_to_output_dir>

This creates exons.bed, 3UTR.bed, 5UTR.bed, genes.bed and cds.bed in <path_to_output_dir>.
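The R script itself is not reproduced here. As a rough illustration of the conversion such a tool performs, the sketch below turns one GFF3 feature line into a BED6 line. GFF3 coordinates are 1-based and fully closed while BED is 0-based and half-open, so only the start shifts by one. The function names and the use of `gene_name` for the BED name column are assumptions, not the script's actual behavior.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column):
    """Parse a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for item in column.strip().split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        attrs[key] = unquote(value)  # GFF3 percent-encodes reserved characters
    return attrs


def gff3_line_to_bed(line, name_key="gene_name"):
    """Convert one 9-column GFF3 feature line to a BED6 line.

    name_key is a hypothetical choice; falls back to the ID attribute.
    """
    fields = line.rstrip("\n").split("\t")
    seqid, start, end, strand = fields[0], fields[3], fields[4], fields[6]
    attrs = parse_gff3_attributes(fields[8])
    name = attrs.get(name_key, attrs.get("ID", "."))
    # 1-based closed [start, end] -> 0-based half-open [start - 1, end)
    return "\t".join([seqid, str(int(start) - 1), end, name, "0", strand])
```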

Example

  • Download the GFF3 annotation (GRCh38, v25, comprehensive, CHR) from gencodegenes.org:
   wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/gencode.v25.annotation.gff3.gz \
   && gunzip gencode.v25.annotation.gff3.gz
  • Create regions:
   ./create_regions_from_gencode.R gencode.v25.annotation.gff3 /path/to/GRCh38/annotation

First exons, Last exons

We use the GenePred format to simplify the process: each transcript occupies a single line that lists all of its exon start and end coordinates.
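As a sketch of why this is simpler, assuming the standard 10-column genePred layout (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds; extended genePred only appends columns after these): a transcript's exons come straight off one line, already in 0-based half-open coordinates like BED, and introns are simply the gaps between consecutive exons. `parse_genepred` and `introns` are hypothetical helpers, not functions from this repository.

```python
from typing import List, NamedTuple, Tuple


class GenePred(NamedTuple):
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exons: List[Tuple[int, int]]  # (start, end) pairs, 0-based half-open, ascending


def parse_genepred(line: str) -> GenePred:
    """Parse the first 10 genePred columns; any extended columns are ignored."""
    f = line.rstrip("\n").split("\t")
    # exonStarts/exonEnds are comma-separated and usually end with a trailing comma.
    starts = [int(x) for x in f[8].split(",") if x]
    ends = [int(x) for x in f[9].split(",") if x]
    if not (len(starts) == len(ends) == int(f[7])):
        raise ValueError(f"exonCount does not match exon lists in {f[0]}")
    return GenePred(f[0], f[1], f[2], int(f[3]), int(f[4]), int(f[5]), int(f[6]),
                    list(zip(starts, ends)))


def introns(record: GenePred) -> List[Tuple[int, int]]:
    """Introns are the gaps between consecutive exons (same coordinate convention)."""
    return [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(record.exons, record.exons[1:])]
```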

  • Download gtfToGenePred (one of the UCSC Genome Browser command-line utilities)

  • Convert the GTF annotation (gencode.v25.annotation.gtf, not the GFF3 used above) to GenePred:

    gtfToGenePred gencode.v25.annotation.gtf gencode.v25.annotation.genepred
    
  • Extract first exons:

    python genepred_to_bed.py --first_exon gencode.v25.annotation.genepred
    
  • Extract last exons:

    python genepred_to_bed.py --last_exon gencode.v25.annotation.genepred
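The internals of `genepred_to_bed.py` are not shown in this README. The key detail for any first/last exon extraction is strand: genePred stores exons in ascending genomic order, so on the minus strand the transcript's first exon is the genomically last one. A minimal sketch of that logic, using a hypothetical helper name:

```python
def first_and_last_exon(exons, strand):
    """Return (first_exon, last_exon) in transcription order.

    exons: (start, end) pairs as stored in genePred (ascending genomic order).
    On the minus strand transcription runs from high to low coordinates,
    so the genomically last exon is the transcript's first exon.
    """
    if not exons:
        raise ValueError("transcript has no exons")
    ordered = sorted(exons)
    if strand == "+":
        return ordered[0], ordered[-1]
    if strand == "-":
        return ordered[-1], ordered[0]
    raise ValueError(f"unknown strand: {strand!r}")
```

For a single-exon transcript, the first and last exon are the same interval.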
    

Confused about exons and UTRs?

This should be helpful:

[Figure: exons and UTRs. Source: Wikipedia]

Or perhaps this one:

[Figure: exons and UTRs. Source: Biostar]
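In short, the exons of a protein-coding transcript together make up its 5'UTR, CDS and 3'UTR, and which UTR is which depends on strand. A minimal sketch, assuming genePred-style 0-based half-open coordinates where `cds_start == cds_end` marks a non-coding transcript, that cuts each exon at the CDS boundaries. `split_utr_cds` is a hypothetical helper, not part of this repository.

```python
def split_utr_cds(exons, cds_start, cds_end, strand):
    """Cut exons at the CDS bounds into 5'UTR, CDS and 3'UTR pieces.

    exons: (start, end) pairs, 0-based half-open (genePred/BED style).
    cds_start == cds_end is taken to mean "non-coding" (no CDS, no UTRs).
    """
    left, cds, right = [], [], []  # pieces left of, inside and right of the CDS
    if cds_start == cds_end:
        return {"5UTR": left, "CDS": cds, "3UTR": right}
    for start, end in sorted(exons):
        if start < cds_start:
            left.append((start, min(end, cds_start)))
        if start < cds_end and end > cds_start:
            cds.append((max(start, cds_start), min(end, cds_end)))
        if end > cds_end:
            right.append((max(start, cds_end), end))
    if strand == "+":
        return {"5UTR": left, "CDS": cds, "3UTR": right}
    if strand == "-":
        # On the minus strand the 5' end is the right-hand (higher) side.
        return {"5UTR": right, "CDS": cds, "3UTR": left}
    raise ValueError(f"unknown strand: {strand!r}")
```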
