Yousuf A. Khan edited this page Jan 16, 2020 · 18 revisions

Evidence for a novel overlapping coding sequence in POLG initiated at a CUG start codon

Introduction

The code presented here was used to discover and characterize a novel overlapping ORF in the mRNA of POLG (ORF-Y) and a second ORF further upstream (ORF-Z) that is likely regulatory, modulating translation between ORF-Y and the main ORF. The work is of interest for two main reasons:

  1. Overlapping ORFs of this length are extremely rare in the human genome. ORF-Y in POLG may represent the longest bona fide overlapping ORF in the human genome.

  2. POLG is of significant clinical relevance. Many mutations linked to mitochondrial disease have been mapped across its mRNA. Given the extent of overlap with ORF-Y, it is possible that some of these mutations act through a combined effect on the protein products of both the main ORF and ORF-Y.
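To make the idea of an overlapping ORF concrete, the following is a minimal Python 3 sketch that scans all three forward reading frames for ORFs beginning at a chosen start codon (CTG in DNA notation for CUG). It is an illustration only and not the code used in the manuscript: the function name `find_orfs`, its parameters, and the toy sequence are all assumptions introduced here, and the toy sequence is not the POLG mRNA.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, start_codons=("CTG",), min_codons=1):
    """Return (start, end, frame) tuples for ORFs in the three forward frames.

    Hypothetical helper for illustration. An ORF runs from the first
    in-frame start codon to the next in-frame stop codon (stop included
    in `end`). Coordinates are 0-based, end-exclusive.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA notation
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in start_codons:
                start = i  # remember the first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # resume searching after the stop codon
    return orfs

# Toy sequence (made up): an ATG ORF in frame 0 and a CTG ORF in frame 1.
toy = "ATGCCTGAAGCATAGCTAA"
main_orfs = find_orfs(toy, start_codons=("ATG",))
ctg_orfs = find_orfs(toy, start_codons=("CTG",))
print(main_orfs, ctg_orfs)
```

In this toy case the two ORFs share nucleotides while being read in different frames, which is the sense in which ORF-Y overlaps the main POLG ORF.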

Breakdown of Figures

The following sections walk readers through how the analysis behind each figure was conducted, together with the accompanying code. I have included trimmed code that helps recreate each of the main figures, as well as the original Python 3 code that was used to generate most figures in the manuscript.

Figure 1

Panel a - Schematic of POLG mRNA: Generated in PowerPoint/Adobe Illustrator using https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?g=ENSG00000140521;r=15:89316305-89334795;t=ENST00000268124 as a guide

Panel b - PhyloCSF analysis of POLG: PhyloCSF is well documented and more information can be found at https://github.com/mlin/PhyloCSF/wiki

Figure 2

The video below gives a brief walkthrough of how to recreate the Synplots in Figure 2 (click on the image). The accompanying code is available on the GitHub page.

Figure 2

Figure 3

Panel a - Global ribosome profiling data for POLG were mined from GWIPS-viz. An in-browser view of a similar region can be found here (https://gwips.ucc.ie/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr15%3A89333747%2D89333820&hgsid=101372_aEB2npsxOKHmIw45L71UMoAE5pnZ)

Panel b - Global ribosome profiling data for the POLG ORF-Y stop codon were mined from Trips-Viz. An in-browser view of a similar region can be found here (https://trips.ucc.ie/homo_sapiens/Gencode_v25/interactive_plot/?files=&ribo_studies=18,20,21,23,24,27,28,29,31,32,33,34,35,38,39,42,43,44,45,56,58,60,62,63,64,67,89,90,99,101,102,103,107,113,122,124,130,134,138,141,144,150,152,153,165,171,172,176,177,178,179,181,183,190,191,192,201,204,212,216,217,222,225,229,234,&tran=ENST00000268124&minread=25&maxread=150&user_dir=fiveprime&ambig=F&cov=F&lg=T&nuc=T&rs=0&crd=F&short=qw0)

Panel c - See the video below

Figure 3

Figure 4

See the video below

Figure 4

Figure 5

The following PRIDE datasets (PXD000561, PXD002967) were downloaded and converted to mzML, and the spectra were searched using multiple search engines in a high-confidence OpenMS workflow that has already been described in Wright et al. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4895710/) and Weisser et al. (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5703597/).

Figure 6

Panel a: A schematic of ORF-Z in the POLG mRNA (https://uswest.ensembl.org/Homo_sapiens/Transcript/Summary?g=ENSG00000140521;r=15:89316305-89334795;t=ENST00000268124)

Panel b: Ribosome profiling mined from GWIPS-viz. A similar view of the mRNA presented in the figure is shown here (https://gwips.ucc.ie/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr15%3A89333739%2D89333903&hgsid=101372_aEB2npsxOKHmIw45L71UMoAE5pnZ)

Panel c: Similar to Figure 4a, a sequence motif was created for the ATG context of ORF-Z. The code for this figure is uploaded.
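As a rough illustration of what goes into a sequence motif for a start codon context, the sketch below computes per-position base frequencies from a set of equal-length aligned sequences. This is a generic approach and not necessarily how the uploaded code works: the function name `position_frequency_matrix` is hypothetical, and the four context sequences are invented placeholders rather than real ORF-Z orthologs.

```python
from collections import Counter

def position_frequency_matrix(sequences, alphabet="ACGT"):
    """Return a list with one dict per alignment column mapping base -> frequency.

    Hypothetical helper for illustration. All sequences must have the same
    length. Characters outside `alphabet` (for example gaps) are ignored
    when computing the frequencies for a column.
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")

    matrix = []
    for column in zip(*sequences):  # iterate over alignment columns
        counts = Counter(base.upper() for base in column)
        total = sum(counts[b] for b in alphabet)
        matrix.append({b: (counts[b] / total if total else 0.0) for b in alphabet})
    return matrix

# Made-up aligned contexts: 6 nt upstream, the ATG, and 1 nt downstream.
contexts = [
    "GCCACCATGG",
    "GCCGCCATGG",
    "ACCACCATGA",
    "GCCACCATGG",
]
pfm = position_frequency_matrix(contexts)
for position, freqs in enumerate(pfm):
    print(position, freqs)
```

A matrix like this can then be passed to a logo-drawing tool of your choice; positions where one base dominates, such as the ATG itself, appear as tall letters in the resulting motif.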

Figure 7

This figure is simply a model, generated with a combination of PowerPoint and Adobe Illustrator.

Extended figures and tables

All the original code (POLG Representative Organisms Bioinformatic Analysis.ipynb) used to generate both the main figures and the extended figures, annotated where necessary, can also be found in the files portion.

If there are any additional questions or comments, please do not hesitate to email me at [email protected], [email protected], or [email protected]