Analysis scripts and processed data for Ganesan et al.
DOI: 10.1261/rna.079416.122
- Raw sequencing data generated in this study have been deposited and are available at NCBI's Gene Expression Omnibus (GEO) under accession number GSE186795.
- The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD029577 and DOI 10.6019/PXD029577.
- Numerical data underlying the plots and the R scripts used to generate them are in the folder Figures or within the analysis scripts (below)
- Processed data (exported from Scaffold) include normalized iBAQs (abundance quantification), Protein Identification Probability, and Total Spectrum Count for each sample, available in Processed data/Mass spec
- Criteria for valid identification of protein in a sample:
- Protein Identification Probability >= 99%
- Total Spectrum Count >= 2
- normalized iBAQ > 0 for at least 2 biological replicates (if applicable)
- Analysis scripts/Mass spec analysis/Mass_spec_analysis_limma.Rmd consolidates data, identifies differentially recovered proteins, and plots Figure 1 and Figure S5.
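The identification criteria listed above can be expressed as a small filter. The following is a minimal Python sketch, not the repository's R code. The record keys (`prob`, `spectra`, `ibaq`) and the function names are hypothetical, and it assumes one reading of the criteria: a replicate passes when it meets all three thresholds, and a protein is valid when at least two replicates pass (or the single replicate passes, when only one exists).

```python
from typing import Dict, List


def replicate_passes(rep: Dict[str, float]) -> bool:
    """Check the per-replicate thresholds from the criteria list."""
    return (
        rep["prob"] >= 99.0      # Protein Identification Probability, in percent
        and rep["spectra"] >= 2  # Total Spectrum Count
        and rep["ibaq"] > 0      # normalized iBAQ
    )


def is_valid_identification(replicates: List[Dict[str, float]],
                            min_reps: int = 2) -> bool:
    """Assumed rule: at least `min_reps` replicates must pass.

    When fewer than `min_reps` replicates exist (the "if applicable" case),
    every available replicate must pass instead.
    """
    required = min(min_reps, len(replicates))
    passing = sum(replicate_passes(r) for r in replicates)
    return required > 0 and passing >= required


# Hypothetical example records for one protein in one sample group.
example = [
    {"prob": 100.0, "spectra": 5, "ibaq": 1.2},
    {"prob": 99.0, "spectra": 2, "ibaq": 0.1},
    {"prob": 50.0, "spectra": 1, "ibaq": 0.0},
]
print(is_valid_identification(example))
```

The actual filtering in Mass_spec_analysis_limma.Rmd may combine the thresholds differently; consult the script for the authoritative logic.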
- Transcriptome used for sequence alignment is available at https://github.com/Jacobson-Lab/yeast_transcriptome_v5
- Transcript abundance results are combined and available in the folder Processed data/RSEM output
- "expected_count" column from isoforms.results files
- "FPKM" column from isoforms.results files
- "TPM" column from isoforms.results files
- Analysis scripts are in Analysis scripts/Sequencing data analyses
- Differential expression (RNA-Seq) between yeast strains
- RNAseq_cormatrix_PCA.Rmd shows reproducibility between replicates and produces Figure S12.
- RNAseq_analysis_DESeq2.Rmd performs differential expression analysis and produces Figure S4.
- IP vs Total ribosomes (Ribo-Seq)
- Riboseq_cormatrix_PCA.Rmd shows reproducibility between replicates and produces Figure S11.
- Riboseq_analysis_DESeq2.Rmd performs differential expression and comparative analyses and produces Figure 5.
- Ribosome occupancy (Total Ribo-Seq / RNA-Seq)
- RNA-vs-RiboTotal.Rmd produces ribosome occupancy data used for Figure 5B.
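Ribosome occupancy is described above as the ratio of Total Ribo-Seq to RNA-Seq signal. The sketch below illustrates that ratio in Python under several assumptions that are not taken from the repository: inputs are already-normalized per-transcript abundances (for example TPM), the ratio is reported on a log2 scale, and transcripts below a hypothetical `min_rna` threshold are dropped to avoid unstable ratios. RNA-vs-RiboTotal.Rmd may compute occupancy differently.

```python
import math
from typing import Dict


def ribosome_occupancy(ribo: Dict[str, float], rna: Dict[str, float],
                       min_rna: float = 1.0) -> Dict[str, float]:
    """Return log2(Ribo-Seq / RNA-Seq) per transcript.

    Only transcripts with RNA abundance >= min_rna and a positive
    Ribo-Seq abundance are kept (an illustrative filter, not the authors').
    """
    occupancy = {}
    for transcript, rna_val in rna.items():
        ribo_val = ribo.get(transcript, 0.0)
        if rna_val >= min_rna and ribo_val > 0:
            occupancy[transcript] = math.log2(ribo_val / rna_val)
    return occupancy


# Hypothetical abundances for three transcripts.
ribo_tpm = {"YAL001C": 40.0, "YAL002W": 5.0, "YAL003W": 0.0}
rna_tpm = {"YAL001C": 10.0, "YAL002W": 0.5, "YAL003W": 20.0}
print(ribosome_occupancy(ribo_tpm, rna_tpm))
```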
- Initial processing of bam files was done using the riboWaltz package (https://github.com/LabTranslationalArchitectomics/riboWaltz). Either the reads_list or the reads_psite_list data tables were used in further analysis steps.
- Analysis scripts are in Analysis scripts/Metagene analyses
- Footprint length distribution
- rl_dist.R
- Figure 2
- Mapping of the 5’ and 3’ ends of footprints
- Calculated by riboWaltz's rends_heat function.
- Figure 3
- Distribution of footprint abundance across the coding region
- Metagene:
- binning.R
- Figure 4
- Individual gene:
- binning2.R
- Figure S8
- Ribo-Seq diagnostic
- 3-nucleotide periodicity:
- Calculated by riboWaltz's metaprofile_psite function.
- Figure S7
- Footprint's P-sites
- psite_region_frame_fraction.R
- Fraction in mRNA regions: Figure S10 A-B, Figure 7A
- Fraction of footprint's reading frames in an mRNA region: Figure S10 C-D, Figure 7B
- Unless otherwise indicated/shown, the output tables of replicate libraries were averaged.
- The results and scripts for plotting them are available in Figures/data and Figures/scripts, respectively.
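The P-site region and reading-frame fractions listed above can be illustrated with a short Python sketch. It is not a port of psite_region_frame_fraction.R. The input layout (tuples of P-site position, CDS start, and CDS stop in 1-based transcript coordinates) and the function names are assumptions, and the frame in UTRs is taken relative to the annotated start codon.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical read record: (psite, cds_start, cds_stop), 1-based coordinates.
Read = Tuple[int, int, int]


def psite_region(psite: int, cds_start: int, cds_stop: int) -> str:
    """Assign a P-site to the 5' UTR, CDS, or 3' UTR."""
    if psite < cds_start:
        return "5utr"
    if psite > cds_stop:
        return "3utr"
    return "cds"


def region_and_frame_fractions(
    reads: Iterable[Read],
) -> Tuple[Dict[str, float], Dict[str, Dict[int, float]]]:
    """Return (fraction of P-sites per region, frame fractions within each region)."""
    region_counts: Counter = Counter()
    frame_counts: Dict[str, Counter] = {}
    for psite, cds_start, cds_stop in reads:
        region = psite_region(psite, cds_start, cds_stop)
        frame = (psite - cds_start) % 3  # 0, 1, or 2 relative to the start codon
        region_counts[region] += 1
        frame_counts.setdefault(region, Counter())[frame] += 1
    total = sum(region_counts.values())
    region_frac = {r: n / total for r, n in region_counts.items()}
    frame_frac = {
        r: {f: n / region_counts[r] for f, n in frames.items()}
        for r, frames in frame_counts.items()
    }
    return region_frac, frame_frac


# Four hypothetical footprints on a transcript whose CDS spans positions 101-400.
example_reads = [(101, 101, 400), (104, 101, 400), (105, 101, 400), (50, 101, 400)]
print(region_and_frame_fractions(example_reads))
```

A strong enrichment of frame 0 within the CDS is the usual signature of 3-nucleotide periodicity in Ribo-Seq data.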
- Codon optimality value for each coding sequence is calculated based on https://github.com/mariodosreis/tai
- Analysis scripts/Codon optimality/codon_optimality_tAI_CDS.R calculates codon optimality scores used for Figure 5D
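For orientation, the tRNA adaptation index (tAI) of a coding sequence is the geometric mean of per-codon relative adaptiveness weights. The Python sketch below shows only that final averaging step. The weight values are made up for illustration, the function name is hypothetical, and codons absent from the weight table (such as stop codons) are simply skipped, which is an assumption. The repository's R script and the tai package remain the reference for how the weights themselves are derived.

```python
import math
from typing import Dict


def tai_like_score(cds: str, weights: Dict[str, float]) -> float:
    """Geometric mean of codon weights over a coding sequence.

    `weights` maps codons to positive relative adaptiveness values.
    Codons without a weight are ignored; trailing partial codons are dropped.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no codons with a weight were found")
    return math.exp(sum(logs) / len(logs))


# Illustrative weights only; real values come from tRNA gene copy numbers.
toy_weights = {"GCT": 1.0, "GCC": 0.25}
print(tai_like_score("GCTGCCTAA", toy_weights))
```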
- Analysis scripts and required files for processing are in Analysis scripts/A-site codon occupancy/
- Calculation of mean relative occupancy:
- write_rpl_by_size.R prepares riboWaltz's reads_psite_list data tables for the analysis and exports them as txt files.
- codon_window_count_by_codonpos_from_rpl.py counts the number of footprints whose A-site (or P- or E-site, as specified) is within a specified window (in codons) of a specified codon of interest.
- calc_REV_window_codonpos_v2.R calculates mean relative occupancy.
- automate_codon_window_count_by_codonpos_from_rpl.sh automates steps 2 and 3 (codon_window_count_by_codonpos_from_rpl.py and calc_REV_window_codonpos_v2.R) for multiple codons in a given codon list (such as codons_list.txt, which contains all 64 codons).
- automate_automate_codon_window_count_by_codonpos_from_rpl.sh submits the above script as multiple jobs to process multiple samples in parallel.
- Results of all samples for A-site occupancy in a 30-codon window are combined and provided as A-site_codon_occupancy_window30_bycodon_allsamples.txt.
- codon_occupancy_analysis.Rmd further analyzes the results and plots Figure 6 and Figure S9.
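The window-counting and relative-occupancy steps above can be sketched as follows. This is an illustrative Python example, not the code in codon_window_count_by_codonpos_from_rpl.py or calc_REV_window_codonpos_v2.R. The function names, the per-codon count input, the rule of skipping occurrences too close to either end of the coding sequence, and the normalization (count at each offset divided by the mean count over the window) are all assumptions.

```python
from typing import Dict, List


def window_counts(codons: List[str], asite_counts: List[int],
                  codon_of_interest: str, window: int) -> Dict[int, int]:
    """Sum A-site footprint counts at each codon offset around a codon of interest.

    `asite_counts[i]` is the number of footprints whose A-site lies on codon i.
    Occurrences whose window would extend past either end are skipped.
    """
    totals = {offset: 0 for offset in range(-window, window + 1)}
    for i, codon in enumerate(codons):
        if codon != codon_of_interest:
            continue
        if i - window < 0 or i + window >= len(codons):
            continue
        for offset in range(-window, window + 1):
            totals[offset] += asite_counts[i + offset]
    return totals


def relative_occupancy(totals: Dict[int, int]) -> Dict[int, float]:
    """Divide each offset's count by the mean count over the window (assumed normalization)."""
    mean = sum(totals.values()) / len(totals)
    if mean == 0:
        return {offset: 0.0 for offset in totals}
    return {offset: n / mean for offset, n in totals.items()}


# Hypothetical coding sequence and A-site counts per codon.
toy_codons = ["ATG", "GCT", "CGA", "GCT", "AAA", "CGA", "TTT", "TAA"]
toy_counts = [5, 1, 6, 2, 1, 9, 3, 0]
totals = window_counts(toy_codons, toy_counts, "CGA", window=1)
print(totals)
print(relative_occupancy(totals))
```

A value above 1 at offset 0 would indicate that footprints accumulate with the codon of interest in the A-site relative to its flanking codons.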