Upf1-ribosomes

Analysis scripts and processed data for Ganesan et al.
DOI: 10.1261/rna.079416.122

Data availability

  • Raw sequencing data generated in this study have been deposited and are available at NCBI's Gene Expression Omnibus (GEO) under accession number GSE186795.
  • The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD029577 and 10.6019/PXD029577.
  • Numerical data underlying the plots and the R scripts used to generate them are in the folder Figures or within the analysis scripts (described below).

Mass spectrometry

  • Processed data (exported from Scaffold) include normalized iBAQs (abundance quantification), Protein Identification Probability, and Total Spectrum Count for each sample, available in Processed data/Mass spec
  • Criteria for valid identification of a protein in a sample:
    • Protein Identification Probability >= 99%
    • Total Spectrum Count >= 2
    • normalized iBAQ > 0 for at least 2 biological replicates (if applicable)
  • Analysis scripts/Mass spec analysis/Mass_spec_analysis_limma.Rmd consolidates data, identifies differentially recovered proteins, and plots Figure 1 and Figure S5.
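
The identification criteria above can be sketched as a small filter. This is an illustration only, not the code in Mass_spec_analysis_limma.Rmd. It assumes one possible reading of the criteria: a replicate counts when it meets the probability and spectrum-count thresholds and has a normalized iBAQ above zero, and a protein is valid when at least two replicates count. The field names (`probability`, `spectrum_count`, `ibaq`) and the use of a 0-1 fraction for probability are hypothetical.

```python
def is_valid_identification(replicates, min_prob=0.99, min_spectra=2, min_reps=2):
    """Return True if a protein passes the identification criteria.

    replicates: list of dicts, one per biological replicate, with the
    hypothetical keys 'probability' (fraction, 0-1), 'spectrum_count',
    and 'ibaq' (normalized iBAQ).
    Assumption: all three per-replicate checks must hold for a replicate
    to count toward the min_reps requirement.
    """
    passing = [
        rep for rep in replicates
        if rep["probability"] >= min_prob
        and rep["spectrum_count"] >= min_spectra
        and rep["ibaq"] > 0
    ]
    return len(passing) >= min_reps


# Example: two of three replicates pass, so the protein is kept.
example = [
    {"probability": 1.00, "spectrum_count": 5, "ibaq": 1.2e6},
    {"probability": 0.99, "spectrum_count": 2, "ibaq": 3.4e5},
    {"probability": 0.80, "spectrum_count": 1, "ibaq": 0.0},
]
keep = is_valid_identification(example)
```

For samples without replicates (the "if applicable" case), calling the function with `min_reps=1` would apply only the per-sample thresholds.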

RNA-Seq and Ribo-Seq sequence alignment and transcript abundance quantification

  • Transcriptome used for sequence alignment is available at https://github.com/Jacobson-Lab/yeast_transcriptome_v5
  • Transcript abundance results are combined and available in the folder Processed data/RSEM output
    • "expected_count" column from isoforms.results files
    • "FPKM" column from isoforms.results files
    • "TPM" column from isoforms.results files
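
As a rough sketch of how one column can be gathered from several RSEM isoforms.results files into a single table: the function names and sample labels below are hypothetical, and the combined files in Processed data/RSEM output may have been built differently. The only facts assumed about RSEM are that isoforms.results is tab-delimited with a header row containing `transcript_id`, `expected_count`, `TPM`, and `FPKM`.

```python
import csv
import io


def read_rsem_column(handle, column):
    """Map transcript_id -> value for one column of an isoforms.results file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["transcript_id"]: float(row[column]) for row in reader}


def combine_samples(sample_handles, column):
    """Build {transcript_id: {sample_name: value}} across samples."""
    combined = {}
    for sample, handle in sample_handles.items():
        for transcript_id, value in read_rsem_column(handle, column).items():
            combined.setdefault(transcript_id, {})[sample] = value
    return combined


# In-memory stand-ins for two isoforms.results files (made-up numbers).
header = "transcript_id\tgene_id\tlength\teffective_length\texpected_count\tTPM\tFPKM\tIsoPct\n"
sample_a = io.StringIO(header + "YAL001C\tYAL001C\t3483\t3300.00\t120.00\t15.50\t12.10\t100.00\n")
sample_b = io.StringIO(header + "YAL001C\tYAL001C\t3483\t3300.00\t90.00\t11.25\t9.00\t100.00\n")

tpm_table = combine_samples({"sample_a": sample_a, "sample_b": sample_b}, "TPM")
```

With real files, each `io.StringIO` object would be replaced by an open file handle for the corresponding isoforms.results file.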

Analyses of changes in transcript abundance using DESeq2

  • Analysis scripts are in Analysis scripts/Sequencing data analyses
  • Differential expression (RNA-Seq) between yeast strains
    • RNAseq_cormatrix_PCA.Rmd shows reproducibility between replicates and produces Figure S12.
    • RNAseq_analysis_DESeq2.Rmd performs differential expression analysis and produces Figure S4.
  • IP vs Total ribosomes (Ribo-Seq)
    • Riboseq_cormatrix_PCA.Rmd shows reproducibility between replicates and produces Figure S11.
    • Riboseq_analysis_DESeq2.Rmd performs differential expression and comparative analyses and produces Figure 5.
  • Ribosome occupancy (Total Ribo-Seq / RNA-Seq)
    • RNA-vs-RiboTotal.Rmd produces ribosome occupancy data used for Figure 5B.
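
To make the "Total Ribo-Seq / RNA-Seq" ratio concrete, here is a minimal sketch that computes a log2 ratio of two abundance tables. It is a simplified stand-in, not the method in RNA-vs-RiboTotal.Rmd, which may rely on DESeq2. The function name, the pseudocount, and the input layout (dicts keyed by transcript ID) are all assumptions made for illustration.

```python
import math


def log2_occupancy(ribo_abundance, rna_abundance, pseudocount=1.0):
    """Return log2((ribo + pseudocount) / (rna + pseudocount)) per transcript.

    Only transcripts present in both inputs are reported. The pseudocount
    (an assumption, not taken from the repository) avoids division by zero
    for transcripts with no RNA-Seq signal.
    """
    shared = ribo_abundance.keys() & rna_abundance.keys()
    return {
        transcript_id: math.log2(
            (ribo_abundance[transcript_id] + pseudocount)
            / (rna_abundance[transcript_id] + pseudocount)
        )
        for transcript_id in shared
    }


# Made-up abundances for two transcripts; the third is missing from RNA-Seq.
ribo = {"YAL001C": 3.0, "YAL002W": 0.0, "YAL003W": 10.0}
rna = {"YAL001C": 1.0, "YAL002W": 3.0}
occupancy = log2_occupancy(ribo, rna)
```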

Metagene analyses

  • Initial processing of bam files was done using the riboWaltz package (https://github.com/LabTranslationalArchitectomics/riboWaltz). Either reads_list or reads_psite_list data tables were used in further analysis steps.
  • Analysis scripts are in Analysis scripts/Metagene analyses
  • Footprint length distribution
    • rl_dist.R
    • Figure 2
  • Mapping of the 5’ and 3’ ends of footprints
    • Calculated by riboWaltz's rends_heat function.
    • Figure 3
  • Distribution of footprint abundance across the coding region
    • Metagene:
      • binning.R
      • Figure 4
    • Individual gene:
      • binning2.R
      • Figure S8
  • Ribo-Seq diagnostic
    • 3-nucleotide periodicity:
      • Calculated by riboWaltz's metaprofile_psite function.
      • Figure S7
    • Footprint's P-sites
      • psite_region_frame_fraction.R
      • Fraction in mRNA regions: Figure S10 A-B, Figure 7A
      • Fraction of footprint's reading frames in an mRNA region: Figure S10 C-D, Figure 7B
  • Unless otherwise indicated/shown, the output tables of replicate libraries were averaged.
  • The results and scripts for plotting them are available in Figures/data and Figures/scripts, respectively.
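
The general idea behind distributing footprints across the coding region and averaging replicate tables can be sketched as follows. This is not a port of binning.R: the number of bins, the use of P-site offsets, the pooling across genes, and all function names are assumptions chosen to keep the example short.

```python
def bin_cds_positions(psite_positions, cds_length, n_bins=100):
    """Count P-sites per equal-width bin along one coding sequence.

    psite_positions: 0-based nucleotide offsets from the first nucleotide
    of the start codon. Positions outside the CDS are ignored.
    """
    counts = [0] * n_bins
    for pos in psite_positions:
        if 0 <= pos < cds_length:
            counts[pos * n_bins // cds_length] += 1
    return counts


def metagene_profile(genes, n_bins=100):
    """Pool binned counts over genes and return the fraction in each bin.

    genes: iterable of (psite_positions, cds_length) pairs.
    """
    totals = [0] * n_bins
    for positions, cds_length in genes:
        for i, count in enumerate(bin_cds_positions(positions, cds_length, n_bins)):
            totals[i] += count
    grand_total = sum(totals)
    return [t / grand_total if grand_total else 0.0 for t in totals]


def average_replicates(profiles):
    """Average equal-length profiles from replicate libraries bin by bin."""
    return [sum(values) / len(values) for values in zip(*profiles)]


# Two toy replicates, each with one 10-nt "CDS" split into 2 bins.
rep1 = metagene_profile([([0, 5, 9], 10)], n_bins=2)
rep2 = metagene_profile([([1, 2, 7], 10)], n_bins=2)
averaged = average_replicates([rep1, rep2])
```

Pooling raw counts lets highly expressed genes dominate the profile; normalizing each gene before pooling is an alternative design that weights genes equally.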

Codon optimality

  • Codon optimality value for each coding sequence is calculated based on https://github.com/mariodosreis/tai
  • Analysis scripts/Codon optimality/codon_optimality_tAI_CDS.R calculates codon optimality scores used for Figure 5D

A-site codon occupancy

  • Analysis scripts and required files for processing are in Analysis scripts/A-site codon occupancy/
  • Calculation of mean relative occupancy:
    1. write_rpl_by_size.R prepares riboWaltz's reads_psite_list data tables for the analysis and exports them as txt files.
    2. codon_window_count_by_codonpos_from_rpl.py counts the number of footprints whose A-site (or P- or E-site, as specified) is within a specified window (in codons) of a specified codon of interest.
    3. calc_REV_window_codonpos_v2.R calculates mean relative occupancy.
    • automate_codon_window_count_by_codonpos_from_rpl.sh automates steps 2 and 3 for multiple codons in a given codon list (such as codons_list.txt, which contains all 64 codons)
    • automate_automate_codon_window_count_by_codonpos_from_rpl.sh submits the above script as multiple jobs to process multiple samples in parallel.
    • Results of all samples for A-site occupancy in a 30-codon window are combined and provided as A-site_codon_occupancy_window30_bycodon_allsamples.txt.
  • codon_occupancy_analysis.Rmd further analyzes the results and plots Figure 6 and Figure S9.
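
The window counting and mean relative occupancy steps described above can be illustrated with a minimal sketch. It does not reproduce codon_window_count_by_codonpos_from_rpl.py or calc_REV_window_codonpos_v2.R. The assumptions here are that occupancy at each position is divided by the mean count over the window, that occurrences whose window extends past the coding sequence are skipped, and that windows with no footprints are dropped. All names are hypothetical.

```python
def relative_occupancy_profiles(codons, asite_counts, codon_of_interest, window=30):
    """Return one normalized profile per occurrence of codon_of_interest.

    codons: list of codons in one coding sequence.
    asite_counts: footprint counts whose A-site falls on each codon
    (same length as codons).
    Each profile spans 2 * window + 1 codons centered on the occurrence
    and is divided by the mean count within that window.
    """
    profiles = []
    for i, codon in enumerate(codons):
        if codon != codon_of_interest:
            continue
        lo, hi = i - window, i + window
        if lo < 0 or hi >= len(codons):
            continue  # window runs off the coding sequence
        segment = asite_counts[lo:hi + 1]
        window_mean = sum(segment) / len(segment)
        if window_mean == 0:
            continue  # no footprints in this window
        profiles.append([count / window_mean for count in segment])
    return profiles


def mean_relative_occupancy(profiles):
    """Average the normalized profiles position by position."""
    if not profiles:
        return []
    return [sum(column) / len(profiles) for column in zip(*profiles)]


# Toy coding sequence with a 1-codon window on each side.
toy_codons = ["ATG", "AAA", "GCT", "AAA", "TAA"]
toy_counts = [1, 2, 3, 2, 1]
gct_profile = mean_relative_occupancy(
    relative_occupancy_profiles(toy_codons, toy_counts, "GCT", window=1)
)
```

In this layout the middle element of the returned list corresponds to the codon of interest itself, so a value above 1 there indicates more A-site footprints on that codon than on its neighbors.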