Upf1-ribosomes

Analysis scripts and processed data for Ganesan et al.
DOI: 10.1261/rna.079416.122

Raw sequencing data generated in this study have been deposited and are available at NCBI's Gene Expression Omnibus (GEO) under accession number GSE186795.
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD029577 and 10.6019/PXD029577.
Numerical data underlying the plots, and the R scripts used to generate them, are in the folder Figures or within the analysis scripts (below).

Processed data (exported from Scaffold) include normalized iBAQs (abundance quantification), Protein Identification Probability, and Total Spectrum Count for each sample, and are available in Processed data/Mass spec.
Criteria for valid identification of a protein in a sample:
- Protein Identification Probability >= 99%
- Total Spectrum Count >= 2
- normalized iBAQ > 0 for at least 2 biological replicates (if applicable)
Analysis scripts/Mass spec analysis/Mass_spec_analysis_limma.Rmd consolidates data, identifies differentially recovered proteins, and plots Figure 1 and Figure S5.
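
The validity criteria listed above can be illustrated with a minimal Python sketch. It is not taken from Mass_spec_analysis_limma.Rmd. The dictionary keys and function names are hypothetical, and it assumes one possible reading of the criteria: a replicate counts only if it meets all three thresholds, and "if applicable" means that a sample with a single replicate needs only that one replicate to pass.

```python
def replicate_passes(rep, min_probability=0.99, min_spectra=2):
    """Check one replicate against the three per-replicate thresholds.

    `rep` is a dict with hypothetical keys 'probability' (0-1 scale),
    'spectrum_count', and 'ibaq' (normalized iBAQ).
    """
    return (
        rep["probability"] >= min_probability
        and rep["spectrum_count"] >= min_spectra
        and rep["ibaq"] > 0
    )


def is_valid_identification(replicates, min_replicates=2):
    """Return True if enough replicates of a sample pass the thresholds.

    Assumption: when fewer than `min_replicates` replicates exist,
    all available replicates must pass.
    """
    required = min(min_replicates, len(replicates))
    if required == 0:
        return False
    passing = sum(replicate_passes(rep) for rep in replicates)
    return passing >= required
```

For example, a protein whose replicates are passed in as a list of three such dicts would be kept only if at least two of them satisfy all thresholds.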

Transcriptome used for sequence alignment is available at https://github.com/Jacobson-Lab/yeast_transcriptome_v5
Transcript abundance results are combined and available in the folder Processed data/RSEM output
- "expected_count" column from isoforms.results files
- "FPKM" column from isoforms.results files
- "TPM" column from isoforms.results files
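
As a rough illustration of how one column can be gathered from several RSEM isoforms.results files into a single table, here is a minimal Python sketch. It is not the script used to build the files in Processed data/RSEM output. The function name and the sample names are hypothetical, and it relies only on the standard tab-delimited isoforms.results layout with a transcript_id column.

```python
import csv
import io


def combine_column(sample_files, column="TPM"):
    """Collect one column from several isoforms.results files.

    `sample_files` maps a sample name to an open text handle of a
    tab-delimited RSEM isoforms.results file. Returns a nested dict:
    transcript_id -> {sample name -> value}.
    """
    combined = {}
    for sample, handle in sample_files.items():
        for row in csv.DictReader(handle, delimiter="\t"):
            combined.setdefault(row["transcript_id"], {})[sample] = float(row[column])
    return combined


# Usage with two tiny in-memory files (hypothetical sample names and values).
header = "transcript_id\tgene_id\texpected_count\tTPM\tFPKM\n"
files = {
    "sample_1": io.StringIO(header + "YAL001C\tYAL001C\t10.0\t5.5\t4.0\n"),
    "sample_2": io.StringIO(header + "YAL001C\tYAL001C\t20.0\t9.0\t7.5\n"),
}
tpm_table = combine_column(files, column="TPM")
```

Passing column="expected_count" or column="FPKM" would produce the other two combined tables in the same way.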

Analysis scripts are in Analysis scripts/Sequencing data analyses
Differential expression (RNA-Seq) between yeast strains
- RNAseq_cormatrix_PCA.Rmd shows reproducibility between replicates and produces Figure S12.
- RNAseq_analysis_DESeq2.Rmd performs differential expression analysis and produces Figure S4.
IP vs Total ribosomes (Ribo-Seq)
- Riboseq_cormatrix_PCA.Rmd shows reproducibility between replicates and produces Figure S11.
- Riboseq_analysis_DESeq2.Rmd performs differential expression analysis, comparative analysis, and produces Figure 5.
Ribosome occupancy (Total Ribo-Seq / RNA-Seq)
- RNA-vs-RiboTotal.Rmd produces ribosome occupancy data used for Figure 5B.
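
Ribosome occupancy is described above as the ratio of Total Ribo-Seq to RNA-Seq signal. The sketch below shows that ratio on a log2 scale for per-transcript abundances. It is only an illustration, not the method in RNA-vs-RiboTotal.Rmd: the use of TPM-like values, the log2 scale, the `min_rna` cutoff, and the function name are all assumptions.

```python
import math


def ribosome_occupancy(ribo_abundance, rna_abundance, min_rna=1.0):
    """Compute log2(Ribo-Seq / RNA-Seq) per transcript.

    Both arguments map transcript IDs to abundances (for example TPM).
    Transcripts with RNA abundance below `min_rna` (an assumed cutoff)
    or with no Ribo-Seq signal are skipped, because the ratio is
    unstable or undefined for them.
    """
    occupancy = {}
    for transcript, rna in rna_abundance.items():
        ribo = ribo_abundance.get(transcript, 0.0)
        if rna < min_rna or ribo <= 0:
            continue
        occupancy[transcript] = math.log2(ribo / rna)
    return occupancy
```

A transcript with twice as much footprint signal as mRNA signal gets a value of 1, and one with half as much gets -1.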

Initial processing of BAM files was done using the riboWaltz package (https://github.com/LabTranslationalArchitectomics/riboWaltz). Either the reads_list or the reads_psite_list data tables were used in further analysis steps.
Analysis scripts are in Analysis scripts/Metagene analyses
Footprint length distribution
- rl_dist.R
- Figure 2
Mapping of the 5’ and 3’ ends of footprints
- Calculated by riboWaltz's rends_heat function.
- Figure 3
Distribution of footprint abundance across the coding region
- Metagene:
  - binning.R
  - Figure 4
- Individual gene:
  - binning2.R
  - Figure S8
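
The idea of binning footprint abundance across the coding region can be sketched as follows. This is a generic illustration and not a port of binning.R or binning2.R: the number of bins, the use of 0-based P-site positions relative to the start codon, and the per-gene normalization are assumptions, and the function names are hypothetical.

```python
def bin_footprints(psite_positions, cds_length, n_bins=100):
    """Count footprints per bin for one gene.

    `psite_positions` are 0-based nucleotide positions relative to the
    first nucleotide of the start codon. Positions outside the coding
    region are ignored. Each position is scaled by CDS length so that
    genes of different lengths share the same bins.
    """
    counts = [0] * n_bins
    for pos in psite_positions:
        if 0 <= pos < cds_length:
            counts[pos * n_bins // cds_length] += 1
    return counts


def metagene_profile(genes, n_bins=100):
    """Average per-gene bin fractions over all genes with footprints.

    `genes` is an iterable of (psite_positions, cds_length) pairs.
    Each gene contributes fractions that sum to 1, so highly expressed
    genes do not dominate the profile.
    """
    totals = [0.0] * n_bins
    n_used = 0
    for positions, cds_length in genes:
        counts = bin_footprints(positions, cds_length, n_bins)
        gene_total = sum(counts)
        if gene_total == 0:
            continue
        for i, count in enumerate(counts):
            totals[i] += count / gene_total
        n_used += 1
    return [t / n_used for t in totals] if n_used else totals
```

Calling bin_footprints on a single gene gives an individual-gene profile, while metagene_profile aggregates many genes into one curve.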
Ribo-Seq diagnostic
- 3-nucleotide periodicity:
  - Calculated by riboWaltz's metaprofile_psite function.
  - Figure S7
- Footprint P-sites:
  - psite_region_frame_fraction.R
  - Fraction in mRNA regions: Figure S10 A-B, Figure 7A
  - Fraction of footprint's reading frames in an mRNA region: Figure S10 C-D, Figure 7B
Unless otherwise indicated or shown, the output tables of replicate libraries were averaged.
The results and scripts for plotting them are available in Figures/data and Figures/scripts, respectively.
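
The P-site fractions listed above (fraction per mRNA region, and fraction per reading frame within a region) can be illustrated with a small Python sketch. It is not a translation of psite_region_frame_fraction.R. The function names, the region labels, and the 0-based half-open coordinate convention are assumptions made for the example.

```python
from collections import Counter


def psite_region(psite, cds_start, cds_stop):
    """Classify a 0-based transcript position as 5' UTR, CDS, or 3' UTR.

    `cds_start` is the first nucleotide of the start codon and
    `cds_stop` is one past the last nucleotide of the CDS (half-open).
    """
    if psite < cds_start:
        return "5utr"
    if psite < cds_stop:
        return "cds"
    return "3utr"


def region_fractions(psites, cds_start, cds_stop):
    """Fraction of P-sites falling in each mRNA region."""
    counts = Counter(psite_region(p, cds_start, cds_stop) for p in psites)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {region: counts[region] / total for region in ("5utr", "cds", "3utr")}


def frame_fractions(psites, cds_start, cds_stop, region="cds"):
    """Fraction of P-sites in each reading frame within one region.

    The frame is measured relative to the start codon, so frame 0 is
    the annotated reading frame.
    """
    frames = Counter(
        (p - cds_start) % 3
        for p in psites
        if psite_region(p, cds_start, cds_stop) == region
    )
    total = sum(frames.values())
    if total == 0:
        return {}
    return {frame: frames[frame] / total for frame in range(3)}
```

In a good Ribo-Seq library, most P-sites are expected to fall in the CDS and in frame 0, which is why these fractions are useful as diagnostics.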

Codon optimality value for each coding sequence is calculated based on https://github.com/mariodosreis/tai
Analysis scripts/Codon optimality/codon_optimality_tAI_CDS.R calculates the codon optimality scores used for Figure 5D.

Analysis scripts and required files for processing are in Analysis scripts/A-site codon occupancy/
Calculation of mean relative occupancy:
1. write_rpl_by_size.R prepares riboWaltz's reads_psite_list data tables for the analysis and exports them as txt files.
2. codon_window_count_by_codonpos_from_rpl.py counts the number of footprints whose A-site (or P- or E-site, as specified) is within a specified window (in codons) of a specified codon of interest.
3. calc_REV_window_codonpos_v2.R calculates mean relative occupancy.
- automate_codon_window_count_by_codonpos_from_rpl.sh automates steps 2 and 3 for multiple codons in a given codon list (such as codons_list.txt, which contains all 64 codons)
- automate_automate_codon_window_count_by_codonpos_from_rpl.sh submits the above script as multiple jobs to process multiple samples in parallel.
- Results of all samples for A-site occupancy in a 30-codon window are combined and provided as A-site_codon_occupancy_window30_bycodon_allsamples.txt.
codon_occupancy_analysis.Rmd further analyzes the results and plots Figure 6 and Figure S9.
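
To make the idea of windowed relative occupancy concrete, here is a minimal Python sketch of one common way to compute it. It is not the logic of codon_window_count_by_codonpos_from_rpl.py or calc_REV_window_codonpos_v2.R, whose exact normalization is not described here. The assumptions are: per-codon A-site counts are already available for each coding sequence, each window is normalized by its own mean count, and occurrences with incomplete or empty windows are skipped. All function names are hypothetical.

```python
def relative_occupancy(codons, a_site_counts, codon_of_interest, window=30):
    """Relative occupancy profiles around each occurrence of a codon.

    `codons` is the list of codons of one CDS and `a_site_counts` the
    number of footprints whose A-site lies on each codon (same length).
    For every occurrence of `codon_of_interest` with a complete window
    of +/- `window` codons, counts are divided by the window mean.
    """
    profiles = []
    for i, codon in enumerate(codons):
        if codon != codon_of_interest:
            continue
        lo, hi = i - window, i + window + 1
        if lo < 0 or hi > len(codons):
            continue  # assumed choice: skip windows that run off the CDS
        segment = a_site_counts[lo:hi]
        mean = sum(segment) / len(segment)
        if mean == 0:
            continue  # no footprints in the window, ratio undefined
        profiles.append([count / mean for count in segment])
    return profiles


def mean_relative_occupancy(profiles):
    """Average the profiles position by position across occurrences."""
    if not profiles:
        return []
    return [sum(column) / len(profiles) for column in zip(*profiles)]
```

The middle element of the averaged profile corresponds to the codon of interest. Values above 1 there suggest that ribosomes dwell longer on that codon than on its neighbors.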
