Prediction of Saccharomyces cerevisiae fitness in different environments and cross-environment prediction of fitness using transfer learning.
Data
- Genetic interactions (Costanzo): Data Dryad
- Whole genome RNA-seq data for 1,000 isolates: SRA
- Genetic marker data (.gvcf): here
- Phenotype data: 35 conditions with 4 replicates each; fitness is colony size normalized to growth on the YPD standard control medium
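
The fitness normalization described in the last item can be sketched as follows. This is a minimal illustration, not the pipeline used in this repository. It assumes fitness is the mean colony size across replicates in a condition divided by the mean colony size of the same isolate on YPD. The function name, the nested-dictionary input layout, the isolate and condition labels, and the colony sizes are all hypothetical toy values.

```python
from statistics import mean


def normalized_fitness(colony_sizes, control="YPD"):
    """Return {isolate: {condition: fitness}} from raw colony sizes.

    colony_sizes is assumed to look like
    {isolate: {condition: [replicate colony sizes]}} and every isolate
    must have measurements for the control condition.
    """
    fitness = {}
    for isolate, by_condition in colony_sizes.items():
        # Mean colony size on the control medium for this isolate.
        control_mean = mean(by_condition[control])
        # Ratio of each condition's replicate mean to the control mean.
        fitness[isolate] = {
            condition: mean(sizes) / control_mean
            for condition, sizes in by_condition.items()
            if condition != control
        }
    return fitness


# Toy input: two hypothetical isolates, four replicates per condition.
toy_sizes = {
    "isolate_A": {"YPD": [100, 100, 100, 100], "condition_X": [50, 50, 50, 50]},
    "isolate_B": {"YPD": [80, 80, 80, 80], "condition_X": [120, 120, 120, 120]},
}

toy_fitness = normalized_fitness(toy_sizes)
print(toy_fitness)
```

A value below 1 means the isolate grew worse in that condition than on YPD, and a value above 1 means it grew better. Dividing by each isolate's own control mean removes baseline growth differences between isolates before any model is trained.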
Literature
- Peter et al. 2018: https://doi.org/10.1038/s41586-018-0030-5 (genetic markers, phenotype, whole genome RNA-seq)
- Costanzo et al. 2016: https://doi.org/10.1126/science.aaf1420 (genetic interactions)
File/Directory | Description |
---|---|
Data | Datasets from the literature |
Costanzo_S1/ | Data File S1. Raw genetic interaction datasets: Pair-wise interaction format |
Costanzo_S2/ | Data File S2. Raw genetic interaction datasets: Matrix format |
Peter_2018/ | Yeast diploid isolates' bi-allelic SNP and fitness data for 35 growth environments |
S288C_reference_genome_R64-2-1_20150113/ | Reference yeast genome S288C files |
All_genes_and_pathways_in_S._cerevisiae_S288c.txt | Yeast (S288C) genes and which pathways they belong to |
All_pathways_S._cerevisiae_S288c.txt | Pathways and which yeast (S288C) genes are in them |
Scripts | Code for various statistical and machine learning algorithms |
06_classify_SNPs_switchgrass.py | Peipei Wang's original code for classifying switchgrass SNPs |
06_classify_SNPs_yeast.ipynb | Jupyter notebook for development purposes |
06_classify_SNPs_yeast.py | Adapted from Peipei Wang's code to classify yeast SNPs |
External_software | See the following section |
Job_Submission_Scripts | Contains SLURM job submission scripts for each prediction model |
yeast_rrBLUP_results | Input and output files and figures for rrBLUP modelling |
yeast_RF_results | Output files and figures for RF modelling |
Software | Description |
---|---|
fastPHASE | Executable for imputation of missing genotypes from population data |
Genomic_prediction_in_Switchgrass/ | Peipei Wang's code for rrBLUP |
GWAS_NN | Code for "Gene-Gene Interaction Detection with Deep Learning" |
ML-Pipeline/ | Shiu Lab Machine Learning Pipeline (RF code) |
phase.2.1.1.linux | PHASE source code https://stephenslab.uchicago.edu/software.html |
tasseladmin-tassel-5-standalone-8b0f83692ccb | TASSEL5 for kinship and linkage disequilibrium analysis |
Google Docs with information about all scripts and their development:
The Google Drive path to the file is `Segura Abá_ShiuLab/Projects/Yeast GI Network/`.