Yeast-Project

Prediction of Saccharomyces cerevisiae fitness in different environments and cross-environment prediction of fitness using transfer learning.

Project Resources

Data

  • Genetic interactions (Costanzo): Data Dryad
  • Whole genome RNA-seq data for 1,000 isolates: SRA
  • Genetic marker data (.gvcf): here
  • Phenotype data: 35 growth conditions with 4 replicates each; fitness is colony size normalized against growth on standard YPD, the control medium
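
The normalization mentioned in the phenotype entry can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes fitness is the ratio of mean colony size in a condition to mean colony size on YPD, and the function name `normalized_fitness` and the colony sizes are hypothetical.

```python
from statistics import mean


def normalized_fitness(condition_sizes, ypd_sizes):
    """Mean colony size in a condition divided by mean colony size on YPD.

    Assumption: normalization is a simple ratio of replicate means.
    The dataset's real procedure may differ.
    """
    if not condition_sizes or not ypd_sizes:
        raise ValueError("need at least one replicate for each medium")
    control = mean(ypd_sizes)
    if control <= 0:
        raise ValueError("control colony size must be positive")
    return mean(condition_sizes) / control


# Hypothetical colony sizes (arbitrary units), 4 replicates per medium
example = normalized_fitness([50, 55, 45, 50], [100, 100, 100, 100])
print(example)
```

A value below 1 would indicate slower growth in the condition than on the control medium, under the ratio assumption stated above.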

Literature

Description of files and directories

| File/Directory | Description |
| --- | --- |
| Data | Datasets from the literature |
| Costanzo_S1/ | Data File S1. Raw genetic interaction datasets: Pair-wise interaction format |
| Costanzo_S2/ | Data File S2. Raw genetic interaction datasets: Matrix format |
| Peter_2018/ | Yeast diploid isolates' bi-allelic SNP and fitness data for 35 growth environments |
| S288C_reference_genome_R64-2-1_20150113/ | Reference yeast genome S288C files |
| All_genes_and_pathways_in_S._cerevisiae_S288c.txt | Yeast (S288C) genes and which pathways they belong to |
| All_pathways_S._cerevisiae_S288c.txt | Pathways and which yeast (S288C) genes are in them |
| Scripts | Code for various statistical and machine learning algorithms |
| 06_classify_SNPs_switchgrass.py | Peipei Wang's original code for classifying switchgrass SNPs |
| 06_classify_SNPs_yeast.ipynb | Jupyter notebook for development purposes |
| 06_classify_SNPs_yeast.py | Adapted from Peipei Wang's code to classify yeast SNPs |
| External_software | See the following section |
| Job_Submission_Scripts | SLURM job submission scripts for each prediction model |
| yeast_rrBLUP_results | Input and output files and figures for rrBLUP modelling |
| yeast_RF_results | Output files and figures for RF modelling |

Description of external software that I'm using or exploring

| Software | Description |
| --- | --- |
| fastPHASE | Executable for imputation of missing genotypes from population data |
| Genomic_prediction_in_Switchgrass/ | Peipei Wang's code for rrBLUP |
| GWAS_NN | Code for "Gene-Gene Interaction Detection with Deep Learning" |
| ML-Pipeline/ | Shiu Lab Machine Learning Pipeline (RF code) |
| phase.2.1.1.linux | PHASE source code (https://stephenslab.uchicago.edu/software.html) |
| tasseladmin-tassel-5-standalone-8b0f83692ccb | TASSEL5 for kinship and linkage disequilibrium analysis |

Google Docs with information about all scripts and their development are on Google Drive at Segura Abá_ShiuLab/Projects/Yeast GI Network/.

Git Tutorial