A brief video summary of this project can be found here on YouTube.
Contains relevant csv files
- GDSC2_fitted_dose_response_25Feb20.csv - contains the raw IC50 data for 135,242 drug / cell line combinations; data can be downloaded here
- cell_list.csv - contains GDSC-generated information about the included cell lines, including tissue and TCGA classification; data can be downloaded here
- drug_data.csv - contains GDSC-generated information about the included drugs, including drug pathways and targets; data can be found here
- cell.csv - results of the analysis from `GDSC_Project.ipynb` for the cell data, including two dimensions from PCA and t-SNE for plotting, the cluster identities from k-means clustering using the full, PCA-transformed, and low-rank approximations of the data, and the mean lnIC50 per cell line
- cell_lrm.csv - low-rank approximation of the cell matrix
- drug.csv - results of the analysis from `GDSC_Project.ipynb` for the drug data, including two dimensions from PCA and t-SNE for plotting, the cluster identities from k-means clustering using the full, PCA-transformed, and low-rank approximations of the data, and the mean lnIC50 per compound
- drug_lrm.csv - low-rank approximation of the drug matrix
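As a rough illustration of how the long-format dose-response file relates to the matrices above, the sketch below pivots a toy table into a cell line by drug matrix, builds a mask of observed entries, and mean-centers each drug column. The column names `CELL_LINE_NAME`, `DRUG_NAME`, and `LN_IC50`, the toy values, and the choice to center per drug are assumptions made for this example; they are not taken from `utils.py`.

```python
import numpy as np
import pandas as pd

# Toy long-format table standing in for the dose-response CSV.
# Column names and values are assumptions for illustration only.
long_df = pd.DataFrame({
    "CELL_LINE_NAME": ["A", "A", "B", "B", "C"],
    "DRUG_NAME": ["d1", "d2", "d1", "d2", "d1"],
    "LN_IC50": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# One row per cell line, one column per drug; untested pairs become NaN.
wide = long_df.pivot_table(
    index="CELL_LINE_NAME", columns="DRUG_NAME", values="LN_IC50", aggfunc="mean"
)

# Boolean mask of observed (non-missing) entries.
mask = wide.notna().to_numpy()

# Subtract each drug's mean lnIC50; pandas skips NaN when computing the mean.
centered = wide - wide.mean(axis=0)

print(centered)
```

Missing combinations stay as NaN after centering, so the mask is what tells downstream steps which entries were actually measured.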
Contains relevant scripts and notebooks
- GDSC_Project.R - contains
- GDSC_Project.ipynb - contains
- kmeans.py
  - `find_kmeans`: find an optimal number of clusters via the elbow method and fit k-means with this many clusters
  - `plot_kmeans`: plot the SSE vs. number of clusters and the elbow point for cell and drug data
- lowrank.py
  - `fit_svd`: fit an SVD model iteratively for a given rank r
- pca.py
  - `find_pc`: returns the eigenvalues and eigenvectors of the covariance matrix
  - `project_pca`: transforms the original matrix via projection using a specified number of principal components
  - `plot_pca`: plots the variance by principal component and the cumulative variance by principal component
- utils.py
  - `import_data`: loads the GDSC data and pre-processes it into a wide matrix
  - `process_data`: produces mean-centered data and masks
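The elbow method mentioned for `kmeans.py` can be sketched as follows. This is a minimal NumPy-only version, not the repository's code: `kmeans_sse` and `elbow_k` are hypothetical helper names, and the elbow is taken to be the point farthest from the straight line joining the first and last points of the SSE curve, which is one common heuristic among several.

```python
import numpy as np

def kmeans_sse(X, k, n_iter=50, seed=0):
    """Run plain Lloyd's k-means and return (labels, SSE)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    sse = dists[np.arange(len(X)), labels].sum()
    return labels, sse

def elbow_k(ks, sse):
    """Pick the k farthest from the chord joining the first and last SSE points."""
    ks = np.asarray(ks, dtype=float)
    sse = np.asarray(sse, dtype=float)
    # Rescale both axes to [0, 1] so SSE units do not dominate the distance.
    x = (ks - ks[0]) / (ks[-1] - ks[0])
    y = (sse - sse.min()) / (sse.max() - sse.min())
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    dist = np.abs(dy * (x - x[0]) - dx * (y - y[0])) / np.hypot(dx, dy)
    return int(ks[dist.argmax()])

# Synthetic data (three separated blobs) purely for demonstration.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(30, 2)) for c in (0.0, 5.0, 10.0)])
ks = list(range(1, 8))
sse = [kmeans_sse(X, k)[1] for k in ks]
best_k = elbow_k(ks, sse)
labels, _ = kmeans_sse(X, best_k)
print(best_k)
```

Because k-means depends on its random initialization, the SSE curve and the chosen k can vary with the seed; running several restarts per k would make the curve smoother.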
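One plausible reading of "fit SVD model iteratively for a given rank r" is iterative imputation: fill the missing entries, take a truncated SVD, overwrite the missing entries with the rank-r estimate, and repeat until the estimate stops changing. The sketch below implements that reading under stated assumptions; `iterative_svd` is a hypothetical name and may differ from what `fit_svd` in `lowrank.py` actually does.

```python
import numpy as np

def iterative_svd(X, mask, r, n_iter=100, tol=1e-6):
    """Rank-r approximation of X using only entries where mask is True."""
    # Start with zeros in the missing entries (reasonable for mean-centered data).
    filled = np.where(mask, X, 0.0)
    prev = filled
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Keep the r largest singular values and their vectors.
        low_rank = (U[:, :r] * s[:r]) @ Vt[:r]
        # Observed values stay fixed; missing ones take the current estimate.
        filled = np.where(mask, X, low_rank)
        # Stop once the estimate has essentially stopped changing.
        if np.linalg.norm(low_rank - prev) <= tol * max(np.linalg.norm(prev), 1.0):
            break
        prev = low_rank
    return low_rank

# Synthetic rank-2 matrix with roughly 20% of entries hidden, for demonstration.
rng = np.random.default_rng(0)
true = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))
mask = rng.random(true.shape) > 0.2
X = np.where(mask, true, np.nan)
approx = iterative_svd(X, mask, r=2)
print(approx.shape)
```

The mask argument plays the same role as the masks produced during pre-processing: it marks which drug / cell line combinations were actually measured.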
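The PCA steps described for `pca.py` (eigendecomposition of the covariance matrix, projection onto the leading components, and variance per component) can be illustrated with a short NumPy sketch. The function names `pca_eig` and `project` are hypothetical stand-ins, and the sketch assumes a complete matrix with no missing values.

```python
import numpy as np

def pca_eig(X):
    """Eigenvalues and eigenvectors of the covariance matrix, largest first."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]

def project(X, evecs, n_components):
    """Project mean-centered X onto the first n_components eigenvectors."""
    Xc = X - X.mean(axis=0)
    return Xc @ evecs[:, :n_components]

# Tiny example: points lying on a line, so one component carries all the variance.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
evals, evecs = pca_eig(X)
scores = project(X, evecs, n_components=1)

# Share of variance per component and its running total (the two curves one would plot).
explained = evals / evals.sum()
cumulative = np.cumsum(explained)
print(explained, cumulative)
```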