Matchmaking

Matchmaking measures genomic similarity between the molecular profiles in a cohort and evaluates that similarity based on shared labels. This repository contains an implementation of matchmaking as demonstrated in the present study, which performs a hold-one-out approach to compare cancer cell lines based on genomic similarity, with the goal of finding nearest neighbors that share therapeutic sensitivity.

To adapt your own data for matchmaking, follow the instructions under inputs/ to format and annotate molecular data, sample information, and labels for your cohort. Data sources used in the present study are in the inputs/datasources/ folder.

Documentation detailing outputs can be found in the outputs/ folder. All outputs committed to GitHub were generated using a subset of cell lines used in the present study (n=50) to ensure that output files are less than the GitHub file size limit.

Calculate genomic distances from models

The script calculate-distances.py is used to calculate the genomic similarity of a provided cohort using all models that were evaluated for the present study. Annotated data is further processed for individual models, as implemented in models.py and called in the main function of calculate-distances.py. A description for each model is provided in the models.py file and additional information can be found in the protocol.
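
As an illustration of the hold-one-out idea, the sketch below ranks every other sample by Jaccard distance over sets of altered genes. It is a minimal sketch, not the code in models.py: the function names, the sample names, and the gene sets are all hypothetical, and the choice of Jaccard distance is only suggested by model names such as jaccard-almanac-genes.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |intersection| / |union|."""
    union = a | b
    if not union:
        return 0.0  # convention chosen for this sketch: two empty profiles are identical
    return 1.0 - len(a & b) / len(union)


def nearest_neighbors(profiles):
    """For each held-out sample, rank all other samples by ascending distance."""
    ranked = {}
    for case, case_genes in profiles.items():
        others = [
            (jaccard_distance(case_genes, genes), name)
            for name, genes in profiles.items()
            if name != case  # hold the case sample out of its own ranking
        ]
        ranked[case] = [name for _, name in sorted(others)]
    return ranked


# Hypothetical cell lines and altered genes, for illustration only.
profiles = {
    "line_a": {"BRAF", "TP53"},
    "line_b": {"BRAF", "TP53", "PTEN"},
    "line_c": {"KRAS"},
}
neighbors = nearest_neighbors(profiles)
print(neighbors["line_a"])
```

Here line_b ranks ahead of line_c for line_a because it shares two of three genes, while line_c shares none.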

To run this script, edit the handle field of every key in config.default.json, or in a copy of it, so that each points to your data.

Usage

Required arguments:

    --config, -c    <string>  File handle to the JSON configuration file

Example:

python calculate-distances.py --config config.default.json
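
Before running the command above, it may help to confirm that every handle in the config points to an existing file. The sketch below assumes that each top-level key of the config maps to an object with a handle field, as the instructions above suggest. The function name missing_handles and the example key are hypothetical and are not part of this repository.

```python
import json
import os


def missing_handles(config):
    """Return the keys whose `handle` path is absent or does not exist on disk."""
    missing = []
    for key, entry in config.items():
        handle = entry.get("handle") if isinstance(entry, dict) else None
        if not handle or not os.path.exists(handle):
            missing.append(key)
    return missing


# Hypothetical config contents; the real keys are defined in config.default.json.
example = json.loads('{"variants": {"handle": "does/not/exist.txt"}}')
print(missing_handles(example))
```

To check a real config, load it with json.load and pass the resulting dictionary to the function.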

Evaluate genomic similarity models

The script evaluate-models.py compares the results of genomic distance and/or similarity models applied to a provided cohort, as generated by calculate-distances.py. A genomic comparison between two samples is evaluated as relevant or not relevant based on whether the two samples share a provided label. For more details, see the slides on evaluation metrics or the protocol.
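
To make the relevant / not relevant framing concrete, the sketch below computes precision at k for one case sample: the fraction of its k nearest neighbors that share a label with it. Precision at k is a standard ranking metric used here only as an example. The metrics this repository actually computes are defined in metrics.py, and the sample names and relevance values below are hypothetical.

```python
def precision_at_k(ranked_neighbors, shares_label, k):
    """Fraction of the k nearest neighbors that are relevant to the case sample."""
    top_k = ranked_neighbors[:k]
    if not top_k:
        return 0.0
    relevant = sum(1 for name in top_k if shares_label.get(name, False))
    return relevant / len(top_k)


# Hypothetical neighbors of one case sample, ordered from nearest to farthest.
ranked = ["line_b", "line_c", "line_d", "line_e"]
# Hypothetical relevance: True when the neighbor shares a label with the case sample.
shares_label = {"line_b": True, "line_c": False, "line_d": True, "line_e": False}

print(precision_at_k(ranked, shares_label, k=1))  # 1.0
print(precision_at_k(ranked, shares_label, k=2))  # 0.5
```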

To run this script, pass the arguments described below.

Usage

Required arguments:

    --labels, -l    <string>  A tab delimited file containing pairwise comparison of sample labels of interest
    --samples, -s   <string>  A tab delimited file containing which samples to use, listed in a column with a column name sample_name

Optional arguments:

    --distance, -d            <string>  A tab delimited file containing a column for the case sample name, comparison sample name, and at least one column containing distance values (0 = more similar) for sample comparisons
    --affinity, -a            <string>  A tab delimited file containing a column for the case sample name, comparison sample name, and at least one column containing similarity values (1 = more similar) for sample comparisons
    --features, -f            <string>  A tab delimited file containing pairwise comparison of features
    --output_directory, -o    <string>  Name of output directory to create, if it does not already exist, and place output files

Example:

python evaluate-models.py \
  -l inputs/pairwise-comparisons/samples.pairwise-labels.txt \
  -s inputs/formatted/samples.summary.txt \
  -d outputs/distances/jaccard-almanac-genes.stacked.txt \
  -d outputs/distances/jaccard-cgc-genes.stacked.txt \
  -f inputs/pairwise-comparisons/samples.pairwise-features.txt \
  -o outputs

Here, we compare two distance models: jaccard-almanac-genes and jaccard-cgc-genes. Currently, these two models are defined in models.py within the AlmanacGenes and CGC classes. Rather than passing two distance files, we could horizontally concatenate the two files and pass a single file with the --distance argument. Models passed with the --affinity argument will have their similarity values converted to distances by subtracting the observed value from 1.
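
The two operations described above can be sketched with the standard library alone. This is a sketch under stated assumptions, not the repository's code: the column names case and comparison, the helper names, and the values are hypothetical, so check the real column headers in the files under outputs/distances/ before adapting it.

```python
import csv
import io


def read_stacked(text, key_columns=("case", "comparison")):
    """Parse a tab-delimited table into {(case, comparison): {model: value}}."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        key = tuple(row[column] for column in key_columns)
        rows[key] = {
            model: float(value)
            for model, value in row.items()
            if model not in key_columns
        }
    return rows


def concatenate(left, right):
    """Horizontally join two tables on the sample pair, keeping shared pairs only."""
    return {key: {**left[key], **right[key]} for key in left if key in right}


def affinity_to_distance(table):
    """Convert similarity values to distances by subtracting each value from 1."""
    return {
        key: {model: 1.0 - value for model, value in values.items()}
        for key, values in table.items()
    }


# Hypothetical file contents; column names are assumptions for this sketch.
almanac = "case\tcomparison\tjaccard-almanac-genes\nline_a\tline_b\t0.25\n"
cgc = "case\tcomparison\tjaccard-cgc-genes\nline_a\tline_b\t0.5\n"

combined = concatenate(read_stacked(almanac), read_stacked(cgc))
print(combined[("line_a", "line_b")])
```

Joining on the sample pair rather than on row position avoids misaligned values when the two files list comparisons in a different order.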

Compare models

The script compare-models.py performs two tasks using the .pkl produced by evaluate-models.py:

1. Create a summary table of model performance, outputs/models.summary.txt
2. Perform pairwise comparisons of models to test whether they differ significantly from one another, outputs/models.pairwise-comparison.txt

The second task of this script can take up to tens of minutes to run.
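
This README does not state which statistical test compare-models.py uses for the pairwise comparison. As a generic illustration of how two models could be compared on the same samples, the sketch below runs a paired sign-flip permutation test. The technique is swapped in as an example and is not necessarily what the script does, and the function name and scores are hypothetical.

```python
import random


def paired_permutation_test(scores_a, scores_b, n_permutations=10000, seed=0):
    """Two-sided sign-flip permutation test on paired per-sample scores."""
    differences = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(differences) / len(differences))
    rng = random.Random(seed)  # seeded so that repeated runs give the same p-value
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in differences]
        if abs(sum(flipped) / len(flipped)) >= observed:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical per-sample scores for two models, for illustration only.
model_a = [0.9, 0.8, 0.85, 0.7, 0.95, 0.9, 0.8, 0.75]
model_b = [0.5, 0.4, 0.55, 0.3, 0.60, 0.5, 0.4, 0.35]
p_value = paired_permutation_test(model_a, model_b)
print(p_value)
```

The cost of a test like this grows with the number of model pairs multiplied by the number of permutations.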

Usage

Required arguments:

   --input, -i              <string> File handle to .pkl output from `evaluate-models.py`
   --output_directory, -o   <string> Output path to write produced files to

Example:

python compare-models.py --input outputs/models.evaluated.pkl --output_directory outputs/

Modifying code in this repository

Models can be revised or added by editing the models.py file and evaluation metrics can be revised by modifying the metrics.py file. Figures generated can be modified by revising the plots.py file.
