Finding Haplotypic Signatures in Proteins

Code related to the "Finding Haplotypic Signatures in Proteins" publication: https://doi.org/10.1101/2022.11.21.517096

Requirements and Usage

Required software is Snakemake and Conda, remaining libraries are included in the provided Conda environment, created automatically by Snakemake.

Steps for reproducing results:

Download supplementary data from https://doi.org/10.6084/m9.figshare.21408117.v1
Clone this repository
Provide path to the downloaded files following instructions config.yaml
For generating mirrored plots, a rerun of MS2PIP is necessary. This step can be skipped to re-generate the remaining results.
Run the pipeline using snakemake -c<# cores> -p --use-conda

Additional files provided

Two additional files were derived from data made public by Spooner et al. [1]

protein_haplotypes_1.csv: List of protein haplotypes created using alleles with minor allele frequency (MAF) > 0.01
substitutions_list.tsv: List of substitutions found in the provided haplotypes, along with other pre-computed values

Supplementary data format

Description of supplementary data accessible at https://doi.org/10.6084/m9.figshare.21408117.v1:

SD1 and SD2: protein databases in the FASTA format:

>tag|accession|description
PROTEINSEQUENCE

SD3: CSV, list of all resulting PSMs with confidence metrics. The included columns are:

PSMId: artificial identifier of the peptide-spectrum match
SpectrumTitle: identifier of the matching spectrum, as given in the data set
Proteins: accession identifiers of the matching protein sequences
Position: positions of the peptide within the respective protein
Sequence: sequence of the peptide without post-translational modifications (PTMs)
Peptide: sequence of the peptide with possible PTMs
sample_ID: identifier of the sample as given in the data set
fragment_ID: identifier of the sample fragment as given in the data set
measured_rt: measured retention time (RT)
rt_Abs_error: absolute distance between measured and predicted RT
spectra_cos_similarity: cosine similarity between matching peaks in the observed and predicted spectrum
spectra_angular_similarity: angular similarity between matching peaks in the observed and predicted spectrum
score: PSM score obtained from Percolator
q-value: q-value obtained from Percolator
posterior_error_prob: posterior error probability obtained from Percolator

SD4: list of identified amino acid substitutions, tab-separated

ProteinVariant: amino acid substitution description
Peptide: list of peptide sequences where the substitution has been identified, separated by semicolon
PeptideType: type of the respected peptide
PSMId: IDs of PSMs where this substitution has been identified
Samples: IDs of respective samples

SD5: list of all variant PSMs and PepQuery validation, comma-separated

PSMId: artificial identifier of the peptide-spectrum match (MAtching with SD3)
PepQuery_class: classification by PepQuery: confident, better_ref_pep_with_mod, better_ref_pep or low_score
posterior_error_prob: PEP value assigned to the variant match by Percolator
xtandem_sequence: Matched variant sequence obtained from the X!Tandem search, including modifications (Unimod ID in the sequence)
PepQuery_sequence: Canonical peptide candidate given by PepQuery
PepQuery_mods: Modifications in the peptide candidate given by PepQuery
PepQueryPTM_sequence: Canonical peptide candidate including rare modifications given by PepQuery
PepQueryPTM_mods: List of modifications in the peptide candidate including rare modifications given by PepQuery
PepQuery_variant_score: PepQuery score for the variant match
PepQuery_ref_score: PepQuery score for the canonical match
PepQuery_PTM_score: PepQuery score for the matching peptide including rare modifications

References

[1] Spooner, W., McLaren, W., Slidel, T., Finch, D.K., Butler, R., Campbell, J., Eghobamien, L., Rider, D., Kiefer, C.M., Robinson, M.J., et al. (2018) Haplosaurus computes protein haplotypes for use in precision drug design. Nat. Commun., 9, 4128.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Finding Haplotypic Signatures in Proteins

Requirements and Usage

Additional files provided

Supplementary data format

References

Files

README.md

Latest commit

History

README.md

File metadata and controls

Finding Haplotypic Signatures in Proteins

Requirements and Usage

Additional files provided

Supplementary data format

References