From 752f7795e740bd43e42dc9561e38e5e08a285afd Mon Sep 17 00:00:00 2001
From: Sam Crowl <51138150+srcrowl@users.noreply.github.com>
Date: Tue, 11 Jun 2024 10:35:51 -0400
Subject: [PATCH] Update README.md

---
 README.md | 61 ++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 45 insertions(+), 16 deletions(-)

diff --git a/README.md b/README.md
index ab4cf91..2d6bdb5 100644
--- a/README.md
+++ b/README.md
@@ -5,11 +5,16 @@ PTM-POSE is an easily implementable tool to project PTM sites onto splice event
 ## Running PTM-POSE
 To run PTM-POSE, you first need to process your data such that each row corresponds to a unique splice event with the genomic location of that splice event (chromosome, strand, and the bounds of the spliced region). Strand can be indicated using either '+'/'-' or 1/-1. If desired, you can also provide a delta PSI and significance value which will be included in the final PTM dataframe. Any additional columns will be kept. At a minimum, the dataframe should look something like this (optional but recommended parameters indicated):
 
-| event_id (optional) | chromosome | strand | region_start | region_end | dPSI (optional) | significance (optional) |
-|---------------------|------------|--------|--------------|------------|-----------------|-------------------------|
-| first_event         | 1          | -      | 9797555      | 9797612    | 0.362           | 0.032                   |
+| event_id (optional) | Gene name (recommended) | chromosome | strand | region_start | region_end | dPSI (optional) | significance (optional) |
+|---------------------|-------------------------|------------|--------|--------------|------------|-----------------|-------------------------|
+| first_event         | CSTN1                   | 1          | -      | 9797555      | 9797612    | 0.362           | 0.032                   |
 
-You currently need to also download the `ptm_coordinates` dataframe generated by [ExonPTMapper](https://github.com/NaegleLab/ExonPTMapper), which can be downloaded ---------.
-Once the data is in the correct format, simply run the project_ptms_onto_splice_events() function. By default, PTM-POSE assumes the provided coordinates are in hg38 coordinates, but you can use older coordinate systems with the `coordinate_type` parameter.
+You currently also need to download the `ptm_coordinates` dataframe generated by [ExonPTMapper](https://github.com/NaegleLab/ExonPTMapper), which can be downloaded using `pose_config.download_ptm_coordinates()`. The download usually takes a little under a minute. To avoid repeating it in the future, call `pose_config.download_ptm_coordinates(save = True)`, which saves the file within the local package and takes about 60MB of space.
+
+PTM-POSE allows you to assess two potential impacts of splicing on PTMs: __differential inclusion__ (a PTM is lost or gained as a result of a splice event) and __altered flanking sequences__ around a PTM, which can potentially alter protein interactions.
+
+## Differentially Included PTMs
+Once the data is in the correct format, simply run the `project_ptms_onto_splice_events()` function. By default, PTM-POSE assumes the provided coordinates are in hg38 coordinates, but you can use older coordinate systems with the `coordinate_type` parameter. If you have saved `ptm_coordinates` locally, you can set the `ptm_coordinates` parameter to None.
 
 ```python
 from ptm_pose import project
@@ -19,20 +24,44 @@ my_splice_data_annotated, spliced_ptms = project.project_ptms_onto_splice_events
     region_start_col = 'region_start',
     region_end_col = 'region_end',
     event_id_col = 'event_id',
+    gene_col = 'Gene name',
     dPSI_col='dPSI',
     coordinate_type = 'hg19')
 ```
 
+## Altered Flanking Sequences
+
+In addition to the previously mentioned columns, we will need to know the location of the flanking exonic regions next to the spliced region.
+Make sure your dataframe contains the following information prior to running flanking sequence analysis:
+| event_id (optional) | Gene name (recommended) | chromosome | strand | region_start | region_end | first_flank_start | first_flank_end | second_flank_start | second_flank_end | dPSI (optional) | significance (optional) |
+|---------------------|-------------------------|------------|--------|--------------|------------|-------------------|-----------------|--------------------|------------------|-----------------|-------------------------|
+| first_event         | CSTN1                   | 1          | -      | 9797555      | 9797612    | 9687655           | 9688446         | 9811223            | 9811745          | 0.362           | 0.032                   |
+
+
+Then, as with differentially included PTMs, you only need to run the `get_flanking_changes_from_splice_data()` function:
+
+```python
+from ptm_pose import project
+
+altered_flanks = project.get_flanking_changes_from_splice_data(my_splice_data, ptm_coordinates,
+    chromosome_col = 'chromosome',
+    strand_col = 'strand',
+    region_start_col = 'region_start',
+    region_end_col = 'region_end',
+    first_flank_start_col = 'first_flank_start',
+    first_flank_end_col = 'first_flank_end',
+    second_flank_start_col = 'second_flank_start',
+    second_flank_end_col = 'second_flank_end',
+    event_id_col = 'event_id',
+    gene_col = 'Gene name',
+    dPSI_col='dPSI',
+    coordinate_type = 'hg19')
+```
+
+## Downstream Analysis
+
+PTM-POSE also provides functions in the `annotate` module for annotating the above outputs with functional information from various databases: PhosphoSitePlus, RegPhos, PTMcode, PTMInt, ELM, and DEPOD. You can then identify PTMs with specific functions, interactions, etc. with the `analyze` module. See an example on a real dataset [here](Examples/ESRP1_knockdown).
+
+
+## Have questions?
 
-This will produce two dataframes:
-1. Original splice data with additional columns indicating the number and which PTMs were found associated with that splice event.
-'PTMs column denotes the UniProtKB accession, residue, site number, and modification type for PTM identified.
-
-| event_id (if provided) | chromosome | strand | region_start | region_end | dPSI (if provided) | significance (if provided) | PTMs | Number of PTMs Affected |
-|---------------------|------------|--------|--------------|------------|-----------------|-------------------------|-------------------------------|-------------------------|
-| first_event | 1 | - | 9797555 | 9797612 | 0.362 | 0.032 | O94985_N515 (N-Glycosylation) | 1 |
-
-2. New dataframe where each row is a unique event-PTM pair. This is useful for downstream analyses of the important PTM changes that are occuring in your dataset, and many functions provided for further annotation and analyses of these PTMs (see rest of documentation for examples)
-
-| event_id (if provided) | UniProtKB Accession | Residue | Modifications | PTM Info | dPSI (if provided) | significance (if provided) |
-|----------|---------------------|---------|---------------|----------|--------------------|----------------------------|
-| first_event | O94985 | N515 | N-Glycosylation | O94985_N515 (N-Glycosylation) | 0.362 | 0.032 |
+Please reach out to Sam Crowl (sc8wf@virginia.edu) if you have questions or suggestions about new analysis functions that you would like to see implemented. We hope to continue to expand the analysis that can be easily performed with this package as time goes on, and welcome any feedback.
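
The input requirements described in this README update (the required location columns, strand given as '+'/'-' or 1/-1, and four extra flank columns for flanking sequence analysis) can be checked before calling PTM-POSE. Below is a minimal sketch in plain Python. The helper names `normalize_strand`, `check_events`, and `check_column_mapping` are hypothetical and are not part of PTM-POSE; the column names are the ones used in the README examples, and converting strand to 1/-1 is an assumption made only for illustration.

```python
# Hypothetical pre-flight checks; these helpers are NOT part of PTM-POSE.

REQUIRED_COLUMNS = ("chromosome", "strand", "region_start", "region_end")
FLANK_COLUMNS = (
    "first_flank_start", "first_flank_end",
    "second_flank_start", "second_flank_end",
)


def normalize_strand(value):
    """Map '+'/'-' or 1/-1 (int or str) to the integers 1 or -1."""
    mapping = {"+": 1, "-": -1, "1": 1, "-1": -1}
    key = str(value).strip()
    if key not in mapping:
        raise ValueError(f"unrecognized strand value: {value!r}")
    return mapping[key]


def check_events(rows, need_flanks=False):
    """Return a list of problems found in an iterable of row dicts."""
    required = REQUIRED_COLUMNS + (FLANK_COLUMNS if need_flanks else ())
    problems = []
    for i, row in enumerate(rows):
        missing = [col for col in required if col not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
            continue
        try:
            normalize_strand(row["strand"])
        except ValueError as err:
            problems.append(f"row {i}: {err}")
    return problems


def check_column_mapping(**column_args):
    """Flag keyword arguments that point at the same column name."""
    seen = {}
    duplicates = []
    for arg, column in column_args.items():
        if column in seen:
            duplicates.append((seen[column], arg, column))
        else:
            seen[column] = arg
    return duplicates


# Example row taken from the README table (no flank columns).
event = {
    "event_id": "first_event", "Gene name": "CSTN1", "chromosome": 1,
    "strand": "-", "region_start": 9797555, "region_end": 9797612,
}
print(check_events([event]))                    # differential inclusion input
print(check_events([event], need_flanks=True))  # flank columns are missing
print(check_column_mapping(
    second_flank_start_col="second_flank_start",
    second_flank_end_col="second_flank_start",
))
```

Rows from a pandas dataframe can be passed as `my_splice_data.to_dict('records')`. The mapping check is there to catch keyword arguments that accidentally name the same column twice, which is easy to do when copying the flank arguments.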