Update README.md
srcrowl committed Jun 11, 2024
1 parent 6a97ae2 commit 752f779
Showing 1 changed file with 45 additions and 16 deletions.
61 changes: 45 additions & 16 deletions README.md
@@ -5,11 +5,16 @@ PTM-POSE is an easily implementable tool to project PTM sites onto splice event
## Running PTM-POSE

To run PTM-POSE, you first need to process your data such that each row corresponds to a unique splice event with the genomic location of that splice event (chromosome, strand, and the bounds of the spliced region). Strand can be indicated using either '+'/'-' or 1/-1. If desired, you can also provide a delta PSI and significance value which will be included in the final PTM dataframe. Any additional columns will be kept. At a minimum, the dataframe should look something like this (optional but recommended parameters indicated):
| event_id (optional) | chromosome | strand | region_start | region_end | dPSI (optional) | significance (optional) |
|---------------------|------------|--------|--------------|------------|-----------------|-------------------------|
| first_event | 1 | - | 9797555 | 9797612 | 0.362 | 0.032 |

| event_id (optional) | Gene name (recommended) | chromosome | strand | region_start | region_end | dPSI (optional) | significance (optional) |
|---------------------|-------------------------|------------|--------|--------------|------------|-----------------|-------------------------|
| first_event | CSTN1 | 1 | - | 9797555 | 9797612 | 0.362 | 0.032 |

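Before building the dataframe, it can help to check each splice event record against the format described above. The following is a minimal sketch using plain dictionaries; `REQUIRED_COLUMNS`, `normalize_strand`, and `check_event` are hypothetical names for illustration and are not part of PTM-POSE.

```python
# Hypothetical pre-flight check for splice event records.
# None of these names come from PTM-POSE itself.
REQUIRED_COLUMNS = ("chromosome", "strand", "region_start", "region_end")


def normalize_strand(value):
    """Map '+'/'-' or 1/-1 (given as int or str) onto '+' or '-'."""
    text = str(value).strip()
    if text in ("+", "1", "+1"):
        return "+"
    if text in ("-", "-1"):
        return "-"
    raise ValueError(f"Unrecognized strand value: {value!r}")


def check_event(event):
    """Return a cleaned copy of one splice event record (a dict)."""
    missing = [col for col in REQUIRED_COLUMNS if col not in event]
    if missing:
        raise KeyError(f"Missing required columns: {missing}")
    cleaned = dict(event)  # additional columns are kept untouched
    cleaned["strand"] = normalize_strand(event["strand"])
    cleaned["region_start"] = int(event["region_start"])
    cleaned["region_end"] = int(event["region_end"])
    return cleaned


example = check_event({
    "event_id": "first_event",
    "Gene name": "CSTN1",
    "chromosome": 1,
    "strand": -1,
    "region_start": 9797555,
    "region_end": 9797612,
    "dPSI": 0.362,
    "significance": 0.032,
})
print(example["strand"])  # prints "-"
```

A list of records cleaned this way can then be turned into a dataframe with whichever tool you already use.
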
You currently need to also download the `ptm_coordinates` dataframe generated by [ExonPTMapper](https://github.com/NaegleLab/ExonPTMapper), which can be downloaded ---------. Once the data is in the correct format, simply run the project_ptms_onto_splice_events() function. By default, PTM-POSE assumes the provided coordinates are in hg38 coordinates, but you can use older coordinate systems with the `coordinate_type` parameter.
You currently also need to download the `ptm_coordinates` dataframe generated by [ExonPTMapper](https://github.com/NaegleLab/ExonPTMapper), which can be done using `pose_config.download_ptm_coordinates()`. To avoid downloading it again in the future (the download usually takes a little under a minute), call `pose_config.download_ptm_coordinates(save = True)`, which saves the file within the local package and takes up about 60MB of space.

PTM-POSE allows you to assess two potential impacts of splicing on PTMs: __differential inclusion__ (PTMs lost or gained as a result of a splice event) and __altered flanking sequences__ around a PTM, which can potentially alter protein interactions.

## Differentially Included PTMs
Once the data is in the correct format, simply run the `project_ptms_onto_splice_events()` function. By default, PTM-POSE assumes the provided coordinates are in hg38, but you can use older coordinate systems with the `coordinate_type` parameter. If you have saved `ptm_coordinates` locally, you can set the `ptm_coordinates` parameter to None.
```python
from ptm_pose import project  # module names cannot contain hyphens

@@ -19,20 +24,44 @@ my_splice_data_annotated, spliced_ptms = project.project_ptms_onto_splice_events
region_start_col = 'region_start',
region_end_col = 'region_end',
event_id_col = 'event_id',
gene_col = 'Gene name',
dPSI_col='dPSI',
coordinate_type = 'hg19')
```
## Altered Flanking Sequences

In addition to the previously mentioned columns, we will need to know the location of the flanking exonic regions next to the spliced region. Make sure your dataframe contains the following information prior to running flanking sequence analysis:
| event_id (optional) | Gene name (recommended) | chromosome | strand | region_start | region_end | first_flank_start | first_flank_end | second_flank_start | second_flank_end | dPSI (optional) | significance (optional) |
|---------------------|-------------------------|------------|--------|--------------|------------|-------------------|-----------------|--------------------|------------------|-----------------|-------------------------|
| first_event | CSTN1 | 1 | - | 9797555 | 9797612 | 9687655 | 9688446 | 9811223 | 9811745 | 0.362 | 0.032 |

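As a quick sanity check on the flanking columns, the sketch below confirms that a record carries all four flank coordinates and that neither flank overlaps the spliced region. The helper names are hypothetical and not part of PTM-POSE, and the no-overlap rule is an assumption drawn from the description of flanks as regions next to the spliced region.

```python
# Hypothetical helpers; not part of the PTM-POSE API.
FLANK_COLUMNS = (
    "first_flank_start", "first_flank_end",
    "second_flank_start", "second_flank_end",
)


def intervals_overlap(start_a, end_a, start_b, end_b):
    """True if two closed intervals (each with start <= end) share a position."""
    return start_a <= end_b and start_b <= end_a


def check_flanks(event):
    """Raise if flank columns are missing or a flank overlaps the spliced region."""
    missing = [col for col in FLANK_COLUMNS if col not in event]
    if missing:
        raise KeyError(f"Missing flank columns: {missing}")
    region = (event["region_start"], event["region_end"])
    for name in ("first_flank", "second_flank"):
        flank = (event[f"{name}_start"], event[f"{name}_end"])
        # Assumption: flanks sit beside the spliced region and never overlap it.
        if intervals_overlap(*flank, *region):
            raise ValueError(f"{name} overlaps the spliced region")
    return True


example_event = {
    "region_start": 9797555, "region_end": 9797612,
    "first_flank_start": 9687655, "first_flank_end": 9688446,
    "second_flank_start": 9811223, "second_flank_end": 9811745,
}
flanks_ok = check_flanks(example_event)
```
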

Then, as with differentially included PTMs, you only need to run the `get_flanking_changes_from_splice_data()` function:

```python
from ptm_pose import project  # module names cannot contain hyphens

altered_flanks = project.get_flanking_changes_from_splice_data(my_splice_data, ptm_coordinates,
chromosome_col = 'chromosome',
strand_col = 'strand',
region_start_col = 'region_start',
region_end_col = 'region_end',
first_flank_start_col = 'first_flank_start',
first_flank_end_col = 'first_flank_end',
second_flank_start_col = 'second_flank_start',
second_flank_end_col = 'second_flank_end',
event_id_col = 'event_id',
gene_col = 'Gene name',
dPSI_col='dPSI',
coordinate_type = 'hg19')
```

## Downstream Analysis

PTM-POSE also provides functions in the `annotate` module for annotating the above outputs with functional information from various databases: PhosphoSitePlus, RegPhos, PTMcode, PTMInt, ELM, and DEPOD. You can then identify PTMs with specific functions, interactions, etc. with the `analyze` module. See an example on a real dataset [here](Examples/ESRP1_knockdown).


## Have questions?

This will produce two dataframes:
1. Original splice data with additional columns indicating the number of PTMs and which PTMs were found associated with that splice event. The 'PTMs' column denotes the UniProtKB accession, residue, site number, and modification type for each PTM identified.

| event_id (if provided) | chromosome | strand | region_start | region_end | dPSI (if provided) | significance (if provided) | PTMs | Number of PTMs Affected |
|---------------------|------------|--------|--------------|------------|-----------------|-------------------------|-------------------------------|-------------------------|
| first_event | 1 | - | 9797555 | 9797612 | 0.362 | 0.032 | O94985_N515 (N-Glycosylation) | 1 |

2. New dataframe where each row is a unique event-PTM pair. This is useful for downstream analyses of the important PTM changes that are occurring in your dataset, and many functions are provided for further annotation and analysis of these PTMs (see the rest of the documentation for examples).

| event_id (if provided) | UniProtKB Accession | Residue | Modifications | PTM Info | dPSI (if provided) | significance (if provided) |
|----------|---------------------|---------|---------------|----------|--------------------|----------------------------|
| first_event | O94985 | N515 | N-Glycosylation | O94985_N515 (N-Glycosylation) | 0.362 | 0.032 |
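
The PTM labels shown in the tables above pack the UniProtKB accession, residue, site number, and modification type into one string. A minimal sketch of splitting a single label back into those fields is shown below; `parse_ptm_label` is a hypothetical helper, not a PTM-POSE function, and the pattern is inferred only from the example label `O94985_N515 (N-Glycosylation)`.

```python
import re

# Pattern inferred from the example label; other label layouts are not handled.
PTM_LABEL = re.compile(
    r"^(?P<accession>[^_\s]+)_(?P<residue>[A-Z])(?P<position>\d+) "
    r"\((?P<modification>.+)\)$"
)


def parse_ptm_label(label):
    """Split one PTM label into accession, residue, position, and modification."""
    match = PTM_LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"Unrecognized PTM label: {label!r}")
    fields = match.groupdict()
    fields["position"] = int(fields["position"])
    return fields


parsed = parse_ptm_label("O94985_N515 (N-Glycosylation)")
print(parsed["accession"], parsed["residue"], parsed["position"])
```
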
Please reach out to Sam Crowl ([email protected]) if you have questions or suggestions about new analysis functions that you would like to see implemented. We hope to continue to expand the analysis that can be easily performed with this package as time goes on, and welcome any feedback.
