## Running PTM-POSE

To run PTM-POSE, you first need to process your data so that each row corresponds to a unique splice event with the genomic location of that splice event (chromosome, strand, and the bounds of the spliced region). Strand can be indicated using either '+'/'-' or 1/-1. If desired, you can also provide a delta PSI (dPSI) and significance value, which will be included in the final PTM dataframe. Any additional columns will be kept. At a minimum, the dataframe should look something like this (optional and recommended columns are indicated):

| event_id (optional) | Gene name (recommended) | chromosome | strand | region_start | region_end | dPSI (optional) | significance (optional) |
|---------------------|-------------------------|------------|--------|--------------|------------|-----------------|-------------------------|
| first_event | CSTN1 | 1 | - | 9797555 | 9797612 | 0.362 | 0.032 |

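Before running PTM-POSE, it can help to confirm that the required columns are present. The sketch below is a minimal check, not part of PTM-POSE: `missing_columns` is a hypothetical helper, and the column names are the ones used in the example tables in this README (if your columns are named differently, you would instead pass your names to the `*_col` parameters). It accepts any iterable of column names, such as a pandas DataFrame's `.columns`.

```python
from typing import Iterable, List

# Column names used in the example tables of this README (assumption: adjust to your data)
REQUIRED_COLUMNS = ["chromosome", "strand", "region_start", "region_end"]
FLANK_COLUMNS = ["first_flank_start", "first_flank_end",
                 "second_flank_start", "second_flank_end"]


def missing_columns(columns: Iterable[str], flanking: bool = False) -> List[str]:
    """Return the required columns that are absent from `columns`.

    Set flanking=True to also require the flanking-exon columns used for
    altered flanking sequence analysis.
    """
    present = set(columns)
    required = REQUIRED_COLUMNS + (FLANK_COLUMNS if flanking else [])
    return [col for col in required if col not in present]
```

For example, `missing_columns(my_splice_data.columns)` returning an empty list means the minimum columns are in place.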
You will currently also need to download the `ptm_coordinates` dataframe generated by [ExonPTMapper](https://github.com/NaegleLab/ExonPTMapper), which can be downloaded using `pose_config.download_ptm_coordinates()`. If you would like to avoid downloading it again in the future (this usually takes a little under a minute), you can run `pose_config.download_ptm_coordinates(save = True)`, which saves the file within the local package and takes up about 60 MB of space.

PTM-POSE allows you to assess two potential impacts of splicing on PTMs: __differential inclusion__ (PTMs lost or gained as a result of a splice event) and __altered flanking sequences__ around a PTM, which can potentially alter protein interactions.

## Differentially Included PTMs

Once the data is in the correct format, simply run the `project_ptms_onto_splice_events()` function. By default, PTM-POSE assumes the provided coordinates are hg38 coordinates, but you can use older genome assemblies (such as hg19) with the `coordinate_type` parameter. If you have saved `ptm_coordinates` locally, you can pass `None` for the `ptm_coordinates` argument.

```python
from ptm_pose import project

# my_splice_data: splice events formatted as described above
# ptm_coordinates: PTM coordinates dataframe from ExonPTMapper
my_splice_data_annotated, spliced_ptms = project.project_ptms_onto_splice_events(
    my_splice_data, ptm_coordinates,
    chromosome_col = 'chromosome',
    strand_col = 'strand',
    region_start_col = 'region_start',
    region_end_col = 'region_end',
    event_id_col = 'event_id',
    gene_col = 'Gene name',
    dPSI_col = 'dPSI',
    coordinate_type = 'hg19')
```
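
Conceptually, finding differentially included PTMs amounts to asking which PTM genomic positions fall inside the spliced region on the same chromosome and strand. The sketch below illustrates only that idea in plain Python; it is not PTM-POSE's implementation, and `SpliceEvent`, `PTMSite`, `normalize_strand`, and `ptms_in_region` are hypothetical names. It assumes region bounds and PTM positions are on the same genome assembly and use 1-based, inclusive coordinates; the PTM positions in the usage section are made up.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class SpliceEvent:
    """Hypothetical container mirroring the required input columns."""
    event_id: str
    chromosome: str
    strand: Union[str, int]   # '+'/'-' or 1/-1
    region_start: int         # assumed 1-based, inclusive
    region_end: int           # assumed 1-based, inclusive


@dataclass
class PTMSite:
    """Hypothetical PTM record: a label plus its genomic position."""
    label: str
    chromosome: str
    strand: Union[str, int]
    position: int


def normalize_strand(strand: Union[str, int]) -> int:
    """Map '+'/'-' or 1/-1 onto +1/-1."""
    if strand in ("+", 1, "1"):
        return 1
    if strand in ("-", -1, "-1"):
        return -1
    raise ValueError(f"Unrecognized strand: {strand!r}")


def ptms_in_region(event: SpliceEvent, ptms: List[PTMSite]) -> List[str]:
    """Return labels of PTMs on the event's chromosome and strand that fall
    inside [region_start, region_end]."""
    strand = normalize_strand(event.strand)
    return [
        ptm.label
        for ptm in ptms
        if ptm.chromosome == event.chromosome
        and normalize_strand(ptm.strand) == strand
        and event.region_start <= ptm.position <= event.region_end
    ]


if __name__ == "__main__":
    event = SpliceEvent("first_event", "1", "-", 9797555, 9797612)
    # Made-up PTM positions, for illustration only
    ptms = [
        PTMSite("PTM_A", "1", -1, 9797580),   # inside the region
        PTMSite("PTM_B", "1", -1, 9800000),   # outside the region
        PTMSite("PTM_C", "1", 1, 9797580),    # opposite strand
    ]
    print(ptms_in_region(event, ptms))
```

Normalizing strand first lets the same comparison work whichever of the two accepted strand notations your data uses.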

## Altered Flanking Sequences

In addition to the previously mentioned columns, you will need the locations of the flanking exonic regions on either side of the spliced region. Make sure your dataframe contains the following information before running the flanking sequence analysis:

| event_id (optional) | Gene name (recommended) | chromosome | strand | region_start | region_end | first_flank_start | first_flank_end | second_flank_start | second_flank_end | dPSI (optional) | significance (optional) |
|---------------------|-------------------------|------------|--------|--------------|------------|-------------------|-----------------|--------------------|------------------|-----------------|-------------------------|
| first_event | CSTN1 | 1 | - | 9797555 | 9797612 | 9687655 | 9688446 | 9811223 | 9811745 | 0.362 | 0.032 |

Then, as with differentially included PTMs, you only need to run the `get_flanking_changes_from_splice_data()` function:

```python
from ptm_pose import project

altered_flanks = project.get_flanking_changes_from_splice_data(
    my_splice_data, ptm_coordinates,
    chromosome_col = 'chromosome',
    strand_col = 'strand',
    region_start_col = 'region_start',
    region_end_col = 'region_end',
    first_flank_start_col = 'first_flank_start',
    first_flank_end_col = 'first_flank_end',
    second_flank_start_col = 'second_flank_start',
    second_flank_end_col = 'second_flank_end',  # was mistakenly 'second_flank_start'
    event_id_col = 'event_id',
    gene_col = 'Gene name',
    dPSI_col = 'dPSI',
    coordinate_type = 'hg19')
```
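
Why does splicing alter flanking sequences? When the spliced region is included or excluded, the residues next to the exon junction change, so a PTM in a flanking exon that sits close enough to that junction ends up with a different surrounding sequence. The rough proximity test below sketches that idea; it is not how PTM-POSE computes altered flanks. `flank_crosses_junction` is a hypothetical helper, the default `flank_size = 5` residues is an arbitrary choice for illustration, and codon phase is ignored by treating each residue as 3 nt. Because it works on genomic coordinates, only the edge of the flanking exon that faces the spliced region matters, so strand is not needed.

```python
def flank_crosses_junction(ptm_position: int,
                           flank_start: int, flank_end: int,
                           region_start: int, region_end: int,
                           flank_size: int = 5) -> bool:
    """Rough check for whether a PTM in a flanking exon lies within
    flank_size residues of the junction with the spliced region."""
    if not flank_start <= ptm_position <= flank_end:
        return False  # PTM is not in this flanking exon
    if flank_end < region_start:
        # Flanking exon lies before the region: the junction edge is flank_end
        distance = flank_end - ptm_position
    elif flank_start > region_end:
        # Flanking exon lies after the region: the junction edge is flank_start
        distance = ptm_position - flank_start
    else:
        raise ValueError("flanking exon overlaps the spliced region")
    # Approximate 3 nt per residue; codon phase is ignored in this sketch
    return distance < 3 * flank_size
```

With the coordinates from the table above, a hypothetical PTM at position 9688440 in the first flank is 6 nt from the junction and would be flagged, whereas one at 9687700 would not.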

## Downstream Analysis

PTM-POSE also provides functions in the `annotate` module for annotating the outputs of these functions with functional information from various databases: PhosphoSitePlus, RegPhos, PTMcode, PTMInt, ELM, and DEPOD. You can then use the `analyze` module to identify PTMs with specific functions, interactions, and so on. See an example using a real dataset [here](Examples/ESRP1_knockdown).
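
PTM-POSE labels each PTM in the form `O94985_N515 (N-Glycosylation)`: UniProtKB accession, residue, site number, and modification type. For quick ad hoc filtering outside the `annotate` and `analyze` modules, a small parser can split one such label into its parts. The sketch below is hypothetical (`parse_ptm_label` and `PTMLabel` are not part of PTM-POSE); it assumes accessions contain no underscores and that every label follows exactly this pattern, and it handles one label at a time because how multiple PTMs are joined within a single cell is not specified here.

```python
import re
from typing import NamedTuple, Optional

# accession_ResidueSite (Modification), e.g. "O94985_N515 (N-Glycosylation)"
PTM_LABEL = re.compile(
    r"^(?P<accession>[^_\s]+)_(?P<residue>[A-Z])(?P<site>\d+)\s+\((?P<modification>.+)\)$"
)


class PTMLabel(NamedTuple):
    accession: str
    residue: str
    site: int
    modification: str


def parse_ptm_label(label: str) -> Optional[PTMLabel]:
    """Split a single PTM label into its parts; return None if it does not match."""
    match = PTM_LABEL.match(label.strip())
    if match is None:
        return None
    return PTMLabel(
        accession=match["accession"],
        residue=match["residue"],
        site=int(match["site"]),
        modification=match["modification"],
    )


if __name__ == "__main__":
    labels = ["O94985_N515 (N-Glycosylation)", "not a label"]
    # Keep only the labels that parse and carry the modification of interest
    glyco = [p for p in map(parse_ptm_label, labels)
             if p is not None and p.modification == "N-Glycosylation"]
    print(glyco)
```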

## Output of `project_ptms_onto_splice_events()`

Running `project_ptms_onto_splice_events()` produces two dataframes:

1. The original splice data, with additional columns indicating how many PTMs, and which ones, were found to be associated with each splice event. The 'PTMs' column lists the UniProtKB accession, residue, site number, and modification type of each PTM identified.

| event_id (if provided) | chromosome | strand | region_start | region_end | dPSI (if provided) | significance (if provided) | PTMs | Number of PTMs Affected |
|------------------------|------------|--------|--------------|------------|--------------------|----------------------------|-------------------------------|-------------------------|
| first_event | 1 | - | 9797555 | 9797612 | 0.362 | 0.032 | O94985_N515 (N-Glycosylation) | 1 |

2. A new dataframe in which each row is a unique event-PTM pair. This is useful for downstream analyses of the important PTM changes occurring in your dataset, and PTM-POSE provides many functions for further annotation and analysis of these PTMs (see the rest of the documentation for examples).

| event_id (if provided) | UniProtKB Accession | Residue | Modifications | PTM Info | dPSI (if provided) | significance (if provided) |
|------------------------|---------------------|---------|---------------|----------|--------------------|----------------------------|
| first_event | O94985 | N515 | N-Glycosylation | O94985_N515 (N-Glycosylation) | 0.362 | 0.032 |

## Have questions?

Please reach out to Sam Crowl ([email protected]) if you have questions or suggestions about new analysis functions that you would like to see implemented. We hope to continue expanding the analyses that can be easily performed with this package over time, and we welcome any feedback.