About output #13

Open
lancy-liang opened this issue Jul 20, 2023 · 7 comments
Comments

@lancy-liang

Hello, I have some confusion about the output of SCAPE, and I am not quite sure what the "count" in the pasite.csv.gz file refers to:

1:40004:36:+,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,2,0,0,1,0,1,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,

May I know what these values represent?

@zhou-ran
Contributor

Hi,

The first column represents the name of the predicted pA site, while the other columns correspond to individual cells, with each value indicating the abundance of the respective pA site in the respective cell. Additionally, the name "1:40004:36:+" of the pA site indicates that it is located on the forward strand of chromosome 1 at position 40004 with a disperse value of 36.

I hope this helps.

Ran
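
The layout described above (site name in the first column, one count per cell in the remaining columns) can be sketched in a few lines of Python. This is only an illustration, not part of SCAPE: the names `PASite`, `parse_pa_site`, and `parse_row` are hypothetical, the shortened example row is made up, and the third field is parsed as a number on the assumption that it is always numeric, as in "1:40004:36:+".

```python
from typing import List, NamedTuple, Tuple


class PASite(NamedTuple):
    """Fields of a pA site name such as '1:40004:36:+' (hypothetical helper)."""
    chrom: str
    position: int
    disperse: float  # assumption: the third field is always numeric
    strand: str


def parse_pa_site(name: str) -> PASite:
    # Split from the right so only the last three colons act as separators.
    chrom, position, disperse, strand = name.rsplit(":", 3)
    return PASite(chrom, int(position), float(disperse), strand)


def parse_row(line: str) -> Tuple[PASite, List[int]]:
    # The row quoted in the question ends with a trailing comma, so strip it.
    fields = line.strip().rstrip(",").split(",")
    return parse_pa_site(fields[0]), [int(value) for value in fields[1:]]


# Shortened, made-up row in the same shape as a pasite.csv.gz line.
site, counts = parse_row("1:40004:36:+,0,0,2,0,1,")
print(site.chrom, site.position, site.strand)
print(counts)
```

Each entry of `counts` then belongs to one cell, in the column order of the file.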

@lancy-liang
Author

Thank you very much for your response!

However, I still have some confusion:

"each value indicating the abundance of the respective pA site in the respective cell"

0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,2,0,0,1,0,1,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0

During the analysis, I constructed a Seurat object from this matrix. What do nCount_RNA and nFeature_RNA represent in this case?

thanks

@zhou-ran
Contributor

Hi,

Please refer to the Seurat manual for information on how to calculate this value, as I am not familiar with how Seurat performs the calculation.

Ran

@lancy-liang
Author

Hello, I used the [loadData.R](https://github.com/LuChenLab/SCAPE/blob/main/SCAPE.R/R/loadData.R) script to directly construct a Seurat object, but I am not sure about the meaning of the count 0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,2,0,0,1,0,1,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0. I have read your paper, where you describe the count as follows:

"SCAPE is used to assign the reads to different APA isoforms. As a result, we get the number of reads for each pA site, which we term as pA counts."

"We use SCAPE to quantify the weights of pA sites of a gene in each cell."

"In summary, we demonstrate that SCAPE is able to accurately identify and quantify APA isoforms from the theoretical perspective."

"In terms of isoform weight quantification, SCAPE showed the highest correlation with the ground truth (R2 = 0.97), while the best performance of other methods was R2 = 0.53 (Supplementary Figure S1G)."

However, I am still not entirely clear on what this count specifically represents.

@zhou-ran
Contributor

zhou-ran commented Jul 20, 2023

Hi,

The count is the number of times a pA site was detected in an individual cell after PCR duplicates are removed. Note also that "nCount_RNA" and "nFeature_RNA" are created by Seurat when the Seurat object is initialized; they are not produced by SCAPE.

Ran
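
For readers with the same question: in Seurat's default metadata, nCount_RNA is the total count per cell (the column sum of the matrix) and nFeature_RNA is the number of features with a nonzero count in that cell. For a pA-site matrix, that would be the total number of deduplicated pA-site detections in a cell and the number of distinct pA sites detected in it. The sketch below reproduces those two summaries in plain Python. It is not Seurat or SCAPE code: the function name `per_cell_summaries` is hypothetical and the toy matrix is invented.

```python
from typing import List, Tuple


def per_cell_summaries(matrix: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Rows are pA sites, columns are cells (same layout as the count matrix).

    Returns (total counts per cell, number of sites with nonzero count per cell),
    which mirror what Seurat stores as nCount_RNA and nFeature_RNA.
    """
    n_cells = len(matrix[0])
    n_count = [sum(row[j] for row in matrix) for j in range(n_cells)]
    n_feature = [sum(1 for row in matrix if row[j] > 0) for j in range(n_cells)]
    return n_count, n_feature


# Invented toy matrix: 3 pA sites (rows) x 4 cells (columns).
toy = [
    [0, 2, 1, 0],
    [1, 0, 3, 0],
    [0, 0, 1, 0],
]
n_count, n_feature = per_cell_summaries(toy)
print(n_count)    # [1, 2, 5, 0]
print(n_feature)  # [1, 1, 3, 0]
```

The last cell has no detected pA site, so both of its summaries are 0.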

@lancy-liang
Author

thank you very much!!

@lancy-liang
Author

Hi, Ran.
I used psiCategory.R to calculate the psi values for pA sites, but I thought it would calculate the psi values for genes. In your article, "Category of pA usage" mentions, "To better understand the heterogeneity of APA patterns among cell populations, we first calculated the usage of pA sites for each gene at single cell level." Is this categorization for genes or for pA sites?
Regarding the section on "Expected pA length," is the calculated value referring to the usage of a single pA site or the usage of pA sites within a gene?
