Establishing initial data analysis strategy #3

Closed
swiri021 opened this issue Sep 4, 2021 · 5 comments · Fixed by #20

@swiri021
Contributor

swiri021 commented Sep 4, 2021

Dataset summary (what we have and what we can do with it) and data analysis strategy.

@swiri021 swiri021 added this to the Phase 0 milestone Sep 4, 2021
@swiri021 swiri021 self-assigned this Sep 12, 2021
@swiri021
Contributor Author

Early diagnosis biomarker:

  1. Identify DEGs (Early vs Late vs Normal)
  2. Identify differential pathways (Early vs Late vs Normal)
  3. Feature extraction with RFE (Early vs Normal)
  4. PCA (Early vs Late vs Normal) with clustering methods
  5. Intersect the features found by all methods and curate them with scientific insight (Kicheol)
  6. Performance test (ROC curve / AUC) on the validation dataset
  7. Performance test (ROC curve / AUC) on an external dataset
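
For steps 6 and 7, a minimal sketch of the ROC/AUC check, assuming scikit-learn is available and that the final model produces one score per sample. The function name `evaluate_auc` and the toy labels and scores are hypothetical and are not project data.

```python
from sklearn.metrics import roc_auc_score, roc_curve


def evaluate_auc(y_true, scores):
    """Return (AUC, false positive rates, true positive rates) for one dataset.

    y_true: 0/1 labels (assumed here: 1 = early stage, 0 = normal)
    scores: model scores, higher meaning "more likely early stage"
    """
    fpr, tpr, _thresholds = roc_curve(y_true, scores)
    auc = roc_auc_score(y_true, scores)
    return auc, fpr, tpr


# Toy example only; replace with the validation or external dataset.
auc, fpr, tpr = evaluate_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(f"AUC = {auc:.2f}")  # prints "AUC = 0.75"
```

Running the same function on the validation set and on the external set keeps the two performance numbers directly comparable.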

@swiri021
Contributor Author

@kicheolkim We need a normalized dataset. I can do the normalization myself, but I think we should agree on one normalization method and use it consistently. Possible options are TMM or quantile normalization; let me know your thoughts.
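
To make the quantile option concrete, here is a minimal NumPy sketch. It assumes a genes x samples matrix and breaks ties arbitrarily; the function name `quantile_normalize` is hypothetical. TMM is not shown here.

```python
import numpy as np


def quantile_normalize(expr):
    """Quantile-normalize a genes x samples matrix.

    After normalization every sample (column) has the same set of values:
    the mean of each rank across samples. Ties are broken arbitrarily.
    """
    expr = np.asarray(expr, dtype=float)
    order = np.argsort(expr, axis=0)                  # per-sample sort order
    sorted_vals = np.take_along_axis(expr, order, axis=0)
    reference = sorted_vals.mean(axis=1)              # mean value at each rank
    ranks = np.argsort(order, axis=0)                 # rank of each gene within its sample
    return reference[ranks]


# Toy matrix: 4 genes x 3 samples
toy = np.array([[5.0, 4.0, 3.0],
                [2.0, 1.0, 4.5],
                [3.0, 4.2, 6.0],
                [4.0, 2.0, 8.0]])
normalized = quantile_normalize(toy)
```

Each column of `normalized` contains the same values, ordered according to that sample's original gene ranking.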

@kicheolkim
Contributor

kicheolkim commented Sep 14, 2021

  • We don't have actual late-stage patients. The majority of patients are early stage.

  • For DEG analysis, the comparison should be made within each cell type. I used sex and age as covariates (model: ~ Last_Known_Treat_Stat + Sex + AgeAtExamGrp). Age was binned as 10-30, 30-40, 40-50, 50-60, and 60-90.

  • GitHub R code for the previous DEG analysis (used in the publication)
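
As an illustration of the age covariate, here is a minimal sketch of the binning in Python. The comment above does not say which bin a boundary age such as 30 falls into, so the left-closed intervals used here ([10, 30), [30, 40), ..., [60, 90]) are an assumption, and the names `AGE_EDGES`, `AGE_LABELS`, and `age_group` are hypothetical.

```python
from bisect import bisect_right

# Bin edges taken from the comment above; interval closure is an assumption.
AGE_EDGES = [10, 30, 40, 50, 60, 90]
AGE_LABELS = ["10-30", "30-40", "40-50", "50-60", "60-90"]


def age_group(age):
    """Map an age at exam to its AgeAtExamGrp label."""
    if not AGE_EDGES[0] <= age <= AGE_EDGES[-1]:
        raise ValueError(f"age {age} is outside the binned range 10-90")
    index = bisect_right(AGE_EDGES, age) - 1
    # The upper edge (90) belongs to the last bin.
    return AGE_LABELS[min(index, len(AGE_LABELS) - 1)]


groups = [age_group(a) for a in (25, 30, 59, 90)]
```

If the original R code used right-closed bins instead, only the boundary ages would move to the neighboring group.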

@swiri021 swiri021 linked a pull request Sep 18, 2021 that will close this issue
@swiri021
Contributor Author

swiri021 commented Sep 18, 2021

@kicheolkim

Initial feature extraction

  • Experiment group: early stage (DiseaseDuration <= median)
  • Control group: late stage (DiseaseDuration > median)

Step 1: Rank-sum test on the activation scores; keep features with p-value < 0.05
Step 2: Recursive feature elimination (with cross-validation) on the Step 1 result; keep features with ranking = 1
Step 3: Rank-sum test on the Step 2 result; keep features with p-value < 0.05
Step 4: Recursive feature elimination (with cross-validation) on the Step 3 result; keep features with ranking = 1
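
The four steps above can be sketched as two rounds of "rank-sum filter, then RFECV". This is a minimal sketch and not the code in the linked pull request: it assumes SciPy and scikit-learn, uses logistic regression as the RFECV estimator (the issue does not name one), and runs on synthetic data. All function names are hypothetical.

```python
import numpy as np
from scipy.stats import ranksums
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold


def ranksum_filter(X, y, alpha=0.05):
    """Indices of columns whose rank-sum p-value (group 1 vs group 0) is < alpha."""
    pvals = np.array([ranksums(X[y == 1, j], X[y == 0, j]).pvalue
                      for j in range(X.shape[1])])
    return np.flatnonzero(pvals < alpha)


def rfecv_filter(X, y, seed=0):
    """Indices of columns that RFECV assigns ranking 1 (the selected features)."""
    if X.shape[1] <= 1:  # nothing left to eliminate
        return np.arange(X.shape[1])
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    selector = RFECV(LogisticRegression(max_iter=1000), step=1, cv=cv,
                     scoring="roc_auc")
    selector.fit(X, y)
    return np.flatnonzero(selector.ranking_ == 1)


def select_features(X, y, rounds=2):
    """Run (rank-sum filter -> RFECV) `rounds` times; rounds=2 gives Steps 1-4."""
    keep = np.arange(X.shape[1])
    for _ in range(rounds):
        keep = keep[ranksum_filter(X[:, keep], y)]
        if keep.size == 0:
            break
        keep = keep[rfecv_filter(X[:, keep], y)]
    return keep


# Synthetic stand-in for the activation scores: 80 samples x 30 features.
rng = np.random.default_rng(0)
disease_duration = rng.uniform(0, 20, size=80)
# Experiment group (1): early stage; control group (0): late stage.
y = (disease_duration <= np.median(disease_duration)).astype(int)
X = rng.normal(size=(80, 30))
X[:, :3] += 2.0 * y[:, None]  # make the first three features informative

selected = select_features(X, y)
```

`selected` holds the column indices that survive both rounds; with real data these would map back to activation score feature names.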

@swiri021
Contributor Author

New comments for actual analysis work #24
