K optimization: r2 vs score #59

@nbedelman

Hello,

I am running STITCH on a relatively small sample (40 individuals) with ~0.5X per-sample haplotagging (linked-read) sequencing. I recognize that this sample size is pretty small for imputation, but I'm hoping STITCH will still be at least somewhat effective. I've run the program, varying K and the number of generations as suggested, and am now evaluating the output. The mean score and the number of sites with scores > 0.4 both increase as K increases from 2 to 35, where the values seem to plateau. However, the r2 values peak around K=14 (r2=0.875) and drop off on either side (e.g. K=35, r2=0.73). The number of generations has minimal effect on r2, but runs with fewer generations (10-100) consistently yield more sites with high scores than those with more generations (300-1000). Does this make sense, and would you recommend maximizing score values or r2 values when selecting K?

Thanks!
Nate
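
For readers who want to reproduce this kind of comparison, here is a minimal sketch of tabulating the three quantities discussed above (mean score, number of sites with score > 0.4, and r2) per value of K. It is not part of STITCH: the function names `summarize_run` and `rank_runs` are hypothetical, the per-site scores are made-up toy values, and r2 is assumed to have been computed separately against some validation genotypes. Only the r2 values for K=14 and K=35 come from the post; the K=2 value is a placeholder.

```python
from statistics import mean


def summarize_run(info_scores, r2, threshold=0.4):
    """Summarize one imputation run (one value of K).

    info_scores: per-site scores from that run (toy values here).
    r2: accuracy against validation genotypes, computed elsewhere.
    """
    return {
        "mean_score": mean(info_scores),
        "n_high": sum(1 for s in info_scores if s > threshold),
        "r2": r2,
    }


def rank_runs(runs, metric):
    """Return K values sorted from best to worst by the chosen metric."""
    return sorted(runs, key=lambda k: runs[k][metric], reverse=True)


# Toy inputs: per-site scores are invented for illustration only.
runs = {
    2: summarize_run([0.2, 0.3, 0.5], r2=0.60),    # r2 is a placeholder
    14: summarize_run([0.3, 0.5, 0.6], r2=0.875),  # r2 from the post
    35: summarize_run([0.5, 0.6, 0.7], r2=0.73),   # r2 from the post
}

best_by_r2 = rank_runs(runs, "r2")[0]
best_by_score = rank_runs(runs, "mean_score")[0]
print("best K by r2:", best_by_r2)
print("best K by mean score:", best_by_score)
```

With inputs shaped like the pattern described in the post, the two metrics pick different values of K, which is exactly the disagreement the question is about.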
