Function to compare genotypes stored in xarray.dataset using sgkit #94

Open
szhan opened this issue Jun 14, 2023 · 7 comments
Labels
enhancement New feature or request

Comments

szhan (Owner) commented Jun 14, 2023

This function should help compare genotypes stored in two VCF files, e.g., one from BEAGLE containing imputed genotypes and the other containing ground-truth genotypes. The plan is to first develop a version that works when only biallelic or monoallelic sites are compared, and then generalise it so that it can be incorporated into sgkit.
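For illustration only, here is a minimal sketch of the kind of comparison meant here. It is not the implementation discussed in this issue. It assumes both genotype arrays have shape (variants, samples, ploidy), like sgkit's `call_genotype` variable, that they already cover the same sites in the same order with the same allele indexing, and that negative values mark missing calls. The function name `genotype_concordance` is hypothetical.

```python
import numpy as np


def genotype_concordance(gt_true, gt_imputed):
    """Fraction of non-missing allele calls that agree between two arrays.

    Assumptions (not taken from the issue): both arrays have shape
    (variants, samples, ploidy), are aligned site by site, and share the
    same allele indexing. Negative values are treated as missing.
    """
    gt_true = np.asarray(gt_true)
    gt_imputed = np.asarray(gt_imputed)
    if gt_true.shape != gt_imputed.shape:
        raise ValueError("Genotype arrays must have the same shape.")
    # Only compare allele calls that are present in both arrays.
    called = (gt_true >= 0) & (gt_imputed >= 0)
    if not called.any():
        raise ValueError("No non-missing calls to compare.")
    return float(np.mean(gt_true[called] == gt_imputed[called]))


# Two sites, two diploid samples; one allele call differs.
truth = np.array([[[0, 1], [1, 1]], [[0, 0], [0, 1]]])
imputed = np.array([[[0, 1], [1, 0]], [[0, 0], [0, 1]]])
concordance = genotype_concordance(truth, imputed)
```

This compares allele calls position by position, so it treats the genotypes as phased. Unphased data would need the alleles within each call sorted (or counted) before comparing.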

hyanwong (Collaborator) commented

Does this tie in to tskit-dev/tsinfer#739?

On that note, we should probably try to push tskit-dev/tskit#2617 through. I think all the comments are addressed on that PR now, and it can probably be merged, right?

szhan (Owner, Author) commented Jun 14, 2023

Ah, yes! Using SampleData to compare genotypes is an absolute nightmare.

hyanwong (Collaborator) commented Jun 14, 2023

Should probably be incorporated into sgkit, not SampleData.

szhan (Owner, Author) commented Jun 14, 2023

Yes, I mean using SampleData to compare genotypes is inconvenient. Should definitely move onto sgkit.

szhan (Owner, Author) commented Jun 15, 2023

If one or both of ds1 and ds2 are empty, then a ValueError exception should be raised, right? Since there would be no genotypes to remap. h/t @benjeffery for suggesting these tests.
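A minimal sketch of such a check, under the assumption that "empty" means a dataset has no sites. The helper names `check_not_empty` and `raises_value_error` are hypothetical, and plain position arrays stand in for what would be the `variant_position` variable of each sgkit dataset.

```python
import numpy as np


def check_not_empty(positions_1, positions_2):
    """Raise ValueError if either set of site positions is empty.

    Hypothetical helper: the inputs stand in for the site positions
    of ds1 and ds2.
    """
    n1 = np.asarray(positions_1).size
    n2 = np.asarray(positions_2).size
    if n1 == 0 or n2 == 0:
        raise ValueError(
            f"Both datasets must contain at least one site (got {n1} and {n2})."
        )


def raises_value_error(func, *args):
    """Return True if calling func(*args) raises ValueError."""
    try:
        func(*args)
    except ValueError:
        return True
    return False


# The three cases a test suite would want to cover.
both_empty = raises_value_error(check_not_empty, [], [])
one_empty = raises_value_error(check_not_empty, [10, 20], [])
neither_empty = raises_value_error(check_not_empty, [10, 20], [20, 30])
```

In a pytest suite, the `raises_value_error` wrapper would normally be replaced by `pytest.raises(ValueError)`.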

szhan (Owner, Author) commented Jun 16, 2023

Should also add tests involving cases where:

  1. Some site positions overlap.
  2. No site positions are shared.
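The two cases above can be sketched as follows. This is only an illustration, not the code from this repository: it assumes sites are matched purely by position and uses `numpy.intersect1d` to find shared positions and their indices in each dataset. The function name `shared_site_indices` is hypothetical.

```python
import numpy as np


def shared_site_indices(positions_1, positions_2):
    """Find site positions present in both arrays.

    Returns the shared positions plus the index of each shared position
    in the first and second array. Assumes sites are identified by
    position alone (an assumption made for this sketch).
    """
    p1 = np.asarray(positions_1)
    p2 = np.asarray(positions_2)
    shared, idx_1, idx_2 = np.intersect1d(p1, p2, return_indices=True)
    return shared, idx_1, idx_2


# Case 1: some site positions overlap.
shared, idx_1, idx_2 = shared_site_indices([10, 20, 30], [20, 30, 40])

# Case 2: no site positions are shared, so all results are empty.
none_shared, none_idx_1, none_idx_2 = shared_site_indices([10, 20], [30, 40])
```

What the comparison function should do in the second case (return an empty result or raise an error) is a design decision this sketch leaves open.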

szhan (Owner, Author) commented Jun 20, 2023

Leaving this open, although it is good enough for now.

@szhan changed the title from "Function to compare genotypes stored in two xarray.dataset objects" to "Function to compare genotypes stored in xarray.dataset using sgkit" on Jun 23, 2023
@szhan added the enhancement (New feature or request) label on Jun 24, 2023