Commit
Fix Duplicate Analysis Entries for Sequencing Groups of Different 'Types' (#675)
* Fixed an issue where sequencing groups of different 'types' from the same sample received duplicate analysis entries. If a sample had both 'exome' and 'genome' sequencing groups, the analysis entries for the 'genome' sequencing group were incorrectly copied to the 'exome' sequencing groups, so the 'exome' sequencing groups ended up with the same analysis IDs and metadata as the 'genome' sequencing group. To fix this, the code now builds a more detailed mapping from old sample ID to new sample ID that tracks newly created sequencing groups, so that the correct analyses are copied to each newly created sequencing group according to the following rule: each sample has only one sequencing group for each combination of type, platform, and technology.
* Refactored how the new SG IDs to append analyses to are retrieved. Needs some tidying up.
* Renamed the variable sequencing_group_ids_from_sample to sample_to_sg_attribute_map so that it more accurately reflects the contents of the mapping. This variable maps the unique sequencing group attributes of each sample's sequencing groups, so that we can track which newly created sequencing group receives the analyses from the old sequencing group currently being used to update the analyses.
* Linting fixes.
* isort linting change.
* Minor changes to improve readability and error checking.
* Added a detailed docstring with example data to the get_new_sg_id function.
* Added an early failure check to get_new_sg_id that verifies it returns a list of sequencing group IDs of length 1, so that we know data is added to only one sequencing group during Analysis creation inside transfer_analyses().
* Ensured sample and sequencing group IDs are completely fake (they already were fake) to please the linter.
* Changed to use the participant's external ID instead of the sample's external ID, because the participant external ID is consistent, whereas a sample external ID could change if a participant has multiple samples. While none of the PA datasets currently have multiple samples per participant (that I'm aware of), the RD team's datasets do, and this change was made in the interest of future-proofing.
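The mapping described in the commit message can be sketched roughly as follows. This is a minimal illustration and not the code from the commit: the names sample_to_sg_attribute_map and get_new_sg_id come from the message, but the function signatures, the dictionary layout, the helper build_sample_to_sg_attribute_map, and all IDs are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Assumption: a sequencing group is identified within a sample by these
# three attributes, following the rule stated in the commit message.
SgAttributes = Tuple[str, str, str]  # (type, platform, technology)


def build_sample_to_sg_attribute_map(
    new_sequencing_groups: List[dict],
) -> Dict[str, Dict[SgAttributes, str]]:
    """Hypothetical helper: map new sample ID -> {SG attributes -> new SG ID}."""
    mapping: Dict[str, Dict[SgAttributes, str]] = {}
    for sg in new_sequencing_groups:
        key = (sg['type'], sg['platform'], sg['technology'])
        per_sample = mapping.setdefault(sg['sample_id'], {})
        if key in per_sample:
            # Violates "one sequencing group per type, platform, technology".
            raise ValueError(f'Duplicate sequencing group for {key}')
        per_sample[key] = sg['id']
    return mapping


def get_new_sg_id(
    sample_to_sg_attribute_map: Dict[str, Dict[SgAttributes, str]],
    new_sample_id: str,
    old_sg: dict,
) -> List[str]:
    """Return the single new SG ID that matches the old SG's attributes."""
    key = (old_sg['type'], old_sg['platform'], old_sg['technology'])
    sg_id = sample_to_sg_attribute_map.get(new_sample_id, {}).get(key)
    new_ids = [sg_id] if sg_id is not None else []
    # Early failure: analyses must be attached to exactly one sequencing group.
    if len(new_ids) != 1:
        raise ValueError(f'Expected exactly 1 new SG for {key}, got {len(new_ids)}')
    return new_ids


# Entirely made-up IDs, for illustration only.
new_sgs = [
    {'id': 'SG_FAKE_1', 'sample_id': 'S_FAKE_1', 'type': 'genome',
     'platform': 'illumina', 'technology': 'short-read'},
    {'id': 'SG_FAKE_2', 'sample_id': 'S_FAKE_1', 'type': 'exome',
     'platform': 'illumina', 'technology': 'short-read'},
]
sg_map = build_sample_to_sg_attribute_map(new_sgs)
old_exome_sg = {'type': 'exome', 'platform': 'illumina', 'technology': 'short-read'}
exome_target = get_new_sg_id(sg_map, 'S_FAKE_1', old_exome_sg)
```

Keying on the full attribute tuple rather than on the sample ID alone is what keeps a 'genome' analysis from being copied onto an 'exome' sequencing group of the same sample in this sketch.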
1 parent c7f53b2, commit 68351eb
Showing 1 changed file with 126 additions and 13 deletions.