This repository has been archived by the owner on Jun 21, 2023. It is now read-only.
For context, I was reviewing #912 and there was a question about an increase in the number of unique Kids_First_Participant_ID, harmonized_diagnosis pairs.
I looked into how often a Kids_First_Participant_ID is associated with multiple harmonized_diagnosis values with the following:
library(tidyverse)

# Read in v18 pbta-histologies.tsv explicitly
v18_df <- read_tsv("data/release-v18-20201123/pbta-histologies.tsv")

# Find participant IDs associated with multiple harmonized dx values
pt_multiple_dx <- v18_df %>%
  # For tumor samples
  dplyr::filter(sample_type == "Tumor",
                composition == "Solid Tissue") %>%
  # Filter to unique participant-harmonized dx pairs
  select(Kids_First_Participant_ID,
         harmonized_diagnosis) %>%
  distinct() %>%
  # Find participant IDs when more than one harmonized dx exists
  count(Kids_First_Participant_ID) %>%
  filter(n > 1) %>%
  pull(Kids_First_Participant_ID)

# Create a TSV file with all ID and disease type info in these cases
v18_df %>%
  filter(Kids_First_Participant_ID %in% pt_multiple_dx,
         sample_type == "Tumor",
         composition == "Solid Tissue") %>%
  select(Kids_First_Participant_ID,
         Kids_First_Biospecimen_ID,
         sample_id,
         pathology_diagnosis,
         integrated_diagnosis,
         harmonized_diagnosis,
         molecular_subtype,
         tumor_descriptor) %>%
  arrange(Kids_First_Participant_ID) %>%
  write_tsv("participants_with_multiple_harmonized_dx.txt")
There are 24 instances of an individual Kids_First_Participant_ID being associated with multiple harmonized_diagnosis values. Here's the file produced using the above: participants_with_multiple_harmonized_dx.txt
This number seems high to me. I wonder if some of this stems from #735 (which is now closed), or from the HGAT and LGAT subtyping being somewhat in flux as to whether they overlap. Either way, I wanted to make a note of it!
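For a quicker look at these cases than the full TSV gives, one option is to collapse each participant's distinct harmonized_diagnosis values into a single row. The following is only a sketch: `summarize_multiple_dx` is a hypothetical helper that is not part of the analysis above, and the toy data frame uses made-up IDs and diagnosis labels so that the block runs without the release file.

```r
library(dplyr)

# Hypothetical helper (an assumption, not from the original analysis):
# returns one row per participant that has more than one distinct
# harmonized_diagnosis among solid tissue tumor samples
summarize_multiple_dx <- function(histologies_df) {
  histologies_df %>%
    filter(sample_type == "Tumor",
           composition == "Solid Tissue") %>%
    distinct(Kids_First_Participant_ID, harmonized_diagnosis) %>%
    group_by(Kids_First_Participant_ID) %>%
    summarize(n_dx = n(),
              # na.last = TRUE keeps any NA diagnosis in the listing so the
              # listing agrees with the count
              diagnoses = paste(sort(harmonized_diagnosis, na.last = TRUE),
                                collapse = "; "),
              .groups = "drop") %>%
    filter(n_dx > 1)
}

# Toy data with made-up IDs and diagnosis labels, for illustration only
toy_df <- data.frame(
  Kids_First_Participant_ID = c("PT_A", "PT_A", "PT_B", "PT_B", "PT_C"),
  harmonized_diagnosis = c("Dx 1", "Dx 2", "Dx 3", "Dx 3", "Dx 4"),
  sample_type = c("Tumor", "Tumor", "Tumor", "Tumor", "Normal"),
  composition = "Solid Tissue",
  stringsAsFactors = FALSE
)

multiple_dx_summary <- summarize_multiple_dx(toy_df)
print(multiple_dx_summary)
```

Passing `v18_df` instead of `toy_df` should give one row per participant flagged above, assuming the same column names.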
jaclyn-taroni changed the title from "Question: Do we expect many participant IDs to be associated with more than one harmonized_dx?" to "Question: Do we expect many participant IDs to be associated with more than one harmonized_diagnosis?" on Jan 16, 2021
Hi @jaclyn-taroni - this list looks as expected to me. Some of these discrepancies arise from subtyping: BS_EE73VE7V is missing the K28 mutation, the BS_1J2WQ08M ependymoma is perhaps missing the YAP1 fusion, BS_HZNKSQ17 does not have matched RNA-Seq for medulloblastoma subtyping, and the BS_8T7DZV2F medulloblastoma may be discrepant with consensus subtypes. The rest seem typical: many LGG initial tumors that progressed to HGG, and PT_00G007DM, an ETMR that progressed but did not have the C19MC alteration (interesting). PT_25Z2NX27 (7316-355, 7316-1462) may be a data input inconsistency: both diagnoses are fibromas, but the field is free text, so they came out differently in the harmonized diagnosis. We can go back and check those pathology reports. BS_N8WXTFN4 should be checked (or an eye kept out during #819) for the BRAF V600E mutation in case that was missed. I also noticed a few other "benign" specimens that were separate from the tumors (cortical tubers) for some patients.
I think some of these could be interesting cases to discuss in the paper!
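One rough way to separate the progression cases from the ones that may need a manual look is to check whether a participant's multiple diagnoses come with more than one tumor_descriptor value. This is a minimal sketch under stated assumptions: `flag_dx_changes` is a hypothetical helper, the toy data uses made-up IDs, diagnoses, and descriptor labels, and a single tumor_descriptor does not by itself mean anything is wrong.

```r
library(dplyr)

# Hypothetical helper (an assumption, not an existing project function):
# for each participant with more than one harmonized_diagnosis, report
# whether more than one tumor_descriptor is also present
flag_dx_changes <- function(histologies_df) {
  histologies_df %>%
    filter(sample_type == "Tumor",
           composition == "Solid Tissue") %>%
    group_by(Kids_First_Participant_ID) %>%
    summarize(n_dx = n_distinct(harmonized_diagnosis),
              n_descriptor = n_distinct(tumor_descriptor),
              .groups = "drop") %>%
    filter(n_dx > 1) %>%
    mutate(multiple_descriptors = n_descriptor > 1)
}

# Toy data with made-up IDs, diagnoses, and descriptor labels
toy_descriptor_df <- data.frame(
  Kids_First_Participant_ID = c("PT_A", "PT_A", "PT_B", "PT_B", "PT_C"),
  harmonized_diagnosis = c("Dx 1", "Dx 2", "Dx 3", "Dx 4", "Dx 5"),
  tumor_descriptor = c("Initial", "Progressive", "Initial", "Initial",
                       "Initial"),
  sample_type = "Tumor",
  composition = "Solid Tissue",
  stringsAsFactors = FALSE
)

dx_change_flags <- flag_dx_changes(toy_descriptor_df)
print(dx_change_flags)
```

Participants with `multiple_descriptors == FALSE` have differing diagnoses within a single tumor_descriptor, so they would be the first candidates for checking subtyping or free-text inconsistencies.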
What data file(s) does this issue pertain to?
pbta-histologies.tsv
What release are you using?
release-v18-20201123