Proposal to broaden BQ.1.28 to its parental lineage (without ins22303TCAAGATGGATG and C25096T) and then designate the sublineage carrying these two mutations: Nextclade already assigns sequences lacking these two mutations to BQ.1.28
#1558
BQ.1.28 was proposed in #1415 and was designated as BQ.1 + C44T, A1777G, G6734A, C20429T, C25096T, G25595T, T29492C, ins22303TCAAGATGGATG. The defining mutations are C25096T, ins22303TCAAGATGGATG.
Since Nextclade masks insertions, only 7 substitutions (C44T, A1777G, G6734A, C20429T, C25096T, G25595T, T29492C) are used to distinguish BQ.1.28 from BQ.1, and as few as 4 of them are sufficient for Nextclade to assign a BQ.1 sequence to BQ.1.28.
As a result, a large number of sequences lacking the defining mutations C25096T and ins22303TCAAGATGGATG are assigned to BQ.1.28 by Nextclade. In fact, only 317 of the 3839 sequences assigned to BQ.1.28 by Nextclade carry the defining mutation C25096T.
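To illustrate why 4 of the 7 substitutions can be enough, here is a minimal Python sketch. It assumes a toy two-node tree (BQ.1 and BQ.1.28) and places a sample on whichever node differs from it by the fewest substitutions, with insertions ignored. This is a simplification for illustration only, not Nextclade's actual placement code; `NODES` and `place` are hypothetical names.

```python
# Simplified model (assumption): a sample is placed on whichever node of a
# two-node toy tree differs from it by the fewest nucleotide substitutions.
# Insertions are ignored, mirroring the masking described above.

BQ1_28_SUBS = frozenset({
    "C44T", "A1777G", "G6734A", "C20429T", "C25096T", "G25595T", "T29492C",
})

# Substitutions relative to BQ.1 carried by each node of the toy tree.
NODES = {
    "BQ.1": frozenset(),
    "BQ.1.28": BQ1_28_SUBS,
}


def place(sample_subs: set[str]) -> str:
    """Return the node whose substitution set has the smallest symmetric difference."""
    return min(NODES, key=lambda node: len(NODES[node] ^ sample_subs))


if __name__ == "__main__":
    # 4 of the 7 substitutions, without C25096T: 4 away from BQ.1, 3 from BQ.1.28.
    print(place({"C44T", "A1777G", "G6734A", "C20429T"}))  # BQ.1.28
    # 3 of the 7 substitutions: 3 away from BQ.1, 4 from BQ.1.28.
    print(place({"C44T", "A1777G", "G6734A"}))  # BQ.1
```

Under this model the threshold follows directly: a sample carrying k of the 7 substitutions is k away from BQ.1 and 7 - k away from BQ.1.28, so it lands on BQ.1.28 whenever k is at least 4, whether or not C25096T is among them.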
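The positive predictive value (PPV) of a lineage call is the share of calls that actually belong to the lineage. A minimal sketch, assuming that carrying C25096T is used as the marker of a true BQ.1.28 (the insertion cannot be checked once it is masked); `positive_predictive_value` is a hypothetical helper name:

```python
def positive_predictive_value(true_positives: int, called_positives: int) -> float:
    """Fraction of lineage calls that carry the lineage's defining mutation."""
    if called_positives <= 0:
        raise ValueError("called_positives must be positive")
    return true_positives / called_positives


if __name__ == "__main__":
    # Counts quoted above: 317 of 3839 Nextclade BQ.1.28 calls carry C25096T.
    ppv = positive_predictive_value(317, 3839)
    print(f"{ppv:.1%}")  # 8.3%
```

Applied to the Pangolin counts reported later in the thread (752 of 5326), the same function gives about 14.1%.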
BQ.1.28 assigned by Nextclade: https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?nextcladePangoLineage=BQ.1.28*&
Real BQ.1.28 with C25096T: https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?nucMutations=C25096T&nextcladePangoLineage=BQ.1.28*&
Given the extremely low positive predictive value of Nextclade's BQ.1.28 calls, I think it would be better to broaden BQ.1.28 to the parental lineage lacking ins22303TCAAGATGGATG and C25096T, i.e. BQ.1 + C44T, A1777G, G6734A, C20429T, G25595T, T29492C. Designating BQ.1 + C44T, A1777G, G6734A, C20429T, G25595T, T29492C as the new BQ.1.28, with more than 3000 sequences, might be a reasonable way to solve the problem despite its low epidemiological significance. That said, this lineage did reach 10% prevalence in California, a US state with a population of around 40 million.
BQ.1 + C44T, A1777G, G6734A, C20429T, G25595T, T29492C + ins22303TCAAGATGGATG, C25096T could then be designated as a sublineage of BQ.1.28, i.e. BQ.1.28.1. Alternatively, since this sublineage no longer shows any growth advantage, it could be left undesignated for now.
(Another option is to withdraw BQ.1.28 entirely for lack of epidemiological significance. A problematic designation is worse than no designation.)
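A minimal Python sketch of how the proposed structure would sort sequences, assuming a toy tree in which a sample is placed on the node whose substitution set differs least from its own (a simplification, not Nextclade's algorithm). The broadened BQ.1.28 carries the six shared substitutions and the proposed BQ.1.28.1 adds C25096T, with ins22303TCAAGATGGATG ignored as masked; `SHARED`, `NODES` and `place` are hypothetical names.

```python
# Toy tree after the proposed redesignation (assumptions: insertions masked,
# nearest-node placement by symmetric difference of substitution sets).
SHARED = frozenset({"C44T", "A1777G", "G6734A", "C20429T", "G25595T", "T29492C"})

NODES = {
    "BQ.1": frozenset(),
    "BQ.1.28": SHARED,                  # proposed broadened BQ.1.28
    "BQ.1.28.1": SHARED | {"C25096T"},  # ins22303TCAAGATGGATG masked, so not listed
}


def place(sample_subs: set[str]) -> str:
    """Return the node whose substitution set has the smallest symmetric difference."""
    return min(NODES, key=lambda node: len(NODES[node] ^ sample_subs))


if __name__ == "__main__":
    print(place(set(SHARED)))                               # BQ.1.28
    print(place(set(SHARED) | {"C25096T"}))                 # BQ.1.28.1
    print(place({"C44T", "A1777G", "G6734A", "C20429T"}))   # BQ.1.28
```

In this model, a sample with only some of the shared substitutions and no C25096T still lands on BQ.1.28, but that call no longer implies that C25096T is present. Note also that, with the insertion masked, BQ.1.28.1 would differ from BQ.1.28 by C25096T alone.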
Similar problems were reported in nextstrain/nextclade#966 and nextstrain/nextclade#1045. Ideally, Nextclade would be able to require certain defining mutations to be present before assigning a lineage (so that no adjustment on the designation side would be needed to solve such problems), but in the short term, broadening BQ.1.28 would make sense.
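The kind of rule suggested here could look like the following post-processing step. This is a hypothetical sketch, not an existing Nextclade or Pangolin option: `REQUIRED`, `PARENT` and `enforce_required` are illustrative names, and only C25096T is listed as required because the insertion is masked.

```python
# Hypothetical rule: keep a lineage call only if the sample carries every
# required defining substitution; otherwise fall back to the parent lineage.
REQUIRED = {"BQ.1.28": frozenset({"C25096T"})}
PARENT = {"BQ.1.28": "BQ.1"}  # every key of REQUIRED needs a parent here


def enforce_required(call: str, sample_subs: set[str]) -> str:
    """Walk up the lineage tree until the required substitutions are satisfied."""
    while call in REQUIRED and not REQUIRED[call] <= sample_subs:
        call = PARENT[call]
    return call


if __name__ == "__main__":
    print(enforce_required("BQ.1.28", {"C44T", "A1777G", "G6734A", "C20429T"}))  # BQ.1
    print(enforce_required("BQ.1.28", {"C44T", "C25096T"}))  # BQ.1.28
```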
Pangolin also misassigns a large number of BQ.1 sequences to BQ.1.28: as of 2023-02-20, only 752 of the 5326 sequences assigned to BQ.1.28 by Pangolin contain the mutation C25096T (via CoV-Spectrum).
The trouble is that BQ.1.28 would become the parent of BQ.1.29, which is awkward. This sounds more like a Nextclade issue than a pango-designation issue to me. @corneliusroemer is there a way to make Nextclade a bit more selective about what it calls BQ.1.28? (Close-but-not-quite situations like this are why I started adding extra non-Pango-lineage labels to the big tree and then converting them to vanilla Pango lineages in the minimized tree for pangolin. Perhaps you could add a node representing the branch up to ORF1a:V2157I (G6734A), before BQ.1.28 and BQ.1.29 split, but have it assigned BQ.1?)
> The trouble is that BQ.1.28 would become the parent of BQ.1.29 - awkward.
I think that once BQ.1.29 is designated, Nextclade and Pangolin/pangoLEARN will no longer assign sequences belonging to the common parental lineage of BQ.1.28 and BQ.1.29 to BQ.1.28, and will assign them to BQ.1 instead. This kind of solution has worked in the past; e.g. the misassignment of BF.15 was resolved after BF.20 had been designated, see nextstrain/nextclade#966.
So I may close this issue once the correct assignment is confirmed.
Nevertheless, it would be great if Nextclade could add such nodes to control the assignment of certain branches in the future.
![image](https://user-images.githubusercontent.com/113214374/212775671-f3e17b97-6f0e-44b1-9d31-c0c33b7b9a09.png)