Proposal to broaden BQ.1.28 to parental lineage without ins22303TCAAGATGGATG, C25096T and then designate its sublineage with these two mutations - Nextclade already assigned sequences without these two mutations as BQ.1.28 #1558

Open
AnonymousUserUse opened this issue Jan 16, 2023 · 6 comments
Labels: BQ.1, correction (Highlight an error in the description or definition)

Comments

@AnonymousUserUse

BQ.1.28 was proposed in #1415 and was designated as BQ.1 + C44T, A1777G, G6734A, C20429T, C25096T, G25595T, T29492C, ins22303TCAAGATGGATG. The defining mutations are C25096T, ins22303TCAAGATGGATG.
Since Nextclade masks insertions, a total of 7 mutations (C44T, A1777G, G6734A, C20429T, C25096T, G25595T, T29492C) are used to distinguish BQ.1.28 from BQ.1, and thus 4 of them are sufficient for Nextclade to assign a BQ.1 sequence to BQ.1.28.
This leads to the problem that a large number of sequences without the defining mutations C25096T and ins22303TCAAGATGGATG are assigned to BQ.1.28 by Nextclade. In fact, only 317 of the 3839 sequences assigned to BQ.1.28 by Nextclade carry the defining mutation C25096T.
BQ.1.28 assigned by Nextclade: https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?nextcladePangoLineage=BQ.1.28*&
Real BQ.1.28 with C25096T: https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?nucMutations=C25096T&nextcladePangoLineage=BQ.1.28*&
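
To make the effect concrete, here is a minimal sketch of the simplified majority rule described in this comment. It is a toy model only, not Nextclade's actual placement algorithm; the function name `toy_assign` and the threshold constant are hypothetical and exist only for illustration.

```python
# Toy model only: NOT Nextclade's real algorithm.
# The seven substitutions listed in this comment that separate BQ.1.28
# from BQ.1 once the insertion is masked.
DISTINGUISHING = {
    "C44T", "A1777G", "G6734A", "C20429T",
    "C25096T", "G25595T", "T29492C",
}
THRESHOLD = 4  # assumed: 4 of the 7 distinguishing mutations are enough


def toy_assign(mutations):
    """Return 'BQ.1.28' if enough distinguishing mutations are present, else 'BQ.1'."""
    shared = DISTINGUISHING & set(mutations)
    return "BQ.1.28" if len(shared) >= THRESHOLD else "BQ.1"


# A parental sequence: six shared mutations, but no defining C25096T.
parental = {"C44T", "A1777G", "G6734A", "C20429T", "G25595T", "T29492C"}
print(toy_assign(parental))  # prints "BQ.1.28"
```

Under this assumed rule, the parental sequence is called BQ.1.28 even though it lacks C25096T, which is the misassignment reported here.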

Given the extremely low positive predictive value of BQ.1.28 assignments by Nextclade, I think it would be better to broaden BQ.1.28 to the parental lineage without ins22303TCAAGATGGATG and C25096T, i.e. BQ.1 + C44T, A1777G, G6734A, C20429T, G25595T, T29492C. With more than 3000 sequences, it might be reasonable to designate this parental lineage as the new BQ.1.28 to solve the problem, despite its low epidemiological significance. Nevertheless, this lineage reached 10% prevalence in California, a US state with a population of around 40 million.
Then BQ.1 + C44T, A1777G, G6734A, C20429T, G25595T, T29492C + ins22303TCAAGATGGATG, C25096T can be designated as a sublineage of BQ.1.28, i.e. BQ.1.28.1. Alternatively, since this sublineage no longer shows any growth advantage, it need not be designated for now.
(Another choice is, BQ.1.28 may also be withdrawn directly for lack of epidemiological events. A problematic designation would be worse than no designation.)

Similar problems were reported in nextstrain/nextclade#966 and nextstrain/nextclade#1045. It would actually be better if Nextclade could require a specific defining mutation before assigning a lineage (so that no adjustment on the designation side would be needed to solve such problems), but in the short term, broadening BQ.1.28 would make sense.
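
As a rough illustration of the "required defining mutation" idea, the sketch below applies a post-processing check to an existing lineage call. Everything here is hypothetical: `REQUIRED`, `PARENT`, and `enforce_required` are names invented for this example, and this is not an existing Nextclade or Pangolin option. Only the substitution C25096T is checked, since the insertion is masked.

```python
# Hypothetical post-processing step, not a feature of Nextclade or Pangolin.
REQUIRED = {"BQ.1.28": {"C25096T"}}  # defining substitution named in this issue
PARENT = {"BQ.1.28": "BQ.1"}         # lineage to fall back to


def enforce_required(lineage, mutations):
    """Fall back to the parent lineage when a required defining mutation is missing."""
    required = REQUIRED.get(lineage, set())
    if required <= set(mutations):
        return lineage  # all required mutations present, or none configured
    return PARENT[lineage]


print(enforce_required("BQ.1.28", {"C44T", "G6734A", "T29492C"}))  # prints "BQ.1"
print(enforce_required("BQ.1.28", {"C44T", "C25096T"}))            # prints "BQ.1.28"
```

Lineages without an entry in `REQUIRED` pass through unchanged, so such a check would only affect calls that are known to be problematic.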

@thomasppeacock added the correction and BQ.1 labels on Jan 18, 2023
@FedeGueli
Contributor

Btw over 320 seqs now of BQ.1.28 (the real one)

@AnonymousUserUse
Author

Pangolin also misassigns a large number of BQ.1 sequences to BQ.1.28. Only 752 of the 5326 sequences assigned to BQ.1.28 by Pangolin contain the mutation C25096T, as of 2023/2/20 via CoV-Spectrum.
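
For reference, the positive predictive values implied by the counts quoted in this thread can be worked out as below. This is a simple sketch that treats the presence of C25096T as the marker of a "real" BQ.1.28 sequence; the helper name `ppv` is hypothetical, and the two counts were taken at different dates.

```python
def ppv(true_positives, assigned):
    """Fraction of assigned sequences that carry the defining mutation."""
    return true_positives / assigned


# Counts quoted in this thread (sequences with C25096T / sequences assigned BQ.1.28).
nextclade_ppv = ppv(317, 3839)  # at the time the issue was opened
pangolin_ppv = ppv(752, 5326)   # as of 2023/2/20

print(f"Nextclade: {nextclade_ppv:.1%}")  # prints "Nextclade: 8.3%"
print(f"Pangolin:  {pangolin_ppv:.1%}")   # prints "Pangolin:  14.1%"
```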

@AngieHinrichs
Member

The trouble is that BQ.1.28 would become the parent of BQ.1.29 - awkward. This sounds more like a nextclade issue than a pango-designation issue to me. @corneliusroemer is there a way to make nextclade a bit more selective about what it will call BQ.1.28? (Close-but-not-quite situations like this are why I started adding extra non-Pango-lineage labels in the big tree and then converting them to vanilla Pango lineages in the minimized tree for pangolin. Perhaps you could add a node to represent the branch up to ORF1a:V2157I (G6734A) before BQ.1.28 and BQ.1.29, but let it be assigned BQ.1?)

@AnonymousUserUse
Author

> The trouble is that BQ.1.28 would become the parent of BQ.1.29 - awkward.

I think that with the designation of BQ.1.29, Nextclade and Pangolin/pangoLEARN will no longer assign sequences that belong to the parental lineage of both BQ.1.28 and BQ.1.29 to BQ.1.28, and will instead assign them to BQ.1. This kind of solution has worked in the past, e.g. the wrong assignment of BF.15 was resolved after BF.20 had been designated, see nextstrain/nextclade#966.
So I may close this issue after the correct assignment is confirmed.

Nevertheless, it would be great if Nextclade can add a node for the assignment of certain branches in the future.

@DailyCovidCases

Please close this issue

