Switch to turn off population frequency plot #144

Open
savitakartik opened this issue Apr 2, 2024 · 6 comments

@savitakartik
Collaborator

savitakartik commented Apr 2, 2024

We can now view population frequencies of a mutation by clicking on a data point in the main mutations view. For this, we're computing a numpy array of shape num_nodes x num_populations. With a large number of populations (e.g. >200 in the Unified Genealogies dataset), this array grows extremely large, and tsqc runs out of memory and gets killed. We should make frequency plots optional, so that users can switch them on only when sufficient memory is available.
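
To give a rough sense of scale, here is a back-of-the-envelope estimate of the dense array's size. The node count and the 8-byte element size are assumptions for illustration only; they are not figures from tsqc or the Unified Genealogies dataset.

```python
def dense_frequency_bytes(num_nodes, num_populations, itemsize=8):
    """Bytes needed for a dense num_nodes x num_populations array.

    itemsize=8 assumes 64-bit elements (an assumption, not tsqc's actual dtype).
    """
    return num_nodes * num_populations * itemsize


# Hypothetical inputs: 10 million nodes, 200 populations.
size = dense_frequency_bytes(10_000_000, 200)
print(f"{size / 1e9:.1f} GB")
```

The size grows linearly in both dimensions, so doubling the number of populations doubles the memory needed.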

@benjeffery benjeffery added this to the MVP milestone Oct 14, 2024
@benjeffery
Member

Can we calculate these on-the-fly instead?

@jeromekelleher
Member

Impractical at chromosome scale I'm afraid. Possible for a small window, though?

@benjeffery
Member

Don't you just need to do it for the single mutation that's been clicked on?

@savitakartik
Collaborator Author

savitakartik commented Oct 15, 2024

Yes, it's only for the single mutation that's clicked on. We considered on-the-fly calculations, but I think one of the reasons we thought it would be easier to pass the whole set of arrays to the browser was so that interactivity is smoother for larger datasets.

The reason for creating this issue was that a big machine was needed every time we served the dataset, even when we didn't care about the population frequencies. Now that we have the preprocess logic, perhaps this issue is not as relevant? I'll check with a couple of datasets and confirm.

@jeromekelleher
Member

If we're doing it for a single mutation then maybe it is feasible. You'd have to do this:

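tree = ts.at(mutation_position)  # tree covering the mutation's position
nodes = tree.preorder(mutation_node)  # numpy array of node IDs in the subtree
subtree_population = ts.nodes_population[nodes]  # population ID of each subtree node
# return a data frame of the counts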
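
The counting step could be sketched roughly as below. This is a minimal illustration, not the tsqc implementation: `subtree_population_counts` and its optional `is_sample` filter are hypothetical names, and the lists are toy stand-ins for what `tree.preorder(mutation_node)` and `ts.nodes_population` would supply.

```python
from collections import Counter


def subtree_population_counts(subtree_nodes, nodes_population, is_sample=None):
    """Count the nodes of a subtree per population ID (hypothetical helper).

    subtree_nodes: node IDs at and below the mutation node,
        e.g. the result of tree.preorder(mutation_node).
    nodes_population: population ID per node, indexable by node ID,
        e.g. ts.nodes_population.
    is_sample: optional per-node booleans; if given, only sample nodes count.
    """
    counts = Counter()
    for u in subtree_nodes:
        if is_sample is None or is_sample[u]:
            counts[int(nodes_population[u])] += 1
    return dict(counts)


# Toy stand-in for a tree: node 4 carries the mutation; nodes 3, 0, 1 descend from it.
nodes_population = [0, 1, 1, 0, -1]  # -1 means "no population" (tskit.NULL)
is_sample = [True, True, True, False, False]
subtree = [4, 3, 0, 1]  # preorder listing starting at node 4

print(subtree_population_counts(subtree, nodes_population))
print(subtree_population_counts(subtree, nodes_population, is_sample))
```

Whether to count every node in the subtree or only the samples is a design choice; the optional filter is included because frequencies are usually reported over samples, but that is an assumption about what the plot needs.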

@benjeffery
Member

Moving this to MVP+1 as it isn't a showstopper and we have a good solution.

@benjeffery benjeffery modified the milestones: MVP, MVP+1 Nov 13, 2024
This was referenced Dec 6, 2024