Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conflict of Optimized Relaxed Clock and Sampled Ancestors #984

Open
Anaphory opened this issue Jul 19, 2021 · 12 comments
Open

Conflict of Optimized Relaxed Clock and Sampled Ancestors #984

Anaphory opened this issue Jul 19, 2021 · 12 comments

Comments

@Anaphory
Copy link
Contributor

A sudden IndexError in an analysis had me stumped for some time. I have now figured out why it happens, but not how to fix it.

Symptoms:

For the analysis that permits sampled ancestors, the Relaxed Clock dies with an index error somewhere in the branch-rates-aware narrow exchange (MetaNER) operator.
What happens is the following:

  1. The relaxed clock mechanism sets the dimension of the rate parameter to “number of nodes in the tree – 1”, and the number of nodes at that point is correctly “number of tips × 2 – 1”. For Sinotibetan, this is 98 rates and 99 nodes for 50 tips, with the nodes numbered 0 to 98.
  2. The model runs fine for several 100'000s of steps.
  3. The MetaNER tries to propose an exchange, but trying to look up the node D with number 98 it dies with an IndexError.

For a moment I thought I might have started with a sampled ancestor and then a Jump added an internal node which would put the total number of nodes, and therefore branches, and therefore rates needed, up by one without adjusting the dimension of the rates parameter. But that's not the case, sampled ancestors are not degree-2 nodes, but tips with a branch length of 0.

Investigating closer, I was greatly confused, because I know that node D in the MetaNER is a child, and that node 98 is always the root. Except:

/**
* Sets root without recalculating nodeCount or ensuring that root is the last node in the internal array.
* Currently only used by sampled ancestor tree operators. Use carefully!
*
* @param root the new root node
*/
public void setRootOnly(final Node root) {
//TODO should we flag this with startEditing since it is an operator call?
this.root = root;
}

This method messes with that invariant, and makes Sampled Ancestors and Optimized Relaxed Clocks incompatible.
Why is it there? Why can Sampled Ancestors not just use the same setRoot() as everyone else?
Is there a way to get SA to work with an ORC (I'm willing to modify and test code and supply patches, as you know), or am I stuck with the generic Relaxed Clock with its convergence issues?
Should I cross-link this issue to the SA and ORC repositories?

@Anaphory
Copy link
Contributor Author

In my local BEAST sources, I have changed setRootOnly to be an alias of setRoot. Now my analysis seems to run with no issues, the trace looks appropriate, too. The difference in run-time is negligible, because the SA operators are not called that often and when they do, often they won't have changed the root, I guess. So I'm stumped why the SA operators got this special handling anyway. Does it have to do with the fact that in a sampled ancestor tree, the first bifurcation can be a descendant of an older sampled ancestor?

Anaphory pushed a commit to Anaphory/beast2 that referenced this issue Jul 19, 2021
@Anaphory
Copy link
Contributor Author

I have made a commit in Anaphory@9f10bc5 – I can make a pull request, but as it is, I have committed on top of #953 and some other small changes and somehow that PR is still in limbo.

@Anaphory
Copy link
Contributor Author

Running that patched version of BEAST on my data a while longer, I get IndexErrors in

if( ! nodesTraversed[n.getNr()] ) {

because

nodesTraversed = new boolean[tree.getNodeCount()];

ends up with a wrong cached NodeCount somehow.

@Anaphory
Copy link
Contributor Author

This in turn seems to happen because the SA-Operators (I found it in SAWilsonBalding, at least) first assign a new node as the root and only then add the remaining nodes as children to that new root:

        if (jP != null) {
            jP.removeChild(j);  // remove <jP, j>
            jP.addChild(iP);   // add <jP, iP>
            jP.makeDirty(Tree.IS_FILTHY);
        } else {
            iP.setParent(null); // completely remove <PiP, iP>
            tree.setRootOnly(iP);
        }
        iP.addChild(j);

I suggest the true solution is to modify the SampledAncestor operators, and I will link this issue from there; but to make it work for all analyses that do not have sampled ancestors and variable number of nodes in the tree by adding nodes during the analysis, it should be enough to change

nodeCount = this.root.getNodeCount();

to

nodeCount = -1; 

and thus defer the re-calculation fo nodeCount to the next time it is needed. I checked the Tree class, and the only way to sidestep this and run into issues seems to be if that changed line 310 and

public void addNode(final Node newNode) {
final Node[] tmp = new Node[nodeCount + 1];
System.arraycopy(m_nodes, 0, tmp, 0, nodeCount);
tmp[nodeCount] = newNode;
newNode.setNr(nodeCount);
m_nodes = tmp;
nodeCount++;

and

public int getNodeCount() {
if (nodeCount < 0) {

get called in that order, so it's already much safer than permitting SA operators to mess with a very general assumption (root is always the last node).

@rbouckaert
Copy link
Member

@Anaphory Thanks for digging into this. Both the update of line 310 (i.e. set nodeCount = -1; ) and 9f10bc5 are required to fix this, right?

@Anaphory
Copy link
Contributor Author

They are both required, but they aren't enough yet: I also ran into

Fatal exception: null
java.lang.NegativeArraySizeException
	at beast.evolution.tree.Tree.store(Unknown Source)
	at beast.core.StateNode.startEditing(Unknown Source)
	at beast.evolution.tree.Tree.startEditing(Unknown Source)
	at beast.core.StateNode.getCurrentEditable(Unknown Source)
	at beast.core.Input.get(Unknown Source)
	at beast.evolution.operators.Exchange.proposal(Unknown Source)
	at beast.core.Operator.proposal(Unknown Source)
	at beast.core.MCMC.propagateState(Unknown Source)
	at beast.core.MCMC.doLoop(Unknown Source)
	at beast.core.MCMC.run(Unknown Source)
	at beast.app.BeastMCMC.run(Unknown Source)
	at beast.app.beastapp.BeastMain.<init>(Unknown Source)
	at beast.app.beastapp.BeastMain.main(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at beast.app.beastapp.BeastLauncher.run(Unknown Source)
	at beast.app.beastapp.BeastLauncher.main(Unknown Source)
Fatal exception: null
java.lang.RuntimeException: An error was encounted. Terminating BEAST
	at beast.app.util.ErrorLogHandler.publish(Unknown Source)
	at java.util.logging.Logger.log(Logger.java:738)
	at java.util.logging.Logger.doLog(Logger.java:765)
	at java.util.logging.Logger.log(Logger.java:788)
	at java.util.logging.Logger.severe(Logger.java:1464)
	at beast.app.beastapp.BeastMain.<init>(Unknown Source)
	at beast.app.beastapp.BeastMain.main(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at beast.app.beastapp.BeastLauncher.run(Unknown Source)
	at beast.app.beastapp.BeastLauncher.main(Unknown Source)

which is somewhat understandable: If for some reason the SAWilsonBalding sets the nodeCount to -1 and the tree needs to be stored immediately after the operator is applied, without any random call to getNodeCount() before that. Because that methot is memoized, replacing

if (m_storedNodes.length != nodeCount) {
final Node[] tmp = new Node[nodeCount];

with

     if (m_storedNodes.length != getNodeCount()) {
         final Node[] tmp = new Node[nodeCount]; 

should fix that problem. However, that whole if looks like it comes from the times when sampled ancestors were non-bifurcating nodes, so maybe it would be best to replace it as a whole and just run (committed as Anaphory@8526c99)

/**
 * StateNode implementation *
 */
@Override
protected void store() {
    storeNodes(0, getNodeCount());
    storedRoot = m_storedNodes[root.getNr()];
}

Anaphory added a commit to Anaphory/sampled-ancestors that referenced this issue Jul 21, 2021
This should fix one of the symptoms of CompEvol/beast2#984 and also CompEvol#16
@Anaphory
Copy link
Contributor Author

I have put a pull request through to the SampledAncestor repository, as well.

@rbouckaert
Copy link
Member

What if getNodeCount is called at the end of setRoot?

@Anaphory
Copy link
Contributor Author

The issue with the SAWilsonBalding operator is that it constructs a partial tree, set that as the root, and only then glues in the other part of the tree, so getNodeCount will generate the wrong result. The -1 is for “Don't forget to re-compute this number once the SAWilsonBalding is completely finished”.

@rbouckaert
Copy link
Member

So, if the SAWilsonBalding operator would call setRoot with the correct root at the end of the proposal, that would do the trick, right? Perhaps adding some comments to setRootOnly to that effect might be useful.
I see how using the -1 flag works, but it does not feel particularly robust.

@Anaphory
Copy link
Contributor Author

Anaphory commented Jul 22, 2021 via email

@Anaphory
Copy link
Contributor Author

Anaphory commented Jul 23, 2021

I see how using the -1 flag works, but it does not feel particularly robust.

Oh, it really is not. I already ran into it being a problem somewhere else: In rare cases, a Tree.store() was called immediately after setRoot (I don't know why there was no likelihood calculation in beetween which would have re-computed the nodeCount, but it happened) and that creates an array with negative length in the old line 800 of that method. And I'm sure there are other issues.

Like, as I just noticed, resuming. 😞 Which is really important. So I'll undo that change and end with nodeCount = getNodeCount(); while using CompEvol/sampled-ancestors#19 (which fixes it on the SA end) for my analysis.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants