Different p-value on same tree with slightly different branch lengths #18

crsl4 opened this issue Apr 2, 2022 · 3 comments

crsl4 commented Apr 2, 2022

I am adding commands to run TICR in Julia to the PhyloNetworks wiki.
I get two very different p-values for the ASTRAL tree and the SNaQ h=0 tree, even though the tree topologies are the same and the branch lengths (at least visually on the plot) look very similar.

I ran the following code inside the data_results/n15.gamma0.30.20.2_n300 folder, following the PhyloNetworks wiki:

using QuartetNetworkGoodnessFit, CSV, PhyloPlots, DataFrames, PhyloNetworks
df = DataFrame(CSV.File("bucky-output/1_seqgen.CFs.csv"))
dat = df[:,[1,2,3,4,5,8,11]] # 4 taxon columns + the 3 observed CF columns

astraltree = readMultiTopology("astral/astral.tre")[102] # main tree with BS as node labels
rootatnode!(astraltree,"15")
plot(astraltree,:R, showEdgeLength=true)
out = ticr!(astraltree,dat,false) # TICR test; false: do not re-optimize branch lengths

P-value for ASTRAL tree:

julia> out[1]
2.5095827563325576e-6

Then, the same code for the SNaQ h=0 tree:

snaqtree = readTopology("snaq/net0_bucky.out")
rootatnode!(snaqtree,"15")
plot(snaqtree,:R, showEdgeLength=true)
out2 = ticr!(snaqtree,dat,false)

P-value for SNaQ h=0 tree:

julia> out2[1]
0.8984817899839543

Even if I optimize branch lengths on the SNaQ h=0 tree (third argument set to true), the p-value is unchanged:

julia> out2 = ticr!(snaqtree,dat,true)
(0.8984817899839543, -1.2729482105384087, Dict("[0.05, 0.1)" => 50, "[0.0, 0.01)" => 30, "[0.01, 0.05)" => 28, "[0.1, 1.0]" => 1257), [74.34019547885967, 7601.3467641088855], [0.30195601360220753, 0.4748614258436003, 0.7787108654400592, 0.5303203132024022, 0.7787108654400592, 0.7321699351036286, 0.7787108654400592, 0.9253543201911679, 0.9983388360397745, 0.4748614258436003  …  0.946903443206478, 0.9757944592517083, 0.3743756117951704, 0.12274795560100286, 0.9888728409315868, 0.9692954871424657, 0.9250612479969377, 0.4410937049467882, 0.9255010963721408, 0.10201118577400667], HybridNetwork, Un-rooted Network
27 edges
28 nodes: 15 tips, 0 hybrid nodes, 13 internal tree nodes.
tip labels: 15, 1, 3, 2, ...
(3,2,((6,(4,5):1.102):0.495,(((((12,13):2.766,11):0.716,(7,(10,(8,9):2.047):0.318):0.645):4.504,14):0.897,(1,15):1.256):2.034):2.134);
)

julia> out2[1]
0.8984817899839543
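
(For reference, one quick way to double-check that the two topologies really are identical is their hardwired-cluster distance in PhyloNetworks; this is a sketch, not output from the run above.)

hardwiredClusterDistance(astraltree, snaqtree, true) # 0 means identical rooted topologies
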
cecileane (Member) commented:

What happens if you re-build the data frame dat from scratch before running the second test (with the SNaQ tree)? ticr! modifies the data frame, so it would be best to use the same starting point for dat in both cases. I doubt that it will change anything, but who knows.

Did you make sure that both trees are free of polytomies (checking for branch lengths of 0 too, if need be)?
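
(A rough, untested sketch of such a check, assuming the usual node/edge fields of a HybridNetwork: a node of degree greater than 3 would be a polytomy, and zero-length edges are worth flagging too.)

for tree in (astraltree, snaqtree)
    polytomies = [n.number for n in tree.node if length(n.edge) > 3]  # nodes of degree > 3
    zerolength = [e.number for e in tree.edge if e.length == 0.0]     # edges of length 0
    @show polytomies zerolength
end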

To help with the diagnosis, it would be useful to compare the column created by the test: it contains the outlier p-value for each four-taxon set. The four-taxon sets with the largest differences between the ASTRAL tree and the SNaQ tree could point to the tree differences that might have contributed to the difference in overall p-values.
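
(Something along these lines might work as a starting point; it assumes ticr! appends its outlier p-values as the last column of the data frame, which should be checked against the actual column name.)

datA = df[:, [1,2,3,4,5,8,11]]   # fresh copy of the CF data for the ASTRAL tree
ticr!(astraltree, datA, false)
datS = df[:, [1,2,3,4,5,8,11]]   # fresh copy for the SNaQ h=0 tree
ticr!(snaqtree, datS, false)

cmp = datA[:, 1:4]               # the 4 taxon-name columns
cmp.p_astral = datA[:, end]      # assumed: outlier p-values in the last column
cmp.p_snaq   = datS[:, end]
cmp.absdiff  = abs.(cmp.p_astral .- cmp.p_snaq)
sort!(cmp, :absdiff, rev=true)
first(cmp, 10)                   # four-taxon sets where the two trees disagree most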

crsl4 commented Apr 4, 2022

I get the same results if I re-read dat before running the test for the SNaQ tree. I also checked that there are no polytomies in either tree. I will look at the column created by the test next.

cecileane (Member) commented:

Also: did you check the results using the R implementation in phylolm? Do the results vary a lot from one tree to the other with the R implementation? Polytomies are treated a little differently by the two implementations, and you would need to choose, in Julia, the test statistic that is implemented in R.
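
(If it helps, a very rough sketch of that last step; the keyword name and option symbol below are assumptions rather than the documented API, so the ticr! docstring should be checked for the actual names.)

# assumption: a keyword (here called `quartetstat`) selects the per-quartet test
# statistic; pick the option that the docstring says matches phylolm's TICR test
datA2 = df[:, [1,2,3,4,5,8,11]]
out3 = ticr!(astraltree, datA2, false; quartetstat = :minpval)  # hypothetical keyword and value
out3[1]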
