# tocID <- "BIN-PHYLO-Tree_building.R"
#
# Purpose: A Bioinformatics Course:
# R code accompanying the BIN-PHYLO-Tree_building unit.
#
# Version: 2.0
#
# Date: 2017-10 - 2022-11
# Author: Boris Steipe ([email protected])
#
# Versions:
# 2.0 The PHYLIP era is over
# 1.2 deprecate save()/load() for saveRDS()/readRDS(); Mac:
# instructions to authorize proml.app
# 1.1 Change from require() to requireNamespace(),
# use <package>::<function>() idiom throughout
# 1.0 First 2017 version
# 0.1 First code copied from 2016 material.
#
# Note:
# This unit was originally developed as a workflow with the
# groundbreaking maximum-likelihood methods that were developed
# by Joe Felsenstein in Washington as part of the Phylip suite.
# Unfortunately, Phylip is no longer actively maintained, and
# although the code works as designed, its arcane user interface
# has made it unsuitable for coursework by novices. For a while,
# the Rphylip:: package provided a usable wrapper for
# R-scripted analysis, but since that too has gone out of maintenance,
# and the package was removed from CRAN in June 2022, I am, with
# gratitude and respect, removing Phylip from the course. (2022-11)
#
# cf. https://evolution.genetics.washington.edu/phylip.html
#
# PhyML is described here:
#
# Guindon, Stéphane et al. (2010). “New algorithms and methods
# to estimate maximum-likelihood phylogenies: assessing the
# performance of PhyML 3.0”. Systematic Biology 59(3):307–21.
# [PMID: 20525638] [DOI: 10.1093/sysbio/syq010]
#
# TODO:
# Add MrBayes
# add: https://cran.r-project.org/web/packages/phangorn/
# vignettes/IntertwiningTreesAndNetworks.html
#
#
# == DO NOT SIMPLY source() THIS FILE! =======================================
#
# If there are portions you don't understand, use R's help system, Google for an
# answer, or ask your instructor. Don't continue if you don't understand what's
# going on. That's not how it works ...
#
# ==============================================================================
#TOC> ==========================================================================
#TOC>
#TOC>   Section  Title
#TOC> -----------------------------------------------------
#TOC>   1        Packages
#TOC>   2        PhyML online server workflow
#TOC>   2.1        .mfa to .phy
#TOC>   2.2        Computing the tree
#TOC>   2.3        Reading the tree back into R
#TOC>
#TOC> ==========================================================================
# = 1 Packages ============================================================
#
#
if (! requireNamespace("phangorn", quietly = TRUE)) {
  install.packages("phangorn")
}
# Package information:
# library(help = phangorn) # basic information
# browseVignettes("phangorn") # available vignettes
# data(package = "phangorn") # available datasets
# This will install phangorn::, as well as its dependency, the package ape::.
# Here, we only use phangorn:: to read a multi-FASTA file and write a
# Phylip-formatted dataset that is suitable as input for tree-inference
# programs. But phangorn:: can do a lot more than that and even has its own
# maximum-likelihood tree-inference code: phangorn::pml().
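#
# A minimal sketch of that phangorn-only route. It is not the workflow used in
# this unit. It assumes the Laurasiatherian DNA alignment that ships with
# phangorn:: as toy data, and the default (Jukes-Cantor) model settings.
if (requireNamespace("phangorn", quietly = TRUE)) {
  data("Laurasiatherian", package = "phangorn")
  dm     <- phangorn::dist.ml(Laurasiatherian)        # pairwise ML distances
  njTree <- phangorn::NJ(dm)                          # starting topology
  fit0   <- phangorn::pml(njTree, data = Laurasiatherian)
  fitNNI <- phangorn::optim.pml(fit0,
                                optNni = TRUE,        # also search topologies
                                control = phangorn::pml.control(trace = 0))
  print(logLik(fit0))
  print(logLik(fitNNI))    # should not be lower than the starting value
}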
# = 2 PhyML online server workflow ========================================
# Workflow to create input that is suitable for an online version of PhyML.
# == 2.1 .mfa to .phy ======================================================
# In this workflow we reformat a multi-FASTA file into the .phy format that
# is used as input for tree inference by many different programs. The first
# line specifies the number of organisms and the number of "states"
# (sequence characters, i.e. alignment columns). Each subsequent line
# contains an organism name followed by its data.
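#
# A minimal sketch of this layout, using base R only. The two toy sequences,
# their names, and the helper name writeToyPhy() are hypothetical and not part
# of the course data. Real files are written by phangorn::write.phyDat(), and
# its exact spacing may differ from what is shown here.
writeToyPhy <- function(seqs, file) {
  stopifnot(length(unique(nchar(seqs))) == 1)       # aligned: equal lengths
  header <- sprintf("%d %d", length(seqs), nchar(seqs[1]))
  rows   <- sprintf("%-10s%s", names(seqs), seqs)   # name padded to 10 chars
  writeLines(c(header, rows), con = file)
  invisible(file)
}

toySeqs <- c(SEQ_A = "MKV-LTA", SEQ_B = "MRVQLTA")
toyFile <- tempfile(fileext = ".phy")
writeToyPhy(toySeqs, toyFile)
cat(readLines(toyFile), sep = "\n")   # header line, then one line per sequence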
# Read the multi-FASTA alignment that we produced previously
tmp <- phangorn::read.phyDat("data/APSESphyloSet.mfa",
                             format = "fasta",
                             type = "AA")
# Write the alignment to disk in Phylip format
# phangorn::write.phyDat(tmp, file = "data/APSESphyloSet.phy")
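#
# A quick sanity check before uploading, sketched under two assumptions: the
# .phy file begins with the two-integer header line, and phangorn:: stores the
# per-pattern site counts of a phyDat object in its "weight" attribute. The
# helper name readPhyHeader() is hypothetical.
readPhyHeader <- function(file) {
  # Parse the two integers on the first line of a Phylip-formatted file
  hdr <- scan(file, what = integer(), n = 2, quiet = TRUE)
  c(nSeq = hdr[1], nChar = hdr[2])
}

# If the file has been written, its header should match the phyDat object:
if (file.exists("data/APSESphyloSet.phy")) {
  hdr <- readPhyHeader("data/APSESphyloSet.phy")
  stopifnot(hdr["nSeq"]  == length(tmp))                 # number of sequences
  stopifnot(hdr["nChar"] == sum(attr(tmp, "weight")))    # alignment columns
}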
# == 2.2 Computing the tree ================================================
# Submit the file to the Montpellier PhyML server:
#   1. Navigate to http://www.atgc-montpellier.fr/phyml/
#   2. Upload "data/APSESphyloSet.phy"
#   3. Use default parameters
#   4. Make sure to enter your email address to be notified of the results
#
#      The computation may complete in a minute or so. It may also take longer.
#
#   5. Download the .zip attachment from your results email and expand it
#   6. In the results folder, find "apsesphyloset_phy_phyml_tree.txt"
#      and move it to your data/ folder
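#
# The last step can also be scripted. This is a sketch in base R; the helper
# name fetchTreeFile() and the results path in the usage line are hypothetical,
# so substitute the folder into which you actually expanded the archive.
fetchTreeFile <- function(resultsDir,
                          destDir = "data",
                          fName = "apsesphyloset_phy_phyml_tree.txt") {
  src <- file.path(resultsDir, fName)
  if (! file.exists(src)) {
    stop("Tree file not found: ", src)
  }
  dir.create(destDir, showWarnings = FALSE, recursive = TRUE)
  # Copy rather than move, so the downloaded results stay intact
  file.copy(src, file.path(destDir, fName), overwrite = TRUE)
}
# fetchTreeFile("path/to/expanded/results")   # hypothetical path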
# == 2.3 Reading the tree back into R ======================================
# Confirm that you can read the tree and that it makes sense
apsTree <- ape::read.tree("data/apsesphyloset_phy_phyml_tree.txt")
plot(apsTree)
# If this did not work, ask for advice.
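#
# A few basic checks of whether a tree "makes sense", sketched on a toy Newick
# string (hypothetical labels and branch lengths) so that the code runs
# without the server results. With the real data, use apsTree in place of
# toyTree.
if (requireNamespace("ape", quietly = TRUE)) {
  toyTree <- ape::read.tree(text = "((A:0.1,B:0.2):0.05,C:0.3,D:0.4);")
  print(ape::Ntip(toyTree))               # expect one tip per input sequence
  print(toyTree$tip.label)                # compare with the alignment names
  print(ape::is.rooted(toyTree))          # a trifurcation at the base: unrooted
  print(all(toyTree$edge.length >= 0))    # no negative branch lengths
}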
# [END]