phylo_data {opm} | R Documentation |
Create an entire character matrix (including header and footer) in a file format suitable for exporting phylogenetic data. Return it or write it to a file. This function can also produce HTML tables and text paragraphs suitable for displaying PM data in taxonomic journals such as IJSEM.
## S4 method for signature 'OPMD_Listing'
phylo_data(object, html.args = html_args(), run.tidy = FALSE)

## S4 method for signature 'OPMS_Listing'
phylo_data(object, html.args = html_args(), run.tidy = FALSE)

## S4 method for signature 'XOPMX'
phylo_data(object, as.labels, subset = param_names("disc.name"),
  sep = " ", extract.args = list(), join = TRUE,
  discrete.args = list(range = TRUE, gap = TRUE), ...)

## S4 method for signature 'data.frame'
phylo_data(object, as.labels = NULL, subset = "numeric", sep = " ", ...)

## S4 method for signature 'matrix'
phylo_data(object, format = opm_opt("phylo.fmt"), outfile = "",
  enclose = TRUE, indent = 3L, paup.block = FALSE,
  delete = c("none", "uninf", "constant", "ambig"), join = FALSE,
  cutoff = 0, digits = opm_opt("digits"), comments = comment(object),
  html.args = html_args(), prefer.char = format == "html",
  run.tidy = FALSE, ...)
object |
Data frame, numeric matrix or
|
format |
Character scalar determining the output format, either epf (Extended PHYLIP Format), nexus, phylip, hennig or html. If NEXUS or ‘Hennig’ format is chosen, a
non-empty EPF or ‘extended PHYLIP’ is
sometimes called ‘relaxed PHYLIP’. The
main difference between EPF and
PHYLIP is that the former can use labels with
more than ten characters, but its labels must not contain
whitespace. (These adaptations are done automatically
with |
outfile |
Character scalar. If a non-empty character scalar, resulting lines are directly written to this file. Otherwise, they are returned. |
enclose |
Logical scalar. Shall labels be enclosed
in single quotes? Ignored unless |
indent |
Integer scalar. Indentation of commands in
NEXUS format. Ignored unless |
paup.block |
Logical scalar. Append a PAUP* block with selected (recommended) default values? Has no effect unless ‘nexus’ is selected as ‘format’. |
delete |
Character scalar with one of the following values:
|
join |
Logical scalar, vector or factor. Unless
|
cutoff |
Numeric scalar. If joining results in multiple-state characters, they can be filtered by removing all entries with a relative frequency less than ‘cutoff’. This makes little sense for non-integer numeric data. |
digits |
Numeric scalar. Used for rounding, and thus
ignored unless |
comments |
Character vector. Comments to be added to the output (as title if HTML is chosen). Ignored if the output format does not allow for comments. If empty, a default comment is chosen. |
html.args |
List of arguments used to modify the
generated HTML. See |
prefer.char |
Logical scalar indicating whether or
not to use |
run.tidy |
Logical scalar. Filter the resulting
HTML through the Tidy program? Ignored unless
|
as.labels |
Vector of data-frame indexes or
|
sep |
Character scalar. See |
subset |
Character scalar. For the
|
extract.args |
Optional list of arguments passed to that method. |
discrete.args |
Optional list of arguments passed
from the |
... |
Optional arguments passed between the methods
(i.e., from the other methods to the matrix method) or to
|
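The filtering described for ‘cutoff’ can be illustrated with a minimal base-R sketch. The helper join_states below is hypothetical and not part of opm; it only assumes that joining collects the states observed for one character within one group and then drops states whose relative frequency is below the cutoff.

```r
# Hypothetical helper, not part of opm: join the states observed for one
# character within one group and drop states rarer than 'cutoff'.
join_states <- function(states, cutoff = 0) {
  freq <- table(states) / length(states) # relative frequency of each state
  sort(names(freq)[freq >= cutoff])      # keep states at or above the cutoff
}

states <- c("1", "1", "1", "0")
all.states <- join_states(states)                  # both states are kept
common.states <- join_states(states, cutoff = 0.3) # "0" (frequency 0.25) is dropped
```

With the default cutoff of 0 nothing is removed, so the joined character stays ambiguous; a larger cutoff trades ambiguity against the loss of rare observations.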
Exporting PM data in such formats allows one to either infer trees from the data under the maximum-likelihood and/or the maximum-parsimony criterion, or to reconstruct the evolution of PM characters on given phylogenetic trees, or to nicely display the data in HTML format.
For exporting NEXUS format, the matrix should normally be converted beforehand by applying discrete. Exporting HTML is optimised for data discretised with gap set to TRUE. For other data, the character.states argument should be modified, see html_args. The hennig (Hennig86) format is the one used by TNT; it allows continuous characters to be analysed as such. Regarding the meaning of ‘character’ as used here, see the ‘Details’ section of discrete.
The generated HTML is guaranteed to produce neither errors nor warnings if checked using the Tidy program. It deliberately contains no formatting instructions but a rich annotation with ‘class’ attributes which allows for CSS-based formatting. This annotation includes the naming of all sections and all kinds of textual content. Whether the characters show differences between at least one organism and the others is also indicated. For the CSS files that come with the package, see the examples below and opm_files.
Character vector, each element representing a line in a potential output file, returned invisibly if outfile is given.
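The return convention can be sketched in base R as follows. The helper emit_lines is hypothetical and only mimics the documented behaviour (write to the file and return invisibly when outfile is non-empty, otherwise return visibly); it is not the code used by opm.

```r
# Hypothetical helper, not part of opm: mimics the documented handling of
# 'outfile' for a character vector of output lines.
emit_lines <- function(lines, outfile = "") {
  if (nzchar(outfile)) {
    writeLines(lines, outfile) # one vector element per line in the file
    invisible(lines)           # nothing is printed at top level
  } else {
    lines                      # returned visibly
  }
}

tmp <- tempfile(fileext = ".txt")
res <- emit_lines(c("2 3", "A 010", "B 110"), outfile = tmp)
```

Because the lines are still returned when a file is written, the result can be captured and checked without re-reading the file.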
Berger, S. A., Stamatakis, A. 2010 Accuracy of morphology-based phylogenetic fossil placement under maximum likelihood. 8th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA-10). Hammamet, Tunisia [analysis of phenotypic data with RAxML].
Felsenstein, J. 2005 PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author. Seattle: University of Washington, Department of Genome Sciences [the PHYLIP program].
Goloboff, P.A., Farris, J.S., Nixon, K.C. 2008 TNT, a free program for phylogenetic analysis. Cladistics 24, 774–786 [the TNT program].
Goloboff, P.A., Mattoni, C., Quinteros, S. 2005 Continuous characters analysed as such. Cladistics 22, 589–601.
Maddison, D. R., Swofford, D. L., Maddison, W. P. 1997 Nexus: An extensible file format for systematic information. Syst Biol 46, 590–621 [the NEXUS format].
Stamatakis, A. 2006 RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690 [the RAxML program].
Swofford, D. L. 2002 PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods), Version 4.0 b10. Sunderland, Mass.: Sinauer Associates [the PAUP* program].
http://ijs.microbiologyresearch.org/ [IJSEM journal]
http://tidy.sourceforge.net/ [HTML Tidy]
base::comment base::write hwriter::hwrite
Other phylogeny-functions: html_args, safe_labels
# simple helper functions
echo <- function(x) write(substr(x, 1, 250), file = "")
# all() is needed because '&&' requires a length-one logical
is_html <- function(x) is.character(x) &&
  all(c("<html>", "<head>", "<body>", "</html>", "</head>", "</body>") %in% x)
longer <- function(x, y) any(nchar(x) > nchar(y)) && !any(nchar(x) < nchar(y))

## examples with a dummy data set
x <- matrix(c(0:9, letters[1:22]), nrow = 2)
colnames(x) <- LETTERS[1:16]
rownames(x) <- c("Ahoernchen", "Behoernchen") # Chip and Dale in German

# EPF is a comparatively restricted format
echo(y.epf <- phylo_data(x, format = "epf"))
stopifnot(is.character(y.epf), length(y.epf) == 3)
stopifnot(identical(y.epf, phylo_data(as.data.frame(x), subset = "factor",
  format = "epf")))

# PHYLIP is even more restricted (shorter labels!)
echo(y.phylip <- phylo_data(x, format = "phylip"))
stopifnot((y.epf == y.phylip) == c(TRUE, FALSE, FALSE))

# NEXUS allows for more content; note the comment and the character labels
echo(y.nexus <- phylo_data(x, format = "nexus"))
nexus.len.1 <- length(y.nexus)
stopifnot(is.character(y.nexus), nexus.len.1 > 10)

# adding a PAUP* block with (hopefully useful) default settings
echo(y.nexus <- phylo_data(x, format = "nexus", paup.block = TRUE))
stopifnot(is.character(y.nexus), length(y.nexus) > nexus.len.1)

# adding our own comment
comment(x) <- c("This is", "a test") # yields two lines
echo(y.nexus <- phylo_data(x, format = "nexus"))
stopifnot(identical(length(y.nexus), nexus.len.1 + 1L))

# Hennig86/TNT also includes the comment
echo(y.hennig <- phylo_data(x, format = "hennig"))
hennig.len.1 <- length(y.hennig)
stopifnot(is.character(y.hennig), hennig.len.1 > 10)

# without an explicit comment, the default one will be used
comment(x) <- NULL
echo(y.hennig <- phylo_data(x, format = "hennig"))
stopifnot(identical(length(y.hennig), hennig.len.1 - 1L))

## examples with real data and HTML

# setting the CSS file that comes with opm as default
opm_opt(css.file = opm_files("css")[[1]])

# see discrete() for the conversion and note the OPMS example below: one
# could also get the results directly from OPMS objects
x <- extract(vaas_4[, , 1:10], as.labels = list("Species", "Strain"),
  in.parens = FALSE)
x <- discrete(x, range = TRUE, gap = TRUE)
echo(y <- phylo_data(x, format = "html",
  html.args = html_args(organisms.start = "Strains: ")))
# this yields HTML with the usual tags, a table legend, and the table itself
# in a single line; the default 'organisms.start' could also be used
stopifnot(is_html(y))

# now with joining of the results per species (and changing the organism
# description accordingly)
x <- extract(vaas_4[, , 1:10], as.labels = list("Species"),
  in.parens = FALSE)
x <- discrete(x, range = TRUE, gap = TRUE)
echo(y <- phylo_data(x, format = "html", join = TRUE,
  html.args = html_args(organisms.start = "Species: ")))
stopifnot(is_html(y))
# Here and in the following examples note the highlighting of the variable
# (uninformative or informative) characters. The uninformative ones are those
# that are not constant but show overlap regarding the sets of character
# states between all organisms. The informative ones are those that are fully
# distinct between all organisms.

# 'OPMS' method, yielding the same results as above but directly
echo(yy <- phylo_data(vaas_4[, , 1:10], as.labels = "Species",
  format = "html", join = TRUE, extract.args = list(in.parens = FALSE),
  html.args = html_args(organisms.start = "Species: ")))
# the timestamps might differ, but otherwise the result is as above
stopifnot(length(y) == length(yy) && length(which(y != yy)) < 2)

# appending user-defined sections
echo(yy <- phylo_data(vaas_4[, , 1:10], as.labels = "Species",
  format = "html", join = TRUE, extract.args = list(in.parens = FALSE),
  html.args = html_args(organisms.start = "Species: ",
    append = list(section.1 = "additional text", section.2 = "more text"))))
stopifnot(length(y) < length(yy), length(which(!y %in% yy)) < 2)
# note the position -- there are also 'prepend' and 'insert' arguments

# effect of deletion
echo(y <- phylo_data(x, "html", delete = "none", join = FALSE))
echo(y.noambig <- phylo_data(x, "html", delete = "ambig", join = FALSE))
stopifnot(length(which(y != y.noambig)) < 2) # timestamps might differ
# ambiguities are created only by joining
echo(y <- phylo_data(x, "html", delete = "none", join = TRUE))
echo(y.noambig <- phylo_data(x, "html", delete = "ambig", join = TRUE))
stopifnot(longer(y, y.noambig))
echo(y.nouninf <- phylo_data(x, "html", delete = "uninf", join = TRUE))
stopifnot(longer(y, y.nouninf))
echo(y.noconst <- phylo_data(x, "html", delete = "const", join = TRUE))
stopifnot(longer(y.noconst, y.nouninf))

# getting real numbers, not discretised ones
echo(yy <- phylo_data(vaas_4[, , 1:10], as.labels = "Species",
  format = "html", join = TRUE, extract.args = list(in.parens = FALSE),
  subset = "A", discrete.args = NULL,
  html.args = html_args(organisms.start = "Species: ")))
stopifnot(is_html(yy), length(yy) == length(y) - 1) # no symbols list
# the highlighting is also used here, based on the following heuristic:
# if mean+/-2*sd does not overlap, the character is informative; else
# if mean+/-sd does not overlap, the character is uninformative; otherwise
# it is constant

# this can also be used for formats other than HTML (but not all make sense)
echo(yy <- phylo_data(vaas_4[, , 1:10], as.labels = "Species",
  format = "hennig", join = TRUE, extract.args = list(in.parens = FALSE),
  subset = "A", discrete.args = NULL))
stopifnot(is.character(yy), length(yy) > 10)

## 'OPMD_Listing' method
echo(x <- phylo_data(listing(vaas_1, NULL)))
stopifnot(is.character(x), length(x) == 1)
echo(x <- phylo_data(listing(vaas_1, NULL, html = TRUE)))
stopifnot(is.character(x), length(x) > 1)

## 'OPMS_Listing' method
echo(x <- phylo_data(listing(vaas_4, as.groups = "Species")))
stopifnot(is.character(x), length(x) == 2, !is.null(names(x)))
echo(x <- phylo_data(listing(vaas_4, as.groups = "Species", html = TRUE)))
stopifnot(is.character(x), length(x) > 2, is.null(names(x)))
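The heuristic for continuous data mentioned in the example comments (mean+/-2*sd, then mean+/-sd) can be sketched in base R. The helper classify_character is hypothetical and not part of opm. It assumes that "does not overlap" means the intervals of every pair of organisms are disjoint, which is only one possible reading, and it needs at least two groups.

```r
# Hypothetical helper, not part of opm. 'groups' is a list of at least two
# numeric vectors, one per organism, holding the values of one character.
classify_character <- function(groups) {
  m <- vapply(groups, mean, numeric(1))
  s <- vapply(groups, sd, numeric(1))
  pairs <- combn(length(groups), 2) # all pairs of organisms, one per column
  # TRUE if the intervals mean +/- k * sd are disjoint for every pair
  # (assumed reading of "does not overlap")
  disjoint <- function(k) {
    lower <- m - k * s
    upper <- m + k * s
    all(upper[pairs[1, ]] < lower[pairs[2, ]] |
        upper[pairs[2, ]] < lower[pairs[1, ]])
  }
  if (disjoint(2)) {
    "informative"
  } else if (disjoint(1)) {
    "uninformative"
  } else {
    "constant"
  }
}

a <- c(10, 11, 12) # mean 11, sd 1
res.inf <- classify_character(list(a, c(20, 21, 22)))         # "informative"
res.uninf <- classify_character(list(a, c(13.5, 14.5, 15.5))) # "uninformative"
res.const <- classify_character(list(a, c(11, 12, 13)))       # "constant"
```

Note that "constant" here means statistically indistinguishable under this heuristic, not literally identical values.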