To really get information out of phylogenetic trees (especially large ones), some thought has to go into how the tips (OTUs) are annotated. The two programs that seem to do this most powerfully are ARB and Treedyn. I also want to explore Tree-Q vista, which looks promising, but I haven’t had a chance yet. (Has anybody got experience with Tree-Q vista?)
Treedyn is a very good program for editing and annotating phylogenetic trees. It can be driven by scripts, and it can carry out many sophisticated graphical transformations.
“Many powerful tree editors are now available, but existing tree visualisation tools make little use of meta-information related to the entities under study such as taxonomic descriptions, geographic distribution or gene functions. This meta-information is useful for the analyses of trees and their publications, but can hardly be encoded within the tree itself (the so-called newick format). Consequently, a tedious manual analysis and post-processing of the tree’s images is required. Particularly with large trees, multiple trees and multiple meta-information variables. TreeDyn links unique leaf labels to lists of variables/values pairs of annotations (meta-information), independently of the tree topologies, remaining fully compatible with the basic newick format.” [www.treedyn.org]
What information can the tips be labeled with? Ideally, it would be parsed straight out of the original GenBank records of the sequences used to build the tree. Treedyn allows conditional annotation of OTUs, either adding to or replacing the existing names. The annotations come from a separate file in which each line starts with a leaf’s unique name from the Newick file, followed by “key{value}” pairs such as accession_number{AY123456}.
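To make the format concrete, here is a minimal Perl sketch of how one annotation line could be assembled. The helper name tlf_line is hypothetical, and the layout (the unique name, then a tab before each key and a space before the braced value) mirrors what the script below prints; it is an assumption that Treedyn accepts exactly this spacing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build one Treedyn annotation line from a leaf's
# unique name and a flat list of key/value pairs (order is preserved).
sub tlf_line {
    my ($name, @pairs) = @_;
    die "odd number of key/value arguments\n" if @pairs % 2;
    my @fields = ($name);
    while (my ($key, $value) = splice(@pairs, 0, 2)) {
        $value //= '';    # a missing value becomes an empty {}
        # Keys must not contain spaces; braces inside a value would
        # break the {...} grouping, so reject both up front.
        die "key '$key' contains whitespace\n"    if $key =~ /\s/;
        die "value for '$key' contains a brace\n" if $value =~ /[{}]/;
        push @fields, "$key {$value}";
    }
    return join("\t", @fields);
}

print tlf_line('AY123456',
               genus     => 'Homo',
               species   => 'sapiens',
               accession => 'AY123456'), "\n";
```

Checking keys and values before writing the file catches bad input before it ever reaches Treedyn.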
I wrote a little Perl script to do this. It could be done much better with BioPerl, and my Perl skills are very basic, but it works.
#!/usr/bin/perl
# Creates an annotation file for Treedyn from a file containing multiple
# GenBank records. Each output line is the accession number (the unique
# leaf name in the tree) followed by tab-separated key {value} pairs.
# Keys must not contain spaces.
# usage: genbank2treedyn.pl infile.gb > outfile.tlf
use strict;
use warnings;

$/ = "\n//\n";    # read one GenBank record at a time (each ends with a "//" line)
while (my $record = <>) {
    next unless $record =~ /^ACCESSION\s+(\S+)/m;    # skip chunks without an ACCESSION line
    my $accession = $1;                              # primary accession number
    my ($author)  = $record =~ /^\s*AUTHORS\s+([^,\n]+),/m;              # first author surname
    my ($genus, $species) = $record =~ /\/organism="([^"\s]+) +([^"\s]+)/; # genus, species
    my ($isolate) = $record =~ /\/isolate="([^"]*)"/;                    # isolate qualifier
    $_ //= '' for $author, $genus, $species, $isolate;    # empty value if a field is missing
    print "$accession\tgenus {$genus}\tspecies {$species}\taccession {$accession}"
        . "\tisolate {$isolate}\tauthor {$author}\n";
}
exit;
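Because the script uses the accession number as each line’s unique name, the leaf labels in the Newick tree have to be those same accession numbers. The sketch below lists any leaves that have no annotation line. The helper names are hypothetical, and the Newick handling is deliberately simple: it assumes unquoted labels with no spaces and no bracketed comments.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect leaf labels from a Newick string. A leaf label directly follows
# '(' or ','; internal-node labels follow ')' and are therefore skipped.
sub newick_leaves {
    my ($newick) = @_;
    my @leaves;
    while ($newick =~ /[(,]\s*([^\s(),:;\[\]]+)/g) {
        push @leaves, $1;
    }
    return @leaves;
}

# The unique name is the first whitespace-delimited field of each annotation line.
sub tlf_names {
    my @names;
    for my $line (@_) {
        push @names, $1 if $line =~ /^(\S+)/;
    }
    return @names;
}

# Return the leaf labels that have no matching annotation line.
sub unannotated {
    my ($newick, @tlf_lines) = @_;
    my %have;
    $have{$_} = 1 for tlf_names(@tlf_lines);
    return grep { !$have{$_} } newick_leaves($newick);
}

my $tree = '((AY123456:0.1,AY123457:0.2):0.05,AB000001:0.3);';
my @tlf  = ("AY123456\tgenus {Homo}", "AB000001\tgenus {Pan}");
print "missing: $_\n" for unannotated($tree, @tlf);    # prints "missing: AY123457"
```

Any label reported here will simply stay unannotated in Treedyn, so it is worth running a check like this before loading a large tree.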
In addition to tip names, Treedyn can annotate OTUs with graphical character data; there are some nice examples on the website.
Of course, I also have some grumbles about Treedyn. It doesn’t work properly on Macs and never has, though the PC version seems very stable. The interface is an absolute nightmare, one of the most illogical and confusing I have ever seen, but with a little patience you can learn to survive it. Despite all this, the underlying functions are well thought out and powerful, even if applying them is sometimes difficult.
The best thing about Treedyn in my opinion is that it is open source.