I just scanned through a very interesting article in BMC Bioinformatics discussing the results of a data mining approach to describing phylogenetic methodology in published articles.
Eales et al. Methodology capture: discriminating between the “best” and the rest of community practice. BMC Bioinformatics 2008, 9:359 doi:10.1186/1471-2105-9-359
They searched for “phylogen*” in titles and abstracts at PubMed, downloaded the PDFs and converted them to text. This was successful for 21,484 articles. They analysed those published after 1996 (about 90% of the full dataset), using data mining techniques to extract the protocol employed for phylogenetic reconstruction. The 17,732 protocols successfully extracted came from 723 journals.
“We found that 17% (3,712 articles) of articles were published in evolutionary biology journals, 22% (4,625 articles) were published in microbiology or bacteriology journals and 11% (2,274 articles) were from journals related to virology. The remaining 50% (10,873 articles) were published in a wide variety of fields.”
This is staggering, at least to me. Over 21 thousand articles have prominently talked about phylogenetics, about 90% of them in the last 12 years, and they are very broadly distributed, being found in over 700 different scientific journals. Remember these figures for the next time a colleague is dismissive of “tree building”! It is clear that phylogenetic approaches are truly pervasive in modern biology.
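As a quick sanity check on the figures in the quote above, the four category counts can be added up and compared with the 21,484 articles that were successfully converted. This is only a minimal sketch in Python; the variable names and labels are my own and nothing here is taken from the paper's methods.

```python
# Article counts as quoted from Eales et al.; the keys are illustrative labels.
counts = {
    "evolutionary biology": 3712,
    "microbiology or bacteriology": 4625,
    "virology": 2274,
    "other fields": 10873,
}

# Total number of articles across the four categories.
total = sum(counts.values())

# Share of each category, rounded independently to a whole percent.
shares = {field: round(100 * n / total) for field, n in counts.items()}

print(total)
for field, pct in shares.items():
    print(f"{field}: {pct}%")
```

The counts sum to 21,484, so the percentages in the quote appear to be shares of the full converted dataset rather than of the post-1996 subset. Rounded independently, the remainder comes out at 51% rather than the quoted 50%; I assume the authors simply made the figures add up to 100%.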
The authors found differences in the methods used between the three groups mentioned above: evolutionary biology, microbiology and virology. One difference is in the use of Bayesian analysis:
“Over 60% of evolutionary biology articles published in 2005 included one or more references to a term describing Bayesian phylogenetic analysis of some kind, this compares to 5% of microbiology and 11% of virology articles.”
This is not about Bayesian analysis per se. The authors highlight it as an example of evolutionary biology tending to take up new analysis techniques faster than microbiology or virology. Something that interested me was the comment:
“Almost all of the 10 protocols used most commonly by the phylogenetics community represent a valid choice (except those using UPGMA [see 31, 32]) for a researcher new to the field.”
Obviously “valid choice” could be seen as a little subjective, although it is almost certainly correct: the methods may sometimes be old-fashioned, but they have been peer reviewed and are OK. What attracted my attention, though, was UPGMA. Are there really people still using UPGMA? Then I started to think about all the genetics and bioinformatics textbooks I’ve seen that have a section on phylogenetic analysis. UPGMA is almost ubiquitous, and little analysis is discussed beyond the venerable PHYLIP.
An interesting paper.
References from the quote above concerning UPGMA:
31. Leitner T, Escanilla D, Franzen C, Uhlen M, Albert J: Accurate reconstruction of a known HIV-1 transmission history by phylogenetic tree analysis. Proc Natl Acad Sci USA 1996.
32. Huelsenbeck JP: Performance of Phylogenetic Methods in Simulation. Systematic Biology 1995, 44(1):17-48.