ReproPhylo: a reproducible phylogenomics environment

Our new phylogenomics environment is called ReproPhylo. It makes experimental reproducibility frictionless: it happens quietly in the background while you work on the science. The environment includes many tools for exploring phylogenomics data and building phylogenomic analysis pipelines. It is distributed in a Docker container, simplifying installation and allowing the reproducibility…

5 star open phylogenetic data

I’ve recently come across the idea of stars for open data quality thanks to Steve Moss. The table below is from 5stardata:
★ make your stuff available on the Web (whatever format) under an open license
★★ make it available as structured data (e.g., Excel instead of image scan of a table)
★★★ use non-proprietary…

Reproducible phylogenetics part 3; how

tl;dr Phylogenetic experiments need explicitly designed reproducibility, rather than accidental or partial reproducibility. There are many working reproducibility solutions out there differing in their approach, interface and functions. There is no perfect solution for all cases, and you can learn a lot by investigating. Here I discuss a few software approaches to reproducible phylogenetics. I’m…

Reproducible phylogenetics part 2b; what

Previously I wrote about (1) why we need reproducibility in phylogenetics, (2) what we need to achieve it. This is part 2b, still writing about what we need to achieve reproducibility. My conclusion before was: “that most of the issues surrounding reproducible phylogenetics are solved problems in other disciplines. The things that are still challenging…

Reproducible phylogenetics part 2a; what

In part 1 of this series I wrote about why we need reproducible phylogenetics; here I write about what we actually need to do. tl;dr We need only a few classes of things (open reusable archiving of all data, information provenance, recording of data treatments & software environments) to make our work reproducible. Many of the…

Reproducible phylogenetics part 1; why

We are still largely missing the benefits of reproducibility in phylogenetics. I think that this makes our lives unnecessarily difficult and makes us particularly poorly prepared to confront modern data-rich phylogenetics. In this first post, “Why”, I want to talk about why we need reproducible phylogenetics. Then, in part two, “What”, I’m going to talk…

Reproducible research in phylogenetics

I’ve been reading a lot recently about reproducible research (RR) in bioinformatics on several blogs, Google+ and Twitter. The idea is that someone should easily be able to reproduce* your results (and even figures) from your publication using your provided code and data. I’ve been thinking that this is a movement…

The monophyly of plants and insects

I just received promotional information about a new book from Garland Science publishers: “Genome Duplication; concepts, mechanisms, evolution and disease” by Melvin L DePamphilis and Stephen D Bell. Garland Science Oct 2010 ISBN: 978-0-415-44206. It sounds like a great title, especially for someone like me who thinks genome and gene duplication are among the most…

FastTree 3: Timing some runs

I downloaded some datasets from the SILVA96 database. These are structurally aligned SSU rDNA sequences. I browsed through the taxonomic groups and chose annelids (N=1050) and nematodes (N=5048) as smallish tests. I downloaded these as fasta files. I started with the annelids file. The file contains a LOT of gaps, because it comes from an…

FastTree 2.5: Update

The prediction I made before about a long silence once this year’s students turned up was sadly accurate. Anyway, students dealt with, grant proposal submitted, lectures (mostly) given, bureaucracy reduced (a bit), time to get on with some phylogenetics. I was previously playing with FastTree. Although it looks to have been quite well tested by…