Frictionless reproducibility in phylogenomic experiments

This is a guest post by Amir Szitenberg, postdoctoral researcher in my group at the University of Hull, and main author of @ReproPhylo. I find the ReproPhylo approach to experimental phylogenomics very exciting, and can see how it would lead to a better, in-depth understanding of phylogenomic datasets, regardless of their size. An example for this…

ReproPhylo: a reproducible phylogenomics environment

Our new phylogenomics environment is called ReproPhylo. It makes experimental reproducibility frictionless, occurring quietly in the background while you work on the science. The environment includes many tools for exploring phylogenomics data and building phylogenomic analysis pipelines. It is distributed in a Docker container, simplifying installation and allowing the reproducibility…

5 star open phylogenetic data

I’ve recently come across the idea of stars for open data quality thanks to Steve Moss. The table below is from 5stardata:
★ make your stuff available on the Web (whatever format) under an open license
★★ make it available as structured data (e.g., Excel instead of image scan of a table)
★★★ use non-proprietary…

Reproducible phylogenetics part 3; how

tl;dr Phylogenetic experiments need explicitly designed reproducibility, rather than accidental or partial reproducibility. There are many working reproducibility solutions out there differing in their approach, interface and functions. There is no perfect solution for all cases, and you can learn a lot by investigating. Here I discuss a few software approaches to reproducible phylogenetics. I’m…

Reproducible phylogenetics part 2b; what

Previously I wrote about (1) why we need reproducibility in phylogenetics, and (2) what we need to achieve it. This is part 2b, still writing about what we need to achieve reproducibility. My conclusion before was: “that most of the issues surrounding reproducible phylogenetics are solved problems in other disciplines. The things that are still challenging…

Reproducible phylogenetics part 2a; what

In part 1 of this series I wrote about why we need reproducible phylogenetics; here I write about what we actually need to do. tl;dr We need only a few classes of things (open reusable archiving of all data, information provenance, recording of data treatments & software environments) to make our work reproducible. Many of the…

Reproducible phylogenetics part 1; why

We are still largely missing the benefits of reproducibility in phylogenetics. I think that this makes our lives unnecessarily difficult and makes us particularly poorly prepared to confront modern data-rich phylogenetics. In this first post “Why” I want to talk about why we need reproducible phylogenetics. Then, in part two, “What”, I’m going to talk…

Pride before a (data) fall

I’m pretty proud of some parts of my workflow: electronic lab notebook, reproducibility, open data files, (semi-obsessive) automated data backups etc etc. But pride often comes before a fall. I had a bad experience this week where I thought I had lost some important phylogenetic data files (I found them eventually), and I’m writing this…