Monolithic pipelines are common in bioinformatics and particularly for metabarcoding. My view is that the word pipeline, and the type of software it refers to, may be holding us back and should be rethought. What is a pipeline? Pipelines are connected sets of programs, where information flows through the linked analysis algorithms as water flows…
Category: Research
metabarcoding numts
My PhD was hard. The grasshopper I was investigating (Chorthippus parallelus) had a huge number of mtDNA insertions into the nuclear chromosomal DNA (numt, pronounced ‘new-might’). PCR tended to amplify multiple templates. It was hard to get genuine mtDNA sequences. numts are known from almost all animals, though in most species they don’t get in…
Modest reproducibility success: a reanalysis of two early branching Metazoa datasets using ReproPhylo
This is a guest post by Amir Szitenberg, a postdoc in my lab @EvoHull, describing a phylogenomic investigation using ReproPhylo. Amir used to be a sponge researcher if you can’t tell from the tone below. Despite already knowing ReproPhylo could do all this rapidly and of course reproducibly, I was still both surprised and impressed by the scale and speed…
Frictionless reproducibility in phylogenomic experiments
This is a guest post by Amir Szitenberg, postdoctoral researcher in my group at the University of Hull, and main author of @ReproPhylo. I find the ReproPhylo approach to experimental phylogenomics very exciting, and can see how it would lead to better, in-depth understanding of phylogenomic datasets, regardless of their size. An example for this…
ReproPhylo: a reproducible phylogenomics environment
Our new phylogenomics environment is called ReproPhylo. It makes experimental reproducibility frictionless, occurring quietly in the background while you work on the science. The environment has a lot of tools to allow exploration of phylogenomics data and to create phylogenomic analysis pipelines. It is distributed in a Docker container, simplifying installation and allowing the reproducibility…
5 star open phylogenetic data
I’ve recently come across the idea of stars for open data quality thanks to Steve Moss. The table below is from 5stardata: ★ make your stuff available on the Web (whatever format) under an open license ★★ make it available as structured data (e.g., Excel instead of image scan of a table) ★★★ use non-proprietary…
Reproducible phylogenetics part 3; how
tl;dr Phylogenetic experiments need explicitly designed reproducibility, rather than accidental or partial reproducibility. There are many working reproducibility solutions out there differing in their approach, interface and functions. There is no perfect solution for all cases, and you can learn a lot by investigating. Here I discuss a few software approaches to reproducible phylogenetics….
Reproducible phylogenetics part 2b; what
Previously I wrote about (1) why we need reproducibility in phylogenetics, (2) what we need to achieve it. This is part 2b, still writing about what we need to achieve reproducibility. My conclusion before was: “that most of the issues surrounding reproducible phylogenetics are solved problems in other disciplines. The things that are still challenging…
Reproducible phylogenetics part 2a; what
In part 1 of this series I wrote about why we need reproducible phylogenetics; here I write about what we actually need to do. tl;dr We need only a few classes of things (open reusable archiving of all data, information provenance, recording of data treatments & software environments) to make our work reproducible. Many of the…
Reproducible phylogenetics part 1; why
We are still largely missing the benefits of reproducibility in phylogenetics. I think that this makes our lives unnecessarily difficult and makes us particularly poorly prepared to confront modern data-rich phylogenetics. In this first post “Why” I want to talk about why we need reproducible phylogenetics. Then, in part two, “What”, I’m going to talk…