I’ve recently come across the idea of stars for open data quality thanks to Steve Moss. The table below is from 5stardata.info:
★ | make your stuff available on the Web (whatever format) under an open license |
★★ | make it available as structured data (e.g., Excel instead of image scan of a table) |
★★★ | use non-proprietary formats (e.g., CSV instead of Excel) |
★★★★ | use URIs to denote things, so that people can point at your stuff |
★★★★★ | link your data to other data to provide context |
How does this relate to phylogenetic data? Here is my suggestion for a star system for phylogenetic data:
★ | publish a picture of your tree in a journal article |
★★ | make the sequence alignment, tree and metadata available as supplementary data with the paper |
★★★ | as 2 stars, but save as XML (e.g., NeXML, PhyloXML) in the supplementary data with the paper |
★★★★ | as 3 stars, but place an open access NeXML file on Figshare or Dryad with URIs |
★★★★★ | as 4 stars, and link your data to other data to provide context |
Anyone want to suggest changes to this star system?
Thoughts:
1 star: Surely we are past the point where people do not archive their Newick tree file? Or am I being too optimistic?
2 star: This seems to be the current standard. Metadata often means a Word doc table or Excel spreadsheet. Unfortunately, these complex and fragile proprietary file formats create a barrier to machine reading of the data. A simple CSV file would be much better, and you could still open it in Excel if you really insist. Surely open access publication is a prerequisite for 2 stars?
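To show how little is needed to make metadata machine readable, here is a minimal sketch using only Python's standard `csv` module. The column names and sample values are my own illustrative assumptions, not a recommended schema.

```python
import csv
import io

# Hypothetical sample metadata; column names and values are illustrative only.
rows = [
    {"sample_id": "S1", "taxon": "Homo sapiens", "ncbi_taxon_id": "9606", "gene": "COX1"},
    {"sample_id": "S2", "taxon": "Mus musculus", "ncbi_taxon_id": "10090", "gene": "COX1"},
]


def write_metadata_csv(rows, handle):
    """Write a list of dicts as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


def read_metadata_csv(handle):
    """Read the CSV back into a list of dicts, one per sample."""
    return list(csv.DictReader(handle))


# Round trip through an in-memory buffer instead of a file on disk.
buffer = io.StringIO(newline="")
write_metadata_csv(rows, buffer)
buffer.seek(0)
round_tripped = read_metadata_csv(buffer)
```

Any script, in any language, can read the result back without Excel or Word installed, which is the whole point of the third star.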
3 star: Want to increase your star rating? This would be an easy step to take for most people. Many good programs now support rich standard formats such as NeXML and PhyloXML, and we should hassle the authors of software that does not.
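As a rough illustration of what moving from Newick to XML involves, the sketch below converts a simple Newick string into a bare-bones PhyloXML document using only the Python standard library. The function names are hypothetical, the parser handles only names and branch lengths (no quoted labels or comments), and marking the tree as rooted is an assumption. In practice you would use an established library (Biopython's `Bio.Phylo`, for example, reads and writes both formats) rather than a hand-rolled parser.

```python
import xml.etree.ElementTree as ET

PHYLOXML_NS = "http://www.phyloxml.org"


def parse_newick(text):
    """Parse a simple Newick string into nested dicts.

    Sketch only: supports names and branch lengths, not quoted
    labels or bracketed comments.
    """
    text = text.strip().rstrip(";")
    pos = 0

    def parse_clade():
        nonlocal pos
        node = {"name": None, "length": None, "children": []}
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            node["children"].append(parse_clade())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                node["children"].append(parse_clade())
            pos += 1  # consume ")"
        # Optional label, up to the next structural character.
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        node["name"] = text[start:pos].strip() or None
        # Optional branch length after ":".
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            node["length"] = float(text[start:pos])
        return node

    return parse_clade()


def clade_to_xml(node, parent):
    """Append a <clade> element for node and its descendants to parent."""
    clade = ET.SubElement(parent, "clade")
    if node["name"]:
        ET.SubElement(clade, "name").text = node["name"]
    if node["length"] is not None:
        ET.SubElement(clade, "branch_length").text = str(node["length"])
    for child in node["children"]:
        clade_to_xml(child, clade)
    return clade


def newick_to_phyloxml(newick):
    """Return a minimal PhyloXML string for a simple Newick tree."""
    root = ET.Element("phyloxml", {"xmlns": PHYLOXML_NS})
    # Assumption: the tree is treated as rooted.
    phylogeny = ET.SubElement(root, "phylogeny", {"rooted": "true"})
    clade_to_xml(parse_newick(newick), phylogeny)
    return ET.tostring(root, encoding="unicode")


xml_text = newick_to_phyloxml("(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);")
print(xml_text)
```

The XML version says the same thing as the Newick string, but every value now sits in a named element, so extra annotation such as taxon identifiers has an obvious place to go.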
4 star: Again, this is an easy win. Make sure your data is open access, machine findable and machine readable. Figshare is ridiculously powerful and easy to work with. Your files (all of them) can be bulk uploaded. You will get a repository DOI link to quote in your manuscript and share with people. Individual files have DOI links too.
5 star: This is more difficult to do well. Some of this may already be achieved by using XML files, but how much? I have a lot to learn here about linked data. Files that use NCBI taxon IDs and official gene names allow automatic link-outs to be created, and XML files can carry exactly this information. But how well does this work? The potential of linked data is also bigger than this, so I have more reading to do. I like Tim Berners-Lee’s bag of chips (crisps!) analogy.
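The simplest form of link-out can be sketched in a few lines: given a numeric NCBI taxon ID for each tip of a tree, build a link to the matching NCBI Taxonomy Browser page. The tip labels and the `taxon_uri` helper are hypothetical, and the URL pattern is the one I understand the Taxonomy Browser to use, so treat this as an illustration rather than a linked data implementation.

```python
# Assumed URL pattern for the NCBI Taxonomy Browser.
NCBI_TAXONOMY_URL = "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id={taxon_id}"


def taxon_uri(taxon_id):
    """Return a Taxonomy Browser link for a numeric NCBI taxon ID."""
    taxon_id = int(taxon_id)  # raises ValueError for non-numeric input
    if taxon_id <= 0:
        raise ValueError("NCBI taxon IDs are positive integers")
    return NCBI_TAXONOMY_URL.format(taxon_id=taxon_id)


# Hypothetical tree tips annotated with NCBI taxon IDs.
tips = {"Homo sapiens": 9606, "Mus musculus": 10090}
links = {name: taxon_uri(tid) for name, tid in tips.items()}

for name, link in links.items():
    print(name, link)
```

This only works because the identifier is unambiguous. A free-text species name with a typo or an outdated synonym could not be linked this reliably.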
The idea of data stars originated with Tim Berners-Lee, and there are nice descriptions of the system on 5stardata.info and a YouTube “bag of chips” video of TBL explaining many of the ideas.
Edit: I liked the idea of starting the list at zero, no stars, because (A) that’s how computer languages count and (B) you don’t deserve any stars at all for just putting a picture of your data in a publication. But it seemed too petty.
Reading:
- Cranston K, Harmon LJ, O’Leary MA, Lisle C: Best practices for data sharing in phylogenetic research. PLoS Curr 2014, 6.
- Cranston K, Blackburn D, Brown J, Dececchi A, Gardner N, Greshake B, Harmon L, Holder M, Holroyd P, Irmis R, Jansma R, Lloyd G, Mabee P, Miller M, Mounce R, Mungall C, O’Leary M, Pardo J, Parr C, Piel WH, Stoltzfus A, Turner W, Vision T, Wright A, Watanabe A, Wolfe J: Simple rules for sharing phylogenetic data. figshare 2014.
- Sharing data with Open Tree of Life [http://blog.opentreeoflife.org/data-sharing/]
- Stoltzfus A, O’Meara B, Whitacre J, Mounce R, Gillespie EL, Kumar S, Rosauer DF, Vos RA: Sharing and re-use of phylogenetic trees (and associated data) to facilitate synthesis. BMC Res Notes 2012, 5:574.
- Han MV, Zmasek CM: phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics 2009, 10:356.
- Vos RA, Balhoff JP, Caravas JA, Holder MT, Lapp H, Maddison WP, Midford PE, Priyam A, Sukumaran J, Xia X, Stoltzfus A: NeXML: rich, extensible, and verifiable representation of comparative data and metadata. Syst Biol 2012, 61:675–689.
Enjoying this series of posts – thanks! I note that we are most definitely *not* past the 1 star phase. We (Open Tree hat here) had about a 16% success rate getting tree files, and this included TreeBASE + Dryad + suppl material + asking authors directly. Similar stats in the Stoltzfus et al paper.
Thanks Karen, glad you’re enjoying. Maybe I was too optimistic. Is it too much friction in the process or insurmountable human laziness do you think? I’m going to be an optimist again and say the former.