FastTree 3: Timing some runs

I downloaded some datasets from the SILVA96 database. These are structurally aligned SSU rDNA sequences. I browsed through the taxonomic groups and chose annelids (N=1050) and nematodes (N=5048) as smallish tests. I downloaded these as fasta files.

I started with the annelids file. It contains a LOT of gaps, because it is cut from an alignment of hundreds of thousands of sequences spanning all three domains of life.
I haven’t yet found a good way to remove all-gap columns from large files. Clustal and Mesquite can do it, but both are poor choices for very large alignments. There are some online tools, but my FASTA files are roughly 50-250 MB, so the web is not the place for this even if I could persuade a server to accept the uploads. I should really have used BioPerl’s SimpleAlign to remove the gap columns; it is probably the most flexible option and can cope with big files. However, I was temporarily having trouble installing BioPerl on my desktop (a future post) and ran out of time and patience.
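
Removing columns that are gaps in every sequence needs only two streaming passes, so file size is not really the obstacle. Below is a minimal Python sketch of that idea. It is not BioPerl's SimpleAlign and not what was run for this post. It assumes the file is a properly aligned FASTA (all sequences the same length) and that `-` and `.` are the only gap symbols; the script name `strip_gap_columns.py` is made up for illustration. Only one sequence is held in memory at a time, plus the list of surviving column indices.

```python
"""Drop alignment columns that contain only gap characters.

Two passes over the file: the first finds the columns in which at least
one sequence has a residue, the second rewrites every sequence keeping
only those columns. Only one sequence is held in memory at a time.
"""
import sys

# Assumption: '-' and '.' are the only gap symbols in the alignment.
GAP_CHARS = frozenset("-.")


def read_fasta(path):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line, []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def columns_to_keep(path):
    """First pass: return the indices of columns holding at least one residue."""
    width, undecided = None, None
    for _, seq in read_fasta(path):
        if width is None:
            width = len(seq)
            undecided = list(range(width))
        elif len(seq) != width:
            raise ValueError("sequences differ in length; is the file aligned?")
        # A column stays undecided only while every sequence so far has a gap there.
        undecided = [i for i in undecided if seq[i] in GAP_CHARS]
    if width is None:
        return []
    all_gaps = set(undecided)
    return [i for i in range(width) if i not in all_gaps]


def strip_gap_columns(path, out, line_width=60):
    """Second pass: write the alignment to `out` without all-gap columns."""
    keep = columns_to_keep(path)
    for header, seq in read_fasta(path):
        trimmed = "".join(seq[i] for i in keep)
        out.write(header + "\n")
        for start in range(0, len(trimmed), line_width):
            out.write(trimmed[start:start + line_width] + "\n")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.stderr.write("usage: strip_gap_columns.py aligned.fasta > out.fasta\n")
    else:
        strip_gap_columns(sys.argv[1], sys.stdout)
```

Usage would be along the lines of `python strip_gap_columns.py annelids.fasta > annelids.nogaps.fasta`. The first pass shrinks its list of candidate columns with every sequence it reads, so on a typical alignment most columns are settled after the first few sequences. Unlike Gblocks, this keeps every column that contains any residue, however poorly aligned.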

I ran it through Gblocks instead, which does more than just remove blank columns: it also trims regions of poor alignment, judged by various criteria. This reduced the file considerably.

I had previously installed FastTree, so I ran it with the command

fasttree -nt annelids.fasta >annelids.tree

It ran quite nicely and produced a viable tree.
There is something strange about the timings, though.

Topology done after 1242.20 sec -- computing support values
Unique: 3137/5048 Bad splits: 37/3134 Hill-climb: 259 Update-best: 11335 NNI: 4149
Top hits: close neighbors 2510/3137 refreshes 176
Time 1577.05 Distances per N*N: by-profile 0.220 (out 0.065) by-leaf 0.291
END:    2008-10-28 18:23:32
----------------------------------------------
Runtime:     5886         seconds
Runtime:     01:38:06     h:m:s
----------------------------------------------

The lines from “END:” onward are output by my Perl script; everything before that comes from FastTree. So FastTree claims to have taken 1577 seconds (about 26 minutes), but my script times it at 5886 seconds (1 hour 38 minutes). I noted the start time myself, and it really did take 1 hour 38 minutes. A repeat run gave identical results. A strange discrepancy.
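
The log alone does not say what kind of time FastTree's own figure measures, so this output cannot settle what causes the gap. One cheap check is to time the run in a way that records both wall-clock time (which includes waiting on disk, swap or other processes) and the CPU time the program actually used. Below is a minimal Python sketch of such a wrapper. It is a hypothetical stand-in, not the Perl script that printed the Runtime lines above: the script name `timed_run.py`, its argument order and its output format are made up for illustration, and it relies on the POSIX-only `resource` module.

```python
"""Run a command and report wall-clock time and the child's CPU time.

Hypothetical stand-in for a Perl timing wrapper (POSIX only, because of
the resource module). Wall-clock time includes time spent waiting on
disk, swap or other processes; CPU time counts only time the child
actually spent executing.
"""
import resource
import subprocess
import sys
import time


def hms(seconds):
    """Format a duration in seconds as HH:MM:SS, rounding to whole seconds."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return "%02d:%02d:%02d" % (hours, minutes, secs)


def timed_run(cmd, stdout=None):
    """Run cmd to completion; return (exit code, wall seconds, CPU seconds)."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    code = subprocess.call(cmd, stdout=stdout)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # User plus system CPU time used by children reaped during the call.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return code, wall, cpu


if __name__ == "__main__":
    if len(sys.argv) < 3:
        sys.stderr.write("usage: timed_run.py OUTFILE COMMAND [ARGS...]\n")
    else:
        # The command's stdout (e.g. a Newick tree) goes to OUTFILE;
        # its stderr (e.g. a progress log) is left on the terminal.
        with open(sys.argv[1], "w") as out:
            code, wall, cpu = timed_run(sys.argv[2:], stdout=out)
        print("Exit code:   %d" % code)
        print("Wall clock:  %d seconds (%s)" % (round(wall), hms(wall)))
        print("CPU (child): %d seconds (%s)" % (round(cpu), hms(cpu)))
```

A run would look like `python timed_run.py annelids.tree fasttree -nt annelids.fasta`. As a sanity check on the figures above, `hms(5886)` gives `01:38:06` and `hms(1577.05)` gives `00:26:17`. If the wall-clock figure came out far larger than the CPU figure, the process was spending most of its time waiting rather than computing; which of the two FastTree's "Time" line corresponds to would still need checking against its documentation or source.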

One Comment

  1. suicyte says:

    For removing alignment columns that are all gaps, I typically use BELVU from Erik Sonnhammer’s group. I am not sure if it can handle your giant files, though. BELVU is mainly an alignment viewer, but it can also be run in command-line mode and is reasonably fast. Good luck, Kay
