BSA Past-President's Symposium: Computational Issues and Solutions for the Study of Plant Phylogeny
Cranston, Karen A., Sanderson, Michael, Wheeler, Travis J.
Automating phylogenetic tree inference from public sequence databases.
The development of increasingly accurate and efficient computational algorithms for assembling phylogenies leads us to ask when and how manual intervention with expert knowledge should be incorporated into our procedures. The PhyLoTA browser, a searchable database of sequence clusters for molecular phylogenetics, provides an example of a fully automated phyloinformatics pipeline. Starting from all non-genomic GenBank nucleotide sequences, it produces unaligned clusters, multiple sequence alignments and inferred phylogenies at various depths across the Tree of Life. Many of the greatest design challenges lie upstream of alignment and phylogenetic inference, in the assembly of sequence clusters as input for alignment software. The structure of the input data, in particular the huge diversity of sequence lengths and types, requires careful optimization of the clustering protocol for synthesizing phylogenetically useful sets of sequences. Clusters should be as inclusive as possible while remaining easy to align and to analyze phylogenetically. The size heterogeneity between clusters requires that the overall pipeline be efficient both for a huge number of small clusters and for a small number of very large clusters. We discuss the effects of these choices on the structure of the sequence clusters and on the subsequent results of alignment and phylogenetic inference. The advantages of such a fully automated pipeline lie in efficiency and reproducibility, but the strategy may require some sacrifice in accuracy. In light of the future availability of computational resources, algorithmic improvements and molecular sequence data, we explore the idea that full automation should be an end goal in reconstructing the Tree of Life.
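As an illustration of the cluster-assembly step discussed above, the following is a minimal sketch of single-linkage clustering with a union-find structure. It is not the PhyLoTA protocol, which the abstract does not specify. The function name `cluster_sequences`, the sequence identifiers, and the assumption that pairwise similarity hits have already been computed (for example by an all-against-all similarity search) are all hypothetical.

```python
from collections import defaultdict


def cluster_sequences(seq_ids, similar_pairs):
    """Group sequence IDs into single-linkage clusters.

    seq_ids: iterable of sequence identifiers (hypothetical names).
    similar_pairs: iterable of (id_a, id_b) pairs assumed to have passed
        some similarity threshold; how they are computed is outside
        the scope of this sketch.
    """
    seq_ids = list(seq_ids)
    parent = {s: s for s in seq_ids}

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the clusters of every pair of similar sequences.
    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect members under their cluster root.
    groups = defaultdict(list)
    for s in seq_ids:
        groups[find(s)].append(s)

    # Largest clusters first; ties broken by member names.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


clusters = cluster_sequences(
    ["s1", "s2", "s3", "s4", "s5"],
    [("s1", "s2"), ("s2", "s3")],
)
```

Single linkage makes clusters as inclusive as possible, since one hit is enough to join two groups, but it can also chain together sequences that are hard to align jointly. A real pipeline would likely add constraints such as minimum overlap or length compatibility.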
1 - University of Arizona / Field Museum of Natural History, Ecology and Evolutionary Biology, 1041 E. Lowell St., Tucson, AZ, 85721, USA
2 - University of Arizona, Ecology and Evolutionary Biology, Tucson, AZ, 85721, USA
3 - University of Arizona / Field Museum of Natural History, Computer Science, 1041 E. Lowell St., Tucson, AZ, 85721, USA
Presentation Type: Symposium or Colloquium Presentation
Location: Ballroom 2/Cliff Lodge - Level B
Date: Wednesday, July 29th, 2009
Time: 10:15 AM