|Stochastic Systems Group|
Phylogenetic Tree Reconstruction with Insertions and Deletions
Phylogenetic trees represent evolutionary history, by showing the course of evolution from some extinct common ancestor to today's species. These trees can reveal connections between different species, teach us about extinct species, and help us understand the development of viruses, and other organisms with high mutation rate.
The most common way of reconstructing phylogenetic trees is based on sequencing DNA from different species, and using similarities between sequences to infer the tree. This requires some model of how DNA evolves between an ancestor species and its descendants. The simplest model assumes that there are i.i.d mutations, and that each mutation is a substitution, and under this model, there are provably good reconstruction algorithm. However, recent studies show that insertions and deletions mutations are a serious cause of reconstruction errors, and can not be ignored.
We present the first efficient algorithm for tree reconstruction when the mutations can be substitutions, insertions and deletions. The algorithm uses a clustering based approach, which builds small parts of the tree, and glues them together. In addition, the algorithm outputs a global alignment of the DNA sequences, which respects the evolutionary history.
Based on joint works with Alex Andoni, Mark Braverman, Costis Daskalakis and Sebasiten Roch
Problems with this site should be emailed to email@example.com