Last night, from Lisbon, I attended a talk given by Ward Wheeler at the AMNH in New York City and moderated by Frederick Matsen from his home institution in Berkeley, California. The talk was the second in a series of talks on phylogenetics held via videoconferencing.
The idea behind phyloseminar.org is to hold regular live online seminars on phylogenetic methodology, open to anyone around the globe. This is a challenge given the time zone differences among the possible participants, but it does make the whole event fun: I watched it after dinner at 9:00pm; the presenter gave it at his 4:00pm; while the moderator was there after lunch at his 1:00pm. I saw at least one person in the audience who watched it from the future, after breakfast in New Zealand the next day at 10:00am.
We used the software EVO, a free tool specifically designed for scientific communication (unlike the Internets that was designed for… never mind). Prior to a seminar, you need to install it and create a user account so you can then join the phyloseminar channel. It works really well. You see a window with the slideshow and a window with a video stream for each participant (to keep things simpler, only the presenter and the moderator had video enabled last night).
For Wheeler’s talk there were twelve of us, and judging by the user accounts (where you can set your location), people were listening from at least California, Kansas, New York, Lisbon and New Zealand. The talk lasted 45 minutes and was followed by another 15 minutes of discussion. We could type questions into the software's chat tool, and the moderator then read them aloud (again, to keep things simpler than having each person talk).
Wheeler’s talk, "Dynamic homology and phylogenetic systematics", was about alignment, or rather about methods that avoid having to perform an alignment for phylogenetic inference altogether, something he has been championing for many years now. The idea behind these methods, called direct optimization methods, is easy to understand: when you are comparing DNA sequences in order to reconstruct how species (or genes) are related to each other, you need to match them together to determine which positions along one sequence correspond to which positions in another, a process called sequence alignment. Only then can you assess whether different species have the same or a different base at each position: the raw evidence for evolutionary relatedness. But it happens that, because those sequences are the result of a process of mostly branching evolution (where one species splits to give rise to two descendant ones), the proper format for comparison between multiple sequences is not a matrix of rows and columns but a phylogenetic tree. The problem is that we don’t know the shape of this tree, because that is what we seek to reconstruct in the first place.
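To make "matching positions" concrete, here is a minimal sketch of pairwise alignment by dynamic programming (the textbook Needleman-Wunsch scheme). It is only an illustration of what sequence alignment does for two sequences, not the method Wheeler presented; the function name `align`, the unit costs for mismatches and gaps, and the two toy sequences are all my own assumptions.

```python
def align(a, b, mismatch=1, gap=1):
    """Globally align sequences a and b.

    Returns (cost, aligned_a, aligned_b), where '-' marks a gap.
    The unit mismatch and gap costs are arbitrary illustrative choices.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimum cost of aligning a[:i] with b[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = cost[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch)
            cost[i][j] = min(substitution, cost[i - 1][j] + gap, cost[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (
            0 if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return cost[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy example (made-up sequences): one base is missing from the second one.
cost, row_a, row_b = align("ACGT", "AGT")
print(cost)
print(row_a)
print(row_b)
```

The two returned strings have the same length, so they can be stacked as rows of a matrix and compared column by column. With more than two sequences, the order in which they should be matched depends on how they are related, which is exactly the circularity described above.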
The most common way to address this problem is to perform alignments using tree shapes that we know are good approximations. Once we find a satisfactory match between our sequences, we proceed with the phylogenetic reconstruction proper, searching for the tree(s) that optimize our chosen criterion (e.g., parsimony, likelihood). But one caveat of the procedure I just caricatured is that by running the analysis in two steps (alignment and tree search), you impose a restriction on the number of possible combinations you will evaluate. Direct optimization lifts this restriction by performing the sequence matching and tree evaluation in just one step, with the potential result that you may find more optimal solutions. In other words, direct optimization methods are able to perform a more thorough exploration of the space of possible solutions.
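As a rough illustration of the second step of the two-step procedure (tree search on an alignment that is already fixed), here is a small sketch that scores trees by parsimony using Fitch's counting algorithm and picks the best of the three possible groupings of four taxa. This is not direct optimization and not Wheeler's software; the taxon names, the toy alignment and the function names are hypothetical, chosen only for the example.

```python
def fitch_column(tree, column):
    """Fitch's algorithm for one alignment column.

    tree   : a leaf name (str) or a pair (left_subtree, right_subtree)
    column : dict mapping leaf name -> character at this position
    Returns (possible_states_at_this_node, number_of_changes_below_it).
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_states, left_changes = fitch_column(tree[0], column)
    right_states, right_changes = fitch_column(tree[1], column)
    shared = left_states & right_states
    if shared:
        return shared, left_changes + right_changes
    # No state in common: one change is needed on a branch below this node.
    return left_states | right_states, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Total number of changes the tree requires over all columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_column(tree, {name: seq[k] for name, seq in alignment.items()})[1]
        for k in range(length)
    )


# Made-up, already aligned sequences for four hypothetical taxa.
toy_alignment = {"A": "AAGT", "B": "AAGC", "C": "CCGT", "D": "CCGC"}

# The three ways of pairing up four taxa.
candidate_trees = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]

scores = {tree: parsimony_score(tree, toy_alignment) for tree in candidate_trees}
best_tree = min(scores, key=scores.get)
print(scores)
print(best_tree)
```

Note that the alignment never changes while the trees are being compared: every candidate tree is judged on the same columns. In direct optimization, as I understood it from the talk, the matching between sequences would instead be worked out for each candidate tree, which is why many more combinations get evaluated.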
Now, while the method is easy enough to describe in a post, its mathematical and computational implementation is not simple at all. The number of operations needed to evaluate just a single tree shape increases exponentially in comparison with vanilla tree searches, and you will be better off running these calculations on a computer cluster.
The aspect that caused the most unease during the talk was Wheeler’s explanation of the difference between truth and optimality inherent in all these methods (direct optimization or not). Apparently, when you simulate sequence data in order to run it through different programs and evaluate how well each alignment method does, they all invariably find solutions that are more optimal (more parsimonious, more likely or more probable) than the simulated one. That is, most of the time the optimal solution is different from the true one. The consequence is that, since in phylogenetic reconstruction we will never know the true evolutionary history for certain, we are forced to abandon the search for the true solution and will have to content ourselves with finding the optimal one.
If this talk was representative of the series, I'd say the seminars are not for a general audience: you need a very good grasp of phylogenetic theory. Then again, if you know Ward Wheeler you know that his brain runs as fast as his supercomputers. A good thing is that the seminars are being recorded and can be revisited anytime. You can watch the first one by Marc A. Suchard here and the one by Ward Wheeler there soon. The next seminar will address the problem of reconciling gene trees with species trees, and the three seminars after that will be decided by popular vote.
The whole experience was a first for me, and it was real fun.