Phylogenetic Systematics among the Sciences

Any phylogeny can be recovered: The history of phylogenetic systematics is one of increasing scope of application. Willi Hennig first envisioned its use as a means of reconstructing phylogeny for creatures with no significant fossil record, such as the insects he studied. A nice thing about the parsimony method of phylogeny reconstruction, however, is that it is robust to missing data. This makes the inclusion of fossil taxa simple and, as we have seen, they can be instrumental in resolving long branch attraction problems.

More recently, the method has been applied to other situations requiring the reconstruction of branching historic patterns, including:

Cladograms are not ends in themselves: Cladists often behave as if phylogeny reconstruction were a self-evident end in itself. In fact, its greatest value may come when it is applied to allied fields of science.

Reconstructing missing data using the Extant Phylogenetic Bracket: Phylogeny reconstruction programs like T.N.T. also reconstruct optimal character states at reconstructed tree nodes. From these reconstructions, hypothetical states for taxa with missing data can be inferred as well. In 1995, Larry Witmer coined useful vocabulary for describing how unknown character states in fossil taxa are reconstructed with reference to extant taxa: the extant phylogenetic bracket (EPB). This is the bracket formed on either side of a fossil taxon by living taxa that provide the missing information on a character state. Using it, we can make three types of inference, listed in order of decreasing confidence. Consider the distribution of a soft-tissue character, the four-chambered heart, among three fossil reptiles:

Phylogenies Meet the Rock Record

Phylogenies and Biostratigraphy: Traditional biostratigraphers (geologists who use the fossil record to date sedimentary rock units) tend to be literal-minded souls who think that fossils of an organism or group either are known to be present at a particular time or are not. Hypotheses of phylogeny tell us when we ought to be cautious about such literal interpretations.

Telling the Truth with Statistics

Correlation studies: Biologists are fond of performing statistical studies, looking for meaningful correlations between measurable aspects of different parts of animals' anatomies. For instance, one might compare the ratio of length to depth of birds' beaks with the length of their tarsals to see if larger birds have proportionately deeper beaks. Measurements of this sort can be subjected to the full range of statistical analyses. A fundamental assumption of such studies is that all observations are made on independent members of the same underlying population.

However, when samples are being taken from groups of populations nested in a hierarchical phylogeny, this assumption is violated. The example at right shows how misleading this can be. Imagine a simple phylogeny with two sister clades, one black, the other red. When we make a bivariate plot of data taken from them, there appears to be a strong statistical relationship. But in fact, when we compare values only within each clade, we see very low correlation.
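The two-clade effect can be reproduced with a small simulation. The sketch below is only an illustration under stated assumptions: the clade centers, sample sizes, and noise levels are invented, and the names simulate_clade and pearson are hypothetical helpers, not part of any published analysis. Within each clade the two traits are generated independently, so any strong pooled correlation comes purely from the difference between clades.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_clade(center_x, center_y, n, rng):
    """Draw n taxa whose two traits vary independently around a clade center."""
    xs = [center_x + rng.gauss(0, 1) for _ in range(n)]
    ys = [center_y + rng.gauss(0, 1) for _ in range(n)]
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical clade centers: the "red" clade is larger in both traits.
black_x, black_y = simulate_clade(0.0, 0.0, 50, rng)
red_x, red_y = simulate_clade(10.0, 10.0, 50, rng)

r_black = pearson(black_x, black_y)
r_red = pearson(red_x, red_y)
r_pooled = pearson(black_x + red_x, black_y + red_y)

print("within black clade:", round(r_black, 2))
print("within red clade:  ", round(r_red, 2))
print("pooled:            ", round(r_pooled, 2))
```

The pooled coefficient should come out close to 1 while the within-clade coefficients stay near 0, even though no trait relationship was built into either clade.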

Fortunately, Joe Felsenstein of the University of Washington emerged from the Felsenstein zone in 1985 to develop a statistical technique for adjusting data to account for the phylogeny of the taxa sampled, called phylogenetically independent contrasts (PIC). This method is based on the concept that although the taxa may be non-independent, the differences between measured values in them are independent. Statistical correlation techniques are, therefore, applied to pairwise contrasts in measurements from sister taxa. By applying it, meaningful correlation studies can be performed. Without it, they would be meaningless or misleading. Applying this method absolutely requires a known phylogeny. If hypotheses of phylogeny change, the results of correlation studies based on them must be revised, too.
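A minimal sketch of how contrasts can be computed on a small tree follows. It assumes a fully resolved tree with known branch lengths, and it uses the standard recursion in which each contrast is the difference between sister values divided by the square root of their summed branch lengths. The four taxa, their trait values, the unit branch lengths, and the function name contrasts are all hypothetical.

```python
def contrasts(node, trait):
    """Return (estimated value, adjusted branch length, contrasts) for a node.

    A node is (label, branch_length). For a tip, label is a taxon name.
    For an internal node, label is a (left_child, right_child) pair.
    """
    label, length = node
    if isinstance(label, str):  # tip: use the measured value directly
        return trait[label], length, []
    left, right = label
    x_l, v_l, c_l = contrasts(left, trait)
    x_r, v_r, c_r = contrasts(right, trait)
    # Standardized contrast between the two sister lineages.
    contrast = (x_l - x_r) / (v_l + v_r) ** 0.5
    # Ancestral estimate: average weighted by inverse branch length.
    value = (x_l / v_l + x_r / v_r) / (1.0 / v_l + 1.0 / v_r)
    # Lengthen this node's branch to reflect uncertainty in the estimate.
    adjusted = length + v_l * v_r / (v_l + v_r)
    return value, adjusted, c_l + c_r + [contrast]


# Hypothetical tree ((A,B),(C,D)) with every branch of length 1.
tree = (
    (
        ((("A", 1.0), ("B", 1.0)), 1.0),
        ((("C", 1.0), ("D", 1.0)), 1.0),
    ),
    0.0,
)

# Hypothetical measurements for two traits.
beak = {"A": 1.0, "B": 2.0, "C": 11.0, "D": 12.0}
tarsus = {"A": 2.0, "B": 1.0, "C": 12.0, "D": 11.0}

_, _, beak_contrasts = contrasts(tree, beak)
_, _, tarsus_contrasts = contrasts(tree, tarsus)

for b, t in zip(beak_contrasts, tarsus_contrasts):
    print(round(b, 3), round(t, 3))
```

Four tips yield three contrasts. In this made-up data set the two within-clade contrasts have opposite signs for the two traits, and the whole apparent trend across the tips collapses into the single contrast at the root.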

The Dance of Molecules and Morphologies

Molecular phylogenies: We can often identify homologous positions on the DNA strands of different creatures. The identity of the nucleotide at a given position (A, G, C, or T) is as good a phylogenetic character as any, as long as it is taken from a part of the genome that does not code for a functional protein. Molecular analyses offer:

Indeed, molecular analyses are vulnerable to long-branch attraction in a less obvious way: Different portions of the genome evolve at different rates. Failure to match the rate of evolution of the portion of the genome to the length of the branches being examined can result in significant long branch attraction problems. During the early days of molecular systematics, this led to some interesting conclusions.

For example, Graur, Hide, and Li, 1991 confidently reported the (morphologically) improbable result that guinea-pigs are more closely related to primates than to rats, based on an analysis of the mitochondrial genome. Subsequent analyses did not vindicate this. The mitochondrial genome evolves very rapidly compared to the nuclear genome. Arguably, the rate of evolution was so fast that Graur et al. were, in effect, analyzing noise rather than signal. In the early days, however, the mitochondrial genome was much easier to sequence.

Molecular clocks: A basic tenet of molecular systematics is that nucleotide substitutions (mutations) in non-coding regions of DNA occur at random. Thus, if we can establish the relative number of substitutions in homologous regions of the genome in different pairs of critters, we have a good approximation of the relative amount of time that has passed since their last common ancestors split.

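If we assume that substitutions occur at roughly the same rate in all lineages, then a pair of taxa separated by twice as many substitutions should have diverged twice as long ago. In the example at right, the last common ancestor of primates and birds should have lived twice as long ago as that of primates and mice. Thus, molecular distance might provide the basis for a molecular clock.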

But how do we hang numbers on these ages? To know, we must calibrate the molecular clock with reference to the fossil record. For example, knowing on the basis of fossils that the minimum divergence age of primates and birds is 300 million years helps pin down the ages in the example at right. The beauty is that if we have a known phylogeny, we don't need fossil specimens to pin down each minimum divergence age.
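The calibration arithmetic can be sketched in a few lines. The molecular distances below are invented for illustration (chosen only so that the primate-bird distance is twice the primate-mouse distance, as in the text), and the function names are hypothetical. The sketch also assumes a strict clock, that is, one constant rate in every lineage.

```python
def calibrate_rate(distance, divergence_myr):
    """Substitutions per site per million years along a single lineage.

    The factor of 2 reflects that both lineages have been accumulating
    substitutions since they split.
    """
    return distance / (2.0 * divergence_myr)


def estimate_divergence(distance, rate):
    """Divergence age in million years implied by a distance and a rate."""
    return distance / (2.0 * rate)


# Hypothetical molecular distances (substitutions per site).
distance_primate_bird = 0.60
distance_primate_mouse = 0.30

# Fossil calibration from the text: primates and birds diverged
# at least 300 million years ago.
rate = calibrate_rate(distance_primate_bird, 300.0)

age_primate_mouse = estimate_divergence(distance_primate_mouse, rate)
print("estimated primate-mouse divergence (Myr):", round(age_primate_mouse, 1))
```

Because the calibration point is a minimum age, ages derived from it in this way are best read as minimum estimates too.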

In practice, this technique is fraught with difficulty, the largest of which has already been mentioned: we can't assume that non-coding sections of the genome evolve at equivalent rates. Rates may be different:

Over very long time periods, as we approach the point where all locations on the DNA strand have experienced substitutions, the genome is saturated and comparisons become meaningless.
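Saturation can be illustrated with a simple substitution model. The sketch below assumes the Jukes-Cantor model (all four nucleotides equally likely, all substitutions equally probable), which is an assumption introduced here for illustration and not a method named in these notes. Under that model, the expected proportion of sites that look different between two sequences levels off at 0.75 no matter how many substitutions have actually occurred.

```python
import math


def expected_observed_difference(substitutions_per_site):
    """Expected proportion of differing sites under the Jukes-Cantor model."""
    return 0.75 * (1.0 - math.exp(-4.0 * substitutions_per_site / 3.0))


# True amounts of change, in substitutions per site (hypothetical values).
for true_change in (0.1, 0.5, 1.0, 2.0, 5.0):
    observed = expected_observed_difference(true_change)
    print(true_change, round(observed, 3))
```

At small amounts of change the observed difference tracks the true change closely. At large amounts the curve flattens, so very different divergence times produce nearly identical observed differences.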

Thus, a molecular clock is a first-order approximation.