Phylogenetic Systematics among the Sciences
Any phylogeny can be recovered: The history of phylogenetic systematics is one of ever-widening application. Willi Hennig first envisioned it as a means of reconstructing phylogeny for creatures with no significant fossil record, such as the insects he studied. A useful property of the parsimony method of phylogeny reconstruction is that it is robust to missing data. This makes including fossil taxa simple, and, as we have seen, fossils can be instrumental in resolving long branch attraction problems.
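To see why parsimony tolerates missing data, here is a minimal sketch of the Fitch downpass for a single binary character on a fully bifurcating tree. It illustrates the general idea only and is not how T.N.T. or any other program is implemented. Trees are nested tuples of taxon names, states are the strings "0" and "1", and a fossil scored "?" may take any state, so it never forces an extra step. The taxon names and scores are hypothetical.

```python
def fitch(tree, states, alphabet=frozenset("01")):
    """Fitch downpass: return (possible states at this node, steps so far).

    `tree` is a taxon name (str) or a pair of subtrees; `states` maps each
    taxon to "0", "1", or "?" (missing data, i.e. any state in `alphabet`).
    """
    if isinstance(tree, str):
        score = states[tree]
        return (set(alphabet) if score == "?" else {score}), 0
    left, right = tree
    a, steps_a = fitch(left, states, alphabet)
    b, steps_b = fitch(right, states, alphabet)
    shared = a & b
    if shared:                              # children can agree: no change needed
        return shared, steps_a + steps_b
    return a | b, steps_a + steps_b + 1     # disagreement costs one step


scores = {"A": "1", "B": "1", "Fossil": "?", "D": "0"}
with_fossil = ((("A", "B"), "Fossil"), "D")
without_fossil = (("A", "B"), "D")

print(fitch(with_fossil, scores)[1])     # tree length with the fossil: 1
print(fitch(without_fossil, scores)[1])  # tree length without it: 1
```

Because the "?" leaf offers every state, it always intersects with its sister's state set, so the fossil can be placed on the tree without adding any steps that its unknown score did not earn.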
More recently, the method has been applied to other situations requiring the reconstruction of branching historic patterns, including:
- Epidemiology: The transmission of rapidly evolving pathogens like HIV can be mapped using phylogenetic methods. A classic example is Keith Crandall's analysis of the "dental clade" of HIV, which linked infections in several patients to their Florida dentist.
- Historical linguistics: Generally speaking, language evolution can be described as a branching phylogeny. Thus, historical linguists have adapted phylogenetic methods for establishing language phylogenies and reconstructing ancient languages with no historic record.
Cladograms are not ends in themselves: Cladists often behave as if phylogeny reconstruction were a self-evident end in itself. In fact, its greatest value may lie in its application to allied fields of science.
Reconstructing missing data using the Extant Phylogenetic Bracket: Phylogeny reconstruction programs like T.N.T. also reconstruct optimal character states at the internal nodes of a tree. From these reconstructions, hypothetical states can be inferred for taxa with missing data as well. In 1995, Larry Witmer coined useful vocabulary for describing how unknown character states of fossil taxa are reconstructed with respect to extant taxa: the extant phylogenetic bracket (EPB). This is the bracket formed on either side of a fossil taxon by living taxa that supply the missing information on a character state. Using it, we can make three types of inference, listed in order of decreasing confidence. Consider the distribution of a soft-tissue character - the four-chambered heart - among three fossil reptiles:
- Type I Inference: The bird-like theropod Anchiornis is bracketed by birds and crocodilians, both of which have the derived character. With no contrary positive evidence, the simplest assumption is that Anchiornis had it also.
Euparkeria capensis from Wikipedia
- Type II Inference: The archosauriform Euparkeria, not quite an archosaur, is bracketed by crocodilians and birds above and squamates below. Crocs and birds have the derived character; squamates don't. Thus, we are less secure than above in inferring it in Euparkeria, and even less secure the farther from the archosaurian node we get. Still, some hard-tissue correlate of the trait, or a strong biomechanical argument for it, might increase our confidence.
Claudiosaurus germaini from Wikipedia
- Type III Inference: The not-quite-saurian Claudiosaurus is bracketed by squamates and turtles, neither of which has the derived character. Our confidence in its presence in the extinct form is very low. We would need strong positive fossil evidence to argue for its presence.
Anchiornis huxleyi in true colors by Michael DiGiorgio from Science Codex
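The counting rule behind the three inference types can be sketched in a few lines. This is a minimal illustration, assuming each side of the bracket is reduced to a single yes/no observation; the function name `epb_inference` is hypothetical, and the sketch ignores the additional evidence (hard-tissue correlates, biomechanical arguments) that can raise or lower confidence.

```python
def epb_inference(inner_has, outer_has):
    """Level of EPB inference for a fossil, given whether the extant taxa on
    each side of its bracket show the derived character."""
    count = int(inner_has) + int(outer_has)
    return {2: "I", 1: "II", 0: "III"}[count]


examples = {
    # fossil: (inner bracket has it?, outer bracket has it?)
    "Anchiornis": (True, True),       # birds and crocodilians both have it
    "Euparkeria": (True, False),      # crocs and birds yes, squamates no
    "Claudiosaurus": (False, False),  # squamates and turtles both lack it
}
for fossil, (inner, outer) in examples.items():
    print(f"{fossil}: Type {epb_inference(inner, outer)} inference")
```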
Phylogenies meet the Rock Record
Phylogenies and Biostratigraphy: Traditional biostratigraphers (geologists who use the fossil record to date sedimentary rock units) tend to be literal-minded souls who think that either fossils of an organism or group are known from a particular time, or they aren't. Hypotheses of phylogeny tell us when we ought to be cautious about such literal interpretations.
- Ghost Lineages and minimum divergence ages: When we know that two taxa are sister taxa (descendants of the same most recent common ancestor), we in essence know that they originated at the same point in geologic time: the time of their last common ancestor and the speciation event that gave rise to them. Say we know one of these taxa from 100 million year old rocks, and the other only from 90 million year old rocks. Even without seeing a fossil, we know that the second group must have representatives dating back at least 100 million years, simply from its sister-taxon relationship with the first. 100 million years is thus the minimum divergence age of the monophyletic group formed by the two total groups. A lineage whose existence can be inferred from the cladogram, but which is not known from actual fossils, is called a ghost lineage; here, the second group has a ghost lineage at least 10 million years long. The examination of ghost lineages should allow biostratigraphers to refine their models of the stratigraphic ages of organisms.
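The ghost-lineage reasoning above can be applied to a whole cladogram. Here is a minimal sketch, assuming a fully bifurcating tree written as nested tuples and a hypothetical table of first appearances in Ma (millions of years ago, so larger means older). Each node's minimum divergence age is the oldest first appearance anywhere in its clade, and any branch whose own record starts later than that carries a ghost lineage. The function names are illustrative, not taken from any package.

```python
def oldest_record(clade, fad):
    """Oldest first appearance (Ma) in a clade = its minimum divergence age."""
    if isinstance(clade, str):
        return fad[clade]
    return max(oldest_record(child, fad) for child in clade)


def ghost_lineages(clade, fad, found=None):
    """Return (subclade, gap in Myr) for every branch whose fossil record
    begins later than the minimum divergence age of its parent node."""
    if found is None:
        found = []
    if isinstance(clade, str):
        return found
    node_age = oldest_record(clade, fad)
    for child in clade:
        gap = node_age - oldest_record(child, fad)
        if gap > 0:
            found.append((child, gap))
        ghost_lineages(child, fad, found)
    return found


# The two sister taxa from the text: known from 100 and 90 Ma rocks.
print(ghost_lineages(("first", "second"), {"first": 100, "second": 90}))
# [('second', 10)]
```

The sketch treats first appearances as exact; real stratigraphic ranges carry dating uncertainty that would widen or narrow each gap.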
- Stratigraphic congruence: How do we identify ghost lineages and measure their prevalence in a cladogram? All other things being equal, we expect the terminal taxa that branch off a cladogram first to appear first in the fossil record. When this is true, the cladogram is said to be stratigraphically congruent. Often, cladograms are not stratigraphically congruent; this happens when there are long ghost lineages.
- Measuring stratigraphic congruence: A simple measure of stratigraphic congruence is the Stratigraphic Consistency Index (SCI) of John Huelsenbeck. To calculate it, count the number of stratigraphically consistent nodes in a cladogram. A node is stratigraphically consistent if the oldest fossil in the clade it defines is the same age as or younger than the oldest fossil in that clade's sister group. Divide the number of consistent nodes by the total number of nodes evaluated to get the SCI. (The root node has no sister group, so it is not evaluated.)
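Here is a minimal sketch of the SCI calculation, assuming a fully bifurcating tree written as nested tuples and hypothetical first-appearance ages in Ma (larger means older). It reads the definition as: score every internal node except the root, and count it as consistent when its clade's oldest fossil is no older than the oldest fossil of its sister group (ties count as consistent). The helper names are illustrative only.

```python
def oldest_record(clade, fad):
    """Oldest first appearance (Ma, larger = older) anywhere in a clade."""
    if isinstance(clade, str):
        return fad[clade]
    return max(oldest_record(child, fad) for child in clade)


def sci(tree, fad):
    """Stratigraphic Consistency Index for a fully bifurcating tree."""
    consistent = evaluated = 0

    def visit(node):
        nonlocal consistent, evaluated
        if isinstance(node, str):
            return
        left, right = node
        for clade, sister in ((left, right), (right, left)):
            if not isinstance(clade, str):       # only internal nodes are scored
                evaluated += 1
                # consistent: clade's oldest fossil is no older than its sister's
                if oldest_record(clade, fad) <= oldest_record(sister, fad):
                    consistent += 1
            visit(clade)

    visit(tree)   # the root itself is never scored: it has no sister group
    if evaluated == 0:
        raise ValueError("SCI needs a tree with at least three taxa")
    return consistent / evaluated


tree = ((("A", "B"), "C"), "D")
print(sci(tree, {"A": 50, "B": 60, "C": 70, "D": 80}))  # 1.0: fully congruent
print(sci(tree, {"A": 50, "B": 60, "C": 85, "D": 80}))  # 0.5: C appears too early
```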
- Example: Choristoderes.
In 1984, when Jacques Gauthier performed the first major cladistic analysis of diapsid reptiles, he got a surprise. Choristodera, whose members were known from large Late Cretaceous and Paleogene creatures like Champsosaurus, appeared to have branched off the reptilian tree by the late Permian. Thus, they sat at the end of a ghost lineage that persisted for over 150 million years. This seemed highly improbable. Was the cladogram wrong, or was our understanding of the fossil record? As paleontologists began reexamining museum collections, it became apparent that some neglected, partially preserved creatures were, in fact, choristoderes (e.g., the Middle and Late Jurassic Cteniogenys). Today, the choristodere record starts in the late Triassic. Some research suggests an alternative phylogenetic placement for choristoderes that would shorten their ghost lineage further.
Telling the Truth with Statistics
Correlation studies: Biologists are fond of performing statistical studies, looking for meaningful correlations between measurable aspects of different parts of animals' anatomies. For instance, one might compare the ratio of beak depth to beak length in birds with the length of their tarsi to see if larger birds have proportionately deeper beaks. Measurements of this sort can be subjected to the full range of statistical analyses. A fundamental assumption of such studies is that all observations are independent samples from the same underlying population.
However, when samples are taken from groups of populations nested in a hierarchical phylogeny, this assumption is violated. The example at right shows how misleading this can be. Imagine a simple phylogeny with two sister clades, one black, the other red. When we make a bivariate plot of data taken from them, there appears to be a strong statistical relationship. But when we compare values only within each clade, we see very low correlation.
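The black-and-red scenario can be reproduced with a handful of made-up numbers. In this minimal sketch the values are hypothetical and were chosen so that the two traits are completely unrelated within each clade; the only structure in the data is that the red clade is shifted upward on both axes.

```python
import math


def pearson_r(xs, ys):
    """Ordinary Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical trait values: within each clade the two traits are unrelated.
black = {"x": [0, 1, 0, 1], "y": [0, 0, 1, 1]}
red = {"x": [10, 11, 10, 11], "y": [10, 10, 11, 11]}

print(round(pearson_r(black["x"], black["y"]), 3))  # 0.0 within black
print(round(pearson_r(red["x"], red["y"]), 3))      # 0.0 within red
print(round(pearson_r(black["x"] + red["x"], black["y"] + red["y"]), 3))  # 0.99 pooled
```

Pooling treats eight tips as eight independent observations, but the near-perfect correlation comes from a single evolutionary event: the divergence of the two clades.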
Fortunately, Joe Felsenstein of the University of Washington emerged from the Felsenstein zone in 1985 to develop a statistical technique for adjusting data to account for the phylogeny of the taxa sampled, called phylogenetically independent contrasts (PIC). The method rests on the idea that although the taxa themselves are not independent, the differences (contrasts) between measured values at each branching point are independent under a Brownian-motion model of trait evolution. Statistical correlation techniques are therefore applied to pairwise contrasts between sister taxa rather than to the raw measurements. By applying it, meaningful correlation studies can be performed; without it, they would be meaningless or misleading. Applying this method absolutely requires a known phylogeny with branch lengths. If hypotheses of phylogeny change, the results of correlation studies based on them must be revised, too.
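Here is a minimal sketch of the contrast calculation in Felsenstein's method, assuming a fully bifurcating tree, positive branch lengths below the root, and Brownian-motion evolution. `Node`, `tip`, `join`, and `contrasts` are hypothetical helpers written for this illustration, not the API of any package. It reuses hypothetical black and red clade values with every branch set to length 1.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    length: float                       # branch length above this node (> 0 below the root)
    name: str | None = None             # tip label; None for internal nodes
    children: list[Node] = field(default_factory=list)


def tip(name, length=1.0):
    return Node(length, name)


def join(a, b, length=1.0):
    return Node(length, None, [a, b])


def contrasts(node, trait, out):
    """Post-order pass computing standardized independent contrasts.

    Appends one contrast per internal node to `out` and returns the node's
    estimated trait value and its (lengthened) branch length.
    """
    if not node.children:
        return trait[node.name], node.length
    if len(node.children) != 2:
        raise ValueError("contrasts need a fully bifurcating tree")
    (x1, v1), (x2, v2) = [contrasts(c, trait, out) for c in node.children]
    out.append((x1 - x2) / math.sqrt(v1 + v2))    # standardized contrast
    x = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)   # weighted ancestral estimate
    return x, node.length + v1 * v2 / (v1 + v2)   # extra length for uncertainty


# Two sister clades (hypothetical values, all branches length 1).
black = join(join(tip("A"), tip("B")), join(tip("C"), tip("D")))
red = join(join(tip("E"), tip("F")), join(tip("G"), tip("H")))
tree = join(black, red, length=0.0)
x = dict(A=0, B=1, C=0, D=1, E=10, F=11, G=10, H=11)
y = dict(A=0, B=0, C=1, D=1, E=10, F=10, G=11, H=11)

cx, cy = [], []
contrasts(tree, x, cx)
contrasts(tree, y, cy)
print([round(c, 2) for c in cx])
print([round(c, 2) for c in cy])
```

The eight tips yield seven contrasts, and only the last one (at the root) compares the black clade with the red clade, so the between-clade difference counts as one observation instead of dominating eight. Correlations on `cx` and `cy` are conventionally computed through the origin, because contrasts have an expected mean of zero and their signs are arbitrary.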
The dance of Molecules and Morphologies
Molecular phylogenies: We can often identify homologous positions on the DNA strands of different creatures. The identity of the nucleotide at a given position (A, G, C, or T) is as good a phylogenetic character as any, as long as it is taken from a part of the genome that does not code for a functional protein. Molecular analyses:
- offer very large quantities of data - an advantage that is not to be despised
- but are feasible only for living taxa
- and do nothing to break up long branches.
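Treating each aligned position as a character can be made concrete. This minimal sketch uses the standard criterion for a parsimony-informative site (at least two nucleotides each present in at least two taxa); the sequences are hypothetical, and gaps and ambiguity codes are not handled.

```python
from collections import Counter


def informative_sites(alignment):
    """Return 0-based indices of parsimony-informative alignment columns.

    A column is informative when at least two different nucleotides each
    occur in at least two taxa; otherwise every tree explains it equally well.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    informative = []
    for i in range(length):
        counts = Counter(seq[i] for seq in seqs)
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative.append(i)
    return informative


alignment = {           # hypothetical aligned sequences
    "taxon1": "ACGTA",
    "taxon2": "ACGTT",
    "taxon3": "ATGCA",
    "taxon4": "ATACT",
}
print(informative_sites(alignment))  # [1, 3, 4]
```

Column 2 varies, but its single "A" is unique to one taxon, so it adds the same one step to every tree and cannot help choose among them.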
From Lost at Eminor
For example, Graur, Hide, and Li (1991) confidently reported the (morphologically) improbable result that guinea pigs are more closely related to primates than to rats, based on an analysis of the mitochondrial genome. Two subsequent analyses appeared to vindicate this. But the mitochondrial genome evolves very rapidly compared to the nuclear genome; arguably, so fast that Graur et al. were, in effect, analyzing noise rather than signal. In the early days, however, the mitochondrial genome was much easier to sequence.
From Practically Science
If we assume that substitutions accumulate at roughly the same rate in all lineages, then a primate-bird molecular distance twice as large as the primate-mouse distance means that the last common ancestor of primates and birds lived twice as long ago as that of primates and mice. Thus, molecular distance might provide the basis for a molecular clock.
But how do we hang numbers on these ages? To do so, we must calibrate the molecular clock against the fossil record. For example, knowing from fossils that the minimum divergence age of primates and birds is 300 million years helps pin down the ages in the example at right. The beauty is that if we have a known phylogeny, we don't need fossil specimens to pin down each minimum divergence age.
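The calibration arithmetic can be sketched as follows, assuming a strict clock in which two lineages that split T million years ago have together accumulated a distance of 2 × rate × T. The distances are hypothetical, and the function names are illustrative. Because 300 million years is a minimum divergence age, every age derived from it is best read as a minimum too: an older true calibration implies a slower rate and older dates throughout.

```python
def calibrate_rate(distance, calibration_age):
    """Substitutions per site per Myr along EACH lineage, assuming a strict
    clock: two lineages diverging T Myr ago accumulate distance 2 * rate * T."""
    return distance / (2 * calibration_age)


def divergence_age(distance, rate):
    """Age (Ma) of the split implied by a pairwise distance under the same clock."""
    return distance / (2 * rate)


# Hypothetical pairwise distances (substitutions per site).
distances = {("primate", "bird"): 0.60, ("primate", "mouse"): 0.30}
rate = calibrate_rate(distances[("primate", "bird")], calibration_age=300)
for pair, d in distances.items():
    print(pair, round(divergence_age(d, rate)))
```

One fossil calibration dates every other node in the tree, which is the point made in the text: the mouse-primate split needs no fossils of its own.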
In practice, this technique is fraught with difficulty. The largest difficulty has already been mentioned: we can't assume that non-coding sections of the genome evolve at equivalent rates. Rates may differ:
- In different parts of the genome:
  - Nuclear vs. cytoplasmic genomes
  - We don't know that the non-coding genome doesn't experience stabilizing selective pressure.
- In different lineages:
  - Creatures with longer generation times (e.g., turtles) evolve more slowly than those with short generation times (e.g., viruses).
  - Population size affects the rate of evolution through genetic drift.
Over very long time periods, as we approach the point where every position on the DNA strand has experienced substitutions, the sequences become saturated: new substitutions overwrite old ones, observed differences stop growing with time, and comparisons become meaningless.
Thus, a molecular clock is at best a first-order approximation.
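Saturation can be illustrated with the Jukes-Cantor (1969) substitution model. The text does not name a model; Jukes-Cantor is swapped in here as the simplest standard one, and it assumes equal base frequencies and equal rates for all substitutions. Under it, the expected proportion of differing sites p approaches 0.75, the level expected between unrelated random sequences, however much time passes. The correction that recovers the true distance d blows up as p nears that ceiling.

```python
import math


def expected_p_distance(d):
    """Expected proportion of differing sites after d substitutions per site (JC69)."""
    return 0.75 * (1 - math.exp(-4 * d / 3))


def jc_distance(p):
    """Jukes-Cantor corrected distance from an observed proportion p of differences."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75); at 0.75 the sequences are saturated")
    return -0.75 * math.log(1 - 4 * p / 3)


for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    p = expected_p_distance(d)
    print(f"true d = {d:.1f}  observed p = {p:.3f}  corrected d = {jc_distance(p):.2f}")
```

Between d = 2 and d = 5 the observed proportion barely moves, so with real, noisy data a tiny error in p translates into an enormous error in the estimated distance, and hence in any clock date built on it.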