(American Journal of Botany. 2001;88:1499-1516.)
© 2001 Botanical Society of America, Inc.


Systematics, Phytogeography, and Evolution

Sources of error and confidence intervals in estimating the age of angiosperms from rbcL and 18S rDNA data1

Michael J. Sanderson2 and James A. Doyle

Section of Evolution and Ecology, University of California, Davis, California 95616 USA

Received for publication August 3, 2000. Accepted for publication February 13, 2001.


    ABSTRACT
 
Molecular estimates of the age of angiosperms have varied widely, and many greatly predate the Early Cretaceous appearance of angiosperms in the fossil record, but there have been few attempts to assess confidence limits on ages. Experiments with rbcL and 18S data using maximum likelihood suggest that previous angiosperm age estimates were too old because they assumed equal rates across sites—use of a gamma distribution of rates to correct for site-to-site variation gives 10–30 my (million years) younger ages—and relied on herbaceous angiosperm taxa with high rates of molecular evolution. Ages based on first and second codon positions of rbcL are markedly older than those based on third positions, which conflict with the fossil record in being too young, but all examined data partitions of rbcL and 18S depart substantially from a molecular clock. Age estimates are surprisingly insensitive to different views on seed-plant relationships. Randomization schemes were used to quantify confidence intervals due to phylogenetic uncertainty, substitutional noise, and lineage effects (deviations from a molecular clock). Estimates of the age of crown-group angiosperms range from 68 to 281 mya (million years ago), depending on data, tree, and assumptions, with most ~140–190 mya (Early Jurassic–earliest Cretaceous). Approximate 95% confidence intervals on ages are wider for rbcL than 18S, ranging up to 160 my for phylogenetic uncertainty, 90 my for substitutional noise, and 70 my for lineage effects. These intervals overlap the oldest occurrences of angiosperms in the fossil record, as well as some estimates from previous molecular studies.

Key Words: angiosperms • confidence intervals • fossil record • molecular clock • rbcL • 18S rDNA


    INTRODUCTION
 
The age of the angiosperms has long been a topic of controversy in plant evolution. Traditionally, this problem was addressed from a paleobotanical point of view, but in recent years studies based on the hypothesis of a molecular clock have added a new perspective (Martin, Gierl, and Saedler, 1989; Wolfe et al., 1989; Brandl, Mann, and Sprinzl, 1992; Martin et al., 1993; Laroche, Li, and Bousquet, 1995; Goremykin, Hansmann, and Martin, 1997; Sanderson, 1997). Since these analyses conflict both with interpretations of the fossil record and with each other, they relate not only to paleobotanical assumptions but also to the general validity of the molecular clock, a major issue in molecular evolution (Fitch, 1976; Gillespie, 1991; Ayala, 1997).

In this paper, we address the possibility that some of the apparent conflict between molecular and fossil estimates may stem from insufficient attention to sources of error and assessment of confidence limits on age estimates based on molecular data. Because of the potential importance of deviations from true global rate constancy, we consider a much larger sample of taxa than previous age studies. First, we present experiments with data from two genes that have been widely studied for this and related problems, the chloroplast gene rbcL and 18S nuclear rDNA (ribosomal DNA), which suggest that errors in tree topology and variation in rates among lineages can lead to erroneous age estimates. Second, we attempt to obtain a more reliable assessment of the confidence interval on molecular age estimates based on rbcL and 18S data, which allows us to quantify several potential sources of error in these estimates.

Previous estimates
Until the 1960s, it was widely assumed that angiosperms originated long before their first unquestioned fossil record in the mid-Early Cretaceous, based on assignment of Cretaceous fossils (mostly leaves) to diverse and advanced extant taxa (Axelrod, 1952, 1970 ). However, more recent studies of fossil pollen, leaves, flowers, and fruits have indicated that Early Cretaceous angiosperms were far less advanced than previously believed and have painted a coherent picture of rapid morphological diversification, which in its specifics agrees with views on angiosperm evolution based on modern plants (Doyle, 1969, 1978 ; Muller, 1970, 1981 ; Doyle and Hickey, 1976 ; Friis and Crepet, 1987 ; Doyle and Donoghue, 1993 ; Friis, Pedersen, and Crane, 1994 ; Crane, Friis, and Pedersen, 1995 ). At present, the oldest definite angiosperm fossils are pollen grains of Valanginian or Hauterivian age, ~130 mya (million years ago) (Trevisan, 1988 ; Hughes, 1994 ; Brenner, 1996 ); a supposed Jurassic record (Sun et al., 1998 ) has been redated as Early Cretaceous (Swisher et al., 1999 ). These data suggest that angiosperms may have originated barely before their first fossil records, although they do not rule out the existence of older angiosperms that were rare and plesiomorphic.

The application of phylogenetic thinking to living and fossil seed plants has also affected this discussion. Any extant group has two ages: the age at which its stem lineage branched from the line leading to its extant sister group and the age of the most recent common ancestor of all its living members or the crown group (Hennig, 1965 ; Jefferies, 1979 ). Following Doyle and Donoghue (1993) , we restrict the term "angiosperms" to the crown group; this is the age addressed by molecular studies. Most phylogenetic analyses based on morphology have indicated that the sister group of angiosperms is Gnetales, Gnetales plus Bennettitales, or Caytonia (Crane, 1985 ; Doyle and Donoghue, 1986 ; Loconte and Stevenson, 1990 ; Rothwell and Serbet, 1994 ; Doyle, 1996 ). Since all these taxa are known back to the Late Triassic, these results imply that the angiosperm stem lineage is also this old. However, the crown group could be much younger, especially considering the many apomorphies that distinguish angiosperms from other seed plants and the plesiomorphic nature of Early Cretaceous fossils. Molecular analyses have generally refuted the relationship of angiosperms and Gnetales, and several indicate that angiosperms and extant gymnosperms are sister groups, pushing the angiosperm stem lineage back to the mid-Carboniferous (Goremykin et al., 1996 ; Chaw et al., 1997, 2000 ; Hansen et al., 1999 ; Qiu et al., 1999 ; Winter et al., 1999 ; Bowe, Coat, and dePamphilis, 2000 ; Donoghue and Doyle, 2000 ). However, this does not rule out a relationship of angiosperms with Mesozoic groups such as Bennettitales or Caytonia, and it does not relate directly to the age of the crown group.

The first molecular studies gave far older ages for the angiosperms than their oldest fossil records. Ramshaw et al. (1972) obtained an estimate of 350–420 mya (Late Silurian-Mississippian) based on amino acid sequences of cytochrome c, calibrated with the bird–mammal split. Using nonsynonymous substitutions in the nuclear gene gapC, calibrated with the animal fossil record and the presumed divergence of plants, animals, and fungi at 1000 mya, Martin, Gierl, and Saedler (1989) dated the split between monocots (two grasses) and dicots (Magnolia and six eudicots) as 319 mya (mid-Carboniferous). This is more than twice the age of the oldest fossils; at that time, the most advanced known seed plants were "seed ferns" more plesiomorphic than all living seed plants, to say nothing of angiosperms. Martin, Gierl, and Saedler (1989) dismissed the concept of a Cretaceous origin as based on negative evidence and suggested that their results favored the views of Axelrod (1952, 1970) . However, Crane et al. (1989) argued that the conflict with the fossil record is not so easy to explain away. In particular, Martin, Gierl, and Saedler dated the common ancestor of eudicots as 276 mya (Permian), but eudicots (a strongly supported monophyletic group: Chase et al., 1993 ; Soltis et al., 1998 ; Qiu et al., 1999 ; Soltis, Soltis, and Chase, 1999 ) are united by tricolpate pollen, which has a dense fossil record, beginning in the late Barremian (120 mya: Doyle, 1992 ; Hughes, 1994 ) and becoming ubiquitous in the Albian (110 mya). Furthermore, Albian eudicots represent lines near the base of this clade (Doyle, 1998b ; Magallón, Crane, and Herendeen, 1999 ).

Subsequent studies improved on this by calibrating dates with other land plants. Some have given more recent ages, though still pre-Cretaceous. Wolfe et al. (1989) dated the angiosperms as 200 mya (Early Jurassic), using rRNA (ribosomal RNA) sequences, several chloroplast genes, and two calibrations: the divergence of three grasses at 60 mya and the split of liverworts from other land plants at 400 mya (Early Devonian), which is probably 50 my (million years) too recent (vascular plant megafossils extend back to the Middle Silurian and land plant spores to the Middle Ordovician: Kenrick and Crane, 1997). For rRNA, they also had a cycad sequence; this diverged from angiosperms at 340 mya (Mississippian), which is consistent with fossil data. Laroche, Li, and Bousquet (1995) also dated angiosperms at 200 mya, based on nonsynonymous substitutions in several mitochondrial genes, calibrated with grasses and legumes. However, other studies with improved calibrations have given older ages. Martin et al. (1993) added a liverwort and a conifer and used nonsynonymous substitutions in both gapC and rbcL; assuming that liverworts diverged at 450 mya (Late Ordovician) and conifers at 330 mya (Late Mississippian), they dated the monocot–dicot split as 300 mya (Late Pennsylvanian). In a study of chloroplast transfer RNAs, calibrated with divergence of a liverwort and two grasses, Brandl, Mann, and Sprinzl (1992) also obtained a 300 mya age for angiosperms.

The youngest estimate so far was obtained by Goremykin, Hansmann, and Martin (1997) , based on protein sequences of 58 genes from six completely sequenced chloroplast genomes (Porphyra, Marchantia, Pinus, Nicotiana, Oryza, Zea). Assuming that Marchantia diverged at 450 mya, these authors dated the angiosperms as 160 mya (Late Jurassic) and the split between Pinus and angiosperms as 348 mya (Early Carboniferous), which they noted is more congruent with fossil evidence than their earlier results (Martin, Gierl, and Saedler, 1989 ; Martin et al., 1993 ). However, they found strong lineage-specific rate variation in the two grass genomes and therefore calculated the angiosperm age from the root node to Nicotiana only. Thus, although their analysis used an unprecedented number of genes, their dates were based on a very small number of taxa.

Sanderson (1997) used an experimental method (NPRS) for reconstructing ages in the absence of a molecular clock, which smooths local variations in rates by an optimization algorithm. Based on 36 land plant rbcL sequences and a land plant calibration of 450 mya, he obtained an estimate of 165 mya (Middle Jurassic). Using the same rbcL data set, Thorne, Kishino, and Painter (1998, fig. 3) used a model-based Bayesian approach to calculate that the angiosperm root node is 51% as old as the most recent common ancestor of vascular plants (i.e., ~200 mya, Early Jurassic). Both methods assume an autocorrelation in rates of molecular evolution across the tree, the presence or magnitude of which has yet to be determined.

Sources of error in estimating divergence times
These dates are in considerable conflict with each other and with the fossil record. Some of this conflict can be attributed to biases in the data or the statistical estimation methods used, but much of it is probably due to stochastic and deterministic aspects of the molecular evolutionary process itself, especially rate variation across lineages, or "lineage effects" (Britten, 1986 ; Gillespie, 1991 ; Gaut, Muse, and Clegg, 1993; Avise, 1994 ; Clegg et al., 1994 ; Nickrent and Starr, 1994 ; Li, 1997 ; Yang and Nielsen, 1998 ). Even with a stochastically constant rate, substitutional noise imposes an absolute lower bound on errors in age estimates (Kumar, Tamura, and Nei, 1993 ; Hillis, Mable, and Moritz, 1996 ). Variation in rate across sites causes sequence divergences to be estimated incorrectly, most severely at high rates (Gillespie, 1986 ; Yang, 1996 ) and high rate variability (Kelly and Rice, 1996 ; Miyamoto and Fitch, 1996 ; Yang, 1996 ). Still other errors relate to the underlying phylogenetic context for molecular divergence, including incorrect phylogenies and calibrations that associate fossil ages with the wrong nodes of a tree.

Several of the angiosperm studies reported the error rate in estimation of branch lengths due to substitutional noise (e.g., Goremykin, Hansmann, and Martin, 1997 ), but only Martin, Gierl, and Saedler (1989) , Martin et al. (1993) , and Sanderson (1997) used it to assess the corresponding errors in age estimates. Several studies tested for lineage effects, but only Wolfe et al. (1989) assessed the error component due to these. Wolfe et al. (1989) , Brandl, Mann, and Sprinzl (1992) , Laroche, Li, and Bousquet (1995) , and Goremykin, Hansmann, and Martin (1997) considered calibration error (although the last authors, concluding that substitutional noise was relatively low, subsumed it in the calibration error). None of these studies considered between-site sequence rate heterogeneity or choice of the tree used in deriving age estimates. The ideal tree, of course, would be the true tree. Most studies have used trees derived from phylogenetic analysis of each gene under study, but many of these are clearly incorrect as species trees, since they differ from each other.

In order to evaluate these results, we undertook our own analyses of rbcL and 18S data, designed to probe the various sources of error, reasons why estimates have varied so much, and ways to obtain better estimates. Our taxon sampling (modified from Sanderson, 1997 ) was designed to span critical nodes, provide an adequate sample of extant outgroups, and allow comparisons with previous studies and fossil evidence on the ages of nodes. First, we present a series of analyses that illustrate the effect of various factors on point estimates of the age of angiosperms: variations in tree topology, models for nucleotide substitution (with and without rate variation), sampling of taxa with different rates of evolution (lineage effects), and use of first and second vs. third codon positions (an approximation of nonsynonymous vs. synonymous substitutions). Second, we present a series of resampling experiments designed to provide a statistical estimate of the relative magnitude of errors due to these factors.


    MATERIALS AND METHODS
 
Sequence data
We used published sequences and alignments of the chloroplast rbcL gene (1428 bp; Chase et al., 1993 ) and the 18S rDNA gene (~1842 bp, excluding poorly aligned segments; Chaw et al., 1997 ; Soltis et al., 1997 ), supplemented by a few sequences from GenBank. Data sets and published references for species used, vouchers, and alignments are provided at http://loco.ucdavis.edu/sandlab/sl.htm. Both genes have been widely studied in seed-plant phylogenetics, sampled across a large number and diversity of taxa, and subjected to intense scrutiny with regard to methodological issues such as large data sets and measures of support (Rice, Donoghue, and Olmstead, 1997 ; Källersjö et al., 1998 ; Soltis et al., 1998 ).

Taxa sampled
The 37 taxa in our data set comprise 22 angiosperms, 9 other seed plants, 5 other land plants, and Chara, one of the most closely related green algae, to root land plants (Mishler et al., 1994 ).

To span the root node of extant angiosperms, we included a variety of "magnoliid" taxa, based on current understanding of angiosperm relationships. Analyses of atpB (Savolainen et al., 2000) , phytochrome genes (Mathews and Donoghue, 1999 ), a combined 18S, rbcL, and atpB data set (Soltis et al., 1998 ; Soltis, Soltis, and Chase, 1999 ), and five-gene data sets including mitochondrial genes (Parkinson, Adams, and Palmer, 1999 ; Qiu et al., 1999 ) indicate that Amborella is the sister group of all other angiosperms, followed by Nymphaeales and then a clade consisting of Austrobaileya, Trimeniaceae, and Illiciales, in agreement with earlier analyses that placed Nymphaeales at the base of angiosperms (Hamby and Zimmer, 1992 ; Doyle, Donoghue, and Zimmer, 1994 ; Goremykin et al., 1996 ). Other analyses link Amborella with Nymphaeales or reverse these two taxa (Barkman et al., 2000 ; Graham and Olmstead, 2000 ; Qiu et al., 2000) , but these lines are still basal to other angiosperms. We represented these basal lines with Amborella, Nymphaea, and Austrobaileya, and other magnoliid clades (APG, 1998 ; Qiu et al., 1999 ; Soltis, Soltis, and Chase, 1999 ) with Magnolia (Magnoliales), Calycanthus and either Persea or Sassafras (Laurales), Drimys (Winteraceae), Saururus (Piperales), and Chloranthus (Chloranthaceae). We did not include Ceratophyllum, which is sister to all other angiosperms in trees based on rbcL (Chase et al., 1993 ), because it is never basal in analyses of other genes. If we had included Ceratophyllum, it would be unclear to what extent our conclusions were a function of this anomalous rooting, without performing additional experiments with topological constraints.

For other seed plants, we included the three genera of Gnetales, Ginkgo, and Cycas and Zamia, the latter representing the basal split in Cycadales. Pinaceae (plus Gnetales in some studies) are the sister group of other conifers in molecular analyses (Chaw et al., 1997, 2000 ; Stefanovic et al., 1998 ; Qiu et al., 1999 ; Bowe, Coat, and dePamphilis, 2000 ); to span the basal conifer node, we used Picea (Pinaceae), Podocarpus (Podocarpaceae), and Taxus (Taxaceae). In ferns, Osmunda represents Osmundaceae, the probable sister group of other Filicales (Pryer, Smith, and Skog, 1995 ), exemplified by Asplenium. Marchantia represents liverworts, which morphological and some molecular analyses identify as the sister group of other land plants (Mishler et al., 1994 ; Qiu et al., 1998 ). Although other molecular analyses place anthocerotes in this position (Nickrent et al., 2000) , this should not be critical for our purposes, since Marchantia is the only bryophytic group in our data set, and at worst Marchantia represents a clade that diverged just one node above the base of land plants.

For 30 species, sequences were available for both genes. For the seven other taxa, we used a different exemplar of the same family for the two genes (18S/rbcL): Nageia/Afrocarpus (Podocarpaceae); Sassafras/Persea (Lauraceae); Calla/Spathiphyllum (Araceae); Veitchia/Drymophloeus (Palmae); Buxus/Pachysandra (Buxaceae); Arctostaphylos/Enkianthus (Ericaceae); Brunfelsia/Nicotiana (Solanaceae). This procedure may introduce some error because of changes in rate of evolution within families, but presumably these tend to be smaller than changes between families.

Trees
Because one of our goals was to clarify the effect of tree topology on age estimates, we examined a series of eight "standard" trees. Three of these were found by normal parsimony analysis of rbcL and 18S; the other five, intended to represent a range of current hypotheses on seed-plant phylogeny, were obtained by imposing topological constraints during parsimony analysis of rbcL, 18S, or the two data sets combined. Some of these constraints are not directly relevant to seed-plant relationships but were needed to correct anomalies elsewhere in the tree (e.g., in rooting of vascular plants or of angiosperms). These constraints and the reasoning behind their selection are described at the point where each tree is first discussed in the Results section. For these analyses, we used PAUP 3.1 (Swofford, 1991) to find most parsimonious trees, with 100 replicates using stepwise random addition of taxa, MULPARS (multiple most parsimonious trees), TBR (tree bisection-reconnection) branch swapping, and holding one tree at each step. For several subsequent analyses we used one of these trees, designated the "gnetifer" tree, in which Gnetales are the sister group of conifers and angiosperms are the sister group of other seed plants, as indicated by 18S data (Chaw et al., 1997, 2000; Bowe, Coat, and dePamphilis, 2000). Recent multigene analyses (Qiu et al., 1999; Bowe, Coat, and dePamphilis, 2000; Chaw et al., 2000) have produced somewhat different "gnepine" trees in which Gnetales are nested within now-paraphyletic conifers, linked with Pinaceae, but the gnetifer tree is more consistent with loss of the inverted repeat in the chloroplast genome of conifers but not Gnetales (Raubeson and Jansen, 1992b). For comparisons with trees of Martin, Gierl, and Saedler (1989) and Martin et al. (1993), we also examined trees including only three angiosperms comparable to those in their study, plus three other subsets of angiosperm taxa, designed to address problems of variation in rates of evolution.

Preliminary hypothesis testing
Prior to estimating ages, we undertook a round of hypothesis testing to infer the tempo and mode of evolution of these genes. We used ML (maximum likelihood) methods (Swofford et al., 1996; Huelsenbeck and Rannala, 1997) for estimation of evolutionary parameters and hypothesis testing. Several models of nucleotide substitution were examined, differing in complexity and number of parameters. The F81 ("Felsenstein 1981"), HKY85 ("Hasegawa-Kishino-Yano 1985"), and GTR (general time-reversible) models estimate one, two, and six parameters in the rate matrix, respectively (Swofford et al., 1996). Site-to-site rate variation was implemented using a gamma distribution of rates (denoted by adding "+ Γ" to the acronyms above, and referred to as "gamma" in the following discussion). The shape parameter of the gamma distribution is estimated from the data using a four-category discrete approximation. In the absence of rate constancy across lineages, there are also 2N – 2 branch length parameters to be estimated, where N is the number of taxa. Any of these models can have the additional assumption of rate constancy across lineages (molecular clock). This reduces the number of parameters associated with the tree to N – 2 internal node times (plus one overall rate). Clock models will be denoted by adding the suffix "+ cl" to the model's acronym. Unless otherwise noted, all ML analyses used PAUP* 4.0 (Swofford, 2000). In general, estimation of model parameters (other than branch lengths) is fairly insensitive to topology (Yang, Goldman, and Friday, 1995). Therefore, preliminary analyses were run only on the gnetifer tree.
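The four-category discrete approximation can be illustrated with a short Python sketch. It assumes that the gamma distribution is scaled to a mean rate of 1 and that each equal-probability category is represented by its mean rate; the text does not state which convention was used, and the function names are hypothetical rather than part of PAUP*.

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-16:
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def gamma_quantile(p, shape):
    """Quantile of a gamma distribution with mean 1 (rate = shape), by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(shape, shape * hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(shape, shape * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def discrete_gamma_rates(shape, k=4):
    """Mean relative rate in each of k equal-probability categories."""
    bounds = [0.0] + [gamma_quantile(i / k, shape) for i in range(1, k)]
    # Cumulative share of the total rate below each bound; the last bound is infinity.
    cum = [reg_lower_gamma(shape + 1.0, shape * b) for b in bounds] + [1.0]
    return [k * (cum[i + 1] - cum[i]) for i in range(k)]

# Illustrative shape parameter only; it is not an estimate from the rbcL or 18S data.
rates = discrete_gamma_rates(0.5)
print([round(r, 4) for r in rates])
```

A small shape parameter concentrates most sites in slow categories and a few in a fast one, which is the pattern of site-to-site variation the gamma model is meant to capture.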

Likelihood ratio tests of one substitution model against a more complex alternative were used to test for goodness of fit of the model to the data (Huelsenbeck and Rannala, 1997), using the gnetifer tree. Degrees of freedom for the test are equal to the difference in the number of free parameters between the models. Models with and without rate variation across sites were tested against each other by assuming that both have gamma-distributed rates, but in one the shape parameter was left free, whereas in the other it was set to correspond to a constant rate across sites (by setting the shape parameter to infinity: Swofford et al., 1996). A complete battery of tests was run both with and without the assumption of a molecular clock.
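A likelihood ratio test of this kind can be sketched in a few lines of Python. The log likelihoods below are hypothetical placeholders, not values from this study, and the P value uses the Wilson-Hilferty normal approximation to the chi-square distribution so that the sketch needs only the standard library; an exact chi-square routine would normally be preferred.

```python
import math

def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return the statistic 2(lnL_alt - lnL_null) and an approximate P value."""
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0.0:
        return stat, 1.0
    # Wilson-Hilferty: the cube root of (chi-square / df) is approximately normal.
    z = ((stat / df) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * df))) / math.sqrt(2.0 / (9.0 * df))
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return stat, p_value

# Hypothetical log likelihoods for a simpler (null) and a more complex model.
stat, p_value = likelihood_ratio_test(lnl_null=-12024.901, lnl_alt=-12000.0, df=35)
print(round(stat, 3), round(p_value, 4))
```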

Four "data partitions" were constructed a priori, consisting of (1) the entire 18S gene, (2) the entire rbcL gene, (3) the first and second codon positions of rbcL, and (4) the third positions of rbcL. Differences in the mode of molecular evolution were examined in pairs of these partitions using a likelihood ratio test. For each test, the null hypothesis was that the two partitions evolved together according to the same model with one set of rate parameters. The alternative hypothesis was that each evolved according to a separate model with two different sets of rate parameters. Likelihood ratio tests were performed on each of the standard trees. On a given tree the log likelihood of the null hypothesis can be calculated directly in PAUP*. For the alternative, it is necessary to exclude one partition and calculate the log likelihood of the other partition, then do the reverse, and sum the two log likelihoods to find the overall log likelihood of the alternative model. This is not the same as a "partition homogeneity test" (or ILD, incongruence length difference: Farris et al., 1995), which tests whether the phylogenetic signal is homogeneous across positions. Joint tests of more than two partitions at a time are possible, but high heterogeneity in the pairwise tests immediately indicated it was unnecessary (see Results). The HKY85 + Γ substitution model was assumed in all tests, based on results from tests on the substitution model described above. The degrees of freedom are calculated as follows. For the model associated with one partition, there are two rate parameters, µ and κ, associated with the substitution matrix (Swofford et al., 1996), one shape parameter associated with the gamma-distributed rate variation, plus 2N – 2 = 35 branch length parameters, for a total of 38 parameters. If the genes were allowed to evolve according to separate models, the joint model would have 76 parameters. The null model, that two partitions combined are evolving according to a common model, has 38 parameters again, so the df are 76 – 38 = 38.
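The bookkeeping for one pairwise partition test can be sketched as follows. The log likelihoods are hypothetical placeholders, the function name is ours, and the parameter count per model (38) is simply the figure given in the text.

```python
def partition_test(lnl_combined, lnl_part_a, lnl_part_b, params_per_model=38):
    """Test statistic and df for 'one shared model' vs. 'separate models'."""
    # Alternative: each partition is analysed alone and the log likelihoods are summed.
    lnl_alternative = lnl_part_a + lnl_part_b
    stat = 2.0 * (lnl_alternative - lnl_combined)
    # Two separate models have twice as many parameters as one shared model.
    df = 2 * params_per_model - params_per_model
    return stat, df

# Hypothetical values: two partitions analysed together, then separately.
stat, df = partition_test(lnl_combined=-21000.0, lnl_part_a=-9000.0, lnl_part_b=-11900.0)
print(stat, df)
```

The statistic is then compared with a chi-square distribution with the returned degrees of freedom.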

A likelihood ratio test was used to determine whether rates were constant across lineages (Felsenstein, 1988). The null model was HKY85 + Γ + cl with the alternative being HKY85 + Γ. The number of degrees of freedom in the likelihood ratio test is N – 2 if the tree is fully resolved, where N is the number of taxa (Felsenstein, 1993). The test was performed separately for the four data partitions, on all eight of the standard trees, for a total of 32 tests. Critical values for all likelihood ratio tests were obtained under the assumption that –2 log (LR) is distributed approximately as χ².

Point estimates of angiosperm age
The crown-group age of angiosperms was estimated by ML with PAUP*, assuming substitution models that include a molecular clock. Such analyses yield a tree that we call a "chronogram," in which branch lengths are proportional to time. Absolute ages are then assigned to individual nodes by calibrating some node in the tree. We calculated ages relative to the most recent common ancestor of land plants, to which we assigned an age of 450 mya (Late Ordovician), soon after the first appearance of land plant meiospores in the fossil record (Middle Ordovician). This is the same calibration used by other authors (e.g., Goremykin, Hansmann, and Martin, 1997). Such a fixed calibration should be distinguished from minimum or maximum age constraints on nodes, as used by Sanderson (1997); experiments with such constraints (Doyle, Magallón, and Sanderson, 2000) will be described elsewhere. Absolute ages for the geological time scale are based on Palmer (1983).
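The calibration step amounts to a linear rescaling of node heights on the chronogram. The Python sketch below illustrates this under the fixed 450 mya land plant calibration; the node labels and relative heights are hypothetical and do not come from our analyses.

```python
def calibrate_ages(node_heights, calibration_node, calibration_age=450.0):
    """Convert relative node heights on a clock tree to absolute ages (mya)."""
    scale = calibration_age / node_heights[calibration_node]
    return {node: height * scale for node, height in node_heights.items()}

# Hypothetical relative heights (tips at 0), in expected substitutions per site.
heights = {"land_plants": 0.30, "seed_plants": 0.20, "angiosperms": 0.10}
ages = calibrate_ages(heights, "land_plants")
print({node: round(age, 1) for node, age in ages.items()})
```

Because every node is scaled by the same factor, any error in the calibration age propagates proportionally to all other ages on the tree.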

Sensitivity analysis I: effects of gene, codon partition, model, and tree
To explore the sensitivity of age estimates to various factors, we first obtained such estimates under a wide range of specific conditions: different substitution models, genes, and codon partitions, and the set of eight standard trees. The effect of phylogenetic uncertainty, construed more broadly, is considered in the second set of analyses.

Sensitivity analysis II: effects of phylogenetic uncertainty, substitutional noise, and lineage effects
The factors described above entail finite and small numbers of alternatives, but other variables affecting age estimates entail a very large number of alternatives. Such factors include the phylogeny itself, which in reality must include many more possible alternatives than the eight treated here. Phylogenetic uncertainty has several sources, including substitutional noise (sampling from a finite number of stochastically evolving characters), which is often studied by bootstrapping (Felsenstein, 1985), and long-branch attraction, which is more difficult to detect (Felsenstein, 1978; Sanderson et al., 2000). Even if the phylogeny is essentially certain, substitutional noise introduces errors into age estimates on the tree, because of fluctuations in the numbers of substitutions occurring in a given interval of time. Finally, differences in rate between lineages may cause variation in age estimates.

To estimate the magnitude of error in age estimates due to phylogenetic uncertainty, we examined confidence sets of phylogenies (Sanderson, 1989; Sanderson and Wojciechowski, 1996; Baldwin and Sanderson, 1998) derived from the two genes. For each gene, one tree from each of 100 bootstrap replicates using parsimony (simple taxon addition sequence, MULPARS, TBR branch swapping, holding one tree at each step) was saved to a treefile (some replicates produced more than one most parsimonious tree). Maximum likelihood age estimation was then implemented on all of these trees using the original (unbootstrapped) data, the HKY85 + Γ + cl substitution model, and calibration procedures described above under point estimates. The resulting chronograms were written to a treefile, which was in turn parsed by the program "r8s," which was used to calibrate node ages using the land plant calibration and to summarize the results across all the trees. This program is available from MJS at http://loco.ucdavis.edu/r8s/r8s.html.
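One simple way to summarize a set of replicate age estimates is a percentile interval, sketched below in Python. This is an assumption made for illustration: the sketch does not reproduce the summary computed by r8s, and the ages are placeholder values.

```python
def percentile_interval(values, level=0.95):
    """Approximate central interval from replicate estimates (percentile method)."""
    xs = sorted(values)
    n = len(xs)
    tail = (1.0 - level) / 2.0

    def quantile(p):
        # Linear interpolation between neighbouring order statistics.
        pos = p * (n - 1)
        i = int(pos)
        frac = pos - i
        if i + 1 < n:
            return xs[i] + frac * (xs[i + 1] - xs[i])
        return xs[i]

    return quantile(tail), quantile(1.0 - tail)

# Placeholder ages (mya), one per replicate tree; not results from this study.
ages = [float(a) for a in range(100, 201)]
lower, upper = percentile_interval(ages)
print(round(lower, 1), round(upper, 1))
```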

The procedure just described estimates the effect of character sampling on topology. To estimate the magnitude of error from substitutional noise independent of topology, we fixed the tree and bootstrapped the characters repeatedly, estimating the age of angiosperms for each bootstrap replicate. Bootstrap data matrices were generated using the SEQBOOT program in PHYLIP (Felsenstein, 1993), but instead of being used to generate trees, these matrices were used to estimate the age of the angiosperm node on the gnetifer tree. This was accomplished by placing all 100 randomized matrices in a batch file and translating them to NEXUS format, with each data block followed by PAUP* commands directing PAUP* to perform ML estimation on the gnetifer tree. To test whether the estimates obtained are sensitive to tree topology, we performed the same analysis on one of the trees most different from the gnetifer tree, the most parsimonious rbcL tree with Oryza basal in angiosperms (Fig. 2).
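The character bootstrap itself is a resampling of alignment columns with replacement. The following Python sketch shows the idea on a toy alignment; it is not the SEQBOOT implementation, and the sequences are invented for illustration.

```python
import random

def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement, keeping rows (taxa) intact."""
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in picks) for taxon, seq in alignment.items()}

# Toy alignment with invented sequences; real matrices have about 1400-1800 sites.
toy = {
    "Amborella": "ACGTACGTAC",
    "Nymphaea": "ACGTTCGTAC",
    "Marchantia": "ACCTACGAAC",
}
rng = random.Random(1)  # fixed seed so the replicate is reproducible
replicate = bootstrap_columns(toy, rng)
for taxon, seq in replicate.items():
    print(taxon, seq)
```

Each replicate matrix has the same dimensions as the original, so it can be analysed on the fixed tree in exactly the same way as the original data.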



Fig. 2. Time-calibrated phylogeny ("chronogram") of land plants. Phylogeny is one of 12 most parsimonious trees based on the rbcL data, rooted among monocots (with Oryza sister to all other angiosperms), with Chara removed after calculation of branch lengths to simplify comparisons. Ages were estimated by maximum likelihood from all positions of the rbcL gene. In this and subsequent chronograms, the model used was HKY85 + Γ + cl (see text), the age of the angiosperm crown group is indicated next to the arrow, and the geological time scale is in millions of years ago (mya). Full names of geological periods: Ordovician, Silurian, Devonian, Mississippian, Pennsylvanian, Permian, Triassic, Jurassic, Cretaceous, Tertiary.

 
To examine the sensitivity of age estimates to deviations from a molecular clock (lineage effects), we studied trees derived by pruning taxa from the gnetifer tree (Fig. 1). From this tree, a set of 100 13-taxon trees was constructed by keeping the same seven taxa in all replicates and randomly sampling six other taxa from the original 37-taxon data set. The six randomly sampled taxa were chosen such that the total number of angiosperms was always five, and the total number of nonangiosperm seed plants was four. The seven "fixed" taxa provide a constant "backbone" in every tree. They were Chara, Marchantia, Lycopodium, and Osmunda, chosen to circumscribe the basal land plant node and calibration point, Pinus to represent nonangiosperm seed plants, and Amborella and Nymphaea, chosen to ensure that the angiosperm root node was found in every one of these pruned trees. This procedure was implemented by using "r8s" to generate a file of PAUP* commands prescribing the random taxon samples from the original tree, combined with the necessary commands to perform ML estimation of parameters and times on the original unbootstrapped data.
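
The taxon sampling scheme just described can be sketched in a few lines. This is an illustrative reconstruction, not the "r8s" code: the seven fixed taxa are those named in the text, while the two candidate pools are supplied by the caller (their full membership is not listed in this section), and the function names are hypothetical.

```python
import random

# The seven "backbone" taxa kept in every replicate, as named in the text.
FIXED_TAXA = ("Chara", "Marchantia", "Lycopodium", "Osmunda",
              "Pinus", "Amborella", "Nymphaea")


def sample_taxa(angiosperm_pool, other_seed_plant_pool, rng, n_each=3):
    """Return one 13-taxon sample: 7 fixed taxa plus 3 + 3 random ones.

    Three angiosperms and three other seed plants are drawn without
    replacement, so each sample holds five angiosperms (with Amborella and
    Nymphaea) and four nonangiosperm seed plants (with Pinus).
    """
    pools = set(angiosperm_pool) | set(other_seed_plant_pool)
    if pools & set(FIXED_TAXA):
        raise ValueError("pools must not contain the fixed taxa")
    picked = (rng.sample(sorted(angiosperm_pool), n_each)
              + rng.sample(sorted(other_seed_plant_pool), n_each))
    return list(FIXED_TAXA) + picked


def sample_replicates(angiosperm_pool, other_seed_plant_pool,
                      n_replicates=100, seed=0):
    """Generate the taxon lists for all replicates; pruning is done elsewhere."""
    rng = random.Random(seed)
    return [sample_taxa(angiosperm_pool, other_seed_plant_pool, rng)
            for _ in range(n_replicates)]
```

Each returned list would then define which tips to retain when pruning the gnetifer tree before ML estimation.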



Fig. 1. Taxon randomization scheme used to assess impact of variation in rates of substitution among lineages. Thirteen taxa are sampled from the data set of 37 taxa, and the tree is constructed by pruning all unsampled taxa from the gnetifer tree (see text). The seven solid lineages are kept in every sample. The six dashed lineages represent random samples of three angiosperms and three other seed plants from the 30 remaining taxa in the data set. Positions of dashed lineages exemplify one possible sample. The inclusion of Amborella and Nymphaea in every sample ensures that the root node of angiosperms on this tree will always be sampled.

 

    RESULTS AND DISCUSSION
Preliminary hypothesis testing
The behavior of different substitution models on the gnetifer tree is illustrated in Table 1. Every model that is more complex than another model is significantly better at the P < 0.05 level, based on likelihood ratio tests (individual tests not shown). Models that allow site-to-site rate variation were always significantly better than models that assume a single rate across sites. Based on these results, the HKY85 + Γ model was selected as a reasonable compromise among competing issues of bias, error variance, and running time (Zharkikh, 1994; Burnham and Anderson, 1998).


Table 1. Effect of substitution model on likelihoods and estimates of angiosperm age, using the gnetifer tree (described in the text). Top three rows are age estimates in millions of years ago (mya) for six substitution models, with and without gamma-distributed rate variation (Γ+, Γ–). 12 and 3 refer to codon partitions in rbcL. Bottom three rows are the negative log-likelihood scores.

 
Likelihood ratio tests for gene effect differences between rbcL and 18S under the HKY85 + Γ model were extremely significant across all eight of the standard trees (Table 2), whether or not a molecular clock was assumed. Codon positions within rbcL were even more heterogeneous. Clearly, the tempo and mode of evolution differ among these data partitions, and for this reason we performed separate age estimations on the different partitions.
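
Separating a protein-coding alignment into the codon partitions used here (first plus second positions vs. third positions) is straightforward. The sketch below is illustrative only; it assumes the sequence is in frame, starting at codon position 1, and the function name is hypothetical.

```python
def split_codon_positions(sequence):
    """Split an in-frame coding sequence into (positions 1+2, position 3).

    Assumes index 0 is the first position of a codon. Sites are kept in
    their original order within each partition.
    """
    first_second = "".join(base for i, base in enumerate(sequence) if i % 3 != 2)
    third = sequence[2::3]
    return first_second, third


def partition_alignment(alignment):
    """Apply the split to every taxon in a {name: sequence} alignment."""
    parts_12, parts_3 = {}, {}
    for name, seq in alignment.items():
        parts_12[name], parts_3[name] = split_codon_positions(seq)
    return parts_12, parts_3
```

Each partition can then be analyzed as its own data set with its own model parameters, which is what makes a homogeneity test between partitions possible.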


Table 2. Likelihood ratio (LR) tests of homogeneity of data partitions with respect to substitution model HKY85 + Γ for eight standard trees (described in the text). The first two data columns test the homogeneity of the two genes with respect to each other. The second two columns test the homogeneity of the two codon position partitions in rbcL (12, 3) with respect to each other. Presence or absence of a clock in the model is indicated by cl+ or cl–, respectively. The degrees of freedom are 38 for all tests (see text for explanation). Cutoff value for the P = 0.05 value of the LR test statistic, Λ = –2 log LR, is 53.4, based on the χ² distribution. Numbers larger than this indicate significant heterogeneity.

 
Likelihood ratio tests for lineage effects were also highly significant across the set of standard trees in all four data partitions (Table 3), indicating strong departures from a molecular clock. Clearly, lineage effects are pervasive, as inferred by Gaut et al. (1992) in their study of rbcL in monocots. The value, Λ, of the likelihood ratio statistic indicates the amount of departure from rate constancy, but Λ values can only be compared within partitions. Generally, the trees that are most clocklike of the eight correspond to most parsimonious trees. For rbcL, the most clocklike tree for either codon partition is one of the most parsimonious trees derived from the rbcL data, namely the tree (almost surely incorrect) with Oryza basal in angiosperms. For 18S, the most clocklike tree is the most parsimonious tree derived from the 18S data. Reasons for this effect are suggested in the discussion of individual trees.


Table 3. Negative log likelihoods and results of likelihood ratio tests for lineage effects (deviations from a molecular clock). Substitution model is HKY85 + Γ, with or without a clock. Presence or absence of a clock is indicated by cl+ or cl–, respectively. Codon partitions in rbcL are indicated by 12, 3, and All. Values indicated in columns headed by Λ are –2 log LR, which below is always –2 ([cl–] – [cl+]), and values for cl+ and cl– are found in the two columns immediately to the left of Λ. For a LR test of lineage effects on these trees, the degrees of freedom are 35. Any value greater than Λ = 49.8 is significantly different from a clock at the P < 0.05 level, according to a χ² distribution. All values in the table deviate significantly from a clock. Boldface values indicate either the maximum likelihood value across trees in a column (cl+, cl–) or the tree in which the LR statistic is closest to clock-like (Λ).
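
The likelihood ratio statistic and its cutoff can be checked with a short calculation. The sketch below is not the authors' procedure: it computes the statistic from two negative log likelihoods and approximates the χ² critical value with the Wilson-Hilferty formula, chosen here only to stay within the Python standard library. Function names are illustrative.

```python
from statistics import NormalDist


def lr_statistic(neg_lnl_constrained, neg_lnl_unconstrained):
    """-2 log LR from two negative log likelihoods.

    The constrained model (e.g., with a clock) has the larger negative log
    likelihood, so the statistic is non-negative for nested models.
    """
    return 2.0 * (neg_lnl_constrained - neg_lnl_unconstrained)


def chi2_critical(df, alpha=0.05):
    """Approximate upper-tail chi-square critical value (Wilson-Hilferty)."""
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    z = NormalDist().inv_cdf(1.0 - alpha)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * a ** 0.5) ** 3


def rejects_null(neg_lnl_constrained, neg_lnl_unconstrained, df, alpha=0.05):
    """True if the simpler (constrained) model is rejected at level alpha."""
    stat = lr_statistic(neg_lnl_constrained, neg_lnl_unconstrained)
    return stat > chi2_critical(df, alpha)
```

At alpha = 0.05 the approximation gives about 53.4 for 38 degrees of freedom and about 49.8 for 35, matching the cutoffs quoted in the captions of Tables 2 and 3.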

 
Point estimates of angiosperm ages
Given the strong departures from a molecular clock, it is reasonable to expect significant variation in age estimates using different methods or data partitions. Ages of angiosperms estimated from rbcL and 18S on the eight standard trees with the HKY85 model, with and without gamma, and for first and second vs. third positions in rbcL, are presented in Table 4. Ages of other nodes of interest (especially seed plants, Gnetales, and eudicots) are given in the text or can be obtained from the chronograms.


Table 4. Maximum likelihood estimates for age of the angiosperm crown-group node (mya) for eight standard trees (described in the text). Codon partitions in rbcL are indicated by 12, 3, and All. Trees labelled "+ONP" and "–ONP" are a gnetifer tree that includes Oryza, Nicotiana, and Pisum as the only angiosperms, and a gnetifer tree that excludes these three taxa but includes all other angiosperms, respectively

 
Sensitivity analysis I
Dates in Table 4 show significant effects of inclusion vs. noninclusion of site-to-site rate variation in the model of molecular evolution. Dates calculated without gamma, as in earlier studies, assume an equal probability of change at all sites, but likelihood ratio tests (Table 1) show that this is a poor assumption. Most ages for angiosperms based on rbcL with gamma are younger, by ~10–30 my, than those estimated without gamma. The same effect is also seen in ages for eudicots, but its magnitude is less for older groups, such as seed plants. Because use of gamma is theoretically preferable, this suggests that previous studies systematically overestimated the age of angiosperms.

To gain insight into these results, we examine estimates from the standard trees in more detail. First we present results for rbcL, then for 18S. Although there are significant effects due to codon position in rbcL, for purposes of discussing lineage effects, topology, and their interaction, we first discuss ages based on all codon positions.

Two of the 12 most parsimonious trees derived from the rbcL data set are shown as chronograms in Figs. 2 and 3. In both trees, the rooting of seed plants agrees with that found in other analyses of rbcL (Albert et al., 1994), although not with analyses of morphology and other genes, in that Gnetales are the sister group of other seed plants. However, they differ radically in the rooting of the angiosperms, and this shows the potentially major effect of erroneous tree topologies on age estimates.

In Fig. 2 ("rbcL.MP.Oryza" in Tables 2–4), angiosperms are rooted among monocots, with Oryza (representing grasses) the sister group of all other angiosperms. This tree implies that the age of angiosperms is 224 mya without gamma, 214 mya with gamma (both Late Triassic). This rooting conflicts sharply with trees based on larger rbcL data sets, to say nothing of other molecular analyses and conventional views of angiosperm evolution, which nest grasses within monocots and monocots within angiosperms (e.g., Chase et al., 1993; Soltis, Soltis, and Chase, 1999). The magnoliid groups, usually thought to form a basal paraphyletic grade, instead form a clade nested well within the angiosperms.

In Fig. 3 ("rbcL.MP.Ambo" in Tables 2–4), angiosperms are rooted among magnoliids, with Amborella branching first, followed by Nymphaea and then Austrobaileya. This rooting agrees with the multigene analyses of Mathews and Donoghue (1999) , Parkinson, Adams, and Palmer (1999) , Qiu et al. (1999) , and Soltis, Soltis, and Chase (1999) . In this case, the estimated age of angiosperms is much younger: 143 mya without gamma (earliest Cretaceous) and 124 mya with gamma, actually younger than the oldest undisputed fossil angiosperms (Valanginian-Hauterivian, ~130 mya: Trevisan, 1988 ; Hughes, 1994 ; Brenner, 1996 ). Considering the very short branch between Amborella and Nymphaea, trees in which these two lines form a clade (Barkman et al., 2000 ; Graham and Olmstead, 2000 ; Qiu et al., 2000) would presumably give similar dates.



Fig. 3. Chronogram of one of 12 most parsimonious trees based on the rbcL data, rooted among magnoliids (with Amborella sister to all other angiosperms). Ages estimated by maximum likelihood from all positions of the rbcL gene

 
Although Fig. 3 agrees with other data on rooting of the angiosperms, both it and Fig. 2 conflict with other analyses on outgroup relationships. In both trees, Lycopodium is linked with Equisetum and ferns, whereas a chloroplast DNA inversion and morphological analyses (including fossil taxa) indicate that lycopsids (and related Devonian zosterophylls) are the sister group of all other extant vascular plants (Raubeson and Jansen, 1992a ; Kenrick and Crane, 1997 ). In Fig. 2, conifers are paraphyletic, with Ginkgo and cycads nested within them. In Fig. 3, conifers, Ginkgo, and cycads do not form a clade, but rather a paraphyletic grade, with angiosperms nested within it.

To evaluate the impact of these topological variations (some of which must be incorrect), we will use the tree in Fig. 4 ("rbcL.mincon" in Tables 2–4), one of 12 trees found by analyzing the rbcL data set with two constraints designed to bring outgroup relationships more in line with other data, forcing Lycopodium to the base of vascular plants and conifers into a clade (although some analyses have nested Gnetales in conifers, they have not done so for Ginkgo, cycads, or angiosperms). These trees are only three steps longer than the shortest trees (2707 rather than 2704). In Fig. 4 Amborella is basal in angiosperms (though Oryza is basal in other trees); other relationships are generally consistent with analyses of more taxa. Since this tree is almost as parsimonious as the shortest trees, consistent with other rbcL analyses of seed-plant phylogeny, and consistent with other data on the rooting of angiosperms, we will use it as a basis for discussion of the effect of various factors on age estimates derived from this gene. Henceforth all ages cited are based on gamma (see Table 4 for ages without gamma).



Fig. 4. Chronogram of tree based on parsimony analysis of the rbcL data with two phylogenetic constraints on relationships outside of angiosperms: forcing Lycopodium to be the sister group of remaining vascular plants, and forcing conifers to be monophyletic (see text for further discussion). Ages estimated by maximum likelihood from all positions of the rbcL gene

 
In Fig. 4, the inferred age of angiosperms is 139 mya (Berriasian, Early Cretaceous), close to the first fossil records of the group. The age of eudicots is 104 mya (Albian), slightly younger than the oldest known tricolpate pollen (late Barremian, 120 mya: Doyle, 1992 ; Hughes, 1994 ). However, some other ages conflict more strongly with the fossil record. Gnetales are dated as 218 mya (Late Triassic), whereas the fossil record suggests that crown-group Gnetales radiated in the Early Cretaceous; apparently related Triassic and Jurassic fossils seem to represent the stem lineage rather than the crown group (Crane, 1996 ; Doyle, 1996 ). More significantly, other ages are much too young. The conifer-Ginkgo-cycad clade is dated as 152 mya (Late Jurassic), whereas fossil relatives of cycads are known from the Early Permian (280 mya) and relatives of conifers from the Middle Pennsylvanian (310 mya) (Taylor and Taylor, 1993 )—over twice the molecular date. This is a more direct conflict with fossil evidence: older molecular dates can be explained by incompleteness of the fossil record, but not younger dates. If, alternatively, a date of 320 mya for the conifer-Ginkgo-cycad node is used to calibrate the tree, angiosperms are dated as 293 mya (Late Pennsylvanian), seed plants as 651 mya (Precambrian) rather than 309 mya, and land plants as 947 mya, all dates implying that the fossil record is grossly misleading.
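
The recalibration in the last sentence of the preceding paragraph amounts to a proportional rescaling of node ages: on a clock-constrained chronogram, changing the age assigned to the calibration node multiplies every other age by the same factor. A minimal sketch, assuming strict proportionality and using the rounded ages quoted in the text (the function name is illustrative):

```python
def rescale_age(age, old_calibration_age, new_calibration_age):
    """Rescale a node age when the calibration node is assigned a new age.

    All ages on a clock-like chronogram are proportional to the calibration,
    so each age is multiplied by new_calibration_age / old_calibration_age.
    """
    if old_calibration_age <= 0:
        raise ValueError("old calibration age must be positive")
    return age * new_calibration_age / old_calibration_age


# Conifer-Ginkgo-cycad node: 152 mya from the molecular estimate,
# 320 mya when fixed as an alternative calibration.
angiosperms = rescale_age(139, 152, 320)
seed_plants = rescale_age(309, 152, 320)
```

With these inputs the angiosperm age rounds to 293 mya and the seed-plant age to 651 mya, in line with the values given in the text.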

Other anomalously young ages are seen within angiosperms. The Nelumbo-Platanus clade (Proteales; APG, 1998 ) is dated as 48 mya (Eocene), but both lines are known from the Albian, 100–110 mya (platanoid leaves and inflorescences, Nelumbites leaves and flowers: Friis, Crane, and Pedersen, 1988 ; Crane et al., 1993 ; Upchurch, Crane, and Drinnan, 1994 ). The Fagus-Carya clade (Fagales) is dated as 39 mya, but the line leading to Carya, represented by Normapolles pollen and associated flowers (Friis, 1983 ; Sims et al., 1999 ), extends back to the Cenomanian (95 mya). However, not all dates within angiosperms are too young—palms and grasses (commelinoids) diverge at 89 mya, and the oldest palm fossils are ~85 mya (Herendeen and Crane, 1995 ). The Calycanthus-Lauraceae clade (Laurales) is dated as 89 mya; fossils related to both groups extend back to the Albian, 100–110 mya (Drinnan et al., 1990 ; Friis et al., 1994 ).

These results are clearly related to inequality of rates—the fact that the data are not clocklike, as already indicated by likelihood ratio tests (Table 3). This is illustrated by Fig. 5, the tree in Fig. 4 plotted as a phylogram, so that branch lengths are proportional to the amount of molecular evolution. Within the angiosperms, some branches are long, notably Oryza, Pisum, and Solanaceae (represented in this data set by Nicotiana), all herbaceous groups. As noted above, this effect was recognized with rbcL by Bousquet et al. (1992) , Gaut et al. (1992) , and Eyre-Walker and Gaut (1997) , who suggested that the rate variation was related to habit and/or generation time. In the absence of a model of rate evolution (such as Thorne, Kishino, and Painter, 1998 ), it cannot be said whether evolution sped up in grasses (for example) or slowed down four times, in Saururus and the three monocot lines attached below them, but a parsimony argument would favor the former scenario. On the other hand, branches such as Platanus, Nelumbo, Fagus, and Carya are relatively short, which may explain the anomalously young ages obtained for Proteales and Fagales (because the likelihood method tends to equalize absolute substitution rates by "pulling" short branches toward the present). If these short branches are the result of slowing of molecular evolution, Platanus and Nelumbo may be "living fossils" in molecular as well as morphological terms, as suggested for Winteraceae by Suh et al. (1993) .



Fig. 5. Phylogeny of land plants in Fig. 4 presented as a phylogram, with branch lengths estimated by maximum likelihood. Lengths were estimated from all positions of the rbcL gene using the model HKY85 + Γ. Differences in relative rates of substitution can be inferred by sister-group comparisons in this figure.

 
Rate inequalities may also explain the tree rooted on Oryza (Fig. 2) and the older age of angiosperms inferred from that tree. This appears to be a case in which the root is "attracted" to a long branch (S. Graham, University of Alberta, personal communication) because some of the changes on this branch are reversals to outgroup states. Because the Oryza branch is so long, it pulls the basal node of the angiosperms back to the Triassic. This may also explain why the Oryza tree is more clocklike (Table 3)—when Oryza is basal, its branch length implies a substitution rate that is more like others in the tree. Similar effects may explain the young age of the conifer-Ginkgo-cycad clade (cycads and Ginkgo are unusually short branches) and the rooting of seed plants on Gnetales (a long branch; cf. Chaw et al., 2000) . This example may illustrate a general principle—that topological errors due to long-branch effects can lead to errors in age estimates as well.

These observations suggest that previous estimates of the age of angiosperms may have been biased by preferential sampling of herbaceous angiosperm lineages with accelerated rates of molecular evolution, such as Oryza, Pisum, and Nicotiana. To evaluate this effect, we calculated ages on the tree in Fig. 4 after removing all angiosperms except these three genera. Using just these taxa nearly doubles the inferred age of angiosperms, from 139 to 253 mya (Late Permian). Conversely, removing these three taxa lowers the age of angiosperms to 122 mya (Barremian).

Since branch lengths in Fig. 5 are especially variable in monocots and eudicots, it might be suggested that better age estimates could be obtained by considering only more basal lines, on the assumption that these may provide better evidence on original evolutionary rates. Following this reasoning, we removed the clade consisting of Saururus, monocots, and eudicots from the tree in Fig. 4. The resulting age is 98 mya (late Albian), more than 30 my younger than the first fossil records of the angiosperm crown group. Removing all angiosperms except Amborella, Nymphaea, and Austrobaileya, representing the first three branches in this analysis and others (Mathews and Donoghue, 1999 ; Parkinson, Adams, and Palmer, 1999 ; Qiu et al., 1999 ; Soltis, Soltis, and Chase, 1999 ), gives an even younger age, 85 mya (Santonian). This implies that rates in these basal lines were actually slower than the average rate in the outgroups, as well as in other angiosperms, as noted for Winteraceae by Suh et al. (1993) . This could be due to (1) deceleration on the angiosperm stem lineage, (2) parallel deceleration in the basal lines from higher rates during their initial radiation, and/or (3) acceleration in other lines. Establishing which of these scenarios is correct will be crucial for more accurate estimates of the age of angiosperms.

Other experiments were designed to assess the effect of uncertainties in seed-plant relationships, prompted by the fact that the arrangement based on rbcL conflicts with other analyses. Since the true tree is unknown, we used three trees with relevant taxa forced into arrangements found in other recent analyses, generated by analyzing the combined rbcL and 18S data sets with topological constraints ("anthophyte," "gnetifer," and "gnepine" in Tables 2–4).

The anthophyte tree (Fig. 6) is consistent with the morphological hypothesis that Gnetales are the closest living relatives of angiosperms (Crane, 1985 ; Doyle and Donoghue, 1986 ; Loconte and Stevenson, 1990 ; Rothwell and Serbet, 1994 ; Doyle, 1996 ). This is one of two trees found after forcing Lycopodium to the base of vascular plants, Gnetales and angiosperms into a clade, and Amborella to the base of the angiosperms (otherwise Solanaceae are basal). In Fig. 6, the base of the seed plants is a trichotomy, because the length of the branch subtending the clade of Gnetales plus angiosperms is zero for rbcL. This same trichotomy was observed in constrained anthophyte trees for the plastid genes psaA and psbB (Sanderson et al., 2000) . Thus there is no support for the anthophyte hypothesis in these genes. This change in topology has surprisingly little effect on the angiosperm age—it actually increases slightly from that based on the constrained rbcL analysis (Fig. 4), from 139 to 143 mya, near the beginning of the Early Cretaceous. It has more effect on the age of Gnetales, which decreases from 218 to 198 mya—as might be expected, since Gnetales are nested within seed plants, rather than basal.



Fig. 6. Chronogram of "anthophyte" tree based on parsimony analysis of the combined rbcL and 18S data, with Gnetales and angiosperms constrained to form a clade (see text for further discussion). Ages estimated by maximum likelihood from all positions of the rbcL gene

 
The gnetifer tree (Fig. 7) exemplifies the hypothesis that extant angiosperms and gymnosperms are sister groups and Gnetales are the sister group of conifers, as inferred from 18S (Chaw et al., 1997, 2000 ; Bowe, Coat, and dePamphilis, 2000 ). It is one of two trees obtained by forcing Lycopodium to the base of vascular plants, gymnosperms into a clade, and Gnetales together with conifers. As noted above, such trees do not necessarily mean that angiosperms extend back to the Carboniferous—this is true for the angiosperm stem lineage, but not for the crown group. In fact, the change in seed-plant relationships has only a negligible effect on the estimated age of angiosperms—141 rather than 139 mya. However, the age of the conifer-Ginkgo-cycad clade (now including Gnetales) increases dramatically, from 152 to 242 mya (Late Permian). This is still much younger than the age of 320 mya inferred from the fossil record, but it is closer to this date than ages based on any other trees investigated. Conversely, the age of Gnetales decreases to 170 mya (Middle Jurassic).



Fig. 7. Chronogram of "gnetifer" tree based on parsimony analysis of the combined rbcL and 18S data, with gymnosperms constrained to form a clade and Gnetales forced together with conifers (see text for further discussion). Ages estimated by maximum likelihood from all positions of the rbcL gene

 
The gnepine tree (Fig. 8) is modeled on multigene analyses in which Gnetales are not only related to conifers, but actually nested within them, linked with Pinaceae (Qiu et al., 1999 ; Bowe, Coat, and dePamphilis, 2000 ; Chaw et al., 2000 ). It is one of two trees obtained by forcing Lycopodium to the base of vascular plants, gymnosperms into a clade, and Gnetales together with Pinus. Again, this has little effect on the age of angiosperms—143 mya. However, the age of Gnetales decreases further, to 159 mya (Late Jurassic), still closer to the apparent radiation of crown-group Gnetales in the Early Cretaceous (Crane, 1996 ).



Fig. 8. Chronogram of "gnepine" tree based on parsimony analysis of the combined rbcL and 18S data, with gymnosperms constrained to form a clade and Gnetales nested within conifers as the sister group of Pinaceae (see text for further discussion). Ages estimated by maximum likelihood from all positions of the rbcL gene

 
The general conclusion of these experiments is that angiosperm age estimates are surprisingly (and encouragingly) insensitive to different views on seed-plant relationships, although ages of other groups (notably Gnetales and other seed plants) are strongly affected. However, the age of angiosperms is sensitive to the choice of angiosperm exemplars (e.g., using grasses and other herbaceous taxa gives an older age) and to rooting of the angiosperms (e.g., the age is older when grasses are basal).

The dates in Table 4 based on different codon positions in rbcL give insight into earlier studies that analyzed protein sequences or nonsynonymous substitutions (Martin, Gierl, and Saedler, 1989 ; Martin et al., 1993 ; Laroche, Li, and Bousquet, 1995 ), which can be approximated by analyzing first and second codon positions when the gene is highly conserved at the amino acid level. Martin et al. (1993) justified their approach by arguing that rbcL is "saturated" with synonymous substitutions at the level of seed plants; their age for angiosperms (300 mya, mid-Pennsylvanian) was much older than our estimates based on all positions. We investigated this factor on the gnetifer tree (Fig. 7). When dates are calculated based on first and second positions, the age of the angiosperms increases dramatically, from 141 to 211 mya (Late Triassic). When Oryza, Pisum, and Nicotiana are used as the only angiosperms (Fig. 9), the age increases still more, to 281 mya (Early Permian). In contrast to the pattern noted above, use of gamma increases these ages rather than decreasing them, but only slightly (e.g., from 273 to 281 mya in the last case, still Early Permian). These observations help explain the 300 mya date found by Martin et al. (1993) , since their analysis was based largely on herbaceous taxa. On the other hand, when only third positions are analyzed, the age of the angiosperms decreases to 88 mya (early Late Cretaceous), much younger than the oldest records of the group. In this case, use of gamma decreases the inferred age (from 121 mya without gamma). Overall, age estimates based on third positions are more sensitive to model choice than estimates based on first and second positions (Tables 1 and 4). This is expected if saturation is a problem, because "corrections" for saturation are model dependent and most likely to give variable results at high levels of sequence divergence.



Fig. 9. Chronogram of the "gnetifer" tree (Fig. 7) with all angiosperms except Oryza, Pisum, and Solanaceae (Nicotiana) removed, and ages estimated by maximum likelihood from first and second codon positions of rbcL only.

 
Although this exercise does rule out ages based on third codon positions, it does not say which of the other dates is most nearly correct. The argument that rbcL is saturated with synonymous substitutions does not necessarily mean that use of nonsynonymous substitutions is more accurate, since the likelihood ratio tests (