2 Department of Botany, Box 355325, University of Washington, Seattle, Washington, 98195-5325 USA
Received for publication April 22, 1999. Accepted for publication February 29, 2000.
| ABSTRACT |
|---|
Key Words: Amborella; atpB; basal angiosperm phylogeny; chloroplast introns; long-branch attraction; NADH dehydrogenase genes; Photosystem II genes; primer design; rbcL; ribosomal protein genes
| INTRODUCTION |
|---|
The fossil record of the flowering plants extends back at least 130 million years (see Crane, 1993; Doyle and Donoghue, 1993; Crane, Friis, and Pedersen, 1995). Several lines of molecular evidence suggest an even earlier origin of the crown angiosperm group (reviewed in Sytsma and Baum, 1996; Li, 1997), and the available paleobotanical data do not rule out an extended period of unrecorded angiosperm evolution (Crane, 1993). Regardless of this uncertainty, very long branches subtend the earliest diverging or "most basal" lineages of the angiosperms in phylogenetic analysis. In contrast, a preponderance of relatively short internodes around the base of the angiosperms (e.g., Chase et al., 1993) suggests a relatively rapid diversification of many of the extant basal lineages. The combination of deep lineages with short internal branches is known to have the potential for yielding strongly misleading results, the phenomenon of statistical inconsistency, or "long-branch attraction" (Felsenstein, 1978; Hendy and Penny, 1989).
Long branches may be difficult or impossible to divide by additional taxon sampling for at least some basal lineages, such as Amborellaceae and Ceratophyllaceae, that are represented by only one or a few extant taxa. Swofford and Poe (1999) caution that adding taxa to break up long branches will not always improve the consistency of phylogenetic estimation (i.e., the ability to converge on the correct answer with increasing amounts of data). Compounding the problem of long branches in basal angiosperm lineages, the extant seed-plant groups (the angiosperms, conifers, cycads, Ginkgo, and Gnetales) represent phylogenetically disjunct remnants of a more ancient radiation. Correspondingly long branches separate the major living seed-plant groups. One way to circumvent long-branch attraction, in the absence of living lineages that could span this divide, may be to employ conservatively evolving characters (Felsenstein, 1983).
Sampling error on short internal branches is an additional, poorly recognized source of ambiguous and misleading phylogenetic inference (Rodrigo et al., 1993; Page, 1996; Graham et al., 1998). The accuracy of phylogenetic inference may be more easily improved by adding characters to detect changes on short internal branches than by adding taxa (Swofford and Poe, 1999). However, if slowly evolving regions are to be used to address basal angiosperm relationships, more characters in total must be sampled to assure sufficient resolution of deep, but short, internal branches (see also Donoghue and Sanderson, 1992). Our approach to the twin problems of long-branch attraction and sampling error is to obtain relatively massive amounts of slowly evolving characters per taxon. This approach complements recent studies of basal angiosperm relationships that examined a greater number of taxa for fewer genes (Mathews and Donoghue, 1999; Qiu et al., 1999; Soltis, Soltis, and Chase, 1999; Parkinson, Adams, and Palmer, 1999). Fortunately, automated DNA sequencing technology now permits the rapid collection of a large number of characters. Primers are available for amplifying and sequencing three chloroplast genes that have been used successfully in studies of a broad array of angiosperm groups: rbcL (Zurawski, Clegg, and Brown, 1984); ndhF (Olmstead and Sweere, 1994; Kim and Jansen, 1995); and atpB (Hoot, Culham, and Crane, 1995). We describe here primers for 14 additional genes that are useful for amplifying and sequencing across the seed plants. These genes include three highly conserved introns and span five additional chloroplast regions.
The five additional regions examined include complete or partial coding sequences for genes from the Large Single Copy (LSC) and Inverted Repeat (IR) regions of the chloroplast genome (see Table 2). Of the 14 additional genes considered, ten are Photosystem II (psb) genes located in three distinct LSC regions. The remaining four are located in two IR regions in Nicotiana, Oryza, and Zea. They comprise three ribosomal protein genes and a gene for another NADH dehydrogenase (ndh) subunit. These genes were chosen to be at least as slowly evolving as those currently used to address basal angiosperm relationships. Genes in the IR have a six- to tenfold lower synonymous substitution rate than those in the single-copy regions (Wolfe, Li, and Sharp, 1987; Goremykin et al., 1996), and Photosystem II genes have some of the lowest synonymous substitution rates of single-copy chloroplast genes (Olmstead and Palmer, 1994). They have a correspondingly low level of multiple change per site. An important implication of using sequences with low synonymous substitution rates is that they will have low site-to-site heterogeneity in substitution rates, which enables a better fit to models of phylogeny reconstruction. We examined previously published GenBank sequences for these regions (results not shown) and found that they also have a very low observed frequency of multiple change, with the IR sequences in particular showing a low amount of repeated change at each variable site. We obtained new data from these regions for a broad range of exemplar taxa, using novel primers described here. We demonstrate the utility of these genes in sorting out basal angiosperm relationships, and also show that caution should be exercised over the finding (Parkinson, Adams, and Palmer, 1999; Qiu et al., 1999; Soltis, Soltis, and Chase, 1999) that the root node of the angiosperms has been definitively resolved (see also Graham et al., in press).
| MATERIALS AND METHODS |
|---|
100–200 bp in length. The genes within each region are coordinately transcribed, although psbB, psbT, and psbH are part of a larger transcription unit, as are psbD and psbC (see Gruissem and Tonkyn, 1993).
450–530 bases apart on each strand, a distance chosen because it allows a reasonable overlap using an ABI Prism 377 automated sequencer (PE Biosystems, Foster City, California, USA). Staggered spacing of primers on the forward and reverse strands minimizes the danger of sequence signal expiring at the same point on different strands. The IGS regions were rejected as sites for primer design, but several primers were placed inside the three introns or at intron/exon boundaries, because these sequences were found to be slowly evolving. These primers (Figs. 1–5; Appendix) should fail in taxa lacking introns, but that did not apply to any taxa examined in this study.
Choosing primer sites with as little variation as possible across the aligned taxa was made easier by the generally low substitution rates in the chloroplast genome, particularly for the IR sequences. Sequences of Marchantia polymorpha, Pinus thunbergii, and an assortment of angiosperm taxa available in GenBank were considered for primer design in the different regions. Alignments for each region were obtained using Clustal W (Thompson, Higgins, and Gibson, 1994). For primers in protein-coding sequences, the 3'-most base (the one leading into the sequence) was generally chosen to be a second-codon position, but occasionally a first- or third-codon position was used, the latter only if it belonged to a conserved codon for a twofold or nondegenerate amino acid. Variable sites within the primer regions were accounted for using partial nucleotide degeneracy in the primer sequence. Primers were assessed for duplex formation in Amplify 1.2 (Engels, 1993) and were discarded if they had a Tm lower than 60°C or obvious hairpin regions. All primers were at least 20 bases long to maximize the Tm and, hence, the specificity of binding. Whenever possible, each primer was positioned so that it ended in a one- or two-base CG-clamp (G and C are the strong-pairing bases). Replacement or alternate primers were necessary in a few cases (primers with B and C prefixes in Figs. 1–5) where amplifications and cycle-sequencing reactions failed to work for all species.
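The length, Tm, and CG-clamp criteria above can be sketched as a simple screen. This is a minimal illustration, not the procedure implemented in Amplify 1.2: the function names are hypothetical, the Tm formula is a common GC-content approximation swapped in for illustration, degenerate positions are handled by an assumed average GC contribution per IUPAC code, and duplex and hairpin checks are not attempted.

```python
# Hypothetical helper names; the Tm formula is a generic GC-content
# approximation, NOT the calculation used by Amplify 1.2.

# Average G/C contribution of each IUPAC nucleotide code (assumption:
# each base allowed at a degenerate position is equally likely).
GC_FRACTION = {
    "A": 0.0, "T": 0.0, "G": 1.0, "C": 1.0,
    "S": 1.0, "W": 0.0,                      # S = G/C, W = A/T
    "R": 0.5, "Y": 0.5, "K": 0.5, "M": 0.5,  # two-base codes with one G or C
    "B": 2 / 3, "D": 1 / 3, "H": 1 / 3, "V": 2 / 3,
    "N": 0.5,
}


def approx_tm(primer: str) -> float:
    """Approximate melting temperature (deg C) from GC content and length."""
    seq = primer.upper()
    gc = sum(GC_FRACTION[base] for base in seq)
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def has_gc_clamp(primer: str) -> bool:
    """True if the 3'-most base is an unambiguous G or C."""
    return primer.upper()[-1] in "GC"


def passes_screen(primer: str, min_len: int = 20, min_tm: float = 60.0) -> bool:
    """Apply the length and Tm cut-offs described in the text."""
    return len(primer) >= min_len and approx_tm(primer) >= min_tm
```

The clamp check is kept separate from `passes_screen` because the text treats the CG-clamp as a preference ("whenever possible") rather than a rejection criterion.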
Amplification and sequencing protocols
The following thermocycler profile was used for Polymerase Chain Reaction (PCR) amplifications: (1) initial denaturing at 94°C for 5 min; (2) 30 cycles of the following: denaturation at 94°C for 1 min, annealing at 45°C for 1 min, extension at 72°C for 2 min; (3) final extension at 72°C for 15 min. The reactions were performed in 50-µL volumes, using 25 pm of each primer. QIAquick PCR purification columns (QIAgen Inc., Valencia, California, USA) were used to purify PCR products following manufacturer's instructions, except that 30 µL of water preheated to 60°C was used for elution.
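As a quick check on the profile above, the programmed hold times can be written down as data and summed. This is only a sketch: the class and variable names are illustrative, and ramping time between temperatures is ignored, so an actual run would take somewhat longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int
    minutes: float


# Values taken from the amplification profile described in the text.
INITIAL = Step("initial denaturation", 94, 5)
CYCLE = (
    Step("denaturation", 94, 1),
    Step("annealing", 45, 1),
    Step("extension", 72, 2),
)
FINAL = Step("final extension", 72, 15)
N_CYCLES = 30


def total_hold_minutes() -> float:
    """Sum of programmed hold times; ramping between steps is ignored."""
    per_cycle = sum(step.minutes for step in CYCLE)
    return INITIAL.minutes + N_CYCLES * per_cycle + FINAL.minutes


print(total_hold_minutes())  # 140
```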
The primer pairs typically used in amplification are noted in the Appendix (see also Figs. 1–5), together with the sequencing primers (and alternates) used for each fragment. The following alternative amplification primer-pairs were occasionally used when the PCR products were absent or weak: region 1: 9F/14R instead of 9F/13R, and 12F/15R instead of 11F/14R; region 2: 20F/24R plus 21F/25R instead of 20F/25F; region 3: 40F/B47R instead of 40F/47R, 44F/51R instead of 44F/B51R; region 4: B55F/58R or 55F/B58R instead of 55F/58R. The atpB gene was sequenced for a few taxa where sequences were not already available, using the primers designed by Hoot, Culham, and Crane (1995). We sequenced ndhF using primers designed by Olmstead and Sweere (1994) and Kim and Jansen (1995). Only the 5' end of ndhF (bases 29–1317 in tobacco) was included, because the 3' end of the gene exhibits extensive length variation, in combination with an increased substitution rate (Olmstead and Sweere, 1994; Kim and Jansen, 1995; Olmstead and Reeves, 1995).
An ABI Prism dRhodamine terminator cycle sequencing ready reaction kit (PE Applied Biosystems) was used to set up sequencing reactions following the manufacturer's instructions, except that 35 ng of PCR product were used per half-reaction. For cycle sequencing, 25 cycles of the following conditions were used: (1) denaturation at 96°C for 10 s; (2) annealing at 45°C for 5 s; (3) extension at 60°C for 4 min. Individual cycle sequencing products were cleaned on a Sephadex column and precipitated using an unheated vacuum centrifuge for half an hour. Resuspended products were run on an ABI Prism 377 automated sequencer. All regions were sequenced at least twice for each taxon, and with a few minor exceptions these represent both forward and reverse strands. Because a large number of PCR and sequencing products were handled, a control for sample provenance was included by obtaining partial sequence from at least one replicate PCR product (from amplification reactions performed on different days) for each major region and taxon.
Data compilation
Sequencher 3.0 (Gene Codes Corp., Ann Arbor, Michigan, USA) was used to compile contiguous sequences from electropherograms generated on the automated sequencer. PCR primer sequences incorporated into sequenced products were excluded from contigs. Completed contigs were exported from Sequencher as text files and aligned across taxa using Clustal W (Thompson, Higgins, and Gibson, 1994). Alignments exported in PHYLIP format (Felsenstein, 1995) were then imported into Se-Al 1.0 (Rambaut, 1998) for minor manual adjustment. Final alignments were then exported in PHYLIP noninterleaved format and imported into PAUP* 4.0 beta (Swofford, 1999). This process was repeated for each region. Finally, a master file containing alignments for each region was assembled (with FORMAT set to "interleaved"). We were careful to maintain taxon order among noninterleaved alignments for the different regions, as taxon labels in interleaved files were not verified by the version of PAUP* we used. Gaps were treated as missing data in the analysis but retained as distinct records. Alignments are available on request from SWG.
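The risk noted above, that regions could be joined with taxa in a different order, can be guarded against by concatenating on taxon label rather than on row position. The following is a hedged sketch of that idea, not the workflow actually used with Sequencher, Se-Al, and PAUP*; the function name and the dictionary representation of an alignment are assumptions made for illustration.

```python
def concatenate(regions):
    """Join per-region alignments into one master alignment.

    regions: list of dicts mapping taxon label -> aligned sequence.
    Sequences are matched by label, so row order within a region
    does not matter; mismatched labels raise an error instead of
    silently misaligning taxa.
    """
    if not regions:
        raise ValueError("no regions supplied")
    taxa = list(regions[0])
    master = {taxon: [] for taxon in taxa}
    for index, alignment in enumerate(regions, start=1):
        if set(alignment) != set(taxa):
            raise ValueError(f"region {index}: taxon labels differ from region 1")
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"region {index}: sequences are not the same length")
        for taxon in taxa:
            master[taxon].append(alignment[taxon])
    return {taxon: "".join(parts) for taxon, parts in master.items()}
```

For example, `concatenate([{"A": "AC-T", "B": "ACGT"}, {"B": "GG", "A": "TT"}])` pairs each taxon with its own sequence even though the second region lists the taxa in reverse order.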
Coordinates for gene, exon, and intron borders were determined by comparison with tobacco sequences, and this information was used to derive CHARSETs (character sets) in PAUP* to define the nucleotides used in each analysis. Finally, a separate binary character matrix (Table 3) was assembled for indel (insertion/deletion) characters in the intron and coding regions. This matrix was appended to the other characters in the master matrix. We excluded indels in IGS sequences from consideration because it was harder to determine their homology than in intron and coding sequences. Indels were also ignored if their homology could not be inferred unambiguously (in general this was only a problem for the outgroup seed plant taxa) or if they represented unique events in the outgroups.
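Character sets of the kind described above are declared in a NEXUS SETS block as named ranges of matrix columns. The sketch below generates such a block from region lengths given in matrix order. The function name is hypothetical, and the region names and lengths in the usage note are placeholders, not the coordinates used in this study.

```python
def sets_block(regions):
    """Build a NEXUS SETS block from (name, length) pairs in matrix order.

    Column numbering is 1-based and inclusive, as in NEXUS files.
    """
    lines = ["BEGIN SETS;"]
    start = 1
    for name, length in regions:
        end = start + length - 1
        lines.append(f"    CHARSET {name} = {start}-{end};")
        start = end + 1
    lines.append("END;")
    return "\n".join(lines)
```

Calling `sets_block([("regionA", 100), ("regionB", 50)])` yields `CHARSET regionA = 1-100;` and `CHARSET regionB = 101-150;` between the `BEGIN SETS;` and `END;` lines.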
Phylogenetic analysis
Only coding and intron sequences and indel characters were considered. The IGS sequences were excluded from phylogenetic analysis because it was often difficult to determine nucleotide homology. Heuristic searches were performed in PAUP* with all characters and character-state changes equally weighted. MULPARS and "Steepest Descent" options were activated, and 100 random addition replicates were performed for each search. All the data were analyzed simultaneously and also in the following data partitions: the IR data (coding and introns) combined; the Photosystem II genes combined; and the three remaining single-copy genes combined (atpB plus ndhF plus rbcL). These roughly equally sized data partitions were considered because they group the underlying data into at least two fairly natural groups: the IR sequences collectively evolve at a slower rate than the other sequences, and the psb genes all code for proteins in the Photosystem II complex. The appropriate indels were included for each data partition. All analyses were repeated with and without outgroup taxa, because these were by far the longest branches on the trees. The angiosperm subtree found from the most parsimonious tree from the combined analysis (see Results) was used to estimate the gamma-distribution shape parameter, alpha, available under the "Tree Scores/Likelihood" option in PAUP*.
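Under equal weighting, the score that a parsimony search minimizes is the total number of state changes summed over characters. For a single character on a fixed rooted binary tree, that count can be obtained with Fitch's algorithm, sketched below. This illustrates the optimality criterion only, not PAUP*'s heuristic search; the nested-tuple tree representation and the function names are assumptions, and missing data and gaps are not handled.

```python
def fitch_length(tree, states):
    """Minimum number of changes for one unordered character.

    tree   : nested 2-tuples with taxon names (str) at the tips
    states : dict mapping taxon name -> observed state (e.g. a nucleotide)
    """
    changes = 0

    def state_set(node):
        nonlocal changes
        if isinstance(node, str):
            return {states[node]}
        left, right = (state_set(child) for child in node)
        shared = left & right
        if shared:
            return shared
        changes += 1  # the two descendants disagree: one change is needed
        return left | right

    state_set(tree)
    return changes


def tree_length(tree, alignment):
    """Equally weighted tree length: sum of Fitch lengths over all sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_length(tree, {taxon: seq[i] for taxon, seq in alignment.items()})
        for i in range(n_sites)
    )
```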
Bootstrap analysis (Felsenstein, 1985) was performed with the same search criteria, except that one random-order entry starting tree was used for each of the 100 bootstrap replicates. Bootstrap analysis provides biased, but usually conservative, estimates of the accuracy of individual clades (Hillis and Bull, 1993). Hillis and Bull showed that branches supported by 70% or more of replicates tend to be representative of the true phylogeny so long as rates of change are not very high or very unequal among lineages (see Felsenstein and Kishino, 1993, for a slightly different interpretation). We refer to branches with at least this much support as "well-supported" while recognizing that phenomena such as long-branch attraction can also lead to erroneously high support values. Bootstrap analyses were repeated on the three data partitions. The mean bootstrap support from each of these data partitions also was determined for the angiosperm subtree inferred using all the data (i.e., the average bootstrap support from each data partition for the 13 nonterminal branches in this tree). The incongruence length difference (ILD) test of Farris et al. (1994) was used to assess the significance of incongruence among these subsets of the available chloroplast data. The same heuristic search criteria were used as above, except that ten random addition replicates were used for each of the 100 permutation replicates.
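The resampling step of the bootstrap described above can be sketched as follows: alignment columns are drawn with replacement to build a pseudoreplicate of the original size, a tree is inferred from each pseudoreplicate, and support for a clade is the percentage of replicate trees containing it. The tree search itself is omitted, and the function names and the representation of a clade as a `frozenset` of taxon labels are assumptions for illustration.

```python
import random


def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement (one pseudoreplicate).

    alignment: dict mapping taxon label -> aligned sequence
    rng      : a random.Random instance, so runs can be made repeatable
    """
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        taxon: "".join(seq[i] for i in picks)
        for taxon, seq in alignment.items()
    }


def support(clade, replicate_clade_sets):
    """Percentage of replicate trees that contain the given clade."""
    hits = sum(1 for clades in replicate_clade_sets if clade in clades)
    return 100.0 * hits / len(replicate_clade_sets)
```

Because whole columns are resampled, each taxon keeps its own state at every drawn site; only the set of sites changes between replicates.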
| RESULTS |
|---|
A few genes also showed minor truncation or expansion of their inferred reading frame. Sequence beyond the truncation/expansion point was excluded in phylogenetic analysis, unless nucleotide homology could be assigned unambiguously. For example, the start codon inferred in published angiosperm ndhB sequences is upstream from that in Marchantia (GenBank accession GBAN-X04465) and Ginkgo, such that the 5' exon of ndhB is 17–18 codons longer in the flowering plants. Because sequence upstream of the start codon in Ginkgo can be aligned unambiguously with the corresponding coding sequence in the angiosperms, this noncoding sequence was included in analyses.
RNA edit sites are known or suspected in several chloroplast genes (e.g., Maier et al., 1995; Freyer, Kiefer-Meyer, and Kössel, 1997). Apart from Pinus, Gnetum, Ginkgo, and the four monocot taxa, a previously reported edit site within the initiation codon of psbL (Kudla et al., 1992; Bock et al., 1993) is inferred in all the taxa we examined here, including Amborella. An edit from C to U is necessary to produce a functional translation initiation codon for this gene in these taxa. Because all the sequences we considered were derived from DNA, the RNA edit sites in this and other genes should not have an intrinsically misleading effect on the phylogenetic analysis we performed (Bowe and dePamphilis, 1996).
In most cases the lengths of noncoding regions (introns and IGSs) did not vary greatly across the taxa, either within the angiosperms or across the seed-plant taxa (Table 4). The greatest range of length variation was in the IGS region between the psbB and psbT loci (Table 4). In all cases the standard deviation of length variation across examined sequences was <30 bp. This amount of length variation was thus not large enough to interfere with the ability to generate overlapping sequencing fragments for any region or taxon examined here.
The IR data provided good support for most of the structure on the angiosperm subtree inferred from all the data, despite having fewer informative characters than either single-copy data partition. The mean support for the subtree from this data partition was 61% (Table 5). The IR data found more than one most parsimonious tree, but one of these was the closest in shape to the tree inferred from all of the data combined (two symmetric difference units; Table 5) of any tree found by the three data partitions considered. As might be expected for such slowly evolving characters, the IR data had by far the lowest amount of homoplasy as measured by two different homoplasy estimators, the consistency index, CI, and the retention index, RI (Table 5).
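For readers unfamiliar with the two homoplasy measures mentioned above, their standard ensemble definitions are CI = M/S and RI = (G − S)/(G − M), where, summed over characters, M is the minimum conceivable number of steps, S is the number of steps observed on the tree, and G is the maximum conceivable number of steps. A minimal sketch follows; the function names and the per-character lists used as input are illustrative, and the step counts in the usage note are made-up numbers rather than values from Table 5.

```python
def consistency_index(min_steps, obs_steps):
    """Ensemble CI = sum(minimum steps) / sum(observed steps)."""
    return sum(min_steps) / sum(obs_steps)


def retention_index(min_steps, obs_steps, max_steps):
    """Ensemble RI = (G - S) / (G - M).

    Undefined when G == M (no character could show homoplasy), in
    which case a ValueError is raised rather than dividing by zero.
    """
    m, s, g = sum(min_steps), sum(obs_steps), sum(max_steps)
    if g == m:
        raise ValueError("RI is undefined when maximum equals minimum steps")
    return (g - s) / (g - m)
```

With two characters having minimum steps `[1, 1]`, observed steps `[1, 2]`, and maximum steps `[3, 4]`, CI is 2/3 and RI is 0.8. Both indices equal 1 when no homoplasy is inferred.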
All of the genes were inferred to be very slowly evolving. No change was inferred at a majority of characters (78%) across the angiosperm tree derived from all the data (88, 68, and 77% of all characters were invariant for the combined IR data, the combined atpB + ndhF + rbcL data, and the combined psb data, respectively). The IR data had the lowest parsimony-based estimate of the gamma-distribution shape parameter alpha (Table 5). This is in part a function of its high number of apparently invariant characters, but also because three-quarters of all variable characters were inferred to change only once for this data class. A narrow majority of variable characters in the other two data partitions changed more than once (Fig. 8).
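One common way to obtain a parsimony-based estimate of alpha is the method of moments: if the number of changes per site follows a negative binomial distribution (Poisson changes with gamma-distributed rates), its variance is μ + μ²/α, so α = μ²/(σ² − μ). The sketch below applies that estimator to a list of inferred changes per site. Whether this matches the exact procedure behind PAUP*'s option is an assumption, and the function name and example counts are illustrative only.

```python
from statistics import mean, pvariance


def alpha_from_changes(changes_per_site):
    """Method-of-moments estimate of the gamma shape parameter.

    changes_per_site: inferred number of changes at each site,
    including zeros for sites with no inferred change.
    """
    mu = mean(changes_per_site)
    var = pvariance(changes_per_site)
    if var <= mu:
        # No excess variance over a Poisson process: the counts give
        # no evidence of rate variation among sites.
        return float("inf")
    return mu * mu / (var - mu)
```

For instance, eight unchanged sites plus one site with one change and one with five changes give a mean of 0.6, a variance of 2.24, and an estimate of about 0.22. Many invariant sites combined with a few frequently changing ones drive the estimate toward small values.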
Two of the data sets also provided moderate to strong support for a taxon bipartition (branch) that rejects Gnetales as the sister group of the angiosperms. A (Gnetum, Pinus) taxon bipartition (Fig. 6) was supported by 58% of bootstrap replicates for the IR data, and 100% for the combined atpB + ndhF + rbcL data. This is in line with several recent molecular studies of the nuclear gene RPB2 (A. Denton and B. Hall; personal communication), 18S rRNA (Chaw et al., 1997) and the five-gene, three-genome study of Qiu et al. (1999). In contrast, the Photosystem II data support a (Gnetum, angiosperms) bipartition in 99% of bootstrap replicates, a result in line with the recent three-gene study of Soltis, Soltis, and Chase (1999).
The strong disagreement concerning outgroup relationships and the more modest one involving the local position of Lactoris are the only cases where conflicting tree structure among data partitions was well supported by bootstrap analysis. When the ILD test of Farris et al. (1994) was performed on the three data partitions, significant or nearly significant heterogeneity was found with the outgroup taxa included or excluded (P = 0.01 and 0.07, respectively). When Lactoris and the three outgroup taxa were excluded, no significant heterogeneity was indicated (P = 0.26).
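The logic of the ILD test can be sketched in two pieces: the statistic itself, which is the length of the most parsimonious tree for the combined data minus the summed lengths of the most parsimonious trees for each partition, and the permutation step, which randomly reassigns characters to partitions of the original sizes. The parsimony searches needed to obtain tree lengths are omitted, the function names are hypothetical, and the tree lengths in the usage note are invented for illustration.

```python
import random


def ild_statistic(length_combined, partition_lengths):
    """ILD = combined-data tree length minus summed partition tree lengths."""
    return length_combined - sum(partition_lengths)


def random_partitions(n_chars, sizes, rng):
    """Randomly reassign character indices to partitions of the given sizes."""
    if sum(sizes) != n_chars:
        raise ValueError("partition sizes must sum to the number of characters")
    order = list(range(n_chars))
    rng.shuffle(order)
    parts, start = [], 0
    for size in sizes:
        parts.append(sorted(order[start:start + size]))
        start += size
    return parts
```

With a combined tree of 110 steps and partition trees of 40, 30, and 35 steps, `ild_statistic(110, [40, 30, 35])` is 5. The observed value is then compared with the values obtained from the randomly repartitioned data.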
Analyses with Amborella included
When Amborella trichopoda was added to the core data set, it was strongly rejected as the sister-group of the rest of the living angiosperms in favor of Cabomba (Fig. 9a). However, when Amborella and Nymphaea odorata were added together, Amborella was moderately well supported as the sister-group of all other angiosperms, and the two major water lily lineages, represented here by Cabomba and Nymphaea, were together inferred to be the sister-group of the remaining angiosperms (Fig. 9b). In all analyses Illicium was depicted as the next most basal angiosperm lineage, after Amborella and the water lilies. The two analyses that included Amborella also disagreed over relationships inferred among the major seed plant groups (Fig. 9). However, in both cases analyses that excluded outgroup taxa yielded the same underlying ingroup topology as that shown in Figs. 6 and 9 (results not shown).
| DISCUSSION |
|---|
Quality of the data
Characters with a conservative evolutionary rate are expected to be more resistant to long-branch attraction (Felsenstein, 1983). The regions examined here are at least as slowly evolving as those previously being used to assess angiosperm phylogeny, and in several cases substantially slower (Table 5; Fig. 8). The retention index, RI, has been used as a criterion for assessing the relative informativeness of each character, or as a measure of phylogenetic signal (Farris, 1989; Savolainen et al., in press). The Photosystem II data have very similar RI values to the combined atpB + ndhF + rbcL data (Table 5), and so by this measure the psb genes are comparable to the other single-copy genes examined. The IR data have by far the highest RI values (Table 5), and so individual IR characters would be expected to be on average more reliable than those from any of the other classes of data considered here.
A large gamma shape parameter indicates that all sites evolve at essentially the same rate (Swofford et al., 1996). Very small alpha values, such as we find here (Table 5), can indicate highly asymmetrical distributions of rates, with most sites changing very little or not at all (see Swofford et al., 1996) and a few sites changing more frequently. We suggest, therefore, that with slowly evolving sequences, very low alpha values can be thought of as roughly approximating an equal-rates model, at least within a maximum parsimony framework. This is in line with the findings of Felsenstein (1981, 1983) that when most characters change sufficiently slowly they may be equally weighted, even though they do not all actually change with equal probability, provided that the overall rate of change is very low (M. Sanderson, personal communication).
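For reference, the relationship between alpha and the spread of rates can be made explicit. Assuming the usual convention in among-site rate variation models that relative rates are scaled to a mean of 1 (the symbol $r$ for a site's relative rate is introduced here and does not appear in the original text), the gamma density is

$$
f(r \mid \alpha) = \frac{\alpha^{\alpha}}{\Gamma(\alpha)}\, r^{\alpha - 1} e^{-\alpha r}, \qquad r > 0, \qquad \mathrm{E}[r] = 1, \qquad \mathrm{Var}[r] = \frac{1}{\alpha}.
$$

As $\alpha$ becomes large the variance approaches zero, which is the equal-rates case. For $\alpha < 1$ the density is L-shaped, with most sites having rates near zero and a small number having much higher rates.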
Figure 8 indicates that the IR data set most closely approximates an equal-rates model under parsimony, since most of its variable characters fall in only one parsimony-change class, the class with only one change inferred per character. This also largely accounts for the very low amount of homoplasy inferred for this class of data. Indeed, <3% of variable IR characters are inferred to change three times or more across the entire tree. By comparison, 17 and 14% of variable characters change more than three times for the combined atpB + ndhF + rbcL data and the combined psb data, respectively (Fig. 8). The total amount of change per site is likely to be an underestimate, because of undetected homoplasy, but this effect is thought to be minor for sites that have experienced few changes (Wakely, 1993), as is the case for most of the variable chloroplast characters we examined.
Long-branch attraction and basal angiosperm relationships
Attraction between exceptionally long branches neighboring short internodes can become stronger as more characters (e.g., Felsenstein, 1978; Huelsenbeck, 1995) or taxa (Swofford and Poe, 1999) are examined. This phenomenon should thus be more apparent with molecular data sets than with morphological studies, by dint of their greater size. The observed strong conflicts among different subsets of our data, or at different levels of taxonomic sampling, may also be a consequence of rather severe long branches. Our suggestion that strongly divergent results can also be a hallmark of long-branch attraction is not necessarily in conflict with the standard view that it results in strong convergence to a single wrong answer: long-branch attraction is poorly understood for more than about four or five taxa (e.g., Kim, 1996).
Several candidate long-branch effects are apparent in our study using this criterion. One concerns outgroup relationships. In our initial taxon sampling, the two single-copy data partitions converged strongly on different arrangements of the four major seed-plant groups considered, with the angiosperms grouping strongly with either Ginkgo or Gnetum. They also clashed over the position of Lactoris within Piperales. In each case the conflicting relationship was supported by >70% of bootstrap replicates (see Results). The long branch associated with Lactoris is implied not only by the number of substitutions inferred on the maximum parsimony tree (Fig. 6; note that Saururus is almost as long), but also by the large number of indel events inferred along this terminal branch (Fig. 7). The significant result with the test of Farris et al. (1994) (with Lactoris and the outgroups included; see Results) may also reflect substantial "saturation" or noise on these long branches (see Graham et al., 1998). When the outgroups were ignored, all of the analyses involving the 17 combined genes found the same underlying relationships among the angiosperms (see Results). However, when outgroups were included in the analyses, an additional candidate long-branch problem involved the inferred root of the angiosperms.
The root of the angiosperms
For the initial set of analyses involving 16 core angiosperm taxa and three outgroups, we found strong support for the first two basal splits, represented by Cabomba and Illicium, respectively. Cabomba was used to represent the water lilies and Illicium represents a woody magnoliid group distinct from the core woody magnoliids (Magnoliales, Laurales, and Winteraceae). The root split at the water lilies was further supported by two indels here (Table 3; Fig. 7), and by a single indel in the very slowly evolving chloroplast ITS region (Goremykin et al., 1996).
The determination of the root node of a large taxon, such as the angiosperms, is always conditional on increased taxon sampling (Sanderson, 1996). Since our submission of this paper, a fast-paced series of developments in basal angiosperm phylogeny has taken place, in studies that employ a variety of genes and levels of taxon sampling (Mathews and Donoghue, 1999; Qiu et al., 1999; Soltis, Soltis, and Chase, 1999; Parkinson, Adams, and Palmer, 1999; Graham et al., in press; S. W. Graham and R. G. Olmstead, unpublished data). All of these studies, and the current study, support the idea that the water lilies, and next, Illicium and relatives, represent successively emerging basal lineages close to the base of the flowering plants. However, perhaps the most significant new discovery has been that the New Caledonian species Amborella trichopoda (Amborellaceae) may constitute the sister-group of the rest of the angiosperms. These exciting results have been widely commented upon in the popular media and have led many botanists to view the problem of rooting the angiosperms as essentially solved.
We therefore decided to anticipate a future, more detailed study on the root of the angiosperms and add Amborella and one additional major water lily lineage (Nymphaea odorata) to our preliminary taxon sampling. Using only slightly different levels of taxon sampling (Nymphaea included or excluded), Cabomba or Amborella were each strongly supported as candidate sister-groups of the rest of the angiosperms (Fig. 9). The rooting at Cabomba (Fig. 9A), inferred when only Amborella was added, also conflicts strongly with bootstrap analyses reported in Mathews and Donoghue (1999), Qiu et al. (1999), Soltis, Soltis, and Chase (1999) and Parkinson, Adams, and Palmer (1999). Additional conflict was seen in outgroup relationships in the analyses involving this additional taxon sampling (Fig. 9).
Our analyses thus suggest that it is premature to place confidence in the Amborella rooting of the angiosperms, in this or other published studies with fewer characters available for analysis. The extant seed-plant groups are separated by very long branches that cannot be broken apart by the inclusion of additional intermediate taxa, because these are now extinct. A number of basal angiosperm lineages have similarly long branches. This fact alone should serve to give pause to the idea that the rooting of the angiosperms has been solved (see also Niklas, Crepet, and Nixon, 1999).
Studies in progress will attempt to address whether the result here is an expression of deeper problems with the widely reported Amborella rooting. Of the two lineages competing for position at the base of the angiosperms in our analyses, Amborellaceae is a monotypic family, and Nymphaea and Cabomba represent each of the two major lineages of water lilies (Les et al., 1999). Therefore, it is unlikely that the branches leading to these basal angiosperm lineages will be broken up more than we have done in this preliminary study (Fig. 9), even with substantial additional taxon sampling. The conflicting results described here concerning the rooting of the angiosperms thus may not be settled by additional taxon sampling alone.
Conclusion
In our initial taxon sampling the placement of the root of the angiosperms between water lilies and the other exemplar angiosperms was found by all three major data partitions we examined, and the combined data and two of the three data partitions supported this strongly. Most of the remaining clades were also well supported. The genes we used were carefully chosen to survey a large number of slowly evolving characters, using new primers that worked well across a broad range of seed-plant taxa. The loci examined have low synonymous substitution rates, low homoplasy, and approximate an equal-rates model under parsimony. In combination with other chloroplast data they provide about an order of magnitude more high-quality characters than the landmark rbcL study of Chase et al. (1993). We were also able to demonstrate several candidate cases where long-branch attraction may contribute to erroneous phylogenetic inference, including inference of the root node of the angiosperms: with slightly different taxon samplings two different root nodes were found for the angiosperms, one in strong conflict with published rootings.
| FOOTNOTES |
|---|
3 Author for reprint requests, current address: Department of Biological Sciences, Biological Sciences Centre, University of Alberta, Edmonton, Alberta, Canada T6G 2E9.
| LITERATURE CITED |
|---|
Bock, R., R. Hagemann, H. Kössel, and J. Kudla. 1993 Tissue- and stage-specific modulation of RNA editing of the psbF and psbL transcript from spinach plastids – a new regulatory mechanism? Molecular and General Genetics 240: 238–244.
Bowe, L. M., and C. W. dePamphilis. 1996 Effects of RNA editing and gene processing on phylogenetic reconstruction. Molecular Biology and Evolution 13: 1159–1166.
Chase, M. W., D. E. Soltis, R. G. Olmstead, et al. 1993 Phylogenetics of seed plants: an analysis of nucleotide sequences from the plastid gene rbcL. Annals of the Missouri Botanical Garden 80: 528–580.
Chaw, S.-M., A. Zharkikh, H.-M. Sung, T.-C. Lau, and W.-H. Li. 1997 Molecular phylogeny of extant gymnosperms and seed plant evolution: analysis of nuclear 18S rRNA sequences. Molecular Biology and Evolution 14: 56–68.
Crane, P. R. 1993 Time for the angiosperms. Nature 366: 631–632.
Crane, P. R., E. M. Friis, and K. R. Pedersen. 1995 The origin and early diversification of angiosperms. Nature 374: 27–33.
Crepet, W. L. 1998 The abominable mystery. Science 282: 1653–1654.
Donoghue, M. J., and J. A. Doyle. 1989 Phylogenetic studies of seed plants and angiosperms based on morphological characters. In B. Fernholm, K. Bremer, and H. Jörnvall [eds.], The hierarchy of life, 181–193. Elsevier, Amsterdam, The Netherlands.
Donoghue, M. J., and M. J. Sanderson. 1992 The suitability of molecular and morphological evidence in reconstructing plant phylogeny. In P. S. Soltis, D. E. Soltis, and J. J. Doyle [eds.], Molecular systematics of plants, 340–368. Chapman and Hall, New York, New York, USA.
Downie, S. R., R. G. Olmstead, G. Zurawski, D. E. Soltis, P. S. Soltis, J. C. Watson, and J. D. Palmer. 1991 Six independent losses of the chloroplast rpl2 intron in dicotyledons: molecular and phylogenetic implications. Evolution 45: 1245–1259.
Doyle, J. A. 1998 Phylogeny of the vascular plants. Annual Review of Ecology and Systematics 29: 567–599.
Doyle, J. A., and M. J. Donoghue. 1993 Phylogenies and angiosperm diversification. Paleobiology 19: 141–167.
Endress, P. K., and A. Igersheim. 1997 Gynoecium diversity and systematics of the Laurales. Botanical Journal of the Linnean Society 125: 93–168.
Engels, W. 1993 Amplify (version 1.2). Computer program and documentation. Genetics Department, University of Wisconsin, Madison, Wisconsin, USA.
Farris, J. S. 1989 The retention index and the rescaled consistency index. Cladistics 5: 417–419.
Farris, J. S., M. Källersjö, A. G. Kluge, and C. Bult. 1994 Testing significance of incongruence. Cladistics 10: 315–319.
Felsenstein, J. 1978 Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology 27: 401–410.
Felsenstein, J. 1981 A likelihood approach to character weighting and what it tells us about parsimony and compatibility. Biological Journal of the Linnean Society 16: 183–196.
Felsenstein, J. 1983 Parsimony in systematics: biological and statistical issues. Annual Review of Ecology and Systematics 14: 313–333.
Felsenstein, J. 1985 Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39: 783–791.
Felsenstein, J. 1995 PHYLIP (Phylogeny Inference Package) version 3.5c. Computer programs and documentation. Department of Genetics, University of Washington, Seattle, Washington, USA.
Felsenstein, J., and H. Kishino. 1993 Is there something wrong with the bootstrap on phylogenies? A reply to Hillis and Bull. Systematic Biology 42: 193–200.
Freyer, R., M.-C. Kiefer-Meyer, and H. Kössel. 1997 Occurrence of plastid RNA editing in all major lineages of land plants. Proceedings of the National Academy of Sciences, USA 94: 6285–6290.
Frohlich, M. W. 1999 MADS about Gnetales. Proceedings of the National Academy of Sciences, USA 96: 8811–8813.
Goremykin, V., V. Bobrova, J. Pahnke, A. Troitsky, A. Antonov, and W. Martin. 1996 Noncoding sequences from the slowly evolving chloroplast inverted repeat in addition to rbcL data do not support gnetalean affinities of angiosperms. Molecular Biology and Evolution 13: 383–396.
Graham, S. W., J. R. Kohn, B. R. Morton, J. E. Eckenwalder, and S. C. H. Barrett. 1998 Phylogenetic congruence and discordance among one morphological and three molecular data sets from Pontederiaceae. Systematic Biology 47: 545–567.
Graham, S. W., and R. G. Olmstead. 2000 Evolutionary significance of an unusual chloroplast DNA inversion found in two basal angiosperm lineages. Current Genetics 37: 183–188.
Graham, S. W., P. A. Reeves, A. C. E. Burns, and R. G. Olmstead. In press. Microstructural changes in noncoding chloroplast DNA: interpretation, evolution, and utility of indels and inversions in basal angiosperm phylogenetic inference. International Journal of Plant Sciences.
Gruissem, W., and J. C. Tonkyn. 1993 Control mechanisms of plastid gene expression. Critical Reviews in Plant Sciences 12: 19–55.
Haley, J., and L. Bogorad. 1990 Alternative promoters are used for genes within maize chloroplast polycistronic units. Plant Cell 2: 323–333.
Hendy, M. D., and D. Penny. 1989 A framework for the quantitative study of evolutionary trees. Systematic Zoology 38: 297–309.
Hickey, L. J., and D. W. Taylor. 1996 Origin of the angiosperm flower. In D. W. Taylor and L. J. Hickey [eds.], Flowering plant origin, evolution, and phylogeny, 176–231. Chapman and Hall, New York, New York, USA.
Hillis, D. M., and J. J. Bull. 1993 An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Systematic Biology 42: 182–192.
Hoot, S. B., A. Culham, and P. R. Crane. 1995 The utility of atpB gene sequences in resolving phylogenetic relationships: comparison with rbcL and 18S ribosomal DNA sequences in the Lardizabalaceae. Annals of the Missouri Botanical Garden 82: 194–207.