Systematics |
Section of Evolution and Ecology, University of California, One Shields Avenue, Davis, California 95616 USA
Received for publication March 7, 2002. Accepted for publication June 20, 2002.
| ABSTRACT |
|---|
Key Words: angiosperms; Gnetales; gymnosperms; long-branch attraction; maximum likelihood; psaA; psbB; third codon positions
| INTRODUCTION |
|---|
Morphology-based phylogenetic studies addressing relationships among major lineages of seed plants have been instrumental in providing a framework to test hypotheses of evolutionary relationships and character homology (e.g., Crane, 1985; Doyle and Donoghue, 1986; Loconte and Stevenson, 1990; Nixon et al., 1994; Rothwell and Serbet, 1994). Although results of these studies conflict significantly with each other, they agree in identifying a clade that contains Gnetales and the angiosperms, together with the extinct Bennettitales (Nixon et al., 1994) and Pentoxylales (Crane, 1985; Doyle and Donoghue, 1986; Rothwell and Serbet, 1994). One major implication of this result is that among extant seed plants, Gnetales and angiosperms are most closely related (cf. e.g., Donoghue and Doyle, 2000). Gnetales display a number of important similarities, including significant anatomical characters, with the conifers and other gymnosperms (for a detailed review see Carlquist, 1996). However, Gnetales also exhibit several characters, some of considerable significance, similar to those found in angiosperms, which, in the absence of an explicit phylogenetic hypothesis, could be interpreted either as convergent or homologous between the two groups. In the context of morphology-based phylogenetic results, the similarities between Gnetales and angiosperms were naturally interpreted as attributes shared by the two groups.
Molecular phylogenetic analyses of seed plants
Beginning in the early 1990s, molecular-based analyses of major seed plant lineages brought a dramatic revision to the concept of the phylogenetic closeness between Gnetales and the angiosperms. Nevertheless, a single, consistently recurring alternative scheme of relationships among major extant seed plant lineages, particularly regarding the placement of Gnetales, has not yet emerged. Rather, available results suggest that particular types of molecular data (e.g., nuclear vs. chloroplast genes), treatment of the primary data (e.g., use of all data vs. exclusion or downweighting of some characters), and the combination of two or more genes from different genomic compartments usually result in one of three distinct phylogenetic hypotheses among major lineages of seed plants. Published studies also hint at the differential effect of the use of parsimony (MP) or maximum likelihood (ML) methods of analysis of the same data (see below; Table 1).
A second phylogenetic hypothesis places angiosperms as sister to a clade that includes all gymnosperms and places Gnetales as the sister to Pinaceae, rendering the conifers paraphyletic (Table 1). This result is obtained from the following combinations of data and method of analysis: (a) parsimony analysis of chloroplast gene sequences from which third codon positions are downweighted or excluded (Fig. 3A in Sanderson et al., 2000), although including all positions or only third codon positions of the same genes under the same method of analysis produces a substantially different phylogenetic hypothesis (see above; Table 1); (b) parsimony analysis of mitochondrial gene sequences in which all positions are included (Fig. 2 in Soltis, Soltis, and Zanis, 2002); (c) maximum likelihood or Bayesian analysis of chloroplast or mitochondrial gene sequences in which all positions are included (Figs. 1B and 2 in Bowe, Coat, and dePamphilis, 2000; Figs. 1A and 2B in Chaw et al., 2000; Figs. 4 and 5 in Soltis, Soltis, and Zanis, 2002) or third positions are partially or completely excluded (Fig. 2C in Chaw et al., 2000; Fig. 5A in Sanderson et al., 2000); and (d) parsimony, maximum likelihood, or Bayesian analysis of combined genes from different genomic compartments in which all positions are included (Fig. 1 in Qiu et al., 1999; Fig. 3B in Bowe, Coat, and dePamphilis, 2000; Fig. 2 and p. 172 in Gugerli et al., 2001; Figs. 6 and 7 in Soltis, Soltis, and Zanis, 2002) or third positions are excluded (at least partially; Fig. 3 in Nickrent et al., 2000). Several studies based on one of the four conditions described above do not resolve exactly the described phylogenetic pattern, but neither do they contradict it, either because of the presence of polytomies (Figs. 2A and 4AB in Sanderson et al., 2000) or a limited taxonomic sample (Fig. 8AB in Goremykin et al., 1996; Fig. 1B in Hansen et al., 1999; Fig. 5 in Samigullin et al., 1999). Exceptions in which a different phylogenetic hypothesis is obtained were found by Hasebe et al. (1992, Fig. 1B; Gnetales are resolved as sister to cycads) and Nickrent et al. (2000, Fig. 2; Gnetales are resolved as sister to Juniperus, but not to Pinus).
In this study we investigate phylogenetic relationships among major seed plant lineages by utilizing sequence data of two highly conserved chloroplast genes, psaA and psbB, for a comprehensive taxonomic sample across seed plants. In a previous study, Sanderson et al. (2000), using sequences of the same genes, established strongly supported, but significantly conflicting phylogenetic hypotheses resulting from different partitions of the data when analyzed using parsimony. In this study, we have tripled the taxonomic sample across seed plants. The goal of this study is to address the relationships among major clades of seed plants in the context of previously detected conflicting phylogenetic signals. The expanded taxonomic sample should provide increased resolution and support for phylogenetic relationships among seed plants, as well as permit us to evaluate the general congruence of phylogenetic results with those obtained in studies focused on particular seed plant clades. We further evaluate the conflict between phylogenetic signals provided by different partitions of sequence data when analyzed using parsimony. We explicitly document the effect of maximum likelihood analysis in recovering the phylogenetic signal from a given data partition: the maximum likelihood tree is profoundly different from the tree resulting from parsimony analysis of the same data.
| MATERIALS AND METHODS |
|---|
Angiosperms are represented by 32 genera including four genera belonging to lineages diverging close to the root of the clade, nine genera belonging to seven "magnoliid" lineages, six genera of monocots, and 13 genera of eudicots, including eight core eudicots. The conifers are represented by 13 genera and 15 species belonging to major clades recognized in previous studies (e.g., Chaw et al., 1997, 2000; Stefanovic et al., 1998; Soltis et al., 1999). Pinaceae, a morphologically divergent lineage among extant conifers (e.g., Page, 1990), is represented by six species belonging to four genera. Araucariaceae is represented by one genus, Podocarpaceae (including Phyllocladaceae) and Taxaceae are each represented by two genera, and Cupressaceae (including Taxodiaceae) is represented by four genera. The three living genera of Gnetales and the single living species of Ginkgophyta are also included. Cycads are represented by one genus of Cycadaceae and two genera of Zamiaceae. Other tracheophytes are represented by seven members of the Moniliformopses (sensu Pryer et al., 2001), including Equisetum, Psilotum, Angiopteris, Ophioglossum, and three members of Polypodiidae (i.e., leptosporangiate ferns, Pryer et al., 2001), and by a representative of Lycophytina (i.e., Huperzia). Nonvascular embryophytes are represented by the liverwort Marchantia (Marchantiaceae), which was designated as the single outgroup.
Genes and molecular methods
Nucleotide sequences of two highly conserved chloroplast genes, psaA and psbB, were used as primary data for phylogenetic analyses. Thylakoid membrane-bound structural proteins that function in the chloroplast photosystems I and II are encoded by psaA and psbB, respectively (Ort and Yocum, 1996). The low rate of replacement at the amino acid level (Sanderson et al., 2000) and the high conservation of nucleotide sequences across land plants observed in psaA and psbB (see below) render these genes appropriate for addressing relationships among clades of seed plants. As a comparison, psaA and psbB are longer (2253 and 1527 base pairs, respectively) and have a higher percentage of amino acid similarity than rbcL and atpB (Olmstead and Palmer, 1994).
DNA of all sampled genera was extracted from leaves collected from plants cultivated in botanical gardens and arboreta, except for one genus (Phyllocladus), for which an aliquot of stock DNA was kindly provided by a colleague. DNA was extracted from the macerated material using a DNeasy Plant mini kit (Qiagen, Valencia, California, USA) following the manufacturer's protocol, except for the following minor modifications: 500 µL of buffer AP1, 5 µL of RNase, and 160 µL of buffer AP2. Depending on the relative concentration of DNA in the stock solution, 1 : 10 or 1 : 100 dilutions were prepared for gene amplification by polymerase chain reaction (PCR).
Nondegenerate PCR primers that match conserved segments at the 5' and 3' ends of each gene were initially designed by comparing aligned sequences obtained from previously available complete chloroplast genome sequences of Nicotiana, Oryza, Zea, Pinus, and Marchantia (Sanderson et al., 2000). Primer sequences were subsequently compared and verified against aligned sequences of the larger taxonomic sample resulting from the study by Sanderson et al. (2000). Positions and sequences of primers used for PCR and sequencing are shown in Table 2. Purified PCR products were sequenced directly by automated fluorescent dye methods on an ABI model 377 sequencer. For each taxon, sequences obtained from reactions using different sequencing primers were edited and assembled using Sequencher 4.0 (GeneCodes, Ann Arbor, Michigan, USA). The extremely high conservation of psaA and psbB sequences across land plants allowed straightforward visual alignment. Gaps were treated as missing characters.
Parsimony analyses were conducted for nucleotide sequences of psaA and psbB and the concatenated sequences of both genes. In each case, separate analyses for first and second, third, and all codon positions were performed. Although results of the partition homogeneity test indicate phylogenetic incompatibility between first and second vs. third codon positions for the two genes (see RESULTS), we nevertheless performed parsimony analyses using all positions. This allowed us to compare the resulting tree(s) with those obtained from analysis of only first and second positions and of only third positions, and to evaluate whether one of the corresponding signals predominates in the trees obtained using all codon positions. Parsimony analyses consisted of heuristic searches with 100 replicate random additions of sequences, TBR branch swapping, and saving of multiple trees (MulTrees). A strict consensus tree of all equally most parsimonious trees was obtained for each data set. The robustness of internal branches was assessed through bootstrap analysis (Felsenstein, 1985), implemented on the two codon position partitions and on all positions for the concatenated genes (i.e., psaA-psbB [12], psaA-psbB [3], psaA-psbB [all]). Each search consisted of 500 bootstrap replicates in which the number of resampled characters was equal to the number of characters used in the respective parsimony analysis (NCHAR = current). Each bootstrap replicate consisted of a full heuristic search performed with the same options as in the parsimony searches, but with only ten random addition replicates (instead of 100). Groups with a frequency >50% were retained.
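The partitioning and resampling described above can be illustrated with a minimal Python sketch. It assumes an in-frame alignment stored as a dictionary of strings; the toy sequences, taxon labels, and function names are hypothetical and are not taken from the psaA or psbB data. The tree searches themselves are not reproduced here.

```python
import random

def split_codon_positions(seq):
    """Split an in-frame coding sequence into (first+second, third) position strings."""
    pos12 = "".join(base for i, base in enumerate(seq) if i % 3 != 2)
    pos3 = seq[2::3]
    return pos12, pos3

def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement, keeping the original
    number of characters (the analogue of NCHAR = current)."""
    n_char = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_char) for _ in range(n_char)]
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}

# Hypothetical toy alignment, three codons long (not real data).
toy = {"TaxonA": "ATGGCTAAA", "TaxonB": "ATGGCCAAG", "TaxonC": "ATGTCTAAA"}
pos12 = {t: split_codon_positions(s)[0] for t, s in toy.items()}
pos3 = {t: split_codon_positions(s)[1] for t, s in toy.items()}

# One pseudoreplicate of the third-position partition.
replicate = bootstrap_columns(pos3, random.Random(1))
print(pos12["TaxonA"], pos3["TaxonA"], len(replicate["TaxonA"]))
```

Each pseudoreplicate would then be passed to a tree search; the same column indices are applied to every taxon so that homologous sites stay aligned.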
Maximum likelihood analyses were performed separately for first and second and for third codon positions for the concatenated sequences of psaA and psbB, because of their incongruent signals, as indicated by a partition homogeneity test (see RESULTS). Maximum likelihood searches were broken into two steps: (1) estimating the transition/transversion ratio (ti/tv) and the shape of the gamma distribution (α) from trees obtained through parsimony and (2) using the estimated parameters to obtain maximum likelihood trees (Fig. 1). Both steps involved heuristic searches using an HKY85 substitution model with gamma-distributed site-to-site rate variation (HKY85 + Γ) and TBR branch swapping, without enforcing a molecular clock.
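As a rough illustration of the substitution model named above, the following Python sketch builds an unnormalized HKY85 instantaneous rate matrix and converts the rate parameter kappa into an expected transition/transversion ratio. It assumes that the reported ti/tv values are ratios of expected numbers of transitions to transversions rather than kappa itself; the base frequencies and kappa used here are hypothetical, and gamma-distributed rate variation among sites is not included.

```python
BASES = "ACGT"
PURINES = {"A", "G"}

def is_transition(x, y):
    """True when x and y are both purines or both pyrimidines (x != y assumed)."""
    return (x in PURINES) == (y in PURINES)

def hky85_rate_matrix(kappa, freqs):
    """Unnormalized HKY85 matrix: the rate x -> y is freqs[y],
    multiplied by kappa when the change is a transition."""
    q = {}
    for x in BASES:
        row = {y: freqs[y] * (kappa if is_transition(x, y) else 1.0)
               for y in BASES if y != x}
        row[x] = -sum(row.values())  # diagonal makes each row sum to zero
        q[x] = row
    return q

def expected_titv_ratio(kappa, freqs):
    """Expected transitions / transversions at equilibrium under HKY85."""
    ti = freqs["A"] * freqs["G"] + freqs["C"] * freqs["T"]
    tv = (freqs["A"] + freqs["G"]) * (freqs["C"] + freqs["T"])
    return kappa * ti / tv

# Hypothetical parameter values, for illustration only.
equal = {b: 0.25 for b in BASES}
Q = hky85_rate_matrix(4.0, equal)
print(expected_titv_ratio(4.0, equal))
```

With equal base frequencies the expected ratio is half of kappa, which is why the two quantities should not be used interchangeably when comparing estimates across programs.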
In step one, the likelihood score and ti/tv and α parameters for two different parsimony topologies were estimated (Fig. 1). For each data set, the estimated ti/tv and α corresponding to the topology with the highest likelihood score were selected for the next phase of the maximum likelihood searches (Fig. 1).
In the second step, each data set was subjected to four maximum likelihood searches that differed in their heuristic starting condition, in which the ti/tv and α parameters estimated in the previous step were specified. The same four different starting conditions were applied to each of the data sets (Fig. 1). Of the four trees obtained for each data set, the one with the overall highest likelihood score was selected as the maximum likelihood tree for the corresponding codon position partition.
The level of support in the maximum likelihood tree for third codon positions (ML psaA-psbB [3]) was obtained by bootstrap analysis. Bootstrap analysis was conducted by performing replicate maximum likelihood searches, with stepwise AS IS addition of sequences, using the same search conditions as described above (i.e., HKY85 + Γ) and specifying the values of ti/tv and α corresponding to psaA-psbB [3]. We succeeded in performing 50 of these computationally intensive bootstrap replicates over a period of several months using four processors. To run simultaneous bootstrap replicates on different processors, we prepared ten batch files that differed in their starting random number seed, each specified to run five bootstrap replicates and to save the resulting bootstrap trees into files. A 50% majority rule consensus bootstrap tree was estimated by aggregating the trees in the files (using the RETAIN TREES PREVIOUSLY IN MEMORY command) and weighting trees according to the number of trees found in each bootstrap replicate, so that the 50 bootstrap replicates (but not the 70 resulting bootstrap trees) had equal weight.
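The weighting scheme described above, in which each replicate counts once however many trees it returned, can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the commands actually used: trees are represented simply as sets of clades, and the function names and toy replicates are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def weighted_clade_support(replicates):
    """replicates: list of bootstrap replicates; each replicate is a list of trees;
    each tree is a set of clades (frozensets of taxon names).
    A replicate that yields k trees gives each of them weight 1/k."""
    support = defaultdict(Fraction)
    for trees in replicates:
        weight = Fraction(1, len(trees))
        for tree in trees:
            for clade in tree:
                support[clade] += weight
    n_reps = len(replicates)
    return {clade: total / n_reps for clade, total in support.items()}

def majority_rule(support, threshold=Fraction(1, 2)):
    """Keep clades whose weighted frequency exceeds the threshold."""
    return {clade: s for clade, s in support.items() if s > threshold}

# Hypothetical example: three replicates, the second of which returned two trees.
AB = frozenset({"A", "B"})
AC = frozenset({"A", "C"})
reps = [[{AB}], [{AB}, {AC}], [{AB}]]
support = weighted_clade_support(reps)
print(support[AB], support[AC])
```

Exact fractions are used so that a clade sitting at the 50% boundary is not included or excluded because of floating-point rounding.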
| RESULTS |
|---|
The primers used in PCR reactions allowed amplification of nearly the complete sequence for each gene in almost all taxa. External primers did not yield psaA PCR products for Gnetum gnemon, and, after trial-and-error experimentation with different internal primers, we obtained a segment of the gene from approximately bp 1000 to the end of its sequence (about 1200 bp). External primers for psbB also failed to yield PCR products for Asplenium nidus and Marsilea mutica. The use of internal primers allowed extraction of a segment of psbB from the beginning of the gene to approximately bp 1000. Visual alignment of sequences was achieved easily, given the extreme conservation of the sequences of both genes across land plants. A 3-bp insertion was detected in the psaA sequences of Ephedra and Welwitschia. It is not known whether or not it is present in Gnetum, because the region where this insertion occurs (near the beginning of the sequence), could not be amplified (see above). An insertion at the same site was also detected in Zea, but not in the closely related Oryza. Indels were not found in psbB. The number of characters and of parsimony-informative characters in each gene and codon position partition are listed in Table 3.
Parsimony analyses
Nine parsimony analyses were performed. The number of MP trees found in each analysis and their lengths and scores are presented in Table 3. All except one of the parsimony searches ran to completion. The exception was the parsimony search for psbB using first and second positions, which aborted after filling the computer's memory with equally parsimonious trees during the first random addition sequence replicate.
Congruent phylogenetic results among all parsimony analyses are the monophyly of seed plants, cycads, Gnetales, and angiosperms, and the fact that angiosperms do not appear to be closely related to any particular group of living gymnosperms. Whereas the monophyly of conifers depends on the treatment of data and method of analysis (see below), nearly all MP trees congruently detected (1) a clade that includes all Pinaceae, (2) a clade that includes Araucariaceae plus the strongly supported Podocarpaceae (including Phyllocladus), and (3) a clade that includes Taxaceae and Cupressaceae (including Taxodiaceae), informally referred to here as the "taxoids" (Sciadopitys and Cephalotaxus, which were not sampled in our analysis, also belong to this clade; cf. Chaw et al., 1997, 2000; Stefanovic et al., 1998). In seven (out of nine) MP trees, Araucariaceae plus Podocarpaceae are the sister group to the "taxoids." In spite of the consistency of these important results, the strict consensus trees derived from each of the nine parsimony analyses conform to one of two very different phylogenetic hypotheses, depending on the codon position partition, but regardless of the gene or gene combination used to generate them. The differences between these two phylogenetic hypotheses involve the relationships among major clades of seed plants, particularly regarding the placement of Gnetales, and consequently, the monophyly or paraphyly of conifers. A summary of the strict consensus trees from the nine parsimony analyses is presented in Fig. 2.
Trees resulting from parsimony analysis of first and second codon positions (psaA [12], psbB [12] and psaA-psbB [12]) place angiosperms and gymnosperms as sister clades. Within gymnosperms, Gnetales are sister to Pinaceae, and thus, the conifers are paraphyletic (Fig. 2). Differences in phylogenetic results from the two genes and gene combinations involve the sister taxon to seed plants. Additionally, there are differences in the relationships among the three major clades of conifers (Fig. 2) and within angiosperms (results not shown).
Trees resulting from parsimony analyses of third codon positions (psaA [3], psbB [3], and psaA-psbB [3]) and of the three codon positions (psaA [all], psbB [all], and psaA-psbB [all]) depict the same general phylogenetic hypothesis, but a very different one from that derived from first and second codon positions (Fig. 2). Thus, when evaluated in combination, the signal of third positions predominates over that of the first and second, as expected, given the substantially larger number of parsimony-informative characters provided by third codon positions (Table 3). Nearly all MP trees based on third and on all positions place Gnetales as the sister taxon to all other seed plants and angiosperms as the sister to a clade that includes the cycads, Ginkgo, and the conifers (Fig. 2). This pattern is not recovered, but neither is it contradicted, in the strict consensus tree obtained from psbB [all] (Fig. 2). Differences among trees are the sister taxon to seed plants and the placement of cycads and Ginkgo with respect to conifers (Fig. 2). There are additional differences in the relationships among the three major clades of conifers (Fig. 2) and within angiosperms (results not shown).
The difference between parsimony trees resulting from the two codon position partitions is even more striking when considering the high bootstrap support associated with many of the branches in each topology (Fig. 3). A list of bootstrap support percentages (% BS) according to different partitions for the concatenated genes is presented in Table 4. For example, a clade that includes Gnetales and Pinaceae is supported by 96% BS according to first and second positions, but this clade is not detected according to third positions (nor all positions). Nevertheless, major clades recognized by both codon partitions are strongly supported; for example, seed plants, cycads, Gnetales, and angiosperms are all supported by 100% BS (Fig. 3, Table 4).
Maximum likelihood analyses
To avoid averaging values for the transition/transversion ratio (ti/tv) and the shape of the gamma distribution (α) over the extremely different substitution parameters that characterize each of the two codon position partitions, maximum likelihood analyses were performed separately for first and second and for third codon positions. Parameters for ti/tv and α were estimated in the initial step of the maximum likelihood analyses (HKY85 + Γ; TBR; estimating ti/tv and α; Fig. 1). For first and second positions, the search starting with 1 of 510 equally most parsimonious trees from psaA-psbB [12] produced the tree with the greater likelihood score, -ln L = 13 568 (ti/tv = 2.247; α = 0.186). The estimates of ti/tv and α in the search with the alternative starting tree are very close to the selected parameters (ti/tv = 2.137; α = 0.184; Fig. 1). For third codon positions, the search starting from 1 of 30 equally most parsimonious trees obtained from psaA-psbB [all] produced the more optimal tree (-ln L = 30 723). The associated parameters (ti/tv = 4.217, α = 1.362) were used to estimate maximum likelihood trees for third position data in the next step of the maximum likelihood analysis. Parameters resulting from the alternative search for third positions were also very close to the selected values (ti/tv = 4.224; α = 1.335).
In the second step of the maximum likelihood analyses, four heuristic searches (HKY85 + Γ; TBR; ti/tv and α specified; Fig. 1) with different starting conditions were undertaken to obtain ML trees for first and second and for third codon positions. The overall ML tree resulting from searches using first and second positions has a score of -ln L = 13 554.64 (Fig. 4A). The four searches using third position data provided the same ML tree, with a score of -ln L = 30 139.63 (Fig. 4B).
In spite of these profound phylogenetic differences, results shared by both ML trees are the monophyly of seed plants (also resolved in all MP trees), its sister-taxon relation with the Moniliformopses (which includes Equisetum and Psilotum, Fig. 4AB), and the monophyly of cycads, Gnetales, and angiosperms, but not of conifers. One significant point of congruence between the two ML trees is the linkage of Gnetales and Pinaceae as sister taxa. Additionally, phylogenetic relationships within gymnosperm clades are nearly identical in both ML trees.
Phylogenetic relationships within angiosperms in the ML tree for first and second positions are substantially incongruent with relationships obtained in numerous independent phylogenetic analyses (see below; Fig. 4A), and thus, we chose to focus all available computer time on estimating the bootstrap support associated with the ML tree resulting from third positions. The percentages of bootstrap replicates supporting internodes are shown in Fig. 4C, and bootstrap values associated with major clades are listed in Table 4.
Relationships within gymnosperm clades are almost identical in the two ML trees (Fig. 4AC) and are equivalent to those found in independent studies. Monophyly of cycads and of Gnetales is highly supported in the ML tree for third positions (96% and 100% BS, respectively). Whereas bootstrap support for the non-Pinaceae conifers is very strong (100%), the sister relationship between Pinaceae and Gnetales is very weak (<50% BS), and the conifer-gnetalean clade as a whole is only moderately supported (75% BS). Relationships among non-Pinaceae conifers recognized in several independent studies are highly supported, including, for example, Araucaria plus Podocarpaceae, the "taxoid" clade, and a clade that includes both (82%, 100%, 100% BS, respectively; Fig. 4C). Relationships within gymnosperm clades shared by both ML trees are also present in the strict consensus of MP trees for each of the two codon position partitions (Figs. 3AB and 4AB).
Relationships within angiosperms, however, are significantly different in the two ML trees. Whereas within-angiosperm relationships according to first and second codon positions are inconsistent with other results, either from other analyses based on psaA and psbB or from independent data (cf. Fig. 4A vs. Graham and Olmstead, 2000; Qiu et al., 1999; Soltis et al., 2000), phylogenetic relationships within angiosperms in the ML tree for third codon positions are almost entirely congruent with relationships recovered consistently from independent data (e.g., Qiu et al., 1999, 2000; Soltis et al., 1999, 2000; Savolainen et al., 2000). These congruent relationships are usually well supported.
Substantial heterogeneity in branch lengths is found in the ML trees for both first and second positions and third positions. Most clades subtended by long branches according to the first and second position data also have long branches according to third position data (i.e., seed plants, Gnetales, Pinaceae, angiosperms, and Poaceae; Fig. 4AB). Long branches leading to terminal taxa, in both codon partitions, are found within Moniliformopses, Gnetales, and to a much lesser extent, Ginkgo. For first and second positions, genera of Pinaceae are also subtended by long branches, but lengths are comparatively shorter based on third positions. Within angiosperms, both data partitions show that internal branches immediately above the root node are very short, but branches leading to terminal taxa are long (Fig. 4AB). Long branches within angiosperms likely result, at least to some extent, from sparse taxonomic sampling of extant taxa (see DISCUSSION).
| DISCUSSION |
|---|
The results of our parsimony analyses show that the signal of third positions overrides that of first and second positions when analyzed in combination (Figs. 2 and 3AC), an effect that would not be manifested if third positions contained mostly random noise. The effect of excluding third codon positions from parsimony analyses of plants has been documented by previous authors. Topologies resulting from the use of only third codon positions in the rbcL gene are largely in agreement with those found when all the data are used (Lewis, Mishler, and Vilgalys, 1997; Källersjö et al., 1998) and with independent sources of evidence (Lewis, Mishler, and Vilgalys, 1997), and contrary to expectations, the exclusion of third positions resulted in substantial loss of phylogenetic resolution (Källersjö et al., 1998). Similar effects were found in phylogenetic studies of vertebrates (Edwards, Arctander, and Wilson, 1991; Björklund, 1999). One common conclusion of these studies, corroborated by Sanderson et al. (2000) and by the present study, is that while third positions are highly variable, they retain significant phylogenetic information across land plants, including relationships among major clades, and, therefore, should not be dismissed.
Whereas resolution of substantially conflicting phylogenetic hypotheses from different codon position partitions under parsimony has been documented explicitly or implicitly in previous works (e.g., Chaw et al., 2000; Sanderson et al., 2000), the resolution of profoundly different hypotheses from the use of different optimization criteria applied to the same type of data, that is, third codon positions, is a new result. Soltis, Soltis, and Zanis (2002) found a similar discrepancy in the results of parsimony and maximum likelihood analyses of four chloroplast genes in which only third positions were included. Nevertheless, divergent phylogenetic results were not found when different optimization criteria were applied to first and second positions (Soltis, Soltis, and Zanis, 2002).
The phylogenetic differences among major clades of seed plants according to the MP and ML topologies for third positions are considerable; these include the clade identified as the sister taxon to seed plants (Huperzia or Moniliformopses, respectively), the clades resulting from the deepest bifurcation within seed plants, and the placement of Gnetales (Figs. 3B and 4B). The differences between these topologies cannot be reconciled by simple shifts in the placement of the root of seed plants, nor by minor changes among a few branches. A comparison of the parsimony and likelihood scores of these two topologies further underscores their differences: the ML topology for third codon positions of psaA-psbB (Fig. 4B) has a parsimony score of 7328, which is 120 steps longer than the MP trees obtained using the same data (length = 7208; Table 4). The likelihood scores of the two topologies were compared through 5000 RELL replicates of the Shimodaira-Hasegawa test (Shimodaira and Hasegawa, 1999; Goldman, Anderson, and Rodrigo, 2000), as implemented in PAUP*. The likelihood score of one of the 16 MP trees (tree #3) resulting from the third position data for the concatenated genes was estimated using only third position data via maximum likelihood (HKY85 + Γ, specifying the corresponding ti/tv and α parameters). This tree has a score of -ln L = 30 179.88 and is significantly less likely (P = 0.007*) than the most likely topology obtained using the same data and parameters (-ln L = 30 139.63).
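The logic of a Shimodaira-Hasegawa comparison with RELL resampling can be sketched in Python as below. This is a simplified illustration written from the general description of the test, not the PAUP* implementation; it assumes that per-site log-likelihoods are already available for each candidate topology, and the tree names and site values are hypothetical.

```python
import random

def sh_test_rell(site_lnl, n_reps=1000, seed=0):
    """site_lnl: dict mapping tree name -> list of per-site log-likelihoods
    (same site order for every tree). Returns dict tree name -> P value."""
    rng = random.Random(seed)
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    total = {t: sum(site_lnl[t]) for t in names}
    best = max(total.values())
    observed = {t: best - total[t] for t in names}  # difference from the best tree

    # RELL: resample site log-likelihoods rather than re-optimizing each replicate.
    boot = {t: [] for t in names}
    for _ in range(n_reps):
        sites = [rng.randrange(n_sites) for _ in range(n_sites)]
        for t in names:
            boot[t].append(sum(site_lnl[t][s] for s in sites))

    # Center each tree's replicate scores on their own mean.
    centered = {}
    for t in names:
        mean = sum(boot[t]) / n_reps
        centered[t] = [x - mean for x in boot[t]]

    # P value: share of replicates whose statistic is at least the observed difference.
    p_values = {}
    for t in names:
        count = 0
        for r in range(n_reps):
            stat = max(centered[u][r] for u in names) - centered[t][r]
            if stat >= observed[t]:
                count += 1
        p_values[t] = count / n_reps
    return p_values

# Hypothetical per-site values: the alternative tree is uniformly worse at every site.
ml_sites = [-1.2, -0.8, -2.5, -1.1, -0.9, -3.0]
alt_sites = [x - 0.5 for x in ml_sites]
p = sh_test_rell({"ML": ml_sites, "alt": alt_sites}, n_reps=200, seed=1)
print(p)
```

The tree with the best score always receives P = 1 under this scheme, so the test is informative only for the alternative topologies.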
The difference between the ML trees for first and second and for third positions regarding relationships among major clades of seed plants (Fig. 4AB) lies only in the placement of the root of the seed plants. A shift in the placement of the root to an adjacent branch in the third positions tree would yield a similar topology to the one obtained using first and second positions, at least regarding relationships among major clades of seed plants. Whereas the differences between the two topologies consist only of a shift in the placement of the root, the historical and evolutionary implications of such change are clearly significant. Additionally, differences within clades of seed plants according to the ML trees resulting from first and second and from third positions are substantial in some phylogenetic regions (i.e., within the Moniliformopses and within the angiosperms).
The greater disagreement between results from parsimony and maximum likelihood analysis of third position data compared to first and second position data is not entirely surprising. The third position data have a much higher average rate of substitution and, when coupled with rate heterogeneity among lineages, are more likely to cause reconstruction algorithms to suffer from long-branch attraction (LBA). The higher levels of homoplasy make it difficult for both parsimony and the substitution models used by maximum likelihood to estimate true branch lengths accurately and consequently obtain a correct tree. A slightly incorrect model will be much less misleading at low rates of substitution than at high rates because of the nonlinearity of corrections induced by the model (Zharkikh, 1994). Thus, methods that (effectively) assume different models should diverge progressively more at higher substitution rates. This behavior is evident even in simple pairwise distance correction formulae, many of which use maximum likelihood estimators (Zharkikh, 1994).
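The nonlinearity of pairwise distance corrections can be illustrated with the Jukes-Cantor formula, chosen here only because it is the simplest such correction; it is not a model used in this study. The sketch below computes the corrected distance and its derivative with respect to the observed proportion of differing sites, which shows how a fixed error in the observed difference is amplified as divergence grows. The function names and the example proportions are hypothetical.

```python
import math

def jc69_distance(p):
    """Jukes-Cantor corrected distance from the observed proportion
    of differing sites p (requires p < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def jc69_sensitivity(p):
    """Derivative of the corrected distance with respect to p: the factor by
    which a small error in p is amplified in the corrected distance."""
    return 1.0 / (1.0 - 4.0 * p / 3.0)

# Compare a slowly evolving and a rapidly evolving class of sites.
for p in (0.05, 0.30, 0.60):
    print(p, round(jc69_distance(p), 3), round(jc69_sensitivity(p), 2))
```

The amplification factor stays close to 1 for small observed differences and increases steeply as the observed proportion approaches saturation, consistent with the expectation that model differences matter most for rapidly evolving sites.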
The disagreements between codon partitions occur with a finite amount of data, but they raise the specter of statistical inconsistency, or long-branch attraction, which is an asymptotic failure of a method to converge on the correct tree as more and more data are added. Little is understood about the combinations of tree topologies, branch lengths and substitution patterns that lead to LBA in large phylogenies. Generalizing from results on four-taxon trees, it is probable that both high substitution rates and rate heterogeneity among lineages can cause parsimony to be inconsistent (Felsenstein, 1978), and model misspecification with respect to site-to-site rate variation can cause maximum likelihood to be inconsistent (Chang, 1996). We specifically incorporated site-to-site rate variation in the models used in maximum likelihood analyses, which ought to lessen the impact of this source of potential error in the maximum likelihood analyses. However, our results reveal substantial rate heterogeneity among lineages (Fig. 4) and very high rates for the third position data, which might well have led to problems in the parsimony analyses. Sanderson et al. (2000) documented LBA in parsimony analyses of third positions of psbB in a lengthy series of simulation studies in a much smaller taxon sample, but scaling these up to the present data sets would incur an onerous computational burden.
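The four-taxon case can be sketched numerically. The example below is a hypothetical illustration, not a reproduction of the simulations cited above: it assumes a two-state symmetric substitution model, a true tree ((A,B),(C,D)) in which A and C sit on long branches, and arbitrary per-branch change probabilities. It computes the exact expected frequencies of the three parsimony-informative site patterns; parsimony favors the grouping whose pattern is most frequent, so with unlimited data it converges on whichever pattern has the highest probability.

```python
from itertools import product


def trans(change_prob, parent, child):
    """Probability of the child state given the parent state on one branch."""
    return change_prob if parent != child else 1.0 - change_prob


def informative_pattern_probs(p_long, q_short):
    """Expected frequencies of informative patterns on the true tree ((A,B),(C,D)).

    A and C have change probability p_long; B, D, and the internal branch
    have change probability q_short (all values are illustrative assumptions).
    """
    probs = {"AB|CD": 0.0, "AC|BD": 0.0, "AD|BC": 0.0}
    for a, b, c, d in product((0, 1), repeat=4):
        if a == b and c == d and a != c:
            key = "AB|CD"  # supports the true tree
        elif a == c and b == d and a != b:
            key = "AC|BD"  # groups the two long branches
        elif a == d and b == c and a != b:
            key = "AD|BC"
        else:
            continue  # pattern is not parsimony informative
        # Sum over the unobserved states x and y at the two internal nodes.
        for x, y in product((0, 1), repeat=2):
            probs[key] += (0.5 * trans(p_long, x, a) * trans(q_short, x, b)
                           * trans(q_short, x, y)
                           * trans(p_long, y, c) * trans(q_short, y, d))
    return probs


for p_long in (0.1, 0.4):
    print(p_long, informative_pattern_probs(p_long, q_short=0.05))
```

Under these assumed values, the pattern supporting the true tree is the most frequent when the long branches are only moderately long, whereas the pattern grouping the two long branches becomes the most frequent when they are much longer, so additional sites would only strengthen support for the incorrect grouping.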
Relationships among major clades of seed plants
Results obtained in our analyses, which have also been found repeatedly in independent studies, support the monophyly of seed plants, cycads, Gnetales, and angiosperms, and suggest that angiosperms are not closely related to any single group of living gymnosperms. The ML tree for third positions of psaA-psbB conflicts with this last point, but the obtained sister relationship between angiosperms and cycads is tenuous, supported only by 52% BS (see RESULTS, Fig. 4C). Major conflicts remain, however, regarding the relationships among major lineages of seed plants, including, in particular, the placement of Gnetales, and consequently, the monophyly of conifers.
Relationships of Gnetales and the monophyly of conifers
A result found in several of our analyses, as well as in several independent studies, is the placement of Gnetales in close phylogenetic proximity to the conifers (e.g., Soltis et al., 1999; Bowe, Coat, and dePamphilis, 2000; Chaw et al., 2000; Table 1). The possibility of a close relationship between Gnetales and the (monophyletic) conifers has been discussed in the pre-anthophyte literature (e.g., Coulter and Chamberlain, 1917; Bailey, 1944, 1953; Bierhorst, 1971). Carlquist (1996) provides a comprehensive summary of anatomical features of Gnetales, many of which are also present in conifers, including the torus-margo structure on perforations of vascular elements, helical thickenings with intercalated bordered pits in metaxylem tracheids, and ultrastructural features of the sieve elements of the phloem (R. F. Evert, cited in Carlquist, 1996). Nevertheless, it is unclear whether these features are derived attributes shared by Gnetales and conifers or represent ancestral features shared with other gymnosperms.
Surprisingly, another feature that may document a close relationship between Gnetales and conifers is the process of double fertilization. Double fertilization has been reported, in addition to angiosperms and Gnetales, in two conifers: Abies balsamea (Pinaceae; Hutchison, 1915; Friedman and Floyd, 2001) and Thuja occidentalis (Cupressaceae; Land, 1902; Friedman and Floyd, 2001). It seems reasonable to assume that double fertilization as manifested in Ephedra is closer to the plesiomorphic double fertilization process for Gnetales as a whole than the double fertilization in Gnetum (and Welwitschia). This assumption is based on two combined facts: first, the monosporic and archegoniate nature of the female gametophyte of Ephedra (e.g., Bierhorst, 1971; Gifford and Foster, 1989; Friedman, 1990, 1992a, b) vs. the highly modified, tetrasporic female gametophyte of Gnetum (and Welwitschia; e.g., Carmichael and Friedman, 1995; Friedman and Carmichael, 1998, and references therein), and second, the recurrent placement of Ephedra as the sister taxon of a clade formed by Gnetum and Welwitschia (e.g., Chaw et al., 2000; Soltis et al., 2000; this study). Among plants in which double fertilization has been reported, the process in Ephedra is most similar to the one reported in two conifers. In these three genera, the two sperm cells (or sperm nuclei, in Ephedra) fertilize the egg cell (or egg nucleus, in Ephedra) and its mitotic sister, the ventral canal cell (or ventral canal nucleus, in Ephedra), inside the archegonium (Friedman, 1990, 1992a, b, and references therein). In contrast, the female participants in the double fertilization process in Gnetum (and apparently also in Welwitschia) may not only not be mitotic sisters, but, given the tetrasporic origin of the megagametophyte, may be derived from different meiotic products (Carmichael and Friedman, 1995, 1996; Friedman and Carmichael, 1996, 1998, and references therein). In angiosperms, with their highly modified female gametophyte (i.e., the embryo sac), the female participants in double fertilization (in a Polygonum-type embryo sac) are the egg cell and the two polar nuclei, one of which is the mitotic sister to the egg cell (Thomas, 1907; Brink and Cooper, 1947; Huang and Russell, 1992). Whereas the megagametophytes, and thus, the processes of double fertilization, in Gnetum (and in Welwitschia) and in angiosperms are probably divergently modified from conditions found in other seed plants, it is unknown if the similarities in double fertilization between Ephedra and the two conifers are due to unique derivation from shared ancestry or rather to the manifestation of double fertilization in an ancestral, archegoniate megagametophyte. Clearly, the frequency of double fertilization in conifers, its similarities with double fertilization in Ephedra, and detection of possible synapomorphies between Ephedra and conifers should be the subject of extensive and detailed investigation.
While a phylogenetic proximity between Gnetales and conifers seems quite plausible, the placement of Gnetales within conifers, which implies conifer paraphyly, seems unlikely from several important standpoints. Two especially important characters that support conifer monophyly are the structure of the chloroplast genome and morphological features of the ovuliferous cones.
The chloroplast DNA in most land plants contains two copies of a large inverted repeat of about 10–25 kilobase pairs (kbp), separated by two single-copy sequences of about 20 kbp and 80 kbp (Palmer and Stein, 1986). In an investigation of the structure of the chloroplast genome across land plants, Raubeson and Jansen (1992) documented an important modification shared by all conifers, consisting of the presence of a single copy of the inverted repeat, whereas all other sampled land plants, including the three genera of Gnetales, have two copies. Maps of the entire chloroplast region were consistent with the interpretation that the missing copy of the inverted repeat is the same, suggesting a single loss that characterizes all conifers (Raubeson and Jansen, 1992). The absence of the inverted repeat in some legumes is interpreted as an independent loss (Raubeson and Jansen, 1992). Although the distribution of a single copy of the inverted repeat among the conifers and Gnetales could result from homoplasy between Pinaceae and all other conifers, or from a gain of the lost copy in Gnetales, the fact that homoplasy in structural characters of the genome is less frequent than in sequence data supports the monophyly of conifers.
From a morphological standpoint, one feature that suggests conifer monophyly is their elaborate compound ovuliferous cone. The female reproductive structures of most living conifers are compound cones consisting of a main axis bearing sterile bracts with an ovuliferous scale in their axil and one to many ovules associated with the ovuliferous scales. Each ovuliferous scale corresponds to a highly modified fertile short shoot (i.e., a brachiblast, e.g., Florin, 1951; Clement-Westerhof, 1988; Mapes and Rothwell, 1991). The Gnetales also have compound ovuliferous cones. Although the exact nature of the structures between the axil of the primary bract and the gnetalean ovule is not entirely clear, they seem to correspond more to a conventional, though extremely reduced, brachiblast that bears successive pairs of opposite and decussate bracts that envelop the ovule than to the highly modified ovuliferous scale of the conifers. The placement of Gnetales within the conifers implies either an independent and convergent modification from fertile brachiblasts to yield ovuliferous cone scales in Pinaceae and in the lineage leading to all other conifers, or the deconstruction of the ovuliferous scale into a more conventional axis-like structure in the line leading to Gnetales. Conifer taxa that lack the axis-bract-ovuliferous scale organization (i.e., Podocarpaceae, Phyllocladaceae, and Taxaceae) usually display conditions that can be traced back to the ovuliferous cone organization widespread among conifers. Whereas the ovuliferous scale may be highly modified, highly reduced, or perhaps lost in some conifers, it is not known to have regressed to a conventional axial organization.
Summary and concluding remarks
In this study, we conducted analyses on a comprehensive sample of land plants, based on different codon partitions of psaA and psbB, to investigate phylogenetic relationships among major clades of seed plants. Results of parsimony analyses were mostly congruent with parsimony results based on the same genes for a smaller taxonomic sample (Sanderson et al., 2000), as well as results based on other chloroplast genes (e.g., Bowe, Coat, and dePamphilis, 2000; Chaw et al., 2000). As in previous studies, we found great incongruence in parsimony hypotheses of relationships among major clades of seed plants inferred from different codon partitions. Exhaustive maximum likelihood analyses also revealed conflicting phylogenetic results stemming from the two different partitions, but while the results from first and second codon positions are similar to the first and second position parsimony results, the third position maximum likelihood topology is different from the parsimony topology obtained using the same data. However, the third position maximum likelihood topology is partially similar to topologies obtained from first and second positions.
The conflicting results obtained in this study, as well as in other works (summarized in the introduction), indicate that an unambiguously supported hypothesis of phylogenetic relationships among seed plants has not yet been obtained. Lingering problems are the precise placement of Gnetales and the question of gymnosperm monophyly. In sharp contrast, however, there has been considerable improvement in the resolution of phylogenetic relationships within major clades of seed plants. The consistency of detected relationships within clades usually spans not only different phylogenetic analyses performed in this study, but also results of independent studies, based on different genes.
Resolving phylogenetic relationships among extant seed plants has proven to be an extraordinarily difficult problem, further complicated by the substantial loss of seed plant diversity to extinction and the extremely long time during which the surviving lineages have been evolving independently. During this long period, lineages have probably accumulated convergent molecular character states and evolved uniquely derived morphological attributes, the homologies of which are difficult to trace. Whereas recent research has documented the relevance of the type and treatment of data and different methods of analysis, it appears that a definitive solution to the problem is unlikely to stem from analysis of more sequence data and/or from greater taxon sampling alone. We have shown phylogenetic incongruence resulting from different treatments of the data and believe that, in the case of relationships among major clades of seed plants, adding more genes to phylogenetic analyses may simply provide greater support for conflicting results. Adding more taxa has proven useful in solving particular phylogenetic problems because critically selected taxa may effectively break the long branches. Adding a larger number of taxa is probably one of the reasons why a very substantial improvement in resolution of relationships within angiosperms has been achieved. Nevertheless, sorting relationships among seed plants is a very different problem because, at least comparatively, most angiosperm diversity is living, whereas most seed plant diversity is extinct. Avenues of research that may prove useful include, in addition to the use of different types of molecular data, information from the structure of the genome and a renewed consideration of morphological data for living and fossil seed plants.
|
| FOOTNOTES |
|---|