(American Journal of Botany. 2002;89:1991-2006.)
© 2002 Botanical Society of America, Inc.


Systematics

Relationships among seed plants inferred from highly conserved genes: sorting conflicting phylogenetic signals among ancient lineages

Susana Magallón and Michael J. Sanderson

Section of Evolution and Ecology, University of California, One Shields Avenue, Davis, California 95616 USA

Received for publication March 7, 2002. Accepted for publication June 20, 2002.


    ABSTRACT
Phylogenetic studies based on different types and treatment of data provide substantially conflicting hypotheses of relationships among seed plants. We conducted phylogenetic analyses of sequences of two highly conserved chloroplast genes, psaA and psbB, for a comprehensive taxonomic sample of seed plants and land plants. Parsimony analyses of two different codon position partitions resulted in well-supported, but significantly conflicting, phylogenetic trees. First and second codon positions place angiosperms and gymnosperms as sister clades and Gnetales as sister to Pinaceae. Third positions place Gnetales as sister to all other seed plants. Maximum likelihood trees for the two partitions are also in conflict. Relationships among the main seed plant clades according to first and second positions are similar to those found in parsimony analysis for the same data, but the third position maximum likelihood tree is substantially different from the corresponding parsimony tree, although it agrees partially with the first and second position trees in placing Gnetales as the sister group of Pinaceae. Our results document high rate heterogeneity among lineages, which, together with the greater average rate of substitution for third positions, may reduce phylogenetic signal due to long-branch attraction in parsimony reconstructions. Whereas resolution of relationships among major seed plant clades remains pending, this study provides increased support for relationships within major seed plant clades.

Key Words: angiosperms • Gnetales • gymnosperms • long-branch attraction • maximum likelihood • psaA • psbB • third codon positions


    INTRODUCTION
The living seed plants encompass extremely high species richness and extensive morphological variation. Nevertheless, extant seed plant taxa represent only a fragment of historical seed plant diversity. With the probable exception of angiosperms, the living taxa correspond to a reduced taxonomic representation within each of their respective major groups (e.g., extant vs. extinct cycads). In spite of the vast morphological diversity of living seed plants, major structural and organizational types are no longer represented among extant forms (e.g., hydrasperman-type reproduction; Rothwell, 1986; Rothwell and Scheckler, 1988). Extant seed plant lineages, except the angiosperms, are ancient, and encompass hundreds of millions of years of independent evolution during which uniquely derived and convergent characters probably originated. Given extensive extinction and independent evolution and convergence in both morphological characters and gene sequences, tracing phylogenetic relationships among seed plant lineages has been a complex endeavor.

Morphology-based phylogenetic studies addressing relationships among major lineages of seed plants have been instrumental in providing a framework to test hypotheses of evolutionary relationships and character homology (e.g., Crane, 1985; Doyle and Donoghue, 1986; Loconte and Stevenson, 1990; Nixon et al., 1994; Rothwell and Serbet, 1994). Although results of these studies conflict significantly with each other, they agree in identifying a clade that contains Gnetales and the angiosperms, together with the extinct Bennettitales (Nixon et al., 1994) and Pentoxylales (Crane, 1985; Doyle and Donoghue, 1986; Rothwell and Serbet, 1994). One major implication of this result is that among extant seed plants, Gnetales and angiosperms are most closely related (cf. e.g., Donoghue and Doyle, 2000). Gnetales display a number of important similarities, including significant anatomical characters, with the conifers and other gymnosperms (for a detailed review see Carlquist, 1996). However, Gnetales also exhibit several characters, some of considerable significance, similar to those found in angiosperms, which, in the absence of an explicit phylogenetic hypothesis, could be interpreted either as convergent or homologous between the two groups. In the context of morphology-based phylogenetic results, the similarities between Gnetales and angiosperms were naturally interpreted as attributes shared by the two groups.

Molecular phylogenetic analyses of seed plants
Beginning in the early 1990s, molecular-based analyses of major seed plant lineages brought a dramatic revision to the concept of the phylogenetic closeness between Gnetales and the angiosperms. Nevertheless, a single, consistently recurring alternative scheme of relationships among major extant seed plant lineages, particularly regarding the placement of Gnetales, has not yet emerged. Rather, available results suggest that particular types of molecular data (e.g., nuclear vs. chloroplast genes), treatment of the primary data (e.g., use of all data vs. exclusion or downweighting of some characters), and the combination of two or more genes from different genomic compartments usually result in one of three distinct phylogenetic hypotheses among major lineages of seed plants. Published studies also hint at the differential effect of the use of parsimony (MP) or maximum likelihood (ML) methods of analysis of the same data (see below; Table 1).


Table 1. Summary of results of molecular-based phylogenetic analyses of seed plants. Phylogenetic analyses based on particular combinations of data and treatment of data mostly resolve one of three general phylogenetic hypotheses among major clades of seed plants. [1–2] = first and second codon positions; [3] = third codon positions; [all] = all codon positions; AA = amino acid sequences; BA = Bayesian analysis; ML = maximum likelihood; MP = parsimony; NJ = neighbor joining

 
One of the three phylogenetic hypotheses places Gnetales as the sister taxon to a clade that includes all other seed plants. Within this primary phylogenetic scheme, a secondary one is the placement of angiosperms as the sister to a clade that includes the cycads, Ginkgo, and the conifers. This result is obtained from parsimony analysis of chloroplast gene nucleotide sequences in which all equally weighted codon positions or only third codon positions are included (Bowe, Coat, and dePamphilis, 2000, p. 4095; Figs. 2B and 3B in Sanderson et al., 2000; Figs. 1B and 2C in Rydin, Källersjö, and Friis, 2002; Fig. 3 in Soltis, Soltis, and Zanis, 2002; Table 1). Combined sequences of the chloroplast genes atpB and rbcL and the nuclear 18S rDNA, with all positions included, also provided this pattern (Fig. 1A in Rydin, Källersjö, and Friis, 2002). Maximum likelihood analyses of chloroplast gene sequences sometimes provide the same phylogenetic hypothesis (i.e., Fig. 2A in Chaw et al., 2000; Fig. 5B in Sanderson et al., 2000), but a substantially different one is obtained more frequently (see below, Table 1). Exceptions in which parsimony analyses of this type of data provide a different phylogenetic hypothesis were obtained by Chaw et al. (2000, p. 4088; Gnetales resolved as sister to the angiosperms) and Samigullin et al. (1999, Fig. 6; Gnetales resolved as sister to all other gymnosperms).

A second phylogenetic hypothesis places angiosperms as sister to a clade that includes all gymnosperms and places Gnetales as the sister to Pinaceae, rendering the conifers paraphyletic (Table 1). This result is obtained from the following combinations of data and method of analysis: (a) parsimony analysis of chloroplast gene sequences from which third codon positions are downweighted or excluded (Fig. 3A in Sanderson et al., 2000). Note that including all positions or only third codon positions of the same genes using the same method of analysis produces a substantially different phylogenetic hypothesis (see above; Table 1); (b) parsimony analysis of mitochondrial gene sequences in which all positions are included (Fig. 2 in Soltis, Soltis, and Zanis, 2002); (c) maximum likelihood or Bayesian analysis of chloroplast or mitochondrial gene sequences in which all positions are included (Figs. 1B and 2 in Bowe, Coat, and dePamphilis, 2000; Figs. 1A and 2B in Chaw et al., 2000; Figs. 4 and 5 in Soltis, Soltis, and Zanis, 2002) or third positions are partially or completely excluded (Fig. 2C in Chaw et al., 2000; Fig. 5A in Sanderson et al., 2000); and (d) parsimony, maximum likelihood, or Bayesian analysis of combined genes from different genomic compartments in which all positions are included (Fig. 1 in Qiu et al., 1999; Fig. 3B in Bowe, Coat, and dePamphilis, 2000; Fig. 2 and p. 172 in Gugerli et al., 2001; Figs. 6 and 7 in Soltis, Soltis, and Zanis, 2002) or third positions are excluded (at least partially; Fig. 3 in Nickrent et al., 2000). Several studies based on one of the four conditions described above do not resolve exactly the described phylogenetic pattern, but neither do they contradict it, either because of the presence of polytomies (Figs. 2A and 4A–B in Sanderson et al., 2000) or a limited taxonomic sample (Fig. 8A–B in Goremykin et al., 1996; Fig. 1B in Hansen et al., 1999; Fig. 5 in Samigullin et al., 1999). Exceptions in which a different phylogenetic hypothesis is obtained were found by Hasebe et al. (1992, Fig. 1B; Gnetales are resolved as sister to cycads) and Nickrent et al. (2000, Fig. 2; Gnetales are resolved as sister to Juniperus, but not to Pinus).



Fig. 1. Strategy for maximum likelihood searches. Maximum likelihood searches were performed independently for first and second positions and for third positions for the concatenated genes. Each search consisted of an initial step in which the transition/transversion ratio (ti/tv) and the shape of the gamma distribution (α) were estimated and a second step in which the estimated parameters were specified to search for most likely topologies. All searches were conducted under HKY85 + Γ, with tree bisection-reconnection (TBR) branch swapping. In the first step, the likelihood scores of two different topologies were estimated. The ti/tv and α parameters associated with the most likely topology were selected for the second step in the maximum likelihood searches. In the second step, four different starting conditions, in which ti/tv and α were specified, were used to obtain maximum likelihood topologies. The topology with the highest likelihood score (i.e., the lowest –ln L) was selected as the ML tree for each partition. The four searches for third position data resulted in the same maximum likelihood topology

 


Fig. 2. Summary of results of parsimony analyses. Columns correspond to different codon position partitions (first and second = [1–2], third = [3], and all three = [all]), and rows correspond to gene partitions (psaA, psbB, and psaA-psbB). Differences between genes and gene combinations are subordinate to differences between codon position partitions. Parsimony analyses of first and second positions place angiosperms and gymnosperms as sister taxa and Gnetales as sister to Pinaceae. Third positions and all positions resolve Gnetales as sister to all other seed plants, except for the analysis for psbB [all]. Bootstrap values for clades are indicated on trees for psaA-psbB. Podocarpaceae includes Phyllocladaceae. Taxoids = Taxaceae and Cupressaceae (including Taxodiaceae)

 
A third phylogenetic hypothesis places angiosperms as sister to a clade that includes all gymnosperms and places Gnetales as the sister to the conifers. This result is obtained from parsimony or maximum likelihood analysis of nucleotide sequences of the nuclear 18S rDNA (Table 1). This phylogenetic hypothesis is obtained under different treatments of the primary data, including equal weights to all sites (Fig. 2 in Chaw et al., 1997; Fig. 1 in Soltis et al., 1999; Fig. 1B in Chaw et al., 2000; Fig. 1 in Soltis, Soltis, and Zanis, 2002), differential weight to transitions and transversions (Fig. 1 in Chaw et al., 1997), or subjecting the data set to RASA analysis (Fig. 1C in Bowe, Coat, and dePamphilis, 2000). Whereas 18S rDNA nucleotide sequences consistently provide trees in which Gnetales are the sister to the conifers, no other nuclear gene, nor any other gene, has so far consistently resolved the same phylogenetic hypothesis (Table 1). A parsimony analysis for nucleotide sequences of 26S rDNA in which only transversions were considered (including only 124 parsimony-informative characters) placed angiosperms and Gnetales as sister taxa (Fig. 4 in Stefanovic et al., 1998). A parsimony analysis of amino acid sequences of homologues of the nuclear gene Floricaula/LEAFY placed Gnetales as sister to a clade that includes all other gymnosperms (Fig. 1 in Frohlich and Parker, 2000), but a maximum likelihood analysis of the same data placed Gnetales as sister to Pinus, the single conifer included in the data set (Fig. 2 in Frohlich and Parker, 2000). Finally, a neighbor-joining analysis of amino acid sequences of several MADS-type floral homeotic genes for a small sample of seed plants also placed Gnetales closer to Pinaceae than to angiosperms (Fig. 1 in Winter et al., 1999), but provided no additional phylogenetic information. It is therefore unclear if the placement of Gnetales as sister to the monophyletic conifers is a result exclusive to 18S rDNA, or if other nuclear genes, if analyzed using all sequence data for a more comprehensive taxonomic sample and/or larger amount of informative data, would yield the same result.

In this study we investigate phylogenetic relationships among major seed plant lineages by utilizing sequence data of two highly conserved chloroplast genes, psaA and psbB, for a comprehensive taxonomic sample across seed plants. In a previous study, Sanderson et al. (2000), using sequences of the same genes, established strongly supported, but significantly conflicting phylogenetic hypotheses resulting from different partitions of the data when analyzed using parsimony. In this study, we have tripled the taxonomic sample across seed plants. The goal of this study is to address the relationships among major clades of seed plants in the context of previously detected conflicting phylogenetic signals. The expanded taxonomic sample should provide increased resolution and support for phylogenetic relationships among seed plants, as well as permit us to evaluate the general congruence of phylogenetic results with those obtained in studies focused on particular seed plant clades. We further evaluate the conflict between phylogenetic signals provided by different partitions of sequence data when analyzed using parsimony. We explicitly document the effect of maximum likelihood analysis in recovering the phylogenetic signal from a given data partition: the maximum likelihood tree is profoundly different from the tree resulting from parsimony analysis of the same data.


    MATERIALS AND METHODS
Taxonomic sample
The taxonomic sample used in this study includes a dense representation of all extant seed plant clades, as well as members of all other major clades of embryophytes. Each seed plant clade is represented by taxa that span the root node of internal major monophyletic groups identified repeatedly in independent phylogenetic analyses (except Ginkgo). The taxonomic sample includes a total of 63 taxa (61 genera), representing 54 seed plants (52 genera), eight additional tracheophytes, and one liverwort. The list of included species, together with their taxonomic attribution, collection information, and GenBank-EMBL database accession numbers, has been archived at the Botanical Society of America website (http://ajbsupp.botany.org/v89/).

Angiosperms are represented by 32 genera including four genera belonging to lineages diverging close to the root of the clade, nine genera belonging to seven "magnoliid" lineages, six genera of monocots, and 13 genera of eudicots, including eight core eudicots. The conifers are represented by 13 genera and 15 species belonging to major clades recognized in previous studies (e.g., Chaw et al., 1997, 2000; Stefanovic et al., 1998; Soltis et al., 1999). Pinaceae, a morphologically divergent lineage among extant conifers (e.g., Page, 1990), is represented by six species belonging to four genera. Araucariaceae is represented by one genus, Podocarpaceae (including Phyllocladaceae) and Taxaceae are each represented by two genera, and Cupressaceae (including Taxodiaceae) is represented by four genera. The three living genera of Gnetales and the single living species of Ginkgophyta are also included. Cycads are represented by one genus of Cycadaceae and two genera of Zamiaceae. Other tracheophytes are represented by seven members of the Moniliformopses (sensu Pryer et al., 2001), including Equisetum, Psilotum, Angiopteris, Ophioglossum, and three members of Polypodiidae (i.e., leptosporangiate ferns, Pryer et al., 2001), and by a representative of Lycophytina (i.e., Huperzia). Nonvascular embryophytes are represented by the liverwort Marchantia (Marchantiaceae), which was designated as the single outgroup.

Genes and molecular methods
Nucleotide sequences of two highly conserved chloroplast genes, psaA and psbB, were used as primary data for phylogenetic analyses. Thylakoid membrane-bound structural proteins that function in the chloroplast photosystems I and II are encoded by psaA and psbB, respectively (Ort and Yocum, 1996). The low rate of replacement at the amino acid level (Sanderson et al., 2000) and the high conservation of nucleotide sequences across land plants observed in psaA and psbB (see below) render these genes appropriate for addressing relationships among clades of seed plants. As a comparison, psaA and psbB are longer (2253 and 1527 base pairs, respectively) and have a higher percentage of amino acid similarity than rbcL and atpB (Olmstead and Palmer, 1994).

DNA of all sampled genera was extracted from leaves collected from plants cultivated in botanical gardens and arboreta, except for one genus (Phyllocladus), for which an aliquot of stock DNA was kindly provided by a colleague. DNA was extracted from the macerated material using a DNeasy Plant mini kit (Qiagen, Valencia, California, USA) following the manufacturer's protocol, except for the following minor modifications: 500 µL of buffer AP1, 5 µL of RNase, and 160 µL of buffer AP2. Depending on the relative concentration of DNA in the stock solution, 1:10 or 1:100 dilutions were prepared for gene amplification by polymerase chain reaction (PCR).

Nondegenerate PCR primers that match conserved segments at the 5' and 3' ends of each gene were initially designed by comparing aligned sequences obtained from previously available complete chloroplast genome sequences of Nicotiana, Oryza, Zea, Pinus, and Marchantia (Sanderson et al., 2000). Primer sequences were subsequently compared and verified against aligned sequences of the larger taxonomic sample resulting from the study by Sanderson et al. (2000). Positions and sequences of primers used for PCR and sequencing are shown in Table 2. Purified PCR products were sequenced directly by automated fluorescent dye methods on an ABI model 377 sequencer. For each taxon, sequences obtained from reactions using different sequencing primers were edited and assembled using Sequencher 4.0 (GeneCodes, Ann Arbor, Michigan, USA). The extremely high conservation of psaA and psbB sequences across land plants allowed straightforward visual alignment. Gaps were treated as missing characters.


Table 2. Sequence and position of primers used for polymerase chain reaction (PCR) and sequencing of psaA and psbB. The external primers psaA7 and psaA2192R were used for PCRs for all sequenced taxa, except Gnetum gnemon, for which the primer pair psaA1000–psaA2192R was used, yielding a partial psaA sequence (see text). The psbB gene was amplified using the external primers psbB3 and psbB1394R, except in Asplenium nidus and Marsilea mutica, for which the primer pair psbB3–psbB990R was used (see text). All primers were used in sequencing reactions

 
Methods of phylogenetic analyses
Hypotheses of phylogenetic relationships were obtained through parsimony (MP) and maximum likelihood (ML) analyses. All phylogenetic analyses were conducted using PAUP* version 4.0b8 for Macintosh or PAUP* versions 4.0b4 or 4.0b5 for UNIX (Swofford, 2001). The compatibility of the phylogenetic signals provided by psaA and psbB, and by first and second vs. third codon positions, was assessed by using the incongruence length difference test (ILD; Farris et al., 1994), implemented as the partition homogeneity test in PAUP*. First, the compatibility between the two genes (including all codon positions) was assessed (gene partition), and subsequently, the compatibility between the first and second vs. the third codon positions of the two concatenated genes was tested (codon position partition). Both tests were performed using parsimony and consisted of 500 heuristic searches, each with ten replicates with random addition of sequences, tree bisection-reconnection (TBR) branch-swapping, and saving multiple trees (MulTrees).

Parsimony analyses were conducted for nucleotide sequences of psaA and psbB and the concatenated sequences of both genes. In each case, separate analyses for first and second, third, and all codon positions were performed. Although results of the partition homogeneity test indicate phylogenetic incompatibility between first and second vs. third codon positions for the two genes (see RESULTS), we nevertheless performed parsimony analyses using all positions, to compare the resulting tree(s) with those obtained from analysis of only first and second positions, and only third positions, and to evaluate whether one of the corresponding signals predominates in the trees obtained using all codon positions. Parsimony analyses consisted of heuristic searches with 100 replicate random additions of sequences, TBR branch swapping, and saving multiple trees (MulTrees). A strict consensus tree of all equally most parsimonious trees was obtained for each data set. The robustness of internal branches was assessed through bootstrap analysis (Felsenstein, 1985), implemented on the two codon position partitions and on all positions for the concatenated genes (i.e., psaA-psbB [1–2], psaA-psbB [3], psaA-psbB [all]). Each search consisted of 500 bootstrap replicates in which the number of resampled characters was equal to the number of characters used in the respective parsimony analysis (NCHAR = current). Each bootstrap replicate consisted of a full heuristic search performed with the same options as in the parsimony searches, but with only ten random addition replicates (instead of 100). Groups with a frequency >50% were retained.
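The codon position partitions and the bootstrap resampling of characters described above can be illustrated with a minimal Python sketch. It is not the PAUP* procedure used in the study: the function names, the toy alignment, and the assumption that the alignment is in frame and starts at codon position 1 are all introduced here for illustration only.

```python
import random

def split_codon_positions(alignment):
    """Split an in-frame alignment into [1-2] and [3] codon position partitions.

    `alignment` maps taxon name -> aligned sequence; all sequences are assumed
    to have the same length and to start at codon position 1.
    """
    first_second, third = {}, {}
    for taxon, seq in alignment.items():
        # Keep columns at codon positions 1 and 2 (0-based index i with i % 3 != 2).
        first_second[taxon] = "".join(b for i, b in enumerate(seq) if i % 3 != 2)
        # Keep every third column (codon position 3).
        third[taxon] = seq[2::3]
    return first_second, third

def bootstrap_resample(alignment, rng):
    """Draw one bootstrap pseudoreplicate by sampling columns with replacement.

    The number of resampled characters equals the number of characters in the
    input partition, mirroring the NCHAR = current setting described in the text.
    """
    n_chars = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}

# Hypothetical toy alignment (three codons per taxon), not data from the study.
toy = {"taxonA": "ATGGCTAAA", "taxonB": "ATGGCCAAG", "taxonC": "ATGACTAAA"}
p12, p3 = split_codon_positions(toy)
replicate = bootstrap_resample(p3, random.Random(1))
```

Each pseudoreplicate would then be passed to a tree search; the search itself is outside the scope of this sketch.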

Maximum likelihood analyses were performed separately for first and second and for third codon positions for the concatenated sequences of psaA and psbB, because of their incongruent signals, as indicated by a partition homogeneity test (see RESULTS). Maximum likelihood searches were broken into two steps: (1) estimating the transition/transversion ratio (ti/tv) and the shape of the gamma distribution (α) from trees obtained through parsimony and (2) using the estimated parameters to obtain maximum likelihood trees (Fig. 1). Both steps involved heuristic searches using an HKY85 substitution model with gamma-distributed site-to-site rate variation (HKY85 + Γ) and TBR branch swapping, without enforcing a molecular clock.
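For readers unfamiliar with the HKY85 model, the following Python sketch builds its instantaneous rate matrix from a rate-ratio parameter κ and a set of base frequencies, and converts κ to the expected transition/transversion ratio. The sketch is illustrative only: the function names and the example values are assumptions made here, the gamma rate categories are omitted, and the exact parameterization used internally by PAUP* is not reproduced.

```python
BASES = "ACGT"
PURINES = {"A", "G"}

def hky85_rate_matrix(kappa, freqs):
    """Return the HKY85 rate matrix Q (rows and columns ordered A, C, G, T).

    The rate from base i to base j is freqs[j] for transversions and
    kappa * freqs[j] for transitions; Q is scaled so that the expected
    substitution rate at equilibrium equals 1.
    """
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                # A<->G and C<->T are transitions; all other changes are transversions.
                is_transition = (a in PURINES) == (b in PURINES)
                q[i][j] = (kappa if is_transition else 1.0) * freqs[j]
        q[i][i] = -sum(q[i])  # each row sums to zero
    mean_rate = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[x / mean_rate for x in row] for row in q]

def expected_ti_tv(kappa, freqs):
    """Expected transition/transversion ratio implied by kappa under HKY85."""
    pi_a, pi_c, pi_g, pi_t = freqs
    return kappa * (pi_a * pi_g + pi_c * pi_t) / ((pi_a + pi_g) * (pi_c + pi_t))

# Hypothetical values chosen for illustration, not estimates from the study.
equal = [0.25, 0.25, 0.25, 0.25]
Q = hky85_rate_matrix(2.0, equal)
ratio = expected_ti_tv(2.0, equal)
```

With equal base frequencies the expected ti/tv ratio is κ/2, which is why the two quantities should not be used interchangeably when comparing parameter estimates across programs.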

In step one, the likelihood score and ti/tv and α parameters for two different parsimony topologies were estimated (Fig. 1). For each data set, the estimated ti/tv and α corresponding to the topology with the highest likelihood score were selected for the next phase of the maximum likelihood searches (Fig. 1).

In the second step, each data set was subjected to four maximum likelihood searches that differed in their heuristic starting condition, in which the ti/tv and α parameters estimated in the previous step were specified. The same four different starting conditions were applied to each of the data sets (Fig. 1). Of the four trees obtained for each data set, the one with the overall highest likelihood score was selected as the maximum likelihood tree for the corresponding codon position partition.

The level of support in the maximum likelihood tree for third codon positions (ML psaA-psbB [3]) was obtained by bootstrap analysis. Bootstrap analysis was conducted by performing replicate maximum likelihood searches, with stepwise AS IS addition of sequences, using the same search conditions as described above (i.e., HKY85 + Γ) and specifying the values of ti/tv and α corresponding to psaA-psbB [3]. We succeeded in performing 50 of these computationally intensive bootstrap replicates over a period of several months using four processors. To run simultaneous bootstrap replicates in different processors, we prepared ten batch files that differed in their starting random number seed, each specified to run five bootstrap replicates and to save the resulting bootstrap trees into files. A 50% majority rule consensus bootstrap tree was estimated by aggregating the trees in the files (using the RETAIN TREES PREVIOUSLY IN MEMORY command) and weighting trees according to the number of trees found in each bootstrap replicate, so that the 50 bootstrap replicates (but not the 70 resulting bootstrap trees) had equal weight.
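The weighting scheme described above, in which each bootstrap replicate rather than each tree carries equal weight, can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions and not the PAUP* implementation: trees are represented simply as sets of clades, each tree in a replicate that yielded k trees is given weight 1/k, and all names and toy data are hypothetical.

```python
from collections import defaultdict

def weighted_clade_frequencies(replicates):
    """Compute clade support with equal weight per bootstrap replicate.

    `replicates` is a list of replicates; each replicate is a list of trees;
    each tree is a set of clades, and each clade is a frozenset of taxon names.
    A tree from a replicate that produced k trees receives weight 1/k.
    """
    support = defaultdict(float)
    for trees in replicates:
        weight = 1.0 / len(trees)
        for tree in trees:
            for clade in tree:
                support[clade] += weight
    n_replicates = len(replicates)
    return {clade: total / n_replicates for clade, total in support.items()}

def majority_rule(support, threshold=0.5):
    """Keep clades whose weighted frequency exceeds the threshold."""
    return {clade: freq for clade, freq in support.items() if freq > threshold}

# Hypothetical toy example: three replicates, the second of which found two trees.
AB = frozenset({"A", "B"})
CD = frozenset({"C", "D"})
AC = frozenset({"A", "C"})
replicates = [[{AB, CD}], [{AB, CD}, {AC}], [{AB}]]
support = weighted_clade_frequencies(replicates)
consensus = majority_rule(support)
```

In this toy case the two trees from the second replicate each count for one half, so the three replicates contribute equally even though four trees are aggregated.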


    RESULTS
Sequences
The psaA and psbB data sets contain completely overlapping taxonomic samples, and thus, the combined data set (psaA-psbB) comprises the full set of 63 taxa. In almost all cases, psaA and psbB sequences were obtained from the same DNA extraction; only in one case were sequences obtained from different DNA extractions from the same plant (Chloranthus spicatus), in one case DNA was extracted from different plants of the same species (Asplenium nidus), and in one case the sequences of each gene were obtained from different species of the same genus (Marsilea botryocarpa and M. mutica). This study contributes new sequences of the psaA gene for 41 taxa and new sequences of the psbB gene for 43 taxa (see supplementary information for GenBank-EMBL accession numbers at http://ajbsupp.botany.org/v89).

The primers used in PCRs allowed amplification of nearly the complete sequence for each gene in almost all taxa. External primers did not yield psaA PCR products for Gnetum gnemon, and, after trial-and-error experimentation with different internal primers, we obtained a segment of the gene from approximately bp 1000 to the end of its sequence (about 1200 bp). External primers for psbB also failed to yield PCR products for Asplenium nidus and Marsilea mutica. The use of internal primers allowed amplification of a segment of psbB from the beginning of the gene to approximately bp 1000. Visual alignment of sequences was achieved easily, given the extreme conservation of the sequences of both genes across land plants. A 3-bp insertion was detected in the psaA sequences of Ephedra and Welwitschia. It is not known whether it is present in Gnetum, because the region where this insertion occurs (near the beginning of the sequence) could not be amplified (see above). An insertion at the same site was also detected in Zea, but not in the closely related Oryza. Indels were not found in psbB. The number of characters and of parsimony-informative characters in each gene and codon position partition are listed in Table 3.


Table 3. Results of parsimony analyses. Total number of characters, number of parsimony-informative characters, and tree statistics for nine parsimony (MP) analyses. The search for psbB [1–2] aborted during the first random addition replicate. CI = consistency index; RI = retention index; RC = rescaled consistency index; [1–2] = first and second codon positions; [3] = third codon positions; [all] = all codon positions
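As an illustration of how the number of parsimony-informative characters can be counted, the Python sketch below applies the standard criterion that a site is informative when at least two character states each occur in at least two taxa. The function names and toy alignment are hypothetical, and the treatment of "-", "?", and "N" as missing data is an assumption made here (the study states only that gaps were treated as missing).

```python
from collections import Counter

# Symbols treated as missing data (an assumption of this sketch).
MISSING = {"-", "?", "N"}

def is_parsimony_informative(column):
    """Return True if at least two states each occur in at least two taxa."""
    counts = Counter(c.upper() for c in column if c.upper() not in MISSING)
    return sum(1 for n in counts.values() if n >= 2) >= 2

def count_informative(alignment):
    """Count parsimony-informative columns in a taxon -> sequence mapping."""
    seqs = list(alignment.values())
    return sum(is_parsimony_informative(col) for col in zip(*seqs))

# Hypothetical toy alignment, not data from the study.
toy_alignment = {"t1": "AAGT-", "t2": "AAGTC", "t3": "ACCTC", "t4": "ATCAG"}
n_informative = count_informative(toy_alignment)
```

Only the third column of the toy alignment (G, G, C, C) meets the criterion; variable columns in which one state is shared and the others are unique are not informative under parsimony.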

 
Partition homogeneity test
The results of the partition homogeneity test for psaA vs. psbB, including all codon positions, indicated that their phylogenetic signals are not significantly different (P = 0.164). Nevertheless, this value is closer to significance than that obtained in a previous assessment of the compatibility between the two genes based on a smaller taxonomic sample (i.e., P = 0.77; Sanderson et al., 2000). Congruence between first and second vs. third codon positions was evaluated on the basis of the concatenated sequences of the two genes (i.e., psaA-psbB [1–2] vs. psaA-psbB [3]). We corroborate the previously documented (Sanderson et al., 2000) significant incongruence between the phylogenetic signals of the two codon position partitions (P = 0.008*).

Parsimony analyses
Nine parsimony analyses were performed. The number of MP trees found in each analysis and their lengths and scores are presented in Table 3. All except one of the parsimony searches ran to completion. The exception was the parsimony search for psbB using first and second positions, which aborted after filling the computer's memory with equally parsimonious trees during the first random addition sequence replicate.

Congruent phylogenetic results among all parsimony analyses are the monophyly of seed plants, cycads, Gnetales, and angiosperms, and the fact that angiosperms do not appear to be closely related to any particular group of living gymnosperms. Whereas the monophyly of conifers depends on the treatment of data and method of analysis (see below), nearly all MP trees congruently detected (1) a clade that includes all Pinaceae, (2) a clade that includes Araucariaceae plus the strongly supported Podocarpaceae (including Phyllocladus), and (3) a clade that includes Taxaceae and Cupressaceae (including Taxodiaceae), informally referred to here as the "taxoids" (Sciadopitys and Cephalotaxus, which were not sampled in our analysis, also belong to this clade; cf. Chaw et al., 1997, 2000; Stefanovic et al., 1998). In seven (out of nine) MP trees, Araucariaceae plus Podocarpaceae are the sister group to the "taxoids." In spite of the consistency of these important results, the strict consensus trees derived from each of the nine parsimony analyses conform to one of two very different phylogenetic hypotheses, depending on the codon position partition, but regardless of the gene or gene combination used to generate them. The differences between these two phylogenetic hypotheses involve the relationships among major clades of seed plants, particularly regarding the placement of Gnetales, and consequently, the monophyly or paraphyly of conifers. A summary of the strict consensus trees from the nine parsimony analyses is presented in Fig. 2.

Trees resulting from parsimony analysis of first and second codon positions (psaA [1–2], psbB [1–2] and psaA-psbB [1–2]) place angiosperms and gymnosperms as sister clades. Within gymnosperms, Gnetales are sister to Pinaceae, and thus, the conifers are paraphyletic (Fig. 2). Differences in phylogenetic results from the two genes and gene combinations involve the sister taxon to seed plants. Additionally, there are differences in the relationships among the three major clades of conifers (Fig. 2) and within angiosperms (results not shown).

Trees resulting from parsimony analyses of third codon positions (psaA [3], psbB [3], and psaA-psbB [3]) and of the three codon positions (psaA [all], psbB [all], and psaA-psbB [all]) depict the same general phylogenetic hypothesis, but a very different one from that derived from first and second codon positions (Fig. 2). Thus, when evaluated in combination, the signal of third positions predominates over that of the first and second, as expected, given the substantially larger number of parsimony-informative characters provided by third codon positions (Table 3). Nearly all MP trees based on third and on all positions place Gnetales as the sister taxon to all other seed plants and angiosperms as the sister to a clade that includes the cycads, Ginkgo, and the conifers (Fig. 2). This pattern is not recovered, but neither is it contradicted, in the strict consensus tree obtained from psbB [all] (Fig. 2). Differences among trees are the sister taxon to seed plants and the placement of cycads and Ginkgo with respect to conifers (Fig. 2). There are additional differences in the relationships among the three major clades of conifers (Fig. 2) and within angiosperms (results not shown).

The difference between parsimony trees resulting from the two codon position partitions is even more striking when considering the high bootstrap support associated with many of the branches in each topology (Fig. 3). A list of bootstrap support percentages (% BS) according to different partitions for the concatenated genes is presented in Table 4. For example, a clade that includes Gnetales and Pinaceae is supported by 96% BS according to first and second positions, but this clade is not detected according to third positions (nor all positions). Nevertheless, major clades recognized by both codon partitions are strongly supported; for example, seed plants, cycads, Gnetales, and angiosperms are all supported by 100% BS (Fig. 3, Table 4).
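As a rough illustration of how bootstrap percentages such as those above arise, the sketch below resamples alignment columns with replacement and scores a clade as the percentage of replicate trees that contain it. It does not reproduce the tree searches themselves; the alignment, the replicate clade sets, and all function names are hypothetical.

```python
import random


def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement (one pseudoreplicate)."""
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    # The same column indices are applied to every taxon
    return {taxon: "".join(seq[i] for i in picks) for taxon, seq in alignment.items()}


def clade_support(clade, replicate_trees):
    """Percentage of replicate trees (sets of frozensets) that contain the clade."""
    hits = sum(1 for clades in replicate_trees if clade in clades)
    return 100.0 * hits / len(replicate_trees)


# Hypothetical toy alignment and one pseudoreplicate
toy_alignment = {"GNE": "ACGTAC", "PIN": "ACGTTC", "CYC": "ATGTAC"}
replicate = bootstrap_columns(toy_alignment, random.Random(0))

# Hypothetical replicate trees, already reduced to clade sets
gne_pin = frozenset({"GNE", "PIN"})
replicate_trees = [{gne_pin}, {gne_pin}, {gne_pin}, set()]
support = clade_support(gne_pin, replicate_trees)
print(support)
```

A clade found in fewer than half of the replicates would be absent from the 50% majority rule tree, which corresponds to the "<50%" entries of Table 4.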



Fig. 3. Strict consensus of most parsimonious trees for psaA-psbB. (A) Strict consensus of 510 equally parsimonious trees obtained from first and second codon positions. There is substantial irresolution within angiosperms. (B) Strict consensus of 16 equally parsimonious trees obtained from third codon positions. (C) Strict consensus of 30 equally parsimonious trees obtained from all codon positions. Numbers above branches represent bootstrap support. MAR = Marchantiaceae, HUP = Huperzia, MON = Moniliformopses, CYC = cycads, GKO = Ginkgo, GNE = Gnetales, CON = conifers, PIN = Pinaceae, APP = Araucariaceae plus Podocarpaceae (including Phyllocladaceae), TAX = "taxoids": Taxaceae and Cupressaceae (including Taxodiaceae), ANGIOS = angiosperms

 

Table 4. Bootstrap support for clades. Bootstrap support associated with clades according to parsimony analyses for psaA-psbB first and second positions (psaA-psbB [1–2]), third positions (psaA-psbB [3]), and all positions (psaA-psbB [all]) and maximum likelihood for third positions. Not resolved = relationship not found in the strict consensus of equally parsimonious trees or in the maximum likelihood tree; <50% = relationship resolved in the strict consensus of equally parsimonious trees or in the maximum likelihood tree, but not in the corresponding 50% majority rule tree for bootstrap trees. MP = parsimony; ML = maximum likelihood; [1–2] = first and second positions; [3] = third positions; [all] = all positions

 
In spite of the profoundly different phylogenetic relationships among major clades of seed plants inferred from different codon position partitions, relationships within each major clade are in substantial agreement (Fig. 3A–C). Shared phylogenetic results outside seed plants include the placement of Psilotum within Moniliformopses (and Equisetum in psaA-psbB [1–2]) and a strongly supported Polypodiidae clade, both in agreement with the results of Pryer et al. (2001) . Among the major seed plant clades, there is congruence in detected phylogenetic relationships within cycads and Gnetales. Within conifers, all trees place Pinaceae (alone or together with Gnetales) as the sister to all other conifers. Araucariaceae plus Podocarpaceae form the sister taxon to the "taxoids," except in the first and second positions tree, where the two clades are recognized, but their sister relationship is not. Relationships within the "taxoids" are equal in all trees (Fig. 3A–C). Shared phylogenetic results are comparatively scarce within angiosperms, due, to some extent, to extensive irresolution in the strict consensus tree for psaA-psbB [1–2] (Fig. 3A). However, shared results include the strongly supported Nymphaealean clade (Nuphar plus Nymphaea) and Zea plus Oryza. Monocots and eudicots are recognized in all psaA and psaA-psbB trees (Fig. 3), but monocots are not recognized in the psbB [1–2] trees, and eudicots do not form a clade according to psbB [3] nor psbB [all] (results not shown).

Maximum likelihood analyses
To avoid averaging values for the transition/transversion ratio (ti/tv) and the shape of gamma distribution (α) over the extremely different substitution parameters that characterize each of the two codon position partitions, maximum likelihood analyses were performed separately for first and second and for third codon positions. Parameters for ti/tv and α were estimated in the initial step of the maximum likelihood analyses (HKY85 + Γ; TBR; estimating ti/tv and α; Fig. 1). For first and second positions, the search starting with 1 of 510 equally most parsimonious trees from psaA-psbB [1–2] produced the tree with the greater likelihood score, –ln L = 13 568 (ti/tv = 2.247; α = 0.186). The estimates of ti/tv and α in the search with the alternative starting tree are very close to the selected parameters (ti/tv = 2.137; α = 0.184; Fig. 1). For third codon positions, the search starting from 1 of 30 equally most parsimonious trees obtained from psaA-psbB [all] produced the more optimal tree (–ln L = 30 723). The associated parameters (ti/tv = 4.217, α = 1.362) were used to estimate maximum likelihood trees for third position data in the next step of the maximum likelihood analysis. Parameters resulting from the alternative search for third positions were also very close to the selected values (ti/tv = 4.224; α = 1.335).
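To give a sense of what the two estimated gamma shape parameters imply, the following sketch assumes the usual parameterization in which relative site rates follow a gamma distribution scaled to a mean of one, so that the variance of rates equals the reciprocal of the shape parameter. The function names are illustrative; only the two shape values are taken from the text.

```python
import math


def rate_variance(shape):
    """Variance of relative site rates under a mean-one gamma distribution."""
    return 1.0 / shape


def rate_cv(shape):
    """Coefficient of variation of relative site rates (sd / mean, with mean = 1)."""
    return 1.0 / math.sqrt(shape)


# Shape estimates reported above for the two codon position partitions
shape_estimates = {"psaA-psbB [1-2]": 0.186, "psaA-psbB [3]": 1.362}

for partition, shape in shape_estimates.items():
    print(partition, round(rate_variance(shape), 3), round(rate_cv(shape), 3))
```

Under this assumption, the small shape value for first and second positions corresponds to much greater variation in rate among sites than the value estimated for third positions.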

In the second step of the maximum likelihood analyses, four heuristic searches (HKY85 + Γ; TBR; ti/tv and α specified; Fig. 1) with different starting conditions were undertaken to obtain ML trees for first and second and for third codon positions. The overall ML tree resulting from searches using first and second positions has a score of –ln L = 13 554.64 (Fig. 4A). The four searches using third position data provided the same ML tree, with a score of –ln L = 30 139.63 (Fig. 4B).



Fig. 4. Maximum likelihood trees for psaA-psbB. (A) Maximum likelihood phylogram for first and second codon positions. Relationships among major clades of seed plants are similar to those inferred from the parsimony analysis of the same data: angiosperms are sister to a clade that includes all gymnosperms and Gnetales and Pinaceae are sister taxa. Relationships within angiosperms are incongruent with those found in independent analyses. (B) Maximum likelihood phylogram for third codon positions. Relationships among major clades of seed plants conflict with those found in parsimony analyses for the same data, but are partially similar to those found using first and second codon positions. Differences include the clades resulting from the deepest split within seed plants, but they are similar in resolving Gnetales and Pinaceae as sister taxa. Relationships within major seed plant clades are mostly congruent with those found in independent analyses. Branch lengths were estimated using maximum likelihood (HKY85 + Γ), specifying the corresponding ti/tv and α parameters, with a molecular clock not enforced. Branches subtending angiosperms and Gnetales are very long. Branches leading to terminal taxa of Gnetales and Moniliformopses, and of Pinaceae inferred from first and second positions, are also very long. (C) Maximum likelihood topology for third codon positions. Numbers indicate bootstrap support for branches. Abbreviations for major clades are as in Fig. 3

 
Relationships among major seed plant clades depicted in the two ML trees conflict with one another (Fig. 4A–B). According to the first and second positions ML tree, angiosperms and gymnosperms are sister taxa. The cycads are the sister to all other gymnosperms, and Ginkgo is the sister to conifers (including Gnetales). Gnetales are the sister taxon to Pinaceae and together constitute the sister to a clade that includes all other conifers (Fig. 4A). According to the third positions ML tree, angiosperms and cycads are sister taxa and together form the sister to a clade in which Ginkgo is the sister to the conifers (including Gnetales). Pinaceae and Gnetales are sister taxa and together form the sister to a clade that includes all other conifers (Fig. 4B). The two ML trees also differ in relationships within Moniliformopses and within angiosperms (Fig. 4A–B).

In spite of these profound phylogenetic differences, results shared by both ML trees are the monophyly of seed plants (also resolved in all MP trees), its sister-taxon relation with the Moniliformopses (which includes Equisetum and Psilotum, Fig. 4A–B), and the monophyly of cycads, Gnetales, and angiosperms, but not of conifers. One significant point of congruence between the two ML trees is the linkage of Gnetales and Pinaceae as sister taxa. Additionally, phylogenetic relationships within gymnosperm clades are nearly identical in both ML trees.

Phylogenetic relationships within angiosperms in the first and second positions ML tree are substantially incongruent with relationships obtained in numerous independent phylogenetic analyses (see below; Fig. 4A), and thus, we chose to focus all available computer time on estimating the bootstrap support associated with the ML tree resulting from third positions. The percentages of bootstrap replicates supporting internodes are shown in Fig. 4C, and bootstrap values associated with major clades are listed in Table 4.

Relationships within gymnosperm clades are almost identical in the two ML trees (Fig. 4A–C) and are equivalent to those found in independent studies. Monophyly of cycads and of Gnetales is highly supported in the ML tree for third positions (96% and 100% BS, respectively). Whereas bootstrap support for the non-Pinaceae conifers is very strong (100%), the sister relationship between Pinaceae and Gnetales is very weak (<50% BS), and the conifer-gnetalean clade as a whole is only moderately supported (75% BS). Relationships among non-Pinaceae conifers recognized in several independent studies are highly supported, including, for example, Araucaria plus Podocarpaceae, the "taxoid" clade, and a clade that includes both (82%, 100%, 100% BS, respectively; Fig. 4C). Relationships within gymnosperm clades shared by both ML trees are also present in the strict consensus of MP trees for each of the two codon position partitions (Figs. 3A–B and 4A–B).

Relationships within angiosperms, however, are significantly different in the two ML trees. Whereas within-angiosperm relationships according to first and second codon positions are inconsistent with other results, either from other analyses based on psaA and psbB or from independent data (cf. Fig. 4A vs. Graham and Olmstead, 2000 ; Qiu et al., 1999 ; Soltis et al., 2000 ), phylogenetic relationships within angiosperms in the ML tree for third codon positions are almost entirely congruent with relationships recovered consistently from independent data (e.g., Qiu et al., 1999 , 2000 ; Soltis et al., 1999 , 2000 ; Savolainen et al., 2000 ). These congruent relationships are usually well supported.

Substantial heterogeneity in the length of branches is found in both the first and second positions and the third positions ML trees. Most clades subtended by long branches according to the first and second position data also have long branches according to third position data (i.e., seed plants, Gnetales, Pinaceae, angiosperms, and Poaceae; Fig. 4A–B). Long branches leading to terminal taxa, in both codon partitions, are found within Moniliformopses, Gnetales, and to a much lesser extent, Ginkgo. For first and second positions, genera of Pinaceae are also subtended by long branches, but lengths are comparatively shorter based on third positions. Within angiosperms, both data partitions show that internal branches immediately above the root node are very short, but branches leading to terminal taxa are long (Fig. 4A–B). Long branches within angiosperms probably result, at least to some extent, from sparse taxonomic sampling of extant taxa (see DISCUSSION).


    DISCUSSION
Effect of codon position partition and optimization criterion on phylogenetic results
Use of different codon position partitions resulted in profoundly different phylogenetic hypotheses among major clades of seed plants, supporting previous observations (Sanderson et al., 2000 ) that, at least for psaA and psbB, differences between gene partitions are subordinate to differences between codon position partitions. A similar behavior cannot be recurrently documented for other genes because third positions have only rarely been analyzed in isolation (i.e., Lewis, Mishler, and Vilgalys, 1997 ; Källersjö et al., 1998 ). Nevertheless, several other studies suggest a similar effect in rbcL and atpB when third position data are excluded or downweighted or by using amino acid sequences, rather than including all nucleotide sequences (see Table 1 and Soltis, Soltis, and Zanis, 2002 ).

The results of our parsimony analyses show that the signal of third positions overrides that of first and second positions when analyzed in combination (Figs. 2 and 3A–C), an effect that would not be manifested if third positions contained mostly random noise. The effect of excluding third codon positions from parsimony analyses of plants has been documented by previous authors. Topologies resulting from the use of only third codon positions in the rbcL gene are largely in agreement with those found when all the data are used (Lewis, Mishler, and Vilgalys, 1997 ; Källersjö et al., 1998 ) and with independent sources of evidence (Lewis, Mishler, and Vilgalys, 1997 ), and contrary to expectations, the exclusion of third positions resulted in substantial loss of phylogenetic resolution (Källersjö et al., 1998 ). Similar effects were found in phylogenetic studies of vertebrates (Edwards, Arctander, and Wilson, 1991 ; Björklund, 1999 ). One common conclusion of these studies, corroborated by Sanderson et al. (2000) and by the present study, is that while third positions are highly variable, they retain significant phylogenetic information across land plants, including relationships among major clades, and, therefore, should not be dismissed.

Whereas resolution of substantially conflicting phylogenetic hypotheses from different codon position partitions under parsimony has been documented explicitly or implicitly in previous works (e.g., Chaw et al., 2000 ; Sanderson et al., 2000 ), the resolution of profoundly different hypotheses from the use of different optimization criteria applied to the same type of data, that is, third codon positions, is a new result. Soltis, Soltis, and Zanis (2002) found a similar discrepancy in the results of parsimony and maximum likelihood analyses of four chloroplast genes in which only third positions were included. Nevertheless, divergent phylogenetic results were not found when different optimization criteria were applied to first and second positions (Soltis, Soltis, and Zanis, 2002 ).

The phylogenetic differences among major clades of seed plants according to the MP and ML topologies for third positions are considerable; these include the clade identified as the sister taxon to seed plants (Huperzia or Moniliformopses, respectively), the clades resulting from the deepest bifurcation within seed plants, and the placement of Gnetales (Figs. 3B and 4B). The differences between these topologies cannot be reconciled by simple shifts in the placement of the root of seed plants, nor by minor changes among a few branches. A comparison of the parsimony and likelihood scores of these two topologies further underscores their differences: the ML topology for third codon positions of psaA-psbB (Fig. 4B) has a parsimony score of 7328, which is 120 steps longer than the MP trees obtained using the same data (length = 7208; Table 4). The likelihood scores of the two topologies were compared through 5000 RELL replicates of the Shimodaira-Hasegawa test (Shimodaira and Hasegawa, 1999; Goldman, Anderson, and Rodrigo, 2000), as implemented in PAUP*. The likelihood score of one of the 16 MP trees (tree #3) resulting from the third position data for the concatenated genes was estimated using only third position data via maximum likelihood (HKY85 + Γ, specifying the corresponding ti/tv and α parameters). Its likelihood score is –ln L = 30 179.88, which makes it significantly less likely (P = 0.007*) than the most likely topology obtained using the same data and parameters (–ln L = 30 139.63).
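The score comparison above, and the general logic of a RELL-based Shimodaira-Hasegawa test, can be sketched as follows. This is a simplified illustration of the published procedure (resampling fixed per-site log-likelihoods, centering each tree's replicate scores, and comparing replicate differences with the observed difference), not the PAUP* implementation. The per-site log-likelihoods in the toy example are hypothetical, because site-wise values are not reported here; only the tree lengths and the two –ln L totals come from the text.

```python
import random

# Differences quoted in the text
extra_steps = 7328 - 7208                    # parsimony steps
lnl_gap = round(30179.88 - 30139.63, 2)      # log-likelihood units


def sh_test_rell(site_lnl, n_reps=1000, seed=1):
    """site_lnl maps tree name -> list of per-site lnL; returns tree name -> P value."""
    rng = random.Random(seed)
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    totals = {t: sum(site_lnl[t]) for t in names}
    best = max(totals.values())
    observed = {t: best - totals[t] for t in names}

    # RELL: resample site indices and reuse the fixed per-site log-likelihoods
    reps = {t: [] for t in names}
    for _ in range(n_reps):
        picks = [rng.randrange(n_sites) for _ in range(n_sites)]
        for t in names:
            reps[t].append(sum(site_lnl[t][i] for i in picks))

    # Center each tree's replicate scores on their own mean
    centered = {}
    for t in names:
        mean_t = sum(reps[t]) / n_reps
        centered[t] = [value - mean_t for value in reps[t]]

    # P value: share of replicates whose difference is at least the observed one
    p_values = {}
    for t in names:
        count = 0
        for i in range(n_reps):
            rep_best = max(centered[u][i] for u in names)
            if rep_best - centered[t][i] >= observed[t]:
                count += 1
        p_values[t] = count / n_reps
    return p_values


# Hypothetical per-site log-likelihoods for two competing topologies
toy_site_lnl = {
    "ML_tree": [-1.0, -2.0, -1.5, -3.0, -2.5, -1.2],
    "MP_tree": [-1.4, -2.1, -1.9, -3.6, -2.4, -1.5],
}
p_values = sh_test_rell(toy_site_lnl, n_reps=200)
print(extra_steps, lnl_gap, p_values)
```

By construction the best-scoring tree always receives P = 1; a small P value for the alternative tree indicates that its lower likelihood is unlikely to be due to sampling error alone.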

The difference between the ML trees for first and second and for third positions regarding relationships among major clades of seed plants (Fig. 4A–B) lies only in the placement of the root of the seed plants. A shift in the placement of the root to an adjacent branch in the third positions tree would yield a similar topology to the one obtained using first and second positions, at least regarding relationships among major clades of seed plants. Whereas the differences between the two topologies consist only of a shift in the placement of the root, the historical and evolutionary implications of such change are clearly significant. Additionally, differences within clades of seed plants according to the ML trees resulting from first and second and from third positions are substantial in some phylogenetic regions (i.e., within the Moniliformopses and within the angiosperms).

The greater disagreement between results from parsimony and maximum likelihood analysis of third position data compared to first and second position data is not entirely surprising. The third position data have a much higher average rate of substitution and, when coupled with rate heterogeneity among lineages, are more likely to cause reconstruction algorithms to suffer from long-branch attraction (LBA). The higher levels of homoplasy make it difficult for both parsimony and the substitution models used by maximum likelihood to estimate true branch lengths accurately and consequently obtain a correct tree. A slightly incorrect model will be much less misleading at low rates of substitution than at high rates because of the nonlinearity of corrections induced by the model (Zharkikh, 1994 ). Thus, methods that (effectively) assume different models should diverge progressively more at higher substitution rates. This behavior is evident even in simple pairwise distance correction formulae, many of which use maximum likelihood estimators (Zharkikh, 1994 ).
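The nonlinearity of distance corrections mentioned above can be illustrated with the Jukes-Cantor formula, d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. Jukes-Cantor is used here only because it is the simplest such correction; it is not the HKY85 + gamma model applied in this study, and the chosen values of p are arbitrary.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor corrected distance for an observed proportion of differences p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def correction_sensitivity(p, delta=0.01):
    """Change in corrected distance caused by a small shift delta in p."""
    return jc69_distance(p + delta) - jc69_distance(p)


# Arbitrary low and high levels of observed divergence
low_p, high_p = 0.05, 0.60
low_shift = correction_sensitivity(low_p)
high_shift = correction_sensitivity(high_p)
print(round(low_shift, 4), round(high_shift, 4))
```

The same small shift in the observed proportion of differences changes the corrected distance several times more at high divergence than at low divergence, which is why methods that effectively assume slightly different models are expected to disagree more for rapidly evolving data.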

The disagreements between codon partitions occur with a finite amount of data, but they raise the specter of statistical inconsistency, or long-branch attraction, which is an asymptotic failure of a method to converge on the correct tree as more and more data are added. Little is understood about the combinations of tree topologies, branch lengths and substitution patterns that lead to LBA in large phylogenies. Generalizing from results on four-taxon trees, it is probable that both high substitution rates and rate heterogeneity among lineages can cause parsimony to be inconsistent (Felsenstein, 1978 ), and model misspecification with respect to site-to-site rate variation can cause maximum likelihood to be inconsistent (Chang, 1996 ). We specifically incorporated site-to-site rate variation in the models used in maximum likelihood analyses, which ought to lessen the impact of this source of potential error in the maximum likelihood analyses. However, our results reveal substantial rate heterogeneity among lineages (Fig. 4) and very high rates for the third position data, which might well have led to problems in the parsimony analyses. Sanderson et al. (2000) documented LBA in parsimony analyses of third positions of psbB in a lengthy series of simulation studies in a much smaller taxon sample, but scaling these up to the present data sets would incur an onerous computational burden.

Relationships among major clades of seed plants
Results obtained in our analyses, which have also been found repeatedly in independent studies, provide support for the monophyly of seed plants, of the cycads, Gnetales, and the angiosperms and the fact that angiosperms do not appear to be closely related to any single group of living gymnosperms. The ML tree for third positions of psaA-psbB conflicts with this last point, but the obtained sister relationship between angiosperms and cycads is tenuous, supported only by 52% BS (see RESULTS, Fig. 4C). Major conflicts remain, however, regarding the relationships among major lineages of seed plants, including, in particular, the placement of Gnetales, and consequently, the monophyly of conifers.

Relationships of Gnetales and the monophyly of conifers
A result found in several of our analyses, as well as in several independent studies, is the placement of Gnetales in close phylogenetic proximity to the conifers (e.g., Soltis et al., 1999 ; Bowe, Coat, and dePamphilis, 2000 ; Chaw et al., 2000 ; Table 1). The possibility of a close relationship between Gnetales and the (monophyletic) conifers has been discussed in the pre-anthophyte literature (e.g., Coulter and Chamberlain, 1917 ; Bailey, 1944 , 1953 ; Bierhorst, 1971 ). Carlquist (1996) provides a comprehensive summary of anatomical features of Gnetales, many of which are also present in conifers, including the torus-margo structure on perforations of vascular elements, helical thickenings with intercalated bordered pits in metaxylem tracheids, and ultrastructural features of the sieve elements of the phloem (R. F. Evert, cited in Carlquist, 1996 ). Nevertheless, it is unclear whether these features are derived attributes shared by Gnetales and conifers or represent ancestral features shared with other gymnosperms.

Surprisingly, another feature that may possibly document a close relationship between Gnetales and conifers is the process of double fertilization. Double fertilization has been reported, in addition to angiosperms and Gnetales, in two conifers: Abies balsamea (Pinaceae; Hutchison, 1915 ; Friedman and Floyd, 2001 ) and Thuja occidentalis (Cupressaceae; Land, 1902 ; Friedman and Floyd, 2001 ). It seems reasonable to assume that double fertilization as manifested in Ephedra is closer to the plesiomorphic double fertilization process for Gnetales as a whole than the double fertilization in Gnetum (and Welwitschia). This assumption is based on two combined facts: first, the monosporic and archegoniate nature of the female gametophyte of Ephedra (e.g., Bierhorst, 1971 ; Gifford and Foster, 1989 ; Friedman, 1990 , 1992a , b ) vs. the highly modified, tetrasporic female gametophyte of Gnetum (and Welwitschia; e.g., Carmichael and Friedman, 1995 ; Friedman and Carmichael, 1998 , and references therein), and second, the recurrent placement of Ephedra as the sister taxon of a clade formed by Gnetum and Welwitschia (e.g., Chaw et al., 2000 ; Soltis et al., 2000 ; this study). Among plants in which double fertilization has been reported, the process in Ephedra is most similar to the one reported in two conifers. In these three genera, the two sperm cells (or sperm nuclei, in Ephedra) fertilize the egg cell (or egg nucleus, in Ephedra) and its mitotic sister, the ventral canal cell (or ventral canal nucleus, in Ephedra), inside the archegonium (Friedman, 1990 , 1992a , b , and references therein). In contrast, the female participants in the double fertilization process in Gnetum (and apparently also in Welwitschia) may not only not be mitotic sisters, but, given the tetrasporic origin of the megagametophyte, may be derived from different meiotic products (Carmichael and Friedman, 1995 , 1996 ; Friedman and Carmichael, 1996 , 1998 , and references therein). 
In angiosperms, with their highly modified female gametophyte (i.e., the embryo sac), the female participants in double fertilization (in a Polygonum-type embryo sac) are the egg cell and the two polar nuclei, one of which is the mitotic sister to the egg cell (Thomas, 1907 ; Brink and Cooper, 1947 ; Huang and Russell, 1992 ). Whereas the megagametophytes, and thus, the processes of double fertilization, in Gnetum (and in Welwitschia) and in angiosperms are probably divergently modified from conditions found in other seed plants, it is unknown if the similarities in double fertilization between Ephedra and the two conifers are due to unique derivation from shared ancestry or rather to the manifestation of double fertilization in an ancestral, archegoniate megagametophyte. Clearly, the frequency of double fertilization in conifers, its similarities with double fertilization in Ephedra, and detection of possible synapomorphies between Ephedra and conifers should be the subject of extensive and detailed investigation.

While a phylogenetic proximity between Gnetales and conifers seems quite plausible, the placement of Gnetales within conifers, which implies conifer paraphyly, seems unlikely from several important standpoints. Two especially important characters that support conifer monophyly are the structure of the chloroplast genome and morphological features of the ovuliferous cones.

The chloroplast DNA in most land plants contains two copies of a large inverted repeat of about 10–25 kilobase pairs (kbp), separated by two single-copy sequences of about 20 kbp and 80 kbp (Palmer and Stein, 1986). In an investigation of the structure of the chloroplast genome across land plants, Raubeson and Jansen (1992) documented an important modification shared by all conifers, consisting of the presence of a single copy of the inverted repeat, whereas all other sampled land plants, including the three genera of Gnetales, have two copies. Maps of the entire chloroplast region were consistent with the interpretation that the missing copy of the inverted repeat is the same, suggesting a single loss that characterizes all conifers (Raubeson and Jansen, 1992). The absence of the inverted repeat in some legumes is interpreted as an independent loss (Raubeson and Jansen, 1992). Although the distribution of a single copy of the inverted repeat among the conifers and Gnetales could result from homoplasy between Pinaceae and all other conifers, or from a gain of the lost copy in Gnetales, the fact that homoplasy in structural characters of the genome is less frequent than among sequence data supports the monophyly of conifers.

From a morphological standpoint, one feature that suggests conifer monophyly is their elaborate compound ovuliferous cone. The female reproductive structures of most living conifers are compound cones consisting of a main axis bearing sterile bracts with an ovuliferous scale on their axil and one to many ovules associated with the ovuliferous scales. Each ovuliferous scale corresponds to a highly modified fertile short shoot (i.e., a brachiblast, e.g., Florin, 1951 ; Clement-Westerhof, 1988 ; Mapes and Rothwell, 1991 ). The Gnetales also have compound ovuliferous cones. Although the exact nature of the structures between the axil of the primary bract and the gnetalean ovule are not entirely clear, they seem to correspond more to a conventional, though extremely reduced, brachiblast that bears successive pairs of opposite and decussate bracts that envelope the ovule than to the highly modified ovuliferous scale of the conifers. The placement of Gnetales within the conifers implies either an independent and convergent modification from fertile brachiblasts to yield ovuliferous cone scales in Pinaceae and in the lineage leading to all other conifers, or the deconstruction of the ovuliferous scale into a more conventional axis-like structure in the line leading to Gnetales. Conifer taxa that lack the axis-bract-ovuliferous scale organization (i.e., Podocarpaceae, Phyllocladaceae, and Taxaceae) usually display conditions that can be traced back to the ovuliferous cone organization widespread among conifers. Whereas the ovuliferous scale may be highly modified, highly reduced, or perhaps lost in some conifers, it is not known to have regressed to a conventional axial organization.

Summary and concluding remarks
In this study, we conducted analyses on a comprehensive sample of land plants, based on different codon partitions of psaA and psbB, to investigate phylogenetic relationships among major clades of seed plants. Results of parsimony analyses were mostly congruent with parsimony results based on the same genes for a smaller taxonomic sample (Sanderson et al., 2000), as well as results based on other chloroplast genes (e.g., Bowe, Coat, and dePamphilis, 2000; Chaw et al., 2000). As in previous studies, we found great incongruence in parsimony hypotheses of relationships among major clades of seed plants inferred from different codon partitions. Exhaustive maximum likelihood analyses also revealed conflicting phylogenetic results stemming from the two different partitions, but while the results from first and second codon positions are similar to the first and second position parsimony results, the third position maximum likelihood topology is different from the parsimony topology obtained using the same data. However, the third position maximum likelihood topology is partially similar to topologies obtained from first and second positions.

The conflicting results obtained in this study, as well as in other works (summarized in the introduction), indicate that an unambiguously supported hypothesis of phylogenetic relationships among seed plants has not yet been obtained. Lingering problems are the precise placement of Gnetales and the question of gymnosperm monophyly. In sharp contrast, however, there has been considerable improvement in the resolution of phylogenetic relationships within major clades of seed plants. The consistency of detected relationships within clades usually spans not only different phylogenetic analyses performed in this study, but also results of independent studies, based on different genes.

Resolving phylogenetic relationships among extant seed plants has proven to be an extraordinarily difficult problem, further complicated by the substantial loss of seed plant diversity to extinction and the extremely long time during which the surviving lineages have been evolving independently. During this long period, lineages have probably accumulated convergent molecular character states and evolved uniquely derived morphological attributes, the homologies of which are difficult to trace. Whereas recent research has documented the relevance of the type and treatment of data and different methods of analysis, it appears that a definitive solution to the problem is unlikely to stem from analysis of more sequence data and/or from greater taxon sampling alone. We have shown phylogenetic incongruence resulting from different treatments of the data and believe that, in the case of relationships among major clades of seed plants, adding more genes into phylogenetic analyses may simply provide greater support for conflicting results. Adding more taxa has proven useful in solving particular phylogenetic problems because critically selected taxa may effectively break the long branches. Adding a larger number of taxa is probably one of the reasons why a very substantial improvement in resolution of relationships within angiosperms has been achieved. Nevertheless, sorting relationships among seed plants is a very different problem because, at least comparatively, most angiosperm diversity is living, whereas most seed plant diversity is extinct. Avenues of research that may prove useful include, in addition to the use of different types of molecular data, information from the structure of the genome and a renewed consideration of morphological data for living and fossil seed plants.


Table 1. Continued

 

    FOOTNOTES