Systematics
University of Illinois at Chicago, Department of Biological Sciences, MC 066, 845 West Taylor Street, Chicago, Illinois 60607 USA
Received for publication August 16, 2004. Accepted for publication March 21, 2005.
ABSTRACT
There are two forms of ß-amylase in the Triticeae crop plants wheat, barley, and rye: an endosperm-specific form encoded by two or three closely linked genes, and a tissue-ubiquitous form encoded by a single gene. Both rice and corn have one ubiquitously expressed form encoded by a single gene. This study presents two phylogenetic analyses of ß-amylase gene sequences. First, a phylogenetic analysis of coding sequences from wheat, barley, rye, rice, and corn was expected to clarify the relationship between the endosperm-specific and tissue-ubiquitous forms of the protein. Instead, it illustrates the possible effects of distant outgroups, revealed by conflicting patterns of character state variation that are consistent with different root positions. Next, a broad sample of the monogenomic Triticeae was included in a phylogenetic analysis based on sequences from a portion of the tissue-ubiquitous ß-amylase gene. The results were compared to existing Triticeae gene trees, among which extensive conflict had been noted in the past. This additional gene tree does not completely clarify the complexity of the group, but it sheds further light on reticulate phylogenetic patterns within the tribe, including relationships involving Eremopyrum, Thinopyrum, and the Triticum/Aegilops group.
Key Words: ß-amylase; character conflict; Gramineae; outgroups; phylogeny; Poaceae; tree comparisons; Triticeae
In grasses, ß-amylase (1,4-α-glucan maltohydrolase; E.C. 3.2.1.2) has been characterized from five major cereal crops: wheat, barley, rye, rice, and corn. The representatives of the Triticeae (wheat, barley, and rye) have two distinct forms of ß-amylase, which differ in their expression patterns: one form is specific to the endosperm, while the other has a tissue-ubiquitous pattern of expression (e.g., Ziegler, 1999). The endosperm-specific form has been most extensively studied in barley because of its importance to the brewing industry: of the four enzymes that contribute to the diastatic power of barley malt (Dunn, 1974), i.e., its ability to convert starch into fermentable sugars, ß-amylase is considered to be the principal contributor (Gibson et al., 1995). The endosperm-specific enzyme is characterized by a glycine-rich region at the 3' end that is subject to extensive post-translational modification (Lundgard and Svensson, 1987; Bureau et al., 1989). In barley, it is encoded by two to three closely linked genes on chromosome 4 (Kreis et al., 1987; Li et al., 2002).
The ubiquitously expressed form of ß-amylase lacks the 3' glycine-rich tail seen on the endosperm-specific form. Wheat, barley, and rye all express a tissue-ubiquitous form of ß-amylase that appears, based on studies in barley, to be encoded by a single gene on chromosome 2 (Kreis et al., 1987; Li et al., 2002). A second, highly divergent and possibly paralogous gene is transcribed ubiquitously in wheat tissues, but a corresponding protein has not been detected (Wagner et al., 1999). Unlike wheat, barley, and rye, both rice and corn have just a single, ubiquitously expressed form of ß-amylase, which appears to be encoded by a single gene in corn (Wang et al., 1997) and rice (Yamaguchi et al., 1999).
The genes for ß-amylase have not been extensively explored as phylogenetic markers, although a few papers have discussed broader-scale relationships among grass ß-amylase sequences (e.g., Daussant et al., 1994; Wang et al., 1997; Ziegler, 1999), and a portion of the gene has been used in a lower-level phylogenetic analysis of Ipomoea series Batatas (Rajapakse et al., 2004). The many advantages of using single- and low-copy nuclear genes for plant phylogenetic studies, in addition to the more commonly used chloroplast DNA and highly repetitive nuclear genes, have recently been reviewed (Small et al., 2004). In the case of the wheat tribe, Triticeae Dumort., the reconstruction of its complex reticulate history requires data from several genes, representing different portions of the genome.
In the past 10 years, the Triticeae have been the focus of numerous molecular phylogenetic analyses based on data from chloroplast DNA markers, high-copy nuclear genes, and low- or single-copy nuclear genes. Some of the earlier data sets (Hsiao et al., 1995b; Kellogg and Appels, 1995; Mason-Gamer and Kellogg, 1996b) showed appreciable conflict with one another (Mason-Gamer and Kellogg, 1996a), a possible result of past hybridization among the genera, lineage sorting of ancestral variation, introgression involving polyploid intermediates, or a combination of these (Kellogg et al., 1996). Although the phylogenetic relationships among Triticeae genera are not consistent with a single bifurcating tree, it should be possible to clarify reticulate patterns with the use of multiple gene trees, based on molecular markers from throughout the genome.
The goals of this paper are twofold. The first is to examine the relationships among the endosperm-specific and tissue-ubiquitous forms of grass ß-amylase using coding sequences from GenBank. This analysis will be used to demonstrate a case of character conflict within the data set that simultaneously supports two alternative hypotheses about the placement of the root of the tree. The second goal is to present a ß-amylase gene tree for the monogenomic Triticeae, using partial sequences of the single-copy, ubiquitously expressed ß-amylase gene. This will be followed by a discussion of other published phylogenetic analyses of the Triticeae, with emphasis on how they compare to selected groups on the ß-amylase gene tree.
MATERIALS AND METHODS
Relationships among Poaceae sequences
Selected ß-amylase coding sequences from GenBank were analyzed with the goal of clarifying the relationships among the genes encoding endosperm and ubiquitous ß-amylase in grasses. The analysis included sequences encoding endosperm and ubiquitous ß-amylase from Hordeum vulgare L., Secale cereale L., and Triticum aestivum L.; the single, ubiquitously expressed proteins from Zea mays L. and Oryza sativa L.; and as outgroups, sequences from the dicotyledons Castanea crenata Sieb. & Zucc., Ipomoea batatas (L.) Lam., and Trifolium repens L. (Table 1).
The data set was tested for nucleotide stationarity using PAUP* 4.0b10 and was shown to deviate significantly from stationarity (P < 0.0001), violating an assumption that underlies the use of maximum likelihood (ML) models that incorporate nucleotide frequencies. Three sequences were removed to achieve stationarity (P = 0.976): Castanea, Trifolium, and the highly divergent sequence from Triticum. Maximum-likelihood analyses were carried out on the remaining taxa using a general time-reversible (GTR) model of sequence evolution (Rodríguez et al., 1990), with some proportion of sites assumed to be invariable (I; Hasegawa et al., 1985), and variation among the remaining sites assumed to follow a gamma (Γ) distribution (Yang, 1993; Gu et al., 1995; Waddell and Penny, 1996) with shape parameter α. Parameters for the tree search under the GTR + I + Γ model were first estimated by maximum likelihood on the shortest maximum parsimony (MP) trees and were fixed for an initial ML tree search. Parameters were further optimized using a successive approximations approach (e.g., Sullivan et al., 1996; Swofford et al., 1996; Frati et al., 1997), in which parameters are re-estimated on the resulting ML tree, fixed for a new search, and re-estimated on the resulting tree, until the same tree is found in two successive searches. Recent empirical tests indicate that this approach is robust to starting tree topology and suggest that it should, under most circumstances, yield results indistinguishable from those obtained using full ML optimization searches (Sullivan et al., in press). For this data set, the second ML tree had the same score as the first. Estimated model parameters based on this tree were: nucleotide frequencies A = 0.26706, C = 0.24645, G = 0.27549, T = 0.21101; relative nucleotide substitution rates AC = 1.66349, AG = 3.90720, AT = 1.05719, CG = 2.17106, CT = 5.83171, GT = 1.00000; I = 0.2488; and α = 1.3476. Support for this tree was estimated using 1000 heuristic ML bootstrap replicates under the GTR + I + Γ model, with the same parameters as above. Bayesian posterior probability values were obtained using MrBayes version 3.0 (Huelsenbeck and Ronquist, 2001). Markov chain Monte Carlo analyses were run with random starting trees and four simultaneous chains, one cold and three incrementally heated. Analyses were run for 5 000 000 generations, with flat prior distributions and with a burn-in of 100 000 generations.
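A clade's Bayesian posterior probability is, in effect, the fraction of trees sampled after burn-in that contain that clade. MrBayes computes this summary itself; the Python sketch below only illustrates the calculation. The function name, the representation of each sampled tree as a set of clades, and the toy samples (with generation numbers scaled down) are hypothetical.

```python
from collections import Counter


def clade_posteriors(samples, burnin_generations):
    """Estimate clade posterior probabilities from MCMC tree samples.

    samples: list of (generation, clades) pairs; `clades` is a set of
        frozensets of taxon names, one per internal node of the sampled tree.
    burnin_generations: samples taken at or before this generation are
        discarded as burn-in.
    """
    kept = [clades for generation, clades in samples if generation > burnin_generations]
    if not kept:
        raise ValueError("no samples remain after burn-in")
    # Count how many retained trees contain each clade.
    counts = Counter(clade for clades in kept for clade in clades)
    return {clade: n / len(kept) for clade, n in counts.items()}


# Toy cold-chain samples (hypothetical): one tree every 100 generations.
AB = frozenset({"A", "B"})
AC = frozenset({"A", "C"})
samples = [(0, {AB}), (100, {AC}), (200, {AB}), (300, {AC}), (400, {AB})]
posteriors = clade_posteriors(samples, burnin_generations=100)
print(posteriors[AB], posteriors[AC])
```

With a burn-in of 100 generations, three samples remain, so clade {A, B} receives a posterior probability of 2/3 and {A, C} receives 1/3.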
Two a priori hypotheses of relationships among the grass ß-amylase sequences (Fig. 1A and B), which differ in the placement of the root of the tree, were compared using an MP-based Wilcoxon signed-ranks (WSR; Templeton, 1983) test and ML-based Kishino-Hasegawa (KH; Kishino and Hasegawa, 1989) and Shimodaira-Hasegawa (SH; Shimodaira and Hasegawa, 1999) tests. Hypothesis 1 (Fig. 1A) is that the grass sequences group by expression pattern, with the Triticeae endosperm-specific sequences in one clade and the ubiquitously expressed sequences from corn, rice, and the Triticeae in another. This was suggested by an earlier distance-based analysis of several grass and dicotyledon ß-amylase sequences (Wang et al., 1997) and would imply that the gene duplication leading to the dichotomy between the ubiquitously expressed and endosperm-specific forms of the enzyme arose relatively early during grass evolution (or before the origin of grasses), and that the ortholog corresponding to the endosperm-specific form in the Triticeae was lost from rice and corn. Hypothesis 2 (Fig. 1B) is that all of the Triticeae sequences, endosperm-specific and tissue-ubiquitous, form a single clade relative to rice, corn, and the dicotyledonous outgroups, i.e., that the Triticeae endosperm-specific sequences arose via a later gene duplication closer to the origin of the Triticeae (Daussant et al., 1994; Ziegler, 1999). This would require no subsequent losses in rice and corn.
Phylogenetic analysis of the monogenomic Triticeae
Phylogenetic analyses were done on a broad sample of monogenomic, mostly diploid members of the Triticeae (Table 2). The phylogenetic estimate is based on a 1400-base pair (bp) portion of the ubiquitously expressed ß-amylase gene (Fig. 2), amplified using a forward primer in exon 2 and a reverse primer in exon 5 (Fig. 2, Table 3). Amplification reactions were carried out in a 10-µL volume using 0.5 units Taq DNA polymerase (Invitrogen, Carlsbad, California, USA), a 1x concentration of the included Taq buffer, 15 nmol MgCl2, 2 nmol of each nucleotide, and 10 pmol of each primer. Amplification products were cloned into pGEM-T Easy vectors (Promega, Madison, Wisconsin, USA) using the Promega PCR cloning kit according to the manufacturer's instructions, except that reaction volumes were halved. Cloned products were amplified directly from colonies in 40-µL PCR reactions with 0.5 units Taq DNA polymerase, a 1x concentration of Taq buffer, 60 nmol MgCl2, 8 nmol of each nucleotide, and 40 pmol of each primer. Amplified clones were cleaned using 1 unit shrimp alkaline phosphatase (USB, Cleveland, Ohio, USA) and 5 units exonuclease I (USB); the mixture was incubated at 37°C for 15 min to allow the enzymatic reactions to proceed and then at 75°C for 15 min to inactivate the enzymes.
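Because the reagent amounts above are given per reaction rather than as concentrations, a quick unit conversion helps compare the two reaction types. The Python sketch below (names are illustrative) shows that the 10-µL amplifications and the 40-µL colony PCRs used the same final concentrations: 1.5 mM MgCl2, 0.2 mM of each nucleotide, and 1 µM of each primer.

```python
def final_concentration(amount, volume_uL):
    """Final concentration of a reagent in a reaction.

    nmol per uL equals mM, and pmol per uL equals uM, so the same
    division gives mM for nmol amounts and uM for pmol amounts.
    """
    return amount / volume_uL


# Amounts per reaction, as given in the text.
reactions = {
    "initial PCR (10 uL)": {"volume_uL": 10, "MgCl2_nmol": 15, "dNTP_nmol_each": 2, "primer_pmol_each": 10},
    "colony PCR (40 uL)": {"volume_uL": 40, "MgCl2_nmol": 60, "dNTP_nmol_each": 8, "primer_pmol_each": 40},
}

for name, r in reactions.items():
    v = r["volume_uL"]
    mg = final_concentration(r["MgCl2_nmol"], v)            # mM
    dntp = final_concentration(r["dNTP_nmol_each"], v)      # mM of each nucleotide
    primer = final_concentration(r["primer_pmol_each"], v)  # uM of each primer
    print(f"{name}: MgCl2 {mg} mM, each dNTP {dntp} mM, each primer {primer} uM")
```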
Tree searches were done using ML under the GTR + I + Γ model of sequence evolution. First, because of the size of the data set and the time required to run analyses under this relatively complex model, the GTR + I + Γ model was compared to 15 other models (e.g., Swofford et al., 1996; Frati et al., 1997; Sullivan et al., 1997) to determine whether a simpler, less computationally intensive model would be sufficient. Four nucleotide substitution models were examined: Jukes-Cantor (Jukes and Cantor, 1969), Kimura two-parameter (Kimura, 1980), Hasegawa-Kishino-Yano (HKY; Hasegawa et al., 1985), and GTR (Rodríguez et al., 1990). Each substitution model was paired with each of four models of among-site rate variation: (1) no rate heterogeneity; (2) some sites invariable (I; Hasegawa et al., 1985) with equal rates of change among the remaining sites; (3) rate heterogeneity among sites following a gamma distribution (Γ; Yang, 1993); and (4) some sites invariable, with gamma-distributed variation among the remaining sites (I + Γ; Gu et al., 1995; Waddell and Penny, 1996). Models were compared based on ML scores estimated for all 14 trees obtained in an equally weighted MP analysis. The best score (GTR + I + Γ) was compared to the two next-best scores (HKY + I + Γ and GTR + Γ) using a likelihood ratio test (Felsenstein, 1981; Huelsenbeck and Crandall, 1997; Huelsenbeck and Rannala, 1997; Sanderson, 1998), which showed the GTR + I + Γ model to have the best fit to the data (P < 0.05, Bonferroni-corrected for two non-independent comparisons).
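The model-selection arithmetic can be sketched in Python as follows. Function names and the log-likelihood values are hypothetical placeholders, not results from this study, and "+G" stands for +Γ. Crossing four substitution models with four rate-variation schemes gives 16 candidates (GTR + I + Γ plus 15 others). In a likelihood ratio test, twice the log-likelihood difference between nested models is compared with a chi-square distribution whose degrees of freedom equal the difference in free parameters. That difference is 4 for GTR + I + Γ vs. HKY + I + Γ and 1 for GTR + I + Γ vs. GTR + Γ. With a Bonferroni correction for two comparisons, each test is evaluated at 0.05/2 = 0.025.

```python
import itertools
import math

# Free parameters per model, excluding branch lengths (which are the same
# for every model when all are scored on the same tree).
SUBSTITUTION_PARAMS = {"JC": 0, "K2P": 1, "HKY": 4, "GTR": 8}  # rates + base freqs
RATE_VARIATION_PARAMS = {"": 0, "+I": 1, "+G": 1, "+I+G": 2}   # pinv and/or alpha


def n_params(model):
    subst, rates = model
    return SUBSTITUTION_PARAMS[subst] + RATE_VARIATION_PARAMS[rates]


# 4 substitution models x 4 rate-variation schemes = 16 candidate models.
CANDIDATES = list(itertools.product(SUBSTITUTION_PARAMS, RATE_VARIATION_PARAMS))


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{k=0}^{df/2 - 1} h^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= h / k
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + sum_{k=1}^{(df-1)/2} exp(-h) h^(k-1/2) / Gamma(k+1/2)
    total = math.erfc(math.sqrt(h))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h + (k - 0.5) * math.log(h) - math.lgamma(k + 0.5))
    return total


def lrt(lnl_simple, lnl_complex, df):
    """Likelihood ratio test of nested models: returns (2 * delta lnL, P)."""
    stat = 2.0 * (lnl_complex - lnl_simple)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods, NOT values from this study.
lnl = {("GTR", "+I+G"): -10000.0, ("HKY", "+I+G"): -10012.0, ("GTR", "+G"): -10004.0}
best = ("GTR", "+I+G")
alpha = 0.05 / 2  # Bonferroni correction for two comparisons
for simpler in [("HKY", "+I+G"), ("GTR", "+G")]:
    df = n_params(best) - n_params(simpler)
    stat, p = lrt(lnl[simpler], lnl[best], df)
    verdict = "reject simpler model" if p < alpha else "keep simpler model"
    print(simpler, f"df={df} 2dlnL={stat:.1f} P={p:.2g}", verdict)
```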
The GTR + I + Γ parameters estimated on the best MP tree were fixed for an ML tree search, with the starting tree obtained by stepwise addition with 10 trees held at each step. Further optimization was done using a successive approximations approach as described for the exon data set. The second ML tree topology was identical to the first and was used as the phylogenetic estimate for the tribe. The estimated model parameters were: nucleotide frequencies A = 0.31903, C = 0.21553, G = 0.20887, T = 0.25657; relative substitution rates AC = 1.35197, AG = 4.37860, AT = 1.26551, CG = 2.42244, CT = 5.44094, GT = 1.00000; I = 0.3063; and α = 1.8357. Support for the tree was estimated using 100 ML bootstrap replicates under the GTR + I + Γ model, with the same fixed model parameters as were estimated from the tree. Bayesian posterior probability values were obtained using MrBayes version 3.0 (Huelsenbeck and Ronquist, 2001) following the same procedure as used for the exon sequences.
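The base frequencies and relative rates reported above fully specify the GTR component of the model. As an illustration (not part of the original analysis), the Python sketch below assembles the corresponding instantaneous rate matrix Q. The invariable-sites proportion and the Γ shape parameter act on top of this matrix and are not shown. Scaling Q to one expected substitution per unit branch length follows the usual convention and is an assumption here.

```python
BASES = "ACGT"


def gtr_rate_matrix(freqs, rates):
    """Normalized GTR instantaneous rate matrix Q (rows and columns in ACGT order).

    freqs: equilibrium base frequencies, e.g. {"A": 0.319, ...}.
    rates: the six relative exchangeabilities keyed "AC", "AG", "AT", "CG",
        "CT", "GT" (GT conventionally fixed at 1).
    Off-diagonal entries are Q[i][j] = r_ij * pi_j, each diagonal entry makes
    its row sum to zero, and Q is rescaled so that the mean substitution rate
    sum_i pi_i * (-Q[i][i]) equals 1.
    """
    def r(a, b):
        # Exchangeabilities are symmetric: r_CA is stored as "AC".
        return rates.get(a + b, rates.get(b + a))

    q = [[0.0] * 4 for _ in BASES]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                q[i][j] = r(a, b) * freqs[b]
        q[i][i] = -sum(q[i])  # diagonal is still 0 here, so this sums off-diagonals
    mean_rate = -sum(freqs[a] * q[i][i] for i, a in enumerate(BASES))
    return [[x / mean_rate for x in row] for row in q]


# Parameter estimates reported above for the Triticeae data set.
freqs = {"A": 0.31903, "C": 0.21553, "G": 0.20887, "T": 0.25657}
rates = {"AC": 1.35197, "AG": 4.37860, "AT": 1.26551,
         "CG": 2.42244, "CT": 5.44094, "GT": 1.00000}
Q = gtr_rate_matrix(freqs, rates)
for base, row in zip(BASES, Q):
    print(base, " ".join(f"{x:8.4f}" for x in row))
```

The resulting matrix satisfies detailed balance (pi_i Q_ij = pi_j Q_ji), which is what makes the model time-reversible. The large AG and CT rates show up as the biggest off-diagonal entries, reflecting the excess of transitions.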
RESULTS
Relationships among Poaceae sequences
The MP analysis (Fig. 3A) resulted in three trees of length 1661, with consistency index excluding uninformative characters of 0.658, retention index of 0.706, and rescaled consistency index of 0.505. The grass sequences form a monophyletic group relative to the three dicotyledon sequences. The strongly divergent, ubiquitously transcribed wheat sequence (Y16242) falls at the base of the grasses, well outside of all of the other sequences from the Triticeae. The ML tree (Fig. 3B), which excludes Castanea, Trifolium, and the divergent Triticum sequence, is consistent with the MP tree. On both trees, the Triticeae endosperm-specific and tissue-ubiquitous sequences form a well-supported clade relative to the tissue-ubiquitous sequences of Oryza and Zea.
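For reference, these tree statistics follow from per-character step counts: CI = M/S, RI = (G - S)/(G - M), and RC = CI × RI, where M, S, and G are the minimum possible, observed, and maximum possible numbers of steps summed over characters. A Python sketch with toy data follows. The function name is hypothetical, and the sketch does not use the study's data.

```python
def parsimony_indices(min_steps, obs_steps, max_steps, exclude_uninformative=False):
    """Ensemble consistency (CI), retention (RI), and rescaled consistency (RC) indices.

    For each character k: min_steps[k] is the fewest changes possible on any
    tree (m), obs_steps[k] the changes on the tree being scored (s), and
    max_steps[k] the most changes needed on any tree (g).
    Parsimony-uninformative characters are those with g == m.
    """
    chars = list(zip(min_steps, obs_steps, max_steps))
    if exclude_uninformative:
        chars = [c for c in chars if c[2] > c[0]]
    m = sum(c[0] for c in chars)
    s = sum(c[1] for c in chars)
    g = sum(c[2] for c in chars)
    ci = m / s
    ri = (g - s) / (g - m)
    return ci, ri, ci * ri


# Toy data for three characters; the first is uninformative (m = s = g = 1).
m, s, g = [1, 1, 1], [1, 2, 1], [1, 3, 2]
print(parsimony_indices(m, s, g))
print(parsimony_indices(m, s, g, exclude_uninformative=True))
```

Dropping the uninformative character lowers the CI (from 0.75 to about 0.67) but leaves the RI unchanged at 2/3, because such a character adds the same amount to G, S, and M.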
Because character state variation among the outgroups could change the interpretation of character state distribution within the grasses, WSR tests were rerun on the nucleotide data with each of the other outgroups and with all three. With Castanea as the outgroup, the tree constrained by Hypothesis 1 required 952 steps, 13 steps more than the tree constrained by Hypothesis 2. There were 77 characters that differed in the number of changes on the two trees, with 45 requiring more steps on the tissue-ubiquitous monophyly constraint, and 32 requiring more on the Triticeae monophyly constraint. The difference in length was not significant (P = 0.139; N = 77; TS = 1248). With Trifolium as the outgroup, trees constrained by Hypothesis 1 required just five steps more than those constrained by Hypothesis 2 (948 vs. 943 steps). Of the 83 characters that differed in number of changes, 44 required more steps on the tissue-ubiquitous monophyly constraint, and 39 required more on the Triticeae monophyly constraint. The difference in tree length was not significant (P = 0.583; N = 83; TS = 1638). Finally, with all three outgroups, the tree constrained by Hypothesis 1 was 16 steps longer than the tree constrained by Hypothesis 2 (1394 vs. 1378 steps), with 81 characters differing in number of changes on the two trees. Of these, 48 required more steps on the tissue-ubiquitous monophyly constraint, while 33 required more steps on the Triticeae monophyly constraint. The 16-step increase in length was not significant (P = 0.081; N = 81; TS = 1336.5).
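The step counts reported above are enough to reproduce the Templeton test statistics. The Python sketch below implements the test with a normal approximation and tie correction. The function name is hypothetical, and the code is not PAUP*'s own implementation. The per-character difference vectors are reconstructions consistent with the reported counts: the counts come from the text, but which characters carry which differences does not. For the three-outgroup case, one character is assumed to differ by two steps, which is needed to reach the 16-step total. With these inputs the sketch returns the reported N and TS for all three comparisons and P values within about 0.001 of those reported; small discrepancies may reflect implementation details.

```python
import math


def templeton_wsr(step_diffs):
    """Templeton's (1983) Wilcoxon signed-ranks test, normal approximation.

    step_diffs: per-character step differences between two trees
    (tree 1 minus tree 2); characters with no difference are dropped.
    Returns (N, TS, two-tailed P), where TS is the smaller of the two
    signed-rank sums.
    """
    d = [x for x in step_diffs if x != 0]
    n = len(d)
    order = sorted(range(n), key=lambda k: abs(d[k]))
    ranks = [0.0] * n
    tie_sizes = []
    i = 0
    while i < n:
        # Find the run of tied absolute differences and give each its mean rank.
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        mean_rank = (i + j + 2) / 2.0  # 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    w_minus = sum(r for r, x in zip(ranks, d) if x < 0)
    ts = min(w_plus, w_minus)
    expected = n * (n + 1) / 4.0
    variance = n * (n + 1) * (2 * n + 1) / 24.0 - sum(t**3 - t for t in tie_sizes) / 48.0
    z = (ts - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-tailed
    return n, ts, p


# Differences (Hypothesis 1 tree minus Hypothesis 2 tree) consistent with the counts above.
castanea = [1] * 45 + [-1] * 32
trifolium = [1] * 44 + [-1] * 39
all_three = [1] * 47 + [2] + [-1] * 33  # assumed: one character differs by 2 steps
for label, diffs in [("Castanea", castanea), ("Trifolium", trifolium), ("all three", all_three)]:
    n, ts, p = templeton_wsr(diffs)
    print(f"{label}: N = {n}, TS = {ts}, P = {p:.3f}")
```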
The inability of the ß-amylase exon characters to distinguish between the two hypotheses is further illustrated by the results of both the SH and KH tests (P = 0.075 and P = 0.152, respectively). Furthermore, comparisons of likelihood scores of each character on both trees showed a pattern of character conflict within the data set similar to that seen in the WSR test; some characters have better scores in analyses constrained by Triticeae monophyly (Fig. 4, left side), while others have better scores when constrained by tissue-ubiquitous monophyly (Fig. 4, right side).
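The per-character comparison in Fig. 4 and the KH test both start from site-wise log-likelihoods of the same alignment on the two constrained trees. The Python sketch below shows the normal-approximation form of the KH test (Kishino and Hasegawa, 1989), together with a count of sites favoring each tree. The site log-likelihoods are toy values and the function name is hypothetical. PAUP* may implement the test differently, for example with RELL bootstrapping. The SH test also requires resampling over the set of candidate trees and is not shown.

```python
import math


def kh_test(site_lnl_1, site_lnl_2):
    """Kishino-Hasegawa test (normal approximation) on site log-likelihoods.

    Both lists hold per-site log-likelihoods of one alignment, scored on
    tree 1 and tree 2. Returns (D, z, two-tailed P, n1, n2), where
    D = lnL(tree 1) - lnL(tree 2), and n1 and n2 count the sites that
    fit tree 1 or tree 2 better.
    """
    d = [a - b for a, b in zip(site_lnl_1, site_lnl_2)]
    n = len(d)
    total = sum(d)
    mean = total / n
    # Variance of the summed difference, estimated from the site-wise spread.
    var_total = n / (n - 1) * sum((x - mean) ** 2 for x in d)
    z = total / math.sqrt(var_total)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    n1 = sum(1 for x in d if x > 0)
    n2 = sum(1 for x in d if x < 0)
    return total, z, p, n1, n2


# Toy site log-likelihoods (hypothetical) for a 4-site alignment.
tree_1 = [-5.0, -3.2, -4.1, -6.0]
tree_2 = [-5.3, -3.0, -4.4, -6.2]
D, z, p, n1, n2 = kh_test(tree_1, tree_2)
print(f"D = {D:.2f}, z = {z:.2f}, P = {p:.2f}; sites favoring tree 1: {n1}, tree 2: {n2}")
```

Plotting the site-wise differences `d` for a real alignment, with sites favoring each tree on opposite sides, gives a display of the kind shown in Fig. 4.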
95%) posterior probability, but very low (<50%) ML bootstrap values. (2) Eremopyrum is polyphyletic, with E. distans and E. orientale grouped with Agropyron cristatum and E. bonaepartis with Henrardia. (3) Thinopyrum bessarabicum and Thinopyrum elongatum, representing the only two diploid species within an evolutionarily complex genus, do not form a clade. Thinopyrum elongatum is part of a well-supported clade with Aegilops, Triticum, Crithopsis, and Taeniatherum, while Thinopyrum bessarabicum falls outside of that clade. (4) Sequences from the Triticum/Aegilops (wild wheat) group are not monophyletic. Most fall within a well-supported clade with Thinopyrum elongatum, Crithopsis and Taeniatherum. Within that clade, sequences from A. bicornis, A. caudata, and A. tauschii form a weak clade with Thinopyrum elongatum, while Aegilops comosa, Triticum monococcum, and T. baeoticum are in a well-supported clade with Taeniatherum. Aegilops uniaristata and a second sequence type from A. comosa form a clade that is distant from the other Triticum/Aegilops sequences. An additional round of PCR and cloning of ß-amylase from Aegilops failed to yield the more common Aegilops sequence type from A. uniaristata or the less common type from A. bicornis, A. caudata, or A. tauschii.
95%) posterior probability values, but low (<50%) ML bootstrap support; most of these will not be discussed in detail.
DISCUSSION
Relationships among Poaceae sequences
Triticum is unique (so far) among grasses in having two very distinct ubiquitously transcribed ß-amylase genes. The function of one of the two genes is not clear, because its protein product has not been detected (Wagner et al., 1999). Based on its high level of amino acid sequence divergence relative to the more typical Triticum ubiquitous form (Wagner et al., 1999) and its basal placement relative to other grass sequences in the present analysis, it appears to represent a distinct lineage. However, until the exact function (if any) of this gene is elucidated and/or its ortholog is detected in additional grass species, the details of its evolutionary history and significance will not be clear.
Among the remaining grass ß-amylase sequences, one hypothesis of relationships, derived from a dendrogram of ß-amylase amino acid sequences (Wang et al., 1997), suggests that the ubiquitous and endosperm-specific sequences form separate evolutionary lineages. This would be consistent with a gene duplication early in the evolutionary history of grasses, before the deep divergence that separates the Panicoideae (which includes Zea) from the Ehrhartoideae and Pooideae (which include Oryza and the Triticeae, respectively). This scenario would require the loss of the endosperm-specific ortholog from rice and corn. In fact, given the likely relationship among the three subfamilies [(Panicoideae, (Ehrhartoideae, Pooideae)); GPWG, 2001], two independent losses would be needed to explain the pattern: one in the lineage leading to Zea and one in the lineage leading to Oryza. A second hypothesis is that the two forms in the Triticeae arose via a more recent gene duplication, closer in time to the origin of the tribe (Daussant et al., 1994; Ziegler, 1999), in which case all of the Triticeae sequences would form a monophyletic group relative to the rice and corn sequences. This is more intuitively appealing because it does not require subsequent losses from rice and corn. Both the MP and ML trees are more consistent with the latter hypothesis.
Although the nucleotide data are more consistent with the second hypothesis above, they are unable to reject the first hypothesis in WSR, KH, or SH tests. The patterns of character state variation, apparent in both MP and ML analyses of how well individual characters fit each hypothesis, clearly reveal ambiguous signal. Because the difference between the two hypotheses involves the placement of the root of the tree, the observed ambiguity may result from the use of a too-distant outgroup. Enough time may have passed since the monocot-dicot divergence that informative signal in the dicot outgroup sequences is effectively random relative to the grasses. Too-distant outgroups can be problematic if enough time has passed that phylogenetic signal is obliterated by homoplasious character states shared by the long branch leading to the outgroup and any nonbasal ingroup lineage (Miyamoto and Boyle, 1989; Wheeler, 1990), leading to the topological phenomenon of long branch attraction (e.g., Felsenstein, 1978; Hendy and Penny, 1989; Kim, 1996; Huelsenbeck, 1998). Topological ambiguities due to distant outgroups have been explored in several studies of plants (e.g., Qiu and Palmer, 1999; Qiu et al., 2001; Graham et al., 2002; Xiang et al., 2002). In the present case, the topology is the same for each outgroup or combination of outgroups, and, unlike in some of the previous studies, the placement of the root is relatively well supported. It is the pattern of character conflict that, in this case, reveals a potential problem with the outgroup. Character conflict is demonstrated on a broad scale with the WSR, SH, and KH tests, and in more detail with MP and ML character scores across the two main competing hypotheses. Thus, the unavailability of an appropriate ß-amylase outgroup sequence has confounded the initial goal of clarifying the relationship between the tissue-ubiquitous and endosperm-specific ß-amylases in grasses. The data do, however, provide a convincing demonstration of character-level conflict in a situation where it might easily have been overlooked: where a tree with moderate to high bootstrap support and high posterior probability is in agreement with an intuitively reasonable hypothesis.
Phylogenetic analysis of the monogenomic Triticeae
A detailed description of relationships among Triticeae genera has been confounded for two major reasons. First, in some of the studies that sample widely throughout the tribe, the relationships among the genera are unresolved or poorly supported. Second, even in cases where well-supported intergeneric relationships have been recovered, there are numerous conflicts among published phylogenetic trees. Because the conflicts involve many taxa, and because there are no data sets in total agreement, it is difficult to develop even a simplified core phylogeny by removing problematic taxa or by disregarding one or two especially odd data sets. For some genera, the combination of conflict and poor resolution means that no strong conclusions can be drawn about their relationships. Other relationships have appeared repeatedly as more gene trees have been added. In these cases, we can begin to draw phylogenetic conclusions about certain taxa or at least portions of their genomes.
The discussion that follows is not an attempt to compare all published Triticeae trees in terms of the placement of every taxon. Instead, the intent is to use clades from the ß-amylase tree as examples to illustrate varying amounts of conflict when multiple trees are considered. The trees used for comparison with the ß-amylase tree were based on (1) combined analysis of cpDNA data from the trnT, trnL, and trnF noncoding spacers, the RNA polymerase α-subunit (rpoA) gene, and restriction sites (Mason-Gamer et al., 2002), or individual analyses of restriction sites (Mason-Gamer and Kellogg, 1996b) and/or rpoA (Petersen and Seberg, 1997); (2) integrated and/or individual analyses of three highly repetitive nuclear DNA loci (Kellogg et al., 1996), including two 5S rDNA spacer loci (long spacers and short spacers; Kellogg and Appels, 1995) and the internal transcribed spacer of the nuclear ribosomal DNA repeat (ITS; Hsiao et al., 1995b); (3) sequences from a single-copy nuclear granule-bound starch synthase I gene (GBSSI; Mason-Gamer, 2001); (4) sequences of the single-copy disrupted meiotic cDNA gene (DMC1; Petersen and Seberg, 2002); (5) sequences from one member of the small phosphoenolpyruvate carboxylase gene family (PEPC; Helfgott and Mason-Gamer, 2004); and (6) a comprehensive morphological analysis (Seberg and Frederiksen, 2001). The sampling in the different studies is not identical, so discussions of some ß-amylase clades do not refer to every previous study.
Close relationships among Secale, Australopyrum, and Dasypyrum
This clade, though well-supported on the ß-amylase tree, is not recovered on any other published Triticeae trees. The placement of each of the three taxa on the cpDNA (Mason-Gamer et al., 2002), integrated highly repetitive gene (Kellogg et al., 1996), GBSSI (Mason-Gamer, 2001), DMC1 (Petersen and Seberg, 2002), and PEPC (Helfgott and Mason-Gamer, 2004) trees is summarized in Table 4. In many cases, the support for differing placements is very low and thus may not represent meaningful conflict. However, it is notable that, regardless of the level of support, no other molecular data sets support a Secale + Australopyrum + Dasypyrum clade; in general, there is little agreement with regard to the placement of any of these taxa. Furthermore, the morphological data (Seberg and Frederiksen, 2001) are not in full agreement with any of the molecular trees: Secale is sister to a Dasypyrum + Triticum clade, while Australopyrum forms a paraphyletic grade at the base of a large clade containing over half of the remaining taxa. This group of genera illustrates the most complicated kind of scenario within the Triticeae. On the few trees on which genera are resolved with reasonable support, their positions conflict. Other trees are poorly supported with regard to these three genera and thus provide no clues to help interpret the conflict. While Secale, Australopyrum, and Dasypyrum remain frustratingly intractable, the remaining discussion will focus on taxa whose evolutionary history has been clarified by the acquisition of multiple, though sometimes conflicting, gene trees.
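Comparisons like those summarized in Table 4 come down to asking whether a particular group of taxa forms a clade on each tree. A minimal Python sketch of that check follows. The parser is a simplified illustration rather than a full Newick implementation: quoted labels and comments are not handled. The two trees are hypothetical toy topologies, not the published gene trees.

```python
def newick_clades(newick):
    """Return the clades of a rooted Newick tree as frozensets of leaf names.

    Minimal parser: handles nesting, leaf names, branch lengths (":0.1"),
    and internal-node labels such as support values.
    """
    s = newick.strip().rstrip(";")
    stack = [[]]  # leaf names collected under each open parenthesis
    clades = set()
    i = 0
    while i < len(s):
        c = s[i]
        if c == "(":
            stack.append([])
            i += 1
        elif c == ")":
            members = stack.pop()
            clades.add(frozenset(members))
            stack[-1].extend(members)
            i += 1
            while i < len(s) and s[i] not in ",()":  # skip node label / branch length
                i += 1
        elif c == "," or c.isspace():
            i += 1
        else:
            j = i
            while j < len(s) and s[j] not in ",():":
                j += 1
            stack[-1].append(s[i:j].strip())
            while j < len(s) and s[j] not in ",()":  # skip branch length
                j += 1
            i = j
    return clades


group = frozenset({"Secale", "Australopyrum", "Dasypyrum"})
toy_trees = {  # hypothetical topologies for illustration only
    "tree_1": "(((Secale,Australopyrum),Dasypyrum),(Hordeum,Taeniatherum));",
    "tree_2": "((Secale,Hordeum),(Australopyrum,(Dasypyrum,Taeniatherum)));",
}
for name, nwk in toy_trees.items():
    print(name, "contains group:", group in newick_clades(nwk))
```

Applied to real trees, the same check gives a simple presence/absence comparison. Support values would still have to be examined separately to judge whether a conflict is meaningful.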
Polyphyletic placement of Eremopyrum with Agropyron and Henrardia
Eremopyrum is a small genus of four species, including two diploids (E. triticeum and E. distans), one tetraploid (E. orientale), and one species with both diploid and tetraploid cytotypes (E. bonaepartis) (Frederiksen, 1991). A dual placement of Eremopyrum with Agropyron and with Henrardia, as on the ß-amylase tree, has also been supported by other data sets (Table 5). The accession of E. bonaepartis that groups with Henrardia on the ß-amylase tree (accession H5554, tetraploid) was grouped with E. distans, E. orientale, and Agropyron on the chloroplast DNA tree, while a different accession of E. bonaepartis (H5569, diploid) was strongly grouped with Henrardia (Mason-Gamer et al., 2002). In the GBSSI tree, only one accession of E. bonaepartis (H5554) was included and was placed in a weak clade with E. distans, E. orientale, and Agropyron (Mason-Gamer, 2001). On the ITS tree, on the other hand, E. bonaepartis, the sole representative of Eremopyrum on that tree (and a different accession altogether), was grouped with Henrardia (Hsiao et al., 1995b). The DMC1 (Petersen and Seberg, 2002) and PEPC (Helfgott and Mason-Gamer, 2004) data sets both yield well-supported Eremopyrum + Agropyron clades; neither analysis included E. bonaepartis.
When taken together, the molecular data appear to support two distinct Eremopyrum relationships, and in fact, nonmolecular data have suggested both before. A close relationship between Eremopyrum and Agropyron seems reasonable on morphological grounds; Eremopyrum has even been described as looking "like an annual crested wheatgrass [Agropyron]" (Barkworth, 1998). The relationship was suggested more formally based on overall morphological similarity (Clayton and Renvoize, 1986) and on specific features such as one-keeled glumes (Frederiksen, 1991) and caryopsis morphology (Terrell and Peterson, 1993). Chromosomal features, on the other hand, point to a relationship between Eremopyrum and Henrardia. Even though these two genera are morphologically dissimilar, both are unusual within the Triticeae for their predominantly telocentric or subtelocentric chromosomes (Frederiksen, 1991). Given the results from the molecular analyses, the apparently disparate relationships suggested by the morphological vs. the cytological data may reflect a complex origin of Eremopyrum, involving both Agropyron and Henrardia. Broader sampling from within the genus would help to define this potentially interesting pattern more clearly and might also clarify what, if any, role polyploidy plays in its apparently reticulate history.
Nonmonophyly of Triticum/Aegilops (the wild wheats) and their close relatives
The Triticum/Aegilops group includes some of the most familiar members of the tribe, including the polyploid cultivated wheats and their diploid progenitors. Recent taxonomic treatments of the wild wheats (reviewed in van Slageren, 1994) have been varied, with the number of proposed diploid genera ranging from one to eight. This study follows van Slageren (1994), who placed the A-genome diploids in Triticum and the remaining diploid species in Aegilops except for A. tripsacoides Jaub. & Spach (not included here), which was recognized as Amblyopyrum muticum (Boiss.) Eig. The most common points of disagreement among the recent phylogenetic data sets with regard to Triticum and Aegilops sensu van Slageren are (1) whether or not the two genera form a single monophyletic group and (2) the monophyly vs. paraphyly of Aegilops relative to other Triticeae genera. On the ß-amylase tree, most of the Triticum and Aegilops sequences are found together, but they form a paraphyletic group with Crithopsis, Taeniatherum, and Thinopyrum. Similarly, Triticum is grouped with Aegilops on the cpDNA (Mason-Gamer and Kellogg, 1996b; Petersen and Seberg, 1997; Mason-Gamer et al., 2002), 5S rDNA short spacer (Kellogg and Appels, 1995), GBSSI (Mason-Gamer, 2001), DMC1 (Petersen and Seberg, 2002), and PEPC (Helfgott and Mason-Gamer, 2004) trees, but on most of them, as on the ß-amylase tree, the Triticum/Aegilops clade includes additional genera (discussed in more detail later). Thus, a merger of Triticum and Aegilops based on phylogeny would require an expanded definition of the group. Furthermore, there are a few trees, including the 5S long spacer (Kellogg and Appels, 1995) and morphology-based (Seberg and Frederiksen, 2001) trees, on which Triticum and Aegilops are not closely related, suggesting that portions of their genomes are evolutionarily distinct.
One unique feature of the ß-amylase tree is the polyphyletic placement of two Aegilops sequences. One of these, A. comosa, also has a sequence in the main Triticum/Aegilops group, while only the more unusual sequence has been recovered from A. uniaristata. Introgression appears to be a more likely explanation for this pattern than gene duplication, since a duplication within Aegilops would yield a closely related sequence. A duplication early enough in the history of the tribe to explain this pattern would be evident in other taxa as well. With the current sampling, however, the identity of the donor of the outlying Aegilops sequences remains unknown.
Placement of Taeniatherum within the Triticum/Aegilops group
As discussed earlier, many molecular trees suggest that the Triticum/Aegilops group is paraphyletic. Several genera appear repeatedly within this group, including Taeniatherum, Crithopsis, and Thinopyrum, and for each of these there is independent evidence both for and against a close relationship to Triticum/Aegilops (Table 5). On the ß-amylase tree, four Taeniatherum caput-medusae individuals are nested within the Triticum/Aegilops group. A close relationship between Taeniatherum and the Triticum/Aegilops group is also supported by the cpDNA (Mason-Gamer et al., 2002), integrated highly repetitive gene (Kellogg et al., 1996), and PEPC (Helfgott and Mason-Gamer, 2004) trees. However, other Triticeae trees do not support this relationship, including the GBSSI (Mason-Gamer, 2001) and DMC1 (Petersen and Seberg, 2002) trees, neither of which places Taeniatherum in or near Triticum or Aegilops.
A possible close relationship between Triticum/Aegilops and Taeniatherum, a small genus with a single polytypic species (Frederiksen, 1986), is at odds with the morphological analysis (Seberg and Frederiksen, 2001) and with traditional taxonomic treatments, most of which place Taeniatherum near Hordeum (e.g., Clayton and Renvoize, 1986). Crossing studies involving Taeniatherum do not point to any likely close relatives: crosses with 30 species representing 11 genera (including 14 Hordeum species but none from Triticum or Aegilops) yielded very few hybrids, and in these, meiotic chromosome pairing was very low (Frederiksen and von Bothmer, 1989). In another study (Frederiksen, 1994), occasional meiotic pairing in hybrids between Taeniatherum and hexaploid Triticum aestivum was interpreted as autosyndetic pairing among homoeologous wheat genomes, although this interpretation rested partly on the morphologically reasonable a priori assumption that Taeniatherum was closely related to Hordeum rather than to Triticum/Aegilops. In spite of the overall morphological dissimilarity of the two groups, the accumulated molecular data strongly suggest that a significant portion of the Taeniatherum genome is closely related to that of the Triticum/Aegilops group.
Placement of Crithopsis within the Triticum/Aegilops group
A second genus closely associated with Triticum/Aegilops on the ß-amylase tree is the annual, monotypic Crithopsis. Its placement within or very near Aegilops is supported by most of the molecular data sets that included C. delileana (Table 5), including the cpDNA restriction sites (Mason-Gamer et al., 2002) and rpoA gene (Petersen and Seberg, 1997), the integrated highly repetitive genes (Kellogg et al., 1996), and the DMC1 gene (Petersen and Seberg, 2002). The morphological analysis (Seberg and Frederiksen, 2001), on the other hand, placed Crithopsis not with Triticum/Aegilops but with Taeniatherum, a result further supported by Giemsa C-banding of several Triticeae genera (Linde-Laursen et al., 1999). These two studies agree with many of the molecular analyses in suggesting a close relationship between Crithopsis and Taeniatherum (Table 5), but they do not support a close relationship between these two species and Triticum/Aegilops.
Monophyly of Thinopyrum and its relationship to Triticum/Aegilops
The third genus with a close affinity to Triticum/Aegilops on the ß-amylase tree is Thinopyrum. The relationship between these groups is economically relevant because Thinopyrum is a potentially valuable source of genes for the improvement of hexaploid wheat, including genes for resistance to several viral and fungal diseases and insect pests, as well as for increased tolerance to salt, low temperature, and drought (e.g., Tang et al., 2000, and references therein). The ß-amylase gene tree does not support a sister relationship between T. bessarabicum and T. elongatum, the only diploid species in the genus. Instead, T. elongatum is in a well-supported clade with Aegilops, Triticum, Crithopsis, and Taeniatherum, while Thinopyrum bessarabicum is placed at the base of this clade, although support for the latter placement is very weak (<50% bootstrap; <95% posterior probability).
The ß-amylase tree is consistent with an isozyme analysis in which UPGMA phenograms place both T. bessarabicum and T. elongatum next to or within a Triticum/Aegilops cluster, but not as sister to one another (McIntyre, 1988). Several gene trees suggest similar relationships (Table 5), including that from the integrated repetitive nuclear loci (Kellogg et al., 1996), which placed Thinopyrum bessarabicum in a weak clade with Triticum, Aegilops, Crithopsis, and Taeniatherum, with Thinopyrum elongatum at the base of that clade. The DMC1 gene sequence data separate the Triticum/Aegilops sequences into two well-supported clades, one of which includes Thinopyrum bessarabicum, while the other includes T. elongatum and Crithopsis (Petersen and Seberg, 2002). Thus, although the details differ, several data sets agree that T. bessarabicum and T. elongatum do not form a monophyletic group and that both are closely related to Triticum/Aegilops and Crithopsis. This result is also partly consistent with the morphological study (Seberg and Frederiksen, 2001), in which Thinopyrum elongatum was closely related (though not sister) to the Triticum/Aegilops clade, while Thinopyrum bessarabicum was placed at the base of the Triticeae. In contrast, the GBSSI gene sequence data weakly support a monophyletic Thinopyrum (Mason-Gamer, 2001) but agree with the other data sets in placing it near (in this case within) Triticum/Aegilops. The cpDNA data are unique in that they not only strongly support the monophyly of Thinopyrum bessarabicum and T. elongatum (along with tetraploid T. scirpeum), but also nest this clade within Pseudoroegneria, not in or near Triticum/Aegilops (Mason-Gamer et al., 2002).
The taxonomy of Thinopyrum has been debated, with particular emphasis on the relationship between the two diploids, T. bessarabicum and T. elongatum. Based on chromosome pairing data, some workers assign separate genome designations to the two species (genomes J and E, respectively) and place the E-genome species in a separate genus, Lophopyrum Á. Löve (Löve, 1984; Jauhar, 1988, 1990). Others suggest that the genomes of T. bessarabicum and T. elongatum are similar enough to be assigned a single genome designation (Dvorák, 1981; Dewey, 1984; McGuire, 1984; Wang, 1985; Wang and Hsiao, 1989). In light of the equivocal results and interpretations of the cytogenetic studies, it is perhaps not surprising that different molecular phylogenetic data sets support different conclusions regarding the close relationship between these two species. Taken together, the molecular data suggest that the genomes of T. elongatum and T. bessarabicum are largely evolutionarily distinct and that both are related to Triticum/Aegilops. However, portions of their genomes (most notably, the chloroplast genome) are unrelated to Triticum/Aegilops, probably reflecting past introgression.
Conclusions
The ß-amylase gene is a potentially valuable source of phylogenetic data. It has been sequenced from several crop relatives, and in some grasses its copy number and location(s) within the genome are relatively well understood. The results of a routine analysis of a small number of grass coding sequences, while seemingly straightforward, were complicated by an underlying conflicting signal that would have gone undetected without the examination of individual characters. This conflict, and the resulting uncertainty regarding the root of the tree, may result from the use of too-distant outgroups and thus might be eliminated once an outgroup more closely related to grasses becomes available. There are several copies of the ß-amylase gene in the Triticeae, and the incorporation of existing genetic information allowed the successful (by all appearances) targeting of a single member of the family. Conflict among Triticeae gene trees has been demonstrated in the past, and the ß-amylase gene tree indeed supports some new, unique relationships. Thus, on the one hand, the addition of new Triticeae gene trees in the future might add new conflicting relationships, complicating the overall phylogenetic picture. On the other hand, comparisons among larger numbers of data sets potentially allow the identification of points of broad consensus. If Triticeae genomes are thought of as fluid, with potential genetic exchange among lineages, these points of consensus can be thought of as phylogenies of portions of genomes, while well-supported conflicts may indicate historical reticulate events.
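As a rough illustration of how points of consensus and conflict among gene trees might be tallied, the Python sketch below compares clades across several rooted trees. It is a hypothetical sketch, not the procedure used in this study: it assumes that every tree contains the same taxa, it ignores branch support (in practice one might first collapse poorly supported nodes so that only well-supported conflicts are counted), and its toy trees and function names are invented. Two clades are treated as conflicting when they overlap but neither contains the other, because such clades cannot both occur on a single tree.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple, Union

# A rooted tree as nested tuples; each leaf is a taxon label (str).
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[frozenset]:
    """Non-trivial clades: more than one taxon, but not the whole tree."""
    found: Set[frozenset] = set()

    def walk(node: Tree) -> frozenset:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    everything = walk(tree)
    return {c for c in found if 1 < len(c) < len(everything)}


def compatible(a: frozenset, b: frozenset) -> bool:
    """Two clades can coexist on one tree only if nested or disjoint."""
    return a <= b or b <= a or not (a & b)


def compare_trees(trees: Dict[str, Tree]):
    """Return (clades found on every tree, list of pairwise clade conflicts)."""
    per_tree = {name: clades(t) for name, t in trees.items()}
    counts = Counter(c for cs in per_tree.values() for c in cs)
    shared = {c for c, n in counts.items() if n == len(trees)}
    conflicts: List[tuple] = []
    for (n1, c1), (n2, c2) in combinations(per_tree.items(), 2):
        for a in c1:
            for b in c2:
                if not compatible(a, b):
                    conflicts.append((n1, sorted(a), n2, sorted(b)))
    return shared, conflicts


# Hypothetical toy gene trees over placeholder taxa (not real data).
toy_trees = {
    "gene1": ("O", ("D", ("C", ("A", "B")))),
    "gene2": ("O", ("D", ("A", ("B", "C")))),
    "gene3": ("O", ("D", ("C", ("A", "B")))),
}
shared, conflicts = compare_trees(toy_trees)
print(sorted(sorted(c) for c in shared))  # [['A', 'B', 'C'], ['A', 'B', 'C', 'D']]
for conflict in conflicts:
    print(conflict)
```

In this toy case the clades shared by all three trees play the role of consensus, while the incompatible {A, B} and {B, C} clades mark the kind of conflict that, if well supported, might point to a reticulate event.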
FOOTNOTES
1 The author thanks Mary E. Barkworth and D. Megan Helfgott for discussion of the Triticeae; the USDA Germplasm Resource Information Network for seeds; two anonymous reviewers for thoughtful and useful suggestions, including comments on the effects of distant outgroups; and Jack Sullivan for discussion of the data analysis and for providing a copy of an in-press manuscript. The work was funded by the National Science Foundation (DEB-9974181 and DEB-0426194).
LITERATURE CITED
Barkworth M. E. 1998 Grasses of the tribe Hordeae in North America. 3. Comments. Botanical Electronic News 199, website http://www.ou.edu/cas/botany-micro/ben/ben199.html
Bureau D. C. Laurière C. Mayer J. Sadowski J. Daussant 1989 Post-translation modifications of ß-amylases during germination of wheat and rye seeds. Journal of Plant Physiology 134: 44-50
Bureau T. E. S. R. Wessler 1994 Stowaway: a new family of inverted repeat elements associated with the genes of both monocotyledonous and dicotyledonous plants. Plant Cell 6: 907-916
Clark S. E. P. M. Hayes C. A. Henson 2003 Effects of single nucleotide polymorphisms in ß-amylase1 alleles from barley on functional properties of the enzymes. Plant Physiology and Biochemistry 41: 798-804
Clayton W. D. S. A. Renvoize 1986 Genera Graminum. Grasses of the world. Royal Botanical Gardens, London, UK
Daussant J. J. Sadowski P. Ziegler 1994 Cereal ß-amylases: diversity of the ß-amylase isozyme status within cereals. Journal of Plant Physiology 143: 585-590
Dewey D. R. 1984 The genomic system of classification as a guide to intergeneric hybridization within the perennial Triticeae. In J. P. Gustafson [ed.], Gene manipulation in plant improvement, Proceedings of the 6th Stadler genetics symposium, 209-279. Columbia University Press, New York, New York, USA
Dunn G. 1974 A model for starch breakdown in higher plants. Phytochemistry 13: 1341-1346
Dvorák J. 1981 Genome relationships among Elytrigia (=Agropyron) elongata, E. stipifolia, "E. elongata 4x," E. caespitosa, E. intermedia, and "E. elongata 10x". Canadian Journal of Genetics and Cytology 23: 481-492
Felsenstein J. 1978 Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology 27: 401-410
Felsenstein J. 1981 Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution 17: 368-376
Feschotte C. N. Jiang S. R. Wessler 2002 Plant transposable elements: where genetics meets genomics. Nature Reviews Genetics 3: 329-341
Frati F. C. Simon J. Sullivan D. L. Swofford 1997 Evolution of the mitochondrial cytochrome oxidase II gene in Collembola. Journal of Molecular Evolution 44: 145-158
Frederiksen S. 1986 Revision of Taeniatherum (Poaceae). Nordic Journal of Botany 6: 389-397
Frederiksen S. 1991 Taxonomic studies in Eremopyrum (Poaceae). Nordic Journal of Botany 11: 271-285
Frederiksen S. 1994 Hybridization between Taeniatherum caput-medusae and Triticum aestivum (Poaceae). Nordic Journal of Botany 14: 3-6
Frederiksen S. R. von Bothmer 1989 Intergeneric hybridization between Taeniatherum and different genera of Triticeae, Poaceae. Nordic Journal of Botany 9: 229-240
Gana J. A. N. E. Kalengamaliro S. M. Cunningham J. J. Volenec 1998 Expression of ß-amylase from alfalfa taproots. Plant Physiology 118: 1495-1505
Gibson T. S. V. Solah M. R. Glennie-Holmes H. R. Taylor 1995 Diastatic power in malted barley: contributions of malt parameters to its development and the potential of barley grain beta-amylase to predict malt diastatic power. Journal of the Institute of Brewing 101: 277-280
Graham S. W. R. G. Olmstead S. C. H. Barrett 2002 Rooting phylogenetic trees with distant outgroups: a case study from the commelinoid monocots. Molecular Biology and Evolution 19: 1769-1781
Grass Phylogeny Working Group (GPWG). 2001 Phylogeny and subfamilial classification of the grasses (Poaceae). Annals of the Missouri Botanical Garden 88: 372-457
Gu X. Y.-X. Fu W.-H. Li 1995 Maximum likelihood estimation of the heterogeneity of substitution rate among nucleotide sites. Molecular Biology and Evolution 12: 546-557
Hasegawa M. H. Kishino T. Yano 1985 Dating the human-ape split by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution 22: 160-170
Helfgott D. M. R. J. Mason-Gamer 2004 The evolution of North American Elymus (Triticeae, Poaceae) allotetraploids: evidence from phosphoenolpyruvate carboxylase gene sequences. Systematic Botany 29: 850-861
Hendy M. D. D. Penny 1989 A framework for the quantitative study of evolutionary trees. Systematic Zoology 38: 297-309
Higgins D. G. A. J. Bleasby R. Fuchs 1992 CLUSTAL V: improved software for multiple sequence alignment. Computer Applications in the Biosciences 8: 189-191
Hsiao C. N. J. Chatterton K. H. Asay K. B. Jensen 1995a Molecular phylogeny of the Pooideae (Poaceae) based on nuclear rDNA (ITS) sequences. Theoretical and Applied Genetics 90: 389-398
Hsiao C. N. J. Chatterton K. H. Asay K. B. Jensen 1995b Phylogenetic relationships of the monogenomic species of the wheat tribe, Triticeae (Poaceae), inferred from nuclear rDNA (internal transcribed spacer) sequences. Genome 38: 211-223
Huelsenbeck J. P. 1998 Systematic bias in phylogenetic analysis: is the Strepsiptera problem solved? Systematic Biology 47: 519-537
Huelsenbeck J. P. K. A. Crandall 1997 Phylogeny estimation and hypothesis testing using maximum likelihood. Annual Review of Ecology and Systematics 28: 437-466
Huelsenbeck J. P. B. Rannala 1997 Phylogenetic methods come of age: testing hypotheses in an evolutionary context. Science 276: 227-232
Huelsenbeck J. P. F. Ronquist 2001 MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics 17: 754-755
Jauhar P. P. 1988 A reassessment of genome relationships between Thinopyrum bessarabicum and T. elongatum of the Triticeae. Genome 30: 903-914