Invited Special Papers
2Department of Biology, University of Missouri-St. Louis, One University Boulevard, St. Louis, Missouri 63121 USA; 3Department of Genetics, University of Georgia, Athens, Georgia 30622 USA
Received for publication January 8, 2004. Accepted for publication June 15, 2004.
ABSTRACT
Key words: duplication; polyploidy; retrotransposon
INTRODUCTION
This article will describe the standard structural patterns in the nuclear genomes of seed plants and show how these have been conserved over the last 100 million years of angiosperm evolution. The natures, mechanisms, and frequencies of specific chromosomal rearrangements will be described. Genome size variation and genome duplication will be discussed in some detail. Through this presentation, we hope to provide a comprehensive view of the current understanding of plant nuclear genome structure and evolution and indicate future directions of this field of study. Much of this review will focus on grasses (Poaceae), primarily because so much comparative genome structure information is available within this family. Comparative sequence analysis has been undertaken in orthologous regions of the barley, maize, rice, sorghum, and wheat genomes (reviewed in Bennetzen and Ramakrishna, 2002), thereby providing a large data set for the characterization of local genome evolution. Comparative recombinational maps have also been generated for these species, as well as for pearl millet, sugarcane, foxtail millet, rye, and a few others (Gale and Devos, 1998). Hence, sufficient data are available only in the grasses for the comprehensive characterization of plant nuclear genome structure and evolution. Comparative data are rapidly accumulating for other families, however, notably for Brassicaceae, and we expect that the next several years will see a tremendous increase in information on how genomes evolve.
Many other important aspects of nuclear genome evolution will not be covered here, although fascinating new data are accumulating. For example, the overall base composition of genomes varies and affects codon usage patterns. Codon usage patterns and percentage GC in onion are more similar to those of Arabidopsis than to those of rice, suggesting (although hardly proving) that the high GC content of the grasses may be derived (Kuhl et al., 2004). Intron size and number vary among plants, but the correlates of this variation are not well known. In animals, small genome size correlates with smaller introns, but this relationship does not seem to hold in plants (see, for example, Dubcovsky et al., 2001; Wendel et al., 2002). Base composition, rates of substitution, and size and number of introns are all important aspects of genome evolution, and we hope that more data and analyses will be forthcoming on all topics in the next few years.
STRUCTURE OF SEED PLANT GENOMES
Chromosome numbers are highly variable in the flowering plants and do not generally relate to overall genome size. For instance, the ~440-Mb haploid rice genome is distributed across 12 chromosomes, whereas the ~4900-Mb haploid barley genome is present on only seven chromosomes. Hence, the average barley chromosome (about 700 Mb) is roughly 1.6 times as large as the entire rice genome.
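The comparison above is simple arithmetic, and a minimal sketch makes it easy to repeat for other species. The genome sizes and chromosome numbers are the approximate values quoted in the text; the function and variable names are illustrative only.

```python
# Approximate haploid genome sizes (Mb) and chromosome numbers quoted in the text.
genomes = {
    "rice": {"size_mb": 440, "chromosomes": 12},
    "barley": {"size_mb": 4900, "chromosomes": 7},
}


def mean_chromosome_size(size_mb, chromosomes):
    """Average physical size of one chromosome, in Mb."""
    return size_mb / chromosomes


rice_mean = mean_chromosome_size(**genomes["rice"])      # about 37 Mb
barley_mean = mean_chromosome_size(**genomes["barley"])  # 700 Mb

# How large is an average barley chromosome relative to the whole rice genome?
ratio = barley_mean / genomes["rice"]["size_mb"]
print(f"rice: {rice_mean:.1f} Mb, barley: {barley_mean:.1f} Mb, ratio: {ratio:.2f}")
```

With these rounded inputs, a single average barley chromosome comes out at about 1.6 times the size of the entire rice genome.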
Many plant nuclear genomes also contain accessory chromosomes, often known as B chromosomes. These highly condensed chromosomes are usually small and largely or completely devoid of functional genes. In maize, pollen grains that carry B chromosomes exhibit an advantage in fertilization, thereby increasing the chance that these essentially selfish chromosomes will persist in subsequent generations (reviewed in Carlson, 1986). Within a single species, individuals may have anywhere from zero to several of these accessory chromosomes, thereby altering nuclear chromosome number and genome size but having little other effect upon the biology of the organism other than increasing overall rates of nondisjunction.
Despite these dramatic differences in size and number, all seed plants appear to have fairly similar general organizations of their chromosomes. Most angiosperm chromosomes have centromeric regions that are necessary for efficient chromosome segregation. These regions show extensive chromatin condensation and are flanked by large regions of additional heterochromatin that is enriched for tandem repeats and transposable elements. In large genomes like barley and wheat, these heterochromatic pericentromeric regions can make up more than 50% of the physical length of the chromosome. Other heterochromatic regions (often called knobs) are also found in all seed plant genomes, even small ones like Arabidopsis, and they are similarly enriched for tandem repeats and transposable elements (Ananiev et al., 1998; Cold Spring Harbor et al., 2000). The ends of all seed plant chromosomes studied to date contain short tandem repeats that are presumably added by the enzyme telomerase (Richards and Ausubel, 1988). In fact, studies by McClintock (1932) in maize first demonstrated the existence of special structures/properties (now called telomeres) at the ends of eukaryotic linear chromosomes that were necessary for their protection from progressive degradation and/or end fusion.
Although centromeric heterochromatin and knobs are shared by all seed plant genomes, their sizes and locations show extreme interspecies and sometimes intraspecies variation. Smaller genomes always have fewer and smaller regions of heterochromatin than larger genomes, both in plants and in animals. The euchromatic component of plant chromosomes can also be highly variable in size and distribution. Euchromatin, less condensed than heterochromatin, appears to contain most of the genes and most of the meiotic recombination in all multicellular eukaryotes, including plants. However, not all euchromatin is likely to be qualitatively or quantitatively consistent between different chromosomal regions, even within a species. For instance, some euchromatic regions in the wheat genome appear to have a significantly higher gene density and similarly higher recombination frequency than other euchromatic regions (Gill et al., 1996a, b).
Plant nuclear genomes exist in an organelle that has significant three-dimensional structure. We know almost nothing about the nature or importance of the three-dimensional folding of chromatin in the interphase plant nucleus. Bennett (1987, 1988) and Heslop-Harrison and Bennett (1990) argued that the arrangement of chromosomes in the nucleus is highly structured. They showed that in hybrids the two parental chromosomes occupy different regions of the nucleus and that these patterns are consistent for any given pair of parents. However, this work has not been extended to other species beyond the few grasses they investigated. Research in animals indicates that specific and highly dynamic chromatin arrangements are formed and that these arrangements may be unique to a tissue type, developmental time, or stage in the cell cycle (reviewed in Belmont, 2003). These patterns of three-dimensional structure presumably evolved as important components of regulated gene expression, nuclear packaging, and/or chromosomal mechanics. In plants, the conservation of one predicted genome structure component (matrix attachment regions, or MARs) at specific locations in orthologous genes has been observed (Avramova et al., 1998; Tikhonov et al., 2000). These results indicate that the regulated folding of interphase chromatin is an important component of eukaryotic genomes deserving additional investigation.
In summary, plant nuclear genomes are conserved in overall structure, all containing multiple chromosomes with centromeres, telomeres, heterochromatic blocks, and euchromatic regions. The locations and sizes of the centromeres and various chromatin types are exceedingly variable between species and can vary within species. Chromosome number, chromosome size, and overall genome size are highly variable between species, sometimes even within the same genus, but tend to be conserved within a species.
Local genome structure
Over the last eight years, genome sequence analysis in plants has shifted from studies of single genes in isolation to detailed studies of larger chromosomal regions, including whole genomes. These studies have shown that plant genes are relatively compact and often clustered, even in large genomes. Plant introns are usually small, averaging less than 200 bp, so that the average transcribed portion of a gene is less than 2.5 kb (Arabidopsis Genome Initiative, 2000). Upstream and downstream regulatory elements are usually small as well, amounting to no more than a few hundred additional bases in most genes. There are exceptions, however, in which a regulatory element can be more than 90 kb upstream of a locus (Stam et al., 2002). However, regulation at a distance (common in many animal genes) appears to be rare in plants, so that the average gene plus its regulatory components will normally occupy only about 1 to 5 kb of genomic space.
Genes can be tightly juxtaposed in plants and are often within a few hundred base pairs of each other in plants with small genomes like Arabidopsis and rice (Arabidopsis Genome Initiative, 2000). Even in plants with large genomes, like maize (~2400 Mb) and barley, gene clusters can be found (Rahman et al., 1997; Llaca and Messing, 1998; Feuillet and Keller, 1999; Fu et al., 2001). Within these gene clusters, gene density approaches one gene per 5 kb, close to the average value of one gene per 4.5 kb for the sequenced portion of the Arabidopsis genome (Arabidopsis Genome Initiative, 2000). However, clusters with numerous adjacent genes appear to be relatively rare in large genome plants. More commonly, ongoing sequencing projects have found mostly clusters of one to three genes in islands surrounded by seas of repetitive DNA. These intergenic repetitive DNA blocks are composed primarily of intact and fragmented LTR-retrotransposons, often arranged in a nested structure of elements inserted within elements (SanMiguel et al., 1996; reviewed in Bennetzen, 2000). In maize, as in all other plants investigated, repetitive DNA blocks are heavily methylated at the cytosines in 5'-CG-3' and 5'-CNG-3' sequences (Bennetzen et al., 1994; Kashkush et al., 2002, 2003). Most of these hypermethylated repeat blocks extend for 5 to 150 kb in maize, making them a significant impediment to physical mapping, genome sequencing, and map-based gene isolation (Bennetzen et al., 1994).
In summary, local genome structure in seed plants exhibits significant general similarity across all studied species. In nuclear genomes with a low repeat content, most genes are found near one another. In larger genomes, repeats are often inserted between genes, although small repeats like miniature inverted repeat transposable elements (MITEs) can commonly be found in the introns, promoters, or 3' trailer sequences of genes (Wessler et al., 1995). Gene islands in large genome plants often resemble the genic regions of small genome plants, although the large repeat blocks in large genome plants usually separate islands with only a few (one to three) genes.
GENOMIC COLINEARITY IN PLANTS
Fewer comparative mapping studies have been pursued in eudicots, and these have led to the popular perception that eudicots have more genomic rearrangements than do grasses. Many of the eudicot comparative mapping projects have focused on comparisons to Arabidopsis, an organism that has a history of extensive genomic instability (Blanc et al., 2000; Ku et al., 2000); this may give the impression that all eudicot genomes are highly rearranged. Among grasses, some lineages have also been unstable and would have served as poor reference genomes for repeated comparative studies. For instance, the close wheat relative Aegilops umbellulata has more gross chromosomal rearrangements relative to wheat than does barley, a much more distant relative (Zhang et al., 1998). More comparative analyses are needed among much more diverse plant genomes. It is likely that, as in the original eudicot studies (Bonierbale et al., 1988; Tanksley et al., 1988) and in most cereal comparisons (Gale and Devos, 1998), closely related species will usually show the fewest rearrangements. It will be important to determine which lineages show the greatest degrees of conservation in gene arrangement and whether grasses are in any way exceptional in their degree of genome conservation.
Gaut (2001) developed a statistical method for assessing colinearity of genomes. In this, he evaluated whether runs of colinear genes could have occurred at random; the method also allowed him to incorporate an estimate of map error into the statistical test. Based on his reanalysis of the maize genome, he estimated 1.3 to 1.9 rearrangements per million years, a rate comparable to estimates of 1.4 to 2.8 rearrangements per million years for cotton (Brubaker et al., 1999) and rather higher than rates estimated for the same species by Lagercrantz (1998). Gaut (2002) extended this approach and noted that the rate of rearrangement, measured as the probability of synteny between two markers, is roughly constant among grasses. Similar rates are reported for several Brassicaceae and Solanaceae, although the Brassica rapa/oleracea comparison is estimated as 2.5 per million years. This may be an artifact, however. A (relatively) small number of markers is available for cross-species comparisons, and the taxa investigated diverged millions or tens of millions of years ago. Burke et al. (2004) found a much higher rate of rearrangement among closely related Helianthus species that diverged less than a million years ago (5.5 to 7.5 per million years); their genome maps have a high number of comparable markers and thus detection of rearrangements may be easier. The rate of rearrangement in grasses and other plants may simply reflect the number that can be detected over long periods of time.
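The general idea of asking whether runs of colinear markers could have arisen at random can be illustrated with a simple permutation test. The sketch below is not the method of Gaut (2001): it ignores map error, it uses a deliberately crude statistic (the number of neighbouring markers in one genome that are also neighbours in the other), and the marker order is a made-up example with a single inversion.

```python
import random


def colinear_pairs(order_b):
    """Count markers adjacent in genome A that are also adjacent in genome B.

    order_b[i] is the rank in genome B of the marker at rank i in genome A.
    """
    return sum(1 for x, y in zip(order_b, order_b[1:]) if abs(x - y) == 1)


def permutation_p_value(order_b, n_perm=2000, seed=0):
    """Estimate how often a random marker order is at least as colinear as observed."""
    rng = random.Random(seed)
    observed = colinear_pairs(order_b)
    shuffled = list(order_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random marker order under the null hypothesis
        if colinear_pairs(shuffled) >= observed:
            hits += 1
    # Add-one correction so that the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical example: 12 shared markers, with markers 5 to 8 inverted in genome B.
order_b = [0, 1, 2, 3, 4, 8, 7, 6, 5, 9, 10, 11]
observed, p_value = permutation_p_value(order_b)
print(observed, p_value)
```

A small p-value here says only that the observed adjacency is unlikely under random marker order. Turning such a test into a rate of rearrangement per million years requires divergence times and a model of map error, which this sketch does not attempt.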
Comparisons between cereal genomes and the Arabidopsis genome were stimulated by the development of a detailed physical map and contiguous DNA sequence for Arabidopsis. An early comparative mapping study indicated that significant genetic colinearity had been conserved between Arabidopsis and rice over the more than 100 million years since the lineages that gave rise to these two species diverged (Paterson et al., 1996). However, subsequent studies demonstrated that adjacent genes in rice were often not adjacent in Arabidopsis (Bennetzen et al., 1998) or linked on the genetic map (Devos et al., 1999). With the completion of the sequences of highly extended regions of the Arabidopsis and rice genomes, more comprehensive studies have become feasible, demonstrating "scant collinearity in gene order" (Liu et al., 2001).
Comparative sequence analyses of small chromosomal segments, mostly in grasses, have shown that local gene content, order, and orientation are also conserved in close relatives (reviewed in Bennetzen and Ma, 2003). Even in distant comparisons, like Arabidopsis and rice, small segments of colinear genes are apparent (van Dodeweerd et al., 1999; Salse et al., 2002). The sequence conservation observed in comparisons between species that have evolved independently for 10 million years or more appears to be limited to genes. Some conserved noncoding sequences (CNSs) are found in grass genomes, but these short segments of similarity and possible homology are tightly linked to genes and may have gene regulatory roles (Kaplinsky et al., 2002; Guo and Moose, 2003; Hong et al., 2003). Moreover, the CNSs in plants are nowhere near as abundant or as large as the CNSs discovered by comparative sequence analysis in vertebrates (Thomas et al., 2003).
In summary, closely related plants have similar gene content and stretches of conserved gene order. This colinearity, often called synteny (which actually does not refer to order at all, but only to presence on orthologous chromosomes), diminishes as more distant relatives are compared. Some lineages, like Arabidopsis, appear to have exceptionally high frequencies of genomic rearrangement that greatly diminish their colinearity even with close relatives.
CHROMOSOMAL REARRANGEMENT
In contrast to the rarity of large chromosomal rearrangements, small chromosomal rearrangements are incredibly abundant in plants. In Arabidopsis, some analyses have estimated that over 60% of the genes appear to have been rearranged (mostly deleted) since the last polyploidization occurred in this lineage (Vision et al., 2000). In comparisons of rice and sorghum, two highly diploidized species that diverged from a common ancestor 50 to 70 mya (Grass Phylogeny Working Group, 2001), it appears that approximately 20% of the genes have been rearranged (Bennetzen and Ma, 2003). Hence, small chromosome rearrangements involving genes are orders of magnitude more frequent in plants than are large chromosomal rearrangements. In approximately the same amount of geological time, rearrangement of genes within mammals is less than 1% (Dehal et al., 2001; Cooper et al., 2003; Thomas et al., 2003).
The population genetics of chromosomal rearrangement has been the subject of much theory, but relatively little data (reviewed in Rieseberg, 2001). In general, a chromosomal rearrangement appears to protect a part of the genome from gene flow, fixing a set of alleles simultaneously and preventing their breakup or loss by crossing back to progenitor plants. Individual chromosomal rearrangements may create only small reductions in plant fitness, at the same time as they contribute to reproductive isolation. Small population sizes and/or metapopulation structure may also permit fixation of mildly deleterious rearrangements.
Investigations of the sequences between genes show that much of this material is derived from transposable elements, even in small genome species like Arabidopsis and rice (Devos et al., 2002; Ma et al., 2004). Transposable elements are remarkably active in plants. Bursts of element activity can generate hundreds or thousands of heritable new element copies in a single plant generation. On average, most plant nuclear genomes accumulate several thousand new transposable elements per million years, most of them LTR-retrotransposons and MITEs. Recently, it has been shown that these elements and other nonessential DNA can be relatively quickly removed from plant genomes by illegitimate recombination (Devos et al., 2002; Ma et al., 2004). Deletions caused by illegitimate recombination are usually tiny, the vast majority being less than 100 bp (Bennetzen et al., in press), but their continuous accumulation can rapidly remove a large part of the genome. Ma et al. (2004) conservatively estimated that LTR-retrotransposon sequences had a half-life of less than 6 million years in rice and that various deletion processes had removed more than 190 Mb of LTR-retrotransposon sequence from the rice genome in the last 8 million years. Given this rapid dynamic of insertion and deletion, it is not surprising that the unselected sequences between genes are different in species that have diverged for more than a few million years. Figure 1 provides an idealized description of global and local processes that can enlarge or shrink a plant genome.
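To see what a half-life of this magnitude implies, the following sketch assumes simple exponential decay of inserted sequence, which is an idealization and not necessarily the model used by Ma et al. (2004). The 6-million-year half-life and the 8-million-year window are the figures quoted above; because the half-life is reported as an upper bound, the surviving fraction computed here is likewise an upper bound under this assumption.

```python
def fraction_remaining(elapsed_my, half_life_my):
    """Fraction of an inserted sequence still present after elapsed_my million years,
    assuming exponential decay with the given half-life (an idealized model)."""
    return 0.5 ** (elapsed_my / half_life_my)


half_life_my = 6.0  # upper bound on the half-life quoted for rice LTR-retrotransposons
elapsed_my = 8.0    # time window quoted in the text

remaining = fraction_remaining(elapsed_my, half_life_my)
print(f"At most {remaining:.1%} of sequence inserted {elapsed_my:g} My ago would remain")
```

Under these assumptions, no more than about 40% of the retrotransposon sequence inserted 8 million years ago would still be detectable, which is consistent with intergenic regions becoming unrecognizable between species after a few million years of divergence.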
In summary, local genomic rearrangement is a continuous and highly active process in seed plant genomes. Mechanisms for these events are known or suspected. However, we need much more information to determine the relative frequencies of different classes of rearrangement, whether they differ between plant lineages, and the biological outcomes of these rearrangements.
GENOME SIZE VARIATION
Future studies will need to determine whether differences in genome size in any particular lineage are caused by an unusually low or high rate of transposable element amplification or by differences in the mechanisms of transposable element removal. Petrov and coworkers demonstrated that the frequency and size of DNA deletions in LINE retroelements was greater in the small genome of the insect Drosophila melanogaster than in large genomes of the insect genus Laupala (Petrov et al., 2000). Similarly, Kirik and colleagues (2000) have shown that double-strand breaks are more commonly resolved with insertions and less commonly with deletions in tobacco, a plant with a relatively large genome, than in Arabidopsis. Hence, it is likely that different organismal lineages will exhibit different rates and modes of DNA removal, thus providing one contribution to differences in current genome sizes.
Lynch and Conery (2003), working with a large sample of prokaryotes and eukaryotes, showed clearly that genome size is negatively correlated with the parameter Neµ, which is the product of the effective population size and the mutation rate per nucleotide. Because the range of mutation rates is rather narrow, variation in this parameter generally reflects variation in population size. Small population sizes correlate with large genomes. Vinogradov (2003) found a similar result in angiosperms, by comparing genome size in rare angiosperms with that of more common species. On average, rare plants had larger genomes. There was no correlation, however, with life history, a result also found for species of Hordeum (Poaceae; Jakob et al., 2004). In some clades of Hordeum, annual species had smaller genomes than perennials, whereas the relationship was reversed in other clades.
Soltis et al. (2003) have examined variation in genome size among the angiosperms and have found that, as expected, genome size varies appreciably when mapped on the phylogenetic tree. They estimated the ancestral size as "very small" (<1.4 pg per 1C nucleus) and noted that very large genomes appear in only a few clades (Santalales, Asparagales, and Liliales). They also observed increases and decreases of genome size over evolutionary time. This inference of fluctuation is expected in part because they used both parsimony and squared-change parsimony to estimate ancestral states. Both methods assume that increasing and decreasing values are equally likely, the latter giving the equivalent of a Bayesian reconstruction that assumes a Brownian motion model of evolution of the character (Maddison, 1991) and that Brownian motion is "infinitely jiggly" (Felsenstein, 2004, p. 392). Other models of character change (e.g., those listed by Felsenstein, 1988) might give a different result. For example, Bennetzen and Kellogg (1997) reconstructed ancestral genome sizes under both the Brownian motion model and a model in which genomes could only get bigger. Not surprisingly, the reconstructions were sensitive to the underlying model of character state change.
Figure 3 illustrates the evolution of genome size among diploid grasses. Sampling is more comprehensive than in our previous study (Bennetzen and Kellogg, 1997) but remains heavily biased toward subfamily Pooideae (all taxa derived from the common ancestor of Nardus and Aegilops). We retrieved 1C values for all diploid grasses and outgroups from Cyperaceae and Juncaceae from the Angiosperm C-values Database (Bennett and Leitch, 2003). Lygeum spartum (2n = 40) and Deschampsia antarctica (2n = 26) are listed as diploids but almost certainly are polyploid based on their chromosome numbers; they were therefore excluded. For monophyletic genera with more than one species, we calculated the average 1C value. If, however, more than one chromosome base number was present, we calculated the average value for each base number separately. This assumes that each chromosomal group within a genus is monophyletic, which is in fact unlikely. However, phylogenetic trees are not available for many of the individual genera in the tree. Averaging by chromosome number within a genus therefore seemed a reasonable compromise between illustrating variation and losing information. Because Festuca is paraphyletic and contains Lolium, we used an average value for Lolium, but then treated each clade of Festuca separately following the molecular cladogram presented by Catalán et al. (2004).
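The averaging rule described above (one mean 1C value per genus, split by base chromosome number where a genus has more than one) can be sketched as a simple grouping step. The genus names and 1C values below are placeholders invented for illustration; they are not taken from the Angiosperm C-values Database, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (genus, species, base chromosome number x, 1C value in pg).
# Placeholder records for illustration only, not real measurements.
records = [
    ("GenusA", "sp1", 7, 4.0),
    ("GenusA", "sp2", 7, 6.0),
    ("GenusA", "sp3", 5, 3.0),
    ("GenusB", "sp1", 7, 2.0),
]


def average_by_genus_and_base(records):
    """Mean 1C value for each (genus, base number) group."""
    groups = defaultdict(list)
    for genus, _species, base_number, c_value in records:
        groups[(genus, base_number)].append(c_value)
    return {key: mean(values) for key, values in groups.items()}


averages = average_by_genus_and_base(records)
for (genus, base_number), value in sorted(averages.items()):
    print(f"{genus} (x = {base_number}): mean 1C = {value:.2f} pg")
```

Grouping on the pair (genus, base number) keeps a genus with a single base number as one terminal and splits a genus with several base numbers into one terminal per number, which mirrors the compromise described in the text.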
Under this simple model of evolution, however, genome size apparently decreases in multiple lineages. The genus Phleum seems to have an unusually small genome, as do Corynephorus, Holcus, the two annual species of Poa (infirma and supina), and the x = 7 species of Phalaris. Zingeria biebersteiniana, with x = 2, also has a very small genome, but it has not been placed phylogenetically so is not included in this tree. Morphological similarity would place it near Avena.
A more sophisticated model of evolutionary change might lead to different conclusions about the exact size of the ancestral genome and the relative frequency of genome expansion or contraction. Such optimization is also sensitive to taxon sampling; because there are no genome size estimates for any grass lineages that diverged before the common ancestor of maize and rice, nor for any of the immediate outgroups, we expect that the details of our estimates here are subject to change. However, the current data and optimization are sufficient to demonstrate that genome size is labile.
It is also interesting to note that base chromosome number does not correlate precisely with size. A base number of 7 is ancestral and synapomorphic for core Pooideae, but there have been reductions to x = 4 in Milium vernale, 6 in some species of Phalaris, and 5 in Briza minor. Whereas Briza minor has a smaller genome than its x = 7 congeners (2.9 vs. an average of 5.7), the estimate for Milium is higher than that for Phleum, and genomes of the x = 6 Phalaris are on average bigger than for the x = 7 species. Similarly, Sarga versicolor (Andropogoneae; x = 5) has a 1C value almost three times that of Sorghum bicolor and twice that of Vetiveria.
In Fig. 4, we present a more detailed view of tribe Triticeae, which includes the largest known genomes in the grasses. We also include the polyploid species for which genome sizes are known. The phylogenetic relationships of the diploid Triticeae are not clear; every gene investigated appears to have had a different history (Kellogg et al., 1996; Mason-Gamer et al., 1998; Mason-Gamer, 2004, and references therein). Accordingly, we have used the cladogram for the plastid genome (Mason-Gamer et al., 2002) as one of several possible inferences of the history of the group. The figure shows that even at this level of analysis, genome size is variable. Also, the sizes of the genomes of the polyploids are often, but not always, smaller than the sum of those of their diploid progenitors. A similar result was found by Jakob et al. (2004) in their detailed analysis of the evolution of genome size in Hordeum. The occasional reduction of genome size in polyploids is consistent with the hypothesis that polyploidy may be followed by loss of genetic material (discussed later). This also extends the observation of Levy and Feldman (2002) on grasses in general, in which they observed that the average genome size of polyploids was less than twice the average for the diploids.
The role of natural selection in determining genome size is unknown. As we come to understand better the mechanisms that control genome size, we may be able to develop clear testable hypotheses. Previous authors (e.g., Cavalier-Smith, 1985) have suggested that selection operates on nuclear volume and that this indirectly affects genome size. Data bearing on this hypothesis are mostly correlational rather than experimental, but the correlations are far from perfect (see multiple examples given by John and Miklos, 1988), and attempts to link genome size and phenotypic characters have not always been successful (e.g., Bachmann et al., 1985). Recent data on genome structure point to many more levels at which selection might act. For example, selection on the ability of the plant to limit transposon activity might keep genomes from expanding indefinitely. Selection could also act on the epigenetic mechanisms that silence genes and genomes in polyploids. Until we know more details of how such mechanisms work, it is difficult to devise a test (experimental or statistical) of the hypothesis of selection.
POLYPLOIDY
Genome sizes of allopolyploids are not necessarily arithmetic sums of the sizes of the parental genomes. Genetic material is lost and genomes are rearranged. For example, Triticum dicoccoides is an amphidiploid resulting from a cross between a progenitor similar to Triticum urartu (with the A genome) and one similar to Aegilops speltoides (with a genome similar to the B genome). Belyayev et al. (2000) showed that genomic probes made from Aegilops speltoides hybridize strongly to large portions of the B genome of T. dicoccoides, but also to dispersed locations in the A genome, suggesting that repetitive sequences from the B genome have been preferentially amplified in the A genome or have moved from one location in the genome to another. Other examples are cited by Wendel (2000).
Like most other eukaryotes, plants undergo cycles of polyploidization, followed by diploidization, the latter characterized by gene loss and/or pseudogene formation. Even such "good" diploids as Arabidopsis and rice are now known to be ancient polyploids or at least to have undergone extensive segmental duplication (discussed next). Thus previous attempts to estimate the percentage of polyploid species among angiosperms (Stebbins, 1971; Masterson, 1994) are now seen to be oversimplified; many "diploid" species are paleopolyploid. It appears that 100% of flowering plants are current polyploids or have a polyploid history. In the next few paragraphs, we cite examples from major plant families, in which well-known and well-documented duplication events are superimposed on more ancient duplications.
Polyploidy in Poaceae
Duplication at the base of the family
The grasses all apparently share a number of ancient duplications of at least parts of their genomes. Draft genome sequences have been published for rice (Goff et al., 2002; Yu et al., 2002), but the genome has only recently been assembled into a pseudomolecule and is still being annotated. It has thus been difficult to check on earlier suggestions of genome duplications. Recently, Vandepoele et al. (2003) have produced genomic scaffolds for the rice genome and used these to estimate the number and extent of duplicated blocks. Only 15% of the rice genome falls into identifiable duplicated blocks, which is appreciably less than the 60% estimate for Arabidopsis. Furthermore, these duplications involve only a few of the chromosomes. Finally, a plot of the percentage of duplicated genes against the number of substitutions per silent site did not produce an obvious peak, as would be expected if many of the genes in the genome had been duplicated around the same time. Nonetheless, most of the duplications preceded the diversification of the cereal crops, indicating that they occurred before the origin of the grasses.
The maize duplication
Polyploidy of maize was suggested by Edgar Anderson in 1945 and has been documented by Rhoades (1951), Helentjaris et al. (1988), and Wendel et al. (1989), among others. Anderson speculated that the ancestors might be five-chromosome species of Sorghum and Coix, ignoring the obvious difficulty that five-chromosome Sorghum (now placed in the genus Sarga; Spangler, 2003) is native to Australia and Africa and Coix to India (Clayton and Renvoize, 1986). Anderson's hypothesis predicts that genes from maize should be sister to genes from one or more of the x = 5 Andropogoneae, but instead all analyses to date placed Zea sister to Tripsacum (e.g., waxy, Mason-Gamer et al., 1998; phytochrome B, Mathews et al., 2002; teosinte branched 1, Lukens and Doebley, 2001; ndhF, Spangler et al., 1999; S. Kleweis, S. Malcomber, and E. A. Kellogg, University of Missouri-St. Louis, unpublished data). The sister group of these two is not well supported in any estimate of phylogeny to date, but no study has linked the Zea-Tripsacum clade with Sarga (= Australian "sorghum") or with Coix (Mason-Gamer et al., 1998; Spangler et al., 1999; Giussani et al., 2001; Mathews et al., 2002; Aliscioni et al., 2003; S. Kleweis, S. Malcomber, and E. A. Kellogg, University of Missouri-St. Louis, unpublished data).
Gaut and Doebley (1997), in a much cited paper, inferred that maize is a segmental allotetraploid. Their hypothesis predicts that trees of orthologous genes should produce one of two alternative patterns, either (maize [maize, sorghum]) or ([maize, maize] sorghum), abbreviated (M [M, S]) or ([M, M] S), and that linked genes should share the same pattern. Gaut and Doebley (1997) could not undertake such a comparison because of the lack of corresponding sequence from sorghum and rice. Recently, however, Swigonova et al. (in press) investigated six pairs of maize genes and their orthologues from sorghum and rice. Phylogenetic analyses are equivocal. Only two of the gene pairs (r1/b1; grf1/grf2) strongly supported the (M [M, S]) tree, and one (orp1/orp2) strongly supported the alternative. The remaining gene pairs produced equivocal trees that were not significantly different from a trichotomy, indicating that the ancestors of the two maize genomes arose about the same time as their divergence from the ancestor of sorghum (about 11 million years ago), consistent with the rapid radiation of the tribe. Wilson et al. (1999) also argued for the ([M, M] S) tree, but hypothesized that the maize ancestors had x = 8, a number that would be highly unusual for a panicoid grass.
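The three outcomes discussed above can be told apart mechanically. The following is a minimal sketch, assuming each gene tree is written as a nested tuple of three leaves; the leaf labels "M1", "M2", and "S" (two maize copies and their sorghum orthologue) and the function names are hypothetical and are not taken from any of the cited studies.

```python
def sister_pair(tree):
    """Return the two leaves joined in the inner clade, or None for a trichotomy."""
    for child in tree:
        if isinstance(child, tuple):
            return frozenset(child)
    return None

def classify(tree, sorghum="S"):
    """Label a rooted three-leaf gene tree with the pattern it supports."""
    pair = sister_pair(tree)
    if pair is None:
        return "trichotomy"      # unresolved: neither pattern is favored
    if sorghum in pair:
        return "(M [M, S])"      # one maize copy is sister to sorghum
    return "([M, M] S)"          # the two maize copies are sister to each other

print(classify(("M1", ("M2", "S"))))   # (M [M, S])
print(classify((("M1", "M2"), "S")))   # ([M, M] S)
print(classify(("M1", "M2", "S")))     # trichotomy
```

Applying such a function to linked genes would show whether they share a pattern, as the hypothesis predicts. The sketch ignores branch support, which is what actually decides whether a resolved tree differs significantly from a trichotomy.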
Fertilization independent endosperm (fie) genes in maize showed that fie2 is more closely related to the sorghum gene than either is to fie1, supporting the (M [M, S]) tree (Danilevskaya et al., 2003). However, Swigonova et al. (in press) have shown that fie1 and fie2 in maize are not orthologous, despite being on duplicated segments of the genome. Instead, each region originally contained two paralogous genes; one copy was lost from chromosome 4 and the other from chromosome 10. The lack of orthology may also help explain the unusually long branch leading to Zmfie1 in the figure in Danilevskaya et al. (2003).
Duplications in other groups of grasses
Grass chromosome numbers have been studied in detail since the synthesis provided by Avdulov (1931). Comparative genome mapping efforts have expanded on his observations. Core Pooideae are marked by having their genes arranged in seven large chromosomes; in addition, all members studied so far have one chromosome that corresponds to a combination of rice linkage groups 5 and 10 and another that corresponds to a novel combination of parts of rice 4 and 7 (Kellogg, 1998). Panicoideae are divided into three major clades (Giussani et al., 2001), corresponding to chromosome base numbers of x = 10 (Andropogoneae and Paspaleae) and x = 9 (Paniceae s.s.). All studied panicoids share a chromosome that corresponds to rice chromosomes 3 and 10 and another that corresponds to rice 7 and 9.
The woody bamboos (tribe Bambuseae) are almost all polyploid, with the exception of Chusquea talamancensis and possibly C. subtesselata (Judziewicz et al., 1999). It is tempting to speculate that the many morphological novelties of the group were generated by major changes in genome structure and gene expression following polyploidy. This hypothesis predicts that the two diploid species are derived rather than ancestral and that the woody bamboos all have two full copies of a rice-like genome. Because of the enormous difficulty of doing genetic studies on woody bamboos, such investigations are a long way off.
Zizania, in tribe Oryzeae, shows that duplications need not encompass the entire genome. Zizania aquatica (North American wild-rice), with 15 chromosomes, has 12 chromosomes that are colinear with the 12 chromosomes of Oryza sativa and three chromosomes that are apparently duplicates of rice chromosomes 1, 4, and 9 (Kennard et al., 1999). Zizania is more closely related to Oryza than to any other mapped cereal (Ge et al., 2002). Nonetheless, there have been a number of rearrangements, all of which appear to involve duplicated loci.
Polyploidy in Brassicaceae
Recent studies in Brassicaceae have been ably reviewed by Koch et al. (2003). Here we summarize a few of the major new findings.
Duplication at the base of the family
The whole genome sequence of Arabidopsis made it possible to analyze the patterns of gene duplication across this supposedly compact diploid genome. Surprisingly, it became clear that the genome contained extensive duplicated blocks of sequence (Arabidopsis Genome Initiative, 2000; Blanc et al., 2000; Paterson et al., 2000). Initial attempts to date this duplication using molecular-clock estimates indicated four rounds of genome duplication (Vision et al., 2000). Subsequent analyses have verified the most recent duplication event, although the estimated date is now thought to be more recent (Blanc et al., 2003). At least one and possibly two more ancient duplications can be demonstrated (Simillion et al., 2002). However, a more powerful approach uses phylogenetic events to provide relative dates (Bowers et al., 2003; Ermolaeva et al., 2003). The phylogenetic analyses show that the duplication preceded the divergence of Brassica from Arabidopsis. Because the Brassica and Arabidopsis lineages diverged soon after the origin of Brassicaceae (M. Beilstein, University of Missouri, St. Louis, unpublished data), we infer that the "Arabidopsis" duplication is actually a Brassicaceae duplication.
Comparisons of the degenerated "homoeologous" genomes in Arabidopsis thaliana have been informative (Blanc et al., 2000; Ku et al., 2000; Arabidopsis Genome Initiative, 2000). However, these homoeologous genomes are highly rearranged within Arabidopsis, partly because the multiple rounds of polyploidy were all ancient events and partly because the polyploid state apparently removed constraints against extensive gene loss. Hence, the current status of the Arabidopsis genome indicates rearrangements within rearrangements within rearrangements, making it difficult to sort out the nature, timing, and mechanisms of individual events.
Arabidopsis
Arabidopsis, as currently delimited, includes nine species and five subspecies (O'Kane and Al-Shehbaz, 1997). One species, Arabidopsis suecica, is clearly a recently formed polyploid, the product of a naturally occurring cross between A. thaliana and A. arenosa (O'Kane et al., 1996). In addition, A. thaliana has been crossed with the related A. lyrata ssp. petraea, and the resulting amphidiploid shows some promise as a tool for understanding the mechanisms and immediate results of amphidiploidy (Nasrallah et al., 2000).
Phylogenetic studies (e.g., Koch et al., 2001) have shown that Arabidopsis falls into the same major clade as shepherd's purse, Capsella, a much closer relationship than previously thought (although the relationship was suggested by Brummitt, 1992). Consistent with this close phylogenetic relationship, Rossberg et al. (2001) found that a 27-kb region of the Capsella rubella genome is perfectly colinear with a 31.5-kb region of the Arabidopsis thaliana genome. The region includes five genes in the same orientation in both species.
Brassica
The genome of Brassica oleracea is extensively rearranged relative to that of Arabidopsis, complicating efforts to compare the order of genes and infer ancient patterns of genome duplication (Lukens et al., 2003, and references therein). Even diploid brassicas appear to be duplicated and/or even triplicated, although the evidence for the latter is not clear.
Neo-polyploidy among species of Brassica is well documented, and most introductory students learn about the triangle of U (1935). Mapping studies in Brassica napus, a naturally occurring amphidiploid of B. oleracea x B. rapa, have indicated that the genome of the polyploid is colinear with that of the ancestral diploids; no evidence was found for genome rearrangement (Parkin and Lydiate, 1997). A similar result was found for Brassica juncea, the amphiploid product of B. rapa and B. nigra (Axelsson et al., 2000). These results indicate that polyploidy does not necessarily have a destabilizing effect on genome structure in all plant species.
Polyploidy in Rosaceae (Maloideae)
The rose family is divided into four subfamilies, distinguished by their floral morphology and chromosome number. Maloideae include such economically important species as pears and apples, as well as many other less familiar fruits and ornamentals (e.g., shadbushes, hawthorns). Most species of Maloideae have a base chromosome number of 17, which was suggested (Sax, 1931, 1932, 1933) to be an ancient polyploid based on a cross between a member of subfamily Amygdaloideae (cherries and apricots; x = 8) and subfamily Spiraeoideae (bridalwreath; most with x = 9). The argument was principally arithmetic: 8 + 9 = 17. Molecular phylogenetic studies using maternally inherited plastid genes showed that Amygdaloideae are not sister to Maloideae but could not rule out the possibility of wide hybridization.
Using DNA sequences of a low-copy nuclear gene (granule bound starch synthase I, GBSSI), Evans and Campbell (2002) have now shown convincingly that Maloideae originated from an ancestor with x = 9. They found two copies of GBSSI in most Rosaceae and four in Maloideae, consistent with the allopolyploid hypothesis. However, sequences from the diploid genus Gillenia (x = 9) were sister to the GBSSI clades of Maloideae, indicating that the ancestor of the maloids was probably spiraeoid. From the phylogenetic data, plus additional morphological similarities among the early-divergent maloids, Evans and Campbell concluded that the ancestral maloid must have had x = 18. Chromosome number would have reduced from x = 18 to x = 17 via dysploidy, with two chromosomes fusing to become one. As genome maps become available for more Rosaceae, it will be interesting to see if