Am. J. Bot.
(American Journal of Botany. 2004;91:1709-1725.)
© 2004 Botanical Society of America, Inc.


Invited Special Papers

The evolution of nuclear genome structure in seed plants1

Elizabeth A. Kellogg2,4 and Jeffrey L. Bennetzen3

2Department of Biology, University of Missouri–St. Louis, One University Boulevard, St. Louis, Missouri 63121 USA; 3Department of Genetics, University of Georgia, Athens, Georgia 30602 USA

Received for publication January 8, 2004. Accepted for publication June 15, 2004.


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 STRUCTURE OF SEED PLANT...
 GENOMIC COLINEARITY IN PLANTS
 CHROMOSOMAL REARRANGEMENT
 GENOME SIZE VARIATION
 POLYPLOIDY
 GENE FAMILIES AND GENE...
 PHYLOGENETIC IMPLICATIONS
 DYNAMIC GENOMES
 LITERATURE CITED
 
Plant nuclear genomes exhibit extensive structural variation in size, chromosome number, number and arrangement of genes, and number of genome copies per nucleus. This variation is the outcome of a set of highly active processes, including gene duplication and deletion, chromosomal duplication followed by gene loss, amplification of retrotransposons separating genes, and genome rearrangement, the latter often following hybridization and/or polyploidy. While these changes occur continuously, it is not surprising that some of them should be fixed evolutionarily and come to mark major clades. Large-scale duplications pre-date the radiation of Brassicaceae and Poaceae and correlate with the origin of many smaller clades as well. Nuclear genomes are largely colinear among closely related species, but more rearrangements are observed with increasing phylogenetic distance; however, the correlation between amount of rearrangement and time since divergence is not perfect. By changing patterns of gene expression and triggering genome rearrangements, novel combinations of genomes (hybrids) may be a driving force in evolution.

Key Words: duplication • polyploidy • retrotransposon


    INTRODUCTION
Plant nuclear genomes are enormously variable. Chromosome number, the degree of gene clustering, and chromosome size can all differ by as much as an order of magnitude, even between closely related species. Some variation is generated so rapidly that two different allelic versions of a chromosomal segment (otherwise known as haplotypes) can be dissimilar in gene content and arrangement even within a single plant species like maize (Fu and Dooner, 2002 ). Hence, plant nuclear genomes vary sufficiently to serve as powerful differentiating factors. Some changes clearly mark particular lineages of seed plants, such as the large inversions and translocations that are found within some clades of grasses (Gale and Devos, 1998 ). Other changes, such as polyploidy and most gene duplications/ deletions, are so frequent that they occur independently in multiple lineages. Recent studies have begun to characterize the natures, rates, and mechanisms of these various types of chromosomal rearrangement, thereby providing our first detailed insights into how these changes contribute to current evolved states and how they may be used in phylogenetic analysis.

This article will describe the standard structural patterns in the nuclear genomes of seed plants and show how these have been conserved over the last 100 million years of angiosperm evolution. The natures, mechanisms, and frequencies of specific chromosomal rearrangements will be described. Genome size variation and genome duplication will be discussed in some detail. Through this presentation, we hope to provide a comprehensive view of the current understanding of plant nuclear genome structure and evolution and indicate future directions of this field of study. Much of this review will focus on grasses (Poaceae), primarily because so much comparative genome structure information is available within this family. Comparative sequence analysis has been undertaken in orthologous regions of the barley, maize, rice, sorghum, and wheat genomes (reviewed in Bennetzen and Ramakrishna, 2002 ), thereby providing a large data set for the characterization of local genome evolution. Comparative recombinational maps have also been generated for these species, as well as for pearl millet, sugarcane, foxtail millet, rye, and a few others (Gale and Devos, 1998 ). Hence, sufficient data are available only in the grasses for the comprehensive characterization of plant nuclear genome structure and evolution. Comparative data are rapidly accumulating for other families, however, notably for Brassicaceae, and we expect that the next several years will see a tremendous increase in information on how genomes evolve.

Many other important aspects of nuclear genome evolution will not be covered here, although fascinating new data are accumulating. For example, the overall base composition of genomes varies and affects codon usage patterns. Codon usage patterns and percentage GC in onion are more similar to Arabidopsis than to rice, suggesting (although hardly proving) that the high GC content of the grasses may be derived (Kuhl et al., 2004 ). Intron size and number vary among plants, but the correlates of this variation are not well known. In animals, small genome size correlates with smaller introns, but this relationship does not seem to hold in plants (see for example, Dubcovsky et al., 2001 ; Wendel et al., 2002 ). Base composition, rates of substitution, and size and number of introns are all important aspects of genome evolution, and we hope that more data and analyses will be forthcoming on all topics in the next few years.


    STRUCTURE OF SEED PLANT GENOMES
Gross genome structure
All angiosperms contain relatively complex nuclear genomes with genes scattered across multiple chromosomes. Other plants are less well characterized, but their large genome sizes (see Bennett and Leitch, 2003 , for a comprehensive presentation of plant genome sizes) indicate that most nonflowering plant genomes are equally complex. In even the smallest genomes, like Arabidopsis (about 140 Mb), more than 20% of the DNA is composed of various repetitive elements (Arabidopsis Genome Initiative, 2000 ). These repeats include transposable elements and various types of simple tandem repeats, including satellite DNA and simple sequence repeats (SSRs). In most angiosperms, transposable elements, especially long terminal repeat (LTR) retrotransposons, comprise the vast majority of this repetitive DNA (reviewed in Bennetzen, 2002a ).

Chromosome numbers are highly variable in the flowering plants and do not generally relate to overall genome size. For instance, the ~440-Mb haploid rice genome is distributed across 12 chromosomes, whereas the ~4900-Mb haploid barley genome is present on only seven chromosomes. Hence, the average barley chromosome (~700 Mb) is more than one and a half times as large as the entire rice genome.
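The comparison above is simple arithmetic on the quoted figures, and a short sketch makes it explicit. The genome sizes and chromosome counts are those given in the text; the dictionary layout and function name are illustrative only.

```python
# Haploid genome sizes (Mb) and chromosome counts as quoted in the text.
genomes = {
    "rice": {"size_mb": 440, "chromosomes": 12},
    "barley": {"size_mb": 4900, "chromosomes": 7},
}


def mean_chromosome_mb(genome):
    """Average chromosome size in Mb for one haploid genome."""
    return genome["size_mb"] / genome["chromosomes"]


rice_mean = mean_chromosome_mb(genomes["rice"])
barley_mean = mean_chromosome_mb(genomes["barley"])

# How large is an average barley chromosome relative to the whole rice genome?
ratio_to_rice_genome = barley_mean / genomes["rice"]["size_mb"]

print(f"mean rice chromosome:   {rice_mean:.1f} Mb")
print(f"mean barley chromosome: {barley_mean:.1f} Mb")
print(f"barley chromosome / rice genome: {ratio_to_rice_genome:.2f}")
```

On these numbers an average barley chromosome is about 700 Mb, roughly 1.6 times the size of the entire rice genome, even though barley has fewer chromosomes.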

Many plant nuclear genomes also contain accessory chromosomes, often known as B chromosomes. These highly condensed chromosomes are usually small and largely or completely devoid of functional genes. In maize, pollen grains that carry B chromosomes exhibit an advantage in fertilization, thereby increasing the chance that these essentially selfish chromosomes will persist in subsequent generations (reviewed in Carlson, 1986). Within a single species, individuals may have anywhere from zero to several of these accessory chromosomes, which alters nuclear chromosome number and genome size but has little other effect upon the biology of the organism apart from increasing overall rates of nondisjunction.

Despite these dramatic differences in size and number, all seed plants appear to have fairly similar general organizations of their chromosomes. Most angiosperm chromosomes have centromeric regions that are necessary for efficient chromosome segregation. These regions show extensive chromatin condensation and are flanked by large regions of additional heterochromatin that is enriched for tandem repeats and transposable elements. In large genomes like barley and wheat, these heterochromatic pericentromeric regions can make up more than 50% of the physical length of the chromosome. Other heterochromatic regions (often called knobs) are also found in all seed plant genomes, even small ones like Arabidopsis, and they are similarly enriched for tandem repeats and transposable elements (Ananiev et al., 1998 ; Cold Spring Harbor et al., 2000 ). The ends of all seed plant chromosomes studied to date contain short tandem repeats that are presumably added by the enzyme telomerase (Richards and Ausubel, 1988 ). In fact, studies by McClintock (1932) in maize first demonstrated the existence of special structures/properties (now called telomeres) at the ends of eukaryotic linear chromosomes that were necessary for their protection from progressive degradation and/or end fusion.

Although centromeric heterochromatin and knobs are shared by all seed plant genomes, their sizes and locations show extreme interspecies and sometimes intraspecies variation. Smaller genomes generally have fewer and smaller regions of heterochromatin than larger genomes, both in plants and in animals. The euchromatic component of plant chromosomes can also be highly variable in size and distribution. Euchromatin, less condensed than heterochromatin, appears to contain most of the genes and most of the meiotic recombination in all multicellular eukaryotes, including plants. However, euchromatin is unlikely to be qualitatively or quantitatively uniform across chromosomal regions, even within a species. For instance, some euchromatic regions in the wheat genome appear to have a significantly higher gene density and similarly higher recombination frequency than other euchromatic regions (Gill et al., 1996a, b).

Plant nuclear genomes exist in an organelle that has significant three-dimensional structure. We know almost nothing about the nature or importance of the three-dimensional folding of chromatin in the interphase plant nucleus. Bennett (1987, 1988) and Heslop-Harrison and Bennett (1990) argued that the arrangement of chromosomes in the nucleus is highly structured. They showed that in hybrids the two parental chromosomes occupy different regions of the nucleus and that these patterns are consistent for any given pair of parents. However, this work has not been extended to other species beyond the few grasses they investigated. Research in animals indicates that specific and highly dynamic chromatin arrangements are formed and that these arrangements may be unique to a tissue type, developmental time, or stage in the cell cycle (reviewed in Belmont, 2003). These patterns of three-dimensional structure presumably evolved as important components of regulated gene expression, nuclear packaging, and/or chromosomal mechanics. In plants, the conservation of one predicted genome structure component (matrix attachment regions, or MARs) at specific locations in orthologous genes has been observed (Avramova et al., 1998; Tikhonov et al., 2000). These results indicate that the regulated folding of interphase chromatin is an important component of eukaryotic genomes deserving additional investigation.

In summary, plant nuclear genomes are conserved in overall structure, all containing multiple chromosomes with centromeres, telomeres, heterochromatic blocks, and euchromatic regions. The locations and sizes of the centromeres and various chromatin types are exceedingly variable between species and can vary within species. Chromosome number, chromosome size, and overall genome size are highly variable between species, sometimes even within the same genus, but tend to be conserved within a species.

Local genome structure
Over the last eight years, genome sequence analysis in plants has shifted from studies of single genes in isolation to detailed studies of larger chromosomal regions, including whole genomes. These studies have shown that plant genes are relatively compact and often clustered, even in large genomes. Plant introns are usually small, averaging less than 200 bp, so that the average transcribed portion of a gene is less than 2.5 kb (Arabidopsis Genome Initiative, 2000). Upstream and downstream regulatory elements are usually small as well, amounting to no more than a few hundred additional bases in most genes. There are exceptions in which a regulatory element can be more than 90 kb upstream of a locus (Stam et al., 2002). Nevertheless, regulation at a distance (common in many animal genes) appears to be rare in plants, so that the average gene plus its regulatory components will normally occupy only about 1–5 kb of genomic space.

Genes can be tightly juxtaposed in plants and are often within a few hundred base pairs of each other in plants with small genomes like Arabidopsis and rice (Arabidopsis Genome Initiative, 2000 ). Even in plants with large genomes, like maize (~2400 Mb) and barley, gene clusters can be found (Rahman et al., 1997 ; Llaca and Messing, 1998 ; Feuillet and Keller, 1999 ; Fu et al., 2001 ). Within these gene clusters, gene density approaches one gene per 5 kb, close to the average value of one gene per 4.5 kb for the sequenced portion of the Arabidopsis genome (Arabidopsis Genome Initiative, 2000 ). However, clusters with numerous adjacent genes appear to be relatively rare in large genome plants. More commonly, ongoing sequencing projects have found mostly clusters of one to three genes in islands surrounded by seas of repetitive DNA. These intergenic repetitive DNA blocks are composed primarily of intact and fragmented LTR-retrotransposons, often arranged in a nested structure of elements inserted within elements (SanMiguel et al., 1996 ; reviewed in Bennetzen, 2000 ). In maize, as in all other plants investigated, repetitive DNA blocks are heavily methylated at the cytosines in 5'-CG-3' and 5'-CNG-3' sequences (Bennetzen et al., 1994 ; Kashkush et al., 2002 , 2003 ). Most of these hypermethylated repeat blocks extend for 5–150 kb in maize, making them a significant impediment to physical mapping, genome sequencing, and map-based gene isolation (Bennetzen et al., 1994 ).

In summary, local genome structure in seed plants exhibits significant general similarity across all studied species. In nuclear genomes with a low repeat content, most genes are found near one another. In larger genomes, repeats are often inserted between genes, although small repeats like miniature inverted repeat transposable elements (MITEs) can commonly be found in the introns, promoters, or 3' trailer sequences of genes (Wessler et al., 1995 ). Gene islands in large genome plants often resemble the genic regions of small genome plants, although the large repeat blocks in large genome plants usually separate islands with only a few (one to three) genes.


    GENOMIC COLINEARITY IN PLANTS
The use of DNA markers as probes for recombinational mapping allowed the first comprehensive comparisons of gene composition and order between species. The DNA probes used for most early mapping studies were restriction fragment length polymorphism (RFLP) markers, and their required low copy number meant that they were often fragments of genes. The first comparative mapping study in plants, a comparison of tomato and potato by the Tanksley laboratory, indicated excellent conservation of gene content and order across these genomes (Bonierbale et al., 1988 ). A more distant relative of tomato and potato, pepper, was later investigated by this same group. This study indicated numerous chromosomal rearrangements against a background of conserved gene content (Tanksley et al., 1988 ). The first comparative genetic mapping project in grasses, a brief study to test the utility of maize DNA probes to map sorghum and other cereal species (Hulbert et al., 1990 ), was followed by a flood of studies and syntheses showing that grasses could be studied as variants on a single experimental genome (Bennetzen and Freeling, 1993 ). One important synthesis demonstrated that all cereal species could be represented by a small number of gene linkage blocks, commonly shown in a comparative circle map (Moore et al., 1995 ; Gale and Devos, 1998 ). These maps indicated that only a few major chromosomal rearrangements differentiated the nuclear genomes of such distantly related grasses as rice and barley or maize and wheat (but see Gaut, 2002 ).

Fewer comparative mapping studies have been pursued in eudicots, and these have led to the popular perception that eudicots have more genomic rearrangements than do grasses. Many of the eudicot comparative mapping projects have focused on comparisons to Arabidopsis, an organism that has a history of extensive genomic instability (Blanc et al., 2000; Ku et al., 2000); this may give the impression that all eudicot genomes are highly rearranged. Among grasses, some lineages have also been unstable and would have served as poor reference genomes for repeated comparative studies. For instance, the close wheat relative Aegilops umbellulata has more gross chromosomal rearrangements relative to wheat than does barley, a much more distant relative (Zhang et al., 1998). More comparative analyses are needed among much more diverse plant genomes. It is likely that, as in the original eudicot studies (Bonierbale et al., 1988; Tanksley et al., 1988) and in most cereal comparisons (Gale and Devos, 1998), closely related species will usually show the fewest rearrangements. It will be important to determine which lineages show the greatest degrees of conservation in gene arrangement and whether grasses are in any way exceptional in their degree of genome conservation.

Gaut (2001) developed a statistical method for assessing colinearity of genomes. In this, he evaluated whether runs of colinear genes could have occurred at random; the method also allowed him to incorporate an estimate of map error into the statistical test. Based on his reanalysis of the maize genome, he estimated 1.3–1.9 rearrangements per million years, a rate comparable to estimates of 1.4–2.8 rearrangements per million years for cotton (Brubaker et al., 1999) and rather higher than rates estimated for the same species by Lagercrantz (1998). Gaut (2002) extended this approach and noted that the rate of rearrangement, measured as the probability of synteny between two markers, is roughly constant among grasses. Similar rates are reported for several Brassicaceae and Solanaceae, although the Brassica rapa/oleracea comparison is estimated as 2.5 per million years. This may be an artifact, however. A (relatively) small number of markers is available for cross-species comparisons, and the taxa investigated diverged millions or tens of millions of years ago. Burke et al. (2004) found a much higher rate of rearrangement among closely related Helianthus species that diverged less than a million years ago (5.5–7.5 per million years); their genome maps have a high number of comparable markers and thus detection of rearrangements may be easier. The rate of rearrangement in grasses and other plants may simply reflect the number that can be detected over long periods of time.
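The idea of asking whether runs of colinear markers could have arisen at random can be illustrated with a simple permutation test. The sketch below is not Gaut's published method and does not model map error; the statistic (longest run of markers in consecutive order), the function names, and the example marker orders are all assumptions chosen for illustration.

```python
import random


def longest_colinear_run(order):
    """Longest run of markers that are consecutive in the second genome.

    `order` lists, for markers taken in their order along the first genome,
    the rank of each marker along the second genome (a permutation).
    Runs may ascend or descend, so inverted segments still count.
    """
    if not order:
        return 0
    best = run = 1
    prev_step = None
    for a, b in zip(order, order[1:]):
        step = b - a
        if step in (1, -1):
            # Extend the run only if it continues in the same direction.
            run = run + 1 if step == prev_step else 2
        else:
            run = 1
        prev_step = step
        best = max(best, run)
    return best


def colinearity_p_value(order, n_permutations=2000, seed=0):
    """Fraction of random marker orders with a run at least as long as observed."""
    rng = random.Random(seed)
    observed = longest_colinear_run(order)
    shuffled = list(order)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if longest_colinear_run(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical examples: ten markers in identical order, and five scrambled markers.
conserved = list(range(10))
scrambled = [0, 2, 4, 1, 3]
print(longest_colinear_run(conserved), colinearity_p_value(conserved))
print(longest_colinear_run(scrambled), colinearity_p_value(scrambled))
```

A small p-value says only that the observed run is longer than expected under random marker order. Turning such counts into rearrangements per million years additionally requires a divergence time and, as the text notes, enough shared markers to detect the rearrangements at all.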

Comparisons between cereal genomes and the Arabidopsis genome were stimulated by the development of a detailed physical map and contiguous DNA sequence for Arabidopsis. An early comparative mapping study indicated that significant genetic colinearity had been conserved between Arabidopsis and rice over the more than 100 million years since the lineages that gave rise to these two species diverged (Paterson et al., 1996 ). However, subsequent studies demonstrated that adjacent genes in rice were often not adjacent in Arabidopsis (Bennetzen et al., 1998 ) or linked on the genetic map (Devos et al., 1999 ). With the completion of the sequences of highly extended regions of the Arabidopsis and rice genomes, more comprehensive studies have become feasible, demonstrating "scant collinearity in gene order" (Liu et al., 2001 ).

Comparative sequence analyses of small chromosomal segments, mostly in grasses, have shown that local gene content, order, and orientation are also conserved in close relatives (reviewed in Bennetzen and Ma, 2003). Even in distant comparisons, like Arabidopsis and rice, small segments of colinear genes are apparent (van Dodeweerd et al., 1999; Salse et al., 2002). The sequence conservation observed in comparisons between species that have evolved independently for 10 million years or more appears to be limited to genes. Some conserved noncoding sequences (CNSs) are found in grass genomes, but these short segments of similarity and possible homology are tightly linked to genes and may have gene regulatory roles (Kaplinsky et al., 2002; Guo and Moose, 2003; Hong et al., 2003). Moreover, the CNSs in plants are nowhere near as abundant or as large as the CNSs discovered by comparative sequence analysis in vertebrates (Thomas et al., 2003).

In summary, closely related plants have similar gene content and stretches of conserved gene order. This colinearity, often called synteny (which actually does not refer to order at all, but only to presence on orthologous chromosomes), diminishes as more distant relatives are compared. Some lineages, like Arabidopsis, appear to have exceptionally high frequencies of genomic rearrangement that greatly diminish their colinearity even with close relatives.
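The distinction between synteny and colinearity drawn above can be made concrete with a toy example. The marker names, chromosome labels, and positions below are hypothetical, and the two functions are an illustrative sketch rather than an established tool.

```python
# Hypothetical maps: marker -> (chromosome, position along that chromosome).
species_a = {"g1": ("A1", 1), "g2": ("A1", 2), "g3": ("A1", 3), "g4": ("A1", 4)}
species_b = {"g1": ("B5", 3), "g2": ("B5", 1), "g3": ("B5", 4), "g4": ("B5", 2)}
species_c = {"g1": ("C2", 4), "g2": ("C2", 3), "g3": ("C2", 2), "g4": ("C2", 1)}


def are_syntenic(markers, genome):
    """True if all markers lie on one chromosome; order is ignored."""
    return len({genome[m][0] for m in markers}) == 1


def are_colinear(markers, genome_a, genome_b):
    """True if the markers are syntenic in both genomes and share one order.

    A fully reversed order still counts, since map orientation is arbitrary.
    """
    if not (are_syntenic(markers, genome_a) and are_syntenic(markers, genome_b)):
        return False
    by_a = sorted(markers, key=lambda m: genome_a[m][1])
    positions_b = [genome_b[m][1] for m in by_a]
    ascending = sorted(positions_b)
    return positions_b == ascending or positions_b == ascending[::-1]


markers = ["g1", "g2", "g3", "g4"]
print(are_syntenic(markers, species_b))             # same chromosome in B
print(are_colinear(markers, species_a, species_b))  # but order is scrambled
print(are_colinear(markers, species_a, species_c))  # same order, map reversed
```

In this example the four markers are syntenic in all three maps, but only the A and C maps are colinear, which is the case where the two terms come apart.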


    CHROMOSOMAL REARRANGEMENT
Cytogenetic studies long ago indicated that major chromosomal rearrangements can mark specific lineages of plants and other eukaryotes. Comparative recombinational maps using DNA markers were able to identify inversions, translocations, chromosome fission/fusion events, and chromosome duplications (both aneuploid and polyploid) that are specific to individual plant families, tribes, or species (Gale and Devos, 1998 ). Mechanisms of chromosome breakage and repair that might lead to such major changes have been documented in laboratory studies, but we do not know the precise cause of genome rearrangement in any naturally occurring population of plants. Major chromosomal rearrangements can have dramatic biological effects, particularly in the fertility of inversion and translocation heterozygotes, but they do appear to be relatively rare in most plant lineages. For instance, only four major chromosomal rearrangements differentiate the rice and foxtail millet lineages, despite more than 50 million years of independent evolutionary descent (Devos et al., 1998 ). On the other hand, Rieseberg et al. (1995 , 1996 , 2003 ) have shown that genome rearrangements can occur immediately upon hybridization, even at the diploid level. Furthermore, these rearrangements are reproducible in artificial hybrids. They have interpreted this to mean that selection favors particular combinations of genes and genomic arrangements.

In contrast to the rarity of large chromosomal rearrangements, small chromosomal rearrangements are incredibly abundant in plants. In Arabidopsis, some analyses have estimated that over 60% of the genes appear to have been rearranged (mostly deleted) since the last polyploidization occurred in this lineage (Vision et al., 2000 ). In comparisons of rice and sorghum, two highly diploidized species that diverged from a common ancestor 50–70 mya (Grass Phylogeny Working Group, 2001 ), it appears that approximately 20% of the genes have been rearranged (Bennetzen and Ma, 2003 ). Hence, small chromosome rearrangements involving genes are orders of magnitude more frequent in plants than are large chromosomal rearrangements. In approximately the same amount of geological time, rearrangement of genes within mammals is less than 1% (Dehal et al., 2001 ; Cooper et al., 2003 ; Thomas et al., 2003 ).

The population genetics of chromosomal rearrangement has been the subject of much theory, but relatively little data (reviewed in Rieseberg, 2001 ). In general, a chromosomal rearrangement appears to protect a part of the genome from gene flow, fixing a set of alleles simultaneously and preventing their breakup or loss by crossing back to progenitor plants. Individual chromosomal rearrangements may create only small reductions in plant fitness, at the same time as they contribute to reproductive isolation. Small population sizes and/or metapopulation structure may also permit fixation of mildly deleterious rearrangements.

Investigations of the sequences between genes show that much of this material is derived from transposable elements, even in small genome species like Arabidopsis and rice (Devos et al., 2002; Ma et al., 2004). Transposable elements are remarkably active in plants. Bursts of element activity can generate hundreds or thousands of heritable new element copies in a single plant generation. On average, most plant nuclear genomes accumulate several thousand new transposable elements per million years, most of them LTR-retrotransposons and MITEs. Recently, it has been shown that these elements and other nonessential DNA can be relatively quickly removed from plant genomes by illegitimate recombination (Devos et al., 2002; Ma et al., 2004). Deletions caused by illegitimate recombination are usually tiny, the vast majority being less than 100 bp (Bennetzen et al., in press), but their continuous accumulation can rapidly remove a large part of the genome. Ma et al. (2004) conservatively estimated that LTR-retrotransposon sequences had a half-life of less than 6 million years in rice and that various deletion processes had removed more than 190 Mb of LTR-retrotransposon sequence from the rice genome in the last 8 million years. Given this rapid dynamic of insertion and deletion, it is not surprising that the unselected sequences between genes are different in species that have diverged for more than a few million years. Figure 1 provides an idealized description of global and local processes that can enlarge or shrink a plant genome.
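To give a feel for what a half-life of this magnitude implies, the following sketch computes the fraction of an inserted sequence expected to survive after a given time. It assumes simple exponential decay, which is an assumption made here for illustration and not necessarily the model used by Ma et al. (2004); the function name is likewise hypothetical. Because the quoted half-life is an upper bound (less than 6 million years), the fractions printed are upper bounds on survival.

```python
def fraction_remaining(elapsed_my, half_life_my):
    """Fraction of sequence remaining after `elapsed_my` million years,
    assuming exponential decay with the given half-life (million years)."""
    return 0.5 ** (elapsed_my / half_life_my)


HALF_LIFE_MY = 6.0  # upper bound quoted in the text for rice LTR-retrotransposons

for elapsed in (2, 6, 8, 12):
    remaining = fraction_remaining(elapsed, HALF_LIFE_MY)
    print(f"after {elapsed:>2} My: at most {remaining:.1%} of the sequence remains")
```

Under this assumption, no more than about 40% of the LTR-retrotransposon sequence present 8 million years ago would still be recognizable today. That fits the text's point that unselected intergenic sequence differs between species separated by more than a few million years.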



Fig. 1. Mechanisms responsible for genome size variation in plants. The uppermost line indicates an idealized segment of plant nuclear DNA, with horizontal arrows indicating the direction of transcription, location, and size of genes. The genes are numbered sequentially left to right, except for a pair of tandemly duplicated genes that are designated 3–1 and 3–2. A. Genome expansion. The left figure shows the outcome of polyploidy, in which the nucleus now contains twice as many copies of similar genetic regions. The right figure shows multistep growth in genome size by insertion of transposons between genes. B. Genome shrinkage. The left figure shows a progressive (probably random) deletion of genomic DNA, including some genes. These gene losses appear to be tolerated because the organism still retains at least one copy of each gene (Ilic et al., 2003). The right figure shows progressive deletion of nongenic sequences by an accumulation of small deletion events (Bennetzen et al., in press). Of course, in any genome, both the leftward and rightward processes could be ongoing simultaneously, and there will be a dynamic competition between concurrent expansion and contraction processes. Black boxes indicate repetitive DNA inserted between genes, commonly LTR-retrotransposons.

 
Gene movement is significantly less rapid than intergenic sequence change, but it is still relatively frequent in plants. All types of rearrangements are observed, including small inversions, deletions, duplications, and long-distance movements to new chromosomal locations. The high rate of this last class of rearrangement is especially unexpected. In a comparison of orthologous adh regions of rice, sorghum, and maize, for instance, four genes were inserted into the adh region in the ancestral lineage that gave rise to sorghum. One of these was a two-gene insertion that occurred prior to the divergence of maize and sorghum ancestors. The other two were insertions of a single gene from unlinked chromosomal locations into adjacent positions after the divergence of maize and sorghum lineages (Ilic et al., 2003 ). A mechanism for the movement of these small gene-bearing DNA fragments has not been documented, but unequal homologous recombination or double-strand break repair (reviewed in Gorbunova and Levy, 1999 ) are likely candidates. We also do not know the origins of specific events that caused small inversions, gene deletions, or gene duplications in plants, but unequal recombination is a candidate. For a single-gene inversion found near the barley Vrn1 homologue, Ramakrishna and coworkers observed two highly degenerate flanking transposable elements of the same family in inverted orientation (Ramakrishna et al., 2002 ). Intrastrand unequal recombination between these elements would have led to just such an inversion. Similarly, unequal recombination between flanking direct repeats or directly repeated gene family members can give rise to the frequent gene duplications and deletions found for all genes, but especially in tandem gene families. Figure 2 provides a summary of the events that can alter gene order and content in specific small regions of a plant genome.



Fig. 2. Types of gene movement observed in plants. The topmost line shows the same idealized segment of plant nuclear DNA as in the same position of Fig. 1. A. Inversion of a two-gene segment (genes 2 and 3–1). B. Deletion of a multigene segment (genes 3–1 and 3–2). C. Growth in tandem gene family number by unequal homologous recombination. D. Decrease in tandem gene family number by unequal homologous recombination, creating a single gene from parts of genes 3–1 and 3–2. This is the reciprocal outcome of the event described in panel C. E. Movement of a new gene (gene 7) into the region from an unlinked chromosomal location.
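The five event types in Fig. 2 can be mimicked as operations on an ordered list of gene labels. This is only a bookkeeping sketch: the number of genes in the starting segment, the label "3-3" for the extra tandem copy, the label "3-1/3-2" for the chimeric gene, and the insertion point for gene 7 are assumptions, and gene orientation is not tracked.

```python
# Idealized segment; hyphens stand in for the en dashes used in the caption.
segment = ["1", "2", "3-1", "3-2", "4", "5", "6"]


def invert(genes, start, end):
    """Reverse the order of genes[start:end]."""
    return genes[:start] + genes[start:end][::-1] + genes[end:]


def delete(genes, start, end):
    """Remove genes[start:end]."""
    return genes[:start] + genes[end:]


def insert(genes, index, new_genes):
    """Insert new_genes before position index."""
    return genes[:index] + list(new_genes) + genes[index:]


# A. Inversion of the two-gene segment containing genes 2 and 3-1.
panel_a = invert(segment, 1, 3)
# B. Deletion of the multigene segment containing genes 3-1 and 3-2.
panel_b = delete(segment, 2, 4)
# C. Tandem family growth: an extra copy (hypothetical label "3-3").
panel_c = insert(segment, 4, ["3-3"])
# D. Tandem family shrinkage: one chimeric gene replaces 3-1 and 3-2.
panel_d = insert(delete(segment, 2, 4), 2, ["3-1/3-2"])
# E. Arrival of gene 7 from an unlinked location (insertion point assumed).
panel_e = insert(segment, 5, ["7"])

for name, genes in [("A", panel_a), ("B", panel_b), ("C", panel_c),
                    ("D", panel_d), ("E", panel_e)]:
    print(name, " ".join(genes))
```

Each function returns a new list, so the original segment is left untouched and the five outcomes can be compared directly against the same starting order.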

 
Now that we know that local genome rearrangement is an aggressive and ongoing process in plants, it becomes clear that we need to know much more about it. What are the precise mechanisms that dominate genome rearrangement in any given genome context? What percentage of initial events are fixed? Are there periodicities in the generation of rearrangements, and might these be caused by particular environmental inputs? For all of these questions, are there lineage-specific differences in the answers? Last, and perhaps most important, what effects do these local rearrangements have on gene and genome function? The movement of adh1 orthologues to different chromosomes and repetitive DNA contexts in rice vs. maize does not seem to have altered the tissue specificity, developmental timing, or inducible properties of this gene (Ilic et al., 2003). This same lack of sensitivity to chromosomal environment has been observed for the majority of other plant genes, indicating that chromosomal rearrangement has surprisingly little influence upon gene expression. Much of this resistance to position effect in plants is probably epigenetic, as shown by the sudden transcriptional alterations that can occur after epigenetic balance is disturbed (Kashkush et al., 2003). Moreover, so many rearrangements occur that, even if only a small fraction of individual events has any biological significance, rare events that affect gene function are likely to accumulate over time. A prime example of this is the insertion of a transposable element into a promoter region, thereby providing the raw material for the evolution of new regulatory properties (reviewed in Wessler et al., 1995).

In summary, local genomic rearrangement is a continuous and highly active process in seed plant genomes. Mechanisms for these events are known or suspected. However, we need much more information to determine the relative frequencies of different classes of rearrangement, whether they differ between plant lineages, and the biological outcomes of these rearrangements.


    GENOME SIZE VARIATION
 
Flowering plants have an impressive range of nuclear genome sizes, including some species with less than 50 Mb of DNA per haploid nucleus and others with more than 85 000 Mb (Bennett and Leitch, 2003 ). Some of this variation originates in the frequent formation of polyploids in flowering plant lineages (Wendel, 2000 ), but most appears to be caused by variations in the amount of LTR-retrotransposon DNA in specific genomes (reviewed in Bennetzen, 2002a ). In all plant species examined, the vast majority of intact LTR-retrotransposons are relatively young (less than 5 million years since insertion; Bennetzen et al., in press ). The youth of these elements is misleading, however, because their ubiquity and the great divergence of active elements indicate that their origin only slightly postdates the origin of eukaryotes. Although recent bursts of LTR-retrotransposon activity are both likely and documented (Vitte and Panaud, 2003 ), the simplest explanation for the absence of intact ancient elements is that they have been progressively deleted (Devos et al., 2002 ; Wicker et al., 2003 ; Ma et al., 2004 ). These mechanisms can account for the current observed differences in plant genomes without any recourse to a possible selective advantage of particular plant genome sizes, although this conclusion does not negate the possibility that selection on nuclear DNA content can be a major factor in many instances (Bennetzen et al., in press ).

Future studies will need to determine whether differences in genome size in any particular lineage are caused by an unusually low or high rate of transposable element amplification or by differences in the mechanisms of transposable element removal. Petrov and coworkers demonstrated that the frequency and size of DNA deletions in LINE retroelements was greater in the small genome of the insect Drosophila melanogaster than in large genomes of the insect genus Laupala (Petrov et al., 2000 ). Similarly, Kirik and colleagues (2000) have shown that double-strand breaks are more commonly resolved with insertions and less commonly with deletions in tobacco, a plant with a relatively large genome, than in Arabidopsis. Hence, it is likely that different organismal lineages will exhibit different rates and modes of DNA removal, thus providing one contribution to differences in current genome sizes.

Lynch and Conery (2003), working with a large sample of prokaryotes and eukaryotes, showed clearly that genome size is negatively correlated with the parameter Neµ, the product of the effective population size (Ne) and the mutation rate per nucleotide (µ). Because the range of mutation rates is rather narrow, variation in this parameter generally reflects variation in population size. Small population sizes correlate with large genomes. Vinogradov (2003) found a similar result in angiosperms by comparing genome size in rare angiosperms with that of more common species. On average, rare plants had larger genomes. There was no correlation, however, with life history, a result also found for species of Hordeum (Poaceae; Jakob et al., 2004). In some clades of Hordeum, annual species had smaller genomes than perennials, whereas the relationship was reversed in other clades.

Soltis et al. (2003) have examined variation in genome size among the angiosperms and have found that, as expected, genome size varies appreciably when mapped on the phylogenetic tree. They estimated the ancestral size as "very small" (<1.4 pg per 1C nucleus) and noted that very large genomes appear in only a few clades (Santalales, Asparagales, and Liliales). They also observed increases and decreases of genome size over evolutionary time. This inference of fluctuation is expected in part because they used both parsimony and squared-change parsimony to estimate ancestral states. Both methods assume that increases and decreases are equally likely. Squared-change parsimony gives the equivalent of a Bayesian reconstruction that assumes a Brownian motion model of evolution of the character (Maddison, 1991), and Brownian motion is "infinitely jiggly" (Felsenstein, 2004, p. 392). Other models of character change (e.g., those listed by Felsenstein, 1988) might give a different result. For example, Bennetzen and Kellogg (1997) reconstructed ancestral genome sizes under both the Brownian motion model and a model in which genomes could only get bigger. Not surprisingly, the reconstructions were sensitive to the underlying model of character state change.
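The squared-change criterion discussed above can be illustrated with a short Python sketch. It is only a toy under stated assumptions: all branches are given equal length, the tree and tip values are hypothetical, and the function name `squared_change_parsimony` is ours. It is not the algorithm as implemented in any published software. With equal branch lengths, the sum of squared changes is minimized when every internal node takes the mean of its neighbouring nodes, so the sketch simply iterates that rule until the values stop moving.

```python
def squared_change_parsimony(children, tip_values, tol=1e-12, max_iter=10000):
    """Estimate internal-node values that minimize the sum of squared
    changes along branches, assuming every branch has equal length.

    children:   dict mapping each internal node to a list of its child nodes.
    tip_values: dict mapping each tip to its observed value.
    """
    parent = {child: node for node, kids in children.items() for child in kids}
    # Start every internal node at the mean of the tip values.
    start = sum(tip_values.values()) / len(tip_values)
    values = dict(tip_values)
    values.update({node: start for node in children})

    for _ in range(max_iter):
        largest_shift = 0.0
        for node, kids in children.items():
            neighbours = list(kids)
            if node in parent:
                neighbours.append(parent[node])
            # At the optimum each internal node is the mean of its neighbours.
            new_value = sum(values[n] for n in neighbours) / len(neighbours)
            largest_shift = max(largest_shift, abs(new_value - values[node]))
            values[node] = new_value
        if largest_shift < tol:
            break
    return {node: values[node] for node in children}


# Hypothetical three-taxon tree ((A, B), C) with made-up trait values.
toy_children = {"root": ["n1", "C"], "n1": ["A", "B"]}
toy_tips = {"A": 1.0, "B": 3.0, "C": 5.0}
toy_estimates = squared_change_parsimony(toy_children, toy_tips)
```

Because the criterion is symmetric, an estimate can rise or fall along any branch, which is why a reconstruction of this kind will tend to show both increases and decreases.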

Figure 3 illustrates the evolution of genome size among diploid grasses. Sampling is more comprehensive than in our previous study (Bennetzen and Kellogg, 1997) but remains heavily biased toward subfamily Pooideae (all taxa derived from the common ancestor of Nardus and Aegilops). We retrieved 1C values for all diploid grasses and outgroups from Cyperaceae and Juncaceae from the Angiosperm C-values Database (Bennett and Leitch, 2003). Lygeum spartum (2n = 40) and Deschampsia antarctica (2n = 26) are listed as diploids but almost certainly are polyploid based on their chromosome numbers; they were therefore excluded. For monophyletic genera with more than one species, we calculated the average 1C value. If, however, more than one chromosome base number was present, we calculated the average value for each base number separately. This assumes that each chromosomal group within a genus is monophyletic, which is in fact unlikely. However, phylogenetic trees are not available for many of the individual genera in the tree. Averaging 1C values by chromosome base number within a genus therefore seemed a reasonable compromise between illustrating variation and losing information. Because Festuca is paraphyletic and contains Lolium, we used an average value for Lolium, but then treated each clade of Festuca separately following the molecular cladogram presented by Catalán et al. (2004).
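The averaging rule described above (one mean per genus, split by base chromosome number where a genus has more than one) can be sketched as follows. This is a minimal illustration, not the procedure used to build Fig. 3: the function name `average_1c`, the placeholder genus names, and all numerical values are hypothetical and are not taken from the C-values database.

```python
from collections import defaultdict


def average_1c(records):
    """Average 1C values per (genus, base chromosome number) group.

    records: iterable of (genus, base_number, one_c_pg) tuples.
    Returns a dict mapping (genus, base_number) to (mean 1C, species count).
    """
    groups = defaultdict(list)
    for genus, base_number, one_c in records:
        groups[(genus, base_number)].append(one_c)
    return {key: (sum(vals) / len(vals), len(vals)) for key, vals in groups.items()}


# Hypothetical records: placeholder genera and made-up 1C values (pg).
example_records = [
    ("GenusA", 7, 4.0),
    ("GenusA", 7, 6.0),
    ("GenusA", 5, 3.0),
    ("GenusB", 7, 2.5),
]
example_averages = average_1c(example_records)
```

Keeping the species count next to each mean mirrors the "number of species" annotation of the figure and makes it easy to see which averages rest on a single estimate.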



Fig. 3. A hypothesis of the history of diploid grasses showing the evolution of genome size (1C). Data on genome size are from the Angiosperm C-values Database (Bennett and Leitch, 2003). Relationships among the major clades are based on the Grass Phylogeny Working Group (2001) tree. The cladogram for subfamily Pooideae follows Soreng and Davis (2000); cladogram of Festuca and Lolium follows Torrecilla and Catalán (2002) and Catalán et al. (2004); cladogram of Triticeae follows Kellogg et al. (1996). Base chromosome numbers are indicated in parentheses. Number of species = the number of species for which genome size estimates were included in the average generic value; v = voucher available.

 
We optimized the 1C value for diploid genomes using squared-change parsimony in MacClade 4.0 (Maddison and Maddison, 2000). Under this optimization, we estimated an ancestral 1C genome size of about 1.3 pg of DNA, slightly smaller than our previous estimate (ca. 1.9) and just below the boundary between "very small" and "small" as defined by Soltis et al. (2003). Genome size is inferred to increase at the base of Pooideae and also at the base of core Pooideae (after the divergence of Nardus). This latter increase correlates with the origin of a chromosome base number of x = 7, which requires the combination of several chromosomes and loss of multiple centromeres. Consistent with this genomic reorganization, comparative maps find novel genome arrangements (see Kellogg, 1998). The largest genomes in grasses are found in Triticeae, notably in Psathyrostachys (average 8.38) and Secale (average 7.8).

Under this simple model of evolution, however, genome size apparently decreases in multiple lineages. The genus Phleum seems to have an unusually small genome, as do Corynephorus, Holcus, the two annual species of Poa (infirma and supina), and the x = 7 species of Phalaris. Zingeria biebersteiniana, with x = 2, also has a very small genome, but it has not been placed phylogenetically so is not included in this tree. Morphological similarity would place it near Avena.

A more sophisticated model of evolutionary change might lead to different conclusions about the exact size of the ancestral genome and the relative frequency of genome expansion or contraction. Such optimization is also sensitive to taxon sampling; because there are no genome size estimates for any grass lineages that diverged before the common ancestor of maize and rice, nor for any of the immediate outgroups, we expect that the details of our estimates here are subject to change. However, the current data and optimization are sufficient to demonstrate that genome size is labile.

It is also interesting to note that base chromosome number does not correlate precisely with size. A base number of 7 is ancestral and synapomorphic for core Pooideae, but there have been reductions to x = 4 in Milium vernale, 6 in some species of Phalaris, and 5 in Briza minor. Whereas Briza minor has a smaller genome than its x = 7 congeners (2.9 vs. an average of 5.7), the estimate for Milium is higher than that for Phleum, and genomes of the x = 6 Phalaris are on average bigger than for the x = 7 species. Similarly, Sarga versicolor (Andropogoneae; x = 5) has a 1C value almost three times that of Sorghum bicolor and twice that of Vetiveria.

In Fig. 4, we present a more detailed view of tribe Triticeae, which includes the largest known genomes in the grasses. We also include the polyploid species for which genome sizes are known. The phylogenetic relationships of the diploid Triticeae are not clear; every gene investigated appears to have had a different history (Kellogg et al., 1996; Mason-Gamer et al., 1998; Mason-Gamer, 2004, and references therein). Accordingly, we have used the cladogram for the plastid genome (Mason-Gamer et al., 2002) as one of several possible inferences of the history of the group. The figure shows that even at this level of analysis, genome size is variable. Also, the sizes of the genomes of the polyploids are often, but not always, smaller than the sum of those of their diploid progenitors. A similar result was found by Jakob et al. (2004) in their detailed analysis of the evolution of genome size in Hordeum. The occasional reduction of genome size in polyploids is consistent with the hypothesis that polyploidy may be followed by loss of genetic material (discussed later). This also extends the observation of Levy and Feldman (2002) on grasses in general, in which they observed that the average genome size of polyploids was less than twice the average for the diploids.
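The comparison between a polyploid and the sum of its diploid progenitors amounts to a simple ratio, sketched below. The function name `additivity_ratio` and the numbers are hypothetical and serve only to show the arithmetic; they are not measurements from Fig. 4.

```python
def additivity_ratio(polyploid_1c, progenitor_1cs):
    """Return the polyploid 1C value divided by the summed progenitor 1C values.

    A ratio of 1.0 means the polyploid genome is exactly additive;
    a ratio below 1.0 means it is smaller than the sum of its progenitors.
    """
    expected = sum(progenitor_1cs)
    if expected <= 0:
        raise ValueError("progenitor 1C values must sum to a positive number")
    return polyploid_1c / expected


# Hypothetical tetraploid (10.0 pg) with made-up progenitor values (5.5 and 6.0 pg).
example_ratio = additivity_ratio(10.0, [5.5, 6.0])
is_smaller_than_sum = example_ratio < 1.0
```

A ratio below 1.0 is compatible with loss of DNA after polyploid formation, but it cannot by itself distinguish loss in the polyploid from growth in the surviving diploid lineages.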



Fig. 4. One estimate of the phylogeny of Triticeae showing the evolution of genome size (1C) among diploid, tetraploid, and hexaploid members of the tribe. Relationships among species of Hordeum are based on data from Jakob et al. (2004), which should be consulted for a much more comprehensive study of genome size evolution in that genus; relationships among major clades of Triticeae are based on the plastid tree, as estimated by Mason-Gamer et al. (2002); these relationships are different from those shown by 5S RNA spacers, the ITS, and waxy, which are all different from each other. Several polytomies have been resolved in a way that is compatible with one or more gene trees, but results should be considered illustrative.

 
In summary, we now know that size variation in plant nuclear genomes is mechanistically driven by polyploidy, transposable element amplification, and recurrent DNA deletion. Because closely related plant lineages can differ dramatically in both genome size and trends in genome size (see later), it is likely that some or all of these mechanisms vary in their intensity in different lineages.

The role of natural selection in determining genome size is unknown. As we come to understand better the mechanisms that control genome size, we may be able to develop clear testable hypotheses. Previous authors (e.g., Cavalier-Smith, 1985 ) have suggested that selection operates on nuclear volume and that this indirectly affects genome size. Data bearing on this hypothesis are mostly correlational rather than experimental, but the correlations are far from perfect (see multiple examples given by John and Miklos, 1988 ), and attempts to link genome size and phenotypic characters have not always been successful (e.g., Bachmann et al., 1985 ). Recent data on genome structure point to many more levels at which selection might act. For example, selection on the ability of the plant to limit transposon activity might keep genomes from expanding indefinitely. Selection could also act on the epigenetic mechanisms that silence genes and genomes in polyploids. Until we know more details of how such mechanisms work, it is difficult to devise a test (experimental or statistical) of the hypothesis of selection.


    POLYPLOIDY
 
A major and well-known form of genomic change is the duplication of entire genomes or polyploidy. Polyploidy has been the subject of several recent reviews (Soltis and Soltis, 2000 ; Wendel, 2000 ; Levy and Feldman, 2002 ; Cronn and Wendel, 2003 ), and interested readers should consult those papers for a full exploration of the subject. These papers together cited evidence that polyploids occur in all major plant groups and that repeated origins of polyploidy are common (Soltis and Soltis, 1993 , 1999 ; Soltis et al., 1992 ).

Genome sizes of allopolyploids are not necessarily arithmetic sums of the sizes of the parental genomes. Genetic material is lost and genomes are rearranged. For example, Triticum dicoccoides is an amphidiploid resulting from a cross between a progenitor similar to Triticum urartu (with the A genome) and one similar to Aegilops speltoides (with a genome similar to the B genome). Belyayev et al. (2000) showed that genomic probes made from Aegilops speltoides hybridize strongly to large portions of the B genome of T. dicoccoides, but also to dispersed locations in the A genome, suggesting that repetitive sequences from the B genome have been preferentially amplified in the A genome or have moved from one location in the genome to another. Other examples are cited by Wendel (2000) .

Like most other eukaryotes, plants undergo cycles of polyploidization, followed by diploidization, the latter characterized by gene loss and/or pseudogene formation. Even such "good" diploids as Arabidopsis and rice are now known to be ancient polyploids or at least to have undergone extensive segmental duplication (discussed next). Thus previous attempts to estimate the percentage of polyploid species among angiosperms (Stebbins, 1971; Masterson, 1994) are now seen to be oversimplified; many "diploid" species are paleopolyploid. It appears that 100% of flowering plants are current polyploids or have a polyploid history. In the next few paragraphs, we cite examples from major plant families, in which well-known and well-documented duplication events are superimposed on more ancient duplications.

Polyploidy in Poaceae
Duplication at the base of the family
The grasses all apparently share a number of ancient duplications of at least parts of their genomes. Draft genome sequences have been published for rice (Goff et al., 2002 ; Yu et al., 2002 ), but the genome has only recently been assembled into a pseudomolecule and is still being annotated. It has thus been difficult to check on earlier suggestions of genome duplications. Recently, Vandepoele et al. (2003) have produced genomic scaffolds for the rice genome and used these to estimate the number and extent of duplicated blocks. Only 15% of the rice genome falls into identifiable duplicated blocks, which is appreciably less than the 60% estimate for Arabidopsis. Furthermore, these duplications involve only a few of the chromosomes. Finally, a plot of the percentage of duplicated genes against the number of substitutions per silent site did not produce an obvious peak, as would be expected if many of the genes in the genome had been duplicated around the same time. Nonetheless, most of the duplications preceded the diversification of the cereal crops, indicating that they occurred before the origin of the grasses.

The maize duplication
Polyploidy of maize was suggested by Edgar Anderson in 1945 and has been documented by Rhoades (1951) , Helentjaris et al. (1988) , and Wendel et al. (1989) , among others. Anderson speculated that the ancestors might be five-chromosome species of Sorghum and Coix, ignoring the obvious difficulty that five-chromosome Sorghum (now placed in the genus Sarga; Spangler, 2003 ) is native to Australia and Africa and Coix to India (Clayton and Renvoize, 1986 ). Anderson's hypothesis predicts that genes from maize should be sister to genes from one or more of the x = 5 Andropogoneae, but instead all analyses to date placed Zea sister to Tripsacum (e.g., waxy, Mason-Gamer et al., 1998 ; phytochrome B, Mathews et al., 2002 ; teosinte branched 1, Lukens and Doebley, 2001 ; ndhF, Spangler et al., 1999 ; S. Kleweis, S. Malcomber, and E. A. Kellogg, University of Missouri-St. Louis, unpublished data). The sister group of these two is not well supported in any estimate of phylogeny to date, but no study has linked the Zea-Tripsacum clade with Sarga (= Australian "sorghum") or with Coix (Mason-Gamer et al., 1998 ; Spangler et al., 1999 ; Giussani et al., 2001 ; Mathews et al., 2002 ; Aliscioni et al., 2003 ; S. Kleweis, S. Malcomber, and E. A. Kellogg, University of Missouri-St. Louis, unpublished data).

Gaut and Doebley (1997) , in a much cited paper, inferred that maize is a segmental allotetraploid. Their hypothesis predicts that trees of orthologous genes should produce one of two alternative patterns—either (maize [maize, sorghum]) or ([maize, maize] sorghum) ([M {M, S}] or [{M, M} S])—and that linked genes should share the same pattern. Gaut and Doebley (1997) could not undertake such a comparison because of the lack of corresponding sequence from sorghum and rice. Recently, however, Swigonova et al. (in press) investigated six pairs of maize genes and their orthologues from sorghum and rice. Phylogenetic analyses are equivocal. Only two of the genes (r1/b1; grf1/grf2) strongly supported the (M [M, S]) tree, and one (orp1/orp2) strongly supported the alternative. The remaining genes produced equivocal trees that were not significantly different from a trichotomy, indicating that the ancestor of the two maize genomes arose about the same time as their divergence from the ancestor of sorghum (about 11 million years ago), consistent with the rapid radiation of the tribe. Wilson et al. (1999) also argued for the ([M M] S) tree, but hypothesized that the maize ancestors had x = 8, a number that would be highly unusual for a panicoid grass.
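The two predicted patterns, and the unresolved trichotomy, can be told apart mechanically once a rooted tree of two maize paralogues and one sorghum orthologue is in hand. The sketch below is illustrative only: the function name `classify_triplet`, the leaf labels "M1", "M2", and "S", and the nested-tuple encoding of the tree are all assumptions made for this example, and the sketch says nothing about statistical support for a topology.

```python
def classify_triplet(tree):
    """Classify a rooted tree of two maize paralogues and one sorghum orthologue.

    Leaves are the strings "M1", "M2", and "S". A resolved tree is a nested
    pair such as ("M1", ("M2", "S")); a trichotomy is a flat 3-tuple.
    """
    if len(tree) == 3 and all(isinstance(leaf, str) for leaf in tree):
        return "unresolved"
    if len(tree) != 2:
        raise ValueError("expected a rooted three-taxon tree")
    # The nested element holds the two sister sequences.
    sister_groups = [part for part in tree if isinstance(part, tuple)]
    if len(sister_groups) != 1 or len(sister_groups[0]) != 2:
        raise ValueError("expected exactly one pair of sister sequences")
    sisters = set(sister_groups[0])
    if sisters == {"M1", "M2"}:
        return "((M, M), S)"
    if "S" in sisters:
        return "(M, (M, S))"
    raise ValueError("unexpected leaf labels")


# Hypothetical gene trees, one per pattern.
example_patterns = [
    classify_triplet(("M1", ("M2", "S"))),
    classify_triplet((("M1", "M2"), "S")),
    classify_triplet(("M1", "M2", "S")),
]
```

Tallying these labels across linked genes would show whether neighbouring loci share the same pattern, which is the second prediction of the segmental allotetraploid hypothesis.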

Fertilization independent endosperm (fie) genes in maize showed that fie2 was more closely related to sorghum than either gene is to fie1 supporting the (M [M, S]) tree (Danilevskaya et al., 2003 ). However, Swigonova et al. (in press) have shown that fie1 and fie2 in maize are not orthologous, despite being on duplicated segments of the genome. Instead, each region originally contained two paralogous genes; one copy was lost from chromosome 4 and the other from chromosome 10. The lack of orthology may also help explain the unusually long branch leading to Zmfie1 in the figure in Danilevskaya et al. (2003) .

Duplications in other groups of grasses
Grass chromosome numbers have been studied in detail since the synthesis provided by Avdulov (1931) . Comparative genome mapping efforts have expanded on his observations. Core Pooideae are marked by having their genes arranged in seven large chromosomes; in addition, all members studied so far have one chromosome that corresponds to a combination of rice linkage groups 5 and 10 and another that corresponds to a novel combination of parts of rice 4 and 7 (Kellogg, 1998 ). Panicoideae are divided into three major clades (Giussani et al., 2001 ), corresponding to chromosome base numbers of x = 10 (Andropogoneae and Paspaleae) and x = 9 (Paniceae s.s.). All studied panicoids share a chromosome that corresponds to rice chromosomes 3 and 10 and another that corresponds to rice 7 and 9.

The woody bamboos (tribe Bambuseae) are almost all polyploid, with the exception of Chusquea talamancensis and possibly C. subtesselata (Judziewicz et al., 1999 ). It is tempting to speculate that the many morphological novelties of the group were generated by major changes in genome structure and gene expression following polyploidy. This hypothesis predicts that the two diploid species are derived rather than ancestral and that the woody bamboos all have two full copies of a rice-like genome. Because of the enormous difficulty of doing genetic studies on woody bamboos, such investigations are a long way off.

Zizania, in tribe Oryzeae, shows that duplications need not encompass the entire genome. Zizania aquatica (North American wild-rice), with 15 chromosomes, has 14 chromosomes that are colinear with 11 of the 12 chromosomes of Oryza sativa and three chromosomes that are apparently duplicates of rice chromosomes 1, 4, and 9 (Kennard et al., 1999 ). Zizania is more closely related to Oryza than to any other mapped cereal (Ge et al., 2002 ). Nonetheless, there have been a number of rearrangements, all of which appear to involve duplicated loci.

Polyploidy in Brassicaceae
Recent studies in Brassicaceae have been ably reviewed by Koch et al. (2003) . Here we summarize a few of the major new findings.

Duplication at the base of the family
The whole genome sequence of Arabidopsis made it possible to analyze the patterns of gene duplication across this supposedly compact diploid genome. Surprisingly, it became clear that the genome contained extensive duplicated blocks of sequence (Arabidopsis Genome Initiative, 2000 ; Blanc et al., 2000 ; Paterson et al., 2000 ). Initial attempts to date this duplication using molecular-clock estimates indicated four rounds of genome duplication (Vision et al., 2000 ). Subsequent analyses have verified the most recent duplication event, although the estimated date is now thought to be more recent (Blanc et al., 2003 ). At least one and possibly two more ancient duplications can be demonstrated (Simillion et al., 2002 ). However, a more powerful approach uses phylogenetic events to provide relative dates (Bowers et al., 2003 ; Ermolaeva et al., 2003 ). The phylogenetic analyses show that the duplication preceded the divergence of Brassica from Arabidopsis. Because the Brassica and Arabidopsis lineages diverged soon after the origin of Brassicaceae (M. Beilstein, University of Missouri, St. Louis, unpublished data), we infer that the "Arabidopsis" duplication is actually a Brassicaceae duplication.

Comparisons of the degenerated "homoeologous" genomes in Arabidopsis thaliana have been informative (Blanc et al., 2000 ; Ku et al., 2000 ; Arabidopsis Genome Initiative, 2000 ). However, these homoeologous genomes are highly rearranged within Arabidopsis, partly because the multiple rounds of polyploidy were all ancient events and partly because the polyploid state apparently removed constraints against extensive gene loss. Hence, the current status of the Arabidopsis genome indicates rearrangements within rearrangements within rearrangements, making it difficult to sort out the nature, timing, and mechanisms of individual events.

Arabidopsis
Arabidopsis, as currently delimited, includes nine species and five subspecies (O'Kane and Al-Shehbaz, 1997 ). One species, Arabidopsis suecica, is clearly a recently formed polyploid, the product of a naturally occurring cross between A. thaliana and A. arenosa (O'Kane et al., 1996 ). In addition, A. thaliana has been crossed with the related A. lyrata ssp. petraea, and the resulting amphidiploid shows some promise as a tool for understanding the mechanisms and immediate results of amphidiploidy (Nasrallah et al., 2000 ).

Phylogenetic studies (e.g., Koch et al., 2001 ) have shown that Arabidopsis falls into the same major clade as shepherd's purse, Capsella, a much closer relationship than previously thought (although the relationship was suggested by Brummitt, 1992 ). Consistent with this close phylogenetic relationship, Rossberg et al. (2001) found that a 27-kb region of the Capsella rubella genome is perfectly colinear with a 31.5-kb region of the Arabidopsis thaliana genome. The region includes five genes in the same orientation in both species.

Brassica
The genome of Brassica oleracea is extensively rearranged relative to that of Arabidopsis, complicating efforts to compare the order of genes and infer ancient patterns of genome duplication (Lukens et al., 2003 , and references therein). Even diploid brassicas appear to be duplicated and/or even triplicated, although the evidence for the latter is not clear.

Neo-polyploidy among species of Brassica is well documented, and most introductory students learn about the triangle of U (1935) . Mapping studies in Brassica napus, a naturally occurring amphidiploid of B. oleracea x B. rapa, have indicated that the genome of the polyploid is colinear with that of the ancestral diploids; no evidence was found for genome rearrangement (Parkin and Lydiate, 1997 ). A similar result was found for Brassica juncea, the amphiploid product of B. rapa and B. nigra (Axelsson et al., 2000 ). These results indicate that polyploidy does not necessarily have a destabilizing effect on genome structure in all plant species.

Polyploidy in Rosaceae (Maloideae)
The rose family is divided into four subfamilies, distinguished by their floral morphology and chromosome number. Maloideae include such economically important species as pears and apples, as well as many other less familiar fruits and ornamentals (e.g., shadbushes, hawthorns). Most species of Maloideae have a base chromosome number of 17, which Sax (1931, 1932, 1933) suggested reflects an ancient polyploid origin from a cross between a member of subfamily Amygdaloideae (cherries and apricots; x = 8) and a member of subfamily Spiraeoideae (bridalwreath; most with x = 9). The argument was principally arithmetic: 8 + 9 = 17. Molecular phylogenetic studies using maternally inherited plastid genes showed that Amygdaloideae are not sister to Maloideae but could not rule out the possibility of wide hybridization.

Using DNA sequences of a low copy nuclear gene (granule bound starch synthase I, GBSSI) Evans and Campbell (2002) have now shown convincingly that Maloideae originated from an ancestor with x = 9. They found two copies of GBSSI in most Rosaceae and four in Maloideae, consistent with the allopolyploid hypothesis. However, sequences from the diploid genus Gillenia (x = 9) were sister to the GBSSI clades of Maloideae, indicating that the ancestor of the maloids was probably spiraeoid. From the phylogenetic data, plus additional morphological similarities among the early-divergent maloids, Evans and Campbell concluded that the ancestral maloid must have had x = 18. Chromosome number would have reduced from x = 18 to x = 17 via dysploidy, with two chromosomes fusing to become one. As genome maps become available for more Rosaceae, it will be interesting to see if