Systematics and Phytogeography
²Evolutionary Functional Genomics, Department of Evolution, Genomics and Systematics, Uppsala University, Norbyvägen 18 D, SE-75236 Uppsala, Sweden; ³Department of Cell and Organism Biology, Lund University, Solvegatan 29, SE-22362 Lund, Sweden; and ⁴Department of Systematic Botany, University of Osnabrück, Barbarastrasse 11, D-49069 Osnabrück, Germany
Received for publication December 30, 2005. Accepted for publication September 21, 2006.
ABSTRACT
Polyploidization, often accompanied by hybridization, has been of major importance in flowering plant evolution. Here we investigate the importance of these processes for the evolution of the tetraploid crucifer Capsella bursa-pastoris using DNA sequences from two chloroplast loci as well as from three nuclear low-copy genes. The near-absence of variation at the C. bursa-pastoris chloroplast markers suggests a single and recent origin of the tetraploid. However, despite supporting a single phylogeny, chloroplast data indicate that neither of the extant Capsella diploids is the maternal parent of the tetraploid. Combined with data from the three nuclear loci, our results do not lend support to previous hypotheses on the origin of C. bursa-pastoris as an allopolyploid between the diploids C. grandiflora and C. rubella or an autopolyploid of C. grandiflora. Nevertheless, for each locus, some of the C. bursa-pastoris accessions harbored C. rubella alleles, indicating that C. rubella contributed to the gene pool of C. bursa-pastoris, either through allopolyploid speciation or, more likely, through hybridization and introgression. To our knowledge, this study is the first of a wild, nonmodel plant genus that uses a combination of chloroplast and multiple low-copy nuclear loci for phylogenetic inference of polyploid evolution.
Key Words: Adh; Brassicaceae; Capsella; chloroplast DNA sequences; introgression; low-copy nuclear genes; LUMINIDEPENDENS; PISTILLATA
Polyploidy, or the duplication of entire genomes, has had a profound influence on the evolution of flowering plants. It has been estimated that a large proportion of extant plant taxa have undergone polyploidization in the past (Grant, 1963; Stebbins, 1971; Masterson, 1994), and even species with small genomes, such as Arabidopsis thaliana, have been shown to be ancient polyploids (Arabidopsis Genome Initiative, 2000; Simillion et al., 2002). Because phylogenetic information is fundamental to comparative studies of gene and genome evolution, the recent upsurge of interest in the genetic, genomic, and epigenetic consequences of polyploidy has led to renewed efforts to study the evolutionary relationships between polyploid and diploid taxa (reviewed by Adams and Wendel, 2005).
Polyploid species have traditionally been classified by their mode of formation into autopolyploids and allopolyploids. According to this definition, autopolyploids originate within a single species, whereas allopolyploids arise by hybridization between distinct species (e.g., Ramsey and Schemske, 1998; Soltis et al., 2003). Even though various refinements and revisions of this classification scheme have been proposed, the basic distinction between auto- and allopolyploids has remained central to the study of polyploid evolution (Grant, 1981; Soltis et al., 2003). More recently, molecular genetic markers have allowed unprecedented insight into the origin of polyploid species. For example, studies in plants have shown that polyploids frequently have multiple origins and that autopolyploidy is more common than previously believed (for review, see Soltis and Soltis, 1993). An emerging insight from molecular data is that polyploid formation is a complex process, often characterized by extensive reticulation and lineage sorting, that may not be fully captured by the standard concepts of auto- vs. allopolyploidy (Wolfe et al., 2003; Soltis et al., 2003).
The family Brassicaceae comprises some 340 genera and 3350 species, of which 50% or more are believed to be polyploids (Koch et al., 2003). With the ready availability of genomic data from Arabidopsis thaliana and Brassica crop species, the Brassicaceae provide several attractive opportunities to study polyploid evolution (e.g., Jakobsson et al., 2006). One of the genera most closely related to Arabidopsis is Capsella (Koch et al., 2000, 2001), which, according to current classification (Hurka and Neuffer, 1997), includes three species: the tetraploid (2n = 4x = 32) Capsella bursa-pastoris (L.) Medik. and two diploids (2n = 2x = 16), Capsella rubella Reut. and Capsella grandiflora (Fauché & Chaub.) Boiss. Capsella bursa-pastoris, or shepherd's purse, is a predominantly selfing species with disomic inheritance and a nearly worldwide distribution (Hurka and Neuffer, 1997). The self-fertile C. rubella is mainly found in central-southern Europe, whereas the self-incompatible C. grandiflora has a distribution limited to the western Balkans (Chater, 1993; Hurka and Neuffer, 1997). Because of its phylogenetic proximity to A. thaliana, the diploid C. rubella has recently attracted attention as a model system for comparative studies of the evolution of genome organization in crucifers (Acarkan et al., 2000; Rossberg et al., 2001; Boivin et al., 2004).
Despite the apparent taxonomic simplicity of the genus, previous investigations into the phylogenetic relationships among the three Capsella species have failed to provide conclusive results. Based on isozyme electrophoresis (Hurka et al., 1989) and isoelectric focusing of Rubisco (Mummenhoff and Hurka, 1990), the tetraploid C. bursa-pastoris was hypothesized to be an allopolyploid between C. rubella and C. grandiflora. In both of these studies, C. bursa-pastoris shared some alleles with C. rubella and C. grandiflora but also had some unique alleles. Later, with the inclusion of data on restriction site variation in the chloroplast genome, C. bursa-pastoris was instead hypothesized to be an ancient autopolyploid of C. grandiflora (Hurka and Neuffer, 1997). However, neither the nuclear enzyme loci nor the chloroplast restriction patterns have been fully informative, and so far no attempt has been made to resolve the issue using DNA sequence data.
Molecular phylogenetic studies of plants, including those of polyploid species, have mostly relied on a combination of sequence data from the chloroplast genome and from the internal transcribed spacer (ITS) regions of nuclear ribosomal DNA. However, because nuclear rRNA genes are present in multiple copies, the use of ITS data for phylogenetic inference has been questioned, particularly in hybrid or polyploid species (Alvarez and Wendel, 2003; Small et al., 2004). Furthermore, in polyploid species, the contribution of one parental species can be erased by a variety of processes such as concerted evolution, homoeologous recombination, genome rearrangement, and gene loss (Wendel, 2000). For these reasons, phylogenetic studies of hybrid taxa should preferably use DNA sequence information from cytoplasmic genomes in combination with multiple unlinked, single-copy nuclear loci (Linder and Rieseberg, 2004). For polyploid taxa, data of this type have so far been available only for a small number of well-characterized species, such as cotton (Cronn and Wendel, 2003), soybean (Doyle et al., 2003), rice (Gaut and Doebley, 1997), and Arabidopsis (Jakobsson et al., 2006).
Here we use DNA sequence data obtained from multiple chloroplast and single-copy nuclear loci in an attempt to clarify relationships between diploid and polyploid species within the genus Capsella. As far as we are aware, our study is the first of its kind in a nondomesticated, nonmodel plant system. Combined with large-scale sequencing efforts of the C. rubella genome that are currently under way, detailed information about patterns of genome evolution in Capsella will make C. bursa-pastoris an attractive model species for studies of evolution and adaptation in polyploids.
MATERIALS AND METHODS
Plant material, DNA extraction, and ploidy analysis
We obtained seeds from a total of 20 accessions of C. bursa-pastoris, nine accessions of C. rubella, and two accessions of C. grandiflora, as well as one accession of Neslia paniculata (L.) Desv. The accession designations and geographical origin or source of Capsella and Neslia seed material are given in the Appendix. Seeds were germinated in petri dishes, and seedlings were transplanted into pots. DNA was extracted from fresh or frozen leaf tissue from a single individual per accession. Leaf tissue was ground to a fine powder in liquid nitrogen, and DNA was extracted using the DNeasy Plant Mini Kit (QIAGEN, Valencia, California, USA). In addition, we obtained DNA extracted from herbarium specimens of Halimolobos berlandieri (Fourn.) O. E. Schulz and H. virgata (Nutt. ex Torrey & A. Gray) O. E. Schulz. We chose N. paniculata and Halimolobos spp. as outgroups because previous studies have shown them to be phylogenetically close to Capsella (Koch et al., 1999; Zunk et al., 1999; Bailey et al., 2002). Because C. rubella and C. bursa-pastoris are often difficult to distinguish morphologically, we sampled seeds of these species from natural populations instead of using herbarium specimens. Flow cytometry analysis of fresh leaf tissue, carried out by Plant Cytometry Services (Schijndel, The Netherlands), was used to confirm that all C. bursa-pastoris accessions were tetraploid and all C. rubella and C. grandiflora accessions were diploid. The 1C nuclear DNA content of C. rubella and C. bursa-pastoris has previously been estimated as 0.257 pg and 0.414 pg, respectively (Johnston et al., 2005). In this study, however, we obtained only relative genome size estimates, which were sufficient to reliably classify individuals as diploid, triploid, or tetraploid (Appendix).
Loci included in the study
To fully elucidate the evolutionary origin of the tetraploid C. bursa-pastoris, we included both maternally inherited chloroplast loci and biparentally inherited nuclear loci. From the chloroplast genome, we amplified a region situated between the tRNA-encoding genes trnD and trnT and another region situated between the trnS and trnfM genes. To minimize the risk of paralogy problems, we used nuclear markers that are unlinked and single-copy in A. thaliana. Such genes are likely to be single-copy in diploid Capsella as well, because comparative mapping studies of A. thaliana and C. rubella have indicated a high degree of microsynteny and highly similar gene content between these species (Acarkan et al., 2000; Boivin et al., 2004). The nuclear loci included partial gene sequences of Alcohol dehydrogenase (Adh), PISTILLATA (PI), and LUMINIDEPENDENS (LD). Other markers were initially explored, e.g., the genes FCA and chalcone synthase (Chs), but work on these genes was not pursued further because they contained less phylogenetic information at this taxonomic level (T. Slotte and A. Ceplitis, unpublished data).
Chloroplast loci were amplified and sequenced in all Capsella samples as well as in the N. paniculata and Halimolobos spp. outgroup taxa. Nuclear loci were amplified and sequenced in all diploid samples and in four accessions (721, 740, PL, and SE14) of the tetraploid C. bursa-pastoris.
Polymerase chain reaction amplification and cloning
Chloroplast loci were PCR-amplified using primer pairs trnD-trnT and trnS-trnFM, situated in the tRNA-encoding genes trnD, trnT, trnS, and trnfM, respectively (Demesure et al., 1995). Amplification reactions contained 1× PCR buffer (MBI Fermentas, Hanover, Maryland, USA), 2 mmol/L MgCl2, 100 µmol/L dNTP, 0.2 µmol/L of each primer, and 0.01 unit/µL Taq polymerase (MBI Fermentas). The cycling scheme was the same as that of Demesure et al. (1995), except that the annealing temperature was 52°C. Primers Adh259F (5'-tggaggttgctccaccgcagaaacacgaag-3') and Adh1043R (5'-tgcagcggctaaaccaacagcacctagtcc-3'), in exons 2 and 4 of Adh, respectively, were designed to amplify approximately 700 bp of the Adh gene (Fig. 1). All PCR amplifications of Adh contained 1× BD Advantage 2 PCR buffer, 0.2 mmol/L dNTP, 0.2 µmol/L of each primer, and 1× BD Advantage 2 polymerase mix (Clontech, BD Biosciences, Mountain View, California, USA). The amplification protocol consisted of 1 min at 95°C, 30 cycles of 30 s at 95°C and 1 min at 68°C, followed by 1 min at 68°C and a final 10 min at 70°C.
600-bp region spanning from exon 4 to exon 7 of the LD gene of A. thaliana. Amplification reactions contained 1× PCR buffer (MBI Fermentas), 2 mmol/L MgCl2, 0.2 mmol/L dNTP, 0.3 µmol/L of each primer, and 0.02 unit/µL Taq polymerase (MBI Fermentas). The reaction scheme was 2 min at 94°C; 35 cycles of 30 s at 94°C, 30 s at 47°C, and 1 min at 72°C; and finally 8 min at 72°C. Approximately 1000 bp of the first PI intron was amplified using primers PI-ITF and PI-ITR and the PCR protocol developed by Lee et al. (2002).
DNA sequencing and analysis
Purified PCR products from the chloroplast genome and from the nuclear loci were sequenced directly. For each accession, both PCR replicates were sequenced in both directions, using the amplification primers as sequencing primers. Additional internal sequencing primers for the PI first intron were designed to obtain complete sequences of all accessions (Fig. 1). Vector-specific primers OMNI (5'-acaggaaacagctatgaccatgat-3') and UNI (5'-cgacgttgtaaaacgaggccagt-3') were used to sequence cloned PCR products of nuclear loci. Sequencing reactions were either set up using the BigDye Terminator Sequencing Ready Reaction Kit, versions 2.0 and 3.1 (Applied Biosystems, Foster City, California, USA) and run on an ABI 3100 Genetic Analyzer (Applied Biosystems), or set up with the DYEnamic ET Dye Terminator Kit (GE Healthcare Bio-Sciences, Piscataway, New Jersey, USA) and run on a MegaBACE 1000 capillary sequencer (GE Healthcare Bio-Sciences).
Sequence reads were assembled into contigs and edited in Sequencher version 4.0.5 (Gene Codes, Ann Arbor, Michigan, USA). The resulting sequences were aligned with T-Coffee version 1.37 (Notredame et al., 2000) using default options for DNA sequence data. Nucleotides at segregating sites were rechecked against the chromatograms. Because the chloroplast genome is nonrecombining and therefore has a single evolutionary history, alignments of sequences from the two chloroplast loci were merged and treated as one data set in the phylogenetic analyses. Sequences from each nuclear gene were aligned and analyzed separately. GenBank accession numbers of chloroplast sequences and partial Adh, LD, and PI sequences are listed in the Appendix, and all alignments have been deposited in the GenBank PopSet database.
Identification of homoeologous nuclear loci
For each of the three nuclear loci, direct sequencing of amplification products from tetraploid individuals resulted in chromatograms with overlapping peaks, indicating that more than one type of sequence had been amplified and that these sequences differed by at least one indel. To separate these sequence types, amplification products from tetraploids were cloned as described earlier. At least six (and in most cases eight or more) clones per accession and nuclear locus were sequenced. For each tetraploid individual, clone sequences from each nuclear locus were aligned. Because PCR-mediated recombination is a potential problem when cloning heterogeneous amplification products (Cronn et al., 2002), and because recombination violates basic assumptions of tree reconstruction methods, we did not reconstruct trees from clone sequences. Instead, we constructed median networks (Bandelt et al., 1995) using Spectronet version 1.2 (Huber et al., 2002). Inspection of the median networks indicated that each tetraploid individual harbored two sequence types; however, we also identified putative PCR recombinant sequences (seven in total among more than 110 clone sequences) (Appendix S1, see Supplemental Data accompanying the online version of this article).
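The clone numbers given above can be motivated with a back-of-the-envelope calculation. As a minimal sketch, assume that each clone independently carries homoeologue A with probability `p_a` (0.5 under equal amplification, an idealization, since PCR and cloning bias are common). The chance that n clones miss one type entirely is then p_a^n + (1 - p_a)^n. The function name and the 80:20 bias scenario are hypothetical illustrations, not part of the study.

```python
def prob_both_types(n_clones: int, p_a: float = 0.5) -> float:
    """Probability that n_clones independently sampled clones include at
    least one clone of each of two sequence types (A and B).

    p_a is the assumed probability that any single clone is type A;
    0.5 corresponds to equal amplification of both homoeologues.
    """
    if n_clones < 2:
        return 0.0  # a single clone can never show both types
    only_a = p_a ** n_clones
    only_b = (1.0 - p_a) ** n_clones
    return 1.0 - only_a - only_b


# Minimum and typical clone numbers mentioned in the text.
for n in (6, 8):
    print(n, prob_both_types(n))
# A hypothetical 80:20 amplification bias toward type A.
print(6, prob_both_types(6, p_a=0.8))
```

Under equal amplification, six clones recover both types with probability of about 0.97 and eight with about 0.99; a strong hypothetical 80:20 bias would lower the six-clone figure to about 0.74.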
To determine whether the recombinant sequences were artifacts or the result of biologically real recombination, we designed primers to specifically amplify each of the two main sequence types at each nuclear locus. Each of these primer pairs combined one nonselective primer with a selective primer whose 3' end lay in a region that differed between the two main sequence types found in the tetraploid. We did not detect any signs of recombination when amplifying and directly sequencing products from tetraploids with these primer pairs. Thus, the recombinant clone sequences appear to be PCR-mediated artifacts.
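The selective-primer logic just described can be checked in silico. The sketch below is an illustration under stated assumptions, not the authors' design procedure. It handles only forward primers written on the same strand as the templates, it supports IUPAC degenerate bases (one of the Adh primers contains "w"), and it ignores melting temperature and secondary structure. The helper names and the toy templates are hypothetical. The LD primer sequences are taken from the text and differ only at their 3'-terminal base. A 3'-terminal mismatch is generally the most effective way to block extension, which is the principle behind allele-specific PCR.

```python
# IUPAC nucleotide codes used in degenerate primers (e.g., "w" = a or t).
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at",
    "k": "gt", "m": "ac", "b": "cgt", "d": "agt",
    "h": "act", "v": "acg", "n": "acgt",
}


def base_matches(primer_base: str, template_base: str) -> bool:
    """True if a (possibly degenerate) primer base pairs with the template base."""
    return template_base.lower() in IUPAC[primer_base.lower()]


def is_selective(primer: str, target: str, other: str, start: int) -> bool:
    """Check a forward primer (5'->3', same strand as the templates, placed at
    index `start`) intended to prime `target` but not `other`: it must match
    `target` fully and mismatch `other` at its 3'-terminal base."""
    end = start + len(primer)
    if end > min(len(target), len(other)):
        return False
    window = target[start:end]
    full_match = all(base_matches(p, t) for p, t in zip(primer, window))
    three_prime_mismatch = not base_matches(primer[-1], other[end - 1])
    return full_match and three_prime_mismatch


# LD selective primers from the text.
ld_c = "taggccaggtgaaactaatggac"
ld_g = "taggccaggtgaaactaatggag"
# Hypothetical toy templates that differ only at the primers' 3'-end position.
type_c = "tt" + ld_c + "ttaacc"
type_g = "tt" + ld_g + "ttaacc"

print(is_selective(ld_c, type_c, type_g, start=2))  # True: primes type_c only
print(is_selective(ld_g, type_g, type_c, start=2))  # True: primes type_g only
```

Reverse selective primers, such as the PI-LS2R primers, would need to be compared against the reverse complement of the templates, which this sketch does not handle.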
For Adh, the selective primers were Adh-LS2F-C (5'-cactgatttactcwcawcactc-3') and Adh-LS2F-G (5'-cttcactgatttactcataatcaag-3'), and the nonselective primer was Adh-LS2R (5'-ccagtggataaaccacaact-3'). Primer locations are shown in Fig. 1. Amplification reactions using these primers contained 1× Buffer Gold (Applied Biosystems, Foster City, California, USA), 2 mmol/L MgCl2, 0.2 mmol/L dNTPs, 0.3 µmol/L of each primer, and 0.0125 unit/µL AmpliTaq Gold (Applied Biosystems). The reaction scheme had an initial 8 min at 95°C, followed by 30 cycles of 30 s at 94°C, 45 s at 59.1°C, and 1 min at 72°C, with a 10-min final extension at 72°C. For LD, the selective primers were CbpLD-0001F-C (5'-taggccaggtgaaactaatggac-3') and CbpLD-0001F-G (5'-taggccaggtgaaactaatggag-3'), and primer LD-XC4R was used as the nonselective primer (Fig. 1). For PI, the selective primers were PI-LS2R-G (5'-aagccacctatagtgtaaaatcag-3') and PI-LS2R-C (5'-aagccacctataatgtaaaatcac-3'), and the nonselective primer was PI-LS2-F (5'-aaaagacccctgaatctctaacc-3') (Fig. 1). The reaction mix and cycling scheme for amplification of partial regions of both LD and PI using these primer pairs were identical to the Adh protocol specified earlier, except that a touch-down protocol was employed. The touch-down protocol started with an annealing temperature of 59°C, which was decreased by 0.7°C per cycle for the first 12 cycles, followed by 23 cycles at a fixed annealing temperature of 50°C.
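The touch-down ramp just described can be written out cycle by cycle. This small sketch assumes that the first cycle anneals at 59°C and that each of the 12 touch-down cycles is 0.7°C cooler than the previous one. Under that reading, the last ramp cycle runs at 51.3°C before the 23 cycles at 50°C. If the decrement were instead applied before the first cycle, the ramp would end at 50.6°C. The function name is hypothetical.

```python
def touchdown_schedule(start: float = 59.0, step: float = 0.7,
                       td_cycles: int = 12, hold_temp: float = 50.0,
                       hold_cycles: int = 23) -> list[float]:
    """Annealing temperature (deg C) for every cycle of a touch-down PCR."""
    # Assumption: cycle 1 anneals at `start`; each later touch-down cycle
    # is `step` degrees cooler than the one before.
    ramp = [round(start - step * i, 1) for i in range(td_cycles)]
    return ramp + [hold_temp] * hold_cycles


schedule = touchdown_schedule()
print(len(schedule))                        # total number of cycles
print(schedule[:3], schedule[11], schedule[12])
```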
To determine whether the two sequence types corresponded to loci duplicated by polyploidy (homoeologues) or were allelic, amplification and direct sequencing with the primer pairs described above were carried out on DNA from the same C. bursa-pastoris individuals used previously (PL, SE14, 721, and 740), as well as on 34 offspring derived by self-fertilization of each of these individuals. If the two sequence types of a given gene were allelic, the selfed offspring of a mother plant carrying both types would segregate for them. No segregation was observed: for each gene, both sequence types were found in all offspring, indicating that the two sequence types correspond not to alleles but to duplicated loci. Because no signs of duplication were found in the diploids, it is reasonable to assume that these sequences correspond to the two loci duplicated by tetraploidy, i.e., that they are homoeologous.
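The segregation argument above can be quantified. Suppose the two sequence types were alleles at one disomically inherited locus. A heterozygous mother would then produce A/A, A/B, and B/B selfed offspring in a 1:2:1 ratio, so each offspring shows both types with probability 1/2. The sketch below assumes Mendelian segregation, no segregation distortion, and independent offspring. It takes n = 34 as the number of offspring scored. Because mothers are independent, splitting these offspring among the four mothers gives the same product. If 34 were scored per mother, the probability would be smaller still. The function name is hypothetical.

```python
from fractions import Fraction


def prob_no_segregation(n_offspring: int) -> Fraction:
    """Probability that every selfed offspring of a heterozygous (A/B) mother
    carries both sequence types if A and B were alleles at a single locus.
    Each offspring is A/B with probability 1/2 (the rest are A/A or B/B)."""
    return Fraction(1, 2) ** n_offspring


p = prob_no_segregation(34)
print(p, float(p))
```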
Each clone sequence was subsequently assigned to one of the two putative homoeologues, designated A and B, on the basis of the majority of informative polymorphisms (that is, ignoring singletons). Artifactual recombinant sequences were excluded, and consensus sequences were constructed for each individual and locus to avoid including sequencing errors. These consensus sequences were used in subsequent phylogenetic analyses. Note that, for each nuclear gene, the designations A and B for the two sequence types are arbitrary. Thus, the A- or B-homoeologue sequences for different genes may or may not come from the same genome. This, however, does not affect the subsequent phylogenetic analyses.
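The assignment and consensus steps just described can be sketched as follows. The study identified the two types and the recombinants from median networks. Here, the two reference sequences (`type_a`, `type_b`) are hypothetical inputs, for example products of direct sequencing with the selective primers. The sketch does not detect recombinants beyond flagging ties. All names and toy sequences are illustrative assumptions.

```python
from collections import Counter

MISSING = set("-N?")  # gap and ambiguity symbols ignored when counting states


def informative_sites(seqs: list[str]) -> list[int]:
    """Columns with at least two states that each occur in at least two
    sequences, i.e., polymorphisms that are not singletons."""
    sites = []
    for i, column in enumerate(zip(*seqs)):
        counts = Counter(base for base in column if base not in MISSING)
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            sites.append(i)
    return sites


def assign_clones(clones, type_a, type_b):
    """Assign each clone to homoeologue A or B by majority vote over the
    informative sites at which the two reference types differ."""
    sites = [i for i in informative_sites(clones) if type_a[i] != type_b[i]]
    groups = {"A": [], "B": [], "tie": []}
    for clone in clones:
        a_hits = sum(clone[i] == type_a[i] for i in sites)
        b_hits = sum(clone[i] == type_b[i] for i in sites)
        if a_hits > b_hits:
            groups["A"].append(clone)
        elif b_hits > a_hits:
            groups["B"].append(clone)
        else:
            groups["tie"].append(clone)
    return groups


def consensus(seqs):
    """Majority base per column, which outvotes singleton sequencing errors."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


# Hypothetical toy clones (not data from the study).
type_a = "ACGTACGTAC"
type_b = "ACTTACGAAG"
clones = [
    "ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC",  # A-like; third has a singleton error
    "ACTTACGAAG", "ACTTACGAAG", "ACTTACGAAC",  # B-like; last matches A at one site
]
groups = assign_clones(clones, type_a, type_b)
print({k: len(v) for k, v in groups.items()})
print(consensus(groups["A"]), consensus(groups["B"]))
```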
Analyses of phylogenetic relationships, polymorphism, and divergence
Phylogenetic analyses were performed in PAUP* version 4b10 (Swofford, 2002). For all data sets, both parsimony and likelihood were used as optimality criteria, and gaps were treated as missing data (no gap coding was performed). Parsimony analyses were unweighted, and branch-and-bound searches with furthest sequence addition were used to find the most parsimonious trees. Maximum likelihood (ML) reconstructions used the substitution model with the best fit in a hierarchical likelihood ratio test, as implemented in Modeltest version 3.6 (Posada and Crandall, 1998). Maximum likelihood estimates of base frequencies and substitution rates were obtained from Modeltest and used in the subsequent likelihood analyses. Initial trees were generated by random stepwise sequence addition. Heuristic searches with tree bisection and reconnection (TBR) branch swapping and 10 replicate searches were used to find the trees with the highest likelihood. For all phylogenetic analyses, support was evaluated with 1000 bootstrap replicates, and trees were rooted with outgroup sequences.
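A hierarchical likelihood ratio test compares nested substitution models pairwise. The statistic 2(lnL_complex - lnL_simple) is referred to a chi-square distribution whose degrees of freedom equal the number of extra free parameters. The minimal sketch below shows one such step, F81 vs. HKY85, where HKY85 adds a transition/transversion ratio (df = 1). It uses only the standard library. The log-likelihood values are hypothetical; in practice they would come from PAUP* scores. A plain chi-square reference is only approximate when the extra parameter lies on a boundary, such as a proportion of invariable sites.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution, computed from the
    series expansion of the regularized lower incomplete gamma function."""
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def likelihood_ratio_test(lnl_simple: float, lnl_complex: float, df: int):
    """LRT between nested models: 2 * (lnL_complex - lnL_simple) ~ chi2(df)."""
    stat = 2.0 * (lnl_complex - lnl_simple)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods for one step of the hierarchy:
# F81 (equal transition/transversion rates) vs. HKY85 (adds kappa, df = 1).
stat, p = likelihood_ratio_test(lnl_simple=-2000.0, lnl_complex=-1998.5, df=1)
print(stat, round(p, 4))  # p > 0.05, so the simpler F81 model would be retained
```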
Analyses of polymorphism and divergence were carried out in DnaSP version 4.0 (Rozas et al., 2003). Sites containing gaps were excluded from the estimation of nucleotide diversity and divergence. Nucleotide diversity (π) was estimated according to Nei (1987) without Jukes-Cantor correction. Nucleotide divergence was calculated as the mean proportion of nucleotide differences between populations or species (Nei, 1987), also without Jukes-Cantor correction. Synonymous and nonsynonymous divergence was estimated according to Nei and Gojobori (1986). To test for differences in evolutionary rates between the diploids and the tetraploid, Tajima's relative rate tests (Tajima, 1993), using N. paniculata as the outgroup, were carried out in MEGA version 3.1 (Kumar et al., 2004).
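The quantities just listed can be sketched in a few lines. π is the mean uncorrected pairwise distance within a sample, and divergence is the mean distance over all between-group pairs. Tajima's 1D relative rate test compares the number of sites at which only one of two ingroup sequences differs from the outgroup. This is an illustration, not a reimplementation of DnaSP or MEGA. It assumes complete deletion of any column containing a gap or ambiguity, and it does not cover the Nei-Gojobori synonymous/nonsynonymous method. The toy sequences are hypothetical.

```python
import math
from itertools import combinations, product

EXCLUDE = set("-?N")  # columns containing any of these are dropped


def drop_gapped_columns(seqs):
    """Remove every alignment column that contains a gap or ambiguity."""
    keep = [i for i, col in enumerate(zip(*seqs)) if not EXCLUDE & set(col)]
    return ["".join(s[i] for i in keep) for s in seqs]


def p_distance(x, y):
    """Uncorrected proportion of differing sites (no Jukes-Cantor correction)."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def nucleotide_diversity(seqs):
    """Nei's pi: mean pairwise p-distance over all pairs within a sample."""
    seqs = drop_gapped_columns(seqs)
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(x, y) for x, y in pairs) / len(pairs)


def dxy(pop1, pop2):
    """Mean p-distance over all between-group pairs (Nei's Dxy)."""
    cleaned = drop_gapped_columns(pop1 + pop2)
    g1, g2 = cleaned[:len(pop1)], cleaned[len(pop1):]
    return sum(p_distance(x, y) for x, y in product(g1, g2)) / (len(g1) * len(g2))


def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima's (1993) 1D relative rate test with an outgroup.

    m1: sites where only seq1 differs (seq2 equals the outgroup);
    m2: sites where only seq2 differs (seq1 equals the outgroup).
    Under equal rates, (m1 - m2)^2 / (m1 + m2) is ~chi-square with 1 df.
    """
    seq1, seq2, outgroup = drop_gapped_columns([seq1, seq2, outgroup])
    m1 = sum(a != b and b == c for a, b, c in zip(seq1, seq2, outgroup))
    m2 = sum(a != b and a == c for a, b, c in zip(seq1, seq2, outgroup))
    if m1 + m2 == 0:
        return 0.0, 1.0
    stat = (m1 - m2) ** 2 / (m1 + m2)
    return stat, math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail


print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))  # 1/3
print(dxy(["AAAA"], ["AATT", "AAAT"]))                  # 0.375
print(tajima_relative_rate("CAAAAA", "AGGGAA", "AAAAAA"))
```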
RESULTS
Chloroplast DNA divergence and polymorphism
Divergence between Capsella species at the trnD-trnT and trnS-trnfM regions is summarized in Table 1. All fixed differences were in noncoding regions. Overall, 10 substitutions (four transitions and six transversions) separated C. bursa-pastoris from C. rubella, whereas a total of 11 fixed differences (four transitions and seven transversions) separated C. bursa-pastoris from C. grandiflora. Five nucleotide substitutions (two transitions and three transversions) were found between C. rubella and C. grandiflora.
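The fixed differences above are classified as transitions (purine to purine or pyrimidine to pyrimidine) or transversions (purine to pyrimidine). A minimal sketch of that tally, assuming a "fixed" difference is a site that is monomorphic within each species but differs between them, and that gapped or ambiguous columns are skipped. The toy alignment is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
EXCLUDE = set("-N?")


def substitution_type(x: str, y: str) -> str:
    """Classify a single-base difference as a transition or a transversion."""
    pair = {x.upper(), y.upper()}
    if len(pair) != 2:
        raise ValueError("bases are identical")
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"    # A<->G or C<->T
    return "transversion"      # purine <-> pyrimidine


def fixed_differences(group1, group2):
    """Count sites monomorphic within each group but different between them,
    split into transitions and transversions."""
    counts = {"transition": 0, "transversion": 0}
    for col1, col2 in zip(zip(*group1), zip(*group2)):
        states1, states2 = set(col1), set(col2)
        if (states1 | states2) & EXCLUDE:
            continue  # skip gapped or ambiguous columns
        if len(states1) == 1 and len(states2) == 1 and states1 != states2:
            counts[substitution_type(col1[0], col2[0])] += 1
    return counts


# Hypothetical toy alignment: one A/G transition and one G/T transversion are
# fixed between groups; the C/T site is polymorphic in group 2, so not fixed.
species_1 = ["AACGT", "AACGT"]
species_2 = ["GACTT", "GATTT"]
print(fixed_differences(species_1, species_2))
```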
= 0.00015). There were no indel polymorphisms in C. rubella, and all sequences from C. grandiflora were identical to one another. There were no shared polymorphisms between the different Capsella species.
Phylogenetic analysis of chloroplast DNA data
Phylogenetic analyses of the cpDNA data set resulted in trees with identical topologies under the parsimony and likelihood optimality criteria. In these trees, C. bursa-pastoris was sister to a clade consisting of C. rubella and C. grandiflora (Fig. 2). Relationships within the ingroup were highly supported both in the parsimony analysis and in the likelihood analysis under the F81 model (Felsenstein, 1981), which had the best fit in the hierarchical likelihood ratio test. The cpDNA data set, which had 71 variable sites, contained 37 parsimony-informative sites and showed a low level of homoplasy, as indicated by the high consistency and rescaled consistency indices (0.950 and 0.936, respectively) in the parsimony analysis.
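The distinction drawn above between variable and parsimony-informative sites can be made concrete. A site is parsimony-informative when at least two states each occur in two or more sequences. Sites whose variation consists only of singletons cost the same number of steps on every tree, so they cannot discriminate among topologies. This minimal sketch treats gaps as missing data, as in the analyses; the toy alignment is hypothetical.

```python
from collections import Counter

MISSING = set("-N?")  # gaps are treated as missing data


def site_counts(alignment):
    """Return (variable, parsimony_informative) site counts for an alignment."""
    variable = informative = 0
    for column in zip(*alignment):
        counts = Counter(base for base in column if base not in MISSING)
        if len(counts) >= 2:
            variable += 1
            # Informative: at least two states, each seen in >= 2 sequences.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative


# Hypothetical four-taxon alignment: column 1 varies only by a singleton.
toy = ["AAGT", "AAGC", "ACTT", "GCTC"]
print(site_counts(toy))
```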
Phylogenetic analyses of nuclear loci
Phylogenetic analyses of the three nuclear loci resulted in different topologies, although these shared some features. For all three nuclear loci, one of the C. bursa-pastoris accessions possessed a C. rubella allele, which always clustered with a clade of haplotypes found mainly in C. rubella and C. grandiflora (Figs. 3–5). All three loci also contained four C. bursa-pastoris sequences that were highly similar to one another and were tentatively assigned to the same homoeologue, henceforth called the B homoeologue (Figs. 3–5). The position of the remaining C. bursa-pastoris sequences, here tentatively designated A-homoeologue sequences, was unstable and differed both between loci and between phylogenetic reconstruction methods (Figs. 3–5).
For the 576-bp LD data set containing 47 parsimony-informative sites, there were two equally parsimonious trees of length 94 steps, which differed in the branching order of two highly supported clades consisting of the putative A and B homoeologues of C. bursa-pastoris. Both trees contained a clade of C. rubella sequences that included the putative A homoeologue of C. bursa-pastoris accession 721 (Fig. 4). Maximum likelihood analysis used the HKY85 model (Hasegawa et al., 1985) with gamma-distributed among-site rate variation, which had the best fit in the hierarchical likelihood ratio test implemented in Modeltest version 3.6. The topology of the single most likely tree found for this data set was identical to that of the parsimony tree shown in Fig. 4, and levels of support were similar.
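Tree lengths such as the 94 steps reported above are the number of character changes that unweighted parsimony requires on a given topology. A branch-and-bound search repeatedly scores candidate trees. The sketch below shows only that scoring step, using Fitch's small-parsimony algorithm, with missing data allowed to take any base at no cost. It is not PAUP*'s search. The nested-tuple tree format and the four-taxon example are hypothetical.

```python
MISSING = set("-N?")


def fitch(tree, leaf_state):
    """Fitch small-parsimony pass for one site.

    `tree` is a rooted binary tree written as nested 2-tuples of leaf names;
    `leaf_state` maps each leaf to its base. Returns (state set, steps)."""
    if isinstance(tree, str):
        base = leaf_state[tree]
        # Missing data is compatible with any base and costs no steps.
        return (set("ACGT") if base in MISSING else {base}), 0
    left, right = tree
    left_set, left_steps = fitch(left, leaf_state)
    right_set, right_steps = fitch(right, leaf_state)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, alignment):
    """Sum of Fitch steps over all sites; `alignment` maps leaf -> sequence."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {leaf: seq[i] for leaf, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Hypothetical four-taxon example comparing two topologies.
aln = {"a": "AA", "b": "AC", "c": "GC", "d": "GC"}
print(tree_length((("a", "b"), ("c", "d")), aln))  # 2 steps
print(tree_length((("a", "c"), ("b", "d")), aln))  # 3 steps
```

The first topology is shorter, so unweighted parsimony would prefer it.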
All nuclear loci had a similar proportion of parsimony-informative characters. However, levels of homoplasy, as measured by the consistency index (CI) and rescaled consistency index (RC), were higher for the PI first intron than for the other two loci (CI = 0.915 and RC = 0.849 for the PI first intron; CI = 0.958 and RC = 0.924 for Adh; and CI = 0.968 and RC = 0.943 for LD).
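The homoplasy indices quoted above follow from three numbers. These are the minimum possible number of steps m (each site needs at least one change fewer than its number of states), the maximum possible number g (the steps needed on a star tree), and the observed tree length s. From them, CI = m/s, RI = (g - s)/(g - m), and RC = CI × RI. This is a minimal sketch over all sites. PAUP* can also report the indices with uninformative characters excluded, which changes the values. The function names and the toy alignment are hypothetical.

```python
from collections import Counter

MISSING = set("-N?")


def min_max_steps(alignment):
    """Minimum (m) and maximum (g) possible steps summed over all sites.

    m: each site needs at least (number of states - 1) changes.
    g: a star tree needs (n - count of the commonest state) changes.
    """
    m = g = 0
    for column in zip(*alignment):
        counts = Counter(base for base in column if base not in MISSING)
        if counts:
            m += len(counts) - 1
            g += sum(counts.values()) - max(counts.values())
    return m, g


def homoplasy_indices(alignment, tree_length):
    """Ensemble consistency (CI), retention (RI), and rescaled consistency
    (RC = CI * RI) indices for a tree of the given length."""
    m, g = min_max_steps(alignment)
    ci = m / tree_length
    ri = (g - tree_length) / (g - m) if g > m else float("nan")
    return ci, ri, ci * ri


# Hypothetical four-taxon alignment scored on trees of length 2 and 3.
seqs = ["AA", "AC", "GC", "GC"]
print(homoplasy_indices(seqs, 2))
print(homoplasy_indices(seqs, 3))
```

Because RC = CI × RI, the reported cpDNA values (CI = 0.950, RC = 0.936) imply a retention index of about 0.985.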
Divergence at nuclear loci
For each nuclear locus, an allele identical or highly similar to a C. rubella allele was found at one of the two putative homoeologues in one of the C. bursa-pastoris accessions (Figs. 3–5; Appendix S2, see Supplemental Data with the online version of this article). For Adh, accession SE14 from Sweden harbored an allele at one of the homoeologous loci that differed from the allele found in all C. rubella accessions only by a 2-bp insertion in a poly-A region. For LD, an allele identical to that found in C. rubella accessions from Italy and Chile was found in C. bursa-pastoris accession 721 from California, and for the PI first intron, an allele identical to that found in C. rubella accessions from Greece and Turkey was found in C. bursa-pastoris accession 740 from Nevada. In other words, in three separate C. bursa-pastoris accessions, one homoeologue at one locus was highly similar or identical to a C. rubella allele. To rule out contamination as an explanation, the presence of these alleles in C. bursa-pastoris was verified by sequencing and cloning additional PCR replicates from independent DNA extractions. These C. rubella-like sequences were not included in calculations of divergence between the Capsella genomes. Even with these sequences excluded, the A-type homoeologues were more closely related to C. rubella and C. grandiflora than the B-type homoeologues were (Tables 2–4). At all three sequenced nuclear loci, divergence between the homoeologues in the tetraploid was greater than divergence between the two diploids (Tables 2–4).
Chloroplast phylogeny and the maternal origin of C. bursa-pastoris
In contrast to the many polyploid plant species that have apparently arisen on multiple occasions (Soltis and Soltis, 1993, 1995, 2000), the nearly complete lack of intraspecific variation among the chloroplast sequences strongly suggests a single origin of C. bursa-pastoris (Fig. 2). This observation agrees with a previous study that found limited variation at seven chloroplast microsatellite loci in a geographically widespread sample of 59 C. bursa-pastoris accessions (Ceplitis et al., 2005). In that study, the upper bound for the time to the most recent common ancestor (MRCA) was estimated to be 48,000 or 60,000 years, depending on the genealogical model used (constant population size or exponential population growth, respectively). By assuming two extreme genealogical scenarios and a mutation rate of 2.9 × 10⁻⁹ for noncoding cpDNA, as in Säll et al. (2003), we estimate that the 95% upper bound for the time to the MRCA of our sample of C. bursa-pastoris lies between 43,000 years (for a star-shaped genealogy) and 430,000 years (for a genealogy with two deep branches). In other words, for comparable genealogical scenarios, our sequence data and the microsatellite data give very similar estimates of the time to the MRCA of the C. bursa-pastoris chloroplast genome. Of course, it cannot be excluded that a postpolyploidization bottleneck has reduced variation in the C. bursa-pastoris genome and that the origin of the tetraploid lies farther back in time. Nevertheless, in a study of the tetraploid Arabidopsis suecica, Jakobsson et al. (2006) concluded that A. suecica, which like C. bursa-pastoris is a highly selfing species but has a more limited distribution (Säll et al., 2004), had a unique and recent origin between 12,000 and 300,000 years ago. It is thus possible that both C. bursa-pastoris and A. suecica arose after the last Pleistocene glaciation. A similar scenario was proposed for the polyploid Draba ladina (Brassicaceae; Widmer and Baltisberger, 1999). Perhaps the appearance of polyploid lineages provided an evolutionary impetus in different plant taxa that facilitated their spread in the early postglacial environment.

Even though the available evidence points to a single and recent origin of C. bursa-pastoris, the present analysis of chloroplast DNA sequence data does not unequivocally identify its maternal parent. Had one of the two diploid Capsella species formed a clade together with C. bursa-pastoris in the chloroplast phylogeny, we would have strong evidence for that species as the maternal ancestor of C. bursa-pastoris. Instead, the chloroplast data set resulted in a strongly supported phylogeny in which C. bursa-pastoris is sister to the two diploids. This finding is in line with previous studies of the genus Capsella using chloroplast DNA restriction site data and isoelectric focusing of the chloroplast-encoded Rubisco enzyme, which showed that the diploids have highly similar or identical patterns, whereas the tetraploid has a deviating pattern (Hurka and Neuffer, 1997). Against this background, one would need to invoke additional processes of chloroplast divergence and/or lineage sorting to maintain that either of the two extant diploid Capsella species is the maternal parent of C. bursa-pastoris. An alternative explanation is that neither C. rubella nor C. grandiflora is the maternal parent of the tetraploid, and that the true maternal parent species is extinct or has not been sampled. Which of these hypotheses is correct cannot presently be determined.
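The upper bounds on the time to the MRCA discussed in this section can be illustrated with a simple Poisson argument. Suppose mutations accumulate along the genealogy at rate μ per site per year; the text gives 2.9 × 10⁻⁹ without units, so per site per year is an assumption here. With L sites and k observed mutations, the one-sided 95% upper bound λ_up on the expected number of mutations gives T ≤ λ_up / (μ L B), where B is the number of lineages of length T. A star genealogy of n tips has B = n, and a genealogy with two deep branches has B = 2. The section does not state the alignment length or the mutation count behind the published bounds, so the values below are placeholders. This is one standard way to derive such a bound, not necessarily the authors' exact procedure. Under this sketch, the two bounds differ by a factor of n/2, which is 10 for the 20 accessions. That factor is consistent with the tenfold spread between 43,000 and 430,000 years.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1))


def poisson_upper_bound(k: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) upper bound on a Poisson mean after observing k
    events: the lam at which P(X <= k | lam) = alpha, found by bisection."""
    lo, hi = 0.0, 10.0 + 10.0 * k
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if poisson_cdf(k, mid) > alpha:
            lo = mid  # k events still too likely; the bound lies higher
        else:
            hi = mid
    return hi


def tmrca_upper_bound(k, mu, n_sites, n_branches):
    """Upper bound on T (years) if the genealogy has `n_branches` lineages,
    each of length T, and mutations arise at rate mu per site per year."""
    return poisson_upper_bound(k) / (mu * n_sites * n_branches)


MU = 2.9e-9     # rate quoted in the text, treated here as per site per year
N_SITES = 2000  # hypothetical alignment length, not from the study
K = 0           # hypothetical number of mutations observed in the sample

star = tmrca_upper_bound(K, MU, N_SITES, n_branches=20)  # 20 accessions
two_deep = tmrca_upper_bound(K, MU, N_SITES, n_branches=2)
print(round(star), round(two_deep), two_deep / star)
```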
Nuclear gene phylogenies, polyploidization, and introgression
Ideally, the sequences from the two homoeologous loci of a given gene in a tetraploid would be expected to form a clade with the same diploid species if the tetraploid were an autopolyploid, or with different diploid species if it were an allopolyploid. In the present study, neither of these topologies appeared for any of the three nuclear genes investigated (Figs. 3–5). More specifically, even though the topologies for the three nuclear loci were similar, none of them can readily be reconciled with the hypothesis that C. bursa-pastoris is an allopolyploid between C. rubella and C. grandiflora, nor with an autopolyploid origin of C. bursa-pastoris from C. grandiflora, as previously suggested (Hurka et al., 1989; Mummenhoff and Hurka, 1990; Hurka and Neuffer, 1997). Indeed, both the chloroplast and the nuclear data seem to suggest that neither C. rubella nor C. grandiflora was directly involved in the ancestry of the tetraploid C. bursa-pastoris. This is a peculiar finding, given that these are the only extant diploid species in the genus. Some form of natural selection acting on the nuclear genes might have produced topologies deviating from the expected patterns outlined above. It appears highly unlikely, however, that selection would affect three unlinked genes in a similar way.
In any case, for each nuclear locus, the two putative homoeologues found in C. bursa-pastoris were more divergent from each other than the two diploids were. This could indicate that the tetraploid is of allopolyploid origin, a possibility supported by the disomic inheritance observed in C. bursa-pastoris (Hurka et al., 1989; Mummenhoff and Hurka, 1990; Hurka and Düring, 1994). On the other hand, the degree of divergence between homoeologues is also compatible with C. bursa-pastoris being an autopolyploid that has undergone diploidization (see Wolfe, 2003). If so, C. bursa-pastoris must be relatively ancient, a notion that apparently runs counter to the recent origin indicated by the low level of chloroplast diversity. As noted, however, the tetraploid might well be older than the common ancestor of the present-day chloroplast lineages.
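The comparison made above, divergence between the two homoeologues versus divergence between the two diploids, can be illustrated with a minimal sketch. It assumes the simplest measure, uncorrected pairwise divergence (p-distance) on sequences that are already aligned, which may differ from the measure used for Tables 2–4. The function name `p_distance` and the short sequences are hypothetical toy data, not Capsella sequences from this study.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns containing a gap ('-') or a fully ambiguous base ('N') in
    either sequence are excluded from the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip gapped or ambiguous columns
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Hypothetical toy alignment (NOT data from this study).
homoeologue_a = "ATGCTAGCTAGGCTA"
homoeologue_b = "ATGCTTGCAAGGCTT"
diploid_1 = "ATGCTAGCTAGGCTA"
diploid_2 = "ATGCTAGCTAGGCTT"

between_homoeologues = p_distance(homoeologue_a, homoeologue_b)
between_diploids = p_distance(diploid_1, diploid_2)
print(f"homoeologues: {between_homoeologues:.3f}")
print(f"diploids:     {between_diploids:.3f}")
print("homoeologues more divergent:", between_homoeologues > between_diploids)
```

For more divergent sequences, a distance corrected for multiple substitutions per site (e.g., Jukes-Cantor) would be more appropriate than the raw proportion used in this sketch.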
The interpretation of the phylogenies of the three nuclear genes relies on the assumption that the two divergent classes of sequence found for each nuclear gene in C. bursa-pastoris in fact represent the two homoeologues of the tetraploid. Two alternative possibilities need to be considered. First, the two sequence types may be alleles at a single locus. For all three genes, however, both sequence types were recovered from every offspring of selfed mother plants carrying both types, whereas selfing a plant heterozygous at a single locus would be expected to yield about half of the offspring homozygous for one type or the other. This clearly shows that the two types are not allelic. Moreover, in a cross between accessions 721 and 740, polymorphisms within each sequence type of the LD gene were inherited as alternative alleles, whereas polymorphisms in different sequence types segregated independently, as expected for unlinked loci (A. Ceplitis, unpublished data). In addition, had the sequence types represented different alleles, C. bursa-pastoris would have been highly heterozygous, an improbable condition given the strongly selfing habit of the species (Shull, 1929; Hurka et al., 1989). Second, the two sequence types may derive from separate loci that were duplicated in a process unconnected to polyploidization. However, we chose loci that are single-copy in the closely related Arabidopsis thaliana. Furthermore, we found only a single sequence type for each of the three nuclear genes in C. rubella and C. grandiflora, indicating that these genes are also single-copy in the diploid Capsella species. All in all, we believe there is convincing evidence that the two distinct sequence types found for each nuclear gene in C. bursa-pastoris belong to the two homoeologous loci.
An additional level of complexity in the nuclear gene phylogenies is that, for each nuclear gene, a different C. bursa-pastoris accession carried C. rubella alleles at one of the homoeologues (Figs. 3–5). This surprising finding has no obvious explanation under standard polyploidization schemes. At least two scenarios could account for the presence of C. rubella alleles in present-day C. bursa-pastoris. First, the rubella alleles could be remnants of a polyploidization event in which C. rubella, or a species possessing rubella-like alleles, was the paternal ancestor and an unknown or extinct taxon was the maternal ancestor of C. bursa-pastoris. Under disomic inheritance, multiple origins of the tetraploid would then have to be invoked, because a single origin represents an extreme bottleneck that is unlikely to leave several different alleles segregating at the nuclear loci. Alternatively, if C. bursa-pastoris initially had tetrasomic inheritance, genetic drift followed by diploidization could have created the mosaic pattern of variation in its nuclear genome. Rapid genomic restructuring promoting the transition to cytological diploidy has been demonstrated in artificially created polyploids (Ma and Gustafson, 2005). These observations notwithstanding, the degree of sequence divergence between homoeologous loci in C. bursa-pastoris (Tables 2–4), irrespective of parental source, clearly suggests that if C. bursa-pastoris arose as a tetrasomic autopolyploid, it is of relatively ancient origin. However, neither multiple origins of C. bursa-pastoris, as required in the former case, nor the considerable age of the tetraploid implied in the latter appears compatible with the chloroplast data. On the other hand, it is important to remember that the genealogy of a single locus, such as the chloroplast genome, may differ from the history of the species as a whole (Pamilo and Nei, 1988; Rosenberg and Nordborg, 2002); moreover, inferring the number of founders of a polyploid from molecular data is difficult and can depend strongly on assumptions about ancestral population sizes (Jakobsson et al., 2006). Hence, even though our data suggest otherwise, we cannot entirely discount the possibility that C. bursa-pastoris is relatively ancient and/or was founded on several separate occasions.
Second, the C. rubella alleles in C. bursa-pastoris could have been introduced after the establishment of the tetraploid, i.e., through repeated hybridization between tetraploid C. bursa-pastoris and diploid C. rubella with subsequent introgression of C. rubella alleles into the C. bursa-pastoris gene pool. These species occur in sympatry and are known to hybridize occasionally, producing triploid hybrids that are highly sterile when selfed (Capsella ×gracilis; Shull, 1929; Keble-Martin, 1986; Chater, 1993). It is not known, however, to what extent backcrossing of triploids to tetraploid C. bursa-pastoris yields viable and fertile progeny. Ramsey and Schemske (1998) concluded that triploid plants are often semifertile and that backcrossing triploids to tetraploids can give rise to a high proportion of tetraploid progeny. They also found that viable triploid progeny are more likely to be formed when the seed parent is a tetraploid and the pollen donor a diploid, in which case introgression would not be reflected in the maternally inherited chloroplast genome. Because the alleles shared between C. rubella and C. bursa-pastoris were identical or nearly identical, it seems more likely that the rubella alleles were introduced into C. bursa-pastoris very recently than that they have persisted in the tetraploid since its origin. We note that the C. bursa-pastoris accessions found to harbor rubella alleles originate from localities (California and central Sweden) outside the present-day distribution of C. rubella. Hence, if the presence of rubella alleles in C. bursa-pastoris is the result of hybridization, this must have occurred before the species attained its current worldwide distribution. Archeological and historical evidence (Hurka and Neuffer, 1997) as well as genetic data (Ceplitis et al., 2005) show that C. bursa-pastoris began spreading across Europe 5000 to 7000 years ago and reached the Americas in post-Columbian times. Consequently, even if the rubella alleles in the investigated C. bursa-pastoris accessions originate from hybridization events that predate the spread of the species, the time frame is most likely short enough for the shared alleles to have remained highly similar. Taken together, although we find no conclusive evidence for or against either of the two main hypotheses for the presence of C. rubella alleles in C. bursa-pastoris, we believe that three observations favor the hybridization scenario: the single and recent origin of C. bursa-pastoris suggested by the chloroplast data, the close similarity of the alleles shared between C. rubella and C. bursa-pastoris, and the spontaneous occurrence of triploid hybrids in nature.
To fully understand the various genomic changes that accompany polyploidization, it is essential to clarify the phylogenetic history of polyploid species (Wendel, 2000). As exemplified by the present study, increased use of multiple unlinked, single-copy nuclear genes for phylogeny reconstruction in diploid–polyploid systems is almost certain to reveal that the evolution of polyploid taxa is even more complex than currently appreciated. Even in a genus as small and apparently simple as Capsella, with only two diploids and one tetraploid, phylogenetic relationships are evidently reticulate and web-like rather than simple and tree-like. Moreover, the presence of C. rubella alleles in C. bursa-pastoris is of particular relevance for studies of the molecular basis of adaptation. An intriguing possibility is that part of the enormous natural variability in ecologically important traits that characterizes C. bursa-pastoris, and that has led to the unparalleled success of the species as an invasive weed, might have been brought about by the sporadic occurrence of divergent C. rubella alleles segregating in C. bursa-pastoris. Future studies of molecular variation at genes involved in adaptive evolution in C. bursa-pastoris will shed light on this matter.
APPENDIX
Geographical origin, source, and GenBank accession numbers for taxa and sequences included in this study. A dash indicates that the region was not sampled. Seed samples were collected from natural populations by A. Ceplitis (Lund University, Sweden), H. Ceplitis (Lund University, Sweden), D. Crawford (University of Kansas, USA), H. Hurka (University of Osnabrück, Germany), S. Holm (Mid Sweden University, Sweden), T. Säll (Lund University, Sweden), Y. N. Isakov (Research Institute of Forest Genetics and Breeding, Russia), and Yau-Wen Yang (Institute of Botany, Academia Sinica, Taiwan) and are available on request. Seeds were also obtained from GAT = the Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany; GOET = the Botanical Garden of Göttingen, Germany; B = the Botanical Garden Berlin-Dahlem, Berlin, Germany; and USDA = U.S. Department of Agriculture. Voucher specimens are deposited in OSBU = Herbarium of Systematic Botany in Osnabrück, Germany; GA = Plant Biology Department, University of Georgia; BH = the Bailey Hortorium Herbarium at Cornell University; and MEXU = Herbario Nacional, Universidad Nacional Autónoma de México.
Tetraploids
Capsella bursa-pastoris (L.) Medik. Ecotype designation; Ploidy level; Locality; Source, collection number; Accession no.: trnD, trnT, trnS, trnfM, Adh (A homoeologue), Adh (B homoeologue), PI (A homoeologue), PI (B homoeologue), LD (A homoeologue), LD (B homoeologue).
C. bursa-pastoris ET8; 2n = 4x; Addis Ababa, Ethiopia; Säll 8; DQ343376, DQ343409, DQ343442, DQ343475, –, –, –, –, –, –.
C. bursa-pastoris SE12; 2n = 4x; Härnösand, Sweden; Holm 12; DQ343362, DQ343395, DQ343428, DQ343461, –, –, –, –, –, –.
C. bursa-pastoris SE14; 2n = 4x; Härnösand, Sweden; Holm 14; DQ343382, DQ343415, DQ343448, DQ343481, DQ343316, DQ343308, DQ343488, DQ343482, DQ343328, DQ343333.
C. bursa-pastoris SE17; 2n = 4x; Härnösand, Sweden; Holm 17; DQ343377, DQ343410, DQ343443, DQ343476, –, –, –, –, –, –.
C. bursa-pastoris SE30; 2n = 4x; Lund, Sweden; Ceplitis, A. 30; DQ343378, DQ343411, DQ343444, DQ343477, –, –, –, –, –, –.
C. bursa-pastoris SE31; 2n = 4x; Torup, Sweden; Ceplitis, A. 31; DQ343379, DQ343412, DQ343445, DQ343478, –, –, –, –, –, –.
C. bursa-pastoris SE32; 2n = 4x; Hjärup, Sweden; Ceplitis, A. 32; DQ343375, DQ343408, DQ343441, DQ343474, –, –, –, –, –, –.
C. bursa-pastoris SE33; 2n = 4x; Hässleholm, Sweden; Ceplitis, A. 33; DQ343374, DQ343407, DQ343440, DQ343473, –, –, –, –, –, –.
C. bursa-pastoris SE35; 2n = 4x; Växjö, Sweden; Ceplitis, A. 35; DQ343373, DQ343406, DQ343439, DQ343472, –, –, –, –, –, –.
C. bursa-pastoris SE37; 2n = 4x; Västervik, Sweden; Ceplitis, A. 37; DQ343372, DQ343405, DQ343438, DQ343471, –, –, –, –, –, –.
C. bursa-pastoris FR49; 2n = 4x; Colmar, France; Ceplitis, A. 49; DQ343371, DQ343404, DQ343437, DQ343470, –, –, –, –, –, –.
C. bursa-pastoris FR50; 2n = 4x; Colmar, France; Ceplitis, A. 50; DQ343370, DQ343403, DQ343436, DQ343469, –, –, –, –, –, –.
C. bursa-pastoris FR51; 2n = 4x; Ronchamp, France; Ceplitis, A. 51; DQ343369, DQ343402, DQ343435, DQ343468, –, –, –, –, –, –.
C. bursa-pastoris NL54; 2n = 4x; Amsterdam, Netherlands; Ceplitis, A. 54; DQ343368, DQ343401, DQ343434, DQ343467, –, –, –, –, –, –.
C. bursa-pastoris NL55; 2n = 4x; Meppen, Netherlands; Ceplitis, A. 55; DQ343367, DQ343400, DQ343433, DQ343466, –, –, –, –, –, –.
C. bursa-pastoris CD; 2n = 4x; Chengdu, China; Yang 101; DQ343366, DQ343399, DQ343432, DQ343465, –, –, –, –, –, –.
C. bursa-pastoris PL; 2n = 4x; Puli, Taiwan; Yang 102; DQ343365, DQ343398, DQ343431, DQ343464, DQ343313, DQ343310, DQ343487, DQ343485, DQ343329, DQ343334.
C. bursa-pastoris 721; 2n = 4x; Shafter, California, USA; Hurka 721 (OSBU); DQ343363, DQ343396, DQ343429, DQ343462, DQ343311, DQ343307, DQ343486, DQ343483, DQ343341, DQ343331.
C. bursa-pastoris 740; 2n = 4x; Reno, Nevada, USA; Hurka 740 (OSBU); DQ343364, DQ343397, DQ343430, DQ343463, DQ343312, DQ343309, DQ343493, DQ343484, DQ343330, DQ343332.
C. bursa-pastoris COA; 2n = 4x; Voronez, Russia; Isakov 103; DQ343361, DQ343394, DQ343427, DQ343460, –, –, –, –, –, –.
Diploids
Taxon; Ecotype designation; Ploidy level; Locality; Source, collection number; Accession no.: trnD, trnT, trnS, trnfM, Adh, PI, LD.
Capsella grandiflora (Fauché & Chaub.) Boiss. USDA; 2n = 2x; unknown; USDA; DQ343360, DQ343393, DQ343426, DQ343459, DQ343315, DQ343489, DQ343335.
C. grandiflora IPK; 2n = 2x; unknown; GAT; DQ343359, DQ343392, DQ343425, DQ343458, DQ343314, DQ343490, DQ343336.
C. rubella Reut. 1GR1; 2n = 2x; Samos, Greece; Ceplitis, A. 1; DQ343350, DQ343383, DQ343416, DQ343449, DQ343322, DQ343491, DQ343337.
C. rubella 72TR1; 2n = 2x; Istanbul, Turkey; Ceplitis, A. 72; DQ343352, DQ343385, DQ343418, DQ343451, DQ343324, DQ343499, DQ343343.
C. rubella 77TR1; 2n = 2x; Istanbul, Turkey; Ceplitis, A. 77; DQ343358, DQ343391, DQ343424, DQ343457, DQ343323, DQ343498, DQ343338.
C. rubella 80TR1; 2n = 2x; Istanbul, Turkey; Ceplitis, A. 80; DQ343357, DQ343390, DQ343423, DQ343456, DQ343319, DQ343500, DQ343339.
C. rubella 82TR1; 2n = 2x; Istanbul, Turkey; Ceplitis, A. 82; DQ343353, DQ343386, DQ343419, DQ343452, DQ343321, DQ343492, DQ343340.
C. rubella 86IT1; 2n = 2x; Sorrento, Italy; Ceplitis, H. 86; DQ343351, DQ343384, DQ343417, DQ343450, DQ343320, DQ343496, DQ343345.
C. rubella 836/14/1/4; 2n = 2x; Concepcion, Chile; Crawford 836 (OSBU); DQ343354, DQ343387, DQ343420, DQ343453, –, DQ343497, DQ343346.
C. rubella GO; 2n = 2x; unknown; GOET; DQ343355, DQ343388, DQ343421, DQ343454, DQ343317, DQ343495, DQ343344.
C. rubella IPK; 2n = 2x; unknown; GAT; DQ343356, DQ343389, DQ343422, DQ343455, DQ343318, DQ343494, DQ343342.
Neslia paniculata (L.) Desv. NP1; 2n = 2x; unknown; B; DQ343380, DQ343413, DQ343446, DQ343479, DQ343325, DQ343501, DQ343348.
Halimolobos virgata (Nutt. ex Torrey & A. Gray) O.E. Schulz; 2n = 2x; unknown; Price 1385 (GA); DQ343381, DQ343414, DQ343447, DQ343480, DQ343327, AF307606 (a), DQ343347.
Halimolobos berlandieri (Fourn.) O.E. Schulz; 2n = 2x; unknown; Bailey & Ochoterena 139 (BH & MEXU); –, –, –, –, DQ343326, AF307595 (a), DQ343349.
(a) Sequences not produced in this study.
FOOTNOTES
1 The authors thank D. Bailey for supplying DNA from Halimolobos spp. and U. Lagercrantz, M. Jakobsson, and two anonymous reviewers for constructive comments on the manuscript. Financial support was provided by the Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning (M.L.) and the Carl Trygger Foundation (A.C.).