Systematics and Phytogeography
2Institut de Recherche en Biologie Végétale, Université de Montréal, 4101 Sherbrooke est, Montréal, Québec H1X 2B2 Canada; 3Department of Biology, 214 Shoemaker Hall, University of Mississippi, University, Mississippi 38677 USA; 4Biology Department, Washington University, Campus Box 1137, St. Louis, Missouri 63130 USA
Received for publication June 13, 2005. Accepted for publication November 23, 2005.
ABSTRACT
This study investigates the impact of hybridization and polyploidy in the evolution of eastern North American roses. We explore these processes in the Rosa carolina complex (section Cinnamomeae), which consists of five diploid and three tetraploid species. To clarify the status and origins of polyploids, a haplotype network (statistical parsimony) of the glyceraldehyde 3-phosphate dehydrogenase (GAPDH) nuclear gene was estimated for polyploids of the complex and for diploids of section Cinnamomeae in North America. A genealogical approach helped to decipher the evolutionary history of polyploids from noise created by hybridization, incomplete lineage sorting, and allelic segregation. At the diploid level, species west of the Rocky Mountains are distinct from eastern species. In the east, two groups of diploids were found: one consists of R. blanda and R. woodsii and the other of R. foliolosa, R. nitida, and R. palustris. Only eastern diploids are involved in the origins of the polyploids. Rosa arkansana is derived from the blanda–woodsii group, R. virginiana originated from the foliolosa–nitida–palustris group, and R. carolina is derived from a hybrid between the two diploid groups. The distinct origins of these polyploid taxa support the hypothesis that the three polyploids are separate species.
Key Words: haplotype network; incomplete lineage sorting; multiple origins; polyploidy; reticulate evolution; Rosa carolina complex; statistical parsimony
Wild species of roses are characterized by extensive morphological variation, which has resulted in a notoriously complex taxonomy. For instance, Linnaeus (Stearn, 1957, p. 158) wrote in Species Plantarum, "The species of Rosa are with difficulty to be distinguished, with even greater difficulty to be defined; nature seems to me to have blended several or by way of sport to have formed several from one." North American roses are no exception; Crépin (1896), Watson (1885), Rydberg (1920), and Erlanson MacFarlane (1966) described 13, 18, 129, and 22 Rosa species on this continent, respectively. Hybridization has long been considered to be one of the major causes of taxonomic confusion (Linnaeus, 1753; Crépin, 1894, 1896), and artificial crosses have shown that in fact most diploids are interfertile (Erlanson, 1934; Ratsek et al., 1939, 1940; Lewis and Basye, 1961). Cytological studies during the early 20th century demonstrated that polyploidy is frequent in Rosa (Täckholm, 1922; Hurst, 1925) and that it could represent another source of variation. The present research explores issues related to hybridization and polyploidy, two important processes in plant evolution (Arnold, 1997; Otto and Whitton, 2000), that may explain the difficulty of recognizing species in wild roses.
This study focuses on the North American Rosa carolina L. complex of section Cinnamomeae, a group that epitomizes the complexity of the genus. Indeed, Lewis (1957c, p. 126) considered the group to be "... the most difficult taxonomic problem in our North American Rosa." The complex consists of five diploid and three tetraploid species, almost entirely located east of the Rocky Mountains. The diploids R. blanda Ait., R. foliolosa Nutt., R. nitida Willd., R. palustris Marsh., and R. woodsii Lindl. (the sole species of the complex also found west of the Rocky Mountains) are relatively well circumscribed (Lewis, 1957c; Erlanson MacFarlane, 1966), but natural interspecific hybrids have been reported (Erlanson, 1929, 1934; Lewis, 1962), and some have been given species status (Rydberg, 1920; Erlanson, 1934). In contrast, the tetraploid taxa R. arkansana Porter, R. carolina L., and R. virginiana Mill. are characterized by extensive continuous morphological variation that blurs their limits with each other and with their putative diploid ancestors in the R. carolina complex (Erlanson, 1934; Lewis, 1957b). Despite the important biosystematic investigations involving cytology and morphology in this complex (Erlanson, 1929, 1934; Lewis, 1957b), the limits and origins of the polyploid taxa are still unclear. The broad polymorphism of polyploid species may be caused by hybridization given that it frequently has been reported in areas of contact between R. carolina and R. arkansana in the west (e.g., R. x rudiuscula Greene: Erlanson MacFarlane, 1966; Lewis, 1957b; A. Fishbein and W. H. Lewis, Washington University, unpublished manuscript) and between R. carolina and R. virginiana in the east (Fernald, 1922; Lewis, 1957b) (Fig. 1). Yet, it is also possible that these taxa represent a single polymorphic species rather than three distinct taxa. Therefore, reconstructing the origins of the polyploids is a logical first step toward a global understanding of the R. carolina complex because it could be relevant to solving the species status of the polyploids if these are shown to have evolved independently.
Investigation of polyploid origin must be done within a sound phylogenetic framework. To date, phylogenetic studies of Rosa have not included a good sampling of North American roses (e.g., Millan et al., 1996; Matsumoto et al., 1998; Wissemann and Ritz, 2005), leaving their relationships obscure. Reconstruction of the diploid relationships could be further complicated by the recent origin of the complex, which is suggested by the low variation of ribosomal (Ritz et al., 2005) and chloroplastic markers (Wissemann and Ritz, 2005). Recent origin of species may result in incomplete lineage sorting of several molecular markers for the diploids (Pamilo and Nei, 1988; Rosenberg, 2002, 2003), which in turn could hamper our ability to accurately identify the species that were involved in the origins of polyploids. These potential problems need to be addressed prior to investigating polyploid evolution.
A genealogical approach using a single-copy nuclear gene is used to address the relationship of diploids and to investigate the origins of the polyploids. A genealogical approach has major advantages over a genotyping method (e.g., microsatellites, amplified fragment length polymorphisms [AFLPs], isozymes) because it places the data in a historical perspective: it relates who is ancestral to whom rather than who is similar to whom. This is particularly important in order to discern some of the confounding events mentioned earlier from our principal goal: reconstructing polyploid evolution. The use of nuclear genes is particularly useful in this regard because non-haploid organisms (except for clonal and apomict taxa) receive one chromosome copy from each parent. Thus, nuclear genes can retain information about the reticulate history of organisms, which is impossible for maternally or paternally transmitted markers. Such an approach has been successful in reconstructing the polyploid origins of other taxa (Doyle et al., 2002; Senchina et al., 2003; Smedmark et al., 2003; Helfgott and Mason-Gamer, 2004; Joly and Bruneau, 2004; Mason-Gamer, 2004; Petersen and Seberg, 2004; Evans et al., 2005).
MATERIALS AND METHODS
Sampling
Because it was more important to assess the extent of genetic variation within species rather than within populations, a single individual per population was investigated. Populations were sampled to represent the geographical range of each species of the complex (Table 1). Diploid roses of section Cinnamomeae west of the Rocky Mountains, R. gymnocarpa Nutt. and R. pisocarpa Gray, were included because they could be involved in the origins of the eastern polyploids. Diploid roses of section Synstylae found in North America, R. setigera Michx. (native to North America) and R. multiflora Thunb. (introduced from China and now a noxious invasive in eastern North America [Meiners et al., 2001; Hunter and Mattice, 2002]), were included as outgroup taxa. Only one species of Rosa section Cinnamomeae occurring east of the Rocky Mountains was not investigated here: R. acicularis Lindl., a circumboreal species that has both hexaploid and octoploid populations (Lewis, 1959). Investigation of its origin would require a broader taxonomic sampling at the diploid level, which is beyond the scope of the present study.
Molecular methods
DNA was extracted using a modified version of the cetyltrimethylammonium bromide (CTAB) extraction of Doyle and Doyle (1987). Modifications involved scaling the protocol for a total CTAB volume of 600 µL; adding 12 µL of 0.5 mol/L ethylenediamine tetra-acetic acid (EDTA) pH 8.0 per 600 µL of CTAB and 1% polyvinylpyrrolidone (PVP) to the extraction buffer prior to extraction; adding 20 µg of RNAse A to the CTAB buffer prior to incubation at 65°C; performing two chloroform-isoamyl alcohol (24:1) extractions; and precipitating the DNA with 1.5 volumes of 100% ethanol.
Gene selection
North American roses are particularly uniform at the DNA level. For example, sequences of the internal transcribed spacer of the 18S-5.8S-26S ribosomal gene family showed few variations among North American rose species sampled by Ritz et al. (2005), even though this marker is generally considered to be highly variable in many plant taxa (Baldwin et al., 1995). Similarly, only five variable characters were found between R. woodsii, R. blanda, and R. palustris among 4318 base pairs (bp) from seven chloroplast gene spacers or introns (S. Joly and J. R. Starr, unpublished data). Because of this, introns of single-copy nuclear genes became the alternative for providing sufficient variation. Initial screening (data not shown) of several nuclear genes (LEAFY [e.g., Frohlich and Meyerowitz, 1997; Archambault and Bruneau, 2004]; GBSSI [e.g., Evans et al., 2000]; RPB2 [e.g., Denton et al., 1998; Pfeil et al., 2004]; GAPDH [e.g., Strand et al., 1997; Olsen and Schaal, 1999]) identified GAPDH as the most variable region.
Gene amplification
The cytosolic glyceraldehyde 3-phosphate dehydrogenase (GAPDH) gene was amplified from the end of exon 7 (according to the Arabidopsis thaliana sequence; GenBank locus tag: At3g04120) to the beginning of exon 11 (which is exon 9 in A. thaliana; Fig. 2). The 5' end of the forward primer GPDX7F (5'-GATAGATTTGGAATTGTTGAGG-3'; Strand et al., 1997) starts 52 bp upstream of the intron in the seventh exon, whereas the GPDX11R primer (5'-GACattgaatgagataaacc-3'; lowercase letters represent intron nucleotides) spans the junction between exon 11 and the previous intron. Polymerase chain reactions in final volumes of 50 µL contained 1x PCR reaction buffer (Roche Diagnostics, Laval, Québec, Canada; for a total MgCl2 concentration of 1.5 mmol/L), 0.05% Tween 20, 5 µg bovine serum albumin, 1 mmol/L of each primer, 200 µmol/L of each dNTP, two units Taq polymerase, and ca. 300 ng genomic DNA. The PCR conditions included an initial denaturation step of 3 min at 94°C, followed by 40 cycles of denaturation (30 s at 95°C), annealing (30 s at 48°C), and elongation (2 min at 72°C), with a final extension step of 10 min at 72°C. A long elongation time was used, and reactions were performed in triplicate to reduce the potential for PCR recombinants (Judo et al., 1998; Cronn et al., 2002). The triplicate reactions also reduced the possibility of finding the same Taq-induced mutation in many different clones. The PCR products were purified with polyethylene glycol (PEG; molecular mass, 8000) according to the following procedure. The PCR reactions were mixed with an equal volume of PEG solution (20% PEG, 2.5 mol/L NaCl), incubated 15 min at 37°C, and centrifuged 15 min at 12 000 x g. The supernatant was removed and the pellet was washed twice with 80% ethanol (spinning 5 min at 12 000 x g before ethanol removal). The pellet was dried 2 min in a vacuum centrifuge (no heat) and was resuspended in TE0.1 (20 mmol/L Tris-HCl, 0.1 mmol/L EDTA, pH 8.0).
Allele sampling
In order to derive firm conclusions on the origins of polyploids, it is important to sample all alleles in every individual. The approach used to achieve this objective differed for diploids and polyploids. Diploids that did not show polymorphic nucleotides in direct sequencing (from the total PCR reaction) were assumed to be homozygous and were not cloned. Such an assumption is valid because two equally frequent templates should be equally visible on chromatograms if there is no strong PCR bias in the reactions (Rauscher et al., 2002). When only one polymorphic nucleotide was found for one individual, no cloning was necessary because the alleles can easily be distinguished. In contrast, individuals that showed more than one polymorphic site or that had indels among their alleles were cloned. In these cases, 3–4 clones were sequenced to retrieve allelic sequences. More than one clone was sequenced to eliminate the possibility of sampling a PCR recombinant with a single clone.
All tetraploids were cloned because it is easier to miss polymorphic sites on direct sequences when four alleles may be present in the genome. Assuming no PCR bias between alleles (but see Wagner et al., 1994), the binomial distribution predicts that the probability of sampling all alleles in an individual is
[Equation image not recovered]
where t is the number of alleles in the individual and n is the number of clones sequenced. If there were four alleles in a tetraploid, 15 clones would be required in order to obtain a 95% probability that all alleles have been sampled. With three alleles, 11 clones are needed. On average, 11–15 clones were sequenced per individual (Table 1), with additional clones sequenced in all cases where the alleles resulting in polymorphisms detected in direct sequencing were not recovered.
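The clone numbers quoted above can be checked with a short calculation. The sketch below is not necessarily the expression used in this study: it assumes that every clone is an independent draw and that the t alleles are equally likely to be cloned (no PCR bias), and it computes the probability of recovering all alleles by inclusion-exclusion. The function name is hypothetical.

```python
from math import comb


def prob_all_alleles_sampled(t: int, n: int) -> float:
    """Probability that n clones contain every one of t alleles.

    Assumes each clone is an independent draw and that the t alleles
    are equally frequent in the PCR product (no PCR bias).
    Inclusion-exclusion over the number k of alleles that are missed.
    """
    return sum((-1) ** k * comb(t, k) * (1 - k / t) ** n for k in range(t + 1))


if __name__ == "__main__":
    # Clone numbers discussed in the text for four and three alleles.
    for t, n in [(4, 15), (3, 11)]:
        print(f"t = {t}, n = {n}: P = {prob_all_alleles_sampled(t, n):.3f}")
```

Under these assumptions the probability is about 0.947 for four alleles and 15 clones, and about 0.965 for three alleles and 11 clones, which agrees with the 95% figure in the text to within rounding.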
For both diploids and tetraploids, Taq-induced PCR errors were identified and removed from analyses by comparing the sequence of cloned amplicons to one another and to the initial sequences obtained from direct sequencing. Henceforth, it will be assumed that all alleles were retrieved from each individual even if there is a non-zero probability that some alleles were not sampled in some individuals. The PCR products were cloned with the TOPO TA cloning kit (Invitrogen, Burlington, Ontario, Canada). Plasmids containing the gene were extracted from E. coli using the QIAprep miniprep kit (Qiagen, Mississauga, Ontario, Canada) and were sequenced as described earlier. Alleles from both diploids and tetraploids were aligned with ClustalX (Thompson et al., 1994, 1997) with a gap opening penalty of 25 and a gap extension penalty of 6. The resulting alignment did not need further manual corrections.
Testing recombination
Two different methods were used to detect recombination: the homoplasy test (Maynard Smith and Smith, 1998), which works best when divergence between sequences is low (less than 5%; Maynard Smith and Smith, 1998; Posada and Crandall, 2001), and a parsimony network approach (Templeton et al., 1992). The homoplasy test was performed using the datin and exph programs (Maynard Smith and Smith, 1998) under conservative (SE = 0.6S) and liberal (SE = S) conditions, where SE is the effective number of sites and S is the total number of sites in the data set. First and second codon positions in exons were removed from the analysis because they are evolutionarily constrained (Maynard Smith and Smith, 1998), and the analysis was performed only on ingroup taxa. With the parsimony network approach, recombination was inferred only when it could explain at least two homoplasies and when the homoplasies corresponding to the parental alleles were physically clustered on the recombinant allele (Aquadro et al., 1986; Templeton et al., 1992).
Network construction
GapCoder (Young and Healy, 2003) was used to code indels under the simple gap coding method of Simmons and Ochoterena (2000). The resulting matrix was used to estimate the gene genealogy of the GAPDH locus by statistical parsimony (Templeton et al., 1992) as implemented in the TCS program (version 1.18; Clement et al., 2000). The statistical limit of parsimony was evaluated on the matrix with the gaps recoded (although estimating it without the gaps gave the same result), and the final network was constructed so that all the haplotypes could be united in a single network.
Statistical distinction of diploid species
Diploid species boundaries were tested by permutations using analyses of molecular variance (AMOVAs; Excoffier et al., 1992). An uncorrected P distance matrix among haplotypes was calculated in PAUP* version 4.10b (Swofford, 2002), and the partitioning of haplotype variance in different groups (species) was tested in Arlequin version 2 (10 000 permutations; Schneider et al., 2000).
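The logic of this test can be illustrated with a simplified sketch. It is not the Arlequin implementation and it omits the hierarchical variance components of a full AMOVA: it computes uncorrected p distances, partitions the sum of squares into within-group and among-group parts, and permutes group labels to ask whether the among-group fraction is larger than expected by chance. Using the p distances directly as squared distances, ignoring gapped sites, and all function names and example sequences are assumptions made for illustration.

```python
import random


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p distance: proportion of differing sites (gapped sites skipped)."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def among_group_fraction(dist, labels):
    """Fraction of the total sum of squares that lies among groups.

    dist is a symmetric matrix treated as squared distances (an assumption).
    """
    n = len(labels)
    total = sum(dist[i][j] for i in range(n) for j in range(i + 1, n)) / n
    within = 0.0
    for group in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == group]
        within += sum(dist[i][j] for a, i in enumerate(idx) for j in idx[a + 1:]) / len(idx)
    return (total - within) / total


def permutation_test(dist, labels, n_perm=10000, seed=0):
    """Return the observed statistic and a one-sided permutation P value."""
    rng = random.Random(seed)
    observed = among_group_fraction(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign haplotypes to groups at random
        if among_group_fraction(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical haplotypes and group labels, for illustration only.
    haplotypes = ["AAAA", "AAAT", "CCCC", "CCCG"]
    groups = ["group1", "group1", "group2", "group2"]
    matrix = [[p_distance(a, b) for b in haplotypes] for a in haplotypes]
    print(permutation_test(matrix, groups, n_perm=999))
```

With only four haplotypes the permutation P value cannot be small; the example shows the mechanics only, and real data would need the sample sizes and the number of permutations reported in the text.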
Origins of the polyploids
To reconstruct the evolutionary history of the polyploid taxa, the closest diploid haplotype ancestor for each allele of each polyploid individual was identified to determine which diploid species contributed to polyploids. Because alleles can mutate in polyploids, simply counting the number of haplotypes in a polyploid species will overestimate the number of origins (Doyle et al., 2004). A conservative way of evaluating the likelihood that the polyploid species evolved recurrently is to estimate the number of "polyploid haplotype groups" that comprise all polyploid haplotypes that have a most recent common diploid haplotype (or expected diploid haplotype) ancestor (Fig. 3; see also Doyle et al., 2004). At formation, a tetraploid can acquire up to four different alleles from diploids. Independent polyploid origins can involve one or more identical diploid alleles, yet it is impossible to detect this if there is segregation in polyploid populations. To be conservative, it was therefore assumed that for one polyploid species, a polyploid haplotype group can only be involved in one origin and that each origin always involved four polyploid haplotype groups. So if there are n polyploid haplotype groups in one polyploid species (n = 4 in Fig. 3), there needs to be at least n/4 (rounded to the upper unit) distinct origins to account for this variability (one distinct origin in the simplified example given in Fig. 3).
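The conservative counting rule described above amounts to a ceiling division. A minimal sketch follows; the function name and the default of four alleles per origin (tetraploids) are illustrative assumptions.

```python
def min_distinct_origins(n_groups: int, alleles_per_origin: int = 4) -> int:
    """Minimum number of origins needed to explain n polyploid haplotype groups.

    Each origin is assumed to contribute the maximum number of haplotype
    groups (four for a tetraploid), so the result is n / 4 rounded up.
    """
    if n_groups < 0 or alleles_per_origin < 1:
        raise ValueError("n_groups must be >= 0 and alleles_per_origin >= 1")
    return -(-n_groups // alleles_per_origin)  # integer ceiling division


if __name__ == "__main__":
    for n in (4, 5, 9):
        print(n, "haplotype groups ->", min_distinct_origins(n), "origin(s) at minimum")
```

For the simplified example with n = 4, the function returns one origin; five to eight haplotype groups would require at least two.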
RESULTS
Sequences and alleles
The number of alleles found and the number of clones sequenced for each individual are indicated in Table 1. The phylogenetic analysis used the portion of the GAPDH gene that starts immediately after exon 7 and that stops at the GPDX11R primer, 17 bp downstream of exon 11. The length of this aligned region is 759 bp and it includes 15 indels. Multiple alleles in an individual were distinguished by a letter (i.e., A, B, etc.) following the species name and accession number. GenBank accession numbers (DQ091014–DQ091057, DQ091060–DQ091174) are given for each allele of each individual in Appendix S1 (see Supplemental Data accompanying the online version of this article).
Of all alleles recovered, one was obviously a pseudogene: the carolina289.A allele. This allele has a deletion of 1 bp in exon 10, which causes a frame shift and introduces a stop codon. Because the indel was visible in the direct sequences, and therefore present in relatively high proportions in the PCR products (Rauscher et al., 2002), and because the reactions were performed in triplicate, it is unlikely that this mutation is the result of a PCR error.
Length of stomatal guard cells
Based on the taxonomic identifications, diploids and polyploids had disjoint distributions for their mean stomatal guard cell length (Fig. 4), and the difference between the two groups is statistically significant (two-tailed Student t test: df = 50, t = 14.061, P < 0.001; homoscedasticity hypothesis accepted: Levene F = 3.949, P = 0.53). The mean lengths of diploids and polyploids were under 19.18 µm and over 19.30 µm, respectively (Fig. 4, Table 1). The gap is larger when the carolina626 individual is excluded; without it, all polyploids would have a mean length over 20.16 µm.
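For readers who wish to reproduce this kind of comparison, the pooled-variance Student t statistic can be computed as sketched below. The measurements are hypothetical placeholders, not the guard cell lengths of this study, and the function name is an assumption; the degrees of freedom are the two sample sizes summed, minus two.

```python
from math import sqrt
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Pooled-variance (equal variances assumed) two-sample t statistic and df."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


if __name__ == "__main__":
    # Hypothetical mean guard cell lengths (µm), for illustration only.
    diploids = [17.2, 18.1, 16.9, 18.6, 17.8]
    polyploids = [21.4, 22.0, 20.7, 23.1, 21.9]
    t_value, df = student_t(polyploids, diploids)
    print(f"t = {t_value:.3f}, df = {df}")
```

The pooled form is appropriate only when the two variances are similar, which is the condition that the Levene test reported in the text is meant to check.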
The stomatal cell lengths reported are about 1.3 times smaller than those obtained by Lewis (1957b, 1958, 1959) for both diploids and polyploids. These discrepancies are simply caused by differences in methodology.
Network
One of the premises of tree-like phylogenetic methods is that all characters have the same evolutionary history. Recombination can violate this assumption for nuclear loci, and it is important to test for its presence when using such markers. The homoplasy test was significant under both the conservative and liberal conditions (P < 0.001), suggesting that recombination is present in the data set. In contrast, no clear recombinants were detected using the network approach. Even within the loops, there was always one alternative that required only one homoplasy. The discrepancy between these results could be due to the presence of homoplasious sites in the data set: a standard parsimony analysis gave a consistency index of 0.83. Even if allelic variation ranges from 0 to 3.4% of variation among ingroup taxa, this level of homoplasy may be high enough to violate the homoplasy test's assumption of low levels of variation, which could bias the test towards a conclusion for recombination. Such behavior of the homoplasy test has previously been reported (Posada and Crandall, 2001; Posada, 2002). Because no clear recombination events were identified on the network, the evidence for recombination in the data is equivocal at best and the data set was analyzed as if there were no recombination.
Haplotypes with a distance of more than 12 steps (parsimony limit) from all other haplotypes were not statistically supported and their relationship to the rest of the haplotypes should be viewed as if estimated by standard parsimony procedures (Fig. 5). However, only section Synstylae was not connected to the rest of the network with this limit; the two sub-networks were 13 steps away. Henceforth, haplotypes will be referred to by the number of the box in which they occur on the network and by their specific letter (e.g., I-a represents the haplotype of allele multiflora302.A of section Synstylae; Fig. 5).
Regarding the diploid species east of the Rocky Mountains, two main groups can be distinguished on the network (Fig. 5). The first group includes all alleles of diploid species R. blanda and R. woodsii (the blanda–woodsii or BW group, box IV in Fig. 5), whereas the other contains most alleles of R. foliolosa, R. nitida, and R. palustris (the foliolosa–nitida–palustris or FNP group, box V). These groups are not monophyletic, but they are nevertheless almost exclusive. There are two exceptions: one allele of R. palustris and one of R. nitida occur in the BW group. Even with these, the AMOVAs showed that the distinction between the BW and the FNP groups is significant (P < 0.001; Table 2). Neither the AMOVAs nor the network found a distinction between R. blanda and R. woodsii. Within the FNP group, AMOVAs suggest that R. foliolosa is significantly distinct from R. nitida and R. palustris (P < 0.001) and also that the differentiation between R. nitida and R. palustris is marginally significant (P < 0.05; Table 2). The network is ambiguous regarding these distinctions, however, and R. nitida and R. palustris do not clearly form distinct groups (Fig. 5). Moreover, only two individuals of R. foliolosa were investigated, limiting the significance of the distinction found with AMOVAs. In addition, the R. foliolosa alleles have R. nitida alleles as ancestors. Therefore, R. foliolosa, R. nitida, and R. palustris are considered to form a single group in the following analyses.
DISCUSSION
Diploid species boundaries
Three evolutionary processes can result in nonmonophyletic species in a genealogical framework: hybridization, incomplete lineage sorting (or deep coalescence), and gene duplication (Maddison, 1997; Funk and Omland, 2003). Among these processes, gene duplication is the least likely problem at low phylogenetic levels. Because no evidence of gene duplication was found, this process will not be discussed further.
Incomplete lineage sorting and hybridization
Attempts have been made to distinguish between incongruence due to incomplete lineage sorting and incongruence due to hybridization in gene trees (Sang and Zhong, 2000), but these mostly have been unfruitful (Holder et al., 2001). However, it is possible, in some circumstances, to discriminate between the two processes by using the full amount of information contained in branch lengths (Holder et al., 2001). Take a hypothetical example of a lineage that splits into two distinct species at time TS, where one incongruent haplotype happens to be more closely related to the haplotypes of its sister species than it is to its own haplotypes (Fig. 7). Note that the time of speciation is independent of the gene lineages and corresponds to the time when gene flow ceased among sibling species (see Holder et al., 2001). With incomplete lineage sorting, the most recent common ancestor of the incongruent haplotype and the haplotypes of the sister species must have been present in the common lineage before the speciation event (Fig. 7A). Therefore, the time since the divergence of the incongruent allele and the alleles of the sister species (TLS) must be at least as old as the time of divergence of the two species (TLS ≥ TS). On a hypothetical genealogy, the incongruent allele should branch near the split between the two species relative to an outgroup taxon, and it should be quite divergent from the alleles of the sister species because it has evolved independently from the other sister species alleles for a time TLS (Fig. 7B).
The GAPDH haplotype network may give us examples of both hybridization and incomplete lineage sorting between the blanda–woodsii and the foliolosa–nitida–palustris diploid groups. First, a hybridization event is probably the cause of the position of the nitida604.A allele (haplotype IV-m) in the blanda–woodsii group (Fig. 5). The hybridization hypothesis is supported because the haplotype connects to the network three steps away from the node separating the two diploid groups on the network and also because it is found in a contemporary R. blanda individual. This shows that the divergence between the incongruent haplotype IV-m and the other species' allele is recent relative to the separation between the two diploid groups. The other incongruent allele, palustris386 (haplotype IV-w), is more likely to be caused by incomplete lineage sorting because it diverges from its ancestor one step away from the node delimiting the two diploid groups on the network (i.e., the split is relatively old) and because it is five steps away from the closest contemporary alleles of the blanda–woodsii group, which is plausible if it has evolved independently from these alleles for some time. As discussed above, however, it is impossible to completely reject the hypothesis of hybridization for this incongruence. It is also plausible that contemporary blanda–woodsii alleles closer to this allele exist but were not sampled.
Testing species boundaries
Hybridization is more frequent among closely related species. The same is true of incomplete lineage sorting, which is particularly important for nuclear genes because their effective population sizes are greater than for chloroplast or mitochondrial genes (Moore, 1995; Wollenberg and Avise, 1999; Rosenberg, 2003). If we consider that species are ecologically, morphologically, and (or) genetically cohesive groups of populations that evolve independently from other such groups, then nuclear genes may fail to identify recently derived species if a criterion of monophyly (e.g., the genealogical species concept, Baum and Shaw [1995]; the monophyletic species concept, Mishler and Theriot [2000], Wheeler and Platnick [2000]) is applied (Hudson and Coyne, 2002). Templeton (2001) has proposed using nested clade analysis as a way to test "cohesive" species boundaries (i.e., Templeton, 1989), therefore allowing some incongruence between the species tree and the gene tree. Unfortunately, this method requires extensive population sampling, which is a laborious task for single-copy nuclear genes because of the extensive cloning effort necessary to properly sample alleles. As an alternative, AMOVAs were used to evaluate the genetic variation due to within-species (or groups of species) variation as compared to among-species variation and to test if this latter variance is greater than that expected by chance. This method also allows some alleles to be incongruent with the species tree.
The network suggests that R. gymnocarpa is sister to all other North American Rosa species of section Cinnamomeae. The distinctiveness of this species has already been reported based on morphological characters (Watson, 1885; Crépin, 1896), but its phylogenetic position was uncertain. Rosa pisocarpa, although nonmonophyletic, is distinct from diploid species of the R. carolina complex on the network, and its position suggests that eastern diploid species are monophyletic.
Among the largely eastern taxa of the complex, AMOVAs identified two major groups of diploids: blanda–woodsii and foliolosa–nitida–palustris. This shows that the incongruence found among groups (and discussed earlier) is not significant and that these groups could be considered as distinct. In the blanda–woodsii group, no distinction was found between R. blanda and R. woodsii. Indeed, these species cannot be distinguished using morphological and molecular (AFLP) characters (J. R. Starr, S. Joly, and A. Bruneau, unpublished data). Moreover, hybrids between R. blanda and R. woodsii have been shown to be highly fertile (Erlanson, 1934; Ratsek et al., 1939), and a hybrid zone appears to exist in the area where the two species overlap (Lewis, 1962). Given this, the status of these species certainly needs to be addressed. In the foliolosa–nitida–palustris group, analyses of molecular variance suggested that R. foliolosa was distinct, although no strong conclusions regarding this species are drawn because of limited sampling. Yet, the distinction of R. foliolosa from other eastern diploid species is supported by morphology, this species being peculiar for its narrow leaflets and short pedicels, among other characters (Lewis, 1957b, 1958). The AMOVAs also suggest a weak distinction between R. nitida and R. palustris even if the network clearly shows that they do not form distinct groups. The species status for these two taxa is different from that of R. blanda and R. woodsii because they are clearly distinct morphologically (Lewis, 1957a, b). Rosa nitida has numerous red bristles, is generally less than 1 m tall, and has no distinct infrastipular thorns, whereas R. palustris lacks bristles, is greater than 1 m tall, and almost always has curved infrastipular thorns. Therefore, the absence of reciprocal monophyly between R. nitida and R. palustris for the GAPDH marker may be a consequence of their recent divergence.
Origins of the polyploids
The identification of genetically distinct groups of diploids in section Cinnamomeae in North America allows the evaluation of different evolutionary hypotheses concerning the origin of the polyploids. Yet it can be difficult to determine whether a polyploid is an autopolyploid or an allopolyploid in the event of conflicting signals produced by hybridization among polyploid species, gene flow between diploids and polyploids, or allelic segregation in polyploids. Both homoploid hybridization among polyploid species and gene flow from diploids to polyploids can introduce haplotypes in a polyploid that were not originally involved in its formation and can cause an autopolyploid to look like an allopolyploid. However, gene flow also can cause an allopolyploid to look like an autopolyploid if alleles from a diploid species are fixed in the allopolyploid due to recurrent gene flow. A further confounding factor is allelic segregation. Allopolyploids are expected to maintain alleles from both parental species in their genomes by disomic segregation due to bivalent formation at meiosis. This is to be expected in northeastern American polyploid Rosa species because individuals from the three polyploid species investigated show bivalent formation (Erlanson, 1929; Lewis, 1957b). Nonetheless, occasional pairing between homeologous chromosomes (from the different diploid species) at meiosis could cause tri- or tetravalent formation. Indeed, trivalents and tetravalents have been observed in these polyploids (W. H. Lewis, unpublished data), but these and other meiotic irregularities such as lagging chromosomes and interlocked ring bivalents are rare and are known only from individuals from the zone of sympatry between R. arkansana and R. carolina (Lewis, 1966). Such multivalent formation leads to multisomic segregation that could bias the expected 1:1 ratio of parental alleles in an individual. Eventually this could lead to the fixation of alleles that come from a single diploid parent, resulting in a situation in which an allopolyploid might look like an autopolyploid.
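The contrast between disomic and multisomic segregation can be illustrated with a toy enumeration. This is a minimal sketch and not part of the study: it assumes a single locus in a tetraploid carrying two chromosome copies from each diploid parent (labelled A1, A2, B1, B2, names invented here), strict pairing within parental genomes for the disomic case, and random segregation of any two of the four chromosomes for the tetrasomic case, ignoring double reduction.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

# Hypothetical labels: two chromosome copies from diploid parent A, two from parent B.
PARENT_A = ["A1", "A2"]
PARENT_B = ["B1", "B2"]


def gamete_classes(gametes):
    """Collapse gametes to their parental composition (AA, AB, BB) with frequencies."""
    counts = Counter("".join(sorted(chrom[0] for chrom in g)) for g in gametes)
    total = sum(counts.values())
    return {kind: Fraction(n, total) for kind, n in counts.items()}


def disomic():
    """Bivalents form within each parental genome: one A copy and one B copy per gamete."""
    return gamete_classes(list(product(PARENT_A, PARENT_B)))


def tetrasomic():
    """Any two of the four chromosomes are equally likely to end up in a gamete."""
    return gamete_classes(list(combinations(PARENT_A + PARENT_B, 2)))


if __name__ == "__main__":
    print("disomic   :", disomic())
    print("tetrasomic:", tetrasomic())
```

Under these assumptions every disomic gamete carries one allele from each parent, whereas under tetrasomic segregation one third of the gametes carry alleles from a single parent only, which is the route by which repeated multisomic segregation could move an individual away from the 1:1 ratio.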
Inspection of the GAPDH network shows that polyploids are of recent origin because many polyploid haplotypes are also found in contemporary diploids. The presence of shared haplotypes among diploids and polyploids makes the determination of the type of polyploid formation more difficult for each species, because it is harder to eliminate hypotheses of hybridization among polyploid species and of gene flow between diploids and polyploids when diploids and polyploids share the same haplotypes. Of these confounding processes, gene flow between ploidy levels seems unlikely for several reasons. First, very few triploids have been reported in wild roses (Erlanson, 1929), and crosses between diploids and tetraploids give triploids that are highly sterile (Erlanson, 1934). Second, diploid and tetraploid species of Rosa are often separated both in space and in time of flowering, with diploids flowering before the tetraploids, except for R. palustris, which flowers after all other species (Erlanson, 1930). Polyploids more often grow in dry soils, either in sandy soils (R. carolina and R. virginiana; although R. virginiana also grows in salt marshes) or in upland prairies (R. arkansana), whereas diploids grow in bogs (R. nitida and R. palustris) or in mesic soils along woods and rivers (R. blanda and R. woodsii). Therefore, we consider that the probability of gene flow between ploidy levels is low. For the other conflicting processes, hybridization at the polyploid level and allele segregation in the polyploids, the recent origin of the polyploids allows us to make some assumptions about the expected results.
Given that each polyploid species has evolved recurrently (discussed later), the recent origin of polyploids gives little time for between-population genetic homogenization within polyploid species. Thus, if there have been many recent formations of the polyploid species, we expect that individuals from several separate populations retain information about their origin. In other words, hybridization and allele segregation should affect only a limited number of populations in each species. Therefore, the expectation for an autopolyploid species is that most individuals will have alleles from a single diploid species, even if a few may have acquired alleles from another diploid species via introgression. Moreover, individuals bearing introgressed alleles should be geographically close to individuals (or species) from which the allele is derived (Rieseberg, 1998). In a similar way, it is unlikely that parental alleles in allopolyploid individuals will segregate in all populations and even less likely that the segregation will always be toward the same parental alleles (unless there is selection). Therefore, we expect that most individuals of an allopolyploid species will possess alleles from two diploid species, even if some individuals could have fixed alleles from a single diploid species or have segregated toward a ratio of parental diploid alleles that deviates from the expected 1:1 ratio. In a further attempt to limit the potential impact of hybridization on the conclusions regarding polyploid origins, we avoided sampling individuals in areas where the distributions of polyploid species overlap. The only exception is R. arkansana, for which a few individuals were sampled from the zone of sympatry with R. carolina; potential impacts on the conclusions will be discussed later.
Of the eight R. arkansana individuals sampled, all have alleles in the blanda-woodsii group, with five lacking alleles from the foliolosa-nitida-palustris diploid group. Moreover, the three individuals with alleles from this latter group come from the region of sympatry between R. arkansana and R. carolina (Figs. 1 and 6). This suggests that R. arkansana evolved from within the blanda-woodsii group and that the presence of alleles from the foliolosa-nitida-palustris group in some individuals could be the result of introgression from R. carolina. Indeed, a hypothesis of introgression from R. carolina to R. arkansana is supported by cytological (Lewis, 1966) and morphological (A. Fishbein and W. H. Lewis, Washington University, unpublished manuscript) evidence, suggesting hybridization between these species. Because the relationships within the blanda-woodsii group are unresolved using the GAPDH marker, it cannot be stated whether R. arkansana is an auto- or an allopolyploid using a taxonomic definition (Grant, 1981; Ramsey and Schemske, 1998). Yet, some prefer to define autopolyploidy in a cytological context (Stebbins, 1980; Levin, 2002) according to which autopolyploids evolve from parents that are interfertile at the diploid level, whereas allopolyploids are formed from a hybrid that has reduced fertility. This definition predicts multivalent formation in autopolyploids and bivalent formation in allopolyploids, at least in the first stages of their evolution. According to the cytological definition, R. arkansana would probably be an autopolyploid because R. blanda and R. woodsii produce highly fertile hybrids and because they are morphologically and genetically similar.
Rosa carolina differs from R. arkansana in that all except two of the individuals investigated have alleles from both the blanda-woodsii and the foliolosa-nitida-palustris diploid groups. Given the wide geographic distribution of the individuals sampled, we conclude that R. carolina is an allopolyploid with one parent from the blanda-woodsii diploid group and the other from the foliolosa-nitida-palustris group. The deviation from the 1:1 ratio of parental alleles expected for allopolyploids that is observed in some individuals is probably the result of either segregation of homeologous chromosomes or introgression.
Finally, individuals of R. virginiana were found to possess only alleles that were exclusive to the foliolosa-nitida-palustris diploid group, except for two individuals that also have a blanda-woodsii allele and one that has an allele of ambiguous origin. Therefore, the most likely hypothesis for the origin of this polyploid species is that it originated from within the foliolosa-nitida-palustris diploid group. Again, we cannot determine whether R. virginiana is an auto- or an allopolyploid because of the lack of resolution within the foliolosa-nitida-palustris group. It is highly likely that R. foliolosa was not involved in the evolution of this species, however, because no R. virginiana alleles were closely related to the alleles sampled from R. foliolosa. The situation is also different from that for R. arkansana because we have no information on the fertility of hybrids between R. palustris and R. nitida. Hence, any conclusions regarding the type of polyploid origin of R. virginiana must await further data.
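The reasoning used for the three polyploids (tally, per individual, which diploid group its alleles belong to, then look at the pattern across the species) can be sketched in a few lines. This is an illustrative sketch only, not the authors' procedure: the function names and category labels are invented here, each individual is reduced to the set of diploid groups represented among its alleles, and only the R. arkansana counts stated in the text (eight individuals, five with blanda-woodsii alleles only, three with alleles from both groups) are used as input.

```python
from collections import Counter

# Labels for the two eastern diploid groups discussed in the text.
BW = "blanda-woodsii"
FNP = "foliolosa-nitida-palustris"


def classify_individual(allele_groups):
    """Return which diploid group(s) an individual's alleles belong to."""
    groups = set(allele_groups)
    if groups == {BW}:
        return "BW only"
    if groups == {FNP}:
        return "FNP only"
    if groups == {BW, FNP}:
        return "both groups"
    return "other/ambiguous"  # e.g. an allele that cannot be assigned to a group


def summarize_species(individuals):
    """Count individuals per category for one polyploid species."""
    return Counter(classify_individual(ind) for ind in individuals)


if __name__ == "__main__":
    # R. arkansana as reported in the text: 5 individuals with BW alleles only,
    # 3 individuals (all from the zone of sympatry) with alleles from both groups.
    arkansana = [[BW]] * 5 + [[BW, FNP]] * 3
    print(summarize_species(arkansana))
```

A tally dominated by a single group with a few geographically clustered exceptions matches the expectation stated for a single-group origin with introgression, whereas a tally dominated by "both groups" across a wide range matches the expectation for an allopolyploid.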
To summarize, R. arkansana evolved from the blanda-woodsii group, R. virginiana from the foliolosa-nitida-palustris group, and R. carolina from a cross between these two eastern diploid groups. These results allow an evaluation of different hypotheses that have been proposed concerning the origins of eastern polyploids. Erlanson (1929) proposed that R. arkansana originated from a cross between R. blanda and either R. macounii Greene or R. fendleri Crépin, two species now considered synonymous with R. woodsii (Erlanson, 1934). This hypothesis is compatible with the present findings, although our results cannot confirm that two taxonomic species were involved. For R. carolina, Erlanson (1929) first proposed that R. virginiana would have crossed with R. palustris and that the hybrid eventually would have given a tetraploid that would have backcrossed to R. virginiana to give R. carolina. This hypothesis is improbable according to the present results because it would imply that the genetic diversity of R. carolina is a subset of that of R. virginiana. Because several R. carolina haplotypes do not have R. virginiana haplotypes as ancestors, our data disagree with such an evolutionary scenario. A few years later, Erlanson (1938) suggested that R. blanda and R. woodsii gave rise to all three eastern tetraploid species as well as to R. foliolosa, R. nitida, and R. palustris. Her hypothesis regarding the evolution of R. foliolosa, R. nitida, and R. palustris seems improbable in light of the present data because these species do not appear to be derived from R. blanda and R. woodsii. Her hypothesis regarding the evolution of R. carolina and R. virginiana from R. blanda and R. woodsii alone is also likely inaccurate because the foliolosa-nitida-palustris diploid group was certainly involved in the origin of these two tetraploid species.
The results clearly show that the western diploid species were not involved in the origins of the eastern polyploid species. It is indeed improbable that a western species would have been involved in the origin of the polyploids without leaving a trace, given that several polyploid individuals from a wide geographic range were sampled. A general pattern of evolution within section Cinnamomeae in North America thus emerges from these results: diploids west and east of the Rocky Mountains seem to form distinct groups and eastern polyploids evolved from eastern diploids following the diversification of diploids.
Multiple origins of polyploidy
The number of polyploid origins was estimated using "polyploid haplotype groups" (Fig. 3), which estimate the genetic diversity of polyploids that is contributed by diploids. When working with haploid markers, each polyploid haplotype group can be interpreted as a distinct polyploid origin (e.g., Soltis et al., 1989; Doyle et al., 1990; Segraves et al., 1999; Sharbel and Mitchell-Olds, 2001). Similarly for autosomal markers, a specific combination of polyploid haplotype groups in individuals can sometimes be considered to represent a distinct origin. This is true of selfing allopolyploids that are homozygous at each homeologous locus (as in Glycine; Doyle et al., 2004) and of clonal taxa (Joly and Bruneau, 2004). However, more often alleles at nuclear loci will segregate in polyploids, and this can create any possible combination of alleles. Hence, interpreting each genotype as an independent origin would tend to overestimate the true number of polyploid origins. For this reason, it was assumed that for each species, each tetraploid formation involved four distinct polyploid haplotype groups and that each independent formation always involved polyploid haplotype groups that were not involved in other polyploid origins. These assumptions are clearly overly conservative. For example, there may be unsampled diploid haplotypes that would increase the number of polyploid haplotype groups, and a tetraploid formation can involve fewer than four alleles. Yet, the approach is legitimate if the objective is to evaluate the likelihood that species evolved recurrently rather than to estimate the true number of polyploid origins.
According to these conservative assumptions, all polyploid species must have evolved at least three times to explain the observed diversity of polyploids. This estimate makes many simplifications such as an absence of gene flow between ploidy levels that would tend to overestimate the number of independent origins. Yet, the impact of gene flow between ploidy levels is probably limited in North American roses (discussed earlier). Hybridization between polyploid species is another way by which polyploids acquire genetic variability that is not due to multiple origins. It is harder to account for hybridization because polyploids are known to hybridize and because they have a recent origin; this is why individuals mostly were sampled from outside the zones of sympatry between polyploids. The only exception is R. arkansana, from which we sampled five individuals that are considered near or in the sympatric zone with R. carolina (Figs. 1, 6). But even with these individuals removed (accessions 345, 406, 416, 665, and 692), there are still seven polyploid haplotype groups represented, and two independent origins of R. arkansana are needed to explain such a diversity.
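The conservative counting rule described above amounts to simple arithmetic: if every tetraploid formation is assumed to contribute four distinct polyploid haplotype groups that are shared with no other formation, the minimum number of origins is the number of observed groups divided by four, rounded up. The sketch below is one way to write this down; the function name and parameter names are invented for illustration, and the only input taken from the text is the case of seven haplotype groups in R. arkansana.

```python
import math


def min_origins(n_haplotype_groups, groups_per_origin=4):
    """Lower bound on independent polyploid origins under the conservative rule.

    Assumes each origin contributes `groups_per_origin` distinct haplotype
    groups (four for a tetraploid) that are not shared with any other origin.
    """
    if n_haplotype_groups < 0 or groups_per_origin < 1:
        raise ValueError("counts must be non-negative and groups_per_origin >= 1")
    return math.ceil(n_haplotype_groups / groups_per_origin)


if __name__ == "__main__":
    # R. arkansana with the sympatric accessions removed: seven haplotype groups.
    print(min_origins(7))
```

With seven haplotype groups the bound is two origins, in agreement with the figure given for R. arkansana once the sympatric accessions are excluded; a bound of three requires at least nine groups.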
Interestingly, polyploids have been able to acquire most of the available genetic diversity at the diploid level; almost all diploid haplotypes were also found in one or more polyploid species (Fig. 5). This further supports the hypothesis of independent origins of polyploid species, but above all it shows that polyploids possess a high degree of genetic variation. In the end, it is this genetic diversity that is most important, not how it was acquired. This variability, coupled with recombination and mutation in polyploid species, is likely to allow polyploid species to create adaptive genotypes that will be fitter and have more evolutionary potential in certain environments.
Taxonomic consequences
The rose species investigated here have sometimes been divided into sections Cinnamomeae (R. arkansana, R. blanda, R. woodsii) and Carolinae (R. carolina, R. foliolosa, R. nitida, R. palustris, R. virginiana) based on strictly basal placentation (Carolinae) vs. basilo-parietal placentation (Cinnamomeae), presence (Carolinae) vs. absence (Cinnamomeae) of hypanthium glands, and deciduous (Carolinae) vs. persistent (Cinnamomeae) sepals after fruit maturation (Crépin, 1889). The present data suggest that the separation of these two sections is artificial. First, it makes section Cinnamomeae paraphyletic, and second, the reticulate origin of R. carolina also renders section Carolinae unnatural. Therefore, the best solution would be to treat section Carolinae as synonymous with section Cinnamomeae. This was previously proposed by Erlanson (1934) and Lewis (1957a) based on the unreliability of the morphological characters that were used to separate these sections, and it is also consistent with investigations of biochemical (Grossi et al., 1998) and molecular characters (Wissemann and Ritz, 2005). Yet, this taxonomy is still used in the most recent comprehensive flora treatments in the United States (generic flora of the southeastern United States, Robertson, 1974) and in Europe (Tutin et al., 1968), perhaps because Rehder's (1940) classification, which uses section Carolinae, is still the most widely cited taxonomic treatment of Rosa. We suggest that section Carolinae be completely removed from further taxonomic treatments.
The present study also sheds light upon the species status of the three polyploid taxa of the R. carolina complex. The results suggest that R. arkansana, R. carolina, and R. virginiana have distinct evolutionary histories, although it will certainly be important to confirm this with more markers. Consequently, this also suggests that these polyploids should be considered distinct species. These species are, of course, highly polymorphic, probably in part owing to their recurrent origins, and their identification will remain difficult, especially in regions of sympatry where the extensive variation is best explained by hybrid zones. Yet, results suggest that these are secondary hybrid zones (Endler, 1977; Barton and Hewitt, 1985) that were formed after polyploid speciation. Of course, distinct evolutionary histories do not guarantee that species will always remain distinct, and the extent of gene flow in these secondary hybrid zones will determine the future of these polyploids.
Conclusion
This study shows that both hybridization and polyploidy have been important in the evolution of the Rosa carolina complex. Three species are the result of polyploid speciation, and hybridization has occurred among diploid species and has been involved in the formation of the polyploids. In addition, hybridization further complicates the picture of the polyploids and may lead to the extensive morphological variation observed in these taxa. Finally, this study of wild rose species gives a conceptual framework that may be used to unveil the evolutionary history of other species complexes where hybridization and polyploidy are important.
FOOTNOTES
1 The authors thank L. Brouillet, E. Dickson, B. Ertter, A. Meilleur, and J. Saarela for providing plant material. Financial help for this study came from research grants (A.B.) and fellowships (S.J. and J.S.) from the Natural Sciences and Engineering Research Council of Canada and from the Fonds québécois de la recherche sur la nature et les technologies.
5 Author for correspondence (e-mail: simon.joly{at}umontreal.ca)
LITERATURE CITED
Aquadro C. F., Desse S. F., Bland M. M., Langley C. H., Laurie-Ahlberg C. C. 1986. Molecular population genetics of the alcohol dehydrogenase gene region of Drosophila melanogaster. Genetics 114: 1165-1190.
Archambault A., Bruneau A. 2004. Phylogenetic utility of the LEAFY/FLORICAULA gene in the Caesalpinioideae (Leguminosae): gene duplication and a novel insertion. Systematic Botany 29: 609-626.
Arnold M. L. 1997. Natural hybridization and evolution. Oxford University Press, New York, New York, USA.
Baldwin B. G., Sanderson M. J., Porter J. M., Wojciechowski M. F., Campbell C. S., Donoghue M. J. 1995. The ITS region of nuclear ribosomal DNA: a valuable source of evidence on angiosperm phylogeny. Annals of the Missouri Botanical Garden 82: 247-277.
Barton N. H., Hewitt G. M. 1985. Analysis of hybrid zones. Annual Review of Ecology and Systematics 16: 113-148.
Baum D. A., Shaw K. L. 1995. Genealogical perspectives on the species problem. In P. C. Hoch and A. G. Stephenson [eds.], Experimental and molecular approaches to plant biosystematics, 289-303. Monographs in Systematic Botany. Missouri Botanical Garden, St. Louis, Missouri, USA.
Castelloe J., Templeton A. R. 1994. Root probabilities for intraspecific gene trees under neutral coalescent theory. Molecular Phylogenetics and Evolution 3: 102-113.
Clement M., Posada D., Crandall K. A. 2000. TCS: a computer program to estimate gene genealogies. Molecular Ecology 9: 1657-1659.
Crandall K. A., Templeton A. R. 1993. Empirical test of some predictions from coalescent theory with applications to intraspecific phylogeny reconstruction. Genetics 134: 959-969.
Crépin F. 1889. Sketch of a new classification of roses. Journal of the Royal Horticultural Society 11: 217-228.
Crépin F. 1894. Rosae hybridae: études sur les roses hybrides. Bulletin de la Société Royale de Botanique de Belgique, 1ière partie (Mémoires) 33: 52-55.
Crépin F. 1896. Rosae Americanae. I. Observations upon the genus Rosa in North America. Botanical Gazette 22: 1-34.
Cronn R., Cedroni M., Haselkorn T., Grover C., Wendel J. F. 2002. PCR-mediated recombination in amplification products derived from polyploid cotton. Theoretical and Applied Genetics 104: 482-489.
Denton A. L., McConaughy B. L., Hall B. D. 1998. Usefulness of RNA polymerase II coding sequences for estimation of green plant phylogeny. Molecular Biology and Evolution 15: 1082-1085.
Donnelly P., Tavaré S. 1986. The ages of alleles and a coalescent. Advances in Applied Probability 18: 1-19.