Genetics and Molecular Biology
Chaire de recherche du Canada en génomique forestière et environnementale and Centre de recherche en biologie forestière, Université Laval, Sainte-Foy, Québec, Canada G1K 7P4
Received for publication March 8, 2004. Accepted for publication September 9, 2004.
ABSTRACT
Key Words: allele coalescence; expressed sequence tag polymorphisms; historical demography; intragenic recombination; nuclear gene phylogeny; species divergence; conifers; spruce
INTRODUCTION
Most conifers, including Picea spp., are characterized by an open-pollinated mating system and wind-dispersed pollen, which appear to promote extensive nuclear gene flow and large population sizes (Hamrick et al., 1992). This trend is best illustrated by the low differentiation in allozymes or nuclear DNA markers usually observed among spruce populations (Boyle and Morgenstern, 1987; Furnier et al., 1991; Isabel et al., 1995; Müller-Starck, 1995; Jaramillo-Correa et al., 2001; Perry and Bousquet, 2001; Collignon et al., 2002; Gamache et al., 2003). In the long term, these factors should retard the fixation of neutral or nearly neutral polymorphisms, resulting in increased genetic diversity. Such factors have been suggested to account, at least partly, for the high levels of allozyme diversity maintained in conifers and trees in general, as compared to most other plants and organisms (e.g., Hamrick et al., 1992; Hamrick and Godt, 1996; Ledig, 1998).
Thus, an investigation of the genealogy of orthologous alleles at nuclear gene loci among distant conifer species might reveal deep coalescence, perhaps preceding species and lineage divergence. If so, shared ancestral polymorphisms would lead to allele genealogies that conflict with species phylogenies. While fixed polymorphisms accumulated since lineage divergence would favor the monophyly of species, ancient shared polymorphisms would tend to break up such monophyletic assemblages.
Codominant markers of expressed nuclear genes have been developed for genetic mapping and population genetics studies in a number of spruce species (Perry and Bousquet, 1998a, b; Perry et al., 1999). These markers correspond to segregating alleles at arbitrarily chosen orthologous gene loci. Sampling allele diversity at the DNA sequence level for these gene loci makes it possible to investigate allele coalescence in a set of reproductively isolated and divergent species characterized by high genetic diversity and reputedly large effective population sizes. In this study, alleles at each of three nuclear gene loci were sampled, sequenced, and compared among three distantly related biological species of the genus Picea.
MATERIAL AND METHODS
The three spruce species sampled, the sympatric North American Picea glauca (Moench) Voss and P. mariana (Mill.) B.S.P., and the North Eurasian P. abies (L.) Karst., are dominant in their ecosystems and have large natural ranges (Wright, 1955). They have an open-pollinated mating system and extensive intraspecific gene flow for nuclear genes (e.g., Perry and Bousquet, 2001; Gamache et al., 2003). Together, these features should promote large effective population sizes. On the other hand, the three species are not closely related: unlike closely related taxa in the genus, they do not cross naturally and show little compatibility with each other (Wright, 1955; Mikkola, 1969), so they warrant recognition as distinct biological species. They are also divergent in morphology (Wright, 1955; Weng and Jackson, 2000) and branch deeply in the genus, as shown by phylogenies based on paternally inherited cpDNA (Sigurgeirsson and Szmidt, 1993; Bouillé and Bousquet, unpublished data). Hence, if trans-species shared polymorphisms were observed, it is highly unlikely that they could be attributed to recent lateral gene flow between species.
For each species and each nuclear locus, around 20 alleles (haplotypes) were sampled from a panel of individuals representing multiple populations from diverse areas of the species' ranges, so that most common alleles would be sampled for each locus in each species. To ensure that only one allele was sequenced at a time, haploid megagametophytes were used. Sequences from outgroup taxa of the Pinaceae were also determined to root the three gene trees: Abies lasiocarpa, Pinus cembra, Pseudotsuga menziesii, and Tsuga canadensis. To estimate the divergence time between the three spruce species (see below), the chloroplast gene rbcL (large subunit of ribulose-1,5-bisphosphate carboxylase) was also sequenced for each of Picea abies, P. glauca, and P. mariana. Other conifer rbcL sequences were retrieved from GenBank (www.ncbi.nlm.nih.gov/Genbank).
Biochemical methods
Genomic DNA was extracted with the DNeasy Plant mini kit (Qiagen, Mississauga, Ontario). PCR and sequencing were carried out using the following primers: for Sb16, forward primer 5'-GTTCCGCCACCATATGAC-3' and reverse 5'-GCTCATTCAGCTACAAAAGC-3'; for Sb29, forward 5'-AGCGGCATTGAACAGAGTAAC-3' and reverse 5'-AATGGAAATGAAGGCAGACTC-3'; and for Sb62, forward 5'-TGAGATCCGTGGCTGAAGAG-3' and reverse 5'-GATAACGCCGGAGAGATAGAG-3'. PCR was performed with Platinum Taq DNA polymerase (Gibco BRL, Carlsbad, California, USA) according to the manufacturer's recommendations: 4 min at 95°C for denaturation; 40 cycles of 30 s at 95°C, 30 s at an annealing temperature of 60°C for Sb16 and 55°C for Sb29 and Sb62, and 1 min at 72°C for extension; and a final elongation of 10 min at 72°C. For rbcL, the primers and PCR protocol followed previously published procedures (Wang et al., 1999). Amplified fragments were purified and concentrated with Amicon Microcon-PCR filter units (Millipore, Bedford, Massachusetts, USA). Both DNA strands were directly sequenced with Perkin-Elmer ABI DNA sequencers (3100 and 3700, Applied Biosystems, Foster City, California, USA), using BigDye Terminator cycle sequencing kits. GenBank accession numbers are as follows: for Sb16, AY606806 to AY606815; for Sb29, AY611037 to AY611050; for Sb62, AY611051 to AY611064; for rbcL, AY611034 to AY611036.
Data analysis
For each nuclear locus, indels as well as substitutions were identified among alleles within and across species. Because of the small number of substitutions observed between alleles within and among species, correcting for multiple substitutions had no effect on the observed numbers of substitutions. In addition, all indels were treated as single events because there was no indication that the observed indels were associated with simple sequence repeats or other potential sources of homoplasious variation. Nucleotide diversity (π) was estimated for each species using the program DnaSP (Rozas and Rozas, 1999). The average numbers of pairwise differences per site, including substitutions and indels, were estimated between alleles within species (dw) and between species (db). Net between-species divergence values per site were also obtained by correcting for within-species diversity according to Nei (1987, equation 10.21). For each nuclear locus, a site-by-site analysis was also conducted, and sites were classified as those carrying (1) shared polymorphisms between Picea species (trans-species shared polymorphisms, i.e., derived polymorphisms shared by at least two species, which do not support species monophyly), (2) reciprocally fixed differences between Picea species (interspecific fixed polymorphisms, which support species monophyly), and (3) apomorphies (intraspecific derived polymorphisms limited to one species). The derived or ancestral nature of each state at a polymorphic site was determined by comparison with outgroup sequences from other genera of the Pinaceae. A state was assumed to be derived when it differed from that observed in the outgroup sequences.
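The within- and between-species quantities and the site classification described above can be sketched in Python. This is a minimal illustration, not the DnaSP routine or the authors' own code: it assumes aligned, gap-free sequences (indels are left out for brevity), uses simple proportions of differing sites, applies the usual net-divergence correction d_A = d_XY − (d_X + d_Y)/2 (assumed here to correspond to Nei's equation 10.21), and treats the outgroup state as ancestral. All function names and toy alleles are hypothetical.

```python
from itertools import combinations, product
from statistics import mean


def p_distance(s1, s2):
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def d_within(seqs):
    """Average pairwise difference per site among alleles of one species (dw)."""
    return mean(p_distance(a, b) for a, b in combinations(seqs, 2))


def d_between(seqs_x, seqs_y):
    """Average pairwise difference per site between alleles of two species (db)."""
    return mean(p_distance(a, b) for a, b in product(seqs_x, seqs_y))


def net_divergence(seqs_x, seqs_y):
    """Net divergence: db corrected for the mean within-species diversity."""
    return d_between(seqs_x, seqs_y) - (d_within(seqs_x) + d_within(seqs_y)) / 2


def classify_site(states_by_species, outgroup_state):
    """Label one alignment column; states_by_species maps species -> list of states.

    Any state that differs from the outgroup state is treated as derived.
    """
    observed = {sp: set(states) for sp, states in states_by_species.items()}
    derived = set().union(*observed.values()) - {outgroup_state}
    labels = set()
    for state in derived:
        carriers = [sp for sp, s in observed.items() if state in s]
        if all(observed[sp] == {state} for sp in carriers):
            labels.add("fixed difference")       # derived state fixed wherever it occurs
        elif len(carriers) >= 2:
            labels.add("trans-species shared")   # in >= 2 species, polymorphic in one or more
        else:
            labels.add("apomorphy")              # derived, polymorphic, one species only
    return labels


x = ["AAAA", "AAAT"]  # hypothetical alleles of species X
y = ["AAAA", "AAAT"]  # hypothetical alleles of species Y
print(net_divergence(x, y))  # -0.125: between-species differences do not exceed within
print(classify_site({"P. abies": ["A", "A"], "P. glauca": ["A", "G"],
                     "P. mariana": ["G", "A"]}, outgroup_state="A"))
```

A negative or near-zero net divergence, as in the toy example, is the signature discussed later in the paper: alleles differ as much within species as between them.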
Gene/allele genealogies were estimated for each locus using parsimony analysis and the neighbor-joining method as implemented in PAUP* 4.0b10 (Swofford, 2002). For both methods, pairwise deletion of indel sites was enforced, so the number of characters corresponded to the length of the consensus sequence between any two taxa, plus additional binary characters scoring the presence or absence of each indel. If a substitution and an indel were superimposed at the same site, they were considered independently, and a missing value was assigned to the substitution site(s) for those sequences with the deletion. Parsimony analyses consisted of heuristic searches with 100 random-addition-sequence replicates, tree-bisection-reconnection (TBR) branch swapping, and saving of multiple trees. Scenarios constraining alleles of the same species to monophyly (reciprocal monophyly) were tested: differences in tree length between constrained trees and minimum-length trees were tested for statistical significance using the T-PTP test of Faith (1991) as implemented in PAUP*, with 5000 permutations for each gene and considering both substitutions and indels. For neighbor-joining, pairwise numbers of differences per site were used, including substitutions and indels. The robustness of internal nodes was assessed by means of 500 bootstrap replicates for each method. Nodes supported by a frequency equal to or greater than 50% were retained.
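The pairwise-deletion scheme, with each indel scored as an extra binary character, might look like the following sketch. It is a simplification under stated assumptions, not PAUP*'s internal handling: each distinct maximal gap run in the alignment is treated as one indel event, and every event in the alignment is counted as a character for every pair of sequences. Function names and the toy alignment are hypothetical.

```python
import re


def indel_events(alignment):
    """Collect each distinct maximal gap run (start, end) as one indel event."""
    events = set()
    for seq in alignment:
        for match in re.finditer(r"-+", seq):
            events.add((match.start(), match.end()))
    return sorted(events)


def carries(seq, event):
    """True if seq has exactly this maximal gap run (not a longer or shorter one)."""
    start, end = event
    inside = seq[start:end] == "-" * (end - start)
    starts_here = start == 0 or seq[start - 1] != "-"
    ends_here = end == len(seq) or seq[end] != "-"
    return inside and starts_here and ends_here


def pairwise_distance(s1, s2, events):
    """Differences per character: substitutions under pairwise deletion plus indels."""
    # Pairwise deletion: a site is compared only if both sequences have a base there.
    shared = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    substitutions = sum(a != b for a, b in shared)
    # Each indel event is one binary (presence/absence) character.
    indel_diffs = sum(carries(s1, e) != carries(s2, e) for e in events)
    return (substitutions + indel_diffs) / (len(shared) + len(events))


alignment = ["ACGTAC", "ACG--C", "ATGTAC"]  # hypothetical alleles
events = indel_events(alignment)
print(pairwise_distance(alignment[0], alignment[1], events))
print(pairwise_distance(alignment[0], alignment[2], events))
```

Scoring a two-base gap as a single character, rather than two missing sites, is what keeps each indel a single event, as assumed in the text.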
For each locus, intragenic recombination was tested among Picea sequences using the four-gamete test (Hudson and Kaplan, 1985) as implemented in DnaSP (Rozas and Rozas, 1999). The homoplasy test (Maynard Smith and Smith, 1998) was also conducted with the START 1.0.6 software (Jolley et al., 2001) and corroborated the results of the four-gamete test in all cases. If recombinant sites were identified for a given gene, the putative recombinant types were identified as follows. The ancestral character state at each recombining position was determined from the outgroup sequences, thus identifying the allele of the ancestral type. Then, the two allele types that each differed from the ancestral allele at only one of the two recombinant sites were assumed to be derived, giving rise to two derived types. The recombinant type was identified as the one resulting from recombination between these two derived types. Recombination between the ancestral type and a derived type at the two recombinant sites is also possible but less likely, especially if the double-derived type is rare or unique, such as an apomorphy; in that case, the double-derived type is most likely the recombinant type. Once a putative recombinant allele was identified for a given gene, it was removed from the data set, and the gene tree was estimated again to verify the robustness of the tree topology.
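The four-gamete test and the rule used above to single out the putative recombinant can be sketched as follows. This is an illustration, not DnaSP's implementation: it only lists incompatible site pairs (DnaSP also derives a minimum number of recombination events), it assumes aligned sequences and considers biallelic sites only, and it takes the outgroup state as ancestral at both sites. Function names and alleles are hypothetical.

```python
from itertools import combinations


def biallelic_sites(seqs):
    """Indices of alignment columns carrying exactly two states."""
    return [i for i, col in enumerate(zip(*seqs)) if len(set(col)) == 2]


def four_gamete_pairs(seqs):
    """Site pairs at which all four two-site haplotypes occur.

    Under an infinite-sites model, such a pair implies at least one
    recombination event between the two sites.
    """
    hits = []
    for i, j in combinations(biallelic_sites(seqs), 2):
        if len({(s[i], s[j]) for s in seqs}) == 4:
            hits.append((i, j))
    return hits


def putative_recombinants(seqs, i, j, outgroup):
    """Indices of alleles carrying the double-derived type at sites i and j.

    The outgroup gives the ancestral state; the two single-derived types are
    taken as parents, so the double-derived type is the likeliest recombinant.
    """
    return [k for k, s in enumerate(seqs)
            if s[i] != outgroup[i] and s[j] != outgroup[j]]


alleles = ["AAC", "ATC", "TAC", "TTC"]  # hypothetical alleles
for i, j in four_gamete_pairs(alleles):
    print((i, j), putative_recombinants(alleles, i, j, outgroup="AAC"))
```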
A mutation rate (per site per year) was estimated for the three nuclear gene loci studied by averaging the divergence values per site (substitutions only) estimated between the three spruce species and then dividing by 2T, where T is the divergence time between the spruce species under study. The first approach to estimate T relied on an analysis of the fossil record, in which the earliest fossils morphologically representative of extant taxa could be interpreted as evidence for the diversification of the genus Picea into its major extant lineages. The second approach relied on a molecular clock derived from rates of protein divergence, as estimated by antigenic distances between the genus Picea and both subgenera of Pinus in the Pinaceae (Prager et al., 1976) (Fig. 1). The lineage leading to Picea split early from the lineage leading to Pinus in the history of the Pinaceae (e.g., Magallón and Sanderson, 2002). Thus, the average distance between Picea and Pinus was calibrated using as a landmark the date of early diversification of the Pinaceae in the Early Cretaceous, 120–140 million years ago (mya) (Florin, 1963; Miller, 1988). No test of rate constancy could be performed, and the relationship between time and antigenic distance was assumed to be linear (Prager et al., 1976). The third approach was based on a molecular clock constructed using 32 rbcL sequences from both subgenera of Pinus (Wang et al., 1999) and the rbcL sequences determined herein for Picea abies, P. glauca, and P. mariana. To construct the molecular clock, the average pairwise number of substitutions was estimated from all pairwise sequence comparisons between Pinus and Picea, using both synonymous and nonsynonymous substitutions. The same landmark as before (120–140 mya) was used to calibrate the clock (Fig. 1). Before estimating the rbcL clock, lineage relative rate tests (Li and Bousquet, 1992) were used to assess rate homogeneity between sequences of Pinus and Picea, using as outgroups rbcL sequences from three sister taxa of the Pinaceae: Cupressus corneyana (Cupressaceae), Podocarpus gracilior (Podocarpaceae), and Taxus baccata (Taxaceae) (Wang et al., 1999). The lineage relative rate test statistic approximately follows the standard normal distribution. The tests were conducted with an application developed by J. Laroche (Centre for Bioinformatics, Univ. Laval). The rbcL rates were corrected for multiple substitutions following the two-parameter method of Kimura (1980) because divergence rates ranged from 5 to 10% between conifer families.
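For reference, Kimura's two-parameter distance corrects the observed proportions of transitional (P) and transversional (Q) differences as d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q). A minimal Python version follows; skipping sites with gaps or ambiguity codes is an assumption of this sketch, and the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(s1, s2):
    """Kimura (1980) two-parameter distance between two aligned DNA sequences.

    Sites with gaps or ambiguity codes in either sequence are ignored.
    """
    pairs = [(a, b) for a, b in zip(s1.upper(), s2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    transitions = sum(1 for a, b in pairs
                      if a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES))
    transversions = sum(1 for a, b in pairs if a != b) - transitions
    p = transitions / len(pairs)    # P: proportion of transitional differences
    q = transversions / len(pairs)  # Q: proportion of transversional differences
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


print(kimura_2p("AAAAAAAAAA", "GAAAAAAAAA"))  # one transition in 10 sites
```

At the 5 to 10% divergence quoted for conifer families, the correction is modest but not negligible, which is why it was applied to rbcL and not to the much less divergent nuclear alleles.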
RESULTS
Overall, Sb16 contributed 11 sites to trans-species shared polymorphisms (Table 3), while Sb29 contributed nine such sites (Table 5) and Sb62 seven such sites (Table 6). If we were to remove putative recombinant sites involving trans-species shared polymorphisms for loci Sb16 and Sb29, there would remain eight, six, and seven sites involving trans-species shared polymorphisms, respectively, for Sb16, Sb29, and Sb62.
Fixed interspecific polymorphisms are indicative of separate lineage evolution. There were no fixed interspecific polymorphisms favoring species monophyly for loci Sb29 and Sb62, and only three such sites for Sb16, all favoring the monophyly of P. mariana alleles.
Divergence time between the three Picea species and mutation rate at nuclear gene loci
Because of uncertainty regarding the exact divergence history among the lineages leading to Picea abies, P. glauca, and P. mariana, a trichotomy was assumed for estimating their divergence time T. A first approximation of T is given by the fossil record, in which the earliest fossils morphologically representative of extant spruce taxa could be interpreted as evidence for the diversification of the genus into its major extant lineages. These fossils date back to the middle Miocene, around 15 mya, and more questionable fossils have been reported from the early Miocene, back to 23 mya (Wolfe, 1964). While P. abies, P. glauca, and P. mariana represent morphologically divergent taxa (Wright, 1955; Weng and Jackson, 2000) with deep phylogenetic branching (Sigurgeirsson and Szmidt, 1993; Bouillé and Bousquet, unpublished data), other extant taxa might have branched out earlier; thus, this period should be considered a lower-bound estimate of the divergence between the lineages leading to extant taxa. This period is much later than the first reliable occurrence of Picea fossils, in the middle Eocene, around 45 mya (Axelrod, 1998; LePage, 2001).
A second estimate of divergence time T was obtained from antigenic distance data (Prager et al., 1976). With an average intergeneric distance of 3.5 units for the divergence between Picea and Pinus and a landmark divergence time of 120–140 mya between these two genera, the maximum distance of 0.5 unit reported between Picea mariana, P. glauca, and P. abies (Prager et al., 1976) converts to a divergence time T of 17–20 mya, assuming a linear relationship (Fig. 1).
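Under the linear relationship assumed above, the conversion is a simple proportion, T = (0.5 / 3.5) × L, where L is the 120–140 mya landmark. A minimal Python sketch (the function name is hypothetical):

```python
def antigenic_divergence_time(d_species, d_intergeneric, landmark_mya):
    """Linear conversion of an antigenic distance into a divergence time (mya)."""
    return d_species / d_intergeneric * landmark_mya


# Values from the text: 0.5 unit among the three spruces, 3.5 units between
# Picea and Pinus, calibrated on the 120-140 mya Pinaceae landmark.
for landmark in (120, 140):
    print(landmark, round(antigenic_divergence_time(0.5, 3.5, landmark), 1))
```

With L = 120 and 140 mya this gives about 17.1 and 20.0 mya, the 17–20 mya range reported above.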
A third estimate of divergence time T was obtained from rbcL sequences, which behaved essentially as a molecular clock for the set of sequences sampled: lineage relative rate tests indicated no rate heterogeneity between the Picea and Pinus lineages using three distinct sister groups of the Pinaceae (test value = 1.16 with outgroup Cupressus, 1.92 with outgroup Podocarpus, and 0.97 with outgroup Taxus; all not significant at P = 0.05). With an average divergence (K) of 0.0276 substitutions per site (synonymous and nonsynonymous) between Picea and Pinus and a landmark divergence time (L) of 120–140 mya, dividing K by 2L gave an overall rate of 0.99 × 10⁻¹⁰ to 1.15 × 10⁻¹⁰ substitutions per site per year. Applying these rates to half of the average pairwise divergence of 0.0031 substitutions per site estimated between Picea abies, P. glauca, and P. mariana translates to a divergence time T of 13 to 16 mya (Fig. 1).
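The rbcL clock arithmetic above can be reproduced as a check. The constants come from the text; the per-lineage rate is K/(2L), and T is half the spruce pairwise divergence divided by that rate. Treating the relative rate statistics as two-tailed standard normal deviates is an assumption of this sketch (the text states only that they approximately follow the standard normal distribution), and the function names are hypothetical.

```python
import math

K_PICEA_PINUS = 0.0276  # mean rbcL substitutions per site, Picea vs Pinus (text)
D_SPRUCES = 0.0031      # mean pairwise rbcL substitutions per site among the spruces (text)


def rate_per_site_per_year(k, landmark_years):
    """Substitution rate per lineage: divergence K divided by 2L."""
    return k / (2 * landmark_years)


def divergence_time_years(d, rate):
    """Time since the split: half the pairwise divergence divided by the rate."""
    return (d / 2) / rate


def two_tailed_p(z):
    """Two-tailed P value of a statistic assumed to follow N(0, 1)."""
    return math.erfc(abs(z) / math.sqrt(2))


for landmark_mya in (120, 140):
    rate = rate_per_site_per_year(K_PICEA_PINUS, landmark_mya * 1e6)
    t_mya = divergence_time_years(D_SPRUCES, rate) / 1e6
    print(f"L = {landmark_mya} mya: rate = {rate:.3g}, T = {t_mya:.1f} mya")

for z in (1.16, 1.92, 0.97):  # relative rate test values reported above
    print(z, round(two_tailed_p(z), 3))
```

Because the same landmark enters both the rate and the time, T reduces to (D_SPRUCES / K_PICEA_PINUS) × L, so the result depends on the ratio of the two divergences rather than on their absolute values.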
An estimate of the mutation rate µ for the three nuclear gene loci was obtained by calculating the average divergence per site between species (substitutions only) (Table 4) and then dividing by 2T, where T was taken to lie between 13 and 20 mya, the range of possible values determined earlier for the split between Picea abies, P. glauca, and P. mariana. With an average divergence of 0.0089 substitutions per site between spruce species, µ was estimated as 2.23 × 10⁻¹⁰ to 3.42 × 10⁻¹⁰ per site per year. This mutation rate is used later for estimating parameters of historical demography.
DISCUSSION
values) were observed within spruce species. However, the values of nucleotide diversity were in the same range as those reported for synonymous sites of protein coding sequences in the conifer Pinus sylvestris L. and in Arabidopsis (Dvornyk et al., 2002
There was no clear trend in the current study as to which of the three Picea species analyzed was the most diverse in terms of haplotype/allelic diversity: P. glauca was the most diverse for Sb16, while P. abies was fixed; P. abies and P. glauca were the most diverse for Sb29; and P. mariana was the most diverse for Sb62. When diploid genotypes were analyzed for a dozen expressed nuclear gene loci, average estimates of observed heterozygosity were similar among the three spruce species analyzed herein, but locus-to-locus variance was high (Perry and Bousquet, 1998a, b; Perry et al., 1999). Overall, our results suggest that the gene-to-gene variance is high and that more loci will need to be investigated to compare adequately the diversity residing in the various species. Such gene-to-gene variance is expected because of stochastic factors (Tajima, 1983; Arbogast et al., 2002; Rosenberg and Nordborg, 2002).
Allele coalescence and historical population size
For each of the three nuclear gene loci analyzed, the average numbers of pairwise differences between alleles from different species (db values) were small and in the same range as those observed between alleles within species (dw values). When correcting for within-species diversity, the resulting net divergence values between species were smaller than those observed within species, indicating that, on average, allele divergence would precede species divergence. This trend is surprising, given a divergence time of at least 13 mya between these biological species. This unexpected trend is also supported by the site-by-site analysis of gene sequences, in which no fixed interspecific polymorphisms, which would support reciprocal species monophyly, were found for two of the three nuclear gene loci analyzed, Sb29 and Sb62. On the other hand, several trans-species shared polymorphisms were observed for each of the three nuclear gene loci. The trend towards trans-species shared polymorphisms was confirmed by phylogenetic analyses in which both distance and character-state approaches indicated no instance of reciprocal monophyly among alleles from the various species. Rather, alleles from the various species were intermingled for Sb29 and Sb62. For Sb16, for which no allelic variation was observed in P. abies and the two alleles detected in P. mariana were monophyletic, alleles of P. glauca were not monophyletic, and many instances of trans-species shared polymorphisms were identified.
While introgressive hybridization represents a possible cause of shared polymorphisms among closely related taxa (e.g., Wang et al., 1997; Isoda and Shiraishi, 2001; Machado et al., 2002), and while natural hybridization has been reported between closely related spruce taxa (e.g., between P. mariana and P. rubens Sarg., Perron and Bousquet, 1997), more or less recent interspecific gene exchanges appear highly unlikely between the three spruce species analyzed herein. Unlike closely related taxa in the genus, these species do not cross naturally, and they are hardly compatible with each other (Wright, 1955; Mikkola, 1969), which warrants their recognition as distinct biological species. In such a slowly evolving genus as Picea, achieving reproductive isolation is generally indicative of large divergence (Wright, 1955). These species are also divergent with respect to several morphological characters, which is notable for this rather morphologically uniform genus (Wright, 1955; Weng and Jackson, 2000). In phylogenies based on cpDNA, these species also fall in distinct lineages with deep phylogenetic branching in the genus (Sigurgeirsson and Szmidt, 1993; Bouillé and Bousquet, unpublished data). Thus, it is highly unlikely that these biological species have exchanged genes in the recent past. If they had done so, cpDNA phylogenies would reveal high phylogenetic affinity between these taxa, because cpDNA is paternally transmitted by pollen in conifers (e.g., Stine et al., 1989).
Hence, under such conditions, trans-species shared polymorphisms are likely to be of shared ancestry. For all three loci, Tajima's test for selection was inconclusive, and trans-species shared polymorphisms did not show any distribution pattern suggestive of maintenance by selection. Many shared polymorphisms were observed in introns, and there was an excess of shared polymorphisms over fixed polymorphisms for the three loci investigated (Table 7), which is not suggestive of locus-specific selective effects (Wang et al., 1997). Even though neutrality tests are known to lack power, it seems reasonable to assume, given the overall evidence at hand, that the observed polymorphisms are neutral or nearly neutral and that allele coalescence time would be governed largely by demographic factors.
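Tajima's test contrasts two estimators of θ: the mean number of pairwise differences (k) and the number of segregating sites divided by a₁ = Σ 1/i (S/a₁). The sketch below computes the D statistic in its standard form for aligned, gap-free sequences. It does not compute a P value (usually obtained from tabulated limits or coalescent simulations) and is not the DnaSP implementation; the function name is hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for aligned, gap-free sequences; returns None if S == 0."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")  # variance degenerates below 4
    s = sum(1 for col in zip(*seqs) if len(set(col)) > 1)  # segregating sites
    if s == 0:
        return None
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


print(tajimas_d(["ACGT", "ACGA", "ACTT", "ACGT", "TCGT"]))  # hypothetical alleles
```

An excess of rare variants pushes D below zero and an excess of intermediate-frequency variants pushes it above zero; values near zero are what neutrality in a population of constant size predicts.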
Historical population size (Ne) was estimated by solving the equation

$$\pi = 4 N_e \mu$$

(Tajima, 1983), where µ is the mutation rate per generation for the nuclear gene loci examined. The mutation rate µ was estimated earlier as 2.23 × 10⁻¹⁰ to 3.42 × 10⁻¹⁰ per site per year. This estimate is in the same range as the rate of synonymous substitutions per site per year obtained by comparing spruce and pine nuclear phytochrome PHYO gene sequences (µ = 4.8 × 10⁻¹⁰; Garcia-Gil et al., 2003). Assuming an average generation time of 50 years in these species, which is about three to five times the age at first reproduction but less than the maximum life expectancy (Burns and Honkala, 1990), µ becomes 1.11 × 10⁻⁸ to 1.71 × 10⁻⁸ per site per generation. Then, using the average π values deduced from Table 2 for each species, Ne estimates of 96 000 to 182 000 were obtained from the equation above, depending on the species (Table 8). These Ne estimates are large and well above those for animal species, for which values in the range of 10 000 to 50 000 have been estimated (e.g., Sherry et al., 1997; Hare et al., 2002). Even if Ne was underestimated or overestimated owing to a number of factors, including the generation time used and the fact that trees have overlapping generations (Caballero, 1994), the order of magnitude of the estimates obtained appears robust. More precise estimates for each species would require better estimates of generation time that take into account temporal and spatial heterogeneity as well as generation overlap. They would also require allele frequencies estimated from several more loci in order to stabilize the gene-to-gene variance in nucleotide diversity (Tajima, 1983; Arbogast et al., 2002).
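The chain of calculations behind these estimates (µ = d/2T per year, multiplied by a 50-year generation time, then Ne = π/(4µ)) can be sketched as follows. The divergence, the 13–20 mya range, and the generation time come from the text; the nucleotide diversity value is hypothetical, because the Table 2 values are not reproduced here, so the printed Ne is illustrative only. Function names are hypothetical.

```python
D_NUCLEAR = 0.0089     # mean substitutions per site between spruce species (text)
GENERATION_YEARS = 50  # average generation time assumed in the text


def mutation_rate_per_year(d_between, t_years):
    """Per-site, per-year rate: between-species divergence divided by 2T."""
    return d_between / (2 * t_years)


def effective_size(pi, mu_per_generation):
    """Solve pi = 4 * Ne * mu for Ne."""
    return pi / (4 * mu_per_generation)


# T between 13 and 20 mya: the older split gives the lower rate.
mu_year = [mutation_rate_per_year(D_NUCLEAR, t * 1e6) for t in (20, 13)]
mu_gen = [m * GENERATION_YEARS for m in mu_year]

# Hypothetical nucleotide diversity, for illustration only (not a Table 2 value).
pi_example = 0.005
for m in mu_gen:
    print(f"mu = {m:.3g} per site per generation -> Ne = {effective_size(pi_example, m):,.0f}")
```

Because Ne scales with the generation time assumed and inversely with T, the uncertainty in both propagates directly into Ne, which is why the text stresses the order of magnitude rather than exact values.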
Implications for the long-term maintenance of genetic diversity in trees
The order of magnitude of these values of historical population size (Ne) reconciles well with the much lower estimates obtained at the time of speciation for Picea rubens, a recent derivative of P. mariana during the Pleistocene, which harbors reduced gene diversity at expressed gene loci (in terms of observed heterozygosity, proportion of polymorphic loci, and number of alleles; Perron et al., 2000) compared with P. abies (Perry et al., 1999), P. glauca (Perry and Bousquet, 1998b), or P. mariana (Perry and Bousquet, 1998a; Perron et al., 2000). To account for the lower genetic diversity observed in P. rubens, a mild bottleneck with Ne in the range of 10 000 was estimated from genetic drift simulations (Perron et al., 2000). Hence, the estimates of historical population size in this study, which are more than an order of magnitude higher than those inferred for the genetically depauperate P. rubens, support the idea that large historical population sizes in P. abies, P. glauca, and P. mariana contributed, at least partly, to the long-term maintenance of higher levels of gene diversity. Indeed, these three species do not seem to have suffered much from the Pleistocene glaciations, even though they were displaced repeatedly (e.g., Davis, 1983).
Such large historical population sizes are also consistent with our knowledge of the ecological determinants and reproductive biology of these species. Spruces harbor archaic wind-pollinated mating systems with high outcrossing rates (e.g., Perry and Bousquet, 2001). It is unlikely that these traits have changed during their history. These taxa are also abundant and occupy a dominant position in their ecosystems (Wright, 1955). All these factors contribute to maintaining large effective population sizes. Such population sizes and extensive gene flow are also suggested by the low levels of population differentiation in spruces, as measured by nuclear markers of a molecular or biochemical nature (Boyle and Morgenstern, 1987; Furnier et al., 1991; Isabel et al., 1995; Müller-Starck, 1995; Jaramillo-Correa et al., 2001; Perry and Bousquet, 2001; Collignon et al., 2002; Gamache et al., 2003). Exceptions to this rule exist in which spruce taxa with more scattered populations and restricted ranges show more population differentiation and overall lower genetic diversity as a result of more or less recent bottlenecks, such as P. rubens (Hawley and DeHayes, 1994; Perron et al., 2000; Rajora et al., 2000) or P. martinezii T. F. Patterson (Ledig et al., 2000).
These results have important implications for our understanding of the maintenance of neutral or nearly neutral genetic diversity at nuclear genes in spruces and, more generally, in conifers and other tree species whose ecological and reproductive strategies promote large population sizes. On average, one would expect large population sizes to favor the maintenance of high levels of neutral or nearly neutral genetic diversity. Under such conditions, novel alleles would be less likely to be lost through drift, and their time to fixation would be longer (Kimura and Ohta, 1969).
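To put "longer" in perspective, the Kimura and Ohta (1969) approximation that a neutral mutant destined for fixation takes on average about 4Ne generations to fix can be combined with the Ne estimates and the 50-year generation time used above. This is a rough, illustrative calculation that ignores overlapping generations and changes in Ne through time; the function name is hypothetical.

```python
GENERATION_YEARS = 50  # generation time assumed in the text


def mean_fixation_time_generations(ne):
    """Approximate mean time to fixation of a neutral mutant: about 4Ne generations."""
    return 4 * ne


for ne in (96_000, 182_000):  # range of Ne estimates reported in this study
    years = mean_fixation_time_generations(ne) * GENERATION_YEARS
    print(f"Ne = {ne:,}: about {years / 1e6:.1f} million years to fixation")
```

The result, roughly 19 to 36 million years, is of the same order as or longer than the 13–20 my estimated since the three species diverged, which fits the persistence of trans-species shared polymorphisms discussed above.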
If effective, this process would lead to an abundance of neutral or nearly neutral polymorphisms at nuclear gene loci and to high levels of haplotype/allelic diversity, as observed in this study and in previous studies (Perry and Bousquet, 1998a, b; Perry et al., 1999). The comparative analysis of allozyme diversity among plant taxa with contrasting life-history and ecological determinants appears to support such a neutral/demographic process (Hamrick et al., 1992; Hamrick and Godt, 1996). On average, taxa with higher levels of heterozygosity and higher numbers of alleles per locus are characterized by ecological, population, and demographic features conferring large population sizes, such as outcrossing, wind pollination, or a dominant position in the ecosystem. The three spruce species analyzed in this study are good examples, with high levels of allozyme diversity at the population level (He typically in the range of 0.20–0.30 and above) (Boyle and Morgenstern, 1987; Bergmann and Ruetz, 1991; Furnier et al., 1991; Hamrick et al., 1992; Isabel et al., 1995; Müller-Starck, 1995; Jaramillo-Correa et al., 2001). Such a link between population determinants and diversity at allozyme loci is not unexpected, given the essentially neutral or nearly neutral nature of allozyme polymorphisms (e.g., in black spruce, Isabel et al., 1995; in white spruce, Jaramillo-Correa et al., 2001). Recently, this diversity trend has been further confirmed at the DNA level: the highest levels of within-population gene diversity for RAPD and microsatellite markers were detected in outcrossing plant taxa and long-lived perennials, including tree species (Nybom, 2004).
Implications for estimating species phylogenies from nuclear genes
The detection of trans-species shared polymorphisms at orthologous gene loci calls for caution in estimating congeneric species phylogenies from nuclear genes in plants whose life-history and reproductive determinants favor large effective population sizes, even when gene orthologues can be unambiguously distinguished from putative paralogues. In this study, well-characterized regions of the nuclear genome were used and paralogous sequences were avoided: primers were designed to be specific to a single gene region (Perry and Bousquet, 1998a), the Mendelian segregation of allelic variants had been previously documented (Perry and Bousquet, 1998a, b; Perry et al., 1999), and orthology was further validated by sequencing alleles from haploid tissues. Even so, allele orthology did not appear to be a sufficient criterion to guarantee a gene tree matching the species tree. The detection of several trans-species shared polymorphisms at the three gene loci investigated in this study warns of potential pitfalls in estimating species phylogenies from nuclear genes in such taxa, and it emphasizes the need for data from many nuclear loci (Arbogast et al., 2002; Rosenberg and Nordborg, 2002). Ideally, validation should be sought with phylogenies derived from chloroplast and mitochondrial genes, for which allele coalescence time and the frequency of trans-species shared polymorphisms would be reduced compared with nuclear genes (Bouillé and Bousquet, unpublished data). This is because of the uniparental transmission of organellar genomes, for which the average coalescence time of two randomly picked cpDNA or mtDNA alleles is Ne rather than 2Ne generations. Such recommendations of caution appear even more appropriate when little is known about the demography and history of a species or group of species.
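The difference between 2Ne and Ne generations follows from the number of gene copies from which two sampled lineages descend. A small discrete-time Wright-Fisher-style simulation illustrates it. The population size is hypothetical and kept small for speed, and the organellar case assumes, as in monoecious spruces, that every individual transmits a single haploid organellar copy; function names are hypothetical.

```python
import random


def pairwise_coalescence_time(copies, rng):
    """Generations until two lineages coalesce among `copies` gene copies.

    Each generation, the two lineages pick the same parental copy with
    probability 1 / copies.
    """
    generations = 0
    while True:
        generations += 1
        if rng.random() < 1 / copies:
            return generations


def mean_coalescence_time(copies, replicates=10_000, seed=1):
    """Average pairwise coalescence time over independent replicates."""
    rng = random.Random(seed)
    total = sum(pairwise_coalescence_time(copies, rng) for _ in range(replicates))
    return total / replicates


NE = 50  # hypothetical effective number of individuals, kept small for speed
nuclear = mean_coalescence_time(2 * NE)  # diploid nuclear locus: 2Ne copies
organelle = mean_coalescence_time(NE)    # haploid, uniparentally inherited: Ne copies
print(round(nuclear), round(organelle))  # close to 2Ne and Ne, respectively
```

Halving the expected coalescence time is what makes organellar genealogies less prone to carrying ancestral polymorphisms across species boundaries.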
Trans-species shared polymorphisms of two different types were detected. Those shared by ancestry were the most abundant. But for two of the three nuclear genes analyzed, a few trans-species shared polymorphisms were detected that were likely the result of ancient intragenic recombination. The genome-wide frequency of intragenic recombination is unknown for plant nuclear genes, but intragenic recombination in itself constitutes another factor that can lead to biased phylogenies (Schierup and Hein, 2000; Posada and Crandall, 2002; Rosenberg and Nordborg, 2002). In our study, this factor was taken into account by estimating gene trees with and without the recombinant alleles.
The presence of trans-species shared polymorphisms also calls for caution in interpreting gene genealogies. In plant and tree species suspected of large historical population sizes, nonmonophyletic patterns of allelic variation should persist for tens or hundreds of thousands generations. In such cases, it has been pointed out that extreme caution should be exercised when inferring reticulate evolution from polyphyletic patterns at nuclear loci (Hare et al., 2002
). Such patterns could also result from the long-term maintenance of trans-species polymorphisms of shared ancestry. More generally, our study emphasizes the limitations of single- or oligo-gene genealogies at the nuclear level for circumscribing useful taxonomic or ecological units when large historical population sizes are suspected.
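To give a rough sense of these time scales, two standard neutral expectations for a diploid nuclear locus in an idealized population can be written as follows: the mean time to fixation of a new neutral mutation, conditional on its fixation (Kimura and Ohta, 1969), and the expected time to the most recent common ancestor of a sample of n gene copies under the coalescent (e.g., Hudson, 1990). The values Ne = 10^5 and n = 20 are hypothetical round numbers for illustration, not estimates for any spruce species.

```latex
\bar{t}_{\mathrm{fix}} \approx 4N_e \ \text{generations},
\qquad
E\left[T_{\mathrm{MRCA}}\right] = 4N_e\left(1 - \frac{1}{n}\right) \ \text{generations}.

% Hypothetical example: N_e = 10^5, n = 20 gene copies
\bar{t}_{\mathrm{fix}} \approx 4 \times 10^{5},
\qquad
E\left[T_{\mathrm{MRCA}}\right] = 4 \times 10^{5} \times 0.95 = 3.8 \times 10^{5} \ \text{generations}.
```

Setting n = 2 recovers the 2Ne expectation for two nuclear alleles. With effective sizes of this order, ancestral polymorphisms could roughly be expected to persist for hundreds of thousands of generations after a lineage split.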
| LITERATURE CITED |
|---|
Axelrod, D. I. 1998. The Eocene Thunder Mountain flora of central Idaho. University of California Publications in Geological Sciences 142: 1-61.
Bergmann, F., and W. Ruetz. 1991. Isozyme variation and heterozygosity in random tree samples and selected orchard clones from the same Norway spruce populations. Forest Ecology and Management 46: 39-47.
Boyle, T. J. B., and E. K. Morgenstern. 1987. Some aspects of the population structure of black spruce in central New Brunswick. Silvae Genetica 36: 53-60.
Burns, R. M., and B. H. Honkala. 1990. Silvics of North America, vol. 1, Conifers. Agriculture Handbook 654. US Department of Agriculture, Forest Service, Washington, D.C., USA.
Caballero, A. 1994. Developments in the prediction of effective population size. Heredity 73: 657-679.
Collignon, A.-M., H. Van de Sype, and J.-M. Favre. 2002. Geographical variation in random amplified DNA and quantitative traits in Norway spruce. Canadian Journal of Forest Research 32: 266-282.
Davis, M. B. 1983. Quaternary history of deciduous forests of eastern North America and Europe. Annals of the Missouri Botanical Garden 70: 550-563.
Dvornyk, V., A. Sirviö, M. Mikkonen, and O. Savolainen. 2002. Low nucleotide diversity at the pal1 locus in the widely distributed Pinus sylvestris. Molecular Biology and Evolution 19: 179-188.
Faith, D. P. 1991. Cladistic permutation tests for monophyly and nonmonophyly. Systematic Zoology 40: 366-375.
Florin, R. 1963. The distribution of conifer and taxad genera in time and space. Acta Horti Bergiani 20: 121-312.
Furnier, G. R., M. Stine, C. A. Mohn, and M. A. Clyde. 1991. Geographic patterns of variation in allozymes and height growth in white spruce. Canadian Journal of Forest Research 21: 707-712.
Gamache, I., J. P. Jaramillo-Correa, S. Payette, and J. Bousquet. 2003. Diverging patterns of mitochondrial and nuclear DNA diversity in subarctic black spruce: imprint of a founder effect associated with postglacial colonization. Molecular Ecology 12: 891-901.
Garcia-Gil, M. R., M. Mikkonen, and O. Savolainen. 2003. Nucleotide diversity at two phytochrome loci along a latitudinal cline in Pinus sylvestris. Molecular Ecology 12: 1195-1206.
Hamrick, J. L., M. J. W. Godt, and S. L. Sherman-Broyles. 1992. Factors influencing levels of genetic diversity in woody plant species. New Forests 6: 95-124.
Hamrick, J. L., and M. J. W. Godt. 1996. Effects of life history traits on genetic diversity in plant species. Philosophical Transactions of the Royal Society of London, B, Biological Sciences 351: 1291-1298.
Hare, M. P., F. Cipriano, and S. R. Palumbi. 2002. Genetic evidence on the demography of speciation in allopatric dolphin species. Evolution 56: 804-816.
Hawley, G. J., and D. H. DeHayes. 1994. Genetic diversity and population structure of red spruce (Picea rubens). Canadian Journal of Botany 72: 1778-1786.
Hudson, R. R. 1990. Gene genealogies and the coalescent process. In D. Futuyma and J. Antonovics [eds.], Oxford surveys in evolutionary biology, vol. 7, 1-44. Oxford University Press, Oxford, UK.
Hudson, R. R., and N. L. Kaplan. 1985. Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111: 147-164.
Ioerger, T. R., A. G. Clark, and T.-H. Kao. 1990. Polymorphism at the self-incompatibility locus in Solanaceae predates speciation. Proceedings of the National Academy of Sciences, USA 87: 9732-9735.
Isabel, N., J. Beaulieu, and J. Bousquet. 1995. Complete congruence between gene diversity estimates derived from genotypic data at enzyme and random amplified polymorphic DNA loci in black spruce. Proceedings of the National Academy of Sciences, USA 92: 6369-6373.
Isoda, K., and S. Shiraishi. 2001. Allelic sequence polymorphisms in the intron region of the nuclear-encoded GapC gene preceded the speciation of three closely related Abies species (Pinaceae). Theoretical and Applied Genetics 102: 244-250.
Jaramillo-Correa, J. P., J. Beaulieu, and J. Bousquet. 2001. Contrasting evolutionary forces driving population structure at expressed sequence tag polymorphisms, allozymes and quantitative traits in white spruce. Molecular Ecology 10: 2729-2740.
Jolley, K. A., E. J. Feil, M.-S. Chan, and M. C. J. Maiden. 2001. Sequence type analysis and recombinational tests (START). Bioinformatics 17: 1230-1231.
Kimura, M., and T. Ohta. 1969. The average number of generations until fixation of a mutant gene in a finite population. Genetics 61: 763-771.
Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution 16: 111-120.
Ledig, F. T. 1998. Genetic variation in Pinus. In D. M. Richardson [ed.], Ecology and biogeography of Pinus, 251-280. Cambridge University Press, Cambridge, UK.
Ledig, F. T., B. Bermejo-Velazquez, P. D. Hodgskiss, D. R. Johnson, C. Flores-Lopez, and V. Jacob-Cervantes. 2000. The mating system and genic diversity in Martinez spruce, an extremely rare endemic of Mexico's Sierra Madre Oriental: an example of facultative selfing and survival in interglacial refugia. Canadian Journal of Forest Research 30: 1156-1164.
LePage, B. A. 2001. New species of Picea A. Dietrich (Pinaceae) from the middle Eocene of Axel Heiberg Island, Arctic Canada. Botanical Journal of the Linnean Society 135: 137-167.
Li, P., and J. Bousquet. 1992. Relative-rate test for nucleotide substitutions between two lineages. Molecular Biology and Evolution 9: 1185-1189.
Machado, C. A., R. M. Kliman, J. A. Markert, and J. Hey. 2002. Inferring the history of speciation from multilocus DNA sequence data: the case of Drosophila pseudoobscura and close relatives. Molecular Biology and Evolution 19: 472-488.
Magallón, S., and M. J. Sanderson. 2002. Relationships among seed plants inferred from highly conserved genes: sorting conflicting phylogenetic signals among ancient lineages. American Journal of Botany 89: 1991-2006.
Maynard Smith, J., and N. H. Smith. 1998. Detecting recombination from gene trees. Molecular Biology and Evolution 15: 590-599.