(American Journal of Botany. 2008;95:1466-1474.) doi: 10.3732/ajb.0800091 © 2008 Botanical Society of America, Inc.
Population Biology
2 Department of Ecology and Evolutionary Biology and The Natural History Museum and Biodiversity Research Center, University of Kansas, Lawrence, Kansas 66045 USA
3 Jardín de Aclimatación de la Orotava, Puerto de la Cruz, Tenerife, Canary Islands, Spain
Received for publication 9 March 2008. Accepted for publication 25 August 2008.
ABSTRACT
Accurate determination of patterns of genetic variation provides a powerful inferential tool for studies of evolution and conservation. For more than 30 years, enzyme electrophoresis was the preferred method for elucidating these patterns. As a result, evolutionary geneticists have acquired considerable understanding of the relationship between patterns of allozyme variation and aspects of evolutionary process. Myriad molecular markers and statistical analyses have since emerged, enabling improved estimates of patterns of genetic diversity. With these advances, there is a need to evaluate results obtained with different markers and analytical methods. We present a comparative study of gene statistic estimates (FST, GST, FIS, HS, and HT) calculated from an intersimple sequence repeat (ISSR) and an allozyme data set derived from the same populations using both standard and Bayesian statistical approaches. Significant differences were found between estimates, owing to the effects of marker and analysis type. Most notably, FST estimates for codominant data differ between Bayesian and standard approaches. Levels of statistical significance are greatly affected by methodology and, in some cases, are not associated with similar levels of biological significance. Our results suggest that caution should be used in equating or comparing results obtained using different markers and/or methods of analysis.
Key Words: allozymes; arbitrarily amplified DNA (AAD); Asteraceae; Bayesian analysis; codominant markers; genetic differentiation; genetic diversity; intersimple sequence repeat; ISSR; Tolpis
The patterns of genetic variation within and among populations are of interest to diverse fields in plant biology, including population genetics, systematics, and conservation. For the past four decades, following the demonstration of the utility of enzyme electrophoresis (Harris, 1966; Hubby and Lewontin, 1966; Lewontin and Hubby, 1966), there has been an ever-increasing use of various types of molecular markers to assess genetic variation. The basic rationale for molecular markers replacing earlier approaches, such as quantitative characters, as a means of assessing genetic variation is the more direct correspondence between genotype and phenotype obtained with molecular methods (Lewontin and Hubby, 1966; Schulman, 2007). However, as the science has progressed, considerations of efficiency and sensitivity have promoted the development of new molecular markers, many of which present significant analytical challenges to accurately assessing genetic variation (Sunnucks, 2000). Although ongoing developments in statistical analysis offer the potential to overcome these challenges, researchers still generally lack the means to compare, with confidence, analogous estimates derived from different marker classes and/or statistical methods (Bonin et al., 2007).
For more than four decades, allozyme markers have been an invaluable tool for studies of evolutionary genetics, providing plant biologists with a straightforward, low-cost means of estimating levels of intraspecific genetic variation (Cruzan, 1998). Allozymes produce codominant data, which permit direct observation of allele frequencies at allozyme loci and can be used rather simply to calculate various gene statistics (Hubby and Lewontin, 1966; Lewontin and Hubby, 1966; Hamrick, 1989; Weeden and Wendel, 1989). In addition, because allozyme loci are highly conserved in flowering plants (Gottlieb, 1982), homologous loci can be compared between closely related species. Practical advantages of allozymes include the relative procedural simplicity and low cost of the method (Clegg, 1989). Because of the large database that has accrued for allozymes, their estimated patterns of variation can be compared among plants with different ecological and life history traits (e.g., Hamrick and Godt, 1989). One of the major criticisms of allozyme data concerns the level of genome sampling: allozyme variation can only be determined for protein-coding genes (many of the assays are for enzymes of glycolysis and the citric acid cycle), of which, in plants, there is a rather small potential pool of useful candidates (ca. 40; Clegg, 1989; Wendel and Weeden, 1989). This number is further reduced in within-species studies, where often only about 50% of the loci are polymorphic (Hamrick and Godt, 1989). In addition, variation at each of these loci is detected only if it affects the electrophoretic mobility of the enzyme under the standard conditions employed (Lewontin and Hubby, 1966; Clegg, 1989). As much as 20% of base substitutions may go undetected (Coates and Byrne, 2005). Allozyme variation is often absent in groups of recently radiated taxa, for which allozymes frequently provide limited and/or imprecise estimates of population genetic structure (e.g., Schwartz, 1985; Crawford et al., 1987).
In contrast to allozymes, arbitrarily amplified DNA (AAD) methods (e.g., AFLP, intersimple sequence repeat, and random amplified polymorphic DNA) are able to produce large data sets, especially when loci are visualized using polyacrylamide gels. These loci presumably represent neutral, rapidly evolving regions from across the genome (Clegg, 1989; Huang and Sun, 2000; Krauss, 2000; Archibald et al., 2006). Very small amounts of plant material are needed for these markers, making them ideal for use with rare species. Using PCR, AAD methods amplify a specific region of DNA, or "allele," which is scored on an electrophoretic gel as band presence or absence. Because band presence can indicate either the dominant homozygote or the heterozygote, genotype and allele frequencies cannot be directly determined, and estimation of gene statistics can be problematic (Meudt and Clarke, 2007). This represents a significant disadvantage compared to allozymes. However, the higher level of variation often seen with AAD markers and the greater ease of obtaining the required amount of plant material have led to these markers being preferred over allozymes in many studies, particularly in cases where genetic variation within and among populations and/or species is low (Crawford et al., 1994, 2001).
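To illustrate why band data alone do not yield allele frequencies, the sketch below treats band absence as homozygosity for a null allele with frequency q. Under Hardy-Weinberg proportions the band-absent frequency is q², so q can be estimated as its square root; with an inbreeding coefficient f it becomes q² + f·q(1 − q). This is a minimal illustration under those assumptions, not a published estimator in full: the function names are hypothetical, and bias corrections such as those of Lynch and Milligan (1994) are omitted.

```python
from math import sqrt


def null_allele_freq_hwe(n_absent, n_total):
    """Naive square-root estimate of the null (band-absent) allele frequency.

    Valid only if genotypes are in Hardy-Weinberg proportions (f = 0).
    """
    return sqrt(n_absent / n_total)


def band_absent_freq(q, f):
    """Expected frequency of band absence (null homozygotes) given the
    null-allele frequency q and the inbreeding coefficient f."""
    return q * q + f * q * (1 - q)


# Two different (q, f) combinations predict the same band-absent frequency
# (0.25, up to floating-point rounding):
print(band_absent_freq(0.5, 0.0))    # random-mating population
print(band_absent_freq(0.4, 0.375))  # inbred population
# The HWE-based estimator returns 0.5 for 25 absences among 100 individuals,
# which is correct only for the first case.
print(null_allele_freq_hwe(25, 100))
```

Because a single observed quantity per locus must inform both q and f, neither can be recovered from one dominant locus without an assumption about the other.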
Although codominant markers like allozymes are preferred for most genetic studies, especially those that calculate statistics requiring knowledge of allele frequencies, recently developed analytical techniques have allowed researchers to take greater advantage of the benefits of dominant markers (Krauss, 2000; Bonin et al., 2007). The traditional approach to estimating levels of population structure with allele frequency data has involved F-statistics (i.e., FIT, FIS, FST). These were originally defined by Wright (1943, 1951) in terms of correlations between uniting gametes at different hierarchical levels: the total population (T) and population subdivisions (S). Nei (1973, 1977), seeking to extend Wright's F-statistics beyond a single-locus, two-allele system, redefined them as functions of partitioned gene diversity and calculated levels of inbreeding (FIS and FIT) and genetic differentiation (GST) using measures of observed (HO) and expected (HS and HT) heterozygosity. Nei's (1973) coefficient of gene differentiation, GST, is a multilocus, multiallele equivalent of Wright's FST.
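For reference, Nei's partition can be written as follows for a single locus, where p_ik (notation introduced here) is the frequency of allele i in population k of s populations and HO is the mean observed heterozygosity. These are the plug-in forms, without the small-sample corrections of Nei and Chesser (1983); for multiple loci, HS and HT are usually averaged over loci before GST is formed.

```latex
\begin{aligned}
\bar{p}_i &= \frac{1}{s}\sum_{k=1}^{s} p_{ik}, \qquad
H_S = \frac{1}{s}\sum_{k=1}^{s}\Bigl(1 - \sum_i p_{ik}^{2}\Bigr), \qquad
H_T = 1 - \sum_i \bar{p}_i^{\,2},\\
F_{IS} &= 1 - \frac{H_O}{H_S}, \qquad
F_{IT} = 1 - \frac{H_O}{H_T}, \qquad
G_{ST} = \frac{H_T - H_S}{H_T},\\
1 - F_{IT} &= (1 - F_{IS})(1 - G_{ST}).
\end{aligned}
```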
Though F-statistics have been used extensively with codominant data, their requirement for allele frequency estimates has made their application to dominant data problematic. Bayesian statistical analysis is increasingly applied to evolutionary genetic studies because it ostensibly offers investigators the ability to overcome some of the analytical shortfalls of dominant data sets (Zhivotovsky, 1999; Holsinger et al., 2002). One of the more prominent applications of Bayesian statistics to the analysis of dominant data is the method described by Holsinger et al. (2002). The method is based on a Bayesian hierarchical model and directly estimates an FST analogue (θB) from dominant or codominant data, while incorporating the effect of uncertainty in inbreeding (FIS) on this estimate. Holsinger et al. (2002) demonstrated that this method produces reliable estimates of FST (although see Bonin et al., 2007) and of allele frequencies without assuming a known inbreeding coefficient, which other approaches require (Lynch and Milligan, 1994; Zhivotovsky, 1999). By using the Bayesian estimate of mean allele frequency to calculate expected panmictic heterozygosities (HS and HT), the method can also produce an estimate of Nei's GST, known as GSTB. Under this framework, θB corresponds to a random-effects model of population sampling, which produces estimates from all potentially sampled populations and presumably reduces sampling error (Weir, 1996, p. 162; Holsinger, 1999). By contrast, the GSTB estimate corresponds to a fixed-effects model and is derived from all actually sampled populations (Weir, 1996, p. 162; Holsinger, 1999). Unlike standard implementations of the random-effects model (e.g., Weir and Cockerham, 1984), the Bayesian method is able to use it without a specified model of population divergence (Holsinger, 1999).
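To make the role of θ concrete, the sketch below uses the beta model of among-population allele-frequency variation that underlies hierarchical approaches of this kind; Hickory's full likelihood, priors, and MCMC sampler are not reproduced, and the function names and parameter values are hypothetical. If population allele frequencies are drawn from Beta(π(1 − θ)/θ, (1 − π)(1 − θ)/θ), their variance across populations is θπ(1 − π), so θ plays the role of FST.

```python
import random
from statistics import pvariance


def draw_population_freqs(pi, theta, n_pops, rng):
    """Draw allele frequencies for n_pops populations scattered around mean pi."""
    phi = (1 - theta) / theta  # precision of the beta distribution
    return [rng.betavariate(pi * phi, (1 - pi) * phi) for _ in range(n_pops)]


def theta_from_freqs(freqs, pi):
    """Moment estimate: among-population variance scaled by pi * (1 - pi)."""
    return pvariance(freqs, mu=pi) / (pi * (1 - pi))


rng = random.Random(42)
freqs = draw_population_freqs(pi=0.3, theta=0.17, n_pops=20000, rng=rng)
print(round(theta_from_freqs(freqs, pi=0.3), 3))  # close to the input theta of 0.17
```

A random-effects estimator targets this θ of the generating distribution, whereas a fixed-effects statistic such as GST describes only the populations actually sampled.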
Despite their respective limitations, both allozyme and dominant data offer viable, low-cost alternatives to more resource-intensive methods (e.g., microsatellites; Schulman, 2007; Agarwal et al., 2008). At present, the greatly increased use of dominant markers over allozymes, together with the large database available for allozymes, provides the potential to compare results from the two types of markers and relate them to many traits of plant ecology and life history. However, the relationships among analogous gene statistics derived from different marker classes and statistical methods are not well known. A better understanding of the comparability of gene statistic estimates derived from different methods will allow a more complete synthesis of the knowledge accumulated over more than 40 years of molecular genetic research. The opportunity for researchers to employ the full body of this knowledge will benefit a broad range of studies across the fields of plant genetics, systematics, and conservation, placing the results of newer methods of data production and analysis in a larger biological context.
For insight into the relationships among gene statistic estimates, comparisons of these estimates should be conducted across a broad range of biological conditions (i.e., patterns of genetic variation) and include empirical data for both codominant and dominant markers from the same populations. The purpose of this study was to respond to this need for comparative data among commonly used markers and methods of analysis. We present a comparison of analogous estimates of FST, GST, FIS, and expected heterozygosity derived from dominant (intersimple sequence repeat, ISSR) and codominant (allozyme) data sets for the angiosperm genus Tolpis (Asteraceae) using the Bayesian method of Holsinger et al. (2002), as well as standard approaches to codominant and dominant data. Tolpis represents a recent radiation for which both dominant and codominant data have been used to assess genetic relationships among taxa, as well as other aspects of species biology (Archibald et al., 2006; Crawford et al., 2006). By comparing specific estimates (Table 1), our goal was to determine patterns of relationships among analogous gene statistics and to provide additional information on how Bayesian analysis performs when applied to individual empirical studies.
MATERIALS AND METHODS
Data
Tolpis (Asteraceae) is a Macaronesian and Mediterranean angiosperm genus, most species of which represent a recent radiation within the Canary Island archipelago (Park et al., 2001; Crawford et al., 2006). From this Canarian radiation, we sampled 15 populations across five species (Table 2), constituting the bulk of the so-called T. laciniata-T. lagopoda complex (Archibald et al., 2006; Crawford et al., 2006). This complex encompasses a high level of morphological and ecological variation, though all species share a perennial habit and are highly self-incompatible (Crawford et al., 2008). Tolpis laciniata and T. lagopoda, in particular, form large outcrossing populations (Crawford et al., 2006).
Analyses
Allozyme data were analyzed using both the Bayesian approach implemented in the program Hickory version 1.1 (Holsinger and Lewis, 2003) and the standard methods described by Nei and Chesser (1983), which were calculated by hand. Hickory estimates the Bayesian analogues of Weir and Cockerham's (1984) FST and Wright's (1951) FIS, designated in the program as θII and f, respectively (Holsinger, 1999; Holsinger et al., 2002; Holsinger and Lewis, 2003). It also provides estimates of Nei's (1973) average expected panmictic heterozygosity (HS), total expected panmictic heterozygosity (HT), and coefficient of gene differentiation (GST), denoted by GSTB. Bayesian analyses of the codominant allozyme data set in Hickory were performed with default parameter settings (burn-in = 50000, sampling = 250000, thinning = 50) under the full model, which provides estimates of both θII and f. Nei and Chesser's (1983) methods produce unbiased estimates of GST, FIS, HS, and HT. The GST statistic is a multilocus, multiallele estimate of Wright's FST (Nei, 1973) and is often (e.g., Hall et al., 1994; Yeh et al., 1997) designated as the latter for ease of discussion. For this reason, we also refer to this estimate as FST. However, given its statistical affinities, we compared it to both θII and GSTB.
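As a sketch of the standard calculation, the function below implements the single-locus unbiased estimators as they are usually stated from Nei and Chesser (1983), with ñ the harmonic mean number of individuals sampled per population and s the number of populations: ĤS = ñ/(ñ − 1)[1 − Σ mean(x²) − ĤO/(2ñ)] and ĤT = 1 − Σ x̄² + ĤS/(sñ) − ĤO/(2sñ). The input layout and function name are assumptions of this sketch; multilocus values would average ĤS and ĤT over loci before GST is formed.

```python
from statistics import harmonic_mean


def nei_chesser(freqs, ho, n):
    """Unbiased single-locus HS, HT, GST, and FIS (after Nei and Chesser, 1983).

    freqs: per-population allele frequencies (s lists, each summing to 1)
    ho:    per-population observed heterozygosity (proportion of heterozygotes)
    n:     per-population sample sizes (number of diploid individuals)
    """
    s = len(freqs)
    n_h = harmonic_mean(n)  # harmonic mean sample size
    h_o = sum(ho) / s       # mean observed heterozygosity
    alleles = range(len(freqs[0]))
    # Sum over alleles of the mean (over populations) squared frequency.
    mean_sq = sum(sum(p[a] ** 2 for p in freqs) / s for a in alleles)
    # Sum over alleles of the squared mean frequency.
    sq_mean = sum((sum(p[a] for p in freqs) / s) ** 2 for a in alleles)
    h_s = n_h / (n_h - 1) * (1 - mean_sq - h_o / (2 * n_h))
    h_t = 1 - sq_mean + h_s / (s * n_h) - h_o / (2 * s * n_h)
    g_st = (h_t - h_s) / h_t
    f_is = 1 - h_o / h_s if h_s > 0 else float("nan")
    return h_s, h_t, g_st, f_is


# Two populations of 10 individuals fixed for different alleles.
hs, ht, gst, fis = nei_chesser([[1.0, 0.0], [0.0, 1.0]], ho=[0.0, 0.0], n=[10, 10])
print(hs, ht, gst)  # 0.0 0.5 1.0
```

Because of the bias corrections, GST computed this way can be slightly negative when populations do not differ.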
The ISSR data set was analyzed using Hickory under both the f-free and full model options and with Arlequin version 3.1 (Excoffier et al., 2005). Under Hickory's full model, estimates of f influence those of θII and vice versa (Holsinger et al., 2002). Because estimates of f have been shown to be unreliable when calculated from dominant data sets, especially with small population sample sizes, Holsinger and Lewis (2003) suggested that under these conditions the f-free model, which removes the constraints of f on θII estimation, may be more appropriate. Both analyses were performed under the default parameter settings described above. Two replicate runs of each Hickory analysis, for both marker classes, were produced to ensure convergence of the Markov chain Monte Carlo (MCMC) sampler.
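Convergence here was checked with two replicate runs. One common way to quantify agreement between replicate runs, brought in as an illustration and not necessarily what was done in this study, is the Gelman-Rubin potential scale reduction factor (R-hat), sketched below for a single parameter. The draws are simulated placeholders, reading Hickory's output is not shown, and the run length of 5,000 draws assumes, for illustration only, that 250000 sampled iterations thinned every 50 are retained.

```python
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for equal-length chains of
    draws of one parameter (e.g., replicate posterior samples of theta)."""
    n = len(chains[0])
    w = mean(variance(c) for c in chains)          # mean within-chain variance
    b = n * variance([mean(c) for c in chains])    # between-chain variance
    var_hat = (n - 1) / n * w + b / n              # pooled posterior variance
    return (var_hat / w) ** 0.5


rng = random.Random(7)
# Two hypothetical replicate runs of 5,000 retained draws each.
run1 = [rng.gauss(0.17, 0.02) for _ in range(5000)]
run2 = [rng.gauss(0.17, 0.02) for _ in range(5000)]
print(round(gelman_rubin([run1, run2]), 3))  # near 1 when the runs agree
```

Values well above 1 indicate that the replicate runs have not settled on the same posterior distribution.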
In Arlequin, a locus-by-locus analysis of molecular variance (AMOVA) produced locus-specific estimates of ΦST, an FST analogue based on pairwise squared Euclidean distances (Excoffier et al., 1992, 2005). The average of these locus-specific values is the multilocus estimate reported in the results. AMOVA is a commonly used method for producing estimates of genetic differentiation from dominant data.
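A simplified sketch of the locus-by-locus calculation is given below, assuming a single hierarchical level (populations within the total) and band scores coded 0/1. It is not Arlequin's implementation and includes no permutation test; skipping monomorphic loci when averaging is an assumption of this sketch, and the function names are hypothetical.

```python
from statistics import mean


def phi_st_locus(groups):
    """One-level AMOVA PhiST for a single dominant locus.

    groups: one list of 0/1 band scores per population. For 0/1 scores, the
    sum of squared pairwise Euclidean distances divided by 2N equals the sum
    of squared deviations from the mean, which is what is computed here.
    """
    all_x = [x for g in groups for x in g]
    n_tot, n_pop = len(all_x), len(groups)
    grand = mean(all_x)
    ss_total = sum((x - grand) ** 2 for x in all_x)
    ss_within = 0.0
    for g in groups:
        g_mean = mean(g)
        ss_within += sum((x - g_mean) ** 2 for x in g)
    ms_among = (ss_total - ss_within) / (n_pop - 1)
    ms_within = ss_within / (n_tot - n_pop)
    # Sample-size coefficient that allows for unequal population sizes.
    n0 = (n_tot - sum(len(g) ** 2 for g in groups) / n_tot) / (n_pop - 1)
    var_among = (ms_among - ms_within) / n0
    total = var_among + ms_within
    return var_among / total if total > 0 else float("nan")


def phi_st_multilocus(loci):
    """Plain average of per-locus PhiST values, skipping monomorphic loci."""
    values = [phi_st_locus(groups) for groups in loci]
    return mean(v for v in values if v == v)  # v == v is False only for NaN


print(phi_st_locus([[1, 1, 1, 1], [0, 0, 0, 0]]))  # 1.0 (fixed band difference)
```

Negative per-locus values can occur when populations differ less than expected by chance; they are kept in the average here.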
Statistical comparisons among estimates were conducted using 95% Bayesian credible intervals and confidence intervals for individual estimates. In Hickory, the production of a sample log file during analysis allowed a statistical comparison of θII and f estimates (described in Holsinger and Wallace, 2004) using the posterior comparison option. Pairwise comparisons involving non-Bayesian estimates required the calculation of a 95% confidence interval (Zar, 1999), which could be compared to a corresponding Bayesian credible interval. Two estimates were considered statistically different if neither estimate's confidence/credible interval included the other estimate's mean.
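The interval-overlap rule can be written out explicitly. In this sketch the interval for a non-Bayesian estimate is computed from a set of replicate values (e.g., per-locus estimates, which is an assumption about what the interval is taken over) using a normal approximation; credible intervals from Hickory would simply be passed in as (lower, upper) pairs. The function names and all numbers are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def normal_ci95(values):
    """Approximate 95% CI for a mean (normal approximation; a t-based
    interval, as in Zar, would be wider for small samples)."""
    m = mean(values)
    se = stdev(values) / len(values) ** 0.5
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    return (m - z * se, m + z * se)


def significantly_different(mean_a, ci_a, mean_b, ci_b):
    """Overlap rule: different only if neither interval contains the
    other estimate's mean."""
    a_excludes_b = not (ci_a[0] <= mean_b <= ci_a[1])
    b_excludes_a = not (ci_b[0] <= mean_a <= ci_b[1])
    return a_excludes_b and b_excludes_a


# Hypothetical estimates and intervals (not values from this study).
print(significantly_different(0.30, (0.25, 0.35), 0.45, (0.40, 0.50)))  # True
print(significantly_different(0.30, (0.10, 0.50), 0.45, (0.40, 0.50)))  # False
```

With a wide interval on one side, as in the second call, even a large difference in means is not declared significant under this rule.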
The following comparisons (summarized in Table 1) were considered for statistical analysis and discussion: (1) comparison between Bayesian and standard analyses of an allozyme data set; (2) comparison between models of Bayesian analysis applied to an ISSR data set; (3) comparison between Bayesian and AMOVA estimates applied to an ISSR data set; (4) comparison between ISSR and allozyme estimates produced by Bayesian analyses.
Sample size differences
Population sample size is an important factor in gene statistic estimation, and differential sampling of a given population or set of subpopulations may yield disparate estimates of genetic structure (Bonin et al., 2007). In this study, differences in population sample sizes between marker classes could account for differences in estimates of population parameters between those classes. To determine whether, and in what way, the differences in sample size between the ISSR and allozyme data sets influenced estimates of genetic differentiation, the following analyses were conducted: (1) populations with ISSR sample sizes of three individuals (i.e., 17 and 1869; Table 2) were removed from the allozyme and ISSR data sets, and FST and/or θII were recalculated for both; (2) population sample sizes in the allozyme data set were randomly reduced to equal the corresponding sample sizes in the ISSR data set, and FST and θII estimates were then recalculated.
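The random reduction in step (2) can be sketched as follows. The data layout, population IDs, and sizes are hypothetical, and a fixed seed is used only so that a given reduction can be reproduced.

```python
import random


def downsample(populations, target_sizes, seed=0):
    """Randomly subsample individuals, without replacement, to target sizes.

    populations:  dict of population ID -> list of individual records
    target_sizes: dict of population ID -> desired sample size (e.g., ISSR n);
                  populations missing from target_sizes are kept whole.
    """
    rng = random.Random(seed)
    reduced = {}
    for pop_id, individuals in populations.items():
        k = min(target_sizes.get(pop_id, len(individuals)), len(individuals))
        reduced[pop_id] = rng.sample(individuals, k)
    return reduced


# Hypothetical allozyme records for two populations, reduced to ISSR sizes.
allozymes = {"popA": [f"A{i}" for i in range(12)], "popB": [f"B{i}" for i in range(8)]}
issr_sizes = {"popA": 5, "popB": 3}
reduced = downsample(allozymes, issr_sizes)
print({pid: len(ind) for pid, ind in reduced.items()})  # {'popA': 5, 'popB': 3}
```

Repeating the reduction with several seeds would show how much of any change in FST is due to which individuals happen to be retained.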
Data simulation and analysis
Two data simulation studies were conducted to address aspects of Bayesian gene statistic estimation that surfaced in the results of our analyses of the empirical data sets. Simulation study 1 was designed to investigate divergent allozyme FST estimates from standard and Bayesian methods, while simulation study 2 addressed the effect of sample size on the accuracy of f estimation with dominant data.
Simulation study 1
The program EASYPOP version 1.7 (Balloux, 2001), a forward-time simulator employing an individual-based model of evolution, was used to produce 110 codominant data sets. Evolution was simulated at 15 multiallelic loci (five alleles each) for four populations of 100 diploid individuals each, and 20 individuals per population were randomly subsampled for analysis. There was no migration, mutation, or selfing, and mating was random. Simulation durations (i.e., numbers of generations) were varied to produce different final FST values. Each data set was analyzed using a standard approach in the program Popgene version 1.32 (Yeh and Boyle, 1997; Yeh et al., 1997) and the Bayesian full model in Hickory. The differences between the FST/GST estimates produced by these methods were plotted against the simulated FST.
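EASYPOP itself is not reproduced here. As a rough Python analogue of the design described above (four populations of 100 diploids, 15 loci with five alleles, no migration, mutation, or selfing, and 20 individuals subsampled), the sketch below shows how longer runs yield larger differentiation; under pure drift, FST is expected to rise roughly as 1 − (1 − 1/(2N))^t. The uniform starting allele pool, the function names, and the plug-in GST (not the estimators of Popgene or Hickory) are assumptions of this sketch.

```python
import random
from collections import Counter


def simulate_drift(generations, n_pops=4, n_ind=100, n_loci=15, n_alleles=5, seed=1):
    """Individual-based drift with random mating between two distinct parents
    (no selfing), no migration, and no mutation. Each individual is a list of
    per-locus (allele, allele) tuples."""
    rng = random.Random(seed)
    # All populations start from the same uniform allele pool, so FST starts near 0.
    pops = [[[(rng.randrange(n_alleles), rng.randrange(n_alleles)) for _ in range(n_loci)]
             for _ in range(n_ind)] for _ in range(n_pops)]
    for _ in range(generations):
        next_pops = []
        for pop in pops:
            offspring = []
            for _ in range(n_ind):
                mom, dad = rng.sample(pop, 2)  # two distinct parents, so no selfing
                offspring.append([(rng.choice(mom[l]), rng.choice(dad[l]))
                                  for l in range(n_loci)])
            next_pops.append(offspring)
        pops = next_pops
    return pops


def subsample_gst(pops, n_sample=20, seed=2):
    """Nei's GST from random subsamples, with plug-in HS and HT summed over loci."""
    rng = random.Random(seed)
    samples = [rng.sample(pop, n_sample) for pop in pops]
    hs = ht = 0.0
    for l in range(len(samples[0][0])):
        freqs = []
        for sample in samples:
            counts = Counter(a for ind in sample for a in ind[l])
            total = sum(counts.values())
            freqs.append({a: c / total for a, c in counts.items()})
        hs += sum(1 - sum(p * p for p in f.values()) for f in freqs) / len(freqs)
        alleles = set().union(*freqs)
        p_bar = [sum(f.get(a, 0.0) for f in freqs) / len(freqs) for a in alleles]
        ht += 1 - sum(p * p for p in p_bar)
    return (ht - hs) / ht


for t in (0, 30, 60):
    print(t, round(subsample_gst(simulate_drift(t)), 3))
```

Varying the number of generations, as in the study, is what spreads the final FST values across the range that is then compared between methods.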
Simulation study 2
Using the program EASYPOP v. 1.7 (Balloux, 2001), we simulated five data sets within each of five FST-value ranges (0.01–0.1, 0.275–0.325, 0.4–0.45, 0.55–0.6, 0.8–0.85) for each of four sample sizes (20, 50, 100, and 200 individuals). Each data set consisted of 10 populations of 500 individuals each, and each population was randomly subsampled to produce the sample sizes listed. Individuals' genotypes comprised 100 biallelic loci, which were converted to dominant data. Simulated FIS values ranged between 0.13 and 0.15. Evolution input parameters included no migration, no mutation, and a selfing rate of 0.25. Simulated data sets were analyzed in Hickory under the full model.
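Two details of this design can be made concrete. First, at equilibrium a selfing rate s gives an inbreeding coefficient of s/(2 − s), so the selfing value of 0.25 implies FIS ≈ 0.143, in line with the simulated range of 0.13–0.15. Second, converting codominant genotypes to dominant data discards the distinction between heterozygotes and band-producing homozygotes. Which allele is treated as producing the band, and the function names, are assumptions of this sketch.

```python
def equilibrium_fis(selfing_rate):
    """Equilibrium inbreeding coefficient under partial selfing: F = s / (2 - s)."""
    return selfing_rate / (2.0 - selfing_rate)


def to_dominant(genotype, band_allele=1):
    """Score a biallelic genotype, e.g. (0, 1), as band present (1) if it carries
    at least one copy of the band-producing allele, else band absent (0)."""
    return int(band_allele in genotype)


print(round(equilibrium_fis(0.25), 4))                     # 0.1429
print([to_dominant(g) for g in [(0, 0), (0, 1), (1, 1)]])  # [0, 1, 1]
```

After conversion, the heterozygote (0, 1) and the homozygote (1, 1) receive the same score, which is why FIS must be inferred indirectly from dominant data.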
RESULTS
FST and GST
Our estimates of FST differed according to analytical method (Fig. 1). The standard method (AS; see Table 1 for designation of methods) estimate for allozymes (0.388 ± 0.22) exceeded all three Bayesian θII values, including that produced (0.172 ± 0.017) for the same allozyme data set (AB). The two model analyses of ISSR data in Hickory produced significantly different results for θII (IB, 0.165 ± 0.003; IBf, 0.147 ± 0.007), though neither could be distinguished statistically from the AB estimate. The AMOVA-based ΦST estimate (IS, 0.137 ± 0.151) was statistically different from the IB estimate, but not from that of IBf.
The effect of sample size differences between data sets was investigated by altering the sampling design of the full data analysis, either by removing whole populations or by removing individuals within populations, and recalculating FST and θII values. Removing populations 17 and 1869, each with an ISSR sample size of three, produced estimates for AS, AB, and IBf that were not significantly different from the corresponding values in the full data analysis. Reducing population sample sizes in the allozyme data set did not significantly change the AS FST estimate (0.439), but did lower the AB θII estimate (0.129) relative to the full data analysis.
In simulation study 1, we found a strong positive linear relationship (Fig. 2) between the difference in FST estimates and the simulated FST. We observed the same positive relationship for Bayesian GSTB estimates (Fig. 3), though the increase in the difference between estimates over the span of FST values was much smaller.
In simulation study 2, we compared Hickory's estimates of f with the simulated FIS for each FST value and sample size (Fig. 5). We found that increasing sample size up to 200 individuals per population had little effect on the accuracy of f estimation. Although changing FST values did affect the relationship between simulated and estimated FIS values, it likewise did not change the accuracy of the Bayesian f estimate.
DISCUSSION
The increasing use of recently developed molecular genetic markers and their concomitant statistical analyses necessitates an improved understanding of the comparability of genetic estimates across methodological approaches. Such an understanding is especially important given the wealth of genetic and life history information that is restricted to interpretations of allozyme data sets, now that allozymes are increasingly being supplanted by AAD and other markers. However, despite a number of comparative reviews of molecular markers (e.g., Nybom and Bartish, 2000; Nybom, 2004; Coates and Byrne, 2005), few studies (e.g., Virk et al., 2000) conduct exhaustive comparisons of alternative marker data sets produced from identical populations and under similar analytical conditions. Rarer still are studies in which both different markers and different methods of analysis have been compared (Holsinger and Wallace, 2004). To contribute to the overall understanding of relationships among analogous gene statistic estimates, we conducted a narrowly focused comparative study of a set of gene statistics calculated in three different analytical environments from empirically derived dominant and codominant data. The study shows that estimates differ among analytical methods and between marker types, though not always in the manner suggested by the literature.
Consider first the confidence with which estimates of population differentiation obtained with different markers and analyses can be compared. The apportionment of genetic diversity within and among populations is one of the most important measures for biologists because it is broadly informative. For example, it is often employed in conservation strategies aimed at preserving maximum genetic diversity in a species, specifically as an aid to identifying genetically distinct (e.g., highly differentiated) elements within a species. Estimates of genetic differentiation among populations based on allozymes are available for a multitude of taxa, and the level and pattern of differentiation are often associated with a variety of life history attributes, especially breeding system (Hamrick and Godt, 1989). However, the lack of variation at allozyme loci, particularly in rare species, may preclude calculations of genetic differentiation. In these cases, it would be advantageous to determine whether estimates of differentiation from AADs are comparable to estimates from allozymes for species with similar life history and ecological attributes.
FST and GST
The FST statistic estimates the level of population differentiation based on the degree of fixation of alleles among populations and is subject to the effects of selection, drift, and mutation. The influence of these processes is an important consideration in comparing statistical estimates produced by different molecular markers. Expected differences in the detectable mutation rate and in the level of selective constraint operating at allozyme and AAD loci may contribute to divergence in FST estimates between these marker classes. In fact, under a broad range of conditions, mutation rate appears to be the prime factor determining the degree of genetic differentiation among populations, with higher mutation rates corresponding to lower FST values (Hedrick, 1999; Fu et al., 2003; Holsinger and Wallace, 2004). Despite the presumably higher mutation rates of AAD markers relative to allozymes, we did not find that the ISSR data set produced a correspondingly lower θII estimate than the allozymes (Fig. 1). Although not consistent with the expected result, this finding does not appear to be uncommon in genetic studies employing both allozymes and AAD markers and may reflect the influence of other evolutionary processes (e.g., natural selection or genetic drift; Zeng et al., 2003; Volis et al., 2005).
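The inverse relation between mutation rate and FST can be illustrated with the classic infinite-island, infinite-alleles approximation FST ≈ 1/(1 + 4N(m + μ)), where N is deme size, m the migration rate, and μ the mutation rate. This textbook approximation is brought in here only for illustration; it is not the model used in the studies cited, and the parameter values below are hypothetical.

```python
def island_fst(n, m, mu):
    """Approximate equilibrium FST, infinite-island model, infinite-alleles mutation."""
    return 1.0 / (1.0 + 4.0 * n * (m + mu))


# Hypothetical deme size and migration rate; only the mutation rate differs.
slow = island_fst(n=500, m=0.001, mu=1e-6)  # slowly mutating loci
fast = island_fst(n=500, m=0.001, mu=1e-3)  # rapidly mutating loci
print(round(slow, 3), round(fast, 3))       # 0.333 0.2
```

When μ is small relative to m, mutation has almost no effect; it lowers FST appreciably only once it approaches the migration rate.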
However, when we consider estimates of GSTB, we find that the amount of genetic differentiation in the ISSR data set is significantly lower than in the allozyme data set (Fig. 1). A similar relationship is revealed when comparing the AS estimate to the IBf or IB θII estimates, and both findings reflect the expected pattern of genetic differentiation if a differential mutation rate between marker types is a strong determining force. In addition, the nonsignificant difference between allozyme and ISSR θII estimates, which contradicts the expected result, seems to be more a product of the Bayesian method employed than an indication of biological reality. This explanation is supported by the results of simulation study 1, which demonstrate that, as "true" FST values in a codominant data set become larger, Hickory underestimates levels of genetic differentiation to an increasingly greater degree (Figs. 2 and 3). We are unsure of the reason for the underestimation by θII, but suggest that it may be caused by the random-effects sampling model implemented in Hickory. Random-effects estimates of θII are calculated from Bayesian estimates of allele frequency in all potentially sampled populations (Holsinger, 1999). In a standard analysis of variance approach, the random-effects model is expected to produce smaller test statistics relative to the fixed-effects model by increasing mean square values (Weir, 1996, p. 162). This expectation is supported by the findings of simulation study 1, which generally produced larger estimates of GSTB (i.e., the fixed-effects estimate) than of θII (Figs. 2 and 3).
Statistically significant differences observed between full and f-free model ISSR-based estimates of θII and GSTB may be due to the confounding effects of erroneous f estimates under the full model (Holsinger and Lewis, 2003). However, despite statistical significance, the mean estimates of the two models do not differ substantially. In fact, the statistically significant result is almost certainly due to the large number of ISSR loci used in these analyses, illustrating the occasional discordance between statistical and biological significance (Hedrick, 1999).
Our manipulation of population and individual sampling produced only slight, and mostly nonsignificant, changes in FST estimates for both allozymes and ISSRs, and in no case did it change the relationships among estimates. Given this, we conclude that differences in sample size between data sets likely played little role in producing the divergent FST estimates observed for allozymes and ISSRs.
While our results for the Tolpis data sets cannot be viewed as generally applicable guidelines for estimating population differentiation, several comments are warranted. The most significant finding is the difference between Bayesian and standard estimates of FST for codominant markers. While the underlying statistical affinities of these procedures do differ, it appears more likely that some component(s) of the Hickory implementation (e.g., the random-effects sampling design) are primarily responsible for the significant difference between these estimates. By contrast, the close approximation of the Bayesian GSTB estimate to the standard FST (Fig. 3), both of which are based on the unbiased GST statistic of Nei and Chesser (1983), suggests that these values are much more comparable. Thus, we caution against comparing allozyme-based FST values estimated with a standard method to those from the Bayesian procedure implemented in Hickory. With regard to estimates of population differentiation (θII and GSTB) from ISSR markers, the full and f-free models yielded values that were significantly different but very similar. Given the possible confounding effects of erroneous f estimates under the full model (Holsinger and Lewis, 2003) in view of our results, we recommend using the f-free model; if both models are used and substantially different results are obtained, more confidence should be placed in results from the f-free model. Comparison of population differentiation estimates between codominant and dominant markers is not straightforward because of the aforementioned issues with both the methods and the markers. However, given that our simulation study suggests that, with codominant data, GSTB is a better approximation of Nei and Chesser's (1983) unbiased FST than is θII, it may be more meaningful to use this statistic for comparison with dominant data as well. Finally, considering our results as well as those of a more extensive review conducted by Bonin et al. (2007), we find that the AMOVA-based ΦST estimate, a commonly used approach for dominant data, is generally quite similar to the θII value estimated under the f-free model of Hickory. Nevertheless, given the limited investigation into the comparability of these statistics and the ease of formatting data for their respective analyses, we believe it is prudent to calculate both estimates for dominant data sets and to compare only like statistics.
FIS
Our results for inbreeding clearly demonstrate that the codominant allozyme markers, whether analyzed with standard or Bayesian methods, are to be preferred over AAD markers. These results are not surprising because Holsinger and Lewis (2003) cautioned that dominant data might not be appropriate for estimating f, at least under the tested Bayesian framework. Although Holsinger and Lewis (2003) did suggest that a large sample size might overcome these issues, our own simulations show no evidence of this. Even at sample sizes of 200 individuals per population, we found significant differences between estimated f and the expected (i.e., simulated) value (Fig. 5). This result suggests that estimating f in Hickory with dominant data is largely uninformative.
Expected heterozygosity
Although the large confidence intervals for the standard allozyme estimates of HS and HT mean that these estimates do not differ statistically from those of any other analysis, a few comments can be made about the results. The Bayesian estimates for allozymes are substantially higher than those for ISSR markers. Nybom and Bartish (2000) and Coates and Byrne (2005) reviewed estimates of diversity from allozyme and AAD (mostly RAPD) markers. Because data in those reviews are from standard analyses of both types of markers, and because the methods used to produce their estimates vary (e.g., in whether monomorphic loci are included; they were excluded in our study), they are of limited value for our purposes. However, with these caveats in mind, higher diversity estimates with AAD markers appear to be more common than those with allozymes. By contrast, our results suggest higher estimates for allozymes. Neither published data nor results of the current study provide compelling reasons for predicting a priori the relative levels of diversity estimates provided by the two markers. Additional studies are needed before a more definitive assessment can be made.
Biological vs. statistical significance
Earlier in the discussion, we alluded to the difference between statistical and biological significance (Hedrick, 1999), and several additional comments are in order because this distinction is important to the interpretation of our analyses. In particular, the relatively small number of loci sampled in the standard allozyme approach produced a large estimation error, which made it difficult to detect significant differences between this estimate and those of other methods, even though its mean value was often quite divergent. In contrast, significant differences between the two Bayesian models (full and f-free) were common with ISSRs, despite very similar mean estimates. These findings can be attributed to two effects: the large number of ISSR loci analyzed and the statistical sampling procedure of the Bayesian method, which also greatly reduced error in the analysis of the 10-locus allozyme data set. Despite the large error surrounding the mean, the differences between θII and FST may reasonably be interpreted as more substantial than those between the two Bayesian models, especially because the former bear much more heavily on interpretations of method comparisons than the latter do.
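The gap between statistical and biological significance can be made concrete with a normal approximation. If the per-locus difference between two estimates has standard deviation σ across loci, the standard error of the mean difference over L loci is σ/√L, so a fixed and possibly trivial difference becomes statistically significant once L is large enough. The sketch below is a generic illustration with hypothetical values (a mean difference of 0.01 and σ = 0.05); it is not the test used in this study and it ignores MCMC sampling error in the Bayesian estimates.

```python
import math

def two_sided_p(mean_diff: float, sd_diff: float, n_loci: int) -> float:
    """Two-sided p-value for a mean per-locus difference under a normal
    approximation: z = mean_diff / (sd_diff / sqrt(n_loci))."""
    se = sd_diff / math.sqrt(n_loci)
    z = abs(mean_diff) / se
    return math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(z))

# Hypothetical: a 0.01 shift in an FST-like estimate, per-locus SD of 0.05.
for n in (10, 100, 1000):
    p = two_sided_p(0.01, 0.05, n)
    print(f"{n:5d} loci: p = {p:.3g}")
```

The difference itself never changes; only the number of loci does. Whether a shift of 0.01 in differentiation matters biologically is a separate judgment that the p-value cannot make (Hedrick, 1999).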
FOOTNOTES
1 The authors thank K. Holsinger, M. Holder, and J. Kelly for discussions of Bayesian methodology and its application to the current study. This research was supported by the Department of Ecology and Evolutionary Biology and the Natural History Museum & Biodiversity Research Center at the University of Kansas. A Kansas NSF EPSCoR Ecological Genomics postdoctoral award to J.A. helped to fund this research.
4 Author for correspondence (e-mail: levsenn@ku.edu)
LITERATURE CITED
Agarwal, M., N. Shrivastava, and H. Padh. 2008. Advances in molecular marker techniques and their applications in plant sciences. Plant Cell Reports 27: 617–631.
Archibald, J. K., D. J. Crawford, A. Santos-Guerra, and M. E. Mort. 2006. The utility of automated analysis of inter-simple sequence repeat (ISSR) loci for resolving relationships in the Canary Island species of Tolpis (Asteraceae). American Journal of Botany 93: 1154–1162.
Balloux, F. 2001. EASYPOP (version 1.7): A computer program for population genetics simulation. Journal of Heredity 92: 301–302.
Bonin, A., D. Ehrich, and S. Manel. 2007. Statistical analysis of amplified fragment length polymorphism data: A toolbox for molecular ecologists and evolutionists. Molecular Ecology 16: 3737–3758.
Clegg, M. T. 1989. Molecular diversity in plant populations. In A. H. D. Brown, M. T. Clegg, A. L. Kahler, and B. S. Weir [eds.], Plant population genetics, breeding, and genetic resources, 43–63. Sinauer, Sunderland, Massachusetts, USA.
Coates, D. J., and M. Byrne. 2005. Genetic variation in plant populations: Assessing cause and pattern. In R. J. Henry [ed.], Plant diversity and evolution: Genotypic and phenotypic variation in higher plants, 139–164. CABI Publishing, Cambridge, Massachusetts, USA.
Crawford, D. J., J. K. Archibald, A. Santos-Guerra, and M. E. Mort. 2006. Allozyme diversity within and divergence among species of Tolpis (Asteraceae-Lactuceae) in the Canary Islands: Systematic, evolutionary, and biogeographical implications. American Journal of Botany 93: 656–664.
Crawford, D. J., K. Archibald, D. Stoermer, M. E. Mort, J. Kelly, and A. Santos-Guerra. 2008. A test of Baker's law: Breeding systems and the radiation of Tolpis (Asteraceae) in the Canary Islands. International Journal of Plant Sciences 169: 782–791.
Crawford, D. J., T. F. Stuessy, M. B. Cosner, D. W. Haines, D. Wiens, and P. Penailillo. 1994. Lactoris fernandeziana (Lactoridaceae) on the Juan Fernández Islands: Allozyme uniformity and field observations. Conservation Biology 8: 277–280.
Crawford, D. J., T. F. Stuessy, and M. O. Silva. 1987. Allozyme divergence and the evolution of Dendroseris (Compositae: Lactuceae) on the Juan Fernández Islands. Systematic Botany 12: 435–443.
Crawford, D. J., M. Tago-Nakazawa, T. F. Stuessy, G. J. Anderson, G. Bernardello, E. Ruiz, R. J. Jensen, et al. 2001. Intersimple sequence repeat (ISSR) variation in Lactoris fernandeziana (Lactoridaceae), a rare endemic of the Juan Fernández Archipelago, Chile. Plant Species Biology 16: 185–192.
Cruzan, M. B. 1998. Genetic markers in plant evolutionary ecology. Ecology 79: 400–412.
Excoffier, L., L. G. Laval, and S. Schneider. 2005. Arlequin ver. 3.0: An integrated software package for population genetics data analysis. Evolutionary Bioinformatics Online 1: 47–50.
Excoffier, L., P. Smouse, and J. Quattro. 1992. Analysis of molecular variance inferred from metric distances among DNA haplotypes: Application to human mitochondrial DNA restriction data. Genetics 131: 479–491.
Fu, R. W., A. E. Gelfand, and K. E. Holsinger. 2003. Exact moment calculations for genetic models with migration, mutation, and drift. Theoretical Population Biology 63: 231–243.
Gottlieb, L. D. 1982. Conservation and duplication of isozymes in plants. Science 216: 373–380.
Hall, P., L. C. Orrell, and K. S. Bawa. 1994. Genetic diversity and mating system in a tropical tree, Carapa guianensis (Meliaceae). American Journal of Botany 81: 1104–1111.
Hamrick, J. L. 1989. Isozymes and the analysis of genetic structure in plant populations. In D. E. Soltis and P. S. Soltis [eds.], Isozymes in plant biology, 87–105. Dioscorides Press, Portland, Oregon, USA.
Hamrick, J. L., and M. J. W. Godt. 1989. Allozyme diversity in plant species. In A. H. D. Brown, M. T. Clegg, A. L. Kahler, and B. S. Weir [eds.], Plant population genetics, breeding, and genetic resources, 43–63. Sinauer, Sunderland, Massachusetts, USA.
Harris, H. 1966. Enzyme polymorphisms in man. Proceedings of the Royal Society of London, B, Biological Sciences 164: 298–310.
Hedrick, P. W. 1999. Perspective: Highly variable loci and their interpretation in evolution and conservation. Evolution 53: 313–318.
Holsinger, K. E. 1999. Analysis of genetic diversity in geographically structured populations: A Bayesian perspective. Hereditas 130: 245–255.
Holsinger, K. E., and P. O. Lewis. 2003. Hickory: A package for analysis of population genetic data v1.1. Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut, USA.
Holsinger, K. E., P. O. Lewis, and D. K. Dey. 2002. A Bayesian approach to inferring population structure from dominant markers. Molecular Ecology 11: 1157–1164.
Holsinger, K. E., and L. E. Wallace. 2004. Bayesian approaches for the analysis of population genetic structure: An example from Platanthera leucophaea (Orchidaceae). Molecular Ecology 13: 887–894.
Huang, J. C., and M. Sun. 2000. Genetic diversity and relationships of sweetpotato and its wild relatives in Ipomoea series Batatas (Convolvulaceae) as revealed by inter-simple sequence repeat (ISSR) and restriction analysis of chloroplast DNA. Theoretical and Applied Genetics 100: 1050–1060.
Hubby, J. L., and R. C. Lewontin. 1966. A molecular approach to the study of genic heterozygosity in natural populations. I. The number of alleles at different loci in Drosophila pseudoobscura. Genetics 54: 577–594.
Krauss, S. L. 2000. Accurate gene diversity estimates from amplified fragment length polymorphism (AFLP) markers. Molecular Ecology 9: 1241–1245.
Lewontin, R. C., and J. L. Hubby. 1966. A molecular approach to the study of genic heterozygosity in natural populations. II. Amount of variation and degree of heterozygosity in natural populations of Drosophila pseudoobscura. Genetics 54: 595–609.
Lynch, M., and B. G. Milligan. 1994. Analysis of population genetic structure with RAPD markers. Molecular Ecology 3: 91–99.
Meudt, H. M., and A. C. Clarke. 2007. Almost forgotten or latest practice? AFLP applications, analyses and advances. Trends in Plant Science 12: 106–117.
Nei, M. 1973. Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences, USA 70: 3321–3323.
Nei, M. 1977. F-statistics and analysis of gene diversity in subdivided populations. Annals of Human Genetics 41: 225–233.
Nei, M., and R. K. Chesser. 1983. Estimation of fixation indices and gene diversities. Annals of Human Genetics 47: 253–259.
Nybom, H. 2004. Comparison of different nuclear DNA markers for estimating intraspecific genetic diversity in plants. Molecular Ecology 13: 1143–1155.
Nybom, H., and I. V. Bartish. 2000. Effects of life history traits and sampling strategies on genetic diversity estimates obtained with RAPD markers in plants. Perspectives in Plant Ecology, Evolution and Systematics 3: 93–114.
Park, S.-J., E. J. Korompai, J. Francisco-Ortega, A. Santos-Guerra, and R. K. Jansen. 2001. Phylogenetic relationships of Tolpis (Asteraceae: Lactuceae) based on ndhF sequence data. Plant Systematics and Evolution 226: 23–33.
Schulman, A. H. 2007. Molecular markers to assess genetic diversity. Euphytica 158: 313–321.
Schwartz, O. A. 1985. Lack of protein polymorphism in the endemic relict Chrysosplenium iowense (Saxifragaceae). Canadian Journal of Botany 63: 2031–2034.
Sunnucks, P. 2000. Efficient genetic markers for population biology. Trends in Ecology & Evolution 15: 199–203.
Virk, P. S., J. Zhu, H. J. Newbury, G. J. Bryan, M. T. Jackson, and B. V. Ford-Lloyd. 2000. Effectiveness of different classes of molecular markers for classifying and revealing variation in rice (Oryza sativa) germplasm. Euphytica 112: 275–284.
Volis, S., B. Yakubov, I. Shulgina, D. Ward, and S. Mendlinger. 2005. Distinguishing adaptive from nonadaptive genetic differentiation: Comparison of QST and FST at two spatial scales. Heredity 95: 466–475.
Weeden, N. F., and J. F. Wendel. 1989. Genetics of plant isozymes. In D. E. Soltis and P. S. Soltis [eds.], Isozymes in plant biology, 87–105. Dioscorides Press, Portland, Oregon, USA.
Weir, B. S. 1996. Genetic data analysis II. Sinauer, Sunderland, Massachusetts, USA.
Weir, B. S., and C. C. Cockerham. 1984. Estimating F-statistics for the analysis of population structure. Evolution 38: 1358–1370.
Wendel, J. F., and N. F. Weeden. 1989. Visualization and interpretation of plant isozymes. In D. E. Soltis and P. S. Soltis [eds.], Isozymes in plant biology, 5–45. Dioscorides Press, Portland, Oregon, USA.
Wright, S. 1943. Isolation by distance. Genetics 28: 114–138.
Wright, S. 1951. The genetical structure of populations. Annals of Eugenics 15: 323–354.
Yeh, F. C., and T. J. B. Boyle. 1997. Population genetic analysis of co-dominant and dominant markers and quantitative traits. Belgian Journal of Botany 129: 157.
Yeh, F. C., R. Yang, and T. J. B. Boyle. 1997. POPGENE version 1.32. Ag/For Molecular Biology and Biotechnology Centre, University of Alberta and Center for International Forestry Research, Edmonton, Alberta, Canada. Website http://www.ualberta.ca/~fyeh.
Zar, J. H. 1999. Biological statistics. Prentice Hall, Upper Saddle River, New Jersey, USA.
Zeng, J., Y. Zou, J. Bai, and H. Zheng. 2003. RAPD analysis of genetic variation in natural populations of Betula alnoides from Guangxi, China. Euphytica 134: 33–41.
Zhivotovsky, L. A. 1999. Estimating population structure in diploids with multilocus dominant DNA markers. Molecular Ecology 8: 907–913.