(American Journal of Botany. 2008;95:1466-1474.)
doi: 10.3732/ajb.0800091
© 2008 Botanical Society of America, Inc.
Population Biology

Nei’s to Bayes’: comparing computational methods and genetic markers to estimate patterns of genetic variation in Tolpis (Asteraceae)1

Nicholas D. Levsen2,4, Daniel J. Crawford2, Jenny K. Archibald2, Arnoldo Santos-Guerra3 and Mark E. Mort2

2 Department of Ecology and Evolutionary Biology and The Natural History Museum and Biodiversity Research Center, University of Kansas, Lawrence, Kansas 66045 USA
3 Jardín de Aclimatación de la Orotava, Puerto de la Cruz, Tenerife, Canary Islands, Spain

Received for publication 9 March 2008. Accepted for publication 25 August 2008.

ABSTRACT

Accurate determination of patterns of genetic variation provides a powerful inferential tool for studies of evolution and conservation. For more than 30 years, enzyme electrophoresis was the preferred method for elucidating these patterns. As a result, evolutionary geneticists have acquired considerable understanding of the relationship between patterns of allozyme variation and aspects of evolutionary process. Myriad molecular markers and statistical analyses have since emerged, enabling improved estimates of patterns of genetic diversity. With these advances, there is a need to evaluate results obtained with different markers and analytical methods. We present a comparative study of gene statistic estimates (FST, GST, FIS, HS, and HT) calculated from an intersimple sequence repeat (ISSR) and an allozyme data set derived from the same populations using both standard and Bayesian statistical approaches. Significant differences were found between estimates, owing to the effects of marker and analysis type. Most notably, FST estimates for codominant data differ between Bayesian and standard approaches. Levels of statistical significance are greatly affected by methodology and, in some cases, are not associated with similar levels of biological significance. Our results suggest that caution should be used in equating or comparing results obtained using different markers and/or methods of analysis.

Key Words: allozymes • arbitrarily amplified DNA (AAD) • Asteraceae • Bayesian analysis • codominant markers • genetic differentiation • genetic diversity • intersimple sequence repeat • ISSR • Tolpis

The patterns of genetic variation within and among populations are of interest to diverse fields in plant biology including population genetics, systematics, and conservation. For the past four decades, following the demonstration of the utility of enzyme electrophoresis (Harris, 1966; Hubby and Lewontin, 1966; Lewontin and Hubby, 1966), there has been an ever-increasing use of various types of molecular markers to assess genetic variation. The basic rationale for molecular markers replacing earlier approaches such as quantitative characters as a means of assessing genetic variation is the more direct equation between genotype and phenotype obtained with molecular methods (Lewontin and Hubby, 1966; Schulman, 2007). However, as the science has progressed, considerations of improved efficiency and sensitivity have promoted the development of new molecular markers, many of which present significant analytical challenges to accurately assessing genetic variation (Sunnucks, 2000). Although ongoing developments in statistical analyses offer the potential to overcome these challenges, there is still a general inability to knowledgeably compare analogous estimates derived from different marker classes and/or statistical methods (Bonin et al., 2007).

For more than four decades, allozyme markers have been an invaluable tool for studies of evolutionary genetics, providing plant biologists with a straightforward, low cost means of estimating levels of intraspecific genetic variation (Cruzan, 1998). Allozymes produce codominant data, which permit direct observation of allele frequencies at allozyme loci and can be used rather simply to calculate various gene statistics (Hubby and Lewontin, 1966; Lewontin and Hubby, 1966; Hamrick, 1989; Weeden and Wendel, 1989). In addition, because of the highly conserved nature of allozyme loci in flowering plants (Gottlieb, 1982), homologous loci can be compared between closely related species. Practical advantages of allozymes include the relative procedural simplicity and low cost of the method (Clegg, 1989). Because of the large database that has accrued for allozymes, their estimated patterns of variation can be compared among plants with different ecological and life history traits (e.g., Hamrick and Godt, 1989). One of the major criticisms of allozyme data concerns the level of genome sampling; allozyme variation can only be determined for protein-coding genes (many of the assays are for enzymes of glycolysis and the citric acid cycle) of which, in plants, there is a rather small (~40) potential pool of useful candidates (Clegg, 1989; Wendel and Weeden, 1989). This number is further reduced for within-species studies where often only about 50% of the loci are polymorphic (Hamrick and Godt, 1989). In addition, variation at each of these loci may only be detected if it affects the electrophoretic mobility of the enzyme with the standard conditions employed (Lewontin and Hubby, 1966; Clegg, 1989). As much as 20% of the base substitutions may go undetected (Coates and Byrne, 2005). Allozyme variation is often absent in groups of recently radiated taxa for which allozymes frequently provide limited and/or imprecise estimates of population genetic structure (e.g., Schwartz, 1985; Crawford et al., 1987).

In contrast to allozymes, arbitrarily amplified DNA (AAD) methods (e.g., AFLP, intersimple sequence repeat, and random amplified polymorphic DNA) are able to produce large data sets, especially when loci are visualized using polyacrylamide gels. These loci presumably represent neutral, rapidly evolving regions from across the genome (Clegg, 1989; Huang and Sun, 2000; Krauss, 2000; Archibald et al., 2006). Very small amounts of plant material are needed for these markers, making them ideal for use with rare species. Utilizing PCR, AAD methods amplify a specific region of DNA or "allele," which is visualized on an electrophoretic gel as a band presence or absence. Because band presence can indicate either the dominant homozygote or heterozygote, genotype and allele frequencies cannot be directly determined and estimation of gene statistics can be problematic (Meudt and Clarke, 2007). This represents a significant disadvantage compared to allozymes. However, the increased level of variation often seen with AAD markers and the increased ease of obtaining the required amount of plant material have led to these markers being preferred over allozymes in many studies, particularly in cases where genetic variation within and among populations and/or species is low (Crawford et al., 1994, 2001).
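The ambiguity of band-presence data can be made concrete with a short sketch. It is not the Bayesian method examined in this study; it shows the classical moment approach, in which the frequency of the recessive (band-absent) allele is recovered from the fraction of band-absent individuals only after an inbreeding coefficient f is assumed. The function name and arguments are illustrative.

```python
import math

def null_allele_freq(absent_fraction, f=0.0):
    """Estimate the recessive allele frequency q at a dominant locus.

    Assumes the expected fraction of band-absent (recessive homozygote)
    individuals is x = q^2 + f*q*(1 - q), where f is an ASSUMED inbreeding
    coefficient. With f = 0 this reduces to the familiar q = sqrt(x).
    """
    x = absent_fraction
    if not 0.0 <= x <= 1.0:
        raise ValueError("absent_fraction must lie in [0, 1]")
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if f == 1.0:
        # Complete inbreeding: no heterozygotes, so q equals x directly.
        return x
    a = 1.0 - f
    # Positive root of (1 - f)*q^2 + f*q - x = 0.
    return (-f + math.sqrt(f * f + 4.0 * a * x)) / (2.0 * a)
```

With 25% band-absent individuals, the estimate is q = 0.5 under random mating (f = 0) but q = 0.25 under complete inbreeding (f = 1), so the same gel pattern is compatible with quite different allele frequencies unless f is known or estimated jointly.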

Although codominant markers like allozymes are preferred for most genetic studies, especially those that calculate statistics requiring knowledge of allele frequencies, recently developed analytical techniques have allowed researchers to take greater advantage of the benefits of dominant markers (Krauss, 2000; Bonin et al., 2007). The traditional approach to estimating levels of population structure with allele frequency data has typically involved the use of F-statistics (i.e., FIT, FIS, FST). These were originally defined by Wright (1943, 1951) and based on correlations between uniting gametes at different hierarchical levels, total population (T) and population subdivision (S). Nei (1973, 1977), seeking to expand the use of Wright’s F-statistics beyond a single locus two allele system, redefined them as functions of partitioned gene diversity and calculated levels of inbreeding (FIS and FIT) and genetic differentiation (GST) using measures of observed (HO) and expected (HS and HT) heterozygosity. Nei’s (1973) coefficient of gene differentiation, GST, is a multilocus, multiallele equivalent of Wright’s FST.
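As a minimal sketch of Nei's partition of gene diversity, the function below computes HS, HT, and GST = (HT - HS) / HT from per-population allele frequencies, averaging HS and HT over loci before taking the ratio. It assumes equal population weights and omits sample-size corrections such as those of Nei and Chesser (1983); the function name and input layout are illustrative.

```python
def nei_gst(freqs):
    """Return (HS, HT, GST) from allele frequencies.

    freqs[locus][population] is a list of allele frequencies summing to 1.
    Populations are weighted equally; no small-sample bias correction.
    """
    hs_sum = 0.0
    ht_sum = 0.0
    for locus in freqs:
        n_pops = len(locus)
        n_alleles = len(locus[0])
        # HS: mean within-population expected heterozygosity.
        hs_sum += sum(1.0 - sum(p * p for p in pop) for pop in locus) / n_pops
        # HT: expected heterozygosity of the pooled (mean) allele frequencies.
        mean_p = [sum(pop[a] for pop in locus) / n_pops for a in range(n_alleles)]
        ht_sum += 1.0 - sum(p * p for p in mean_p)
    hs = hs_sum / len(freqs)
    ht = ht_sum / len(freqs)
    gst = (ht - hs) / ht if ht > 0 else 0.0
    return hs, ht, gst
```

Two populations fixed for alternative alleles at a biallelic locus give HS = 0, HT = 0.5, and GST = 1, whereas two populations with identical frequencies give GST = 0.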

Though F-statistics have been used extensively with codominant data, the methodological requirement of allele frequency estimates has made their application to dominant data problematic. Bayesian statistical analysis is an approach that is increasingly applied to evolutionary genetic studies because it ostensibly offers investigators the ability to overcome some of the analytical shortfalls of dominant data sets (Zhivotovsky, 1999; Holsinger et al., 2002). One of the more prominent applications of Bayesian statistics to the analysis of dominant data is the method described by Holsinger et al. (2002). The method is based on a Bayesian hierarchical model and directly estimates an FST analogue (θB) from dominant or codominant data, while incorporating the effect of uncertainty in inbreeding (FIS) on this estimate. Holsinger et al. (2002) have demonstrated that this method produces reliable estimates of FST (although see Bonin et al., 2007) and allele frequencies without the assumption of a known inbreeding coefficient, which is required by other approaches (Lynch and Milligan, 1994; Zhivotovsky, 1999). By using the Bayesian estimate of mean allele frequency to calculate expected panmictic heterozygosities (HS and HT), the method is also able to produce an estimate of Nei’s GST, known as GSTB. Under this framework, θB corresponds to a random-effects model of population sampling, which produces estimates from all potentially sampled populations and presumably reduces sampling error (Weir, 1996, p. 162; Holsinger, 1999). Alternatively, the GSTB estimate corresponds to a fixed-effects model and is derived from all actually sampled populations (Weir, 1996, p. 162; Holsinger, 1999). Unlike standard implementations of the random-effects model (i.e., Weir and Cockerham, 1984), the Bayesian method is able to use it without a specified model of population divergence (Holsinger, 1999).

Despite their respective limitations, both allozyme and dominant data offer viable, low cost alternatives to more resource intensive methods (e.g., microsatellites; Schulman, 2007; Agarwal et al., 2008). At present, the greatly increased use of dominant markers over allozymes and the aforementioned large database available for allozymes provide the potential to compare results from the two types of markers and relate them to many traits of plant ecology and life history. However, the relationships among analogous gene statistics derived from different marker classes and statistical methods are not well known. A better understanding of the comparability of gene statistic estimates derived from different methods will allow a more complete synthesis of the knowledge accumulated throughout more than 40 years of molecular genetic research. The opportunity for researchers to employ the full body of this knowledge will benefit a broad range of studies across the fields of plant genetics, systematics, and conservation, placing in a larger biological context the results of newer methods of data production and analysis.

For insight into the relationships among gene statistic estimates, comparisons of these estimates should be conducted across a broad range of biological conditions (i.e., patterns of genetic variation) and include empirical data for both codominant and dominant markers from the same populations. The purpose of this study was to respond to this need for comparative data among different commonly used markers and methods of analysis. We present a comparison of analogous estimates of FST, GST, FIS, and expected heterozygosity derived from dominant (intersimple sequence repeat, ISSR) and codominant (allozyme) data sets for the angiosperm genus Tolpis (Asteraceae) using the Holsinger et al. (2002) Bayesian method, as well as standard approaches to codominant and dominant data. Tolpis represents a recent radiation for which both dominant and codominant data have been used to assess genetic relationships among taxa, as well as other aspects of species biology (Archibald et al., 2006; Crawford et al., 2006). By comparing specific estimates (Table 1), our goal was to determine patterns of relationships among analogous gene statistics and provide additional information as to the operation of Bayesian analysis when applied to individual empirical studies.


Table 1. The five combinations of molecular marker and statistical method considered in this study are shown along with their respective abbreviations (Abbrev.). The far right column lists the four methodological comparisons described in this study of gene statistic estimates from the genus Tolpis.

 
MATERIALS AND METHODS

Data
Tolpis (Asteraceae) is a Macaronesian and Mediterranean angiosperm genus, of which the majority of species represent a recent radiation within the Canary Island archipelago (Park et al., 2001; Crawford et al., 2006). From this Canarian radiation, we sampled 15 populations across five species (Table 2), constituting the bulk of the so-called T. laciniata-T. lagopoda complex (Archibald et al., 2006; Crawford et al., 2006). This complex comprises a high level of morphological and ecological variation, though all species share a perennial habit and are highly self-incompatible (Crawford et al., 2008). Tolpis laciniata and T. lagopoda, in particular, form large outcrossing populations (Crawford et al., 2006).


Table 2. Population sampling information detailing Tolpis taxonomic affinities, sample size for the respective markers, and geographic origin within the Canarian Archipelago.

 
Individuals from each of the 15 populations sampled were genotyped at 10 polymorphic allozyme loci (GPI-2, PGM-1, TPI-1, TPI-2, PGD-1, PGD-2, GDH, AAT-2, AAT-3, MDH-3) and at 1510 polymorphic ISSR loci. Protocols for enzyme electrophoresis and ISSR amplification and scoring are described in Crawford et al. (2006) and Archibald et al. (2006), respectively. Population sample sizes varied between marker classes. The allozyme data set averaged 17 individuals per population (range, 6–29), while the ISSR data set averaged almost eight individuals per population (range, 3–14). All 10 allozyme loci had three or more alleles (seven maximum) at a locus.

Analyses
Allozyme data were analyzed using both the Bayesian approach implemented in the program Hickory version 1.1 (Holsinger and Lewis, 2003) and the standard methods described by Nei and Chesser (1983), for which calculations were performed manually. Hickory estimates the Bayesian analogue of Weir and Cockerham’s (1984) FST and Wright’s (1951) FIS, designated in the program as θII and f, respectively (Holsinger, 1999; Holsinger et al., 2002; Holsinger and Lewis, 2003). It also provides estimates of Nei’s (1973) average expected panmictic heterozygosity (HS), total expected panmictic heterozygosity (HT), and coefficient of gene differentiation (GST), denoted by GSTB. Bayesian analyses of the codominant allozyme data set in Hickory were performed with default parameter settings (burn-in = 50000, sampling = 250000, thinning = 50) under the full model analysis, which provides estimates of both θII and f. Nei and Chesser’s (1983) methods produce unbiased estimates of GST, FIS, HS, and HT. The GST statistic is a multilocus, multiallele estimate of Wright’s FST (Nei, 1973) and is often (as in the cases of Hall et al., 1994; Yeh et al., 1997) designated as the latter for ease of discussion. For this reason, we too refer to this estimate as FST. However, considering its statistical affinity, we compared it to both θII and GSTB.

The ISSR data set was analyzed using Hickory under both the f-free and full model options and with Arlequin version 3.1 (Excoffier et al., 2005). Under Hickory’s full model, estimates of f influence those of θII and vice versa (Holsinger et al., 2002). Because estimates of f have been shown to be unreliable when calculated from dominant data sets, especially with small population sample sizes, Holsinger and Lewis (2003) suggested that under these conditions the f-free model, which removes the constraints of f on θII estimation, may be more appropriate. Both of these analyses were performed under the default parameter settings described before. Two replicate runs of each Hickory analysis, for both marker classes, were produced to ensure convergence of the Markov chain Monte Carlo (MCMC) sampler.

In Arlequin, a locus-by-locus analysis of molecular variance (AMOVA) produced locus-specific estimates of ΦST, which is an FST analogue based on pairwise squared Euclidean distances (Excoffier et al., 1992, 2005). The average of these individual values was calculated to produce the multilocus estimate reported in the results. The AMOVA is a commonly used method for producing estimates of genetic differentiation from dominant data.

Statistical comparisons among estimates were conducted using 95% Bayesian credible intervals and confidence intervals for individual estimates. In Hickory, the production of a sample log file during analysis allowed a statistical comparison of θII and f estimates (described in Holsinger and Wallace, 2004) using the posterior comparison option. Pairwise comparisons involving non-Bayesian estimates required the calculation of a 95% confidence interval (Zar, 1999), which could be compared to a similar Bayesian credible interval. Estimates were considered statistically different if each confidence/credible interval did not overlap the other estimate’s mean value.
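The decision rule just described can be sketched as follows. The sketch assumes each estimate is summarized as a mean plus a symmetric interval half-width, and for illustration it treats the ± values reported in the Results for the two ISSR-based Bayesian estimates of genetic differentiation (IB and IBf) as such half-widths, which is an assumption. The function names are illustrative.

```python
def interval(mean, half_width):
    """Symmetric interval (lower, upper) around a mean."""
    return (mean - half_width, mean + half_width)

def statistically_different(mean_a, half_a, mean_b, half_b):
    """True if neither interval contains the other estimate's mean."""
    lo_a, hi_a = interval(mean_a, half_a)
    lo_b, hi_b = interval(mean_b, half_b)
    b_outside_a = not (lo_a <= mean_b <= hi_a)
    a_outside_b = not (lo_b <= mean_a <= hi_b)
    return a_outside_b and b_outside_a

# IB (0.165 +/- 0.003) versus IBf (0.147 +/- 0.007); the +/- values are
# ASSUMED here to be interval half-widths.
ib_vs_ibf = statistically_different(0.165, 0.003, 0.147, 0.007)
```

Under that assumption the two estimates are judged different, because each mean falls outside the other's interval. A wide interval around one estimate can mask a large difference in means, since the rule requires both conditions to hold.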

The following comparisons (summarized in Table 1) were considered for statistical analysis and discussion: (1) comparison between Bayesian and standard analyses of an allozyme data set; (2) comparison between models of Bayesian analysis applied to an ISSR data set; (3) comparison between Bayesian and AMOVA estimates applied to an ISSR data set; (4) comparison between ISSR and allozyme estimates produced by Bayesian analyses.

Sample size differences
Population sample size is an important factor in gene statistic estimation, and differential sampling of a given population or set of subpopulations may yield disparate estimates of genetic structure (Bonin et al., 2007). In this study, differences in population sample sizes between marker classes could potentially account for differences in estimates of population parameters between those classes. To determine if, and in what way, the differences in sample size between the ISSR and allozyme data sets have influenced estimates of genetic differentiation in this study, the following analyses were conducted: (1) populations with ISSR sample sizes of three individuals (i.e., 17 and 1869; Table 2) were removed from the allozyme and ISSR data sets and FST and/or θII were recalculated for both; (2) population sample sizes in the allozyme data set were reduced, randomly, to equal the corresponding sample sizes in the ISSR data set, then FST and θII estimates were recalculated.
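The two resampling designs can be sketched in a few lines. The data layout (a dictionary mapping population identifiers to lists of individuals), the function names, and the fixed random seed are all hypothetical; the original analysis does not specify how the random reduction was carried out.

```python
import random

def drop_small_populations(pops, issr_sizes, min_size):
    """Design 1: keep only populations whose ISSR sample size >= min_size."""
    return {pid: inds for pid, inds in pops.items() if issr_sizes[pid] >= min_size}

def subsample_to_match(allozyme_pops, issr_sizes, seed=0):
    """Design 2: randomly reduce each allozyme population, without
    replacement, to the sample size of the same population in the ISSR set."""
    rng = random.Random(seed)  # seeded so the reduction is repeatable
    reduced = {}
    for pid, individuals in allozyme_pops.items():
        n = min(issr_sizes[pid], len(individuals))
        reduced[pid] = rng.sample(individuals, n)
    return reduced
```

Repeating the second design over several seeds would show how much of any change in an estimate reflects the particular individuals drawn instead of the reduction in sample size itself.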

Data simulation and analysis
Two data simulation studies were conducted to address aspects of Bayesian gene statistic estimation that surfaced in the results of our analyses of the empirical data sets. Simulation study 1 was designed to investigate divergent allozyme FST estimates for standard and Bayesian methods, while simulation study 2 addressed the effects of sample size on accurate f estimation with dominant data.

Simulation study 1
The program EASYPOP version 1.7 (Balloux, 2001), a forward-time simulator employing an individual based model of evolution, was used to produce 110 codominant data sets. Evolution was simulated at 15 multiallelic loci (i.e., with five alleles each) for four populations of 100 diploid individuals each and 20 individuals per population were randomly subsampled for analysis. There was no migration, mutation, or selfing, and mating was random. Simulation durations (i.e., number of generations) were varied to produce different final FST values. Each of the data sets was analyzed using a standard approach in the program Popgene version 1.32 (Yeh and Boyle, 1997; Yeh et al., 1997) and the Bayesian full model from Hickory. The differences between the FST/GST estimates produced by these methods were plotted against the simulated FST.
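For orientation, in an idealized Wright-Fisher model of completely isolated populations of N diploid individuals, with no mutation and identical starting allele frequencies, the expected FST after t generations is 1 - (1 - 1/(2N))^t. The sketch below uses this textbook expectation to show how simulation duration maps to FST for populations of 100 individuals. It is an approximation only, not a description of how EASYPOP computes or reports FST, and individual stochastic runs will scatter around it.

```python
import math

def expected_fst(generations, n_diploid):
    """Expected FST under pure drift among isolated populations."""
    return 1.0 - (1.0 - 1.0 / (2.0 * n_diploid)) ** generations

def generations_for_fst(target_fst, n_diploid):
    """Generations of pure drift at which expected FST reaches target_fst."""
    if not 0.0 <= target_fst < 1.0:
        raise ValueError("target_fst must lie in [0, 1)")
    return math.log(1.0 - target_fst) / math.log(1.0 - 1.0 / (2.0 * n_diploid))

# Approximate durations for a range of target FST values with N = 100.
durations = {f: generations_for_fst(f, 100) for f in (0.1, 0.3, 0.5, 0.8)}
```

Because expected FST approaches 1 asymptotically, high target values require disproportionately long runs compared with low ones.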

Simulation study 2
Using the program EASYPOP v. 1.7 (Balloux, 2001), we simulated five data sets at each of five FST-value ranges (0.01–0.1, 0.275–0.325, 0.4–0.45, 0.55–0.6, 0.8–0.85) for four different sample sizes (20, 50, 100, and 200 individuals). Data sets consisted of 10 populations of 500 individuals each. Each population was randomly subsampled to produce the sample sizes listed. Individuals’ genotypes comprised 100 biallelic loci, which were converted to dominant data. Simulated FIS values ranged between 0.13 and 0.15. Evolution input parameters for simulations included: no migration, no mutation, and a 0.25 selfing value. Simulated data sets were analyzed in Hickory under the full model.
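The conversion of biallelic codominant genotypes to dominant data can be sketched as below. The encoding (alleles coded 0 and 1, with allele 1 treated as the band-producing dominant allele) and the function name are assumptions made for illustration; the original conversion procedure is not described in detail.

```python
def to_dominant(genotypes, dominant_allele=1):
    """Convert diploid biallelic genotypes to band presence/absence.

    genotypes: list of individuals, each a list of (allele, allele) tuples,
    one tuple per locus. Returns 1 (band present) if at least one copy of
    the dominant allele is carried at the locus, otherwise 0 (band absent).
    """
    return [
        [1 if dominant_allele in locus else 0 for locus in individual]
        for individual in genotypes
    ]
```

Genotypes (1, 1) and (1, 0) both score as 1, so the heterozygote information that informs FIS in codominant data is discarded by the conversion.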

RESULTS

FST and GST
Our estimates of FST differed according to analytical method (Fig. 1). The standard method (AS; see Table 1 for designation of methods) estimate for allozymes (0.388 ± 0.22) exceeded all three of the Bayesian θII values, including that produced (0.172 ± 0.017) for the same allozyme data set (AB). Different model analyses of ISSR data in Hickory produced significantly different results for θII (IB, 0.165 ± 0.003; IBf, 0.147 ± 0.007), though neither could be distinguished statistically from the AB estimate. The AMOVA-based ΦST estimate (IS, 0.137 ± 0.151) was statistically different from the IB estimate, but not from that of IBf.


Fig. 1. Mean values for FST among Tolpis populations, including θII and ΦST (diamond), and GSTB (square) estimates across five analysis categories (AS, AB, IBf, IB, and IS). Bars indicate 95% credible/confidence interval. The table in the upper right shows comparisons among statistically different (indicated with "X") estimates of genetic differentiation produced by the different methods.

 
Only one of the listed methods (IBf) produced statistically overlapping estimates of GST and FST, and comparisons among GST estimates gave slightly different results than those with FST (Fig. 1). Most notable among these results, the GSTB estimate for AB (0.258 ± 0.018) was not found to be different from the AS estimate reported above. The IBf (0.138 ± 0.008) and IB (0.157 ± 0.003) methods produced significantly different estimates of GSTB and each differed from the AB estimate.

The effect of sample size differences between data sets was investigated by altering, either through the removal of whole populations or individuals within populations, the sample design of the full data analysis and recalculating FST and θII values. The removal of populations 17 and 1869, each with an ISSR sample size of three, produced estimates for AS, AB, and IBf that were not significantly different from the corresponding values in the full data analysis. Reducing population sample sizes in the allozyme data set did not significantly change the AS FST estimate (0.439), but did lower the AB θII estimate (0.129) as compared to the full data analysis.

In simulation study 1, we found a strong positive linear relationship (Fig. 2) between the difference in FST estimates and the simulated FST. We observe the same positive relationship for Bayesian GSTB estimates (Fig. 3), though the increase in the difference between estimates over the span of FST values is of a much smaller magnitude.


Fig. 2. Correlation between the difference in standard and Bayesian estimates of FST and the simulated FST.

 

Fig. 3. Correlation between the difference in standard FST and Bayesian GST estimates and the simulated FST.

 
FIS
Mean estimates of inbreeding based on allozymes (AS, 0.317 ± 0.339; AB, 0.28 ± 0.037) are not statistically distinguishable (Fig. 4). The ISSR-based Bayesian estimate (IB, 0.999 ± 0.0009) produced by the full model analysis is extremely high and is not corroborated by any other biological data.


Fig. 4. Mean values of FIS and f estimates for Tolpis populations across three analysis categories (no estimate produced in f-free Bayesian model; AS, AB, and IB). Bars indicate 95% credible/confidence interval.

 
From simulation study 2, we report the average difference between f and the simulated FIS for each FST value and sample size (Fig. 5). We found that increasing sample size up to 200 individuals per population had little effect on the accuracy of f estimation. While changing FST values did affect the relationship of simulated and estimated FIS values, it, likewise, did not result in a change in the accuracy of the Bayesian f estimate.


Fig. 5. Average difference between f estimation and simulated FIS for data sets at four sample sizes (20, 50, 100, and 200 individuals per population). Bars show the negative or positive aspect of the 95% confidence interval. Intervals that do not intersect the zero line (broken) indicate statistically significant differences between the estimated and simulated FIS.

 
Expected heterozygosity
Average expected panmictic heterozygosity (HS) estimates differ between model runs (IB, 0.173 ± 0.0008; IBf, 0.147 ± 0.007) conducted on ISSR data (Fig. 6). Both of these values also differ from the AB estimate (0.228 ± 0.007). The large degree of error surrounding the AS estimate (0.194 ± 0.175) does not allow it to be distinguished from the other estimates, despite, in some cases, large differences in mean value. The same issue arises when considering individual population HiS estimates (Fig. 7), where the AS estimate is almost never statistically different from the AB value, except in the case of populations 6 and 1883. Comparisons made among Hickory-based heterozygosity estimates for individual populations show consistently that there are differences between model runs and data sets. The Bayesian total pooled expected heterozygosity (HT) estimates differed between models and data sets (IB, 0.2047 ± 0.001; IBf, 0.1707 ± 0.009; AB, 0.307 ± 0.008; Fig. 8). While the AS estimate of HT (0.316 ± 0.229) is most similar to that of the AB method, it cannot be distinguished from any of the three Bayesian HT estimates.


Fig. 6. Mean values of HS estimates for Tolpis populations across four analysis categories (AS, AB, IBf, and IB). Bars indicate 95% credible/confidence interval.

 

Fig. 7. Mean values of within population expected panmictic heterozygosity (HiS) estimates for Tolpis populations across four analysis categories (AS = diamond; AB = square; IBf = circle; IB = triangle). Shared colors indicate a lack of significance among estimates. Standard allozyme (AS) estimates did not differ statistically from any of the other three estimates in any population other than 6 and 1883.

 

Fig. 8. Mean values of HT estimates for Tolpis populations across four analysis categories (AS, AB, IBf, and IB). Bars indicate 95% credible/confidence interval.

 
DISCUSSION

The increasing use of recently developed molecular genetic markers and their concomitant statistical analyses necessitates an improved understanding of the comparability of genetic estimates across various methodological approaches. Such an understanding is especially important given the wealth of genetic and life history information that is restricted to interpretations of allozyme data sets, now that allozymes are being increasingly supplanted by AAD and other markers. However, despite a number of comparative reviews of molecular markers (e.g., Nybom and Bartish, 2000; Nybom, 2004; Coates and Byrne, 2005), there are few (e.g., Virk et al., 2000) that conduct exhaustive comparisons of alternative marker data sets produced from identical populations and under similar analytical conditions. Even rarer are studies in which both different markers and different methods of analysis have been compared (Holsinger and Wallace, 2004). To contribute to the overall understanding of relationships among analogous gene statistic estimates, we conducted a narrowly focused, comparative study of a set of gene statistics calculated in three different analytical environments from empirically derived dominant and codominant data. The study shows that estimates differ between analytical method and marker type, though not always in the manner suggested by the literature.

Consider first the confidence with which estimates of population differentiation obtained by different markers and analyses can be compared. The apportionment of genetic diversity within and among populations is one of the most important measures for biologists because it is broadly informative. For example, it is often employed in conservation strategies aimed at preserving maximum genetic diversity in a species, specifically as an aid to identifying genetically distinct (e.g., highly differentiated) elements within a species. Estimates of genetic differentiation among populations based on allozymes are available for a multitude of taxa, and the level and pattern of differentiation are often associated with a variety of life history attributes, especially breeding system (Hamrick and Godt, 1989). However, the lack of variation at allozyme loci, particularly in rare species, may preclude calculations of genetic differentiation. In these cases it would be advantageous to determine whether estimates of differentiation from AADs are comparable to estimates from allozymes for species with similar life history and ecological attributes.

FST and GST
The FST statistic estimates the level of population differentiation based on the degree of fixation of alleles among populations and is subject to the effects of selection, drift, and mutation. The influence of these processes is an important consideration in comparing statistical estimates produced by different molecular markers. The expected differences in the detectable rate of mutation and level of selective control operating at allozyme and AAD loci may contribute to divergence in FST estimates between these marker classes. In fact, under a broad range of conditions, mutation rate appears to be the prime factor in determining the degree of genetic differentiation among populations, with higher mutation rates corresponding to lower FST values (Hedrick, 1999; Fu et al., 2003; Holsinger and Wallace, 2004). Despite the presumably higher mutation rates in AAD markers than in allozyme markers, we did not find that the ISSR data set produced a comparably lower θII estimate than the allozymes (Fig. 1). Although not consistent with the expected result, this finding does not appear to be uncommon in genetic studies employing both allozymes and AAD markers and may reflect the influence of other evolutionary processes (e.g., natural selection or genetic drift; Zeng et al., 2003; Volis et al., 2005).

However, when we consider estimates of GSTB, we find that the amount of genetic differentiation in the ISSR data set is significantly lower than it is in the allozyme data set (Fig. 1). A similar relationship is revealed when comparing the AS estimate to the IBf or IB θII estimates, and both of these findings reflect the expected pattern of genetic differentiation under the assumption that a differential mutation rate between marker types is a strong determining force. In addition, the nonsignificant difference between allozyme and ISSR θII estimates, which contradicts the expected result, seems to be more a product of the Bayesian method employed than an indication of biological reality. This explanation is supported by the results of our simulated data study, which demonstrate that, as "true" FST values become larger in a codominant data set, Hickory underestimates levels of genetic differentiation to an increasingly greater degree (Figs. 2 and 3). We are unsure of the reason for the underestimation by θII, but suggest that it may be caused by the random-effects sampling model implemented in Hickory. Random-effects estimates of θII are calculated from Bayesian estimates of allele frequency in all potentially sampled populations (Holsinger, 1999). In a standard analysis of variance approach, the random-effect model is expected to produce smaller test statistics relative to the fixed-effect model by increasing mean square values (Weir, 1996, p. 162). This expectation is supported by the findings of simulated data study 1, which generally produced larger estimates of GSTB (i.e., fixed-effect estimate) when compared to θII (Figs. 2 and 3).

Statistically significant differences observed between full and f-free model ISSR-based estimates of θII and GSTB may be due to the confounding effects of erroneous f estimates under the full model (Holsinger and Lewis, 2003). However, despite statistical significance, the respective mean estimates of these models do not differ substantially. In fact, the statistically significant finding is almost certainly due to the large number of ISSR loci used in these analyses, illustrating the occasional discordance between statistical and biological significance (Hedrick, 1999).

Our manipulation of population and individual sampling produced only slight, and mostly nonsignificant, changes in FST estimates for both allozymes and ISSRs and in no circumstance did it change relationships among estimates. Given this, we conclude that differences in sample size between data sets seem to have played little role in producing the divergent estimates of FST in allozymes and ISSRs that were observed.

While our results for the Tolpis data sets cannot be viewed as generally applicable guidelines for estimating population differentiation, several comments are warranted. The most significant finding is the difference between Bayesian and standard estimates of FST for codominant markers. Although the underlying statistical affinities of these procedures do differ, it appears more likely that the influence of some component(s) of the Hickory implementation (e.g., the random-effects sampling design) is primarily responsible for the significant difference between these estimates. In contrast, the close approximation of the Bayesian GSTB estimate to the standard FST (Fig. 3), both of which are based on the unbiased GST statistic of Nei and Chesser (1983), suggests that these are much more comparable values. Thus, we caution against comparing allozyme-based FST values estimated from a standard method to those from the Bayesian procedure implemented in Hickory. With regard to estimates of population differentiation (θII and GSTB) from ISSR markers, significantly different but very similar values were obtained with the full and f-free models. Given the possible confounding effects of erroneous f estimates under the full model (Holsinger and Lewis, 2003) and in view of our results, we recommend using the f-free model; if both models are used and substantially different results are obtained, more confidence should be placed in results from the f-free model. Comparison of population differentiation estimates between codominant and dominant markers is not straightforward because of the aforementioned issues with both the methods and markers. However, given that our simulation study suggests that, with codominant data, GSTB is a better approximation of Nei and Chesser’s (1983) unbiased FST than is θII, it may be more meaningful to use this statistic for comparison with dominant data as well.
Finally, considering our results as well as those of a more extensive review conducted by Bonin et al. (2007), we find that the AMOVA-based ΦST estimate, a commonly used approach for dominant data, is generally quite similar to the θII value estimated under the f-free model of Hickory. Nevertheless, we believe it is prudent, given the limited investigation into the comparability of these statistics and the ease of formatting for their respective analyses, for one to calculate both estimates for dominant data sets and compare only like statistics.
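For readers less familiar with the gene-diversity statistics compared in this section, the following minimal sketch computes Nei's (1973) HS, HT, and GST = (HT − HS)/HT from allele frequencies at a single locus. It deliberately omits the sample-size corrections of Nei and Chesser (1983) and has no connection to the Bayesian estimators in Hickory; the function names and the example frequencies are hypothetical.

```python
from typing import Sequence, Tuple


def gene_diversity(freqs: Sequence[float]) -> float:
    """Expected heterozygosity, 1 - sum(p_i^2), for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(pop_freqs: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Return (HS, HT, GST) for one locus with populations weighted equally.

    pop_freqs[k][i] is the frequency of allele i in population k.
    No correction for finite sample size is applied.
    """
    n_pops = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    # HS: mean within-population gene diversity.
    hs = sum(gene_diversity(f) for f in pop_freqs) / n_pops
    # HT: gene diversity of the pooled (mean) allele frequencies.
    mean_freqs = [sum(f[i] for f in pop_freqs) / n_pops for i in range(n_alleles)]
    ht = gene_diversity(mean_freqs)
    gst = (ht - hs) / ht if ht > 0 else 0.0
    return hs, ht, gst


if __name__ == "__main__":
    # Two hypothetical populations with strongly divergent frequencies.
    hs, ht, gst = nei_gst([[0.9, 0.1], [0.1, 0.9]])
    print(f"HS = {hs:.2f}, HT = {ht:.2f}, GST = {gst:.2f}")
```

For the two hypothetical populations shown, HS = 0.18, HT = 0.50, and GST = 0.64. In a multilocus analysis, HS and HT are averaged over loci before GST is calculated.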

FIS
Our results for inbreeding are quite clear in demonstrating that the codominant allozyme markers, whether using standard or Bayesian analyses, are to be preferred over AAD markers. These results are not surprising as Holsinger and Lewis (2003) cautioned that dominant data might not be appropriate for estimating f, at least under the tested Bayesian framework. Though Holsinger and Lewis (2003) did suggest that a large sample size might overcome these issues, our own simulations show no evidence of this. Even at sample sizes of 200 individuals per population, we found significant differences between estimated f and the expected (i.e., simulated) value (Fig. 5). This result suggests that estimating f in Hickory with dominant data is largely uninformative.
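One intuition for why dominant markers carry little information about f is that, at a single locus, only the frequency of band-absent individuals is observed. If q is the frequency of the null (band-absent) allele, that frequency is q² + fq(1 − q), so many combinations of q and f produce exactly the same observation. The sketch below illustrates this single-locus ambiguity only; it is not the multilocus model implemented in Hickory, and the function names and the value 0.25 are hypothetical.

```python
import math


def null_phenotype_freq(q: float, f: float) -> float:
    """Frequency of band-absent individuals (recessive homozygotes) given q and f."""
    return q * q + f * q * (1.0 - q)


def null_allele_freq(x: float, f: float) -> float:
    """Null-allele frequency q that yields band-absent frequency x for a given f."""
    if f == 1.0:
        return x  # complete inbreeding: no heterozygotes, so x equals q
    # Solve (1 - f) q^2 + f q - x = 0 for the root lying in [0, 1].
    return (-f + math.sqrt(f * f + 4.0 * (1.0 - f) * x)) / (2.0 * (1.0 - f))


if __name__ == "__main__":
    x = 0.25  # hypothetical observed frequency of band-absent individuals
    for f in (0.0, 0.25, 0.5, 0.75):
        q = null_allele_freq(x, f)
        print(f"f = {f:.2f}  q = {q:.3f}  band-absent = {null_phenotype_freq(q, f):.3f}")
```

Every (q, f) pair printed reproduces the same band-absent frequency, so the phenotype counts at one locus cannot distinguish among them. With codominant data, by contrast, heterozygotes are scored directly and f can be estimated from the deficit of observed relative to expected heterozygosity.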

Expected heterozygosity
Although the large confidence intervals for heterozygosity estimates of HS and HT using standard analyses of allozymes result in no statistical difference between them and all other analyses, a few comments can be proffered about the results. The Bayesian estimates for allozymes are substantially higher than those for ISSR markers. Nybom and Bartish (2000) and Coates and Byrne (2005) reviewed estimates of diversity from allozyme and AAD (mostly RAPD) markers. Because data in those reviews are from standard analyses of both types of markers and there is variation in the method used to produce their estimates, such as whether to include monomorphic loci (excluded in our study), they are of limited value for our purposes. However, with these caveats in mind, higher diversity estimates with AAD markers appear to be more common than those with allozymes. By contrast, our results suggest higher estimates for allozymes. Neither published data nor results of the current study provide compelling reasons for predicting a priori the relative levels of diversity estimates provided by the two markers. Additional studies are needed before a more definitive assessment can be made.

Biological vs. statistical significance
Earlier in the discussion, we alluded to the difference between statistical and biological significance (Hedrick, 1999), and several additional comments are in order because the significance issue is important to the interpretation of our analyses. In particular, the relatively small number of loci sampled in the standard allozyme approach resulted in a large degree of error in estimation and made it difficult to find this estimate significantly different from those of other methods, though, in comparison, its mean value was often quite divergent. In contrast, significant differences in estimation between the two Bayesian models (full and f-free) with ISSRs were common, despite very similar mean estimates. These findings can be attributed to two effects, that of the large number of ISSR loci analyzed and the statistical sampling procedure of the Bayesian method, which also greatly decreased error for the analysis of the 10-locus allozyme data set. Certainly, the differences presented between θII and FST, given the large error surrounding the mean, may be interpreted as more substantial than those between the two Bayesian models, especially considering that the former influences interpretations of method comparisons much more meaningfully than the latter.
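The effect of locus number on apparent significance can be sketched with a simple normal approximation in which the error of a mean across loci shrinks with the square root of the number of loci. This is only a rough illustration, not the bootstrap or Bayesian credible intervals used in our analyses; the per-locus standard deviation of 0.1 and the figure of 500 loci are hypothetical values chosen for contrast with a 10-locus data set.

```python
import math


def ci_half_width(sd_per_locus: float, n_loci: int, z: float = 1.96) -> float:
    """Approximate 95% half-width for a mean taken over n_loci independent loci."""
    return z * sd_per_locus / math.sqrt(n_loci)


if __name__ == "__main__":
    sd = 0.1  # hypothetical among-locus standard deviation of the estimate
    for n_loci in (10, 500):  # 500 is an assumed value for a large dominant data set
        print(f"{n_loci:4d} loci: mean +/- {ci_half_width(sd, n_loci):.4f}")
```

Under these assumptions, a difference in means of 0.02 would lie well inside the interval for 10 loci but outside it for 500 loci, even though the biological magnitude of the difference is identical in both cases.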

FOOTNOTES

1 The authors thank K. Holsinger, M. Holder, and J. Kelly for discussions of Bayesian methodology and its application to the current study. This research was supported by the Department of Ecology and Evolutionary Biology and the Natural History Museum & Biodiversity Research Center at the University of Kansas. A Kansas NSF EPSCoR Ecological Genomics postdoctoral award to J.A. helped to fund this research.

4 Author for correspondence (e-mail: levsenn@ku.edu)

LITERATURE CITED

Agarwal, M., N. Shrivastava, AND H. Padh. 2008. Advances in molecular marker techniques and their applications in plant sciences. Plant Cell Reports 27: 617–631.

Archibald, J. K., D. J. Crawford, A. Santos-Guerra, AND M. E. Mort. 2006. The utility of automated analysis of inter-simple sequence repeat (ISSR) loci for resolving relationships in the Canary Island species of Tolpis (Asteraceae). American Journal of Botany 93: 1154–1162.

Balloux, F. 2001. EASYPOP (version 1.7): A computer program for population genetics simulation. Journal of Heredity 92: 301–302.

Bonin, A., D. Ehrich, AND S. Manel. 2007. Statistical analysis of amplified fragment length polymorphism data: A toolbox for molecular ecologists and evolutionists. Molecular Ecology 16: 3737–3758.

Clegg, M. T. 1989. Molecular diversity in plant populations. In A. H. D. Brown, M. T. Clegg, A. L. Kahler, and B. S. Weir [eds.], Plant population, genetics, breeding, and genetic resources, 43–63. Sinauer, Sunderland, Massachusetts, USA.

Coates, D. J., AND M. Byrne. 2005. Genetic variation in plant populations: Assessing cause and pattern. In R. J. Henry [ed.], Plant diversity and evolution: Genotypic and phenotypic variation in higher plants, 139–164. CABI Publishing, Cambridge, Massachusetts, USA.

Crawford, D. J., J. K. Archibald, A. Santos-Guerra, AND M. E. Mort. 2006. Allozyme diversity within and divergence among species of Tolpis (Asteraceae-Lactuceae) in the Canary Islands: Systematic, evolutionary, and biogeographical implications. American Journal of Botany 93: 656–664.

Crawford, D. J., K. Archibald, D. Stoermer, M. E. Mort, J. Kelly, AND A. Santos-Guerra. 2008. A test of Baker’s law: Breeding systems and the radiation of Tolpis (Asteraceae) in the Canary Islands. International Journal of Plant Sciences 169: 782–791.

Crawford, D. J., T. F. Stuessy, M. B. Cosner, D. W. Haines, D. Wiens, AND P. Penailillo. 1994. Lactoris fernandeziana (Lactoridaceae) on the Juan Fernández Islands: Allozyme uniformity and field observations. Conservation Biology 8: 277–280.

Crawford, D. J., T. F. Stuessy, AND M. O. Silva. 1987. Allozyme divergence and the evolution of Dendroseris (Compositae: Lactuceae) on the Juan Fernández islands. Systematic Botany 12: 435–443.

Crawford, D. J., M. Tago-Nakazawa, T. F. Stuessy, G. J. Anderson, G. Bernardello, E. Ruiz, R. J. Jensen et al. 2001. Intersimple sequence repeat (ISSR) variation in Lactoris fernandeziana (Lactoridaceae), a rare endemic of the Juan Fernández Archipelago, Chile. Plant Species Biology 16: 185–192.

Cruzan, M. B. 1998. Genetic markers in plant evolutionary ecology. Ecology 79: 400–412.

Excoffier, L., L. G. Laval, AND S. Schneider. 2005. Arlequin ver. 3.0: An integrated software package for populations genetics data analysis. Evolutionary Bioinformatics Online 1: 47–50.

Excoffier, L., P. Smouse, AND J. Quattro. 1992. Analysis of molecular variance inferred from metric distances among DNA haplotypes: Application to human mitochondrial DNA restriction data. Genetics 131: 479–491.

Fu, R. W., A. E. Gelfand, AND K. E. Holsinger. 2003. Exact moment calculations for genetic models with migration, mutation, and drift. Theoretical Population Biology 63: 231–243.

Gottlieb, L. D. 1982. Conservation and duplication of isozymes in plants. Science 216: 373–380.

Hall, P., L. C. Orrell, AND K. S. Bawa. 1994. Genetic diversity and mating system in a tropical tree, Carapa guianensis (Meliaceae). American Journal of Botany 81: 1104–1111.

Hamrick, J. L. 1989. Isozymes and the analysis of genetic structure in plant populations. In D. E. Soltis, and P. S. Soltis [eds.], Isozymes in plant biology, 87–105. Dioscorides Press, Portland, Oregon, USA.

Hamrick, J. L., AND M. J. W. Godt. 1989. Allozyme diversity in plant species. In A. H. D. Brown, M. T. Clegg, A. L. Kahler, and B. S. Weir [eds.], Plant population, genetics, breeding, and genetic resources, 43–63. Sinauer, Sunderland, Massachusetts, USA.

Harris, H. 1966. Enzyme polymorphisms in man. Proceedings of the Royal Society of London, B, Biological Sciences 164: 298–310.

Hedrick, P. W. 1999. Perspective: Highly variable loci and their interpretation in evolution and conservation. Evolution 53: 313–318.

Holsinger, K. E. 1999. Analysis of genetic diversity in geographically structured populations: A Bayesian perspective. Hereditas 130: 245–255.

Holsinger, K. E., AND P. O. Lewis. 2003. Hickory: A package for analysis of population genetic data v1.1. Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut, USA.

Holsinger, K. E., P. O. Lewis, AND D. K. Dey. 2002. A Bayesian approach to inferring population structure from dominant markers. Molecular Ecology 11: 1157–1164.

Holsinger, K. E., AND L. E. Wallace. 2004. Bayesian approaches for the analysis of population genetic structure: An example from Platanthera leucophaea (Orchidaceae). Molecular Ecology 13: 887–894.

Huang, J. C., AND M. Sun. 2000. Genetic diversity and relationships of sweetpotato and its wild relatives in Ipomoea series Batatas (Convolvulaceae) as revealed by inter-simple sequence repeat (ISSR) and restriction analysis of chloroplast DNA. Theoretical and Applied Genetics 100: 1050–1060.

Hubby, J. L., AND R. C. Lewontin. 1966. A molecular approach to the study of genic heterozygosity in natural populations. I. The number of alleles at different loci in Drosophila pseudoobscura. Genetics 54: 577–594.

Krauss, S. L. 2000. Accurate gene diversity estimates from amplified fragment length polymorphism (AFLP) markers. Molecular Ecology 9: 1241–1245.

Lewontin, R. C., AND J. L. Hubby. 1966. A molecular approach to the study of genic heterozygosity in natural populations. II. Amount of variation and degree of heterozygosity in natural populations of Drosophila pseudoobscura. Genetics 54: 595–609.

Lynch, M., AND B. G. Milligan. 1994. Analysis of population genetic structure with RAPD markers. Molecular Ecology 3: 91–99.

Meudt, H. M., AND A. C. Clarke. 2007. Almost forgotten or latest practice? AFLP applications, analyses and advances. Trends in Plant Science 12: 106–117.

Nei, M. 1973. Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences, USA 70: 3321–3323.

Nei, M. 1977. F-statistics and analysis of gene diversity in subdivided populations. Annals of Human Genetics 41: 225–233.

Nei, M., AND R. K. Chesser. 1983. Estimation of fixation indices and gene diversities. Annals of Human Genetics 47: 253–259.

Nybom, H. 2004. Comparison of different nuclear DNA markers for estimating intraspecific genetic diversity in plants. Molecular Ecology 13: 1143–1155.

Nybom, H., AND I. V. Bartish. 2000. Effects of life history traits and sampling strategies on genetic diversity estimates obtained with RAPD markers in plants. Perspectives in Plant Ecology, Evolution and Systematics 3: 93–114.

Park, S.-J., E. J. Korompai, J. Francisco-Ortega, A. Santos-Guerra, AND R. K. Jansen. 2001. Phylogenetic relationships of Tolpis (Asteraceae: Lactuceae) based on ndhF sequence data. Plant Systematics and Evolution 226: 23–33.

Schulman, A. H. 2007. Molecular markers to assess genetic diversity. Euphytica 158: 313–321.

Schwartz, O. A. 1985. Lack of protein polymorphism in the endemic relict Chrysosplenium iowense (Saxifragaceae). Canadian Journal of Botany 63: 2031–2034.

Sunnucks, P. 2000. Efficient genetic markers for population biology. Trends in Ecology & Evolution 15: 199–203.

Virk, P. S., J. Zhu, H. J. Newbury, G. J. Bryan, M. T. Jackson, AND B. V. Ford-Lloyd. 2000. Effectiveness of different classes of molecular markers for classifying and revealing variation in rice (Oryza sativa) germplasm. Euphytica 112: 275–284.

Volis, S., B. Yakubov, I. Shulgina, D. Ward, AND S. Mendlinger. 2005. Distinguishing adaptive from nonadaptive genetic differentiation: comparison of QST and FST at two spatial scales. Heredity 95: 466–475.

Weeden, N. F., AND J. F. Wendel. 1989. Genetics of plant isozymes. In D. E. Soltis, and P. S. Soltis [eds.], Isozymes in plant biology, 87–105. Dioscorides Press, Portland, Oregon, USA.

Weir, B. S. 1996. Genetic data analysis II. Sinauer, Sunderland, Massachusetts, USA.

Weir, B. S., AND C. C. Cockerham. 1984. Estimating F-statistics for the analysis of population structure. Evolution 38: 1358–1370.

Wendel, J. F., AND N. F. Weeden. 1989. Visualization and interpretation of plant isozymes. In D. E. Soltis, and P. S. Soltis [eds.], Isozymes in plant biology, 5–45. Dioscorides Press, Portland, Oregon, USA.

Wright, S. 1943. Isolation by distance. Genetics 28: 114–138.

Wright, S. 1951. The genetical structure of populations. Annals of Eugenics 15: 323–354.

Yeh, F. C., AND T. J. B. Boyle. 1997. Population genetic analysis of co-dominant and dominant markers and quantitative traits. Belgian Journal of Botany 129: 157.

Yeh, F. C., R. Yang, AND T. J. B. Boyle. 1997. POPGENE version 1.32. Ag/For Molecular Biology and Biotechnology Centre, University of Alberta and Center for International Forestry Research, Edmonton, Alberta, Canada. Website http://www.ualberta.ca/~fyeh.

Zar, J. H. 1999. Biological statistics. Prentice Hall, Upper Saddle River, New Jersey, USA.

Zeng, J., Y. Zou, J. Bai, AND H. Zheng. 2003. RAPD analysis of genetic variation in natural populations of Betula alnoides from Guangxi, China. Euphytica 134: 33–41.

Zhivotovsky, L. A. 1999. Estimating population structure in diploids with multilocus dominant DNA markers. Molecular Ecology 8: 907–913.

