Population Biology
Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, 1735 Neil Avenue, Columbus, Ohio 43210-1293 USA
Received for publication May 3, 2001. Accepted for publication August 28, 2001.
| ABSTRACT |
|---|
Key Words: genetic variation; GST; Hamrick and Godt; population differentiation
| INTRODUCTION |
|---|
As one of Nei's (1973) genetic diversity statistics, GST is defined as the proportion of genetic diversity that resides among populations. It is equivalent to Wright's (1951) FST when there are only two alleles at a locus, and, in the case of multiple alleles, GST is equivalent to the weighted average of FST for all alleles (Nei, 1973). GST is also similar to Weir and Cockerham's (1984) θ, except that the latter accounts for effects of uneven sample sizes and number of sampled populations. Although rare, θ may take on negative values (Weir, 1996). GST is calculated from the total genetic diversity in the pooled populations (HT) and mean diversity within each population (HS) as:

$$G_{ST} = \frac{H_T - H_S}{H_T} \qquad (1)$$

or, equivalently,

$$G_{ST} = 1 - \frac{H_S}{H_T} \qquad (2)$$

where genetic diversity is $H = 1 - \sum_i p_i^2$ and $p_i$ is the frequency of a given allele (Nei, 1973).
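As a minimal sketch of the single-locus calculation described above, the following Python computes HT, HS, and GST from a table of allele frequencies. The function names, the equal weighting of populations, and the example frequencies are illustrative assumptions, not values or procedures taken from the paper.

```python
def gene_diversity(freqs):
    """Gene diversity H = 1 - sum(p_i^2) for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def locus_gst(pop_freqs):
    """Return (H_T, H_S, G_ST) for one locus.

    pop_freqs: list of populations, each a list of allele frequencies given
    in the same allele order. Populations are weighted equally (assumption).
    """
    n_pops = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    # H_S: mean of the within-population diversities.
    h_s = sum(gene_diversity(p) for p in pop_freqs) / n_pops
    # H_T: diversity of the pooled (mean) allele frequencies.
    pooled = [sum(p[a] for p in pop_freqs) / n_pops for a in range(n_alleles)]
    h_t = gene_diversity(pooled)
    if h_t == 0:
        raise ValueError("G_ST is undefined for a monomorphic locus (H_T = 0)")
    return h_t, h_s, (h_t - h_s) / h_t


# Hypothetical two-population, two-allele locus.
example = [[0.8, 0.2], [0.4, 0.6]]
h_t, h_s, g_st = locus_gst(example)
print(f"H_T = {h_t:.3f}, H_S = {h_s:.3f}, G_ST = {g_st:.3f}")
```

For these made-up frequencies, HS is the mean of 0.32 and 0.48, HT is computed from the pooled frequencies (0.6, 0.4), and GST is their relative difference.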
In the Nei method, HT and HS values are first averaged across all loci, and GST is calculated from these mean values according to Eq. 1. Although both monomorphic and polymorphic loci are usually used in calculations of HT and HS (Nei, 1986), Nei's GST is unaltered by the inclusion of monomorphic loci because they contribute to both the numerator and denominator; in effect, they cancel each other out (i.e., N is absent in Eq. 14 below). Nei's GST can be rewritten as the following, where N is the number of all loci:
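The cancellation of N can be made explicit with a short derivation. This is a sketch in notation assumed here (H_{T,i} and H_{S,i} for the diversities at locus i), and it is not a reproduction of the paper's numbered equations:

```latex
G_{ST}^{\mathrm{Nei}}
  = 1 - \frac{\bar{H}_S}{\bar{H}_T}
  = 1 - \frac{\frac{1}{N}\sum_{i=1}^{N} H_{S,i}}{\frac{1}{N}\sum_{i=1}^{N} H_{T,i}}
  = 1 - \frac{\sum_{i=1}^{N} H_{S,i}}{\sum_{i=1}^{N} H_{T,i}}
```

A monomorphic locus has H_{T,i} = H_{S,i} = 0, so it adds nothing to either sum, and the factor 1/N that it changes appears in both the numerator and the denominator.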
As evident in Eqs. 8 and 14, the HG and Nei methods of calculating GST are not mathematically identical. The two methods yield the same value only in a few special cases. First, if all populations are completely differentiated from one another, all of the diversity lies among populations rather than within them (HS = 0); if this is true for all loci, both methods yield a GST value of one. Second, if the total diversity is contained within each population (HS = HT) for all loci, both methods yield a GST value of zero. Third, the HG and Nei values of GST are equivalent if the ratio HS/HT is identical for all loci. Finally, the two values also coincide when HT is identical over all loci. If none of these conditions holds, so that HS/HT differs among loci (e.g., 0 < HS < HT at some loci) and HT also varies, the GST values calculated using the Nei and HG methods will generally differ to some extent from one another.
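The special cases listed above can be checked numerically. The sketch below assumes each locus is summarized by a hypothetical (HT, HS) pair; the function names and all numbers are made up for illustration. The HG value is taken as the mean of per-locus GST over polymorphic loci, and the Nei value as one ratio of the mean HS to the mean HT over all loci.

```python
def hg_gst(loci):
    """HG method: average per-locus G_ST over polymorphic loci (H_T > 0)."""
    values = [1.0 - h_s / h_t for h_t, h_s in loci if h_t > 0]
    return sum(values) / len(values)


def nei_gst(loci):
    """Nei method: average H_T and H_S over all loci, then take one ratio."""
    mean_h_t = sum(h_t for h_t, _ in loci) / len(loci)
    mean_h_s = sum(h_s for _, h_s in loci) / len(loci)
    return 1.0 - mean_h_s / mean_h_t


# Each case is a list of hypothetical (H_T, H_S) pairs, one pair per locus.
cases = {
    "complete differentiation (H_S = 0)": [(0.5, 0.0), (0.3, 0.0)],
    "no differentiation (H_S = H_T)": [(0.5, 0.5), (0.3, 0.3)],
    "constant H_S/H_T": [(0.5, 0.25), (0.3, 0.15)],
    "identical H_T": [(0.4, 0.1), (0.4, 0.3)],
    "general case": [(0.5, 0.1), (0.2, 0.18)],
}
for name, loci in cases.items():
    print(f"{name}: HG = {hg_gst(loci):.3f}, Nei = {nei_gst(loci):.3f}")
```

In the first four cases the two functions return the same value; in the general case the HG value (0.450) and the Nei value (0.600) differ.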
| MATERIALS AND METHODS |
|---|
) and (2) whether the statistic was compared to GST values presented in Hamrick and Godt (1989)
Of the 695 studies that cited Hamrick and Godt (1989), a large number (45%) did not calculate any statistics. Several other papers included various measures of population differentiation, such as FST (15%),
(3%),
ST (<1%), and
(<1%). Over a third of the total papers (36% or 252 studies) reported a GST value. Of these, 49 studies contained insufficient information so that (1) neither GST value could be recomputed, (2) only one value could be recalculated and this did not match the reported GST, or (3) both values could be recalculated but neither matched the reported number. These studies were not considered further.
Of the 203 remaining studies in which GST could be recalculated, the method (HG or Nei) was confidently determined in 167 papers (82%) because enough data were given to calculate GST using both methods. This set of papers will hereafter be referred to as HGconfident or Neiconfident, depending upon which method could be confidently assigned. In the remaining 36 papers (18%), the method could not be established with certainty because insufficient data made it impossible to calculate GST according to both methods. For example, mean HT and HS values were sometimes reported (allowing computation of Nei's GST), but tables of allele frequencies or GST values across loci were not given (i.e., HG's GST could not be calculated). In these papers, the GST value that could be recalculated matched the reported GST within the scope of rounding error (approximately ±0.01). These studies will henceforth be referred to as HGinfer or Neiinfer, depending upon which method could be inferred.
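The assignment rule described above can be sketched as a small function. The function name, the label strings, and the handling of the case in which both recalculated values match the reported one (returned here as "ambiguous") are assumptions made for illustration; only the ±0.01 rounding tolerance and the confident/infer distinction come from the text.

```python
def infer_method(reported, hg_value=None, nei_value=None, tol=0.01):
    """Label a study by which recalculated G_ST matches the reported value.

    hg_value / nei_value are None when the paper lacks the data needed to
    recompute G_ST by that method.
    """
    eps = 1e-9  # guards against floating-point noise at the tolerance edge

    def matches(value):
        return value is not None and abs(reported - value) <= tol + eps

    hg_ok, nei_ok = matches(hg_value), matches(nei_value)
    if hg_value is not None and nei_value is not None:
        # Both methods could be recalculated: a confident assignment.
        if hg_ok and nei_ok:
            return "ambiguous"  # assumption: not covered by the text
        if hg_ok:
            return "HG_confident"
        if nei_ok:
            return "Nei_confident"
        return "excluded"  # neither recalculated value matches
    # Only one method could be recalculated: the method is inferred.
    if hg_ok:
        return "HG_infer"
    if nei_ok:
        return "Nei_infer"
    return "excluded"


print(infer_method(0.45, hg_value=0.452, nei_value=0.60))
print(infer_method(0.30, nei_value=0.305))
```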
| RESULTS |
|---|
61% of these papers contained comparisons to Hamrick and Godt's (1989)
| DISCUSSION |
|---|
0.01–0.80). In contrast, papers in which there were no substantial differences between the two values (
0.002) generally had a lower range of GST values across loci (0.0–0.30), although there were some exceptions. In addition, similar GST values were obtained in rare cases in which populations were either completely fixed for different alleles (HS = 0) or had identical allele frequencies (HS = HT) for all loci (two cases noted earlier). A wide range of GST values usually resulted from a mixture of uneven allele frequencies across populations at some loci (resulting in high GST values) and similar allele frequencies across populations at other loci (low GST). Uneven frequencies at individual loci were largely due to fixation of different alleles and/or the loss of a common allele within a few populations, which could occur as a result of a reduction in effective population size and subsequent genetic drift. A low range of GST values typically reflected a similarity of allele frequencies across all populations for all loci.
The HG and Nei GST values may also diverge because of a difference in their underlying mathematical properties. As apparent in Fig. 3, the Nei method gave relatively higher GST values than the HG technique in a number of papers. This is an example of Jensen's inequality, a mathematical property of nonlinear functions (Hansen, 2000). Essentially, the difference between the HG and Nei methods is how a ratio (HS/HT) is averaged before it is subtracted from one (Eq. 2). In this particular case, Jensen's inequality implies that the mean of the per-locus ratios HS/HT (the HG method) will be at least as large as the ratio of the mean HS to the mean HT (the Nei method), assuming that the numerator (HS) and denominator (HT) are independent of one another. If true, once the ratio is subtracted from one, the HG estimate of GST should be relatively lower than the Nei estimate. In an analysis of ten studies with large GST differences, HT and HS values were less correlated (more independent) with one another (r = 0.61, P = 0.58) than in ten other studies in which there was no difference between Nei and HG values (r = 0.98, P = 0.0001). Thus, there are certain cases in which the Nei method will give relatively higher GST values than the HG method simply because of the way means are calculated.
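The direction of the effect can be illustrated with a small numerical sketch. It assumes a made-up set of loci in which every HS value is paired with every HT value, so that the two are independent across loci; the specific numbers are hypothetical. Because 1/HT is a convex function, the mean of the per-locus ratios HS/HT then comes out larger than the ratio of the means, and it is after subtraction from one that the HG value falls below the Nei value.

```python
from itertools import product

hs_values = [0.1, 0.3]
ht_values = [0.4, 0.8]
# Pair every H_S with every H_T so the two are independent across "loci".
loci = list(product(hs_values, ht_values))

# Mean of the per-locus ratios H_S/H_T (the quantity averaged by the HG method).
mean_of_ratios = sum(hs / ht for hs, ht in loci) / len(loci)

# Ratio of the mean H_S to the mean H_T (the quantity used by the Nei method).
mean_hs = sum(hs for hs, _ in loci) / len(loci)
mean_ht = sum(ht for _, ht in loci) / len(loci)
ratio_of_means = mean_hs / mean_ht

hg = 1.0 - mean_of_ratios
nei = 1.0 - ratio_of_means
print(f"mean of ratios = {mean_of_ratios:.3f}, ratio of means = {ratio_of_means:.3f}")
print(f"HG G_ST = {hg:.3f}, Nei G_ST = {nei:.3f}")
```

With these numbers the mean of ratios is 0.375 and the ratio of means is about 0.333, giving an HG value of 0.625 and a Nei value of about 0.667.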
A closer examination also revealed that the Nei method appears to be more sensitive to interlocus variation in allele frequencies than the HG technique. If HT increases relative to HS for a locus, Nei's GST will increase while HG's value remains the same. For example, the addition of a third population fixed for a third allele to a system results in an increase of Nei's GST, while the HG value remains unchanged (Table 3). As more populations and fixed alleles are added to the first locus, Nei's GST increases, while HG's value remains the same.
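A hypothetical two-locus system shows the pattern described above; it is an illustrative sketch and does not reproduce the values of Table 3. At the first locus each population is assumed to be fixed for its own allele, and at the second locus all populations are assumed to share the frequencies (0.5, 0.5). Populations are weighted equally, and the function names are invented for this example.

```python
def gene_diversity(freqs):
    """Gene diversity H = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def locus_diversities(pop_freqs):
    """Return (H_T, H_S) for one locus; populations weighted equally."""
    n_pops = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    h_s = sum(gene_diversity(p) for p in pop_freqs) / n_pops
    pooled = [sum(p[a] for p in pop_freqs) / n_pops for a in range(n_alleles)]
    return gene_diversity(pooled), h_s


def both_gst(k):
    """HG and Nei G_ST for k populations in the hypothetical two-locus system."""
    # Locus 1: population j is fixed for allele j.
    locus1 = [[1.0 if a == pop else 0.0 for a in range(k)] for pop in range(k)]
    # Locus 2: identical allele frequencies in every population.
    locus2 = [[0.5, 0.5] for _ in range(k)]
    loci = [locus_diversities(locus1), locus_diversities(locus2)]
    hg = sum(1.0 - h_s / h_t for h_t, h_s in loci) / len(loci)
    nei = 1.0 - sum(h_s for _, h_s in loci) / sum(h_t for h_t, _ in loci)
    return hg, nei


for k in (2, 3, 4):
    hg, nei = both_gst(k)
    print(f"{k} populations: HG = {hg:.3f}, Nei = {nei:.3f}")
```

Under these assumptions the HG value stays at 0.500 as populations are added, because the per-locus GST values remain one and zero, while the Nei value rises (0.500, 0.571, 0.600) because HT at the first locus grows while HS does not.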
If a comparison to a published study is not intended, an additional factor to consider in choosing an appropriate method is whether one technique is mathematically or biologically more meaningful than the other. Unfortunately, there is no clear answer about which method is best to use. From a mathematical viewpoint, there is no inherent reason to prefer taking the mean of ratios over the ratio of means, but the HG method is more likely to generate lower results than Nei's method (a consequence of Jensen's inequality). At present, there is no way of knowing which value best represents population genetic structure in a given situation. The HG method gives equal weight to loci with high and low HT values, which may result in a range of different GST values across loci; the Nei method would be less affected by such loci (J. Hamrick, University of Georgia, personal communication). The Nei method may be more biologically meaningful, as it is more sensitive to variation in allele frequencies across populations (the very factor promoting population differentiation). However, investigators should consider using other genetic measures instead of GST (see below) if a comparison to Hamrick and Godt's (1989) review is not intended. For example, θ is advantageous because it has a real biological definition (correlation of uniting gametes) and is unbiased with respect to sample size and the number of sampled populations (Weir and Cockerham, 1984).
Based on our review, we have several suggestions for future studies in which GST values are presented. First, the method of calculation should be clearly explained; many studies have cited Nei (1973) without any further explanation. At the very least, the HG method can be described as "GST values were averaged over polymorphic loci," while the Nei method can be expressed as "the mean values of HT and HS over all loci were used to calculate GST according to Nei (1973)." Second, GST values should be reported for individual loci (along with HT, HS, and DST) to facilitate calculations of both HG and Nei GST values and to allow an examination of variation in these statistics. Several studies only reported mean values of HT, HS, and GST for the species or population(s) in question. Third, sample sizes should be given in allele frequency tables, as the absence of this information makes it impossible to recalculate GST values. Fourth, scientists should carefully consider whether biased or unbiased estimates of HT and HS are appropriate for the data, especially when studying rare species (see Nei, 1978, 1986; Nei and Chesser, 1983; Chakraborty and Danker-Hopfe, 1991). Finally, there was some confusion in the literature over the use of HT, HS, Hes, and Hep when comparing results to Hamrick and Godt (1989). In that review, genetic diversity was calculated over all loci (monomorphic and polymorphic) as Hes at the species level by pooling across all populations, and as Hep at the population level (mean of population values). These statistics are analogous to Nei's (1973) HT and HS, respectively. However, the HT and HS values reported in Hamrick and Godt (1989) are calculated over polymorphic loci only, and as such, are not directly comparable with Nei's HT and HS. Thus, it is important that the researcher state whether all loci or only polymorphic loci were used to calculate HT and HS.
In this paper, we have been concerned with GST as a measure of population differentiation because it is commonly used and there was an urgent need to clarify how it is calculated. Whether or not GST is most appropriate as a measure of population structure is yet another issue. Several other statistics do exist (e.g., FST,
, RST,
ST) and may be more suitable than GST in many situations. For example, GST is dependent on sample sizes and number of populations, in addition to its reliance on Hardy-Weinberg genotype proportions (P. Lewis, University of Connecticut, personal communication), conditions that may be violated when analyzing small isolated populations. Although Nei (1986) and Nei and Chesser (1983) suggested that unbiased estimates of GST can be obtained through modifications of the original formula (but see Cockerham and Weir, 1986), others have argued that θ is a better estimator (Weir and Cockerham, 1984; but see Chakraborty and Danker-Hopfe, 1991). In view of this debate, investigators should include raw genotype counts in published studies so that θ can be recomputed if necessary for comparison with other studies. Although a discussion of these genetic statistics is beyond the scope of the current paper, the researcher should carefully consider which statistic is best for the system under investigation.
| FOOTNOTES |
|---|
2 Author for reprint requests, current address: Department of Ecology and Evolutionary Biology, University of California-Irvine, Irvine, California 92697-2525 USA (tel: 949-824-1772, FAX: 949-824-2181; tculley{at}uci.edu)
3 Current address: Department of Ecology and Evolutionary Biology, University of Kansas, 1200 Sunnyside Avenue, Lawrence, Kansas 66045-7534 USA
| LITERATURE CITED |
|---|
Bossart, J. L., and D. P. Prowell. 1998. Genetic estimates of population structure and gene flow: limitations, lessons and new directions. Trends in Ecology and Evolution 13: 202-206.
Ceska, J. F., J. M. Affolter, and J. L. Hamrick. 1997. Developing a sampling strategy for Baptisia arachnifera based on allozyme diversity. Conservation Biology 11: 1133-1139.
Chakraborty, R., and H. Danker-Hopfe. 1991. Analysis of population structure: a comparative study of different estimators of Wright's fixation indices. In C. R. Rao and R. Chakraborty [eds.], Handbook of statistics, vol. 8, Statistical methods in biological and medical sciences, 203-254. Elsevier Science, New York, New York, USA.
Cockerham, C. C., and B. S. Weir. 1986. Estimation of inbreeding parameters in stratified populations. Annals of Human Genetics 50: 271-281.
Gitzendanner, M. A., and P. S. Soltis. 2000. Patterns of genetic variation in rare and widespread plant congeners. American Journal of Botany 87: 783-792.
Hamrick, J. L., and M. J. W. Godt. 1989. Allozyme diversity in plant species. In A. H. D. Brown, M. T. Clegg, A. L. Kahler, and B. S. Weir [eds.], Plant population genetics, breeding, and genetic resources, 43-63. Sinauer, Sunderland, Massachusetts, USA.
Hansen, F. 2000. Operator inequalities associated with Jensen's inequality. In T. M. Rassias [ed.], Survey on classical inequalities, 67-98. Kluwer Academic, Dordrecht, The Netherlands.
Nei, M. 1973. Analysis of gene diversity in subdivided populations. Proceedings of the National Academy of Sciences, USA 70: 3321-3323.
Nei, M. 1978. Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics 89: 583-590.
Nei, M. 1986. Definition and estimation of fixation indices. Evolution 40: 643-645.
Nei, M., and R. K. Chesser. 1983. Estimation of fixation indices and gene diversities. Annals of Human Genetics 47: 253-259.
Weir, B. S., and C. C. Cockerham. 1984. Estimating F-statistics for the analysis of population structure. Evolution 38: 1358-1370.
Weir, B. S. 1996. Genetic data analysis II. Sinauer, Sunderland, Massachusetts, USA.
Whitlock, M. C., and D. E. McCauley. 1999. Indirect measures of gene flow and migration: FST ≠ 1/(4Nm + 1). Heredity 82: 117-125.
Wright, S. 1951. The genetical structure of populations. Annals of Eugenics 15: 323-354.