Population stratification

Population stratification (or population structure) is the presence of a systematic difference in allele frequencies between subpopulations in a population, possibly due to different ancestry, especially in the context of association studies. The basic cause of population stratification is non-random mating between groups, often due to their physical separation (e.g., for populations of African and European descent), followed by genetic drift of allele frequencies in each group. In some contemporary populations there has been recent admixture between individuals from different populations, leading to populations in which ancestry is variable (as in African Americans). In some parts of the globe (e.g., in Europe), population structure is best modeled by isolation-by-distance, in which allele frequencies tend to vary smoothly with location. Population stratification can be a problem for association studies, such as case-control studies, where an association may be found because of the underlying structure of the population rather than a disease-associated locus. By analogy, one might imagine a scenario in which certain small beads are made out of a unique type of foam, and children tend to choke on these beads; one might wrongly conclude that the foam material causes choking when in fact it is the small size of the beads. Conversely, the true disease-causing locus might not be found in the study if it is less prevalent in the population from which the case subjects are chosen. For this reason, it was common in the 1990s to use family-based data, where the effect of population stratification can easily be controlled for using methods such as the transmission disequilibrium test (TDT).
If the structure is known, or a putative structure is found, there are a number of ways to account for it in association studies and thus compensate for any population bias. Most contemporary genome-wide association studies take the view that the problem of population stratification is manageable, and that the logistic advantages of using unrelated cases and controls make these studies preferable to family-based association studies. The two most widely used approaches to this problem are genomic control, which is a relatively nonparametric method for controlling the inflation of test statistics, and structured association methods, which use genetic information to estimate and control for population structure. Currently, the most widely used structured association method is Eigenstrat, developed by Alkes Price and colleagues. The assumption of population homogeneity in association studies, especially case-control studies, can easily be violated and can lead to both type I and type II errors. It is therefore important for the models used in the study to compensate for the population structure. The problem in case-control studies is that, if there is a genetic involvement in the disease, the cases are more likely to be related to one another than the individuals in the control population are. This means that the assumption of independence of observations is violated. Often this leads to an overestimation of the significance of an association, but the effect depends on the way the sample was chosen. If, coincidentally, an allele has a higher frequency in a subpopulation of the cases, that allele will appear to be associated with any trait that is more prevalent in the case population. This kind of spurious association increases as the sample population grows, so the problem should be of special concern in large-scale association studies in which loci have only relatively small effects on the trait.
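The way pooling two subpopulations can manufacture an association is easy to reproduce with a small arithmetic example. The sketch below uses invented allele counts (all numbers and the helper name `chi2_2x2` are hypothetical, chosen only for illustration): within each subpopulation the allele is unrelated to case status, but the subpopulations differ in both allele frequency and disease prevalence, so the pooled table shows a strong association.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

# Hypothetical allele counts per subpopulation, in the order
# (case_alt, case_ref, control_alt, control_ref).
# Subpopulation A: allele frequency 0.8, 800 cases and 200 controls.
subpop_a = (1280, 320, 320, 80)
# Subpopulation B: allele frequency 0.2, 200 cases and 800 controls.
subpop_b = (80, 320, 320, 1280)

# Pooling ignores the structure and simply adds the counts cell by cell.
pooled = tuple(x + y for x, y in zip(subpop_a, subpop_b))

chi2_a = chi2_2x2(*subpop_a)        # no association within A
chi2_b = chi2_2x2(*subpop_b)        # no association within B
chi2_pooled = chi2_2x2(*pooled)     # large statistic caused only by structure

print(chi2_a, chi2_b, chi2_pooled)
```

In this constructed case both within-group statistics are exactly zero, while the pooled statistic is far above any conventional significance threshold, even though the allele has no effect on the disease.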
A method that can, in some cases, compensate for the problems described above was developed by Devlin and Roeder (1999). It uses both a frequentist and a Bayesian approach (the latter being appropriate when dealing with a large number of candidate genes). The frequentist correction uses markers that are not linked to the trait in question to correct for any inflation of the test statistic caused by population stratification. The method was first developed for binary traits but has since been generalized to quantitative ones. For binary traits, where the aim is to find genetic differences between the case and control populations, Devlin and Roeder (1999) use Armitage's trend test

\[ Y^{2} = \frac{N\,\bigl(N(r_{1}+2r_{2}) - R(n_{1}+2n_{2})\bigr)^{2}}{R\,(N-R)\,\bigl(N(n_{1}+4n_{2}) - (n_{1}+2n_{2})^{2}\bigr)} \]

and the \(\chi^{2}\) test for allelic frequencies

\[ \chi^{2} \sim X_{A}^{2} = \frac{2N\,\bigl(2N(r_{1}+2r_{2}) - 2R(n_{1}+2n_{2})\bigr)^{2}}{4R\,(N-R)\,\bigl(2N(n_{1}+2n_{2}) - (n_{1}+2n_{2})^{2}\bigr)} \]
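As a rough illustration, the two statistics can be computed directly from genotype counts. The sketch below assumes the usual reading of the symbols, which the text here does not spell out: r1 and r2 are the numbers of cases carrying one and two copies of the allele, n1 and n2 are the corresponding numbers in the whole sample, R is the number of cases and N the total sample size. The allelic statistic is written as the ordinary Pearson chi-square on the 2×2 table of allele counts. The function names are hypothetical, and the inflation estimate (median of the statistics at unlinked markers divided by 0.456, the approximate median of a chi-square distribution with one degree of freedom) is a minimal version of the genomic control idea, not a full implementation.

```python
from statistics import median

def trend_test(r1, r2, n1, n2, R, N):
    """Armitage trend statistic Y^2 with genotype scores 0, 1, 2."""
    s_case = r1 + 2 * r2          # allele count among cases
    s_all = n1 + 2 * n2           # allele count in the whole sample
    num = N * (N * s_case - R * s_all) ** 2
    den = R * (N - R) * (N * (n1 + 4 * n2) - s_all ** 2)
    return num / den

def allelic_test(r1, r2, n1, n2, R, N):
    """Pearson chi-square on allele counts (2N alleles, 2R of them in cases)."""
    s_case = r1 + 2 * r2
    s_all = n1 + 2 * n2
    num = 2 * N * (2 * N * s_case - 2 * R * s_all) ** 2
    den = 4 * R * (N - R) * (2 * N * s_all - s_all ** 2)
    return num / den

def inflation_factor(null_stats):
    """Estimate lambda from statistics at markers assumed unlinked to the trait."""
    return median(null_stats) / 0.456

# Hypothetical counts: 100 cases with genotypes (30, 50, 20) and
# 100 controls with genotypes (50, 40, 10) for 0, 1, 2 copies.
y2 = trend_test(r1=50, r2=20, n1=90, n2=30, R=100, N=200)
xa2 = allelic_test(r1=50, r2=20, n1=90, n2=30, R=100, N=200)

# Hypothetical statistics at unlinked markers; the candidate statistic
# is divided by the estimated inflation factor.
lam = inflation_factor([0.3, 0.912, 1.5])
y2_corrected = y2 / lam
print(y2, xa2, lam, y2_corrected)
```

When the genotype counts in the combined sample are in Hardy-Weinberg proportions, the two statistics coincide; otherwise they differ, as in the example counts above.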
