Power and sample size calculations for case-control genetic association tests when errors are present: application to single nucleotide polymorphisms

D Gordon, SJ Finch, M Nothnagel, J Ott - Human heredity, 2002 - karger.com
Abstract
The purpose of this work is to quantify the effects that genotyping errors have on power and on the sample size necessary to maintain constant asymptotic Type I and Type II error rates (SSN) in case-control genetic association studies between a disease phenotype and a di-allelic marker locus, for example a single nucleotide polymorphism (SNP) locus. We consider the effects of three published models of genotyping errors on the chi-square test for independence in the 2 × 3 table. After specifying genotype frequencies for the marker locus conditional on disease status and error model, in both a genetic model-based and a genetic model-free framework, we compute the asymptotic power to detect association by specifying the test’s non-centrality parameter. This parameter determines the functional dependence of SSN on the genotyping error rates. Additionally, we study the dependence of SSN on linkage disequilibrium (LD), marker allele frequencies, and genotyping error rates for a dominant disease model. A higher genotyping error rate requires a larger SSN. Every 1% increase in the sum of the genotyping error rates requires that both the case and the control SSN be increased by 2–8%, with the size of the increase depending on the error model. For the dominant disease model, SSN is a nonlinear function of LD and genotyping error rate, with greater SSN for lower LD and higher genotyping error rate. The combination of lower LD and higher genotyping error rates requires a larger SSN than the sum of the SSN for the lower LD and for the higher genotyping error rate.
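The link between the non-centrality parameter and SSN can be sketched numerically. The example below is a minimal illustration under stated assumptions, not the authors' computation. It assumes equal numbers of cases and controls and a standard large-sample approximation in which the non-centrality parameter of the 2 × 3 chi-square test equals N times the sum over genotypes of (p_case − p_control)² / (p_case + p_control). The symmetric error model in `observed_freqs` is hypothetical and is not one of the three published models the abstract refers to. All function names and the example genotype frequencies are made up for illustration.

```python
import math

def chi2_cdf_even_df(x, k):
    """CDF of a central chi-square with 2*k degrees of freedom (closed form)."""
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(k):
        total += term
        term *= half / (i + 1)
    return 1.0 - math.exp(-half) * total

def power_2df(ncp, alpha=0.05, max_terms=300):
    """Power of the 2-df chi-square test for a given non-centrality parameter.

    Uses the Poisson-mixture representation of the non-central chi-square.
    """
    crit = -2.0 * math.log(alpha)      # upper-alpha quantile of chi-square, 2 df
    weight = math.exp(-ncp / 2.0)      # Poisson(ncp/2) probability of j = 0
    cdf = 0.0
    for j in range(max_terms):
        cdf += weight * chi2_cdf_even_df(crit, j + 1)
        weight *= (ncp / 2.0) / (j + 1)
    return 1.0 - cdf

def ncp_for_power(target=0.80, alpha=0.05):
    """Smallest non-centrality parameter reaching the target power (bisection)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if power_2df(mid, alpha) < target:
            lo = mid
        else:
            hi = mid
    return hi

def observed_freqs(true_freqs, eps):
    """Hypothetical error model (an assumption, not from the paper):
    each genotype is misread as each of the other two with probability eps/2."""
    return [(1.0 - eps) * p + (eps / 2.0) * (1.0 - p) for p in true_freqs]

def ncp_per_subject(case_freqs, control_freqs):
    """Non-centrality contributed per case-control pair (assumed approximation)."""
    return sum((a - u) ** 2 / (a + u) for a, u in zip(case_freqs, control_freqs))

def ssn_per_group(case_freqs, control_freqs, eps, target=0.80, alpha=0.05):
    """Cases (and, equally, controls) needed for the target power at error rate eps."""
    a = observed_freqs(case_freqs, eps)
    u = observed_freqs(control_freqs, eps)
    return math.ceil(ncp_for_power(target, alpha) / ncp_per_subject(a, u))

# Made-up genotype frequencies (three genotypes) for cases and controls.
cases = [0.50, 0.40, 0.10]
controls = [0.60, 0.33, 0.07]
for eps in (0.0, 0.01, 0.02, 0.05):
    print(eps, ssn_per_group(cases, controls, eps))
```

Because the non-centrality parameter is linear in N under this approximation, the required sample size is the target non-centrality divided by the per-subject contribution, so any error model that shrinks the observed frequency differences between cases and controls raises SSN.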