PLoS Genet. 2013 Mar;9(3):e1003348.
doi: 10.1371/journal.pgen.1003348. Epub 2013 Mar 21.

Power and predictive accuracy of polygenic risk scores

Frank Dudbridge. PLoS Genet. 2013 Mar.

Erratum in

  • PLoS Genet. 2013 Apr;9(4). doi: 10.1371/annotation/b91ba224-10be-409d-93f4-7423d502cba0

Abstract

Polygenic scores have recently been used to summarise genetic effects among an ensemble of markers that do not individually achieve significance in a large-scale association study. Markers are selected using an initial training sample and used to construct a score in an independent replication sample by forming the weighted sum of associated alleles within each subject. Association between a trait and this composite score implies that a genetic signal is present among the selected markers, and the score can then be used for prediction of individual trait values. This approach has been used to obtain evidence of a genetic effect when no single markers are significant, to establish a common genetic basis for related disorders, and to construct risk prediction models. In some cases, however, the desired association or prediction has not been achieved. Here, the power and predictive accuracy of a polygenic score are derived from a quantitative genetics model as a function of the sizes of the two samples, explained genetic variance, selection thresholds for including a marker in the score, and methods for weighting effect sizes in the score. Expressions are derived for quantitative and discrete traits, the latter allowing for case/control sampling. A novel approach to estimating the variance explained by a marker panel is also proposed. It is shown that published studies with significant association of polygenic scores have been well powered, whereas those with negative results can be explained by low sample size. It is also shown that useful levels of prediction may only be approached when predictors are estimated from very large samples, up to an order of magnitude greater than currently available. Therefore, polygenic scores currently have more utility for association testing than predicting complex traits, but prediction will become more feasible as sample sizes continue to grow.
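The two-sample procedure described in the abstract (select markers in a training sample, form a weighted sum of alleles in an independent replication sample, then test the score against the trait) can be illustrated with a small simulation. This is a minimal sketch and not the paper's analytic derivation: it assumes a quantitative trait, independent markers, standardized genotypes, and normal approximations for all P-values. The function names, sample sizes, marker counts, and the 30% variance explained are hypothetical choices made for illustration only.

```python
import math
import numpy as np


def simulate_sample(n, betas, freqs, rng):
    """Simulate standardized allele counts and a roughly unit-variance trait."""
    # Allele counts (0, 1 or 2) at independent markers (assumption: no LD).
    counts = rng.binomial(2, freqs, size=(n, betas.size)).astype(float)
    G = (counts - counts.mean(axis=0)) / counts.std(axis=0)
    # Residual noise is scaled so that the markers explain sum(betas**2).
    noise_sd = math.sqrt(1.0 - np.sum(betas ** 2))
    y = G @ betas + noise_sd * rng.standard_normal(n)
    return G, y


def normal_two_sided_p(z):
    """Two-sided P-values from z statistics (normal approximation)."""
    return np.vectorize(math.erfc)(np.abs(z) / math.sqrt(2.0))


def polygenic_score_test(G_train, y_train, G_rep, y_rep, p_threshold):
    """Return (markers selected, R^2 of score, P-value of score association)."""
    n_train = G_train.shape[0]
    y_std = (y_train - y_train.mean()) / y_train.std()
    # Marginal effect estimate per marker from the training sample.
    beta_hat = G_train.T @ y_std / n_train
    p_marker = normal_two_sided_p(beta_hat * math.sqrt(n_train))
    selected = p_marker < p_threshold
    if not selected.any():
        return 0, 0.0, 1.0
    # Polygenic score: weighted sum of alleles in the replication sample.
    score = G_rep[:, selected] @ beta_hat[selected]
    r = np.corrcoef(score, y_rep)[0, 1]
    n_rep = y_rep.size
    t = r * math.sqrt((n_rep - 2) / (1.0 - r ** 2))
    p_score = math.erfc(abs(t) / math.sqrt(2.0))
    return int(selected.sum()), float(r ** 2), p_score


def run_demo(seed=0):
    """Compare selection thresholds on one simulated data set."""
    rng = np.random.default_rng(seed)
    m, m_causal, var_explained = 1000, 100, 0.3  # hypothetical values
    betas = np.zeros(m)
    signs = rng.choice([-1.0, 1.0], m_causal)
    betas[:m_causal] = signs * math.sqrt(var_explained / m_causal)
    freqs = rng.uniform(0.1, 0.9, m)
    G_train, y_train = simulate_sample(2000, betas, freqs, rng)
    G_rep, y_rep = simulate_sample(1000, betas, freqs, rng)
    return {
        thr: polygenic_score_test(G_train, y_train, G_rep, y_rep, thr)
        for thr in (5e-8, 0.05, 1.0)
    }


if __name__ == "__main__":
    for thr, (k, r2, p) in run_demo().items():
        print(f"P < {thr:g}: {k} markers, R^2 = {r2:.3f}, score P = {p:.2e}")
```

Under these assumptions, few or no markers typically pass a genome-wide threshold such as 5e-8, while a score built with a liberal threshold can still be strongly associated with the trait in the replication sample. Varying the training sample size or the P-value threshold in `run_demo` gives a feel for the trade-offs that the paper quantifies analytically; case/control sampling is not covered by this sketch.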

Conflict of interest statement

The author has declared that no competing interests exist.

Figures

Figure 1. Expected −log10(P) of linear regression estimate as a function of P-value threshold for selecting markers into the polygenic score.
Training sample, 3322 cases and 3587 controls; replication sample, 2687 cases and 2656 controls. Marker panel of 74062 independent SNPs. Variance explained by markers, 28.7%. pi0, proportion of markers with no effect on disease.

Figure 2. Expected −log10(P) of allele score estimate as a function of P-value threshold for selecting markers into the polygenic score.
Training sample, 3322 cases and 3587 controls; replication sample, 2687 cases and 2656 controls. Marker panel of 74062 independent SNPs. Variance explained by markers, 28.7%. pi0, proportion of markers with no effect on disease.

Figure 3. AUC as a function of sample size, using a panel of 100,000 markers that explains half the heritability of liability.
n, number of cases and of controls in training sample. Heritability of liability, 76% for Crohn's disease, 44% for breast cancer. Line annotations are the proportion of markers with no effect on disease.

Figure 4. AUC as a function of sample size, using a panel of 1,000,000 markers that explains the full heritability.
n, number of cases and of controls in training sample. Heritability of liability, 76% for Crohn's disease, 44% for breast cancer. Line annotations are the proportion of markers with no effect on disease.
