Abstract
In the analysis of prevention and intervention studies, it is often important to investigate whether treatment effects vary among subgroups of patients defined by individual characteristics. These “subgroup analyses” can provide information about how best to use a new prevention or intervention program. However, subgroup analyses can be misleading if they test data-driven hypotheses, employ inappropriate statistical methods, or fail to account for multiple testing. These problems have led to a general suspicion of findings from subgroup analyses. This article discusses sound methods for conducting subgroup analyses to detect moderators. Multiple authors have argued that, to assess whether a treatment effect varies across subgroups defined by patient characteristics, analyses should be based on tests for interaction rather than treatment comparisons within the subgroups. We discuss the concept of heterogeneity and its dependence on the metric used to describe treatment effects. We discuss issues of multiple comparisons related to subgroup analyses and the importance of considering multiplicity in the interpretation of results. We also discuss the types of questions that would lead to subgroup analyses and how different scientific goals may affect the study at the design stage. Finally, we discuss subgroup analyses based on post-baseline factors and the complexity associated with this type of subgroup analysis.
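Two of the points above lend themselves to a small numerical illustration: whether a treatment effect looks heterogeneous can depend on the metric, and the chance of a spurious subgroup finding grows with the number of tests. The following is a minimal sketch, not taken from the article. The event risks, the subgroup labels, and the helper names (`risk_difference`, `risk_ratio`, `prob_any_false_positive`) are hypothetical, and the multiplicity calculation assumes independent tests with no true effects.

```python
# Hypothetical event risks, chosen only for illustration (not data from the article).
risks = {
    "subgroup A": {"control": 0.40, "treated": 0.20},
    "subgroup B": {"control": 0.10, "treated": 0.05},
}


def risk_difference(control, treated):
    """Absolute reduction in event risk attributable to treatment."""
    return control - treated


def risk_ratio(control, treated):
    """Event risk under treatment relative to control."""
    return treated / control


# The same data look heterogeneous on the difference scale
# and homogeneous on the ratio scale.
for name, r in risks.items():
    rd = risk_difference(**r)
    rr = risk_ratio(**r)
    print(f"{name}: risk difference = {rd:.2f}, risk ratio = {rr:.2f}")


def prob_any_false_positive(k, alpha=0.05):
    """Chance of at least one false positive among k independent tests,
    each at level alpha, when no true effect exists (an assumption)."""
    return 1 - (1 - alpha) ** k


for k in (1, 5, 10, 20):
    print(f"{k:2d} tests: P(at least one false positive) = "
          f"{prob_any_false_positive(k):.2f}")
```

In this illustrative setup the two subgroups share a risk ratio but differ in risk difference, so a conclusion about heterogeneity would depend on which scale is chosen. The second loop shows why an unadjusted significant result among many subgroup tests calls for cautious interpretation.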
Acknowledgment
We dedicate this paper to our friend and colleague, Dr. Stephen W. Lagakos, who inspired the work and provided valuable insights and discussions on many aspects of subgroup analyses. We are grateful to Drs. Robert J. McMahon, David P. MacKinnon, Tyler VanderWeele, and three reviewers for their comments, which have improved the paper. This work was supported in part by grant AI24643 from the National Institutes of Health.
Cite this article
Wang, R., Ware, J.H. Detecting Moderator Effects Using Subgroup Analyses. Prev Sci 14, 111–120 (2013). https://doi.org/10.1007/s11121-011-0221-x
DOI: https://doi.org/10.1007/s11121-011-0221-x