Methods: Original Article

Estimating Direct Effects in Cohort and Case–Control Studies

Vansteelandt, Stijn

Epidemiology 20(6):p 851-860, November 2009. | DOI: 10.1097/EDE.0b013e3181b6f4c9

Abstract

Estimating the effect of an exposure on an outcome, other than through some given mediator, requires adjustment for all risk factors of the mediator that are also associated with the outcome. When these risk factors are themselves affected by the exposure, then standard regression methods do not apply. In this article, I review methods for accommodating this and discuss their limitations for estimating the controlled direct effect (ie, the exposure effect when controlling the mediator at a specified level uniformly in the population). In addition, I propose a powerful and easy-to-apply alternative that uses G-estimation in structural nested models to address these limitations both for cohort and case–control studies.

Many research questions in epidemiology require estimating the part of the effect of an exposure on an outcome that is not mediated by a given intermediate variable or mediator. This is the case when the exposure of interest stimulates secondary exposures whose presence complicates the interpretation of the targeted exposure effect. For instance, Joffe et al1 note that screening by mammogram could be intermediate in the causal pathway from postmenopausal hormone use to breast cancer diagnosis. The overall effect of postmenopausal hormone use on breast cancer diagnosis might therefore not reflect a “pure” effect. For policy makers, interest lies in the direct effect other than through increased mammography screening.

In general, the main motivation for studying direct exposure effects in epidemiology stems from the desire to unveil the biologic mechanism underlying the effect of an exposure on an outcome. For instance, to develop insight into the protective effect of moderate alcohol consumption against coronary heart disease, Brenner et al2 demonstrate that this effect may be mediated in part by beneficial effects of ethanol on lipids and hemostatic factors.

Estimating a direct effect requires adjustment for all common risk factors of mediator and outcome3; it thus invokes untestable assumptions even when the exposure is randomly assigned. When these risk factors are themselves affected by the exposure, then standard regression adjustment does not apply.4,5 In this article, I review G-computation and inverse weighting methods for accommodating this in random population samples. I discuss their limitations, in particular for handling quantitative mediators, and overcome them via sequential G-estimation in structural nested models. The proposed estimation strategy is intuitive, powerful, and easy to apply using standard software. I show that a small adaptation of the estimator makes it approximately valid for the analysis of case–control studies when the disease prevalence is low (about 5% or less in the simulations reported below).

ESTIMATION IN COHORT STUDIES

Controlled Direct Effects

Suppose first that the data arose from a cohort study, in which the goal is to assess the part of the effect of an exposure X on a disease outcome Y that is not mediated by a given mediator M. Let Y(x,m) denote the disease outcome that would be observed for a given subject if, possibly contrary to fact, the exposure X were set to x and the mediator M to m.6 Then our goal is to measure

$$E\{Y(x,m) - Y(0,m)\}, \tag{1}$$

which expresses the average change in outcome when the exposure X is increased from 0 to x units while holding the mediator fixed at a specified level m. This is termed a controlled direct effect6 because it evaluates the exposure effect that would be realized if the mediator were controlled at the same level for every subject. For instance, (1) could denote the change in risk of breast cancer diagnosis that would be realized if all women were to use postmenopausal hormones (vs. not) and the frequency of mammography screening were controlled at m for all. Alternative definitions of direct effects have been proposed in the epidemiologic literature to allow for controlling the mediator at a more natural, subject-specific level.6–10 While the corresponding “natural” direct effects have a more meaningful interpretation in many applications, our focus in this article is on estimation of the controlled direct effect (Equation 1) because this can serve as a building block for natural direct effects,8,11 may be of interest in itself, and requires weaker assumptions for identification.

Various approaches for estimating the controlled direct effect have been considered in the literature, all suffering from a number of limitations. In the next section, I will briefly review these and then present a novel approach designed to overcome these limitations. I will do so first under the assumption that the association between exposure and outcome is not confounded, as would be the case if X were a randomly assigned exposure. This assumption is expressed through the absence of arrows entering X in the causal diagram12 of Figure 1A. I will relax this restriction later.

FIGURE 1. A, Causal diagram (with U unmeasured); B, Causal diagram after removal of the mediator effect; C, Causal diagram after removal of the mediator effect and the direct exposure effect; D, Causal diagram after weighting by the reciprocal of the density of the mediator, given exposure and confounders.

Estimation of Controlled Direct Effects

It is common to infer a direct exposure effect by adjusting the association between exposure and disease for the mediator using standard regression adjustment or stratification. This is fallible because the mediator is a postexposure measurement, adjustment for which induces selection bias whenever it has prognostic factors L that are also associated with the disease outcome (Fig. 1A). Technically, this is because adjustment for a collider M (ie, a node in which 2 edges converge) along the path X→M←L←U→Y may render X and Y dependent along that path, and may thus induce a noncausal association.3 In practice, even when X is a randomly assigned exposure, one often cannot ignore the possible presence of prognostic factors of the mediator that are also associated with the outcome. I will assume throughout that all such factors L have been measured. Under this assumption, additional regression adjustment for L removes the suggested selection bias, unless some of these factors are affected by the exposure. Indeed, in the latter case, regression adjustment for the postexposure measurement L induces another selection bias. Technically, this is because regression adjustment for L, while needed to block the noncausal path X→M←L←U→Y in Figure 1A, may induce a noncausal association along the path X→L←U→Y where L now forms a collider. In many practical settings, especially those where some of the prognostic factors L arise later in time than the exposure itself, it is quite possible that L is itself affected by the exposure and, thus, that there is confounding of the association between mediator and outcome by a so-called causal intermediate (L) on the path from exposure to outcome. It thus follows that, even when all such intermediate confounders L have been measured, standard regression methods cannot be used to adjust for them.

Robins showed that this problem can be avoided by using G-computation,4 ie, by computing both counterfactual expectations in Equation (1) using the following identity:

$$E\{Y(x,m)\} = \int E(Y \mid M = m, X = x, L = l)\, f(L = l \mid X = x)\, dl. \tag{2}$$

This involves the following 3 steps:

1. First estimate the expected outcome at given values x for the exposure and m for the mediator by fitting a regression model for the outcome involving exposure, mediator, and intermediate confounders. For instance, fit model

$$E(Y \mid M, X, L) = \gamma_0 + \gamma_m M + \gamma_x X + \gamma_l L \tag{3}$$

using ordinary least squares estimation and obtain the fitted value $E(Y \mid M = m, X = x, L = l) = \gamma_0 + \gamma_m m + \gamma_x x + \gamma_l l$ in expression (2).

2. Next, estimate the distribution of confounders L at given values x for the exposure (ie, f(L|X=x) in expression (2)). For instance, for a univariate, quantitative confounder L, it may be reasonable to assume that L is normally distributed for each given X, with mean

$$E(L \mid X) = \alpha_0 + \alpha_x X \tag{4}$$

and constant variance, which can be estimated via ordinary least squares regression. When the expected outcome is linear in L, as in (3), only specification of the mean (4) is required to evaluate (2).

3. Estimate the expected counterfactual outcome E{Y(x, m)} using identity (2). For instance, for models (3) and (4) one obtains

$$E\{Y(x,m)\} = \gamma_0 + \gamma_m m + \gamma_x x + \gamma_l(\alpha_0 + \alpha_x x).$$

Evaluating this at x=0 yields E{Y(0, m)}. The direct causal effect (1) can now be evaluated as the difference between E{Y(x, m)} and E{Y(0, m)}. For the above models (3) and (4), this yields

$$E\{Y(x,m) - Y(0,m)\} = (\gamma_x + \gamma_l \alpha_x)\, x.$$
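As a minimal sketch of the three G-computation steps with linear working models for the outcome and for the intermediate confounder, the Python code below applies them to simulated data. The data-generating values and the helper names `ols`, `simulate`, and `g_computation` are assumptions made purely for illustration; they are not taken from the article or its eAppendix. The simulated data obey the structure of Figure 1A with a true controlled direct effect of 2 per unit of X.

```python
import numpy as np

def ols(y, *cols):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(design, y, rcond=None)[0]

def simulate(n, seed=0):
    """Hypothetical linear data following Figure 1A; true direct effect of X is 2."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=n)                           # unmeasured cause of L and Y
    X = rng.binomial(1, 0.5, size=n).astype(float)   # randomized exposure
    L = 0.5 * X + U + rng.normal(size=n)             # intermediate confounder
    M = X + 0.8 * L + rng.normal(size=n)             # mediator
    Y = 2.0 * X - 0.5 * M + 1.5 * U + rng.normal(size=n)
    return X, L, M, Y

def g_computation(X, L, M, Y):
    """Direct effect per unit of X under linear models for Y and for L."""
    _, _, gamma_x, gamma_l = ols(Y, M, X, L)   # step 1: outcome on M, X, L
    alpha_x = ols(L, X)[1]                     # step 2: mean of L given X
    return gamma_x + gamma_l * alpha_x         # step 3: E{Y(1,m)} - E{Y(0,m)}

X, L, M, Y = simulate(20000)
print("G-computation direct effect:", g_computation(X, L, M, Y))
print("Coefficient of X when adjusting for M and L:", ols(Y, M, X, L)[2])
```

In this simulated setting the coefficient of X from the regression that adjusts for both M and L converges to 1.625 rather than 2, which illustrates the selection bias induced by conditioning on a confounder that is itself affected by the exposure.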

G-computation is attractive in the above example and yields identical estimates as would be obtained using maximum likelihood estimation under linear structural equation models obeying the path diagram of Figure 1A. However, G-computation has a number of important shortcomings in more general settings. First, when the association between outcome and confounder L is linear after adjustment for the exposure and mediator, as in model (3), then G-computation requires regressing all confounders L on the exposure at the risk of bias when the corresponding regression models are misspecified. When this relationship is nonlinear (eg, due to interactions between several confounders L or due to nonlinear link functions in the working models), then the integration in (2) becomes tedious. In addition, this approach then requires modeling the joint distribution of L conditional on the exposure, which can be very cumbersome when there are many confounders.

Second, this approach does not allow for specifying substantive models of interest for the direct effect (1). Instead, it works indirectly by combining a model for the expected outcome as a function of exposure, mediator, and confounders with a model for the distribution of the confounders as a function of the exposure. This implies that estimands of interest are difficult to capture in a single parameter, and thus that interesting research hypotheses become difficult to test.4

To address the latter limitation, Robins4 proposed to model controlled direct effects directly via the following additive direct effect model

$$E\{Y(x,m) - Y(0,m)\} = \psi x. \tag{5}$$

This model assumes the direct exposure effect to be linear in the exposure and to be the same regardless of the level at which the mediator is controlled. More general models involving interactions between exposure and mediator are considered in the next section. Alternatively, one may choose to model both counterfactual expectations in (1) via a marginal structural model11,13 for a joint exposure (X, M), for instance

$$E\{Y(x,m)\} = \beta_0 + \gamma_m m + \psi x, \tag{6}$$

which implies (5). Both models (5) and (6) can be fitted via inverse probability weighting. Specifically, the consequence of weighting each subject's data by the reciprocal of the density of the mediator M, given L and X, is to remove the association between M on the one hand and X and L on the other hand. It thus removes the arrows pointing into M (Fig. 1D) and thereby also the indirect exposure effect. Since only a direct effect remains, the direct effect (1) can be estimated under the marginal structural model (6) via a weighted least-squares regression of outcome on exposure and mediator, using the above described weights.11 A similar (but slightly more involved) estimation strategy can be used for estimating the direct effect in model (5) without relying on a model for the mediator effect (ie, without relying on correct specification of the term γmm in model (6)).4,14

A drawback of both inverse weighting proposals is that the obtained estimators can behave quite erratically when the mediator is quantitative because subjects for whom the observed mediator value was not very likely (on the basis of their observed exposure and confounder level) receive a large weight in the analysis and may thus become very influential.14 By the same token, unstable effect estimates are typically obtained when exposure or confounders are highly predictive of the mediator (ie, when there is strong confounding). Vansteelandt and coworkers14,15 and Joffe and Green16 independently proposed the following 2-stage approach for estimating the direct effect parameter ψ, which avoids inverse weighting altogether:

1. First estimate the causal effect of mediator on outcome by regressing the outcome on mediator, exposure, and (intermediate) confounders. For instance, let γˆm be the ordinary least squares estimate of γm in model (3). Because the causal diagram in Figure 1A assumes that L, besides X, contains all prognostic factors of the mediator that are also associated with the outcome, the parameter γm in the corresponding model encodes the causal effect of mediator on outcome.

2. Next remove the effect of mediator on outcome by evaluating the residual outcome

$$Y - \hat\gamma_m M.$$

The impact of this on the causal diagram of Figure 1A is roughly depicted as in Figure 1B (this manipulation of the causal diagram is shown here only for illustrative purposes, since it holds only with respect to specific parametric models). Since after removing the effect of mediator on outcome, only a direct effect remains, the direct effect parameter ψ can now be obtained by regressing Y−γˆm M on X as in

$$E(Y - \hat\gamma_m M \mid X) = \beta_0 + \psi X.$$

The resulting estimator for ψ will be called a sequential G-estimator.

When using linear working models, as in (3) and (4), the G-computation estimator is mathematically equivalent to the sequential G-estimator. The major advantage of sequential G-estimation relative to G-computation is that it is simple and enables direct modeling of the targeted direct effect. In the eAppendix (https://links.lww.com/EDE/A340), I demonstrate the validity of this approach, and describe how to obtain the standard error of this estimator. In the following 2 sections, I explain how to use sequential G-estimation under more general models than (5).
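The two-stage procedure, and its agreement with G-computation under linear working models, can be sketched as follows. This is an illustrative example on simulated data, not the annotated R code of the eAppendix; the data-generating values and the function names `ols`, `simulate`, `sequential_g`, and `g_computation` are hypothetical.

```python
import numpy as np

def ols(y, *cols):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(design, y, rcond=None)[0]

def simulate(n, seed=0):
    """Hypothetical linear data following Figure 1A; true direct effect psi = 2."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=n)                           # unmeasured cause of L and Y
    X = rng.binomial(1, 0.5, size=n).astype(float)
    L = 0.5 * X + U + rng.normal(size=n)             # intermediate confounder
    M = X + 0.8 * L + rng.normal(size=n)             # mediator
    Y = 2.0 * X - 0.5 * M + 1.5 * U + rng.normal(size=n)
    return X, L, M, Y

def sequential_g(X, L, M, Y):
    """Two-stage sequential G-estimator of the direct effect parameter psi."""
    gamma_m = ols(Y, M, X, L)[1]     # stage 1: mediator effect, adjusted for X and L
    residual = Y - gamma_m * M       # stage 2: strip the mediator effect from Y
    return ols(residual, X)[1]       # slope of the residual outcome on X

def g_computation(X, L, M, Y):
    """G-computation with linear models for Y given (M, X, L) and L given X."""
    _, _, gamma_x, gamma_l = ols(Y, M, X, L)
    return gamma_x + gamma_l * ols(L, X)[1]

X, L, M, Y = simulate(20000)
print("Sequential G-estimate:", sequential_g(X, L, M, Y))
print("G-computation estimate:", g_computation(X, L, M, Y))
```

With ordinary least squares fits in both stages, the two estimates agree up to floating-point error, because the stage 1 residuals are orthogonal to X by construction.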

Exposure-Mediator Interactions

The interpretation of controlled direct effects is more natural in settings where the magnitude of the direct effect is the same regardless of the level at which the mediator is controlled, and thus regardless of whether it is controlled at a “natural” subject-specific level (rather than at the same level for all).8 It is therefore of interest to assess whether exposure-mediator interactions exist. This can be done by fitting the direct effect model given by

$$E\{Y(x,m) - Y(0,m)\} = (\psi_x + \psi_{xm} m)\, x, \tag{7}$$

where ψxm encodes the degree to which the direct exposure effect depends on the level at which the mediator is controlled.

For estimating the direct effect parameters ψx and ψxm in this model, a more subtle sequential G-estimation analysis is warranted. First note that the interaction parameter ψxm in the direct effect model (7) equals

$$E\{Y(1,m+1) - Y(1,m)\} - E\{Y(0,m+1) - Y(0,m)\}.$$

It thus measures the difference in mediator effect corresponding to different levels of X. Because the mediator effect is identifiable under the assumptions embodied in the causal diagram of Figure 1A, it can be directly estimated by regressing the outcome on mediator, exposure, and confounders, including the exposure-mediator interaction, as in model

$$E(Y \mid M, X, L) = \gamma_0 + \gamma_m M + \gamma_x X + \psi_{xm} XM + \gamma_l L.$$

The ordinary least squares estimate ψˆxm of the exposure-mediator interaction in this model is thus taken as the final interaction effect estimate of ψxm, so that only the direct effect ψx at m=0 remains to be determined. Since after removing the effect of mediator on outcome, only a direct effect remains, ψx can now be obtained as the regression slope in a regression model for the residual outcome Y−γˆmM−ψˆxmXM on X. When the model for the expected outcome E(Y|M, X, L) involves 3-way interactions between X, M and L, then estimating ψxm requires an additional model for the expected intermediate confounder, E(L|X), as a function of X.
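The following sketch illustrates sequential G-estimation with an exposure-mediator interaction on simulated data. The data-generating values (true ψx = 2 and ψxm = 0.3) and the function names are assumptions chosen for illustration only.

```python
import numpy as np

def ols(y, *cols):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(design, y, rcond=None)[0]

def simulate(n, seed=0):
    """Hypothetical data following Figure 1A with psi_x = 2 and psi_xm = 0.3."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=n)                           # unmeasured cause of L and Y
    X = rng.binomial(1, 0.5, size=n).astype(float)
    L = 0.5 * X + U + rng.normal(size=n)             # intermediate confounder
    M = X + 0.8 * L + rng.normal(size=n)             # mediator
    Y = 2.0 * X - 0.5 * M + 0.3 * X * M + 1.5 * U + rng.normal(size=n)
    return X, L, M, Y

def sequential_g_interaction(X, L, M, Y):
    """Return estimates (psi_x, psi_xm) for the direct effect model with interaction."""
    coef = ols(Y, M, X, X * M, L)    # order: intercept, M, X, X*M, L
    gamma_m, psi_xm = coef[1], coef[3]
    # Remove the mediator main effect and the exposure-mediator interaction
    residual = Y - gamma_m * M - psi_xm * X * M
    psi_x = ols(residual, X)[1]      # direct effect with the mediator held at 0
    return psi_x, psi_xm

X, L, M, Y = simulate(20000)
psi_x, psi_xm = sequential_g_interaction(X, L, M, Y)
print("psi_x:", psi_x, "psi_xm:", psi_xm)
```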

Confounding of the Exposure Effect on Outcome

In most observational studies, the association between exposure and outcome is itself confounded. When all confounders C for that association have been measured, then this can be accommodated by including C in all the above considered models. For instance, suppose that the direct exposure effect were linear in the exposure, with magnitude possibly depending on the confounder level C, as in

$$E\{Y(x,m) - Y(0,m) \mid C\} = (\psi_x + \psi_{xc} C)\, x.$$

In this model, ψxc expresses the extent to which the direct exposure effect varies depending on C (if no interactions are anticipated, one could evidently assume that ψxc=0). Then the sequential G-estimator is easily extended as follows:

1. First estimate the causal effect of mediator on outcome by regressing the outcome on mediator, exposure and confounders, as in model

$$E(Y \mid M, X, L, C) = \gamma_0 + \gamma_m M + \gamma_x X + \gamma_l L + \gamma_c C \tag{9}$$

and let γˆm denote the maximum likelihood estimator (eg, the ordinary least squares estimator) of γm in that model.

2. The parameter γm in model (9) continues to encode the causal effect of mediator on outcome. One can thus remove the effect of mediator on outcome by evaluating the residual outcome

$$Y - \hat\gamma_m M.$$

Since only a direct effect remains, the direct effect parameters ψx and ψxc can now be obtained by regressing Y−γˆmM on X and C as in

$$E(Y - \hat\gamma_m M \mid X, C) = \beta_0 + \psi_x X + \psi_{xc} XC + \beta_c C.$$

In the eAppendix (https://links.lww.com/EDE/A340), I describe sequential G-estimators that are valid under weaker modeling assumptions.
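A minimal sketch of the extension to a measured exposure-outcome confounder C is given below. The simulated data, the true values ψx = 2 and ψxc = 0.5, and the function names are hypothetical. Including the product of X and C in the first-stage regression is a choice made for this sketch so that the working outcome model matches the simulated data; it is not a statement about the exact form of the article's model.

```python
import numpy as np

def ols(y, *cols):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(design, y, rcond=None)[0]

def simulate(n, seed=0):
    """Hypothetical data with a baseline confounder C of exposure and outcome."""
    rng = np.random.default_rng(seed)
    C = rng.normal(size=n)                           # measured baseline confounder
    U = rng.normal(size=n)                           # unmeasured cause of L and Y
    X = rng.binomial(1, 1.0 / (1.0 + np.exp(-C))).astype(float)
    L = 0.5 * X + 0.3 * C + U + rng.normal(size=n)   # intermediate confounder
    M = X + 0.8 * L + 0.4 * C + rng.normal(size=n)   # mediator
    Y = 2.0 * X + 0.5 * X * C - 0.5 * M + C + 1.5 * U + rng.normal(size=n)
    return X, C, L, M, Y

def sequential_g_confounders(X, C, L, M, Y):
    """Return estimates (psi_x, psi_xc) of the C-dependent direct effect."""
    # Stage 1: mediator effect, adjusted for X, L, C (and X*C, an assumption here)
    gamma_m = ols(Y, M, X, L, C, X * C)[1]
    residual = Y - gamma_m * M
    # Stage 2: regress the residual outcome on X, X*C and C
    _, psi_x, psi_xc, _ = ols(residual, X, X * C, C)
    return psi_x, psi_xc

X, C, L, M, Y = simulate(20000)
print("psi_x, psi_xc:", sequential_g_confounders(X, C, L, M, Y))
```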

ESTIMATION IN CASE–CONTROL STUDIES

Suppose now that the data arose from a case–control design. In particular, suppose that subjects were sampled conditional on their disease status, with cases referring to subjects with disease (Y=1) and controls referring to subjects without disease (Y=0). Then inferring the direct effect of interest becomes more challenging and has, to the best of my knowledge, not previously been considered. In this section, I will develop inference for the multiplicative direct effect model

$$\frac{P\{Y(x,m) = 1\}}{P\{Y(0,m) = 1\}} = \exp(\psi x). \tag{11}$$

Here, the causal parameter exp(ψx) expresses the relative change in probability of disease when the exposure X is increased from 0 to x units while holding the mediator fixed at m.

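The effect of the mediator on disease risk can be estimated through the odds ratio exp(γm) in the logistic regression model:

$$\operatorname{logit} P(Y = 1 \mid M, X, L) = \gamma_0 + \gamma_m M + \gamma_x X + \gamma_l L. \tag{12}$$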

When the disease prevalence is low (as in most case–control studies, as this often forms the motivation for adopting this study design) the odds ratio approximates the relative change in disease risk corresponding to a unit increase in the mediator. In the eAppendix (https://links.lww.com/EDE/A340), I show that, in expectation,

$$Y \exp(-\gamma_m M)$$

therefore approximates the expected outcome that would be observed if the effect of mediator on outcome were removed (or more precisely, if the mediator were fixed at zero for all subjects). By further removing the direct exposure effect under model (11) (Fig. 1C), one obtains a residual outcome

$$Y \exp(-\gamma_m M - \psi X),$$

which is no longer affected by X. The impact of this transformation on the causal diagram of Figure 1A is roughly as depicted in Figure 1C, which suggests that $Y \exp(-\gamma_m M - \psi X)$ is independent of X; in the eAppendix (https://links.lww.com/EDE/A340) I show more formally that the independence holds in expectation. Substituting γm with the usual maximum likelihood estimator γˆm obtained by fitting model (12) via standard software, one can thus estimate ψ as the value for which this independence holds in the population. In the eAppendix (https://links.lww.com/EDE/A340), I show that this estimation principle continues to be valid in case–control studies. In particular, I show that approximately unbiased estimators can be obtained by solving the following equation

$$\sum_{i=1}^{n} (X_i - \bar X_0)\, Y_i \exp(-\hat\gamma_m M_i - \psi X_i) = 0 \tag{13}$$

for ψ, where $\bar X_0$ denotes the sample average of the observed exposure levels in the controls. Here, I assume that $\bar X_0$ is representative of the average exposure in the population, which is reasonable when the disease prevalence is low. When the average exposure in the population is known, as would be realistic in some genetic association studies where X refers to a genetic marker coding, then this may be used instead of $\bar X_0$ to avoid making this assumption.

For dichotomous exposures X, solving (13) yields the following closed-form estimator of the relative risk

$$\exp(\hat\psi) = \frac{(1 - \bar X_0) \sum_{i} Y_i X_i \exp(-\hat\gamma_m M_i)}{\bar X_0 \sum_{i} Y_i (1 - X_i) \exp(-\hat\gamma_m M_i)}.$$

For more general exposures, numerical procedures must be adopted for solving equation (13) (see the annotated R-code in the eAppendix, https://links.lww.com/EDE/A340). In the eAppendix, I discuss alternative direct effect estimators that are approximately unbiased in case–control studies, explain how to adjust for measured confounders of the association between exposure and outcome, and how to obtain standard errors.
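As a hedged illustration of the case–control estimator for a dichotomous exposure, the sketch below simulates a rare disease in a hypothetical population, samples cases and controls, fits the logistic model by Newton-Raphson, and evaluates a closed-form solution of the covariance-type estimating equation described above. All data-generating values and function names (`logistic_fit`, `simulate_case_control`, `direct_effect_rr`, `estimating_function`) are assumptions for illustration; this is not the annotated R code of the eAppendix.

```python
import numpy as np

def logistic_fit(y, *cols, iters=25):
    """Maximum likelihood logistic regression via Newton-Raphson."""
    Z = np.column_stack([np.ones(len(y)), *cols])
    beta = np.zeros(Z.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (y - p)
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def simulate_case_control(n_cases, n_controls, psi=-0.4, seed=0, pop=1_000_000):
    """Sample cases and controls from a hypothetical population (prevalence under 1%)."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=pop)                         # unmeasured cause of L and Y
    X = rng.binomial(1, 0.5, size=pop).astype(float)
    L = 0.5 * X + U + rng.normal(size=pop)           # intermediate confounder
    M = 0.5 * X + 0.5 * L + rng.normal(size=pop)     # mediator
    # Log-linear risk, so the direct effect risk ratio equals exp(psi)
    risk = np.minimum(0.01 * np.exp(psi * X - 0.3 * M + 0.4 * U), 1.0)
    Y = rng.binomial(1, risk).astype(float)
    cases = rng.choice(np.flatnonzero(Y == 1), n_cases, replace=False)
    controls = rng.choice(np.flatnonzero(Y == 0), n_controls, replace=False)
    keep = np.concatenate([cases, controls])
    return X[keep], L[keep], M[keep], Y[keep]

def direct_effect_rr(X, L, M, Y):
    """Closed-form direct effect risk ratio for a dichotomous exposure."""
    gamma_m = logistic_fit(Y, M, X, L)[1]   # log odds ratio for the mediator
    x0 = X[Y == 0].mean()                   # mean exposure among controls
    w = Y * np.exp(-gamma_m * M)            # zero for controls
    rr = (1.0 - x0) * np.sum(w * X) / (x0 * np.sum(w * (1.0 - X)))
    return rr, gamma_m, x0

def estimating_function(psi, X, M, Y, gamma_m, x0):
    """Sample covariance-type estimating function; zero at the solution."""
    return np.sum((X - x0) * Y * np.exp(-gamma_m * M - psi * X))

X, L, M, Y = simulate_case_control(3000, 3000)
rr, gamma_m, x0 = direct_effect_rr(X, L, M, Y)
print("Estimated direct effect risk ratio:", rr, "(simulated truth:", np.exp(-0.4), ")")
```

For non-dichotomous exposures, `estimating_function` could instead be passed to a numerical root finder, as the text indicates.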

Simulation Study

I conducted a series of simulations to evaluate the performance of the sequential G-estimators in case–control studies. Motivated by a case–control study on the effect of alcohol consumption on coronary heart disease,2 described in the introduction, data were generated under the causal diagram of Figure 1A with X indicating drinking (1, drinker; 0, nondrinker), M indicating HDL cholesterol, L indicating smoking (1, smoker; 0, nonsmoker), and Y indicating coronary heart disease (1, case; 0, control). Parameter values were chosen in line with the literature. In particular, the percentage of drinkers was assumed2 to be 82%, the percentage of smokers was assumed2 to be 48% in nondrinkers and 58% in drinkers, HDL cholesterol was assumed normally distributed (conditional on smoking and alcohol) with an average 4 mg/dL difference between drinkers and nondrinkers2 and, likewise, between smokers and nonsmokers.17 I assumed an odds ratio18 of coronary heart disease of 2 for smokers versus nonsmokers (after adjustment for drinking and HDL cholesterol) and an average difference of 10 mg/dL HDL cholesterol between drinkers and nondrinkers. The data-generating models were chosen so that model (11) holds with ψ chosen equal to −0.394, so as to reproduce2 an odds ratio of coronary heart disease of 0.55 for drinkers versus nondrinkers. Further simulation details are provided in the annotated R code listed in the eAppendix (https://links.lww.com/EDE/A340).

Each simulation run generated 100 cases and 100 controls in a case–control design. Simulations were repeated for effect size ψ=0, for varying baseline prevalences between 0.75% and 15%, and for a larger sample size of 250 cases and 250 controls. The results in Table 1 show that the sequential G-estimator is roughly unbiased at baseline prevalences of 5% and less, and increasingly biased at larger prevalences, except in the absence of a direct effect where the unbiasedness is maintained at larger prevalences. However, in cases where the intermediate confounders L are not affected by X and thus standard regression analysis taking into account the case–control design is valid, a larger bias is observed for the standard logistic regression results. This suggests that the magnitude of the bias in the sequential G-estimator is less than the bias typically encountered when approximating relative risks with odds ratios from a logistic regression analysis. Interestingly, virtually no precision is lost due to using the sequential G-estimator relative to logistic regression in these cases (ie, when the intermediate confounders are not affected by the exposure). Similar conclusions, but smaller bias, were obtained in larger sample sizes.

TABLE 1. Simulation Results for Case–Control Studies as a Function of the Population Prevalence p and Sample Size n

I conducted a second simulation study to compare the sequential G-estimator with inverse probability weighted estimators in random population samples. Random samples of size n=200 or 1000 were repeatedly taken with (X, L, M) generated as before and with Y following a normal distribution with mean −2.5 + 2X+βuU−0.05M and unit variance. Because the data-generating model for Y was no longer linked to the motivating example,2 the parameter βu was allowed to vary between 0 and −3 to represent varying degrees of intermediate confounding, with the multiple correlation coefficient $R^2_{yu}$ between Y and U (conditional on X and M) correspondingly varying between 0 and approximately 0.7. In particular, 100 equally spaced values of βu were considered and 500 simulations were conducted for each value. Under the data-generating model, the linear direct effect model (7) is valid with ψx = 2 and ψxm=0. I then evaluated the sequential G-estimator, the ordinary least squares estimator and the inverse probability weighted estimator. The latter was obtained by regressing the outcome on exposure and mediator after weighting each subject's data by the stabilized weights:13

$$W = \frac{f(M \mid X)}{f(M \mid X, L)}.$$

Here, f(M|X) and f(M|X,L) were chosen to be normal densities with linear mean and constant variance that were estimated from the data. I additionally evaluated the inverse probability weighted estimator obtained after truncating weights at 5 when they were larger than 5, and at 0.2 when they were below 0.2. The average estimates for the main effect ψx are displayed in Figure 2 (top row) for varying degrees of dependence between L and M, corresponding to a multiple correlation coefficient $R^2_{ml}$ (conditional on X) of approximately 0.025 (left), 0.20 (middle), and 0.40 (right). They show that the sequential G-estimator is roughly unbiased and that the ordinary least squares estimator is biased whenever the association between Y and L is confounded by U (Fig. 1A). Interestingly, the inverse probability weighted estimator is more biased than the ordinary least squares estimator in small samples. The smaller bias of the inverse probability weighted estimator based on truncated weights suggests that this is partly related to weight instability. The fact that this bias reduces with larger sample sizes (Fig. 3, top row) suggests that this is additionally because inverse weighting by the density of a continuous mediator requires larger sample sizes. Figure 4 displays the relative efficiency of the various estimators in comparison with the ordinary least squares estimator. It reveals comparable precision of the sequential G-estimator and the ordinary least squares estimator, but a much larger variance of the inverse probability weighted estimators. Figure 2 (bottom) shows the mean estimates for the main effect ψx as a function of the multiple correlation coefficient $R^2_{ml}$ for varying degrees of dependence between U and Y, corresponding to a multiple correlation coefficient $R^2_{yu}$ of approximately 0.16 (left), 0.33 (middle) and 0.42 (right). It supports the earlier conclusion that the sequential G-estimator is unbiased, unlike the ordinary least squares estimator, with important efficiency advantages (Fig. 4, bottom) over the inverse probability weighted estimator.
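The inverse probability weighted comparator with stabilized and truncated weights can be sketched as below. The simulated data are hypothetical and deliberately use a weak dependence of M on L so that the weights remain well behaved; they do not reproduce the simulation settings of the article. The function names are likewise assumptions made for this example.

```python
import numpy as np

def ols(y, *cols):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(design, y, rcond=None)[0]

def weighted_ols(y, w, *cols):
    """Weighted least squares via rescaling rows by the square root of the weights."""
    design = np.column_stack([np.ones(len(y)), *cols])
    sw = np.sqrt(w)
    return np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)[0]

def simulate(n, seed=0):
    """Hypothetical data following Figure 1A; true direct effect of X is 2."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=n)                           # unmeasured cause of L and Y
    X = rng.binomial(1, 0.5, size=n).astype(float)
    L = 0.5 * X + U + rng.normal(size=n)             # intermediate confounder
    M = X + 0.3 * L + rng.normal(size=n)             # weak L-M dependence (assumed)
    Y = 2.0 * X - 0.5 * M + 1.5 * U + rng.normal(size=n)
    return X, L, M, Y

def stabilized_weights(X, L, M):
    """f(M|X) / f(M|X,L) with normal densities fitted by least squares."""
    def fitted_density(*cols):
        coef = ols(M, *cols)
        resid = M - np.column_stack([np.ones(len(M)), *cols]) @ coef
        sd = resid.std()
        return np.exp(-0.5 * (resid / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))
    return fitted_density(X) / fitted_density(X, L)

X, L, M, Y = simulate(20000)
w = stabilized_weights(X, L, M)
w_trunc = np.clip(w, 0.2, 5.0)                       # truncate at 0.2 and 5
print("IPW estimate:", weighted_ols(Y, w, X, M)[1])
print("IPW estimate, truncated weights:", weighted_ols(Y, w_trunc, X, M)[1])
```

Raising the coefficient of L in the mediator model makes the weights heavier-tailed, which is one way to explore the instability discussed in the text.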

FIGURE 2. Average estimates of the main direct effect at n = 200 (Sequential G-estimator, solid line; Ordinary least squares estimator, dashed line; Inverse probability weighted estimator, dotted line; Inverse probability weighted estimator with truncated weights, dashed-dotted line). Top row: as a function of $R^2_{yu}$, for $R^2_{ml} \approx 0.025$ (left), 0.20 (middle) and 0.40 (right); Bottom row: as a function of $R^2_{ml}$, for $R^2_{yu} \approx 0.16$ (left), 0.33 (middle) and 0.42 (right).

FIGURE 3. Average estimates (top row) and relative efficiency of the main direct effect estimator versus the ordinary least squares estimator (bottom row) at n = 1000 (Sequential G-estimator, solid line; Ordinary least squares estimator, dashed line; Inverse probability weighted estimator, dotted line; Inverse probability weighted estimator with truncated weights, dashed-dotted line): as a function of $R^2_{yu}$, for $R^2_{ml} \approx 0.025$ (left), 0.20 (middle left) and 0.40 (middle right); as a function of $R^2_{ml}$, for $R^2_{yu} \approx 0.42$ (right).

FIGURE 4. Relative efficiency (ie, ratio of the variances) of the main direct effect estimator versus the ordinary least squares estimator at n = 200 (Sequential G-estimator, solid line; Ordinary least squares estimator, dashed line; Inverse probability weighted estimator, dotted line; Inverse probability weighted estimator with truncated weights, dashed-dotted line). Top row: as a function of $R^2_{yu}$, for $R^2_{ml} \approx 0.025$ (left), 0.20 (middle), and 0.40 (right); Bottom row: as a function of $R^2_{ml}$, for $R^2_{yu} \approx 0.16$ (left), 0.33 (middle), and 0.42 (right).

DISCUSSION

Inferring controlled direct effects requires adjustment for all confounders of the effect of mediator on outcome. When these confounders are themselves affected by the exposure, then standard methods are not applicable. In this article, I have reviewed G-computation and inverse probability weighting approaches to accommodate this, and have proposed sequential G-estimators as a valid, easy-to-obtain, and powerful alternative. Both G-computation and sequential G-estimation rely on correct specification of a model for the expected outcome as a function of mediator, exposure and confounders. In contrast, the inverse probability weighted estimator relies on correct specification of a model for the distribution of the mediator as a function of exposure and confounders (and possibly on correct specification of the causal effect of mediator on outcome). When the mediator is quantitative, then it seems that a model for the expected outcome will generally be easier to specify than a model for the mediator distribution. In addition, Goetgeluk et al14 demonstrate that sequential G-estimators (there referred to as “unweighted estimators”) are reasonably insensitive to misspecification of the working model for the expected outcome.

As demonstrated by the simulation studies, an important advantage of sequential G-estimators is that they are substantially more efficient and less biased than inverse probability weighted estimators, especially when the mediator is quantitative or has strong predictors. In addition, a minor modification of the estimation principle yields approximately unbiased direct effect estimators in case–control studies when the population prevalence is low, without requiring precise knowledge of the prevalence (see the eAppendix, https://links.lww.com/EDE/A340, for how to estimate direct effects when the prevalence is known). An important advantage of inverse probability weighted estimators is that the underlying estimation principle can be applied to essentially any type of statistical model, eg, logistic and Cox regression models. Sequential G-estimators remain to be extended to address settings with dichotomous/survival outcomes or repeated measurements on mediator and outcome.

Finally, the validity of all discussed direct-effect estimators relies on the assumptions encoded in the causal diagram of Figure 1A. Even in the ideal setting where the exposure X is randomly assigned, critical assumptions will usually remain. A first critical assumption is that all confounders L for the effect of mediator on outcome have been collected. With concern for the validity of this assumption, progress can be made when X is a randomly assigned exposure by relying solely on the assumption of initial randomization,19,20 at the expense of decreased precision and less robustness to misspecification of the causal model.16

A second critical assumption, which is often not easy to judge on subject-matter grounds, is that the mediator affects the outcome and not the other way around. With concern for feedback relationships, it is necessary to collect repeated measures of mediator and outcome. This calls for accompanying statistical methods. A third critical assumption embodied in Figure 1A is that (conditional on X) the relationship between the causal intermediate L and the mediator is causal. When there exist unmeasured risk factors U* of the mediator that are also associated with L, then the considered analyses remain valid provided that (conditional on X and M) the relationship between outcome Y and causal intermediate L is causal (ie, provided that U is absent). When, as is likely the case in practice, the relationships of L with both mediator and outcome are not entirely causal (ie, are themselves confounded), then all methods considered in this article are prone to some bias. This is because the adjustment for L then induces a spurious association between mediator and outcome through the path M←U*→L←U→Y so that the causal effect of mediator on outcome cannot be unbiasedly assessed. One should therefore ideally measure causal risk factors of the mediator. Simulation studies and theoretical developments are needed to gain insight into the extent of the suggested bias.

REFERENCES

1. Joffe MM, Byrne C, Colditz GA. Postmenopausal hormone use, screening, and breast cancer: characterization and control of a bias. Epidemiology. 2001;12:429–438.
2. Brenner H, Rothenbacher D, Bode G, Marz W, Hoffmeister A, Koenig W. Coronary heart disease risk reduction in a predominantly beer-drinking population. Epidemiology. 2001;12:390–395.
3. Cole SR, Hernán MA. Fallibility in estimating direct effects. Int J Epidemiol. 2002;31:163–165.
4. Robins JM. Testing and estimation of direct effects by reparameterizing directed acyclic graphs with structural nested models. In: Glymour C, Cooper G, eds. Computation, Causation, and Discovery. Menlo Park, CA: AAAI Press; 1999:349–405.
5. Rosenbaum PR. The consequences of adjustment for a concomitant variable that has been affected by the treatment. J Roy Stat Soc Ser A. 1984;147:656–666.
6. Pearl J. Direct and indirect effects. In: Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence. San Francisco, CA: Morgan Kaufmann; 2001:411–420.
7. Didelez V, Dawid AP, Geneletti S. Direct and indirect effects of sequential treatments. In: Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence. Corvallis, OR: AUAI Press; 2006:138–146.
8. Petersen ML, Sinisi SE, van der Laan MJ. Estimation of direct causal effects. Epidemiology. 2006;17:276–284.
9. Robins JM, Greenland S. Identifiability and exchangeability for direct and indirect effects. Epidemiology. 1992;3:143–155.
10. Robins JM, Rotnitzky A, Vansteelandt S. Discussion on ‘principal stratification designs to estimate input data missing due to death.’ Biometrics. 2007;63:650–653.
11. VanderWeele TJ. Marginal structural models for the estimation of direct and indirect effects. Epidemiology. 2009;20:18–26.
12. Robins JM. Data, design, and background knowledge in etiologic inference. Epidemiology. 2001;12:313–320.
13. Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11:550–560.
14. Goetgeluk S, Vansteelandt S, Goetghebeur E. Estimation of controlled direct effects. J Roy Stat Soc Ser B. 2008;70:1049–1066.
15. Vansteelandt S, Goetgeluk S, Lutz S, et al. On the adjustment for covariates in genetic association analysis: a novel, simple principle to infer causality. Genet Epidemiol. 2009;33:394–405.
16. Joffe MM, Greene T. Related causal frameworks for surrogate outcomes. Biometrics. 2009;65:530–538.
17. Wilson PW, Garrison RJ, Abbott RD, Castelli WP. Factors associated with lipoprotein cholesterol levels. The Framingham study. Arteriosclerosis. 1983;3:273–281.
18. Robins JM, Greenland S. Adjusting for differential rates of prophylaxis therapy for PCP in high-dose versus low-dose AZT treatment arms in an AIDS randomized trial. J Am Stat Assoc. 1994;89:737–749.
19. Ten Have TR, Joffe MM, Lynch KG, Brown GK, Maisto SA, Beck AT. Causal mediation analyses with rank preserving models. Biometrics. 2007;63:926–934.
20. Ten Have TR, Joffe MM, Lynch KG, Brown GK, Maisto SA, Beck AT. Causal mediation analyses with rank preserving models. Biometrics. 2007;63:926–934.

© 2009 Lippincott Williams & Wilkins, Inc.