
Replication and heterogeneity in gene×environment interaction studies

Marcus R. Munafò, Jonathan Flint
DOI: http://dx.doi.org/10.1017/S1461145709000479, pp. 727-729. First published online: 1 July 2009

There has been considerable interest in gene×environment (G×E) interactions in psychiatric genetics since studies by Caspi and colleagues (Caspi et al. 2002, 2003) suggested that these may illuminate the aetiology and genetic architecture of behavioural phenotypes. However, as with earlier candidate gene studies (Ioannidis et al. 2001), initial promise was subsequently followed by failures to replicate and increasing inconsistency in the rapidly growing G×E literature (Munafò et al. 2009a). One reason for this is that the greater number of potential statistical tests that are possible when interaction effects are included in any analysis (e.g. by investigating multiple markers of environmental exposures) greatly increases the risk of false positives in studies of this kind (Munafò et al. 2009a). However, another possibility is that there is genuine heterogeneity between studies, which reflects the sampling of different populations across which the genetic effects of interest may vary in strength or direction.
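The arithmetic behind the multiple-testing concern can be sketched briefly. The example below assumes that every null hypothesis is true and that the tests are independent, which is a simplification (tests run on the same dataset are usually correlated). The function name and the 0.05 threshold are illustrative choices, not taken from any of the cited studies.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one nominally significant result among
    n_tests independent tests when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # Illustrative numbers of tests; an analysis crossing several markers,
    # exposures and outcomes can reach these counts quickly.
    for n_tests in (1, 5, 10, 20, 50):
        rate = familywise_error_rate(n_tests)
        print(f"{n_tests:>3} tests: P(at least one p < 0.05) = {rate:.2f}")
```

Under these assumptions the chance of at least one false positive rises steeply with the number of tests, which is why the number of tests actually run matters when interpreting a single nominally significant interaction.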

Laucht and colleagues (Laucht et al. 2009) report evidence that individuals homozygous for the L allele of the 5-HTTLPR displayed higher rates of depressive or anxiety disorders when exposed to environmental adversity, contrary to the original report by Caspi and colleagues in which possession of the S allele increased susceptibility to depression under stressful conditions (Caspi et al. 2003). Laucht and colleagues argue that a ‘possible source for the conflicting findings might be attributed to heterogeneity in depression phenotypes and environmental adversity’. An alternative and perhaps more parsimonious explanation is that their result is further evidence that the literature to date is compatible with chance findings, perhaps when seen through the filter of selective reporting of the most promising findings obtained from multiple unreported statistical tests.

A recent simulation study by Sullivan (2007) illustrates the ease with which potentially publishable ‘findings’ may be obtained from random data. Sullivan simulated a candidate gene association dataset and showed that in over 90% of cases some potentially publishable (i.e. nominally significant) correlation between a genetic variant and a phenotype might be obtained, given multiple possible groupings of genotype groups, multiple polymorphisms tested, and so on. Furthermore, Sullivan showed that in the majority of cases these ‘findings’ can be replicated, given a weak definition of ‘replication’, again using random data. It is noteworthy that this simulation focused on simple gene–disease association studies, and the degree of unreported multiple testing in G×E studies is expected to be greater still.
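The general idea can be illustrated with a small simulation. This is a minimal sketch and not Sullivan's procedure: the sample size, the numbers of polymorphisms and phenotypes, the two genotype groupings, and the large-sample z-test are all assumptions chosen for illustration. Genotypes and phenotypes are generated independently, so any 'finding' is a false positive by construction.

```python
import random
from statistics import NormalDist

ALPHA = 0.05  # nominal significance threshold (illustrative)


def mean_and_var(xs):
    """Sample mean and unbiased sample variance."""
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / (n - 1)


def two_sided_p(group_a, group_b):
    """Large-sample z-test p-value for a difference in group means."""
    mean_a, var_a = mean_and_var(group_a)
    mean_b, var_b = mean_and_var(group_b)
    se = (var_a / len(group_a) + var_b / len(group_b)) ** 0.5
    z = (mean_a - mean_b) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def study_has_finding(rng, n_people=300, n_polymorphisms=5, n_phenotypes=4):
    """Simulate one study of pure noise; True if any test gives p < ALPHA."""
    phenotypes = [[rng.gauss(0.0, 1.0) for _ in range(n_people)]
                  for _ in range(n_phenotypes)]
    for _ in range(n_polymorphisms):
        # Hypothetical genotype coded as number of copies of one allele
        # (0, 1 or 2), drawn independently of every phenotype.
        genotypes = [rng.choice((0, 1, 1, 2)) for _ in range(n_people)]
        # Two alternative groupings: carriers vs non-carriers, and
        # homozygotes vs everyone else.
        for in_group in (lambda g: g >= 1, lambda g: g == 2):
            for scores in phenotypes:
                a = [s for g, s in zip(genotypes, scores) if in_group(g)]
                b = [s for g, s in zip(genotypes, scores) if not in_group(g)]
                if two_sided_p(a, b) < ALPHA:
                    return True
    return False


def proportion_with_finding(n_studies=200, seed=1):
    """Share of simulated null studies with at least one 'finding'."""
    rng = random.Random(seed)
    hits = sum(study_has_finding(rng) for _ in range(n_studies))
    return hits / n_studies


if __name__ == "__main__":
    share = proportion_with_finding()
    print(f"Null studies with at least one p < {ALPHA}: {share:.2f}")
```

Each simulated study here runs up to 40 tests (5 polymorphisms, 2 groupings, 4 phenotypes). Adding environmental exposures and their interactions with genotype would multiply that number further, in line with the point that G×E studies carry a heavier testing burden.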

It is of course theoretically possible that genetic associations (and, by extension, G×E effects) might depend on population structure, which may be specific to the sample analysed, or on the phenotype measure used. Indeed, there are precedents for this, where the widely reported association between the 5-HTTLPR and anxiety-related traits has been suggested to depend on the specific measurement instrument employed (Munafò et al. 2009b). However, few robust gene–disease associations differ as a function of sample ancestry (Ioannidis et al. 2004), and many investigations of apparent sex differences in genetic associations appear to be conducted post hoc (Patsopoulos et al. 2007), highlighting again the potential for multiple statistical tests to be conducted on datasets of this kind.

Can we discriminate between genuine heterogeneity which may inform and advance our understanding of the aetiology of these phenotypes, and stochastic variation across studies which reflects nothing more than variation that might be expected to be observed by chance (and subsequently filtered through various reporting and publication biases)? Clayton & McKeigue (2001) argue that, while it is theoretically possible that G×E studies might afford increased statistical power, by allowing us to focus on specific subgroups where any genetic effects might be expected to be largest, this is difficult in practice because it is rare that we will have strong a priori reasons for expecting any effect to be restricted to any specific subgroup. However, in the case of an initial finding we might specify a subgroup on the basis of this finding in our subsequent attempts at replication. That is, following the initial report of an interaction between 5-HTTLPR genotype and stressful life events on risk of subsequent depression, we might design our replication studies to focus exclusively on individuals exposed to stressful life events (defined in the same way as in the original study). This would substantially reduce the multiple statistical testing burden.

However, it might be argued that this approach would substantially restrict the scope for exploratory analyses in subsequent studies, and might prevent existing datasets from being used if they did not include variables sufficiently comparable to those employed in the original report. One therefore needs to ask whether the potential gain from such exploratory analyses is offset by the potential to flood the literature with inconsistent findings which may serve to confuse to a greater degree than they inform.

We argue that, at present, the balance has shifted too far in the direction of publishing nominally significant findings with relatively little regard for whether these represent true insight. In our view, focusing on nominal statistical significance, rather than the exact nature of any observed interaction effect and the presence or absence of corresponding main effects, can be misleading and lead to unwarranted claims of replication (Munafò et al. 2009a). Moreover, if these effects are opposite to those predicted, this is frequently explained as reflecting genuine heterogeneity, and the other possibility that these findings are compatible with chance is rarely explored in depth, as the paper by Laucht and colleagues (2009) illustrates.

We have recently reported evidence of various factors associated with bias in genetic association studies (Munafò et al. 2008, 2009c). It is likely that subtle factors serve to influence the reporting of scientific studies (Martinson et al. 2005), and in ‘hot’ scientific fields where there is substantial flexibility in study design there is perhaps greater scope for these factors to play a role (Ioannidis, 2005). This flexibility is illustrated in the differing ways in which G×E interaction effects can be represented graphically, which can lead to surprisingly different subjective impressions. In the original Caspi report (Caspi et al. 2003), stressful life events were presented on the abscissa and genotypes were represented by separate lines. In the study by Laucht and colleagues the reverse is true. When the latter data are plotted in the same format as in the Caspi report, a different impression is conveyed of the underlying interaction effect (Fig. 1). In particular, among those exposed to environmental adversity, the depression score of the SS genotype group is intermediate between that of the LL and SL groups.

Fig. 1

Interaction effect of 5-HTTLPR and environmental adversity on Beck Depression Inventory (BDI) score. Data reported by Laucht and colleagues (2009), (a) as presented in their report and (b) replotted in the format used in the original report by Caspi and colleagues (2003). In addition to the finding, opposite to that of the Caspi report, of a greater depression score associated with LL homozygosity, the SS genotype group has an intermediate depression score relative to the SL and LL groups.

In our view, transparency in the reporting of genetic association studies is critical, in particular in G×E studies, so that the background rate of statistical testing (i.e. how many genes, polymorphisms, phenotypes, etc. were investigated) can be ascertained, and summary statistics incorporated into subsequent quantitative reviews. While there may be subpopulations within which the genetic architecture of complex behavioural and psychiatric traits may differ, there is minimal evidence for this to date. It is also worth considering whether any genetic effect which depends heavily on the measurement instruments used will ever be of substantial clinical importance.


Jonathan Flint is supported by the Wellcome Trust.

Statement of Interest



  • Focus on: Laucht et al. (2009). Interaction between the 5-HTTLPR serotonin transporter polymorphism and environmental adversity for mood and anxiety psychopathology: evidence from a high-risk community sample of young adults.