jackman.stanford.edu/blog
observations on politics, statistics, computing...

Analysis of experiments

Wednesday May 31, 2006

Filed under: statistics — jackman @ 9:37 am

David Freedman was down yesterday for a talk in our MAPSS series. His points were pretty simple: (1) experiments are superior to observational studies if you want to estimate causal effects; (2) the default position should be to analyze experiments as experiments (i.e., a simple comparison of means), rather than jamming in covariates and, even worse, interactions between covariates and treatment status in regression-type models. He also laid out a template for writing up a study that uses experimental data: Table 1, demonstrate that randomization worked, by reporting differences of means/proportions by treatment status on covariates plausibly related to the response; Table 2, estimates/inference for treatment effects; then and only then might you start fritzing around with covariate adjustment via modeling.
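As a rough illustration of the Table 1 / Table 2 template, here is a minimal Python sketch that applies the same difference-in-means calculation first to a covariate (a balance check) and then to the response (the treatment effect). Everything in it is an assumption made for illustration: the data are simulated, the covariate `age`, the true effect of 0.5, and the helper names `difference_in_means` and `report` are hypothetical, and the unpooled standard error with a normal-approximation interval is just one reasonable choice, not something prescribed in the talk.

```python
import math
import random
import statistics


def difference_in_means(treated, control):
    """Return (estimate, standard error) for the treated-minus-control difference in means."""
    estimate = statistics.mean(treated) - statistics.mean(control)
    # Unpooled standard error: does not assume equal variances in the two arms.
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return estimate, se


def report(label, treated, control):
    """Print the difference with an approximate 95% interval and return (estimate, se)."""
    est, se = difference_in_means(treated, control)
    print(f"{label}: diff = {est:.3f}, se = {se:.3f}, "
          f"approx 95% CI = [{est - 1.96 * se:.3f}, {est + 1.96 * se:.3f}]")
    return est, se


# Simulated experiment; all numbers here are hypothetical.
rng = random.Random(2006)
n = 1000
assign = [rng.random() < 0.5 for _ in range(n)]   # coin-flip randomization
age = [rng.gauss(45.0, 12.0) for _ in range(n)]   # pre-treatment covariate
outcome = [0.5 * t + 0.02 * a + rng.gauss(0.0, 1.0) for t, a in zip(assign, age)]

# "Table 1": did randomization balance the covariate across arms?
balance_est, balance_se = report(
    "Table 1 (balance on age)",
    [a for a, t in zip(age, assign) if t],
    [a for a, t in zip(age, assign) if not t],
)

# "Table 2": the treatment effect, as a simple comparison of means.
effect_est, effect_se = report(
    "Table 2 (treatment effect)",
    [y for y, t in zip(outcome, assign) if t],
    [y for y, t in zip(outcome, assign) if not t],
)
```

Note that no covariate enters the Table 2 estimate: randomization alone is doing the work of making the two arms comparable.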

I wanted to hear more on what to do when randomization fails, which is pretty typical in social-science field experiments. David’s survey mentioned matching and IV etc, but he seemed to be rather pessimistic about what those approaches can offer.

I thought the prescriptions about how to analyze experimental data were a good message for our crowd to hear. Indeed, I briefly took part in a project where, when we didn’t find a statistically significant treatment effect, the principal investigator suggested running interactions between treatment status and covariates (looking for moderators and mediators, as they call this kind of beating-the-bushes exercise in psychology). David’s point was that this kind of “sub-group” analysis, if done long enough, will inevitably turn up something statistically significant (i.e., eventually you’ll find a subset of the data where treated differ from control at p < .05 or whatever): the better thing to do, of course, is to run another/different experiment in which you randomly assign inside that subgroup (and/or other subgroups of interest). Of course, this presumes knowing something about the problem a priori: i.e., that a particular subgroup is likely to be more responsive to treatment than some other subgroup.

And, frankly, I think that is the issue confronting a lot of social-science work: even when we run randomized experiments (“wahoo, we’re real scientists now”), we’re not confident about likely effect sizes or the mechanism that generates the response conditional on receipt of treatment, and hence we’re vague about where or for whom the treatment will be more or less efficacious. In turn, this can lead to the awful data analysis that can often follow a social-science experiment: e.g., ransacking the available covariates looking for statistically significant interactions with treatment, star-gazing your way through tables of regression output, falling over the finish line with a post-hoc story about a particular set of (statistically significant) findings, with no elaboration of effect size, no mention of the long and wretched search through model space to the reported specification… Flame off.
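The sub-group point above can be made concrete with a small simulation. The sketch below is only illustrative and rests on assumptions that are not part of the talk: the treatment has no effect anywhere, there are 20 disjoint subgroups with 40 subjects per arm, and each subgroup is tested with a two-sided normal-approximation z-test at the .05 level. With k independent tests and no true effect, the chance of at least one "significant" subgroup is 1 - (1 - .05)^k, which for k = 20 is about 0.64; the simulated share should land in that neighborhood. The function names are hypothetical.

```python
import math
import random
import statistics


def two_sided_p(treated, control):
    """Two-sided p-value for a difference in means, using a normal approximation."""
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    z = diff / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def any_significant_subgroup(rng, n_subgroups=20, n_per_arm=40, alpha=0.05):
    """Simulate one null experiment; return True if any subgroup test has p < alpha."""
    for _ in range(n_subgroups):
        # Treated and control come from the same distribution: no true effect.
        treated = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        if two_sided_p(treated, control) < alpha:
            return True
    return False


rng = random.Random(531)
n_experiments = 400
rate = sum(any_significant_subgroup(rng) for _ in range(n_experiments)) / n_experiments
analytic = 1.0 - (1.0 - 0.05) ** 20

print(f"simulated share of null experiments with a 'significant' subgroup: {rate:.2f}")
print(f"analytic value for 20 independent tests: {analytic:.2f}")
```

Real subgroup searches usually involve overlapping subgroups and correlated tests, so the exact number will differ, but the qualitative message is the same: search long enough and something will clear p < .05.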

2 Comments

  1. Hi, Simon. I’m surprised that you are so receptive to the message that modeling is bad and that interactions are an unnecessary luxury. In my experience, interactions are important, and it’s a real trap to think of the goal of an experiment as estimation of a single “theta.” See here for my longer take:
    http://www.stat.columbia.edu/~cook/movabletype/archives/2006/06/treatment_inter.html

    Comment by Andrew Gelman — Thursday June 15, 2006 @ 6:01 pm

  2. Lest I be misunderstood: I have no problem with interactions per se in the analysis of experiments, just with “searches” for them that aren’t particularly well motivated by theory. Just in the last month or two I published a study in the Journal of Politics with Paul Sniderman where we interacted our treatment with linear *and* quadratic terms in political sophistication (an important moderator of effects in political science, if ever there was one, plenty of theory/previous work to believe that responses to political stimuli are conditioned on a subject’s level of political sophistication etc). So, (1) I’m not opposed to interactions as a general matter; but (2) I do have problems with ransacking data sets looking for a corner of the data set where the treated differ from control (i.e., the researcher *knows* the treatment “works”, but couldn’t specify where in the data it works a priori, or why; otherwise the experiment might have been designed that way from the get go…?).

    And of course, since I am a card-carrying Bayesian, I totally agree that when you’ve got good reasons to believe treatment effects vary over the treated, and how/why this happens, hierarchical Bayesian modeling is a great way to go.

    Comment by jackman — Sunday June 18, 2006 @ 2:30 pm
