data wars (can you trust Internet samples?)

Wednesday April 22, 2009

Filed under: politics,statistics — jackman @ 3:57 pm

Gary Langer, the Director of Polling at ABC News, discusses the properties of “opt-in” Internet samples. His chief gripe: “you need a probability sample to compute sampling error,” and so any opt-in Internet poll that reports a standard error is lying.

This is a really important issue, since Internet polling is not going away: it’s too fast, too cheap, and can generate big samples in a hurry; there is a lot to be said for self-completion and for presenting multimedia content to respondents; hence the Internet is very attractive relative to other modes. So look for a response from proponents of opt-in sampling in the near future.

Observation: all survey respondents “opt in”. Would-be respondents (selected via random sampling or not) decide whether or not to respond, or can’t be reached at all. We then weight the data we do get to try to deal with any resulting biases. The resulting standard errors should be computed taking the weighting into account (in almost all media polling I see, they are not; the standard error is computed à la Stats 101, with the number of completed interviews in the denominator), but in any event, even the correct standard errors are conditional on the way the weights were computed. The Stats 101 “textbook purity” of simple random sampling has long been left behind…particularly given some of the horror stories you hear about RDD response rates.
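The gap between the Stats 101 standard error and one that accounts for the weights can be sketched with the Kish design-effect approximation. Everything below is simulated for illustration — the sample size, the preference rate, and the weight distribution are all made up, not drawn from any actual poll:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical poll: 1,000 respondents, binary candidate preference,
# with unequal post-stratification weights (all numbers illustrative).
n = 1000
y = rng.binomial(1, 0.52, size=n)             # 1 = supports candidate
w = rng.gamma(shape=4.0, scale=0.25, size=n)  # skewed weights, mean ~1

# "Stats 101" standard error: completed interviews in the denominator,
# weights ignored entirely.
p_unweighted = y.mean()
se_naive = np.sqrt(p_unweighted * (1 - p_unweighted) / n)

# Kish approximation: unequal weights inflate the variance by the
# design effect deff = n * sum(w^2) / (sum(w))^2, which is equivalent
# to shrinking the effective sample size.
p_weighted = np.average(y, weights=w)
deff = n * np.sum(w**2) / np.sum(w) ** 2
n_eff = n / deff
se_weighted = np.sqrt(p_weighted * (1 - p_weighted) / n_eff)

print(f"naive SE:    {se_naive:.4f}")
print(f"design effect: {deff:.2f}")
print(f"weighted SE: {se_weighted:.4f}")
```

With any variability in the weights, deff exceeds 1 and the honest standard error is larger than the Stats 101 version — which is the sense in which most reported margins of error are too optimistic, whatever the sampling mode.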

So I tend to think the “you can’t trust opt-in Internet polls” line is something of a beat-up. Sure, there is work to be done in understanding the properties of data generated this way, and in how to compute a standard error with these data. I don’t see this as an impossible hill to climb. It is critical that this work get done, because if/when we can get comfortable with the bias issues (and we know what the issues are), then I think it’s game over.

If/when the bias issue is more or less neutralized, Internet will most likely kill RDD in terms of sampling variability, due to the huge effective sample sizes; exactly how this plays out will turn on how big a hit Internet takes in sampling variability when making bias adjustments, but that would have to be a huge hit for RDD to wind up dominating Internet (given relative costs, and the fact that RDD takes a variance hit too in making bias adjustments).
