beating up on opt-in Internet samples (again)
Thursday September 10, 2009
Gary Langer does it again, this time with supporting references to a paper by Jon Krosnick and 6 co-authors; Doug Rivers (finally!) replies and at length. Two of Krosnick’s co-authors are former students of mine; current students are thanked in the acknowledgements; like Krosnick, Rivers is my colleague — Stanford really is ground-zero in this debate.
Langer says:
I welcome any coherent theoretical defense of the use of convenience samples in estimating population values; it’s a debate we need to have.
And in his earlier post he said:
I have yet to hear any reasonable theoretical justification for the calculation of sampling error with a convenience sample.
Got one? Hit me.
Try this: model-based inference is an idea that has been around for a long time, and contrasts quite markedly with design-based inference for data generated by surveys. There is plenty written on this, but I’d suggest starting with a reasonably accessible book on sampling, like Sharon Lohr’s Sampling: Design and Analysis. Model-based inference for survey data is discussed in various places, typically in a “starred section” in each chapter (e.g., here’s how we can do design of and inference for cluster sampling from the model-based perspective, etc). The references provided by Lohr include important works by Basu and Royall etc. See also the delightful book called Combined Survey Sampling Inference by Ken Brewer — if you can get your hands on it. Doug Rivers pointed me to this book a year or two ago and it is a treat (as these things go).
As I’ve said before, as soon as non-response enters the picture we’re relying on models (e.g., what variables to use when weighting for non-response) and the “purity” of randomization in the sampling design is starting to fall by the wayside.
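To make the point concrete, here is a minimal sketch of post-stratification, one common way of weighting for non-response. It is an illustration and not anything taken from Langer, Krosnick or Rivers: the function name `poststratify`, the two age cells and the data are all made up. The thing to notice is the modeling assumption buried in the middle, which is needed whether the respondents came from a probability sample with non-response or from an opt-in panel.

```python
from collections import defaultdict


def poststratify(sample, population_shares):
    """Estimate a population mean by weighting cell means to known shares.

    sample: list of (cell, y) pairs for the respondents we actually have.
    population_shares: dict mapping cell -> population share (summing to 1).
    Both the name and the interface are hypothetical, for illustration only.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for cell, y in sample:
        totals[cell] += y
        counts[cell] += 1

    estimate = 0.0
    for cell, share in population_shares.items():
        if counts[cell] == 0:
            raise ValueError(f"no respondents in cell {cell!r}")
        # The model: within a cell, respondents look like non-respondents,
        # so the respondents' mean stands in for the whole cell.
        estimate += share * totals[cell] / counts[cell]
    return estimate


# Made-up data: the young are half the population but one in five respondents.
sample = [("young", 1), ("old", 0), ("old", 0), ("old", 1), ("old", 0)]
shares = {"young": 0.5, "old": 0.5}

raw_mean = sum(y for _, y in sample) / len(sample)
adjusted = poststratify(sample, shares)
print(raw_mean, adjusted)
```

The adjusted figure is only as good as the choice of cells and the assumption in the comment, and nothing in the sampling design guarantees either.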
Social scientists and pollsters etc would seem to have a reasonable bead on design-based inference, if the current stridency about “probability samples” is anything to go by. Collectively, we’re ignorant about other approaches, although we’ve been making use of model-based ideas for decades (e.g., weighting to correct for non-response). Doug Rivers is going to be teaching all this stuff and more in his Winter quarter sampling class.



Maybe they could do an internet poll to decide if internet polling is OK?
Seriously, I think the problems of dealing with the self-selected nature of the sample are not going to be resolved in a way that would give anyone any confidence.
Everyone self-selects into a survey, irrespective of how the sampling was done.
That’s true, but there may be a difference between “opt-out” (having to refuse an interviewer) and “opt-in” (deciding to join a panel).
Neil: It is nice to see someone using the word “may” when they make that observation; the stridency I hear on this issue is really amazing.
Note also that we would need there to be “differences” after we condition on observables: i.e., opting-out vs opting-in are different processes net of conditioning on age/gender/race/educ/etc/etc (jointly, whateverly).
And then, even after that, we have to ask whether any remaining biases in one method vs the other are offset by any efficiency gains.
I concur.