jackman.stanford.edu/blog
observations on politics, statistics, computing...

beating up on opt-in Internet samples (again)

Thursday September 10, 2009

Filed under: statistics — jackman @ 10:19 pm

Gary Langer does it again, this time with supporting references to a paper by Jon Krosnick and six co-authors; Doug Rivers (finally!) replies, and at length. Two of Krosnick’s co-authors are former students of mine; current students are thanked in the acknowledgements; like Krosnick, Rivers is my colleague. Stanford really is ground zero in this debate.

Langer says:

I welcome any coherent theoretical defense of the use of convenience samples in estimating population values; it’s a debate we need to have.

And in his earlier post he said:

I have yet to hear any reasonable theoretical justification for the calculation of sampling error with a convenience sample.

Got one? Hit me.

Try this: model-based inference is an idea that has been around for a long time, and contrasts quite markedly with design-based inference for data generated by surveys. There is plenty written on this, but I’d suggest starting with a reasonably accessible book on sampling, like Sharon Lohr’s Sampling: Design and Analysis. Model-based inference for survey data is discussed in various places, typically in a “starred section” in each chapter (e.g., here’s how we can do design of, and inference for, cluster sampling from the model-based perspective). The references provided by Lohr include important works by Basu, Royall and others. See also the delightful book called Combined Survey Sampling Inference by Ken Brewer, if you can get your hands on it. Doug Rivers pointed me to this book a year or two ago and it is a treat (as these things go).
As I’ve said before, as soon as non-response enters the picture we’re relying on models (e.g., what variables to use when weighting for non-response) and the “purity” of randomization in the sampling design is starting to fall by the wayside.
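
To make the weighting point concrete, here is a minimal sketch of post-stratification, probably the simplest model-based adjustment going. It is an illustration only, not anything taken from Lohr, Brewer or the papers under discussion. The function name, the two age cells, the responses and the population shares are all invented for the example. The modelling assumption doing the work is that, within a cell, the people who ended up in the sample look like the people who did not.

```python
from collections import defaultdict


def poststratified_mean(sample, population_shares):
    """Post-stratified estimate of a population mean.

    sample: list of (cell, y) pairs, e.g. ("young", 1).
    population_shares: dict mapping cell -> known population share
        (shares are assumed to sum to 1).

    Model assumption: within each cell, respondents and
    non-respondents have the same expected value of y.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for cell, y in sample:
        totals[cell] += y
        counts[cell] += 1

    # A cell with no respondents cannot be estimated without a richer model.
    empty = set(population_shares) - set(counts)
    if empty:
        raise ValueError(f"no respondents in cells: {sorted(empty)}")

    # Weight each cell mean by its population share, not its sample share.
    return sum(
        share * totals[cell] / counts[cell]
        for cell, share in population_shares.items()
    )


# Hypothetical data: the sample over-represents the young.
sample = (
    [("young", 1)] * 6 + [("young", 0)] * 2
    + [("old", 1)] * 1 + [("old", 0)] * 1
)
population_shares = {"young": 0.4, "old": 0.6}  # invented shares

raw_mean = sum(y for _, y in sample) / len(sample)
adjusted_mean = poststratified_mean(sample, population_shares)
print(f"raw: {raw_mean:.2f}, post-stratified: {adjusted_mean:.2f}")
```

The same arithmetic applies whether the sample got lopsided through non-response to a probability design or through opting in to a panel; what differs is how plausible the within-cell assumption is in each case.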

Social scientists and pollsters seem to have a reasonable grasp of design-based inference, if the current stridency about “probability samples” is anything to go by. Collectively, we’re ignorant of other approaches, even though we’ve been making use of model-based ideas for decades (e.g., weighting to correct for non-response). Doug Rivers is going to be teaching all this stuff and more in his Winter quarter sampling class.

5 Comments

  1. Maybe they could do an internet poll to decide if internet polling is OK?

    Seriously, I think the problems of dealing with the self-selected nature of the sample are not going to be resolved in a way that would give anyone any confidence.

    Comment by Ken — Thursday September 10, 2009 @ 10:50 pm

  2. Everyone self-selects into a survey, irrespective of how the sampling was done.

    Comment by jackman — Friday September 11, 2009 @ 5:31 am

  3. that’s true, but there may be a difference between “opt-out” (having to refuse an interviewer) and “opt-in” (deciding to join a panel).

    Comment by neil — Friday September 11, 2009 @ 8:19 pm

  4. Neil: It is nice to see someone using the word “may” when they make that observation; the stridency I hear on this issue is really amazing.

    Note also that we would need there to be “differences” after we condition on observables: i.e., opting-out vs opting-in are different processes net of conditioning on age/gender/race/educ/etc/etc (jointly, whateverly).

    And then, even after that, we have to ask: are any remaining biases in one method vs the other offset by any efficiency gains?

    Comment by jackman — Saturday September 12, 2009 @ 4:10 am

  5. I concur.

    Comment by neil — Thursday September 24, 2009 @ 10:05 pm
