jackman.stanford.edu/blog

## Saturday May 19, 2012

Filed under: ANES,statistics — jackman @ 5:51 pm

As one of the PIs of the 2012 ANES, I have gained some exposure to the nitty-gritty of how area probability samples work in practice. We’re using an address-based sampling (ABS) frame (the USPS Delivery Sequence File, or DSF), which we will supplement with some field enumeration in Census tracts where the DSF is thought to be subject to a reasonable amount of under-coverage.

What I’ve learned thus far:

(1) Kish’s Survey Sampling remains something of a bible for practitioners.

(2) This book really is for practitioners, with large sections devoted to what actually occurs in the field: how to walk around a block, how to list addresses, etc. It’s odd to read this stuff. I mean, the rubber does have to hit the road at some point. But so much of it seems a little, well, folksy and even ad hoc, unless I’m missing other parts of the book where the underlying rationales are more rigorously explicated. I guess it has to be that way when you are trying to keep things simple for the non-statistician field workers.

(3) Take this, the case of how to augment a listing of dwellings when the field worker encounters dwellings not on the list (in our case this would be finding dwellings not on the DSF adjacent to a dwelling sampled from the DSF). Pages 341-2 of Survey Sampling give something of a “how-to” guide for the Half Open Interval procedure.

Take all unlisted dwellings if there are fewer than 5 of them? What is special about 5? If 5 or more, “write the office quickly” (presumably today, you’d call) and “wait for instructions”. And what, exactly, will those instructions be?

I’m sure there is some well-worked-out basis for these recommendations somewhere, perhaps elsewhere in the book. At p. 56 Kish says that the “missed [but discovered] elements receive the same probability of selection as the pre-specified unique listings”.
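To convince myself of that last claim, here is a minimal Python sketch of the linking rule as I understand it. Everything in it is a toy assumption of mine, not Kish's text: the `street` layout, the function names, simple random sampling of listed dwellings, and a street that starts with a listed dwelling so every missed dwelling has a listed one before it. Each missed dwelling is attached to the nearest listed dwelling before it in walking order, and comes into the sample whenever that listed dwelling does.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical street in walking order: True = on the list, False = missed by the list.
street = [True, False, True, True, False, False, True, True]

def link_to_listed(street):
    """Attach every dwelling to the listed dwelling that opens its half-open interval."""
    owner = {}
    current = None
    for i, is_listed in enumerate(street):
        if is_listed:
            current = i
        if current is None:
            raise ValueError("toy setup assumes the first dwelling is listed")
        owner[i] = current
    return owner

def inclusion_probabilities(street, n):
    """Exact inclusion probabilities under SRS of n listed dwellings (toy assumption)."""
    owner = link_to_listed(street)
    listed = [i for i, is_listed in enumerate(street) if is_listed]
    samples = list(combinations(listed, n))  # every equally likely sample
    probs = {}
    for i in range(len(street)):
        # Dwelling i is in the sample exactly when its owning listed dwelling is drawn.
        hits = sum(1 for s in samples if owner[i] in s)
        probs[i] = Fraction(hits, len(samples))
    return probs

probs = inclusion_probabilities(street, n=2)
for i, p in probs.items():
    print(i, "listed" if street[i] else "missed", p)
```

In this toy setup the missed dwellings inherit the selection probability of the listed dwelling they hang off, which is the property Kish describes. What the sketch does not address is the workload question: a single interval can contain arbitrarily many missed dwellings.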

Ok, but might you get too many unlisted dwellings this way? Interviewer workload becomes an issue then, which is where I guess “no more than 5” might come from.

But could you exploit whatever prior information you have about DSF under-coverage in the locality you’re working in? At that point I guess you would be stratifying dwellings in a given geographic unit into listed and unlisted, and heading towards a dual-frame design etc.

Sub-sampling seems another idea: e.g., the design calls for r attempted interviews in a given locale. We sample r listed dwellings in the locale from, say, the DSF; field enumeration adds k unlisted dwellings to the frame around the listed r; we then attempt interviews at r dwellings drawn by simple random sampling without replacement (SWOR) from the r+k. This keeps the interviewer (IWR) workload down to r attempted interviews, and the selection probabilities are “known”, at least conditional on what the enumeration turns up.
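A rough Python sketch of that sub-sampling step, just to pin down the arithmetic. The function name, the addresses and the seed are all made up for illustration. Conditional on the r listed dwellings drawn and the k discovered around them, each of the r+k dwellings is retained with probability r/(r+k), so the weight picks up a factor of (r+k)/r on top of whatever the first-stage weight is.

```python
import random
from fractions import Fraction

def subsample_dwellings(listed_sample, discovered, seed=None):
    """Draw r dwellings SWOR from the r listed plus k discovered dwellings.

    Returns the chosen dwellings and the conditional retention probability r/(r+k).
    """
    r = len(listed_sample)
    pool = list(listed_sample) + list(discovered)
    rng = random.Random(seed)
    chosen = rng.sample(pool, r)  # simple random sample without replacement
    cond_prob = Fraction(r, len(pool))
    return chosen, cond_prob

# Hypothetical locale: r = 4 dwellings sampled from the list, k = 2 found in the field.
listed_sample = ["12 Oak St", "40 Oak St", "7 Elm St", "19 Elm St"]
discovered = ["12A Oak St", "7 Rear Elm St"]

chosen, cond_prob = subsample_dwellings(listed_sample, discovered, seed=2012)
weight_factor = 1 / cond_prob  # multiplies the first-stage design weight

print(chosen)
print("retention probability:", cond_prob)
print("extra weight factor:", weight_factor)
```

The catch is that k is only known after the field work, so the retention probability is known conditional on the realized enumeration rather than fixed in advance by the design.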

The literature on snowball or “respondent-driven” sampling in social network land must have some relevant ideas here too. Hitting r listed dwellings and then looking around for unlisted dwellings seems a lot like what goes on with sampling on networks for “hidden” populations etc.

Finally – I have to note that this stuff really is probably 2nd order at best. We’re doing our best on the design for the in-person components of ANES 2012, I think. But there is this big scary monster out there, waiting for us in the Fall when we go into the field, and its name is non-response. As a source of bias this has to be 10x what we’re looking at from DSF under-coverage.

Comments Off on under-coverage bias and Kish on field enumeration for area-based samples