Saturday May 19, 2012
As one of the PIs of the 2012 ANES, I gained some exposure to the nitty-gritty of how area probability samples work in practice. We’re using an ABS-frame (the USPS Delivery Sequence File), which we will supplement with some field enumeration in Census tracts where the DSF is thought to be subject to a reasonable amount of under-coverage.
What I’ve learned thus far:
(2) This book really is for practitioners, with large sections devoted to what actually occurs in the field, how to walk around a block, listing addresses, etc. It’s odd to read this stuff. I mean, the rubber does have to hit the road at some point. But so much of it seems a little, well, folksy and even ad hoc, unless I’m missing other parts of the book where the underlying rationales are more rigorously explicated. I guess it has to be that way when you are trying to keep things simple for the non-statistician field workers.
(3) Take this, the case of how to augment a listing of dwellings when the field worker encounters dwellings not on the list (in our case this would be finding dwellings not on the DSF adjacent to a dwelling sampled from the DSF). From pp. 341-2 of Survey Sampling, something of a “how-to” guide for the Half Open Interval procedure:
Take all unlisted dwellings if there are fewer than 5 of them? What is special about 5? If 5 or more, “write the office quickly” (presumably today, you’d call) and “wait for instructions”. And what, exactly, will those instructions be?
I’m sure there is some well-worked-out basis for these recommendations somewhere, perhaps elsewhere in the book. At p. 56 Kish says that the “missed [but discovered] elements receive the same probability of selection as the pre-specified unique listings”.
Ok, but might you get too many unlisted dwellings this way? Interviewer workload becomes an issue then, which is where I guess “no more than 5” might come from.
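To make the rule as quoted above concrete, here is a minimal Python sketch of the field decision. It encodes only what the passage states (take all unlisted dwellings if fewer than 5, otherwise contact the office and wait). The function name, the return format, and the choice to still attempt the sampled listed dwelling while waiting are my own assumptions, not Kish’s.

```python
def half_open_interval_decision(sampled_address, unlisted_found, threshold=5):
    """Decide what an interviewer does at one sampled listing.

    sampled_address: the listed dwelling drawn from the frame.
    unlisted_found: dwellings discovered between the sampled listing and
        the next listed address (the "half-open interval").
    threshold: the "5" from the quoted rule; its rationale is not given.
    """
    if len(unlisted_found) < threshold:
        # Fewer than 5 unlisted dwellings: take all of them. Each inherits
        # the selection probability of the listing it is attached to.
        return "take_all", [sampled_address, *unlisted_found]
    # 5 or more: report to the office and wait for instructions.
    # ASSUMPTION: the sampled listed dwelling is still attempted meanwhile.
    return "contact_office", [sampled_address]
```

For example, `half_open_interval_decision("12 Elm St", ["12A Elm St", "12B Elm St"])` returns the `"take_all"` action with all three dwellings, while six discovered dwellings would return `"contact_office"`. The addresses are made up for illustration.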
But could you exploit whatever prior information you have about DSF under-coverage in the specific locality you’re working in? At that point I guess you might be stratifying dwellings in a given geographic unit into listed and unlisted, and heading towards a dual frame design etc.
Sub-sampling seems another idea: e.g., the design calls for r attempted interviews in a given locale. We sample r listed dwellings in the locale from, say, the DSF; field enumeration adds k unlisted dwellings to the frame around the listed r; do we then attempt interviews at r dwellings sampled without replacement (SWOR) from the r+k? This keeps the interviewer workload down to r attempted interviews, and the selection probabilities are “known”.
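A rough Python sketch of that sub-sampling idea, under my own simplifying assumptions: the r listed dwellings are an equal-probability sample from N listed dwellings in the locale, and each discovered dwelling enters the pool only via a sampled listing. The function names and the parameter `n_listed_in_locale` are hypothetical. Since k depends on which listings were drawn, the second-stage probability r/(r+k) is conditional on the first-stage sample, so the weight below is a two-phase style expansion weight rather than the inverse of an unconditional inclusion probability.

```python
import random

def subsample_locale(listed_sample, discovered, seed=None):
    """Draw r dwellings SWOR from the r listed plus k discovered dwellings."""
    rng = random.Random(seed)
    r = len(listed_sample)
    pool = list(listed_sample) + list(discovered)   # size r + k
    selected = rng.sample(pool, r)                  # SWOR, workload stays at r
    conditional_prob = r / len(pool)                # r / (r + k), given this pool
    return selected, conditional_prob

def two_phase_weight(n_listed_in_locale, r, k):
    """Expansion weight: 1 / (first-stage prob * conditional second-stage prob)."""
    first_stage = r / n_listed_in_locale   # ASSUMPTION: equal-probability draw
    second_stage = r / (r + k)
    return 1.0 / (first_stage * second_stage)
```

With r = 8 listed dwellings and k = 3 discovered, each dwelling in the pool is retained with conditional probability 8/11, and the interviewer still makes only 8 attempts.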
The literature on snowball or “respondent-driven” sampling in social network land must have some relevant ideas here too. Hitting r listed dwellings and then looking around for unlisted dwellings seems a lot like what goes on with sampling on networks for “hidden” populations etc.
Finally – I have to note that this stuff really is probably 2nd order at best. We’re doing our best on the design for the in-person components of ANES 2012, I think. But there is this big scary monster out there, waiting for us in the Fall when we go into the field, and its name is non-response. As a source of bias this has to be 10x what we’re looking at from DSF under-coverage.