The Roy Morgan Research website often carries interesting remarks from Gary Morgan about methodology. For instance, in a report of two polls (one face-to-face, the other by phone):
For the academics: Telephone polls have two inherent biases:
1. The sample (by design) only includes those who have telephones (approximately 1% bias toward L-NP).
2. The sample achieved has a much lower response rate than face-to-face interviews (approximately twice as many people refuse to answer as refuse face-to-face). The ‘bias’ caused by this rate is less ‘tangible’ and ‘predictable’. Historically, Roy Morgan Research has evidence that supporters of the party that is ‘out of favour’ tend to be over represented among those who ‘refuse’ to be interviewed. However, a compelling event or news story can generate a desire for people to ‘have their say’ — so create its own bias.
Let’s ignore the “oh, if you must” sentiment underlying the phrase “For the academics”: the issues here aren’t moot, or solely of interest to the Scholastics.
Surely post-stratification weighting can take care of these likely sources of bias? That is, once I weight on, say, age, region, gender and education, I’d tend to think differences in sample composition (phone vs face-to-face) can be wiped out. The assumption is that after you tell me your age, region, gender and education, landline vs mobile-only has no further predictive power for your vote intention; conditional on those demographic characteristics, the landline/mobile distinction is said to be ignorable. That kind of post-stratification might also wipe out any of the non-response bias GM is alluding to in his point #2, provided refusal is likewise unrelated to vote intention within each demographic cell.
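To make the weighting idea concrete, here is a minimal sketch of post-stratification on a single variable. Everything in it is made up for illustration: the two age cells, the population shares, the ten respondents and the function names `poststratify` and `weighted_share` are my assumptions, not anything Morgan publishes. Each respondent gets a weight equal to their cell's population share divided by that cell's share of the sample.

```python
from collections import Counter

def poststratify(cells, population_shares):
    """One weight per respondent: population share / sample share of their cell."""
    n = len(cells)
    counts = Counter(cells)
    return [population_shares[c] / (counts[c] / n) for c in cells]

def weighted_share(votes, weights, party):
    """Weighted proportion of respondents intending to vote for `party`."""
    return sum(w for v, w in zip(votes, weights) if v == party) / sum(weights)

# Hypothetical toy sample: the young are under-represented (20% of the
# sample versus an assumed 40% of the population).
cells = ["young"] * 2 + ["old"] * 8
votes = ["ALP", "ALP"] + ["L-NP"] * 6 + ["ALP"] * 2
population_shares = {"young": 0.4, "old": 0.6}  # assumed, not census figures

weights = poststratify(cells, population_shares)
raw = votes.count("ALP") / len(votes)
adjusted = weighted_share(votes, weights, "ALP")
print(f"raw ALP share: {raw:.2f}, post-stratified: {adjusted:.2f}")
```

The correction only works to the extent that the ignorability assumption holds: within each cell, the people you reached have to look like the people you missed.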
Design effects. Buried at the bottom of Morgan’s reports you’ll also routinely see the following:
The margin of error to be allowed for in any estimate depends mainly on the number of interviews on which it is based. The following table gives indications of the likely range within which estimates would be 95% likely to fall, expressed as the number of percentage points above or below the actual estimate. The figures are approximate and for general guidance only, and assume a simple random sample. Allowance for design effects (such as stratification and weighting) should be made as appropriate.
To Morgan’s credit, this reference to design effects is one of the few places I’ve seen a non-academic pollster acknowledge that the standard calculations of margins of error assume simple random sampling (which is almost never true for polling in the real world). The actual confidence intervals are wider, and how much wider depends on how hard one is weighting the data (see my earlier crack at this here).
What would be nice is some assessment of just how big and variable the weights are, such that the knowledgeable poll consumer might actually be able to compute the right confidence intervals. Right now, the reference to design effects is kind of like the “your mileage may vary” disclaimer you get from a car-maker.
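If the weights were published, one rough way to do that computation is Kish's approximation, which inflates the simple-random-sampling variance by a design effect of n·Σw² / (Σw)². The sketch below assumes that approximation and uses made-up weights; I have no idea what Morgan's actual weights look like, and Kish's formula only captures the penalty from unequal weighting, not clustering or any gains from stratification.

```python
import math

def kish_design_effect(weights):
    """Kish's approximate design effect from unequal weights: n * sum(w^2) / (sum w)^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2

def margin_of_error(p, n, deff=1.0, z=1.96):
    """Half-width of an approximate 95% interval for a proportion, inflated by deff."""
    return z * math.sqrt(deff * p * (1 - p) / n)

# Hypothetical weights for a poll of 1,000: 200 respondents weighted up to 2.0,
# 800 weighted down to 0.75.
weights = [2.0] * 200 + [0.75] * 800
n = len(weights)

deff = kish_design_effect(weights)
moe_srs = margin_of_error(0.5, n)            # the figure a pollster's table would show
moe_adj = margin_of_error(0.5, n, deff=deff)  # after allowing for the weighting

print(f"design effect: {deff:.2f}, effective sample size: {n / deff:.0f}")
print(f"margin of error: {100 * moe_srs:.1f} points (SRS) vs {100 * moe_adj:.1f} points (weighted)")
```

With equal weights the design effect is exactly 1 and the two margins coincide; the more variable the weights, the smaller the effective sample size and the wider the interval.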