GRE standard errors of measurement
Friday October 30, 2009
I’m speaking at a conference at Berkeley sponsored by the American Political Science Association on “Democracy Audits and Governmental Indicators”. In getting some remarks together on the reliability of country-level measures of democracy and the like, I wanted to compare the performance of measures of democracy against things like GRE scores and legislative ideal points.
ETS has a background document providing some technical data on GRE scores. The standard deviation of GRE-V scores issued in the 2003-08 period is 121 points, while the GRE-Q scores have a standard deviation of over 150 points. The standard errors of measurement are pretty small, relative to this cross-subject variation in the scores, and surprisingly uniform over the range of scores.
Usually you get a U-shaped relationship between standard errors of measurement (or, if you are a Bayesian, standard deviations of marginal posterior densities of latent scores) and the scores: we have greater uncertainty about test subjects in the tails of the ability distribution, since the test items tend to be less informative about those subjects (as they rack up a lopsided pattern of right/wrong answers).
The computer-administered, adaptive version of the GRE helps smooth out that U-shape, with the computer administering items that have “cut-points” close to the running estimate of the subject’s ability.
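For readers who want to see where the U-shape comes from, here is a minimal sketch under a two-parameter logistic (2PL) item response model. Everything in it is an assumption for illustration and not ETS's actual model or item pool: the 30 items, the discrimination of 1.0, the difficulties spread over [-2, 2], and the idealised "adaptive" test in which every item's cut-point sits exactly at the subject's true ability.

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item (discrimination a, difficulty b) at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def standard_error(theta, items):
    """Asymptotic standard error of the ability estimate: 1 / sqrt(total information)."""
    info = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(info)

# Hypothetical fixed-form test: 30 items, difficulties evenly spaced on [-2, 2].
fixed_form = [(1.0, -2.0 + 4.0 * i / 29) for i in range(30)]

def adaptive_form(theta, n_items=30, a=1.0):
    """Idealised adaptive test: every item has its cut-point at the subject's ability."""
    return [(a, theta)] * n_items

abilities = [-3.0, -1.5, 0.0, 1.5, 3.0]
fixed_se = [standard_error(t, fixed_form) for t in abilities]
adaptive_se = [standard_error(t, adaptive_form(t)) for t in abilities]

for t, f, c in zip(abilities, fixed_se, adaptive_se):
    print(f"theta = {t:+.1f}   fixed-form SE = {f:.3f}   adaptive SE = {c:.3f}")
```

In this toy setup the fixed-form standard errors are smallest near the middle of the ability range and grow toward the tails, while the idealised adaptive standard errors are flat. A real adaptive test only approximates this, since it works from a running estimate of ability and not the true value.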
To look at this I plotted the “conditional standard errors of measurement” for the GREs (as reported by the ETS) against scores; see below.
There is something of an inverted U, which is weird. We’re actually getting less precision in the middle of the scales than in the tails. The other thing is that the standard errors of measurement are about 20%-35% of the between-subject score variation, tailing away to about 5%-15% in the upper tails.
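One rough way to read those ratios is through the classical test theory identity, reliability = 1 - (SEM / SD)^2. The sketch below applies it to the ratios quoted above. Treating a conditional standard error as if it were the overall one is a simplification on my part, so take the output as a back-of-the-envelope translation and not as figures reported by ETS.

```python
def implied_reliability(sem_to_sd_ratio):
    """Classical test theory: reliability = 1 - (SEM / SD)^2."""
    return 1.0 - sem_to_sd_ratio ** 2

# Ratios of SEM to between-subject SD taken from the text above.
ratios = [0.05, 0.15, 0.20, 0.35]
reliabilities = {r: implied_reliability(r) for r in ratios}

for r in ratios:
    print(f"SEM/SD = {r:.2f} -> implied reliability = {reliabilities[r]:.4f}")
```

On this reading, a ratio of 0.35 corresponds to a reliability of about 0.88, and a ratio of 0.20 to about 0.96.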
I wish those standard errors of measurement were smaller. Their size is really only a function of the length of the test, given that ETS has near-perfect knowledge of the item parameters. So, does the GRE need to be longer?
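To put a rough number on "longer": if item parameters are known and each additional item is about as informative as the existing ones at the subject's ability (an assumption, and an optimistic one), information adds up across items and the standard error shrinks like one over the square root of the number of items. The helper below, with the hypothetical name `length_multiplier`, turns that into a test-length factor.

```python
def length_multiplier(current_se, target_se):
    """Factor by which test length must grow to move from current_se to target_se,
    assuming SE is proportional to 1 / sqrt(number of equally informative items)."""
    if current_se <= 0 or target_se <= 0:
        raise ValueError("standard errors must be positive")
    return (current_se / target_se) ** 2

# Example: shrinking the SEM from 30% to 15% of the between-subject SD.
factor = length_multiplier(0.30, 0.15)
print(f"Test would need to be about {factor:.1f} times as long")
```

Under these assumptions, halving the standard error means roughly quadrupling the length of the test, which is why precision gains from length alone get expensive quickly.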



