jackman.stanford.edu/blog
• Bayesian Analysis for the Social Sciences (Wiley; Amazon; errata as of 5/23/13)

• 113th U.S. Senate
  • ideal point estimates: pdf, csv (4/15/14)
  • scatterplot against 2012 Obama vote share: pdf
  • roll call object: RData
• 113th U.S. House
  • ideal point estimates: pdf, csv (4/16/14)
  • scatterplot against 2012 Obama vote share: pdf, svg
  • roll call object: RData

## Thursday May 30, 2013

Filed under: Australian Politics,R,statistics — jackman @ 1:04 pm

I’ll be contributing a piece about once a week for the Guardian Australia, under a part of the web site we’re calling The Swing.

The graphs from my first effort were rendered in-line and at rather low resolution.

Bigger, full-resolution versions appear below; click on the in-line versions to see them.

It would be great to find a way to quickly make nice, web-friendly graphs out of R. Vega looks like a reasonable wrapper to d3. Datawrapper.de just doesn’t give me enough control over annotations, axes etc… I’m also looking at Rickshaw. Life is short, beautiful graphics are hard, sometimes…

## Friday August 24, 2012

Filed under: R,statistics — jackman @ 7:00 am

From one of the R lists I follow:

Today (2012-08-23) on CRAN [1]:

“Currently, the CRAN package repository features 4001 available packages.”

These packages are maintained by approximately 2350 different folks.

Previous milestones:

2011-05-12: 3,000 packages [1]
2009-10-04: 2,000 packages [2]
2007-04-12: 1,000 packages [3]
2004-10-01: 500 packages [4]
2003-04-01: 250 packages [4]

[1] http://cran.r-project.org/web/packages/
[2] https://stat.ethz.ch/pipermail/r-devel/2009-October/055049.html
[3] https://stat.ethz.ch/pipermail/r-devel/2007-April/045359.html
[4] My private in-house data.
[5] http://cran.r-project.org/web/checks/check_summary_by_maintainer.html

/Henrik

PS. This count includes only packages on CRAN. There are more
packages elsewhere.
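The count Henrik quotes is easy to reproduce from within R itself; `available.packages()` returns one row per package on the queried mirror (the exact number depends on the mirror, the package type and any filters in effect):

```r
## One row per package available on the given CRAN mirror
pkgs <- available.packages(contriburl = contrib.url("http://cran.r-project.org"))
nrow(pkgs)
```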

## Wednesday April 4, 2012

Filed under: Australian Politics,R,statistics — jackman @ 12:40 am

Labor won 15 of Queensland’s 29 House of Reps seats in the 2007 Federal election (AEC details here). Yet just three years later, in the 2010 Federal election, Labor won only 8 of 30 Queensland Reps seats, with 33.6% of 1st preferences (a swing of -9.3 percentage points).

Labor’s best performance on 1st preferences in 2010 was in Capricornia (46%), which translated into a 54-46 2PP result. Kevin Rudd won Griffith with 44% of 1st preferences, resulting in a 58-42 2PP result. Wayne Swan and the LNP candidate split the 1st preferences in Lilley, 41-41, with Swan winning the seat with Green preferences, 53-47 2PP. Labor managed to get home in Moreton in 2010, with 36% of the 1st preference vote, and a 51-49 2PP result.

The state election of some 10 days ago was conducted under different district boundaries (89 seats in the Queensland parliament) and a different electoral system (optional preferential). Moreover, the Katter Australia Party ran candidates in 76 seats, winning 11.5% of 1st preferences, further complicating comparisons with previous elections (state or federal). In any event, Labor won about 26.7% of 1st preferences (ECQ results), down 6.9 percentage points from its performance in the 2010 Federal election, and down a staggering 15.6 percentage points from the 2009 state election.

How might these 2012 state-level results translate into Federal results?

There are many different ways of looking at this, all of which involve a little guesswork and assumptions given the differences in the two electoral systems, the configuration of parties and so on.

Here’s a stab that I’ve been working on over the last week or so (“Spring Break” here at Stanford). The AEC conveniently (!) geo-codes its polling places and publishes that data on its web site. Shape files for Federal electorates are also available. This makes it feasible to start re-aggregating booth-level results from the state election up to Federal seats.

A few steps and assumptions are required (and I’ll write this up at some point):

• Parse the ECQ’s XML presentation of the 2012 state election results; I used the XML package in R. By the way, it is terrific that both AEC and ECQ put the XML’d version of their results up in real time; reasonably sane schema, relatively easy to parse, etc.
• geo-code the state polling places. ECQ doesn’t put lat/lons of its polling places up on its web site, at least not that I could find. I thought about hacking its Google maps overlay javascript, but that was beyond me, and the maps there seemed to only provide a rough guide as to the actual locations of ECQ polling places.
• My next move was to recall that there is tremendous overlap between state and Federal polling places, at least in metro areas. I wrote some code to look for matches between the strings describing state and Federal polling places. I also wrote some code that asked the Google maps API to return lat/lons of the addresses associated with each state polling place, which turned out to be quite imprecise once you get away from metro areas. But between Google and the AEC geo-codes, I was able to come up with usable geo-codes for 2,100 ECQ polling places (all of the ECQ’s actual polling places). I performed more than a few sanity checks and manual corrections on the geocodes (“visiting” many Qld schools and community halls in Google maps), and actually corrected some of the AEC geocodes too. It is then straightforward to map these geocoded state booths into Federal electoral divisions using functionality in the sp package in R.
• In the 2012 Queensland state election, only 75.7% of ballots were cast at actual polling places on Election Day (ECQ). The remaining ballots were cast using a variety of methods: pre-poll votes, postal votes and Election Day absentees being the three most used methods. Fun fact: 41.2% of Burleigh’s ballots were cast this way, the most of any QLD electorate. I allocated these (state-level) non-standard votes to Federal seats in proportion to the spread of the state seat’s regular, polling-place votes across Federal seats (fun facts: the state seats of Algester, Everton, Maryborough and Springwood each take in 4 Federal seats; 25 of the 89 QLD state seats lie wholly within one of Qld’s 30 Federal seats).
• There is perhaps a little more work to do refining the way I handle state booths that lie outside but very close to a particular Federal seat, say, where that booth is also used in Federal elections and for the Federal seat in question. That is, the AEC is telling us that we’ve got a polling place outside the electorate boundaries; surely some (all?) of the state votes cast at that booth should count towards the estimate we make for the “logical” Federal seat, not the “physical” Federal seat. Some of these booths serve multiple Federal seats, suggesting some kind of proportional allocation heuristic. I’m yet to do this last bit of fiddling; life is short and Spring Break is over…
• Turnout! No one ever talks about this. But get this. The ECQ has 2,468,290 ballots cast, corresponding to 89.9% turnout (2,746,844 total enrolled). In the 2009 state election turnout wound up being 91.0%. In the 2010 Federal election turnout was 92.8% (2,521,574 ballots; 2,719,360 enrolled), down 1.6pp from 2007 (by the way). But the point is that state-level turnout trails Federal by about 2 to 3 percentage points. You wonder about the partisan leanings of those voters not turning out in state elections, but coming out for the Federal election.
• I also wonder how much any effect here might be offset by the differences in informality state to Federal, OPV to full preferential. 5.5% of House votes cast in QLD in the 2010 Federal election were informal; the corresponding figure for the 2012 state election (OPV) is just 2.5%.
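The steps above can be compressed into a sketch like the following. The file names, XML node names and shape-file field names are illustrative stand-ins, not the ECQ's or AEC's actual schema:

```r
## Sketch of the booth re-aggregation pipeline; names are hypothetical.
library(XML)        ## parse the ECQ results feed
library(sp)         ## point-in-polygon operations
library(maptools)   ## readShapeSpatial, to read the AEC boundaries

## 1. parse booth-level 1st preference counts from the ECQ XML
doc    <- xmlParse("ecq2012.xml")
booths <- xmlToDataFrame(getNodeSet(doc, "//PollingPlace"))

## 2. attach lat/lons (AEC matches plus Google geocoder, assembled separately)
booths <- merge(booths, geocodes, by = "boothName")

## 3. map each geocoded booth into a Federal division
divs <- readShapeSpatial("federal_divisions.shp")
pts  <- SpatialPoints(booths[, c("lon", "lat")],
                      proj4string = CRS(proj4string(divs)))
booths$division <- over(pts, divs)$DIV_NAME   ## field name varies by release

## 4. aggregate 1st preferences up to Federal divisions
agg <- aggregate(cbind(ALP, LNP, GRN, KAP) ~ division, data = booths, sum)
```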

So what do you get when you do this re-aggregation, subject to all the caveats sounded above? Keep in mind I only have 1st preferences, at least for now.

The figure below (click for full-size) shows a scatterplot of imputed Federal results for the ALP given the 2012 state results, for each of Queensland’s 30 Federal seats, against the ALP’s actual 1st preference vote share (%) recorded in the 2010 Federal election. The diagonal line is a 45 degree line, a “no difference” line. On average, the data points lie below the diagonal, indicating what we know, that Labor did considerably better in the 2010 Federal election than in the 2012 state election.
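A figure of this sort is straightforward in base R. In the sketch below, `seats` is a hypothetical data frame with one row per Federal seat, holding the actual 2010 and imputed 2012 ALP 1st preference shares and an indicator for the 8 Labor-held seats:

```r
## Scatterplot of imputed vs actual ALP 1st preference share; `seats`
## is a hypothetical per-seat data frame, as described in the lead-in.
plot(seats$alp2010, seats$alp2012imp,
     xlab = "ALP 1st preferences, 2010 Federal election (%)",
     ylab = "Imputed ALP 1st preferences, 2012 state results (%)",
     pch = 19, col = ifelse(seats$alpWin2010, "red", "black"))
abline(0, 1)   ## 45 degree, "no difference" line
text(seats$alp2010[seats$alpWin2010], seats$alp2012imp[seats$alpWin2010],
     labels = seats$name[seats$alpWin2010], pos = 3, col = "red", cex = .7)
```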

Red dots and labels indicate the 8 seats won by Labor in 2010. The good news (!?) for Labor is that the Federal seats in which its primary vote utterly cratered are seats it had no chance of winning in the first place, where its 2010 1st preference vote share was below or barely above 30% (e.g., Wide Bay, Maranoa, Fairfax, Wright, Fisher, Hinkler).

The bad news for Labor is that it would seem that most of its 8 Federal, Queensland seats are at some peril, with the exceptions perhaps being Griffith (Rudd’s seat), and maybe Rankin (Craig Emerson) and Oxley. The estimated ALP 1st preference vote share given the 2012 state results in these 3 seats lies above the actual ALP 1st preference recorded in Moreton in 2010, which was Labor’s weakest among the 8 seats it won in 2010 (and observe the many assumptions implied in that extrapolation).

Lilley — Swan’s seat — will be interesting. I grew up in Lilley on Brisbane’s northside. When Labor is really on the nose, it goes to the Coalition. Swan lost the seat in 1996 in his sophomore election, but has held it since 1998. I’m not sure the last redistribution helped, and it’s tough to see Labor winning it if its primary vote share slips below 35%. Complicating factors include the role the Katter party might play, as well as some kind of “personal vote” for Swan (an incumbent Federal Treasurer, no less).

I also show the implied swings given by these estimates of ALP 1st preference vote share (bigger version available by clicking):

This presentation of the data highlights that Griffith (Rudd’s seat) has the smallest implied swing among Labor’s 8 seats, around 5 percentage points. Coupled with the fact that Rudd starts off at a tolerable level of 1st preference support, this bolsters confidence that Griffith remains Labor’s best shot at a “retain” in 2013.

The implied swing in Moreton is only a little larger, but there is far less buffer there. Swings of -7 to -8 percentage points on 1st preferences in Lilley, Rankin and Oxley would almost surely be fatal to Labor’s chances there. And double-digit swings in Petrie, Blair and Capricornia would also be beyond the margin of survival.

Could Rudd be the last (QLD, Labor) one standing?

## Saturday October 15, 2011

Filed under: computing,R,statistics — jackman @ 5:01 pm

Update to my pscl package, now on CRAN.

Biggest change: fixing a bug in the way MCMC draws for item parameters were being stored and summarized by ideal.
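For anyone poking at this: the item parameter draws are only kept if you ask for them. A quick sketch using the s109 roll call data that ships with pscl (argument names as in my reading of the current docs; check `?ideal` and `?summary.ideal` on your install):

```r
## Fit a one-dimensional ideal point model, keeping the MCMC draws
## for the item (bill) parameters as well as the ideal points.
library(pscl)
data(s109)                      ## 109th U.S. Senate roll calls, ships with pscl
fit <- ideal(s109,
             d = 1,             ## one latent dimension
             store.item = TRUE, ## keep item parameter draws
             maxiter = 5000, burnin = 1000, thin = 10)
summary(fit, include.beta = TRUE)  ## summarize item draws alongside ideal points
```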

## Wednesday October 12, 2011

Filed under: R,statistics — jackman @ 3:03 pm

Impressive.

You are not alone!

## Wednesday July 13, 2011

Filed under: Australian Politics,R — jackman @ 3:23 pm

The header of my blog (above) shows the latest prices on offer in some of Australia’s election betting markets.  I convert the prices to an implied probability of ALP win (factoring out the bookie’s profit margin, the so-called “overround”).
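One standard way of factoring out the overround in a two-outcome market is to invert the decimal prices and normalise the resulting raw probabilities to sum to one; the prices below are made up for illustration, and this is a sketch of the idea rather than necessarily the exact adjustment I use:

```r
## Implied probability of an ALP win from two-outcome decimal prices,
## netting out the bookmaker's margin ("overround") by normalisation.
impliedALP <- function(priceALP, priceCoalition) {
  raw <- 1 / c(priceALP, priceCoalition)  ## raw implied probs; sum exceeds 1
  raw[1] / sum(raw)                       ## normalised ALP probability
}

impliedALP(3.50, 1.30)   ## e.g., ALP at $3.50, Coalition at $1.30: about 0.27
```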

I’m using some Javascript by John Resig to make Tufte-ish sparklines, although the Google version of sparklines looks easy to work with too. I’m using some R to generate PNG files plotting the last 72 hours of data.

Time-series graphs appear as PDFs too, again see the header of the blog.

On the data themselves, the betting markets have been moving in a pro-Coalition direction over the last two weeks, with some movement around the time that recent polls have been released, showing that the Coalition would romp home.  I think we’re still waiting on some post-carbon-tax polling, and how the betting markets digest that.

## Wednesday June 29, 2011

Filed under: politics,R,statistics — jackman @ 11:09 am

Now that classes are over, I took a little time to update my scripts that update the analysis of Congressional roll calls in close to real time.   Links appear at the top of the blog.   As of about 15 minutes ago, we’re up to 77 non-unanimous roll calls in the 112th Senate.   The House has 474 non-unanimous roll calls under its belt.

I’m presenting estimates of legislators’ “ideal points” and 95% credible intervals (from a model that fits just a single underlying dimension to the roll calls) both graphically (House/Senate) and in CSV.  I also present scatterplots (and loess smoothing) of the estimated ideal points against a crude (but useful) measure of preferences in the legislators’ district/state, Obama vote share in the 2008 election (House/Senate). I’ve also got a SVG with rollovers for the dense House scatterplot, using the RSVGTipsDevice package, but the resulting SVG breaks in Chrome.

I’m scraping the roll calls and some meta data from the House and Senate sites, using the parsing in R’s XML package (which I’m finally understanding how to use effectively).   Analysis of the roll calls is via the ideal function in (my) R package, pscl.

Quite aside from the methodology/technology, the substantive story is very much business as usual: zero partisan overlap in the recovered ideal point estimates. About 1 to 1.5 standard deviations of the ideal point distribution separate the ideal points of Democrats and Republicans among districts/states that split 50-50 Obama/McCain in 2008.

The other striking feature of the data is how few Democrats remain in the 112th House in districts where McCain beat Obama: I count 12 such seats.

## Monday June 13, 2011

Filed under: R — jackman @ 3:53 pm

Sweave source for the poll report for those who expressed some interest.

You’ll also need this file of R function definitions, utilities.R.

I also wrote a little shell script that calls Sweave and xelatex etc, hacking the Sweave.sh script that ships with R.

## Thursday June 2, 2011

Filed under: Australian Politics,R,statistics,type — jackman @ 7:00 am

With Lynn Vavreck at UCLA, I ran parallel public opinion surveys in Australia and the United States, measuring attitudes on security, the fight against terrorism, the wars in Afghanistan etc, some 10 years after the 9/11 attacks.

Full report here (generated with Sweave, xelatex, etc).

## Wednesday April 20, 2011

Filed under: Apple,computing,R — jackman @ 7:56 pm

So the web lit up a little today with news that iPhones are collecting time-stamped location data, and in a form that isn’t particularly hard to look at (and even with some nice apps to make animated maps of your travels etc):

The database is SQLite, and I used R (and the RSQLite package) to open it up and see what is there. In my case, I found a database with 35,297 records, with timestamps ranging from June 22, 2010 to 4 days ago (the date I last synced my phone). That said, there are only 726 unique timestamps in the database, which makes me wonder what is going on. Incidentally, the time-stamps are seconds since 2001-01-01 (I don’t know which TZ): in R, I converted with:

as.POSIXct(data$Timestamp, origin = "2001-01-01")

There is a little weirdness in the data: the iPhone thinks I was in Scotland last August (when in fact I was no further north than Colchester), and a few other instances of the geo-data being off by as much as 100 kilometres or so. There are also 68 records with lat/long recorded as 0/0, almost all of which are records from when I was in England last August.
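Opening the database from R is only a few lines. The file and table names below are the ones widely reported for this cache; check `dbListTables()` against your own copy:

```r
## Open the iPhone location cache with RSQLite and pull the records.
library(RSQLite)
con  <- dbConnect(SQLite(), dbname = "consolidated.db")
dbListTables(con)                                    ## see what is in there
data <- dbGetQuery(con, "SELECT * FROM CellLocation")
dbDisconnect(con)

## timestamps are seconds since 2001-01-01 (time zone unclear)
data$when <- as.POSIXct(data$Timestamp, origin = "2001-01-01")
```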

There are also fields whose names I can’t fathom, nor does a quick look suggest that any of them might be offsets to the timestamps:
 > names(data)
  [1] "MCC"                "MNC"                "LAC"
  [4] "CI"                 "Timestamp"          "Latitude"
  [7] "Longitude"          "HorizontalAccuracy" "Altitude"
 [10] "VerticalAccuracy"   "Speed"              "Course"
 [13] "Confidence"

Does anyone know what the fields with non-obvious names might be?
