jackman.stanford.edu/blog

my 1st post for the Guardian Australia

Thursday May 30, 2013

Filed under: Australian Politics,R,statistics — jackman @ 1:04 pm

I’ll be contributing a piece about once a week for the Guardian Australia, under a part of the web site we’re calling The Swing.

The graphs in my 1st piece were rendered in-line and at rather low resolution.

Bigger, full-resolution versions appear below; click on the in-line versions to see them at full size.

[Figure: 2009-2010]

[Figure: campaign2007]

It would be great to find a way to quickly make nice, web-friendly graphs out of R. Vega looks like a reasonable wrapper around d3. Datawrapper.de just doesn’t give me enough control over annotations, axes, etc. I’m also looking at Rickshaw. Life is short, and beautiful graphics are hard, sometimes…
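For static graphs, the low-res versus full-res problem can at least be handled from R itself by saving the same chart twice through the png() device. The sketch below is hypothetical: the chart, file names, pixel sizes and toy data are all made up for illustration, and it is not the workflow behind the graphs above.

```r
# Sketch only: the chart, file names, sizes and toy data are hypothetical.
draw_chart <- function() {
  set.seed(1)                          # identical toy data in every version
  x <- 1:20
  y <- 40 + cumsum(rnorm(20))          # toy series, not real results
  plot(x, y, type = "l", lwd = 2,
       xlab = "Week", ylab = "Vote share (%)", main = "Example chart")
  abline(h = 40, lty = 2, col = "grey50")   # dashed reference line
}

# Open a PNG device at the requested pixel size and resolution, draw, close.
save_png <- function(file, width, height, res) {
  png(file, width = width, height = height, res = res)
  on.exit(dev.off())
  draw_chart()
  invisible(file)
}

out_dir <- tempdir()
inline_png <- save_png(file.path(out_dir, "chart_inline.png"), 480, 320, 72)
full_png   <- save_png(file.path(out_dir, "chart_full.png"), 1800, 1200, 200)
```

Raising res along with the pixel dimensions keeps text and line weights in proportion, so the big version looks like an enlargement rather than a tiny chart on a huge canvas. R's svg() device (where R is built with cairo) gives a resolution-independent alternative, though not interactivity.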

CRAN might get tenure at Yale?

Friday August 24, 2012

Filed under: R,statistics — jackman @ 7:00 am

From one of the R lists I follow:

Today (2012-08-23) on CRAN [1]:

“Currently, the CRAN package repository features 4001 available packages.”

These packages are maintained by approximately 2350 different folks [5].

Previous milestones:

2011-05-12: 3,000 packages [1]
2009-10-04: 2,000 packages [2]
2007-04-12: 1,000 packages [3]
2004-10-01: 500 packages [4]
2003-04-01: 250 packages [4]

[1] http://cran.r-project.org/web/packages/
[2] https://stat.ethz.ch/pipermail/r-devel/2009-October/055049.html
[3] https://stat.ethz.ch/pipermail/r-devel/2007-April/045359.html
[4] My private in-house data.
[5] http://cran.r-project.org/web/checks/check_summary_by_maintainer.html

/Henrik

PS. This count includes only packages on CRAN. There are more packages elsewhere.
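As a rough check on the pace implied by these milestones, here is a short base R sketch (not part of the list post) that fits a log-linear trend to the six counts quoted above and converts the slope into a doubling time. It treats the in-house counts [4] like the others and assumes growth is exponential, which the slower 3,000 to 4,001 step suggests is only approximate.

```r
# Milestones quoted in the post (date, number of CRAN packages).
milestones <- data.frame(
  date = as.Date(c("2003-04-01", "2004-10-01", "2007-04-12",
                   "2009-10-04", "2011-05-12", "2012-08-23")),
  packages = c(250, 500, 1000, 2000, 3000, 4001)
)

# Years elapsed since the first milestone.
milestones$years <- as.numeric(milestones$date - milestones$date[1]) / 365.25

# Log-linear fit: log(packages) = a + b * years, i.e. exponential growth.
fit <- lm(log(packages) ~ years, data = milestones)
growth_rate   <- unname(coef(fit)["years"])  # continuous growth rate per year
doubling_time <- log(2) / growth_rate        # years for the count to double

round(c(growth_rate = growth_rate, doubling_time = doubling_time), 2)
```

On these figures the fitted doubling time comes out at roughly 2.4 years.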

Rudd, the last one standing?: Federal implications of QLD state election results

Wednesday April 4, 2012

Filed under: Australian Politics,R,statistics — jackman @ 12:40 am

Labor won 15 of Queensland’s 29 House of Reps seats in the 2007 Federal election (AEC details here). Yet just three years later, in the 2010 Federal election, Labor won only 8 of 30 Queensland Reps seats, with 33.6% of 1st preferences (a swing of -9.3 percentage points).

Labor’s best performance on 1st preferences in 2010 was in Capricornia (46%), which translated into a 54-46 two-party-preferred (2PP) result. Kevin Rudd won Griffith with 44% of 1st preferences, resulting in a 58-42 2PP result. Wayne Swan and the LNP candidate split the 1st preferences in Lilley, 41-41, with Swan winning the seat with Green preferences, 53-47 2PP. Labor managed to get home in Moreton in 2010, with 36% of the 1st preference vote, and a 51-49 2PP result.
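Under full preferential voting, as in Federal House elections, every formal ballot ends up with one of the final two candidates, so a seat's primaries and its 2PP result together pin down how the minor-party votes split. A small hypothetical helper (required_flow is an illustrative name, not from any package) makes the Lilley arithmetic explicit; the inputs are the rounded figures quoted above, so the answer is approximate.

```r
# Hypothetical helper: fraction of minor-party (non-ALP, non-LNP) first
# preferences that must flow to Labor to produce the observed 2PP result.
# All arguments are percentages of the formal vote.
required_flow <- function(alp_primary, lnp_primary, alp_2pp) {
  minor <- 100 - alp_primary - lnp_primary   # votes distributed on preferences
  (alp_2pp - alp_primary) / minor            # share of those reaching Labor
}

# Lilley, 2010 (rounded figures from the post): 41-41 primaries, 53-47 2PP.
lilley_flow <- required_flow(alp_primary = 41, lnp_primary = 41, alp_2pp = 53)
round(lilley_flow, 2)
```

Swan needed 12 of the 18 minor-party points, about two-thirds of them. Under Queensland's optional preferential system some ballots exhaust instead, which is one reason the state and Federal numbers don't map one-to-one.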

The state election of some 10 days ago was conducted under different district boundaries (89 seats in the Queensland parliament) and a different electoral system (optional preferential). Moreover, Katter’s Australian Party ran candidates in 76 seats, winning 11.5% of 1st preferences, further complicating comparisons with previous elections (state or federal). In any event, Labor won about 26.7% of 1st preferences (ECQ results), down 6.9 percentage points from its performance in the 2010 Federal election, and down a staggering 15.6 percentage points from the 2009 state election.

How might these 2012 state-level results translate into Federal results?

There are many different ways of looking at this, all of which involve a little guesswork and some assumptions, given the differences between the two electoral systems, the configuration of parties and so on.

Here’s a stab that I’ve been working on over the last week or so (“Spring Break” here at Stanford). The AEC conveniently (!) geo-codes its polling places and publishes that data on its web site. Shape files for Federal electorates are also available. This makes it feasible to start re-aggregating booth-level results from the state election up to Federal seats.
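The exact steps aren't spelled out here, so what follows is only a hypothetical sketch of the core idea: assign each geo-coded booth to the electorate polygon that contains it, then sum votes by electorate. It uses a hand-rolled ray-casting point-in-polygon test and two toy square "electorates" with made-up booth counts; with the real AEC shape files, a spatial package such as sf (st_join) would normally do the assignment.

```r
# Ray casting: count crossings of a horizontal ray from (px, py) with the
# polygon edges; an odd count means the point is inside.
point_in_polygon <- function(px, py, x, y) {
  inside <- FALSE
  j <- length(x)
  for (i in seq_along(x)) {
    # && short-circuits, so y[j] == y[i] never reaches the division.
    crosses <- ((y[i] > py) != (y[j] > py)) &&
      (px < (x[j] - x[i]) * (py - y[i]) / (y[j] - y[i]) + x[i])
    if (crosses) inside <- !inside
    j <- i
  }
  inside
}

# Toy electorates (hypothetical unit squares, not real boundaries).
electorates <- list(
  "Seat A" = list(x = c(0, 1, 1, 0), y = c(0, 0, 1, 1)),
  "Seat B" = list(x = c(1, 2, 2, 1), y = c(0, 0, 1, 1))
)

# Toy booth-level results (made up): location, ALP votes, formal votes.
booths <- data.frame(
  lon    = c(0.2, 0.7, 1.3, 1.8),
  lat    = c(0.5, 0.2, 0.6, 0.9),
  alp    = c(300, 250, 400, 100),
  formal = c(1000, 900, 1100, 800)
)

assign_seat <- function(lon, lat) {
  for (seat in names(electorates)) {
    p <- electorates[[seat]]
    if (point_in_polygon(lon, lat, p$x, p$y)) return(seat)
  }
  NA_character_   # booth falls outside every polygon
}

booths$seat <- mapply(assign_seat, booths$lon, booths$lat)

# Re-aggregate: ALP share of formal votes (%) in each electorate.
alp_by_seat    <- tapply(booths$alp, booths$seat, sum)
formal_by_seat <- tapply(booths$formal, booths$seat, sum)
alp_share <- 100 * alp_by_seat / formal_by_seat
round(alp_share, 1)
```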

A few steps and assumptions are required (and I’ll write this up at some point):

So what do you get when you do this re-aggregation, subject to all the caveats sounded above? Keep in mind I only have 1st preferences, at least for now.

The figure below (click for full-size) shows a scatterplot of imputed Federal results for the ALP given the 2012 state results, for each of Queensland’s 30 Federal seats, against the ALP’s actual 1st preference vote share (%) recorded in the 2010 Federal election. The diagonal is a 45-degree, “no difference” line. On average, the data points lie below the diagonal, confirming what we already know: Labor did considerably better in the 2010 Federal election than in the 2012 state election.

Red dots and labels indicate the 8 seats won by Labor in 2010. The good news (!?) for Labor is that the Federal seats in which its primary vote utterly cratered are seats it had no chance of winning in the first place, where its 2010 1st preference vote share was below or barely above 30% (e.g., Wide Bay, Maranoa, Fairfax, Wright, Fisher, Hinkler).
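A plot of this kind takes only a few lines of base R. The sketch below is not the code behind the figure: the seat names and shares are made up, and the file name is hypothetical. It shows the two ingredients described above, the 45-degree reference line and highlighting of seats Labor won.

```r
# Hypothetical data (not the real estimates): 2010 actual and 2012-imputed
# ALP first-preference shares (%), plus whether Labor won the seat in 2010.
seats <- data.frame(
  seat    = c("Seat A", "Seat B", "Seat C", "Seat D", "Seat E"),
  actual  = c(45, 42, 38, 29, 25),
  imputed = c(35, 37, 31, 16, 13),
  won2010 = c(TRUE, TRUE, TRUE, FALSE, FALSE)
)

plot_file <- file.path(tempdir(), "imputed_vs_actual.png")
png(plot_file, width = 800, height = 800, res = 120)
lims <- range(c(seats$actual, seats$imputed))
plot(seats$actual, seats$imputed, xlim = lims, ylim = lims, asp = 1,
     pch = 19, col = ifelse(seats$won2010, "red", "grey40"),
     xlab = "ALP 1st preferences, 2010 Federal (%)",
     ylab = "ALP 1st preferences, imputed from 2012 state (%)")
abline(a = 0, b = 1, lty = 2)   # 45-degree "no difference" line
with(subset(seats, won2010),
     text(actual, imputed, labels = seat, pos = 4, col = "red"))
invisible(dev.off())

# Share of seats below the diagonal (imputed share < 2010 share).
mean(seats$imputed < seats$actual)
```

Using common axis limits with asp = 1 keeps the diagonal at a true 45 degrees, so distance below the line reads directly as the implied drop.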

The bad news for Labor is that most of its 8 Federal seats in Queensland appear to be in some peril, with the possible exceptions of Griffith (Rudd’s seat) and perhaps Rankin (Craig Emerson) and Oxley. In these 3 seats, the estimated ALP 1st preference vote share given the 2012 state results lies above the actual ALP 1st preference share recorded in Moreton in 2010, Labor’s weakest of the 8 seats it won in 2010 (and observe the many assumptions implied in that extrapolation).

Lilley, Swan’s seat, will be interesting. I grew up in Lilley on Brisbane’s northside. When Labor is really on the nose, the seat goes to the Coalition. Swan lost it in 1996 in his sophomore election, but has held it since 1998. I’m not sure the last redistribution helped, and it’s tough to see Labor winning it if its primary vote share slips below 35%. Complicating factors include the role the Katter party might play, as well as some kind of “personal vote” for Swan (an incumbent Federal Treasurer, no less).

I also show the implied swings given by these estimates of ALP 1st preference vote share (bigger version available by clicking):

This presentation of the data highlights that Griffith (Rudd’s seat) has the smallest implied swing among Labor’s 8 seats, about 5 percentage points. Coupled with the fact that Rudd starts off with a tolerable level of 1st preference support, this bolsters confidence that Griffith remains Labor’s best shot at a “retain” in 2013.

The implied swing in Moreton is only a little larger, but there is far less buffer there. Swings of -7 to -8 percentage points on 1st preferences in Lilley, Rankin and Oxley would almost surely be fatal to Labor’s chances there. And double-digit swings in Petrie, Blair and Capricornia would also be beyond the margin of survival.
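The buffer reasoning can be spelled out as simple arithmetic: add the implied swing to the 2010 share and compare the result to a benchmark. In this sketch the 2010 shares are the ones quoted earlier in the post, but the swings are only illustrative values chosen to match the rough magnitudes described in the text (not the actual estimates). The benchmark of 36%, Moreton's 2010 winning share, is a rough yardstick rather than a rule.

```r
# 2010 ALP first-preference shares (%) quoted in the post; swings are
# illustrative values roughly matching the text, not the real estimates.
seats <- data.frame(
  seat   = c("Capricornia", "Griffith", "Lilley", "Moreton"),
  actual = c(46, 44, 41, 36),
  swing  = c(-11, -5, -7.5, -6)
)

# Benchmark: Moreton's 2010 share, the lowest primary vote with which Labor
# won a Queensland seat in 2010 (an assumption used as a yardstick).
benchmark <- 36

seats$projected <- seats$actual + seats$swing
seats$buffer    <- seats$projected - benchmark   # > 0 means above the benchmark
seats[order(-seats$buffer), ]
```

Only Griffith stays above the line in this illustration, because it combines a high starting share with a small swing.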

Could Rudd be the last (QLD, Labor) one standing?

pscl 1.04 live on CRAN

Saturday October 15, 2011

Filed under: computing,R,statistics — jackman @ 5:01 pm

Update to my pscl package, now on CRAN.

Biggest change: fixing a bug in the way the ideal function stored and summarized MCMC draws for item parameters.

Bay Area R Users group has 1300 members

Wednesday October 12, 2011

Filed under: R,statistics — jackman @ 3:03 pm

Impressive.

You are not alone!

tracking Australian election betting markets again (now with sparklines)

Wednesday July 13, 2011

Filed under: Australian Politics,R — jackman @ 3:23 pm

The header of my blog (above) shows the latest prices on offer in some of Australia’s election betting markets. I convert the prices to an implied probability of ALP win (factoring out the bookie’s profit margin, the so-called “overround”).
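One common way to factor out the overround is to invert each decimal price and rescale the results so they sum to one. The sketch below uses that proportional normalisation. It may not be exactly the adjustment used for the header, and the prices in the usage line are purely illustrative.

```r
# Hypothetical helper: decimal odds -> implied win probabilities, with the
# overround removed by proportional normalisation.
implied_probs <- function(odds) {
  raw <- 1 / odds                 # raw implied probabilities (sum > 1)
  overround <- sum(raw) - 1       # bookie's margin, e.g. 0.05 = 5%
  list(probs = raw / sum(raw), overround = overround)
}

# Illustrative prices only: ALP $3.00, Coalition $1.35.
book <- implied_probs(c(ALP = 3.00, Coalition = 1.35))
round(book$probs, 3)
round(book$overround, 3)
```

Proportional scaling spreads the margin evenly across outcomes. Other schemes shade the favourite-longshot bias differently, but with only two outcomes the differences are usually small.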

I’m using some JavaScript by John Resig to make Tufte-ish sparklines, although the Google version of sparklines looks easy to work with too. I’m using R to generate PNG files plotting the last 72 hours of data.
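The R side of this can be as small as trimming the series to a 72-hour window and drawing an axis-free line into a tiny PNG. The sketch below runs on a simulated hourly series. The price history, timestamps, file name and image size are all hypothetical, and this is not the script behind the header.

```r
# Hypothetical hourly history of implied ALP win probabilities over 5 days.
set.seed(42)
now <- as.POSIXct("2011-07-13 15:00:00", tz = "UTC")
history <- data.frame(
  time = now - 3600 * (119:0),
  prob = pmin(pmax(0.35 + cumsum(rnorm(120, 0, 0.005)), 0), 1)
)

# Keep only the last 72 hours.
recent <- subset(history, time > now - 72 * 3600)

# Draw a small, axis-free line (a "sparkline") into a PNG file.
spark_file <- file.path(tempdir(), "alp_sparkline.png")
png(spark_file, width = 150, height = 30)
par(mar = c(0, 0, 0, 0))                   # no margins: the line fills the image
plot(recent$time, recent$prob, type = "l", axes = FALSE, ann = FALSE)
points(tail(recent$time, 1), tail(recent$prob, 1),
       pch = 19, cex = 0.5, col = "red")   # mark the latest value
invisible(dev.off())
```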

Time-series graphs are also available as PDFs; again, see the header of the blog.

As for the data themselves, the betting markets have moved in a pro-Coalition direction over the last two weeks, with some movement around the release of recent polls showing that the Coalition would romp home. I think we’re still waiting on post-carbon-tax polling, and on how the betting markets digest it.

roll calls, ideal points, 112th Congress

Wednesday June 29, 2011

Filed under: politics,R,statistics — jackman @ 11:09 am

Now that classes are over, I took a little time to update the scripts behind my close-to-real-time analysis of Congressional roll calls. Links appear at the top of the blog. As of about 15 minutes ago, we’re up to 77 non-unanimous roll calls in the 112th Senate. The House has 474 non-unanimous roll calls under its belt.
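Counting non-unanimous roll calls is a one-liner once the votes are in a legislator-by-roll-call matrix. The sketch below is hypothetical: it uses a made-up 4 x 4 vote matrix and treats a roll call as non-unanimous if at least one member votes on each side. Analyses often also drop lopsided votes, which this does not do.

```r
# Hypothetical vote matrix: rows are legislators, columns are roll calls;
# 1 = Yea, 0 = Nay, NA = not voting.
votes <- matrix(c(
  1, 1,  0,  1,
  1, 0,  0,  1,
  1, 1,  1, NA,
  1, 0, NA,  1
), nrow = 4, byrow = TRUE,
dimnames = list(paste0("Leg", 1:4), paste0("RC", 1:4)))

# Non-unanimous: at least one Yea and at least one Nay, ignoring abstentions.
is_nonunanimous <- function(v) {
  any(v == 1, na.rm = TRUE) && any(v == 0, na.rm = TRUE)
}

nonunan <- apply(votes, 2, is_nonunanimous)
sum(nonunan)
```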

I’m presenting estimates of legislators’ “ideal points” and 95% credible intervals (from a model that fits just a single underlying dimension to the roll calls) both graphically (House/Senate) and in CSV. I also present scatterplots (with loess smoothing) of the estimated ideal points against a crude (but useful) measure of preferences in the legislators’ districts/states: Obama’s vote share in the 2008 election (House/Senate). I’ve also got an SVG with rollovers for the dense House scatterplot, made with the RSVGTipsDevice package, but the resulting SVG breaks in Chrome.

I’m scraping the roll calls and some metadata from the House and Senate sites, using the parsing tools in R’s XML package (which I’m finally learning to use effectively). The roll calls are analyzed with the ideal function in my R package, pscl.

Quite aside from the methodology/technology, the substantive story is very much business as usual: zero partisan overlap in the recovered ideal point estimates. About 1 to 1.5 standard deviations of the ideal point distribution separate the ideal points of Democrats and Republicans among districts/states that split 50-50 Obama/McCain in 2008.

The other striking feature of the data is how few Democrats remain in the 112th House in districts where McCain beat Obama: I count 12 such seats.
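The two summaries above can be computed from a table of ideal points, party and Obama share. These are the partisan gap (in SD units) at a 50-50 district, read off separate loess smooths for each party, and the count of Democrats in McCain-won districts. The sketch below runs on simulated data, so the numbers are meaningless. It also assumes Obama share is measured as a share of the two-party vote, so that below 50% means McCain carried the district.

```r
# Simulated data (hypothetical): Obama 2008 two-party share (%), party,
# and a one-dimensional ideal point for each legislator.
set.seed(2011)
n <- 400
obama    <- runif(n, 25, 85)
party    <- ifelse(obama + rnorm(n, sd = 8) > 50, "D", "R")
ideal_pt <- ifelse(party == "D", -1, 1) - 0.02 * (obama - 50) + rnorm(n, sd = 0.3)

# Express ideal points in SD units of the whole distribution.
z <- (ideal_pt - mean(ideal_pt)) / sd(ideal_pt)

# Separate loess smooths of ideal point on Obama share, one per party.
fit_d <- loess(z ~ obama, subset = party == "D")
fit_r <- loess(z ~ obama, subset = party == "R")

# Partisan gap, in SDs, for a district that split 50-50.
at_50 <- data.frame(obama = 50)
gap <- unname(predict(fit_r, at_50) - predict(fit_d, at_50))

# Democrats holding districts McCain carried (Obama share below 50%).
mccain_dems <- sum(party == "D" & obama < 50)

round(gap, 2)
mccain_dems
```

Fitting each party separately is what makes the gap visible. A single smooth through all members would bridge the two clouds and understate the separation.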

Sweave source for poll report

Monday June 13, 2011

Filed under: R — jackman @ 3:53 pm

Here is the Sweave source for the poll report, for those who expressed some interest.

You’ll also need this file of R function definitions, utilities.R.

I also wrote a little shell script that calls Sweave, xelatex, etc., hacking the Sweave.sh script that ships with R.
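The same weave-then-typeset pipeline can also be driven from R itself. The sketch below is a hypothetical helper, not the shell script mentioned above. It runs utils::Sweave() on an .Rnw file, then runs xelatex on the resulting .tex file, but only if xelatex is found on the PATH. The report file in the usage lines is a made-up minimal example.

```r
# Hypothetical build helper: weave an .Rnw file, then typeset with xelatex.
build_report <- function(rnw, engine = "xelatex", workdir = dirname(rnw)) {
  owd <- setwd(workdir)              # Sweave writes the .tex to the working dir
  on.exit(setwd(owd))
  utils::Sweave(basename(rnw), quiet = TRUE)
  tex <- sub("\\.Rnw$", ".tex", basename(rnw))
  if (nzchar(Sys.which(engine))) {
    # Two passes so cross-references resolve; output discarded.
    for (pass in 1:2) {
      system2(engine, c("-interaction=nonstopmode", tex), stdout = FALSE)
    }
  }
  file.path(workdir, tex)
}

# Usage with a tiny, hypothetical report file.
rnw <- file.path(tempdir(), "report.Rnw")
writeLines(c("\\documentclass{article}",
             "\\begin{document}",
             "<<>>=",
             "summary(c(1, 2, 3))",
             "@",
             "\\end{document}"), rnw)
tex_file <- build_report(rnw)
```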

Australians and Americans, 10 years after 9/11

Thursday June 2, 2011

Filed under: Australian Politics,R,statistics,type — jackman @ 7:00 am

With Lynn Vavreck at UCLA, I ran parallel public opinion surveys in Australia and the United States, measuring attitudes on security, the fight against terrorism, the wars in Afghanistan, etc., some 10 years after the 9/11 attacks.

Full report here (generated with Sweave, xelatex, etc).

iPhone geo-tracking database

Wednesday April 20, 2011

Filed under: Apple,computing,R — jackman @ 7:56 pm

So the web lit up a little today with news that iPhones are collecting time-stamped location data, stored in a form that isn’t particularly hard to look at (there are even some nice apps that make animated maps of your travels).

The database is SQLite, and I used R (and the RSQLite package) to open it up and see what is there. In my case, I found a database with 35,297 records, with timestamps ranging from June 22, 2010 to 4 days ago (the date I last synced my phone). That said, there are only 726 unique timestamps in the database, which makes me wonder what is going on. Incidentally, the timestamps are seconds since 2001-01-01 (I don’t know in which time zone); in R, I converted them with:


# seconds since 2001-01-01; tz = "UTC" is an assumption
as.POSIXct(data$Timestamp, origin = "2001-01-01", tz = "UTC")

There is a little weirdness in the data: the iPhone thinks I was in Scotland last August (when in fact I was no further north than Colchester), and there are a few other instances of the geo-data being off by as much as 100 kilometres or so. There are also 68 records with lat/long recorded as 0/0, almost all of them from when I was in England last August.
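The checks described above amount to a few lines of base R once the table is in a data frame: duplicate timestamps, 0/0 fixes, and points implausibly far from where you actually were. The sketch below substitutes a hypothetical five-row data frame for the real table. It uses three of the column names from the database (Timestamp, Latitude, Longitude) and an approximate Colchester reference point, and flags distant fixes with a haversine great-circle distance. The 100 km cutoff and UTC origin are assumptions.

```r
# Hypothetical stand-in for the table read via RSQLite (made-up rows).
data <- data.frame(
  Timestamp = c(299000000, 299000000, 299003600, 299007200, 299007200),
  Latitude  = c(51.89, 51.90, 0, 55.95, 51.88),
  Longitude = c(0.90, 0.91, 0, -3.19, 0.89)
)

# Seconds since 2001-01-01; UTC is an assumption.
data$time <- as.POSIXct(data$Timestamp, origin = "2001-01-01", tz = "UTC")

n_unique <- length(unique(data$Timestamp))
n_zero   <- sum(data$Latitude == 0 & data$Longitude == 0)

# Haversine great-circle distance in km (Earth radius 6371 km).
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(pmin(a, 1)))
}

# Flag non-0/0 fixes more than 100 km from an approximate Colchester location.
valid <- !(data$Latitude == 0 & data$Longitude == 0)
dist  <- haversine_km(51.89, 0.90, data$Latitude, data$Longitude)
far   <- valid & dist > 100

c(records = nrow(data), unique_times = n_unique, zero_zero = n_zero,
  far = sum(far))
```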

There are also fields whose names I can’t fathom, and a quick look doesn’t suggest that any of them are offsets to the timestamps:

> names(data)
[1] "MCC" "MNC" "LAC"
[4] "CI" "Timestamp" "Latitude"
[7] "Longitude" "HorizontalAccuracy" "Altitude"
[10] "VerticalAccuracy" "Speed" "Course"
[13] "Confidence"

Does anyone know what the fields with non-obvious names might be?
