Goodbye static graphs, hello shiny, ggvis, rmarkdown (vs JS solutions)

Monday August 18, 2014

One of the very exciting and promising developments from RStudio is the rmarkdown/shiny/ggvis combination of tools.

We’re on the verge of static graphs and presentations being as old-fashioned as overhead transparencies.

I’ve spent the last couple of days giving these tools a test spin. Lots of comments and links to examples appear below.

I came to this investigation with a specific question in mind: how can I get a good-looking scatterplot with some rollover/tooltip functionality into a presentation, with one tool or one workflow?

Soft constraints: I’d prefer to use R, at least on the data side, and I would also like control over look and feel (e.g., slide transitions) and over stylistic elements like type, color, sizes and spacing.

I use either Beamer or Keynote for presentations (Beamer for teaching/stats-type talks, Keynote for more substantive, general-audience talks). I began by investigating how one might drop a d3-rendered graph into a Keynote presentation, but this seems pretty hard: hacking at the files produced by Keynote’s export-to-HTML function seems formidable.

I’ve also been poking at solutions that are all on the JS side of the ledger (e.g., d3 + stack), inspired by this example from Karl Broman. I’m also interested in how one might roll an interactive graphic into Prezi.

But back to the RStudio workflow, using the rmarkdown/shiny/ggvis combination. Here is some sample output I’ve created: a standalone scatterplot and a dummy presentation.

Some observations:

If you’re happy with the out-of-the-box style defaults, then this stack of tools is just about there and evolving rapidly. And keep in mind that rmarkdown does a lot more than make presentations. For instance, I’m yet to really explore rmarkdown for producing publish-to-web papers.

If you crave fine control over layout and graphical elements, then I think it might still be a d3/js world, at least for a while longer.

I’m still left thinking that if I could drop shiny apps or d3 into Keynote (somehow), then I’d have the best of both worlds.

ideal point graphics, via d3

Wednesday July 30, 2014

Filed under: R,statistics — jackman @ 1:04 pm

I’ve updated some of the graphical displays of the ideal point estimates I serve up here. I’ve rendered some of these in d3, with some rollover lah-de-dah: (1) 113th House ideal points in a long “caterpillar” format; (2) scatterplot of ideal point against Obama 2012 vote in district. Screenshot of the scatterplot appears below.

My R scripts dump CSV files containing the ideal point estimates, credible intervals and labeling info, which I then pick up on the d3 side. Separate files hold fitted values from a local regression of estimated ideal point on Obama vote share in the district.
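For readers who want to set up a similar hand-off, here is a minimal Python sketch (the post's own scripts are in R) that summarizes posterior draws for each legislator into a posterior mean and an equal-tailed 95% credible interval, then writes one CSV row per legislator for the d3 side to read. The column names (`name`, `district`, `ideal`, `lo`, `hi`) and the input dict keys are hypothetical.

```python
import csv
import io
import statistics


def summarize_draws(draws):
    """Posterior mean and equal-tailed 95% credible interval from MCMC draws."""
    # quantiles(n=40) returns the 2.5%, 5%, ..., 97.5% cut points; keep the outer two
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return statistics.fmean(draws), cuts[0], cuts[-1]


def write_ideal_points(legislators, out):
    """Write one CSV row per legislator: labels, estimate and 95% interval.

    `legislators` is a list of dicts with hypothetical keys
    'name', 'district' and 'draws' (posterior draws of the ideal point).
    """
    writer = csv.DictWriter(out, fieldnames=["name", "district", "ideal", "lo", "hi"])
    writer.writeheader()
    for leg in legislators:
        mean, lo, hi = summarize_draws(leg["draws"])
        writer.writerow({
            "name": leg["name"],
            "district": leg["district"],
            "ideal": round(mean, 3),
            "lo": round(lo, 3),
            "hi": round(hi, 3),
        })


if __name__ == "__main__":
    # Made-up draws for one made-up member, just to show the output shape.
    buf = io.StringIO()
    fake = [{"name": "Member A", "district": "XX-01",
             "draws": [i / 100 for i in range(-50, 51)]}]
    write_ideal_points(fake, buf)
    print(buf.getvalue())
```

To write a real file, pass `open("ideal_points.csv", "w", newline="")` (a hypothetical filename) instead of the in-memory buffer.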

I toyed with the idea of loess on the d3/js side (with sliders for user control of bandwidth etc), more as a plausibility probe than anything, but it seems like a lot to push down through the browser.
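To give a sense of what pushing loess into the browser would involve, here is a minimal Python sketch of a local linear smoother with tricube weights, which is the core of loess (degree 1, no robustness iterations). The function name and `span` argument are illustrative and are not R's `loess()` interface.

```python
def local_linear(xs, ys, x0, span=0.75):
    """Tricube-weighted local linear fit evaluated at x0.

    The bandwidth is the distance from x0 to its k-th nearest x, with
    k = span * n; only points strictly inside that distance get positive weight.
    """
    n = len(xs)
    k = max(2, min(n, round(span * n)))          # size of the local neighbourhood
    h = sorted(abs(x - x0) for x in xs)[k - 1]   # bandwidth: distance to k-th nearest x
    if h == 0:
        h = 1e-12
    sw = swx = swy = swxx = swxy = 0.0
    for x, y in zip(xs, ys):
        u = abs(x - x0) / h
        w = (1 - u ** 3) ** 3 if u < 1 else 0.0  # tricube weight
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    if sw == 0:
        raise ValueError("no points with positive weight near x0")
    denom = sw * swxx - swx ** 2
    if abs(denom) < 1e-12:                       # all weight on one x: local constant fit
        return swy / sw
    slope = (sw * swxy - swx * swy) / denom      # weighted least squares slope
    intercept = (swy - slope * swx) / sw
    return intercept + slope * x0
```

A fitted curve is `[local_linear(xs, ys, g, span) for g in grid]`, so every move of a bandwidth slider means redoing one weighted regression per grid point over all the data, which is the work that would land on the browser.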

[Screenshot: scatterplot of 113th House ideal points against Obama 2012 vote in district]

the last 747, SFO-SYD…

Wednesday March 26, 2014

Filed under: flight nerdery — jackman @ 8:23 am

Tonight is United’s last 747 on the Sydney run out of SFO. From tomorrow night it will be 777s servicing UAL 863 SFO-SYD.

It will be a nicer coach-class experience in the 777s, but a longer flight, given the 777’s slightly slower cruising speed.

Farewell to seat 14K upstairs, visiting the cockpit prior to pushing back, and all that.

And no more views like this, United’s return flight 870 (SYD-SFO) banking right over Sydney after taking off from 34L at SYD:

MH370 hypotheses

Tuesday March 11, 2014

Filed under: flight nerdery — jackman @ 6:07 pm

A Payne Stewart-type event? Here’s a version of that: realizing that they are being overwhelmed by hypoxia, the crew sets the autopilot for a left turn towards a nearby airport and a lower altitude? But it is too little, too late. The crew is incapacitated, and the AP flies the settings the crew entered as they were literally losing consciousness, west and lower…

Fanciful, I know, and wouldn’t seem to explain the loss of the transponder.

And if loss of cabin pressure prompts the crew to descend – an emergency situation – then they’d almost surely shed the altitude manually, and in a hurry, not by dialing in a new altitude on the autopilot.

Then again, previous cases involving crew incapacitation due to hypoxia point to all kinds of impaired decision-making.

Hijacking gone wrong must surely be another candidate hypothesis at this stage.

In any event, I wonder about radar coverage out into the Indian Ocean, west of Aceh, south of Sri Lanka. I’m guessing that there’s not a lot there.

With fuel plus reserve for a flight from KL to Beijing, suppose the plane had another four or five hours of fuel from the reported point of last radar contact, and continued on this reported “westerly” heading. Then there is a chance MH370 wound up flying a lot further west, perhaps somewhere roughly between the 8 o’clock and 9 o’clock positions on the circle below (from Great Circle Mapper: a circle of 2500 mi radius centered on Kuala Lumpur):
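For anyone wanting to reproduce a range ring like the Great Circle Mapper one, the standard spherical destination-point formula gives the point reached from a start location after a given distance along a great circle with a fixed initial bearing. A minimal Python sketch, assuming a spherical Earth of radius 3,959 statute miles and approximate coordinates for Kuala Lumpur (both assumptions, not from the post); bearings of 240° and 270° correspond roughly to the 8 o’clock and 9 o’clock positions on the ring.

```python
import math

EARTH_RADIUS_MI = 3959.0  # mean Earth radius in statute miles (spherical approximation)


def destination(lat_deg, lon_deg, bearing_deg, dist_mi, radius=EARTH_RADIUS_MI):
    """Point reached from (lat, lon) after dist_mi along a great circle
    with the given initial bearing (degrees clockwise from north)."""
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    brg = math.radians(bearing_deg)
    d = dist_mi / radius  # angular distance in radians
    lat2 = math.asin(math.sin(lat1) * math.cos(d)
                     + math.cos(lat1) * math.sin(d) * math.cos(brg))
    lon2 = lon1 + math.atan2(math.sin(brg) * math.sin(d) * math.cos(lat1),
                             math.cos(d) - math.sin(lat1) * math.sin(lat2))
    lon2 = (lon2 + 3 * math.pi) % (2 * math.pi) - math.pi  # normalise to [-180, 180)
    return math.degrees(lat2), math.degrees(lon2)


# Approximate coordinates of Kuala Lumpur (assumed for illustration).
KL = (3.14, 101.69)

for clock, bearing in [(8, 240.0), (9, 270.0)]:
    lat, lon = destination(*KL, bearing, 2500.0)
    print(f"{clock} o'clock (bearing {bearing:.0f} deg): lat {lat:.1f}, lon {lon:.1f}")
```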


ALP two-party preferred vs 1st preferences

Tuesday October 8, 2013

Filed under: Australian Politics — jackman @ 2:39 pm

[Update: Let's try this again...after Kevin Bonham pointed out that I'd screwed up some of the computation here]

I was inspired to make the attached graph after seeing some of Mumble’s tables (Mumble now has a table directly on this topic).

I plot ALP two-party preferred (vertical axis) against ALP 1st preferences (horizontal axis); see the in-line PNG below or this PDF in a separate browser window. The raw data are available in a table, as CSV or as an RData object.

Keep in mind that we’re still waiting on TPP counts in 11 of the 150 seats in the House of Representatives, but that won’t change the big picture.

Some commentary below.

ALP two-party preferred vs 1st preferences, by electoral division

Note immediately that Labor won just seven seats with a clear majority of 1st preferences, without going to preferences; these are labeled in the upper right of the graph. It is interesting to see McMahon in this set, a seat that at one point was thought to be part of the anticipated Western Sydney Labor wipeout. I’m sure other commentators have noted this, but it is rather striking, a reflection of just how poorly Labor performed on 1st preferences in 2013.

Labor did not win more than 50% of 1st preferences in any of the seats yet to report a TPP count (and so not displayed on the scatterplot). Labor’s 1st preference performance in these seats ranges from 11.7% (Indi) to 45.1% (Wills).

The flow of preferences to Labor (inherent in the definition of two-party preferred) means that ALP TPP > ALP 1st prefs in every electoral division, and so the data lie comfortably above the diagonal 45-degree line. The dark line is a smoothing spline, which is pretty much linear, save for the kink around 46% ALP 1st preferences.

That is, preferences are flowing to Labor pretty much in proportion to their 1st preference vote share, at least on average. In fact, the linear model ALP TPP = 0.95 × ALP 1st prefs + 14 isn’t a bad approximation (OLS, r² = .90), save for a few seats.
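The approximation quoted above is an ordinary least squares line. A minimal Python sketch of simple OLS with r², plus the quoted fit as a function; the function names are illustrative and the division-level data are not reproduced here.

```python
def ols(x, y):
    """Simple linear regression y = a + b*x by ordinary least squares.

    Returns (intercept a, slope b, r2), where r2 is the squared correlation,
    which equals R^2 for a regression with one predictor.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx
    a = my - b * mx
    return a, b, sxy ** 2 / (sxx * syy)


def predicted_tpp(alp_first):
    """The fitted approximation from the post: ALP TPP = 0.95 * ALP 1st prefs + 14."""
    return 0.95 * alp_first + 14
```

For example, a seat at 40% ALP 1st preferences maps to 0.95 × 40 + 14 = 52% predicted TPP; seats such as those labeled off-trend are the ones with large positive residuals from this line.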

I’ve labelled some cases where Labor TPP is considerably higher than we’d expect given the 1st pref result: e.g., Grayndler, Gellibrand, Sydney, Melbourne Ports and Richmond. Unsurprisingly, these are all seats where the Green vote was quite strong.

Grayndler, Gellibrand and Sydney are seats where Labor polled 43% to 44% on 1st preferences and the Green result ranged between 16% and 21% (way above the Greens’ national vote share), and where (a) the vote for other minor parties was low and/or (b) Green preferences flowed back to Labor at higher than typical rates. Melbourne Ports and Richmond are also “off-trend” but in a different class: the Labor vote there is in the low 30s (close to Labor’s national vote share), and the Green vote at 17% (Richmond) or 19% (Melbourne Ports), way above the Greens’ national support level, helps put Labor over the top.

The AEC will eventually release preference flow data (here’s what that raw data looked like for 2010). Until then, some sense of the relationship between the Green vote and its impact on election outcomes comes via the graph presented below. I’ve plotted the difference between ALP TPP and ALP 1st preferences against Green 1st preferences; again, the diagonal is a 45-degree line and the darker line is a smoothing spline.

Most of the data lie above the 45-degree line. That is, Labor accrues the vast majority of Green preferences and, of course, picks up preferences from other candidates too. Consequently, almost everywhere ALP TPP > ALP 1st prefs + Green 1st prefs. The exceptions are Bradfield, Curtin, Goldstein, Higgins, Kooyong, Mackellar, North Sydney, Warringah and Wentworth, all urban seats won by the Liberal Party. The Green vote in these seats is not small, and a reasonable proportion of Green preferences flow back to the Coalition. A popular (if politically incorrect and not altogether accurate) euphemism for the Green > … > Liberal > … > ALP preference ordering is “doctors’ wives”. Imagining “leafy lined avenues” might help us get at the same niche of the electorate, without the patronizing, sexist overtones.
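The 45-degree-line reading of this second graph can be made explicit: a division sits above the line when ALP TPP minus ALP 1st preferences exceeds the Green 1st preference vote, i.e. when ALP TPP > ALP 1st prefs + Green 1st prefs. A minimal Python sketch; the `Division` fields and the two example seats are hypothetical, not AEC data.

```python
from dataclasses import dataclass


@dataclass
class Division:
    name: str
    alp_first: float    # ALP 1st preferences, % of formal vote
    green_first: float  # Green 1st preferences, %
    alp_tpp: float      # ALP two-party preferred, %

    @property
    def alp_gain(self) -> float:
        """Vertical axis of the graph: what preferences added to ALP's 1st preference share."""
        return self.alp_tpp - self.alp_first


def below_diagonal(divisions):
    """Divisions where ALP gained less through preferences than the Green 1st preference vote."""
    return [d.name for d in divisions if d.alp_gain < d.green_first]


# Hypothetical figures, for illustration only.
seats = [
    Division("Seat A", alp_first=43.0, green_first=18.0, alp_tpp=66.0),  # gain 23 > 18: above the line
    Division("Seat B", alp_first=20.0, green_first=16.0, alp_tpp=32.0),  # gain 12 < 16: below the line
]
print(below_diagonal(seats))  # ['Seat B']
```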

Note also the distinction between the Labor-won seats of Fowler, Blaxland and McMahon, versus, say, Grayndler, Melbourne Ports, Gellibrand and Sydney.

Perhaps all this is to say that not all Labor-won seats are the same, even those with reasonably similar outcomes on ALP TPP. The Green vote (and Green preference flow) is an interesting way to think about how those seats differ, but these would themselves be explained by other things: e.g., level of tertiary education, income, espresso consumption, etc…

Tuesday October 1, 2013

Filed under: general — jackman @ 6:01 pm

Back in DC for a spot of work… Flew here on Day One of the Federal government shutdown. Not that you’d know it from the traffic coming in from Dulles at rush hour.

I sat next to a lobbyist on the flight. He’s with a bank, trying to get some proposed Dodd-Frank reporting requirements watered down. He fell asleep, woke with a start, bumped into me. Said he was dreaming about his golf swing. Said he whiffed it.

I had a drink or two at the Willard, dinner on the sidewalk, distracted now and then by the jets like rockets, through the trees over the Mall, coming out of National (um, Reagan), flying that slightly crazed up-stream Potomac departure.

The Indian waiter recommended the Rhône he’d sold me to the table of tourists from the PRC, lugging their Neiman Marcus booty around. DC might be the nation’s capital, but even (or perhaps especially) two blocks from the White House, it is crazy cosmopolitan…

The humidity of a DC summer is gone, but it’s still warm enough to dine outside. October 1, 2013, Washington DC. A beautiful night.

where will the two-party preferred result land?

Monday September 23, 2013

Filed under: Australian Politics — jackman @ 7:06 pm

We’re still waiting for TPP counts in 10 “non-classic” seats. With the TPP counting basically done in the other 140 seats, Labor has 46.61% TPP.

We’ll know soon enough, but my very rough guess goes something like this:
[Screenshot: rough ALP TPP guesses for the 10 non-classic divisions awaiting TPP counts]

That is, I assume Labor suffers TPP swings against it in the 10 non-classic divisions awaiting TPP counts. Weighting my rough guesses in these seats by each seat’s enrollment gives 42.92% ALP TPP there. These 10 seats have a total enrollment of 933K, which goes up against the 46.61% from the remaining 14.7M − 933K of the electorate in the 140 seats with a TPP count. Overall result: about 46.4%.
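The enrollment-weighted combination in the paragraph above can be written out explicitly. A minimal Python sketch using the post's figures (46.61% over the 140 counted seats, 42.92% over the 10 pending seats, 933K of 14.7M enrolled); as in the post, enrollment stands in for formal votes as the weight, and the function name is illustrative.

```python
def weighted_tpp(tpp_counted, enrol_counted, tpp_pending, enrol_pending):
    """Enrollment-weighted average of ALP TPP (percent) across two groups of divisions."""
    total = enrol_counted + enrol_pending
    return (tpp_counted * enrol_counted + tpp_pending * enrol_pending) / total


TOTAL_ENROL = 14.7e6    # total enrollment
PENDING_ENROL = 933e3   # enrollment in the 10 non-classic divisions

overall = weighted_tpp(46.61, TOTAL_ENROL - PENDING_ENROL, 42.92, PENDING_ENROL)
print(round(overall, 2))  # 46.38, i.e. "about 46.4%"
```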

But this estimate is as rough as guts, as they say… All the same, this would put the 2013 result in 1996 territory, where Labor got 46.37% TPP.

Update: another approach to the data generates a similar estimate. The 10 seats that are non-classic in 2013 produced 46.84% ALP TPP in 2010. The remaining 140 classic seats gave us 50.34%, the difference being 3.5 percentage points. The weighted combination of the 10 “non-classic” plus 140 “classic” divisions in 2010 is 50.12%, the overall 2010 ALP TPP result.

If we get that 3.5 percentage point classic/non-classic difference this year, then the 10 non-classic divisions will come in at 46.61-3.5 = 43.11% ALP TPP. This implies an overall ALP TPP result of 46.39%, again in 1996 territory.

We’d need the 10 non-classic divisions to produce something like 40.5% ALP TPP for the overall ALP TPP result to get down to 46.2% (the last point estimate I produced from my poll averaging machinery ahead of the election). We still might get there, but these rough estimates of ALP TPP among the non-classic divisions suggest 46.3-46.5 might be more like it.
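Both calculations in this update reduce to one weighted average and its inverse. A minimal Python sketch, assuming (as elsewhere in the post) that the 10 non-classic divisions' share of enrollment, 933K of 14.7M or about 6.3%, is the right weight; the function names are illustrative.

```python
def overall_tpp(tpp_counted, tpp_pending, share_pending):
    """Overall ALP TPP when the pending divisions hold `share_pending` of enrollment."""
    return (1 - share_pending) * tpp_counted + share_pending * tpp_pending


def required_pending_tpp(target, tpp_counted, share_pending):
    """Invert overall_tpp: the ALP TPP the pending divisions must produce to hit `target`."""
    return (target - (1 - share_pending) * tpp_counted) / share_pending


share = 933e3 / 14.7e6   # non-classic divisions' share of enrollment
pending = 46.61 - 3.5    # apply the 2010 classic/non-classic gap: 43.11

print(round(overall_tpp(46.61, pending, share), 2))  # 46.39
print(required_pending_tpp(46.2, 46.61, share))      # break-even for a 46.2% overall result
```

The break-even figure is sensitive to the weight used; formal votes rather than enrollment would move it a little.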

collection of my 2013 election analyses, lots of graphs and more to come

Friday September 20, 2013

Filed under: Australian Politics — jackman @ 12:08 am

I’ve created a page holding links to the various graphs and analyses I’ve got out of the 2013 Australian election.

This is still a work in progress. I’m sitting on quite a lot more content to go up yet. A sampler, for your consideration: two-party preferred swings, grouped by state:

Two-party preferred swings, by electoral division, 2013 Australian Federal election, grouped by state

Moreover, the vote count continues, and will for a while. Many of the graphs I’m making update as the vote count continues; I’ve got time and date stamps on most of the output that is still in flux. The two-party preferred count will bounce around a bit once the AEC performs what is known as a “scrutiny for information” in 10 “non-classic” seats that aren’t ALP-Coalition two-party contests.

The vote count has settled down enough for another look at the performance of the marginal seat polls and the betting markets. At the seat-by-seat level, neither can claim a ton of glory.

I took a quick look at the marginal seat polling in a column I bashed out for Guardian Australia very late on Election Night; that needs to be updated and looked at a bit more carefully. I tend to think that the small samples used in much of that seat-specific polling will actually work to the pollsters’ advantage in this case: with so much sampling error in the estimates, the misses recorded by those polls will have to be extremely large for us to be able to confidently conclude they are biased, or to say that one pollster is more biased than another.
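The small-sample point can be made concrete with the usual standard error of a proportion, √(p(1 − p)/n). A minimal Python sketch, assuming simple random sampling (real seat polls involve weighting and design effects that would widen these figures) and hypothetical sample sizes:

```python
import math


def poll_se(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


def detectable_miss(p, n, z=1.96):
    """Smallest miss (proportion units) a single poll must show before it is
    distinguishable from pure sampling error at roughly the 95% level."""
    return z * poll_se(p, n)


for n in (400, 600, 1000):  # hypothetical seat-poll sample sizes
    print(n, round(100 * detectable_miss(0.5, n), 1), "points")
```

For a seat near 50-50, a single poll with n = 400 has to miss by about 4.9 points, n = 600 by about 4.0 points, and n = 1000 by about 3.1 points, before the miss alone stands out from sampling error.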

But we’ll see. It’s great when there is a data analysis ahead of you where you really don’t know how it will turn out.

pavement failure at YSSY

Monday September 9, 2013

Filed under: flight nerdery — jackman @ 10:17 pm

So this is a little odd:

[Screenshot: Sydney Airport ATIS, 9.45pm, 9 September 2013]

“Pavement failure” has closed 34L/16R at Sydney airport, the longest runway here (3962m, or 13,000 feet), typically used for heavy departures (and, in my flying experience out of YSSY, exclusively).

We were literally at the door of UAL870 (YSSY-KSFO) when they turned us back (hello, AirNZ Lounge, again).

There is a thumping breeze out of the west, so 25 seems to be the only active runway. I just saw a Virgin A340 get off 25 (2530m, or 8300 feet), which was a little spectacular from the vantage point of the AirNZ Lounge overlooking the west end of 25:

Screen Shot 2013-09-09 at 10.02.20 PM

Oh, and note that temperature reading in the ATIS, above. 33C. What the hell? Apparently there is a southerly change coming our way shortly, which will be most welcome.

No word on the prognosis… Can a 744 loaded with pax and fuel for 12-13hrs get off in 8300 feet with a thumping headwind?
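The headwind question comes down to resolving the reported wind into components along and across the runway. A minimal Python sketch, assuming runway 25's heading is about 250° and that wind direction and runway heading use the same reference; the post gives no wind figures, so any inputs are hypothetical.

```python
import math


def wind_components(wind_dir_deg, wind_speed_kt, runway_heading_deg):
    """Return (headwind, crosswind) in knots for a given runway heading.

    wind_dir_deg is the direction the wind blows FROM, as in ATIS reports.
    A negative headwind is a tailwind; a positive crosswind comes from the right.
    """
    angle = math.radians(wind_dir_deg - runway_heading_deg)
    headwind = wind_speed_kt * math.cos(angle)
    crosswind = wind_speed_kt * math.sin(angle)
    return headwind, crosswind


# Hypothetical: a 25 kt westerly (from 270) on a runway heading of about 250.
head, cross = wind_components(270, 25, 250)
print(f"headwind {head:.1f} kt, crosswind {cross:.1f} kt")
```

A strong headwind cuts the ground speed needed at rotation, which is why it matters so much on a shorter runway.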

Update: So this is cool. I’m sitting in the upper deck of this flight now; I got to visit the cockpit and asked the crew what’s up. 34L is repaired and we intend to go out on that runway. The UAL LAX flight is going out ahead of us on that runway right now.

The LAX- and SFO-bound 744s carry too much fuel for the shorter runways here at Sydney: over 300,000 lbs of fuel.

An A380 headed for Singapore did get off 25 just before (nice view of that from the cockpit). The UAL guys said that they can get off 7/25 in the 744s if they are only going to Melbourne.

final run of the 2013 poll averaging model: Labor 46.2% TPP

Friday September 6, 2013

Filed under: Australian Politics,statistics — jackman @ 2:18 pm

I’m looking to see if I have missed something, but I think that’s it. I’ve got yesterday’s releases, including the 54-46 results from Newspoll and Nielsen:

[Screenshot: final poll releases, including Newspoll and Nielsen]

Entering all this into the poll averaging model produces an ALP TPP estimate of 46.2%, +/-0.9. The 90-day trajectory:

This estimate is formed as a 67/33 combination of

  1. the model that is constrained to fit the 2010 election result exactly, which yields a set of house effects that are all positive (i.e., virtually all pollsters overestimated Labor’s 2010 TPP performance), and produces an estimate of 45.7% ALP TPP
  2. a model that is identified by assuming the house effects sum to zero (i.e., the polling industry is collectively unbiased); this model produces an estimate of 47.1%.
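As a quick check of the arithmetic: 0.67 × 45.7 + 0.33 × 47.1 = 46.162, which rounds to the reported 46.2%. A minimal Python version of that point-estimate combination (the function name is illustrative; how the ±0.9 interval was combined isn't described in the post, so it isn't reproduced):

```python
def combine(estimates, weights):
    """Weighted average of model estimates; weights are normalised to sum to one."""
    total = sum(weights)
    return sum(e * w for e, w in zip(estimates, weights)) / total


# Model 1: anchored to the 2010 result (45.7); model 2: house effects sum to zero (47.1).
print(round(combine([45.7, 47.1], [67, 33]), 1))  # 46.2
```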

I’ve left out the Lonergan mobile-only poll with n < 900; given that it was a mobile-only robopoll, I would have entered it into the model with its own unknown house effect. A pollster with an unknown house effect sees the impact of its estimate greatly diminished, since the uncertainty generated by not knowing its house effect (and having only one poll with which to estimate it) winds up dramatically downweighting the poll’s effect on the overall estimate. Since this was a poll with a reasonably small sample size, it would have had a very small impact on the results in any event. I'll go ahead and re-run the models now with it in, but it won't change much at all, I should think.
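The downweighting argument can be seen in a stripped-down setting. If polls are pooled by inverse-variance weighting, a poll whose house effect is unknown carries extra variance (the prior variance of that house effect) on top of its sampling variance, and so gets less weight. This Python sketch is a simplification assumed for illustration, not the post's poll-averaging model (which tracks house effects and a daily trajectory); the poll values, sample sizes and the 2-point house-effect standard deviation are all hypothetical.

```python
def sampling_var(p, n):
    """Sampling variance of a poll proportion (simple random sampling)."""
    return p * (1 - p) / n


def pooled(polls):
    """Inverse-variance weighted mean of (estimate, variance) pairs; also returns the weights."""
    precisions = [1.0 / v for _, v in polls]
    total = sum(precisions)
    mean = sum(est * w for (est, _), w in zip(polls, precisions)) / total
    return mean, [w / total for w in precisions]


HOUSE_SD = 0.02  # hypothetical prior SD of an unknown house effect (2 points)

known = (0.46, sampling_var(0.46, 1400))   # established pollster, house effect treated as known
new_if_known = (0.48, sampling_var(0.48, 880))
new_unknown = (0.48, sampling_var(0.48, 880) + HOUSE_SD ** 2)  # extra variance from unknown house effect

_, w_without = pooled([known, new_if_known])
_, w_with = pooled([known, new_unknown])
print(f"new poll's weight: {w_without[1]:.2f} if house effect known, {w_with[1]:.2f} if unknown")
```

With these made-up inputs the unknown house effect roughly halves the new poll's weight; a smaller sample or a larger house-effect prior shrinks it further.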

Morgan multi-mode at 46.5 is interesting. Morgan multi-mode was showing a house effect of ALP +2, but then its last poll comes in very, very close to the model consensus; netting out that +2 house effect implies something nearer 44.5, which drags the model consensus down a little (even with some discount terms in the model). There'll be time for "drop one pollster at a time" sensitivity analyses after the election, etc.

Screen Shot 2013-09-06 at 3.03.20 PM

Update (7.38am AEST): it turns out that Morgan’s final numbers differed slightly from those I had in the analysis: same TPP estimate, but a bigger stated sample size, and the field period closed Sept 6. Re-running…

Update (10.31am AEST): No change. Adding Lonergan and fixing the Morgan field period and sample size leaves the model estimate at 46.2% ALP TPP. See the graph above.

