Goodbye static graphs, hello shiny, ggvis, rmarkdown (vs JS solutions)

Monday August 18, 2014

Filed under: computing,R,statistics,type — jackman @ 6:00 am

One of the very exciting and promising developments from RStudio is the rmarkdown/shiny/ggvis combination of tools.

We’re on the verge of static graphs and presentations being as old-fashioned as overhead transparencies.

I’ve spent the last couple of days giving these tools a test spin. Lots of comments and links to examples appear below.

I came to this investigation with a specific question in mind: how can I get a good-looking scatterplot with some rollover/tooltip functionality into a presentation, with one tool or one workflow?

Soft constraints: I’d prefer to use R, at least on the data side, and I would also like customization over look and feel (e.g., slide transitions) and stylistic elements like type, color, sizes and spacing.

I use either Beamer or Keynote for presentations (Beamer for teaching/stats-type talks, Keynote for more substantive, general audience talks). I began by investigating how one might drop a d3-rendered graph into a Keynote presentation, but this seems pretty hard. Hacking at the files produced by Keynote’s export-to-HTML function seems formidable.

I’ve also been poking at solutions that are all on the JS side of the ledger (e.g., d3 + stack), inspired by this example from Karl Broman. I’m also interested in how one might roll an interactive graphic into Prezi.

But back to the RStudio workflow, using the rmarkdown/shiny/ggvis combination. Here is some sample output I’ve created: a standalone scatterplot and a dummy presentation.

Some observations:

If you’re happy with the out-of-the-box style defaults, then this stack of tools is just about there and evolving rapidly. And keep in mind that rmarkdown does a lot more than make presentations. For instance, I’ve yet to really explore rmarkdown for producing publish-to-web papers.
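A minimal sketch of the recipe, for anyone wanting to try it: an rmarkdown document whose YAML header declares a slide output format and the shiny runtime (the title below is a placeholder):

```yaml
---
title: "dummy presentation"
output: ioslides_presentation
runtime: shiny
---
```

Slides are then delimited by `##` headings, and R chunks containing shiny or ggvis calls render interactively when the document is served with `rmarkdown::run()`.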

If you crave fine control over layout and graphical elements, then I think it might still be a d3/js world, at least for a while longer.

I’m still left thinking that if I could drop shiny apps or d3 into Keynote (somehow), then I’d have the best of both worlds.


ideal point graphics, via d3

Wednesday July 30, 2014

Filed under: R,statistics — jackman @ 1:04 pm

I’ve updated some of the graphical displays of the ideal point estimates I serve up here. I’ve rendered some of these in d3, with some rollover lah-de-dah: (1) 113th House ideal points in a long “caterpillar” format; (2) scatterplot of ideal point against Obama 2012 vote in district. Screenshot of the scatterplot appears below.

My R scripts dump CSV files containing the ideal point estimates, credible intervals and labeling info, which I then pick up on the d3 side. Separate files dump fitted values from a local regression, fitting estimated ideal point as a function of Obama vote in district.

I toyed with the idea of loess on the d3/js side (with sliders for user control of bandwidth etc), more as a plausibility probe than anything, but it seems like a lot to push down through the browser.

[Screenshot: scatterplot of ideal points against Obama 2012 vote in district]


final run of the 2013 poll averaging model: Labor 46.2% TPP

Friday September 6, 2013

Filed under: Australian Politics,statistics — jackman @ 2:18 pm

I’m looking to see if I’ve missed something, but I think that’s it. I’ve got yesterday’s releases, including the 54-46 results from Newspoll and Nielsen:

[Screenshot: Newspoll and Nielsen results]

Entering all this into the poll averaging model produces an ALP TPP estimate of 46.2%, +/-0.9. The 90-day trajectory (click for larger view):

This estimate is formed as a 67/33 combination of

  1. the model constrained to fit the 2010 election result exactly, which produces a set of house effects that are all positive (i.e., virtually all pollsters overestimated Labor’s 2010 TPP performance); this model yields an estimate of 45.7% ALP TPP
  2. a model identified by assuming the house effects sum to zero (i.e., the polling industry is collectively unbiased); this model produces an estimate of 47.1%.
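As a quick arithmetic check of that 67/33 combination (a sketch in Python; the weights and the two model estimates are exactly as stated above):

```python
# Headline estimate = weighted combination of the two identified models.
w = 0.67
anchored = 45.7   # model anchored to the 2010 election result
sum_zero = 47.1   # model with house effects constrained to sum to zero
combined = w * anchored + (1 - w) * sum_zero
print(round(combined, 1))  # 46.2, the headline ALP TPP estimate
```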

I’ve left out the Lonergan mobile-only poll with n < 900; given that it was a mobile-only robopoll, I would have entered it into the model with its own unknown house effect. A pollster with an unknown house effect sees the impact of its estimates greatly diminished: the uncertainty generated by not knowing the house effect (and having only one poll with which to estimate it) winds up dramatically downweighting the poll’s effect on the overall estimate. Since this was a poll with a reasonably small sample size, it would have had a very small impact on the results in any event. I’ll go ahead and re-run the models now with it in, but it won’t change much at all, I should think.

Morgan multi-mode at 46.5 is interesting. Morgan multi-mode had been showing a house effect of ALP +2, but then the last poll comes in very close to the model consensus, dragging the model consensus down a little (even with some discounting terms in the model).

There’ll be time for “drop one pollster at a time” sensitivity analyses after the election, etc.
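The downweighting logic can be sketched as a precision-weighted average: each poll’s weight is the inverse of its total variance, i.e., sampling variance plus house-effect uncertainty. The numbers below are purely illustrative, not the model’s actual values (Python sketch):

```python
# A poll's weight in a precision-weighted average is 1 / total variance,
# where total variance = sampling variance + house-effect uncertainty.
def weight(n, p=0.46, house_sd=0.0):
    sampling_var = p * (1 - p) / n       # binomial sampling variance of a vote share
    return 1.0 / (sampling_var + house_sd ** 2)

# A pollster with a well-estimated house effect vs a one-off pollster
# whose house effect is essentially unknown (illustrative SDs).
w_known   = weight(n=1400, house_sd=0.005)
w_unknown = weight(n=900, house_sd=0.03)
print(w_unknown / w_known)  # well under 1: the one-off poll is heavily downweighted
```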

Update (7.38am AEST): it turns out that Morgan’s final numbers differed slightly from those I had in the analysis: same TPP estimate, just a bigger stated sample size, and a field period that closed Sept 6. Re-running…

Update (10.31am AEST): No change. Lonergan and fixed Morgan field period and sample size leave model estimate at 46.2% ALP TPP. See the graph above.


my 1st post for the Guardian Australia

Thursday May 30, 2013

Filed under: Australian Politics,R,statistics — jackman @ 1:04 pm

I’ll be contributing a piece about once a week for the Guardian Australia, under a part of the web site we’re calling The Swing.

The graphs from my 1st effort were rendered in-line and rather low-res.

Bigger, full res versions appear below; click on the in-line versions.



It would be great to find a way to quickly make nice, web-friendly graphs out of R. Vega looks like a reasonable wrapper to d3. Datawrapper.de just doesn’t give me enough control over annotations, axes etc… I’m also looking at Rickshaw. Life is short, beautiful graphics are hard, sometimes…


Bayesian thinking about experiments, pre-registration, etc…

Friday May 10, 2013

Filed under: statistics — jackman @ 4:25 pm

I just spent a couple of hours at the meeting of the West Coast Experiments group, a set of political scientists interested in using experiments.

One of the speakers was talking about the need for “credible” or “honest” p-values. “This will be good”, I thought to myself…

What the speaker was alluding to are the current moves afoot in political science to help stamp out “fishing” for statistically significant results, including pre-registering research plans. The problem is that after you’ve looked at a data set multiple times, the p-values aren’t telling you what you think they are. The deeper problem – as some Bayesians would point out – is that a p-value isn’t ever what you’d like it to be, even when you’re looking at the data for the 1st time…

From the Bayesian perspective, all this stuff is kind of ridiculously overblown, a consequence of an unthinking acceptance of \(p < .05\) as a model for scientific decision-making, point null hypothesis testing, the whole box and dice. That is worth a separate post one day.

For now, I’ll remark that pre-registration of research plans is a bit like eliciting very crude priors: i.e., enumerating things to be looked at in the analysis, because the effects aren’t thought to be zero; enumerating things that won’t be looked at, because prior beliefs over those effects are concentrated close to zero.

The best moment of Bayesian irony came when the speaker emphasized that the need for honest p-values is especially pressing in situations where the experiment is expensive or intrusive and therefore unlikely to be run very often. This was just awesome, when you think about what a p-value is supposed to measure.

More generally, it’s been very interesting to bring a Bayesian perspective to my teaching about experimental design and analysis, or to a meeting like the one I was at today.

To begin with, try this on: the role of randomization in Bayesian inference. As a formal matter, randomization plays no role in the Bayesian analysis of data from an experiment, or any other data for that matter. This sounds so odd to non-Bayesians at first, particularly people who are doing a lot of experiments. But recall that repeated sampling properties like unbiasedness just aren’t the 1st or 2nd or even 3rd thing you consider in the Bayesian approach.

So just what is the value of randomization to a Bayesian? Surely not zero, right? Don Rubin has written a little on this; Rubin’s point – that randomization limits the sensitivity of a Bayesian analysis to modeling assumptions – is a stronger conclusion than it first sounds, and one of the more helpful things I’ve come across on the topic. I also found this note by J. Ghosh a very concise and accessible summary of the issues, covering some of the Bayesian thinking on the matter (Savage, Kadane, Berry, etc.). But my sense is that there’s not a lot out there on this. There is actually more writing on the question in the sampling literature, which puts model-based inference up against design-based inference; that is essentially a parallel debate.

So, vast chunks of the (overwhelmingly classical/frequentist) literature on the analysis of experiments can seem very odd to a Bayesian: randomization inference, or permutation tests; re-randomization of assignment status if one detects imbalance. Virtually all Bayesians take the Likelihood Principle seriously, but so much of the work on experiments seems to violate it. It is also pretty obvious that experimenters are carrying around prior information and using it: balance checks would seem to be guided by prior expectations as to likely confounders, no? Just as post-stratification weighting for non-response in a survey setting seems to be guided by an (implicit, and rather simplistic) model of response/non-response.

There is a lot to work through. Above all, it is important to keep in mind what is relevant for the applied scientist, what is more esoteric, and where Bayesian ideas can be of real practical use (e.g., Andy Gelman et al on hierarchical models for multiple comparison problems, or in the analysis of blocked or clustered designs, etc).

For now, I’m blessed to have colleagues like Persi Diaconis, Guido Imbens and Doug Rivers, who indulge (or encourage) my thinking out loud on these matters.


Rachel Maddow reflecting on Stanford: “I took a ton of statistics classes…”

Monday April 1, 2013

Filed under: politics,statistics — jackman @ 12:18 pm

“I needed to learn how to be persuasive. I needed to learn how to win arguments. And so I did two things.”

“I took a ton of statistics classes…”

“And I enrolled in the Ethics in Society Honors program.”

20.30 in this YouTube video.


NSF PoliSci funding…

Wednesday March 20, 2013

Filed under: ANES,politics,statistics — jackman @ 12:17 pm

The Senate just adopted Coburn’s (amended) amendment:

To prohibit the use of funds to carry out the functions of the Political Science Program in the Division of Social and Economic Sciences of the Directorate for Social, Behavioral, and Economic Sciences of the National Science Foundation, except for research projects that the Director of the National Science Foundation certifies as promoting national security or the economic interests of the United States.

Barbara Mikulski accepted the terms of the amendment, let it go through on the voices.

Nicely wedged, nicely played.

A great day for IR. A bad day for the study of American politics, political methodology…etc.

The “national security” or “economic interests” tests will be an interesting thing to see play out.

Or maybe we’re all sociologists now?


Fiscal Cliff House Vote splits Republicans

Wednesday January 2, 2013

Filed under: politics,statistics — jackman @ 7:06 am

Here are two quick looks at the roll call last night from the House of Representatives, concurring with the Senate amendments to HR8. This was the so-called Fiscal Cliff vote, roll call 659 of the 2nd session of the 112th U.S. House of Representatives.

I’ve plotted the Ayes and Noes against each representative’s “ideal point”, a summary measure of each representative’s voting history based on previous roll calls in the 112th House, usually interpreted as an estimate of each legislator’s “left-right” or liberal/conservative ideological position. Democrats are shown in blue (and cluster to the left of the graph); Republicans are in red, on the right.

Notably, Speaker Boehner voted Aye, just the 9th time he has recorded a vote in the 112th Congress, which is not unusual for Speakers. With such a short, largely one-sided voting history, the algorithm puts Boehner out on the right-hand tail of the ideal point distribution. Majority Leader Cantor voted Nay.

The 1st graph shows a probit curve overlaid on the points, an estimate of the probability that a legislator occupying a particular point on the left-right continuum votes for the measure. Since all but 16 Democrats voted for the measure, the curve falls as we move from left to right across the page.

The 2nd graph fits probit curves separately for each party. The interesting action is among the Republicans, with the vote cleaving Republicans 151 – 85, and largely along ideological grounds. The ideal point estimate predicts Republican voting on this measure reasonably well (AUC = .816, Brier = .166).
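For readers unfamiliar with those two fit statistics: AUC is the probability that a randomly chosen Aye has a higher predicted probability than a randomly chosen Nay, and the Brier score is the mean squared error of the predicted probabilities. A self-contained Python sketch on made-up probabilities (not the actual roll-call data):

```python
# AUC: fraction of (Aye, Nay) pairs ranked correctly (ties count half).
def auc(p, y):
    pos = [pi for pi, yi in zip(p, y) if yi == 1]
    neg = [pi for pi, yi in zip(p, y) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Brier score: mean squared error of the predicted probabilities.
def brier(p, y):
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(p)

p = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]  # toy predicted probabilities of an Aye
y = [1, 1, 0, 1, 0, 0]              # toy observed votes (1 = Aye)
print(auc(p, y), brier(p, y))
```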

In addition to being a rare instance of the Speaker casting a roll call vote, it is also an even rarer case of the Speaker voting against the majority of his party, including the majority leader. It might be a nice exercise to see when this last happened. We’ll see if this costs Boehner his job as Speaker; I suspect not.




How Obama won: the swing state swings

Sunday November 11, 2012

Filed under: politics,statistics — jackman @ 1:06 pm

Obama won the 2012 election by keeping the swing small where it needed to be kept small. Just two states changed hands: Indiana and North Carolina, both won narrowly by Obama in 2008.

But consider this. Florida, Ohio and Virginia were all won by Obama in 2008 on margins smaller than the 53.4-46.6 national margin. Obama kept them all in 2012, with swings smaller than the national swing. Details…

Nationally, the two-party swing against Obama is currently -2.11 percentage points: that is, Obama beat McCain in 2008 53.4-46.6; at this stage of the 2012 count he is winning 51.3-48.7.

Let’s look at some key swing states.

And on it goes. Without more of these easier states coming over the line, getting to 270 was a very difficult proposition for Romney. The Romney campaign would have needed some large swings in other battlegrounds to get close.

And Pennsylvania? Forget it. With 20 Electoral College votes you can understand why it is an enticing target. But Pennsylvania was higher up the tree than Iowa, New Hampshire or Colorado, breaking 55.2-44.8 in 2008. Romney won a 2.6 point swing in Pennsylvania (slightly higher than the nationwide swing), but still only half of what he would have needed to win the state.

Indiana was the other Romney pickup. Obama’s 50.5-49.5 victory there in 2008 was a shocker, coming on the back of a 10.9 point swing: to understand how big that was, recall that the national two-party swing to Obama was 4.8 points in 2008. Indiana was never on anyone’s battleground list in 2012. Expectations were that Indiana would “revert to type”. Absent any serious attempt from the Obama camp to defend the narrow and unexpected 2008 win, Indiana swung 5.8 points in 2012 (more than twice the national swing), with Romney winning 55.3-44.7.

A similar logic applies to Missouri, which Obama almost won in 2008, falling short by 0.06 of a percentage point, outpolled by McCain by just 3,600 votes. Missouri “reverted to type” in 2012 too, with a 4.8 point swing. In fact, Obama’s 2012 showing is worse than Kerry’s in 2004 (45.1% of the two-party vote, versus Kerry’s 46.3%).

Two graphs summarize all this; clicking on each will open a larger version in a new browser window.

The first is a dot plot showing the swings in order. The color shows the disposition of the state in 2008. Two vertical bars show the national swing and no swing. Open plotting symbols for swing states show the swing they would have required to change hands: for North Carolina, the open plotting symbol lies to the right of the actual swing (the swing exceeded Obama’s 2008 margin). Again, note how the swings against Obama come up short in the other battleground states.

Second, a plot of 2012 swing against 2008 result. Shaded regions denote politically distinct regions: (1) Obama retains, where the 2012 swing isn’t enough for the state to fall to Romney, with 27 states in this category; (2) Obama losses, where the swing against Obama is enough for the state to fall to Romney (Indiana and North Carolina); (3) Obama gains (a null set in 2012); and (4) Republican retains, with 22 states in this category. The District of Columbia is excluded from this graph. The size of the plotting symbol is proportional to the state’s Electoral College votes; color indicates 2012 disposition; the national result is shown with a dark point labelled “USA”.

Obama wins the election on the back of that set of states that sit just to the right of the 50-50 threshold on the horizontal axis, but that didn’t drift down on the vertical “swing” dimension into the “change hands” cone.


the last three elections

Saturday November 10, 2012

Filed under: politics,statistics — jackman @ 1:59 pm

I tweeted earlier that the correlation between 2012 and 2008 state-level, Democratic, two-party vote shares is .982. Take out (the outlier) DC and this goes down to .976.
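The figure quoted is the usual Pearson correlation coefficient; a minimal Python sketch of the computation (the toy series below are illustrative, not the actual state vote shares):

```python
# Pearson correlation between two equal-length numeric series.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 for perfectly linear toy data
```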

Here’s another look at the data, comparing 04, 08 and 12. More on this later (as well as overlaying the national two-party vote shares), but we have stories that go along with Alaska, Hawaii, New Jersey (!), Utah, over these three elections.

What is striking is how the swing stayed small where Obama and Co. needed it to stay small. Big swings against them in Indiana and the South didn’t matter.
