Friday November 30, 2007
Information Systems Colloquium Announcement
Title: Statistical Analysis of Online News
Speaker: Laurent El Ghaoui, Electrical Engineering and Computer Science Department, University of California, Berkeley
4:15 – 5:15 pm, Packard 101
Thursday, Nov 29, 2007
(Refreshments after the talk)
Each day we are inundated with an avalanche of online news. Yet it is currently hard to obtain a global view of this information. What are the images that various news media project about specific topics, such as global warming, human rights or presidential candidates? How do these images evolve over time? How do they differ across different media sources, scientific or mainstream? What are the dynamics of news events across news networks?
Modern statistical learning and optimization methods are having a great impact in fields where large amounts of data have become recently available, such as biology or finance. With no doubt, such methods can help shed light on the issues above as well, to the benefit of the social scientist or the ordinary citizen. In turn, online news analysis pushes the boundaries of statistics and optimization towards databases, networks, visualization, and calls for a renewed interaction between computer engineering and social sciences.
I will describe a project which aims at providing user-friendly tools for analyzing large amounts of text data residing in online databases, with a focus on online news data and voting records. I will discuss in particular how online learning and sparsity-inducing methods arise as key ingredients, and I will delineate some related fundamental challenges.
Wednesday November 28, 2007
From the Math/CS Library at Stanford. We’re likely to drop our subs of these two Taylor and Francis journals. The prices are staggering. Some of this is surely due to the fall of the USD, but still, it is a staggering sum. Online only is hardly any cheaper, so it’s not printing and postage that is the big cost driver:
Journal of Applied Statistics
$2,321 online only (v. $2,444 print + online, 5% difference) for 2008
Communications in Statistics package
$8,678 online only (v. $9,136 print + online, 5% difference) for 2008
More information here.
Nielsen have talked up the rogue poll angle on their Election Week polls. The “rogue poll” story doesn’t hold water here in two respects.
First, the differences from the “truth” are so big that they are extremely unlikely to have been caused by sampling error: I’ve blogged on this at The Bullring. That is, it is quite plausible that something other than sampling error is going on here.
Second, Nielsen had not just one rogue poll, but two: one on-line poll that said 57-43, the other on the phone that said 56-44 (after you distribute preferences according to how they flowed in 2004). If we treat the two rogue polls as independent events (and they are not, since they are being done by the same organization), then the probability of TWO unbiased polls with their sample sizes both being that far away from 53.3 is about 1 in 100,000. Bias-generating in-house procedures at Nielsen (e.g., sampling, weighting, question-wording) are the more likely culprit. It is mysterious, since Nielsen did well in 2004.
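As a rough check on that order of magnitude, here is a minimal sketch of the calculation using the normal approximation to the binomial. The sample sizes are placeholders (the post does not state them), the function names are mine, and the tail is one-sided (both polls overshooting Labor's share), so the printed number is illustrative only and need not match the figure quoted above.

```python
import math

def upper_tail(estimate, truth, n):
    """P(an unbiased poll of size n reads >= estimate), normal approximation.

    estimate and truth are two-party-preferred shares in percent.
    """
    p = truth / 100.0
    se = math.sqrt(p * (1.0 - p) / n)  # standard error under simple random sampling
    z = (estimate / 100.0 - p) / se
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def joint_tail(polls, truth):
    """Probability that every poll overshoots by at least its observed amount,
    treating the polls as independent."""
    prob = 1.0
    for estimate, n in polls:
        prob *= upper_tail(estimate, truth, n)
    return prob

# Poll readings (57, 56) and the 53.3 benchmark are from the post.
# The sample sizes are ASSUMED (hypothetical), not Nielsen's actual ones.
polls = [(57.0, 1400), (56.0, 2000)]
print(joint_tail(polls, truth=53.3))
```

Because the two polls come from the same organization, the independence assumption understates the joint probability, which is the point made above.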
Tuesday November 27, 2007
Photos from Election Day, the AEC’s National Tally Room and selected polling places in Eden-Monaro.
Monday November 26, 2007
Mackerras’ round-up of predictions:
I also tipped Howard would lose Bennelong, in the Oz, on the Thursday morning ahead of the election. McKew is ahead in the count there, but with quite a few absentee ballots to come. The long line of 52-48 polls out of Bennelong look pretty good. The bookies may well lose out big time on that one, by far the seat with the most punter interest, and with McKew flirting with even-money at one stage, but usually well above $2.00.
Saturday November 24, 2007
Didn’t change anything from my overnight runs, since it was so close to trend.
Thumbnails: level and trend, election period and entire 2004-07 period, with 95% bounds (dotted lines).
The national market backed away from Labor sharply in the 18 hours from noon yesterday to 6am this morning, but still had them firm favorites, across a number of cuts at the election outcome:
Bennelong finished up with some movement back to the PM as well.
Most of the movement in the last 18 hours was towards the Coalition, with Ryan being a high-profile exception (putting on 4 percentage points towards Labor). See the daily fluctuations and weekly fluctuations. Labor is odds-on in all of its own seats, and in 21 seats currently held by the Coalition; see here. Petrie and Corangamite just got over the line, to be barely odds-on for Labor wins.
Portlandbet’s seat-by-seat market ended with 84 and 85 Labor seats on $9.50 each; “less than 74 seats” ended at $3.65; “more than 100 seats” closed at $8.50. The implied probability that Labor wins 76 or more seats in this market wound up at .79, more confident of a Labor win than the national market.
Portlandbet’s vote share market wound up with ALP 2PP 54-54.99% being the most favoured outcome, at $2.85.
Sportingbet came in with 53-53.99% the favourite, at $2.50, with 54-55.99 (a big band) at $3.00; Sportingbet’s markets are open until noon Eastern, Election Day. Indeed, those markets are still moving a little, with the national prices currently showing 1.31/3.45.
Past midnight here, so Happy Election Day.
Many jitters here in Canberra and elsewhere in reaction to the conflicting polls. I blogged on this over at The Bullring. See this comment thread and Peter “Mumble” Brent and I on the ABC’s PM show this afternoon.
Could this be Morgan’s election, running right down the middle with 54.5 2PP for Labor?
Interesting comments to my Unleashed story. Lots of eyeballs over there. Thanks again, Bruce B.
I’m updating my 2PP projections with that last batch of polls laid in. The algorithm (assuming no house effects, an assumption that is almost surely wrong given the dispersed poll results of Thursday) is running now; I will push graphs over here. It will come out at something like 54-46, with the Nielsen and Galaxy/Newspoll estimates split down the middle, right where Morgan is. I am writing a post-election poll assessment piece for the Bullring, a short version of which might make its way into the mag.
I will also hit the betting sites one more time in the early morning hours, before the polls open, for a last capture of data. The Coalition firmed markedly in the afternoon hours as the Newspoll situation started to break. The Coalition was at 4.50 on Centrebet at noon Thursday; 13 hours later it is at 3.50, the probability of a Labor win easing from .788 to .727. That is some of the most rapid movement we’ve seen all year.
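For readers wondering how decimal prices turn into a probability, here is a minimal sketch of one common convention: invert each price and rescale so the two sides sum to 1, which strips out the bookmaker's margin. The Coalition prices are the ones quoted above; the Labor prices are assumed for illustration (the post does not give them), and the function name is mine.

```python
def implied_probability(labor_price, coalition_price):
    """Convert decimal odds on a two-way market into P(Labor win).

    The inverse prices sum to more than 1 because of the bookmaker's
    margin, so they are rescaled proportionally to sum to 1.
    """
    labor_raw = 1.0 / labor_price
    coalition_raw = 1.0 / coalition_price
    return labor_raw / (labor_raw + coalition_raw)

# Coalition prices (4.50, then 3.50) are from the post.
# The Labor prices are ASSUMED (hypothetical) for illustration.
noon = implied_probability(labor_price=1.21, coalition_price=4.50)
later = implied_probability(labor_price=1.31, coalition_price=3.50)
print(round(noon, 3), round(later, 3))
```

Other ways of removing the margin exist; proportional rescaling is simply the easiest to state.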
Friday November 23, 2007
The last change of government in this country was in 1996. You remember that one, right? The Howard-led Coalition picked up 32 seats from Labor. And it was all one-way traffic, with Labor failing to pick up Coalition seats. See this graph, and its cousins below.
Look for something from me at the Bullring on the conflicting polling between Galaxy and Nielsen.
And look for a seat count projection/simulation from me at the ABC Unleashed site. The piece to appear there is formula-free, but has a little bit of modeling and simulation underlying the graphs etc.
I’ve updated my poll trackers with the Nielsen and Galaxy polls. The poll results are over-dispersed (see the Bullring piece on 57-43 and 56-44, versus 52-48), but the trend line runs right between them, so they aren’t having a big impact on the estimated level or trend. If anything, the Nielsen polls (one phone, one on-line) win the battle of the weighted averages, as it were, with their bigger sample sizes, and the trend is even closer to a flat line than it was. Graphs here.
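To illustrate the "battle of the weighted averages" only, here is a minimal sketch of pooling polls by sample size, which is roughly the same as weighting by inverse sampling variance when the polls sit at similar levels. This is not the tracker itself, which fits a trend over time; the sample sizes are assumed, and the function name is mine.

```python
def pooled_estimate(polls):
    """Sample-size-weighted average of poll estimates (in percent).

    Each poll is a (labor_2pp, sample_size) pair; bigger samples pull
    the pooled figure further towards their own reading.
    """
    total_n = sum(n for _, n in polls)
    return sum(estimate * n for estimate, n in polls) / total_n

# Readings (57, 56, 52) are from the post; the sample sizes are ASSUMED
# (hypothetical), chosen so the two Nielsen polls are the larger ones.
polls = [(57.0, 2000), (56.0, 1400), (52.0, 1000)]
simple_mean = sum(estimate for estimate, _ in polls) / len(polls)
print(pooled_estimate(polls), simple_mean)
```

With the larger samples on the high readings, the pooled figure sits above the simple mean, which is the sense in which the bigger polls "win".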
Thanks for the comments today (um, yesterday); I haven’t had time to respond to them all, I was on phone-mail only today, so I could see the comments arriving, just that I wasn’t in a position to give a considered response.