"Count what is countable. Measure what is measurable. What is not measurable, make measurable." -- Galileo

Saturday, August 30, 2008

Bossies and a Political Divertimento

InfoWorld's Bossies (Best of Open Source Software Awards) came out recently. The format is a terribly kludgy slide show, but if you persevere, you'll eventually figure out who got which awards.

Elgg was named best social networking tool. MediaWiki took the top wiki spot. WordPress was given blog honors. To my surprise they had an object database category, but they passed over Zope and went with db4o. Under content management (hidden in the Enterprise Application supercategory), Alfresco came away with the prize.

There is no mention of runners-up, scoring, or methodology, but they do accept nominations from the public at large. Apparently they are already accepting nominations for next year. Their website mentions some of their criteria:

Our selection committee is looking for innovation, functionality, ease of use and implementation, and a proven track record in serving the needs of businesses. The deadline for submitting nominations for the 2009 Bossies is June 26 [2009], and the winners will be announced August 3.

And now for some political statistics...

In the wake of the Democratic National Convention here in the U.S., today I'd like to putz around with some polling statistics. Prof. Tanenbaum (Computer Science, Vrije Universiteit Amsterdam and author of MINIX) maintains a wonderful electoral vote website full of fascinating polling data and an always-insightful daily commentary.

I've been following with interest his Electoral College graphs, which compare the 2004 campaign with what is unfolding right now in 2008. Although at times McCain has pulled close (and even ahead prior to last June), one thing that struck me was how little "noise" there was in the 2008 graphs compared with the jittery 2004 ones.

Tanenbaum classifies each state's polling as strong, weak, or barely in favor of one candidate or the other, or else tied. In this case I ignored his "barely" and "tied" columns. Somewhat similar results occur if "barely" is added in, but beware: that means you are using statistically insignificant differences for scoring.
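For anyone who wants to try the same filtering, here is a rough sketch of how the "strong" and "weak" columns might be summed per candidate. The column names and the numbers below are made up for illustration; the real csv files from Tanenbaum's site may be laid out quite differently.

```python
import csv
import io

# Hypothetical layout and made-up electoral vote counts (each row sums to 538).
# The real files may use different column names.
SAMPLE_CSV = """date,strong_dem,weak_dem,barely_dem,tied,barely_gop,weak_gop,strong_gop
2008-08-01,100,50,30,20,40,60,238
2008-08-02,110,45,35,10,38,70,230
"""

def party_totals(row, include_barely=False):
    """Return (dem, gop) electoral votes, counting only strong and weak
    states unless include_barely is set. Tied states are never counted."""
    dem = int(row["strong_dem"]) + int(row["weak_dem"])
    gop = int(row["strong_gop"]) + int(row["weak_gop"])
    if include_barely:
        dem += int(row["barely_dem"])
        gop += int(row["barely_gop"])
    return dem, gop

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    print(row["date"], party_totals(row), party_totals(row, include_barely=True))
```

Keeping "barely" behind a flag makes it easy to run the comparison both ways.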

Armed with csv files of the relevant data from Tanenbaum's site, I did some curve fitting and saw that my eyes were not deceiving me. The best-fit polynomial curves for the 2004 data had R2 values of 0.44 (Democrats) and 0.49 (Republicans). The curves for the 2008 data had R2 values of 0.89 (Democrats) and 0.22 (Republicans).

R2 is the coefficient of determination, a measure of how well the fitted curve matches the observed data. A value near 1.0 means the curve's equation fits extremely well, while a value near zero means that the equation has little power to explain the data. In this case I used a 5th order polynomial model (6th order for GOP 2008). Adding in the "barely" polls ups the GOP 2008 R2 to 0.83, so it appears that much of the noise in the data is cancelled by including so-called battleground states.
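Here is a minimal sketch of this kind of fit in Python with NumPy, computing R2 as 1 minus the ratio of the residual sum of squares to the total sum of squares. It is not the tool I used for the numbers above, and the two series are synthetic stand-ins (one with little noise, one with a lot), not polling data. The helper names r_squared and poly_fit_r2 are just illustrative.

```python
import numpy as np

def r_squared(y, y_fit):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_fit = np.asarray(y_fit, dtype=float)
    ss_res = np.sum((y - y_fit) ** 2)      # variation left unexplained by the fit
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

def poly_fit_r2(x, y, degree=5):
    """Least-squares polynomial fit; returns the polynomial and its R2."""
    # Polynomial.fit rescales x internally, which helps keep
    # higher-order fits numerically well behaved.
    poly = np.polynomial.Polynomial.fit(x, y, degree)
    return poly, r_squared(y, poly(x))

# Synthetic stand-in data: same underlying trend, different noise levels.
rng = np.random.default_rng(0)
days = np.arange(120)
trend = 200 + 30 * np.sin(days / 40.0)
smooth = trend + rng.normal(0, 3, days.size)
jittery = trend + rng.normal(0, 30, days.size)

_, r2_smooth = poly_fit_r2(days, smooth)
_, r2_jittery = poly_fit_r2(days, jittery)
print(f"low-noise series  R2 = {r2_smooth:.2f}")
print(f"high-noise series R2 = {r2_jittery:.2f}")
```

The noisier series gets the lower R2 even though both follow the same trend, which is the effect I was eyeballing in the graphs.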

Conclusion--Obama's polling numbers are very stable with much less variability than McCain's. If I were McCain, I'd be worried. But then go back to Electoral-Vote.com next week--it may all have changed. These polynomials explain variability in the data, but they aren't predictive.


German Viscuso said...


Why was db4o a surprise to you in the Bossies? Zope is really great but maybe the fact that it only works with Python objects could be seen as a limitation.



Schlepp said...

Db4o wasn't the surprise... the fact that they actually had an object-oriented database category surprised me.