"Count what is countable. Measure what is measureable. What is not measureable, make measureable." -- Galileo

Saturday, August 30, 2008

Bossies and a Political Divertimento

InfoWorld's Bossies (Best of Open Source Software Awards) came out recently. The format is a terribly kludgy slide show, but if you persevere, you'll eventually figure out who got which awards.

Elgg won the award for best social networking tool. MediaWiki took the wiki top spot. WordPress was given blog honors. To my surprise, they had an object database category, but they passed over Zope and went with db4o. Under content management (hidden in the Enterprise Application supercategory), Alfresco came away with the prize.

There is no mention of runners-up, scoring, or methodology, but they do accept nominations from the public at large. Apparently they are already accepting nominations for next year. Their website mentions some of their criteria:

Our selection committee is looking for innovation, functionality, ease of use and implementation, and a proven track record in serving the needs of businesses. The deadline for submitting nominations for the 2009 Bossies is June 26 [2009], and the winners will be announced August 3.

And now for some political statistics...

In the wake of the Democratic National Convention here in the U.S., today I'd like to putz around with some polling statistics. Prof. Tanenbaum (Computer Science, Vrije Universiteit Amsterdam and author of MINIX) maintains a wonderful electoral vote website full of fascinating polling data and an always-insightful daily commentary.

I've been following with interest his Electoral College graphs, which compare the 2004 campaign with what is unfolding right now in 2008. Although at times McCain has pulled close (and even ahead prior to last June), one thing that struck me was how little "noise" there was in the 2008 graphs compared with the jittery 2004 ones.

Tanenbaum classifies polling data as strongly, weakly, or barely in favor of one candidate or the other, or else tied. For this analysis I ignored his "barely" and "tied" columns. Somewhat similar results occur if "barely" is added in, but beware: that means you are using statistically insignificant differences for scoring.

Armed with CSV files of the relevant data from Tanenbaum's site, I did some curve fitting and saw that my eyes were not deceiving me. The best-fit polynomial curves for the 2004 data had R² values of 0.44 (Democrats) and 0.49 (Republicans). The 2008 data gave R² values of 0.89 (Democrats) and 0.22 (Republicans).

R² is the coefficient of determination, a measure of how well the fitted curve matches the observed data. A value near 1.0 means the curve's equation fits extremely well, while a value near zero means that the equation has little power to explain the data. In this case I used a 5th-order polynomial model (6th-order for GOP 2008). Adding in the "barely" polls ups the GOP R² to 0.83, so it appears that much of the noise in the data is cancelled by including so-called battleground states.
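For readers who want to try this kind of fit themselves, here is a minimal sketch in Python. It is not the tool used for the numbers above: the data are synthetic stand-ins rather than Tanenbaum's polls, the function names `fit_polynomial` and `r_squared` are hypothetical, and the solver is a plain least-squares fit via the normal equations. It only illustrates how R² falls as the same polynomial model is asked to explain noisier data.

```python
import random


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients, lowest power first."""
    n = degree + 1
    # Normal equations (A^T A) c = A^T y, where A is the Vandermonde matrix.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    aug = [row + [rhs] for row, rhs in zip(ata, aty)]

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]

    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aug[i][n] - tail) / aug[i][i]
    return coeffs


def r_squared(xs, ys, coeffs):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    fitted = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic "electoral vote" series (made up): one smooth, one jittery.
random.seed(0)
days = [i / 59 for i in range(60)]  # time rescaled to the interval 0..1
smooth = [250 + 40 * d - 30 * d ** 2 for d in days]
noisy = [v + random.gauss(0, 25) for v in smooth]

r2_smooth = r_squared(days, smooth, fit_polynomial(days, smooth, 5))
r2_noisy = r_squared(days, noisy, fit_polynomial(days, noisy, 5))
print("smooth series R^2:", round(r2_smooth, 3))
print("noisy series  R^2:", round(r2_noisy, 3))
```

Rescaling the time axis to 0..1 keeps the high powers of x from swamping the arithmetic. For real work, a library routine such as NumPy's `polyfit` would be a sturdier choice than this hand-rolled solver.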

Conclusion--Obama's polling numbers are very stable with much less variability than McCain's. If I were McCain, I'd be worried. But then go back to Electoral-Vote.com next week--it may all have changed. These polynomials explain variability in the data, but they aren't predictive.

Sunday, August 24, 2008

Plone and Teaching Database Design

In six weeks to the day I'll be hopping a flight to Washington National for seven days of Plone-ness. I've got my slot at Joel Burton's workshop for Mon-Tues, plan on attending the entire 2008 Conference on Wed-Fri, and will finish it off with the weekend sprints. I'll get a chance to hook up with Jason, our Python guru who left Sandia two years ago and is now ensconced in DC. Should be an excellent week.

Meanwhile, on to today's topic: teaching database design. My CMS-476 class at the College of Santa Fe begins on Wednesday and I have 12 students. Four seniors, three juniors, a single sophomore, and four freshmen round out the class, so I'll have a wide spectrum of experience and IT backgrounds.

Normally, one would consider a database design class to be strictly a design class: ER diagrams and that sort of thing. However, in the single 8-week term at my disposal, I feel I owe it to my students to give them a very, very solid database foundation and skills that will pay the rent.

What does that mean? Well, I start out with the wherefore and why of databases--an assignment for them to spend a day of "database fasting," trying to get through a day without directly using a database (for example, online banking, phonebook, e-mail contact lists, ATM, TV channel guide, and strictly speaking, a computer's file system). I point out the types of databases (hierarchical, relational, and object-oriented), with an example of each: a computer's file system, a MySQL database, and Zope.

While I spend a portion of every class going over SQL in detail, working from simple to complex statements, I also introduce them to database modeling. I start with the Formal Object-Role Modeling Language (FORML), now embedded in Visio, which gets them used to using tools to generate ER diagrams and schemas.

But nowadays I quickly move on to UML, and that means I have an opportunity to teach them how class and state diagrams can quickly generate archetypes in an object-oriented framework. We implement a custom Plone data type from the UML via Joel Burton's online ArchGenXML tool. By that evening's end, we've gone from conceptual model and requirements through to a fully integrated, web-enabled data system.

It definitely opens minds and I hope that eventually it opens doors for them. I encourage other database instructors to use Plone and Zope as a model of state-of-the-art object-oriented information management in their classrooms. Providing students with tools to not only design their databases but implement them in a web environment gives them a very valuable toolset, even in a relatively short course.

Tuesday, August 12, 2008

Visualizing CMS

I'm still working on quantifying this, but here are some screen captures of website visualizations from http://www.aharef.info/static/htmlgraph/. Fun stuff. The colors are coded as follows:

blue: for links (the A tag)
red: for tables (TABLE, TR and TD tags)
green: for the DIV tag
violet: for images (the IMG tag)
yellow: for forms (FORM, INPUT, TEXTAREA, SELECT and OPTION tags)
orange: for linebreaks and blockquotes (BR, P, and BLOCKQUOTE tags)
black: the HTML tag, the root node
gray: all other tags
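Since the goal is to quantify these pictures, one rough starting point is to count how many tags on a page fall into each color bucket. Below is a minimal Python sketch using the standard library's `html.parser`. The tag-to-color table is copied from the legend above; `ColorCounter` and `color_counts` are hypothetical names, the sample markup is made up, and none of this reflects how the htmlgraph applet itself is implemented.

```python
from collections import Counter
from html.parser import HTMLParser

# Tag-to-color table taken from the legend; anything not listed is gray.
TAG_COLORS = {
    "a": "blue",
    "table": "red", "tr": "red", "td": "red",
    "div": "green",
    "img": "violet",
    "form": "yellow", "input": "yellow", "textarea": "yellow",
    "select": "yellow", "option": "yellow",
    "br": "orange", "p": "orange", "blockquote": "orange",
    "html": "black",
}


class ColorCounter(HTMLParser):
    """Counts opening tags per color bucket (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        self.counts[TAG_COLORS.get(tag, "gray")] += 1


def color_counts(html_text):
    """Return a Counter mapping color name to number of tags of that color."""
    parser = ColorCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts


# Made-up sample page, just to exercise the counter.
sample = (
    '<html><body><div><a href="#">x</a>'
    '<p>hi<br></p><img src="a.png"></div></body></html>'
)
for color, count in sorted(color_counts(sample).items()):
    print(color, count)
```

Run against the saved HTML of each site, the resulting counts give a crude numeric profile, for example the ratio of red (table) to green (DIV) tags, to set beside the graphs.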

Finding the black root node in these small images can be tricky. Look for a starburst of gray nodes with no further daughter nodes and then go upstream.

From top to bottom they are:
  • Drupal.org
  • Plone.org
  • Wordpress.org
  • Joomla.org