"Count what is countable. Measure what is measureable. What is not measureable, make measureable." -- Galileo

Saturday, November 24, 2007

The Weighting Game

Spent the morning doing laundry and looking back at the recent InfoWorld CMS ratings. These ratings illustrate a couple of dangers and a couple of best practices.

My first points have nothing to do with InfoWorld itself, but rather with what people do with review data.
  • On the down side, the ratings were picked up by Matt Asay, who wrote: "The winner? Alfresco, and by a significant margin (over Plone, Drupal, DotNetNuke)." "Significance" is not a term to be tossed around lightly when dealing with statistics. It is usually accompanied by a significance level (often 5%, or 1% for "highly significant") and denotes a strict statistical procedure for distinguishing hypotheses. The InfoWorld data is not designed to support significance sensu stricto, and one is left to imagine what the difference between 8.6 and 9.2 means. (Even then, it's actually 9.15, but they rounded up.)
  • Also, Matt forgot to mention that Joomla was scored in the survey and came in third.
  • On the positive side, Matt provides full disclosure: he's VP of Business Development for the Americas at Alfresco.
However, InfoWorld doesn't get a free statistical ride today.

  • On the positive side, they provide a link to a methodology page and make an effort to justify their results. This is all too rare and should be emulated.
  • On the negative side, they don't explain their selection of categories, even though they state that some combination of the listed categories will be used. The absence of Availability is understandable (these are all open-source), but Performance, Reliability, Setup, and Support should have been addressed.
  • Also, their methodology does not describe what is considered under the Feature category. Many would say that Interoperability and Setup are features. Curiously, only Alfresco warranted a 10 for Features and I seriously doubt that anyone, especially the developers at Alfresco, would claim that their feature set is 100% complete.
  • They also never explain how they arrived at an 86-80-70-60-50 grading curve when one expects 90-80-70-etc. On top of that they use rating names to bin results, thus disguising the numerical results (still my favorite complaint against school grading of A-B-C-D-F).
  • Finally, InfoWorld never explains the rationale behind their weighting of categories (25-25-15-15-10-10). If one doesn't weight scores (or uses a uniform 16.7% across 6 categories), Alfresco scores 9.0 and Plone comes in at 8.7, which puts them both in the InfoWorld "Excellent" bin. The weighting clearly doubles the Alfresco "gap," making it appear a clear leader, and moves Plone just barely into the "Very Good" rating.
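The scoring machinery at issue can be sketched in a few lines of Python. The 25-25-15-15-10-10 weights and the 86-80-70-60-50 curve are the ones discussed above; the per-category scores, however, are hypothetical placeholders, since InfoWorld's actual category breakdown isn't reproduced here.

```python
# Sketch of the kind of scoring InfoWorld appears to use: a weighted
# average of six category scores, then a rating label assigned by bin.
# The category scores below are HYPOTHETICAL placeholders, not the
# actual InfoWorld numbers.

def weighted_score(scores, weights):
    """Weighted average of per-category scores (weights must sum to 1)."""
    assert abs(sum(weights) - 1.0) < 1e-6
    return sum(s * w for s, w in zip(scores, weights))

def rating_bin(score, curve=((8.6, "Excellent"), (8.0, "Very Good"),
                             (7.0, "Good"), (6.0, "Fair"), (5.0, "Poor"))):
    """Map a 0-10 score onto an 86-80-70-60-50 grading curve."""
    for cutoff, label in curve:
        if score >= cutoff:
            return label
    return "Unacceptable"

infoworld_weights = (0.25, 0.25, 0.15, 0.15, 0.10, 0.10)
uniform_weights = (1 / 6,) * 6

hypothetical = (10, 9, 9, 9, 9, 8)  # placeholder category scores

w = weighted_score(hypothetical, infoworld_weights)
u = weighted_score(hypothetical, uniform_weights)
print(round(w, 2), rating_bin(w))  # the InfoWorld weighting
print(round(u, 2), rating_bin(u))  # the uniform weighting
```

Note how the binning step throws away information: once two products land in the same bin, the reader can no longer see how close the underlying numbers were.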
Armed with this critique we can play all sorts of statistical games. (Remember that Mark Twain said there are lies, damned lies, and statistics.)

  • Using equal weights, applying a 90-80-70 scale, and giving Alfresco a more realistic 9.5 for Features, we find that it rates only "Very Good."
  • Flipping the weights over (10-10-15-15-25-25) puts Alfresco and Plone in a dead heat (8.85 vs 8.75).
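To see how much the weight ordering alone can matter, here is a toy sketch with made-up score vectors (emphatically not InfoWorld's actual category scores): two products with mirror-image strengths swap places entirely when the front-loaded weights are reversed.

```python
# Toy demonstration with HYPOTHETICAL score vectors: the winner can
# depend entirely on which categories receive the heavy weights.

def weighted(scores, weights):
    """Weighted average of per-category scores."""
    return sum(s * w for s, w in zip(scores, weights))

product_a = (10, 9.5, 9, 9, 8.5, 8.5)  # strong in the first categories
product_b = (8.5, 8.5, 9, 9, 9.5, 10)  # strong in the last categories

heavy_front = (0.25, 0.25, 0.15, 0.15, 0.10, 0.10)
heavy_back = tuple(reversed(heavy_front))

# With front-loaded weights, A leads; flip the weights and B leads
# by exactly the same margin.
print(weighted(product_a, heavy_front) - weighted(product_b, heavy_front))
print(weighted(product_a, heavy_back) - weighted(product_b, heavy_back))
```

Until a reviewer explains why one weight vector is the right one, a gap produced by weighting says as much about the reviewer's priorities as about the products.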
Another trick is to play with graphs. Here's the InfoWorld data done as a default Excel bar chart. Alfresco looks far out in front.

But here's a more accurate bar chart with a properly scaled y-axis. What is significant now?
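The distortion is easy to quantify. A bar chart encodes value as bar length, so starting the y-axis at, say, 8.0 (an assumed baseline, for illustration) instead of 0 inflates the apparent gap between the 9.2 and 8.6 scores:

```python
# How a truncated y-axis exaggerates differences: with bars drawn from
# a baseline of 8.0 instead of 0, the visual height ratio between a
# 9.2 and an 8.6 score balloons from about 1.07x to 2x.

def visual_ratio(a, b, baseline=0.0):
    """Ratio of apparent bar heights when the axis starts at `baseline`."""
    return (a - baseline) / (b - baseline)

print(round(visual_ratio(9.2, 8.6), 2))                 # honest axis: 1.07
print(round(visual_ratio(9.2, 8.6, baseline=8.0), 2))   # truncated axis: 2.0
```

A 7% difference in score becomes a bar that looks twice as tall, which is exactly the "far out in front" impression the default chart gives.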

I'd like to wrap up this long posting with a tip o' the hat to the comments by Amy and Bryan at the bottom of the CMS Report posting on the InfoWorld article. Amy from OpenSource.org correctly raises the point that some CMS reviews seem to take a perverse joy in pitting one open-source CMS against another. Bryan rejoins that reviews are both popular and useful as long as everyone stays well behaved.

Here on PloneMetrics, I am an unabashed Plonista. But I am also trying to look at the world with a little more rigor. Tomorrow I'll post the latest on my work to fill in the matrix à la Bullard's method. Stay tuna'd.
