"Count what is countable. Measure what is measurable. What is not measurable, make measurable." -- Galileo

Thursday, November 29, 2007

Regression Coefficients

I shared the graph of Plone site development at Sandia with a coworker, and she asked what the numbers in the upper right-hand corner meant. I figured that might be a common question, so it is worth discussing here in today's posting.

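The R-squared is the coefficient of determination, which describes how much of the variability (the "wiggliness" among the data points) is explained by the linear equation. In this case, R-squared is over 98%, meaning that the straight line fits the data very closely. R-squared is a measure of fit, not a significance level, so it should not be compared against the 95% or 99% thresholds used in significance tests. About 98% of the variance in the data is explained by the linear trend; the remaining 2% or so is scatter that the line does not account for.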
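For readers who want to see where a number like that comes from, here is a minimal sketch in Python of an ordinary least-squares line and its R-squared, computed as 1 minus the ratio of leftover scatter to total scatter. The function names and the sample data are made up for illustration; they are not the Sandia numbers behind the graph.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys that the fitted line explains."""
    mean_y = sum(ys) / len(ys)
    # Scatter left over around the fitted line
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total scatter around the mean of the data
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data for illustration only: day number and cumulative site count
days = [0, 30, 60, 90, 120, 150]
sites = [1, 2, 2, 4, 4, 5]

slope, intercept = linear_fit(days, sites)
r2 = r_squared(days, sites, slope, intercept)
print(f"y = {slope:.4f} x + {intercept:.2f}, R-squared = {r2:.3f}")
```

An R-squared of exactly 1 would mean every point sits on the line; the closer the value is to 0, the less the line tells you about the data.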

The linear equation y = 0.0269 x - 1031.5 defines the best-fit straight line, which has a slope of 0.0269 sites/day (about 1 site every 37 days). There is a non-zero x-intercept (the point where the line crosses the x-axis, that is, where y is 0) because we started our data with the first Plone site on day zero. That means the fitted line reaches zero Plone sites somewhere to the left of the origin.
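The arithmetic behind those readings is easy to check. The sketch below uses only the two constants from the equation; the variable and function names are my own, and the x values are in whatever units the graph's x-axis uses.

```python
SLOPE = 0.0269       # sites per day, from the fitted equation
INTERCEPT = -1031.5  # y-intercept, from the fitted equation


def predict(x):
    """Predicted number of sites at x-axis value x."""
    return SLOPE * x + INTERCEPT


# Inverting the slope gives the average number of days per new site
days_per_site = 1 / SLOPE

# Setting y = 0 and solving for x gives the x-intercept
x_intercept = -INTERCEPT / SLOPE

print(f"About one new site every {days_per_site:.1f} days")
print(f"The line crosses y = 0 at x = {x_intercept:.1f}")
```

Inverting the slope is where the "1 site every 37 days" figure comes from, and solving for y = 0 shows where the fitted line would put zero sites.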

