"Count what is countable. Measure what is measureable. What is not measureable, make measureable." -- Galileo

Sunday, January 12, 2014

Trends and Google Trends

I thought I'd take a shot at keeping my New Solar Orbit Resolution and get out the January edition today.  With this item I'm one twelfth of the way there.  I thought I'd discuss a curve I see when I look at the term "Plone" through the lens of Google Trends.
It's positively (right) skewed, meaning that the mode (the high point) lies well to the left of the mean.  One might take this to be an ominous sign of a software product sliding gently towards oblivion.  However, this is not the case.  For example, the Plone curve has almost exactly the same shape as that of another term.  Go ahead, take a guess before you read the next paragraph.

The above is the Google Trend curve for Apache2.  No one is going to claim that Apache is sliding into oblivion.

What's interesting is that Google Trends now makes discovery of these correlations very simple.  The GT correlate page lets you enter a term, and the system provides an ordered list of similar search terms.  Typically they have correlation coefficients above 0.95, often close to 0.99.  Enter "Plone" and this is one of the results:
This gives us a window into a number of other comparisons.  Here are Liferay and Drupal.

Have they peaked, and are they now entering their golden years as faded has-been CMSs?

Here's WordPress compared with Google Analytics.  Even with the huge spike in 2006 for the rollout of GA, the correlation coefficient is on the order of 0.98, something statisticians would give their eye teeth to see in experimental results.
If one squints, it's possible to see that WP and GA have plateaued and may even have begun to decline in their Google Trends scores.  Quick, sell your WordPress stock!  ;-)
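
For readers who want to see what a correlation coefficient of that size means in practice, here is a minimal sketch of the Pearson calculation.  I don't know exactly how Google computes its figure, so treat this as the textbook formula rather than GT's method.  The two series are invented numbers shaped like the curves above, not real Google Trends data, and the names `term_a` and `term_b` are placeholders.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances (unnormalized; the 1/n factors cancel out)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical interest scores on a 0-100 scale, invented for illustration:
# a quick rise to a peak followed by a long, slow decline.
term_a = [20, 45, 80, 100, 90, 75, 62, 50, 41, 35, 30, 27]
term_b = [18, 40, 85, 97, 88, 70, 60, 52, 40, 33, 31, 25]

r = pearson(term_a, term_b)
print(f"r = {r:.3f}")
```

Two curves that rise and fall together score close to 1 even when their individual points differ by a few units, which is why a shared overall shape is enough to produce the high values GT reports.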

Back to our GT graphs, here's the plot for Unix:

The initial upswing and peak almost certainly took place years, if not decades, ago and are simply truncated by the lack of Google history.  Is this graph showing us Unix dying with a whimper instead of a bang?

What is going on here?  All these Google Trends curves show a positively skewed distribution.  Several probability distributions exhibit these characteristics.  Candidates include the gamma distribution (or perhaps its special case, the Erlang distribution) and the log-normal distribution.  Log-normal distributions are maximum entropy probability distributions (for a fixed mean and variance of the logarithm), and that hints at what the underlying phenomenon is and why Unix, Apache, and Plone aren't going extinct anytime soon.
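
To make "positively skewed" concrete, here is a small sketch using the log-normal candidate.  The formulas for the mode, median, and mean are the standard closed forms for a log-normal distribution; the parameter values `mu` and `sigma` are arbitrary picks of mine, not values fitted to any Google Trends curve.

```python
import math
import random
import statistics

def lognormal_summary(mu, sigma):
    """Closed-form mode, median and mean of a log-normal distribution."""
    return {
        "mode": math.exp(mu - sigma ** 2),
        "median": math.exp(mu),
        "mean": math.exp(mu + sigma ** 2 / 2),
    }

def sample_skewness(values):
    """Moment-based sample skewness; positive means a long right tail."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

# Arbitrary illustrative parameters (assumed, not fitted to any data)
mu, sigma = 0.0, 0.75
summary = lognormal_summary(mu, sigma)

# Check the closed forms against random draws, seeded for repeatability
rng = random.Random(42)
draws = [rng.lognormvariate(mu, sigma) for _ in range(50_000)]
skew = sample_skewness(draws)

for name, value in summary.items():
    print(f"{name:>6}: {value:.3f}")
print(f"sample mean: {statistics.fmean(draws):.3f}, sample skewness: {skew:.2f}")
```

The ordering mode < median < mean holds for any positive `sigma`, which is the "high point far to the left of the mean" shape seen in the graphs.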

Entropy, when applied to Web phenomena, is a measure of "buzz."  But here we are looking at the use of search terms, not website visits or software downloads or installations.  Once someone has found plone.org, there's rarely a need to search for it again.  Over time, everyone who is interested in a topic discovers that topic's key online resources.  Thereafter, one rarely needs to repeat a search.  The result is steadily diminishing search volume, which doesn't mean that interest is waning.  Rather, it means that the interested population is being fully reached.  The steady but low-volume long tail of the GT graphs probably represents the entry rate of newcomers to a particular community as they search and discover the key resources for that domain.  Everyone else bookmarked plone.org long ago.
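
The argument above can be turned into a toy model.  This is only a sketch of my hypothesis, not anything derived from Google's data: I assume a pool of interested people who have not yet found the site, a fixed fraction of whom search and find it each week, plus a steady trickle of newcomers.  The function name and all the numbers are made up for illustration, and the model covers only the decline and the tail, not the initial upswing.

```python
def weekly_searches(initial_pool, discovery_rate, newcomers_per_week, weeks):
    """Toy model: people search until they find the resource, then stop.

    Returns the number of searches in each week.
    """
    unreached = float(initial_pool)
    volumes = []
    for _ in range(weeks):
        unreached += newcomers_per_week      # new arrivals join the pool
        found = unreached * discovery_rate   # these people search this week
        volumes.append(found)
        unreached -= found                   # they bookmark and stop searching
    return volumes

# Hypothetical parameters chosen only to show the shape of the curve
volumes = weekly_searches(
    initial_pool=10_000,
    discovery_rate=0.05,
    newcomers_per_week=20,
    weeks=520,  # roughly ten years
)

print(f"first week: {volumes[0]:.0f} searches")
print(f"last week:  {volumes[-1]:.0f} searches")
```

In this model the weekly volume falls steadily and then levels off at the newcomer rate, even though nobody in the community has lost interest.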