Data is important. It forms the basis for all of our decisions. When we use it correctly we make good decisions. When we use it incorrectly we make bad decisions. And why do we use it incorrectly? Because of a mixture of bad assumptions, partial data, and apophenia.
Apophenia, a term coined by the German neurologist and psychiatrist Klaus Conrad, is the perception of meaningfulness in unrelated phenomena. We find this in many places, including the perception of patterns in data. Not real patterns, of course, but ones that you think you see. This is easier to do than you might think, especially when you set out to prove a preconceived conclusion. Then you can pick and choose the data that matches that conclusion. This is fun when you look at a cloud and see the shape of a camel. Not so much when you look at your business intelligence and see market trends that aren’t really there.
Seeing patterns in your data that don’t really exist can result in false conclusions about correlations. This is known as a “spurious relationship,” in which two or more variables are not actually causally related to each other, despite appearing to be. In the statistics world, this usually results in a Type I error, or false-positive result.
Take a look at http://www.tylervigen.com/ for a good laugh. The site has a number of graphs showing variables that appear to be related based on historical data, but obviously could not possibly be related. My favorite is the clear correlation between the number of people who drowned in pools and the number of films Nicolas Cage appeared in.
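It is easy to reproduce this effect yourself. The sketch below is only an illustration under stated assumptions, not how that site builds its charts: it generates one short series of pure random noise, then searches through many other unrelated random series for the one that correlates with it best. The function names, the series length of 10 points, and the 1,000 candidates are all hypothetical choices made for the example.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_chance_correlation(n_candidates=1000, n_points=10, seed=0):
    """Return the strongest correlation found between one random series
    and many other random series that are unrelated to it by construction.

    All defaults are illustrative assumptions, not values from any real data.
    """
    rng = random.Random(seed)
    # The "target": pure noise, standing in for some yearly statistic.
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        # Each candidate is independent noise with no link to the target.
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        r = pearson(target, candidate)
        if abs(r) > abs(best):
            best = r
    return best


if __name__ == "__main__":
    r = strongest_chance_correlation()
    print(f"Strongest correlation found by chance: {r:.2f}")
```

Every series here is noise, so any strong correlation the search turns up is a false positive by construction. The more candidates you look through, and the fewer data points each one has, the more impressive the best accidental match tends to look.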
Should we then ignore correlations? Definitely not. Finding correlations is a very important step in highlighting relationships, but it is only the first step and should be used with caution. A correlation is merely a spotlight on something that deserves attention. It is a valuable pointer to where we should look. Often, we will be able to find an actual cause-and-effect relationship where no one had thought to look before. But until such a relationship is proven, correlation is merely a guide, an indicator of where to look, and not a hard-and-fast predictor of future events. Which means that, at least for now, it is probably safe to let Nicolas Cage make more movies. At least for people who are near swimming pools.