Correlation, Causation: Potato, Tomato

Correlation is not causation. How many times have we heard that sentence? Too many times maybe, because we seem to be mithridatized by it. Nowadays, it seems like it’s nothing more than an easy way to discard inconvenient facts. It is true that correlation is not causation, but what does that mean?

Correlation is statistical interdependence. In other words, we talk of correlation when a class of events happens systematically together with another class of events. Correlation can be weak or strong, depending on the statistical strength of the relationship: the more often it fails, the weaker the correlation.

Causation is a kind of correlation where one of the events is the cause of the other, which we call the effect. The cause always happens before the effect, but that’s not enough to prove causation. Proving causation is, in fact, tricky business if all you have is statistical data, and arguably that’s all you ever have.

What are the different reasons for correlation then?

First, it can be pure chance, which is something we’ll want to rule out. If you throw coins in the air with both hands and they land on the same side three times in a row, you can talk about correlation, albeit not a very strong one. The good thing is that this is fairly easy to eliminate: it is generally easy to see what results pure chance would give, and to compare that with a large number of events (even if homeopaths haven’t been able to figure it out). While it’s technically possible that freak statistical events happen, they are by definition improbable and a high degree of confidence can be reached that pure chance has been eliminated, by simply repeating the experiment a large number of times.
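To make the comparison with pure chance concrete, here is a minimal sketch in Python. It assumes fair coins and independent tosses; the function names are made up for this illustration. It computes what chance alone predicts for three matches in a row, then checks that prediction against a large number of simulated attempts.

```python
import random

def both_hands_match(rng):
    """Toss one fair coin per hand; True if they land on the same side."""
    return rng.choice("HT") == rng.choice("HT")

def streak_frequency(n_experiments, streak=3, seed=0):
    """Fraction of experiments in which the coins match `streak` times in a row."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(
        all(both_hands_match(rng) for _ in range(streak))
        for _ in range(n_experiments)
    )
    return hits / n_experiments

# Two fair coins match with probability 1/2, so three matches in a row
# happen by chance with probability (1/2)^3 = 0.125.
expected = 0.5 ** 3
observed = streak_frequency(100_000)
print(f"expected by chance: {expected:.3f}, simulated: {observed:.3f}")
```

Under these assumptions, three matches in a row turn up about one time in eight, so seeing them once proves very little. Only a matching rate that stays well above what chance predicts over many repetitions deserves a closer look.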

Second, there can be an underlying common cause. If waves reach both sides of a lake with a good phase correlation, it’s not necessarily that the waves on one side cause the waves on the other side. It’s more likely that the waves have a common cause, such as a rock being thrown into the lake. One experiment would be to create waves on one side and see whether the same correlation appears, and another would be to throw a rock and see what happens.
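The lake experiment can be sketched as a toy simulation. Everything in it is an assumption made for illustration: the splash size and the noise are drawn from Gaussian distributions with arbitrary parameters, and the function names are hypothetical. The point is only to show how a correlation produced by a common cause disappears once we generate the waves on one side ourselves.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def lake_correlation(n, make_left_waves_ourselves=False, seed=0):
    """Correlation between wave heights on the left and right shores."""
    rng = random.Random(seed)
    left, right = [], []
    for _ in range(n):
        rock = rng.gauss(0, 1)              # common cause: size of the splash
        l = rock + rng.gauss(0, 0.3)        # left shore follows the rock, plus noise
        r = rock + rng.gauss(0, 0.3)        # right shore follows the rock, plus noise
        if make_left_waves_ourselves:
            l = rng.gauss(0, 1)             # intervention: left waves ignore the rock
        left.append(l)
        right.append(r)
    return pearson(left, right)

print("just watching:     ", round(lake_correlation(10_000), 2))
print("making left waves: ", round(lake_correlation(10_000, True), 2))
```

In this toy model the correlation is strong while we merely observe, and it falls to roughly zero when the left-side waves are made by hand. That is what we would expect if the left side never caused the right side in the first place.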

Third, it can be real, non-causal correlation. For example, the area of a square is strictly correlated with the length of its sides: if the side doubles, the area quadruples. The sides are not the cause of the area any more than the area is the cause of the sides. We just have two variables that are interdependent. They are correlated without causation.

And finally, correlation actually is, in many cases, the manifestation of a real causal relationship, in one direction or the other. Pushing an object causes it to accelerate, drinking battery acid causes death, smoking causes lung cancer, HPV causes cervical cancer, etc. All of these causal relationships were proven through some form of experimentation, but it all starts with noticing a correlation.

Next time you notice a correlation, don’t just dismiss it because “correlation is not causation”. Consider it instead as the first step towards discovery. Isolate variables, compare with chance, try to find underlying causes, determine if A causes B or B causes A, experiment. In other words, be scientific about it…