Correlation, Causation: Potato, Tomato

Correlation is not causation. How many times have we heard that sentence? Too many times, maybe, because we seem to have become mithridatized against it. Nowadays, it seems to be nothing more than an easy way to discard inconvenient facts. It is true that correlation is not causation, but what does that mean?

Correlation is statistical interdependence. In other words, we talk of correlation when a class of events happens systematically together with another class of events. Correlation can be weak or strong, depending on the statistical strength of the relationship: the more often it fails, the weaker the correlation.

Causation is a kind of correlation where one of the events is the cause of the other, which we call the effect. The cause always happens before the effect, but that’s not enough to prove causation. Proving causation is, in fact, a tricky business if all you have is statistical data, and arguably that’s all you ever have.

What are the different reasons for correlation then?

First, it can be pure chance, which is something we’ll want to rule out. If you throw a coin in the air with each hand and both coins land on the same side three times in a row, you can talk about correlation, albeit not a very strong one. The good thing is that this is fairly easy to eliminate: it is generally easy to work out what results pure chance would give, and to compare that with a large number of observed events (even if homeopaths haven’t been able to figure it out). While it’s technically possible that freak statistical events happen, they are by definition improbable, and a high degree of confidence that pure chance has been eliminated can be reached simply by repeating the experiment a large number of times.
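
To make the comparison with chance concrete, here is a minimal sketch in Python. It assumes two fair, independent coins; the function name `same_side_streak_rate` and the seed are my own choices for illustration. It estimates how often the two coins match three times in a row and compares that with the value pure chance predicts.

```python
import random

def same_side_streak_rate(trials, streak=3, seed=0):
    """Fraction of trials in which two fair coins match on every one of `streak` throws."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    hits = 0
    for _ in range(trials):
        # One trial = `streak` throws of both coins; count it if they match every time.
        if all(rng.randint(0, 1) == rng.randint(0, 1) for _ in range(streak)):
            hits += 1
    return hits / trials

expected = 0.5 ** 3  # each throw matches with probability 1/2, so 1/8 for three in a row
observed = same_side_streak_rate(trials=100_000)
print(f"expected by chance: {expected:.3f}, simulated: {observed:.3f}")
```

Three matches in a row happen by chance about one time in eight, which is why such a short run says very little on its own.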

Second, there can be an underlying common cause. If waves reach both sides of a lake with a good phase correlation, it’s not necessarily that the waves on one side cause the waves on the other side. It’s more likely that the waves have a common cause, such as a rock being thrown into the lake. One experiment would be to create waves on one side and see whether the same correlation can be produced, and another would be to throw a rock and see what happens.
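
The lake experiment can be sketched as a toy simulation. Everything in it is an assumption made for illustration: wave heights are modelled as plain random numbers, the noise levels are arbitrary, and the model simply decides that one shore does not drive the other, so this illustrates the reasoning rather than proving anything about real lakes.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(42)  # seeded only for reproducibility
n = 5000

# Observation: a rock (the common cause) drives the waves on both shores.
rock = [rng.gauss(0, 1) for _ in range(n)]
left = [r + rng.gauss(0, 0.3) for r in rock]
right = [r + rng.gauss(0, 0.3) for r in rock]
r_common = pearson(left, right)

# Experiment: we make the waves on the left shore ourselves and throw no rock.
left_forced = [rng.gauss(0, 1) for _ in range(n)]
right_alone = [rng.gauss(0, 0.3) for _ in range(n)]
r_intervention = pearson(left_forced, right_alone)

print(f"with a common cause:        r = {r_common:.2f}")
print(f"after forcing the left side: r = {r_intervention:.2f}")
```

The correlation is strong while the rock is in play and vanishes once we produce the left-hand waves ourselves, which is the signature of a common cause rather than of one side causing the other.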

Third, it can be real, non-causal correlation. For example, the area of a square is strictly tied to the length of its sides: if the side doubles, the area quadruples. The sides are not the cause of the area any more than the area is the cause of the sides. We just have two variables that are interdependent. They are correlated without causation.

And finally, correlation actually is, in many cases, the manifestation of a real causal relationship, in one direction or the other. Pushing an object causes it to accelerate, drinking battery acid causes death, smoking causes lung cancer, HPV causes cervical cancer, etc. All of these causal relationships were proven through some form of experimentation but it all starts with noticing correlation.

Next time you notice a correlation, don’t just dismiss it because “correlation is not causation”. Consider it instead as the first step towards discovery. Isolate variables, compare with chance, try to find underlying causes, determine if A causes B or B causes A, experiment. In other words, be scientific about it…


Can God appear in a puff of logic?

Logic is a tricky thing. Any sound argument must rely on it, but it is easy to build seemingly sound and logical arguments that are still wrong or fail to apply to the real world. Fuzzy or wrong premises, shortcuts in reasoning, as well as plain fallacies such as circular reasoning, are easy to obfuscate, and apologists are kings at this game. It's what they do: take the conclusion they want to reach, and then build the rationalization for it. A prime example of this is the age-old ontological argument for the existence of God, which I will be looking at in detail in this post.

The argument is that because we can conceive of a perfect being (defined as one that cannot possibly be improved upon), it must exist, for surely existing is better than not existing. Really? We'll see.

But first, let me quote Douglas Adams...

Now it is such a bizarrely improbable coincidence that anything so mindbogglingly useful could have evolved purely by chance that some thinkers have chosen to see it as a final and clinching proof of the non existence of God.
The argument goes something like this: "I refuse to prove that I exist," says God, "for proof denies faith, and without faith I am nothing."
"But," says Man, "the Babel fish is a dead giveaway isn't it? It could not have evolved by chance. It proves that you exist, and so therefore, by your own arguments, you don't. QED."
"Oh dear," says God, "I hadn't thought of that," and promptly disappears in a puff of logic.
"Oh, that was easy," says Man, and for an encore goes on to prove that black is white and gets himself killed on the next zebra crossing.

What Douglas Adams articulates so brilliantly here is that with badly defined premises and "pure logic", you can prove anything and its opposite, and that therefore you can prove nothing. There is no such thing as a puff of logic of course, as puffs are physical, and logic is mathematical, independent of the physical world, and therefore utterly unable to puff. Of course, I could have quoted Hume and Kant to pretty much the same effect, but this is a lot more fun, isn't it?

To drive the point home, let me paraphrase a reverse formulation of the argument I found in the comments of Ambrose's recent post on the subject:

We can conceive of maximal evil, for which one cannot possibly imagine anything more evil. Surely, it must exist, as something maximally evil would be quite benign if it didn't exist, and would assuredly be more evil if it existed. Therefore, it exists.

Oops. Putting empirical credibility aside, it doesn't look any more or less logically sound than the original argument. So where's the flaw?

What most people call "pure logic" is actually much trickier to define than they may think. I learned that in France a little more than 20 years ago, when I was preparing for the competitive entrance exams for college. One of the students in my class was an orthodox Jew, convinced that the world was 6000 years old, but also a genius, who had already explored Mathematics much further than any of us. What he taught me was that words are not an appropriate tool for doing mathematics. One must be absolutely formal in order to avoid talking nonsense. Here is the example he used, also known as Russell's paradox:

A mathematical set is a pretty simple entity, right? It is defined by its elements. OK, so now consider the set of non-auto-inclusive sets, defined as the set of all sets that do not contain themselves. Well, that set cannot include itself, by definition, because all its elements are non-auto-inclusive. But then it is itself a non-auto-inclusive set. Therefore, it must include itself, precisely because it doesn't.

Uh? Yeah, exactly. Mathematics doesn't have paradoxes, it only has reductio ad absurdum. This so-called paradox only proves that the naïve concept of set we used here is inconsistent. In particular, the notion of a set of all sets can't be rigorously defined, although an English formulation of it seems to present no challenge. This is known as naïve set theory, and it had to be replaced by something much more rigorous, which eventually led to a re-foundation of all of Mathematics by the Bourbaki group. This is an eminently modern idea that Anselm of Canterbury, Kant, Leibniz, Descartes or Plantinga could not possibly have known. We need to apply formal logic in order to determine what in the ontological argument is valid formal logic and what constitutes its premises and hidden assumptions.
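
For readers who like to see it without words, the set from my classmate's example can be written in one line. This is the standard textbook formulation; the name R is just a label I am using here.

```latex
R = \{\, x \mid x \notin x \,\}
\qquad\Longrightarrow\qquad
R \in R \iff R \notin R
```

The right-hand side is a statement that is true exactly when it is false, which is the contradiction that rules out the naïve definition.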

Several people have done exactly that with varying success, but the attempt that I find the most interesting consisted in feeding the argument into a computer program that automatically proves mathematical theorems. If that wasn't awesome enough, the good news is that the program not only showed the logical validity of the argument, it was actually able to simplify it and reduce the assumptions to a single one. The bad news is that this remaining assumption is not trivial. Here it is:

If the conceivable thing than which nothing greater is conceivable fails to exist, then something greater than it is conceivable.

Makes sense? Suffice it to say that this still needs independent justification that cannot be reduced to formal logic. Back to square one, are we? You can still argue one way or the other, but you are outside of the realm of logic doing so, which pretty much means that the argument, while quite subtle and logically valid, is not a complete proof of the existence of God.

Before I conclude this post, I'd like to point out that such attempts to make God appear in a puff of logic are not only doomed logically, they also constitute poor theology (assuming for a second there is such a thing as good theology). For really, doesn't it degrade the idea of God to reduce it to something that can be described and constrained by mathematical expressions? Doesn't that bring him down to the realm of the natural?


Do not reward luck

Here is a little experiment. I have built a bunch of programmed agents that use a variety of strategies to try to predict the outcome of a randomized event. The event in question is the roll of a die. The twist is that the probabilities of all sides of the die are not equal: there is a distribution of probabilities that is itself decided randomly before the experiment.

Here is how many correct predictions each of the agents made on a hundred throws of that unfair die:

  • Agent 1: 18
  • Agent 2: 22
  • Agent 3: 20
  • Agent 4: 10
  • Agent 5: 14

Assuming you can't make any further tests, which agent would you hire to predict future throws?

The correct answer is none of them: you just don't have enough data. So let's throw the die an additional thousand times:

  • Agent 1: 180
  • Agent 2: 188
  • Agent 3: 176
  • Agent 4: 136
  • Agent 5: 168

What can we notice here? Agent 2 still seems to be doing pretty well, and agent 4 is still doing poorly, might you say. Well, that's true but irrelevant.

What you should be noticing is that the average hit rate over all the agents and over all the throws we've made is 0.169. But wait a minute. The probability of guessing a completely random number between 1 and 6 is one in six, or about 0.167. That's pretty close. Just way too close to be a coincidence, as additional data would confirm.
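
The 0.169 figure can be checked directly from the two lists of results above. Here is the arithmetic as a short Python sketch; the variable names are mine.

```python
# Correct predictions per agent, copied from the two lists above.
first_100 = [18, 22, 20, 10, 14]
next_1000 = [180, 188, 176, 136, 168]

hits = sum(first_100) + sum(next_1000)      # all correct predictions
guesses = len(first_100) * (100 + 1000)     # 5 agents, 1100 throws each
pooled_rate = hits / guesses
chance_rate = 1 / 6                          # blind guess on a six-sided die

print(f"pooled hit rate: {pooled_rate:.3f}")
print(f"pure chance:     {chance_rate:.3f}")
```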

You should by now have understood that I lied: that die is not weighted at all, and the numbers are as close to random as I know how to make them (I used crypto-random numbers).

Our five agents do have various strategies (ranging from always picking the same number to picking the number that came out most often in the past) but the point is that it doesn't matter. There is no way to predict a random phenomenon (otherwise it's not random). No strategy works. None ever will. They are all equivalent to chance.
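
If you want to try this at home, here is a minimal sketch of such an experiment in Python. It is not my original program: only the "always the same number" and "most frequent so far" strategies are the ones described above, the other two are made up for illustration, and it uses a seeded pseudo-random generator for reproducibility instead of crypto-random numbers.

```python
import random
from collections import Counter

STRATEGIES = ("always_three", "most_frequent_so_far",
              "repeat_last_throw", "random_guess")

def simulate(throws=60_000, seed=1):
    """Return the hit rate of each strategy on a fair six-sided die."""
    die = random.Random(seed)          # the fair die
    guesser = random.Random(seed + 1)  # separate source for the random guesser
    history = Counter()                # how often each face has come up so far
    last_roll = 1                      # arbitrary starting value
    hits = dict.fromkeys(STRATEGIES, 0)
    for _ in range(throws):
        # Every agent commits to a guess before the die is thrown.
        guesses = {
            "always_three": 3,
            "most_frequent_so_far": history.most_common(1)[0][0] if history else 1,
            "repeat_last_throw": last_roll,
            "random_guess": guesser.randint(1, 6),
        }
        roll = die.randint(1, 6)
        for name in STRATEGIES:
            if guesses[name] == roll:
                hits[name] += 1
        history[roll] += 1
        last_roll = roll
    return {name: hits[name] / throws for name in STRATEGIES}

rates = simulate()
for name, rate in rates.items():
    print(f"{name:>22}: {rate:.3f}  (chance: {1/6:.3f})")
```

With enough throws, every strategy should hover around one in six, whatever it does with the past.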

Now what am I getting at? The main lessons we can extract from this are the following:

  1. You can usually determine with a good level of certainty whether a phenomenon is random by comparing its statistics with those of something you know to be random (it's sometimes trickier than that, but this is mostly reliable).
  2. Luck in the past is not an indicator of luck in the future. Do not reward it.
  3. In order to distinguish luck from talent, you first need to determine whether the domain where they apply is predictable at all, and only once that has been shown should you consider the candidate's previous results.

To conclude, I'll leave you with this thought. What profession rewards its members with extravagant bonuses even though it's been shown that dart-throwing monkeys consistently do better than any of them in the long run?


“One cannot prove a universal negative” Oh really?

This is a claim I've read so many times in comments that I think it deserves a little debunking. If you do a search on that little sentence, you'll see that it's very rarely if ever used in a scientific context but is repeated like a mantra by religious apologists. They seem to be persuaded that it is an established rule of logic.

Let's get it out of the way: it isn't. Here is a counter-example:

No even number that is larger than two is prime.

Done. I hope you'll agree that the proof of this is trivial.
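
For completeness, here is the proof spelled out as a short chain of implications. It only uses the usual definitions of "even" and "prime".

```latex
n \text{ even},\ n > 2
\;\Longrightarrow\; n = 2k \text{ for some integer } k > 1
\;\Longrightarrow\; 2 \mid n \text{ with } 1 < 2 < n
\;\Longrightarrow\; n \text{ is not prime.}
```

The statement covers infinitely many numbers and denies something of all of them, yet the proof takes one line.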

One can prove a universal negative. Declaring otherwise pretty much constitutes a logical fallacy in itself. A universal negative poses no logical challenge whatsoever.

Now this of course applies to mathematics. When it comes to empirical truth, the challenge is quite different: one doesn't deal with proof at all. Instead, you deal with evidence, no amount of which is ever equivalent to proof. Some assertions can be backed by more or fewer lines of evidence of various quality, which does make some assertions more valuable than others. But proof? Nope. Never. No big deal, either.

Apologists who resort to this false argument might as well say that science cannot prove anything. No, it can't. Nor does it ever claim to.

But the absence of proof for not A does not mean that A, let alone B, is true. The absence of proof for the non-existence of God does not mean that the Abrahamic God, or Zeus, or the Flying Spaghetti Monster exists. The absence of proof for the absence of a teapot orbiting the Sun between the Earth and Mars does not mean that there is a red teapot there.

Believe what you will. But if you are going to argue for your belief in scientific-sounding terms, or if you make claims that overlap with science (which all religions do), be prepared, and avoid making stuff up. Oh wait...


*theism: how many gods are there?

Most of the debate tends to be around theism versus atheism. But there is so much more! Let's review the full set of hypotheses:

- x = 0: atheists think there is no god.
- 0 ≤ x ≤ Infinity: agnostics think there may be between zero and an infinity of gods. Interestingly, there is a generalization of the natural numbers that allows infinite exponents in the prime factorization, and it is called the "supernatural numbers".
- x = 1: monotheists think there is one God.
- x ∈ ℕ*, x > 1: polytheists think there is more than one god.
- x ∈ ℚ: in some polytheist religions, gods can procreate with humans, which gives demigods. If demigods then procreate with humans, does that make quartergods? This is of course assuming the divinity of humans is zero. ℚ is called the set of rational numbers, which doesn't make this position especially more rational than the others...

So where do I stand? I think x ∈ ℂ: there is a number of imaginary gods. I guess that makes me a complexotheist.