I expect that most people reading this have studied statistics at some time. You may have gone no further than calculating the arithmetic mean and standard deviation of an arbitrary list of numbers, or you may have used statistical methods such as ANOVA and Student's t-test and looked for confidence limits; you may even have ventured into analysing non-parametric data. Congratulations on however far you got, but I'm going to go a slightly different way. I'm putting aside the equations, tables and calculators to consider an associated topic: probability.

Let me be bold and say that I think very few people understand probability. I don't just mean the detailed theories and the ways to calculate it; most folk don't really understand or appreciate the basic concept. For a start, many think probability is the same as statistics: the two are closely linked but not the same. When a schoolboy hears the term “statistics” he may well smile and think of “vital statistics”; as he gets older and has to tackle the calculation of means and standard deviations, the word stops raising a smile. Yet it is useful to return to his earlier concept. Statistics are measures of, and reports on, a population; they look at what is, or what has already happened; they are retrospective. Probability may use the statistics of the past, but it is an attempt to predict the future; it considers what is yet to happen, or might happen, and the key word is *might*.

A probability of 0 (zero) means an event will definitely not happen; a probability of 1 (one) means it definitely will; a probability between 0 and 1 means it might, and the closer to 1, the more likely it is. A probability can only be a number in the range 0 to 1 (or 0% to 100%); a negative value, or a value greater than 1, is meaningless as a probability (in this universe, anyway).

In my experience, students of the life sciences usually have a better understanding of probability than those of the non-life sciences (especially engineering). For the latter, life is very black and white: a structure will stand if it is designed and built correctly, and will collapse or otherwise fail if not. Decisions are binary: yes/no, right/wrong, pass/fail. This approach seems to have worked its way into the general public psyche as well.

When the weather forecast says there is a 10% chance of rain, many people will be surprised if it rains: 10% sounds pretty low, so they feel quite confident it will stay dry (which adds further confusion, because the forecast made no mention of confidence, but that's a discussion for another day). If the Met Office has done its work correctly, it will rain on approximately 10 of every 100 days for which it makes that prediction. I deliberately haven't said 1 day out of 10, because that gives an impression of much greater precision (and confidence) than is really there. Throw a die and, unless the die is biased, probability theory tells you there is a 1/6 chance of any chosen number coming up; however, the die has no memory, and the same theory tells you there is the same 1/6 chance of that number coming up again on the next throw. Six throws will not inevitably show all six numbers once each; you could throw six consecutive sixes.
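To put numbers on the die example, here is a small Python sketch (my own illustration, not something from the original post) that works out, for a fair die, the exact chance that six throws show every face exactly once and the chance of six sixes in a row, then checks the first figure with a quick simulation. The function names and the fixed random seed are arbitrary choices made for the illustration.

```python
from fractions import Fraction
from math import factorial
import random


def p_all_faces_once(faces: int = 6) -> Fraction:
    """Exact chance that `faces` throws of a fair die show every face exactly once."""
    # Every one of the faces! orderings qualifies, and each particular
    # sequence of throws has probability (1/faces) ** faces.
    return Fraction(factorial(faces), faces ** faces)


def p_same_face_every_time(faces: int = 6, throws: int = 6) -> Fraction:
    """Exact chance that one chosen face (e.g. a six) comes up on every throw."""
    return Fraction(1, faces) ** throws


def simulate_all_faces_once(trials: int, seed: int = 1) -> float:
    """Monte Carlo estimate: share of six-throw sessions showing 1..6 once each."""
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatability only
    hits = 0
    for _ in range(trials):
        throws = [rng.randint(1, 6) for _ in range(6)]
        if len(set(throws)) == 6:  # no face repeated, so all six appeared
            hits += 1
    return hits / trials


if __name__ == "__main__":
    exact = p_all_faces_once()
    print(f"All six faces once: {exact} = {float(exact):.4f}")  # 5/324 = 0.0154
    print(f"Six sixes in a row: {p_same_face_every_time()}")    # 1/46656
    print(f"Simulated (100,000 sessions): {simulate_all_faces_once(100_000):.4f}")
```

The answer is only about 1.5%: far more often than not, six throws will repeat at least one number, and six sixes in a row, while unlikely, is exactly as likely as any other specific sequence of six throws.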

An experiment: toss a coin 100 times. There's a 50% chance of it coming down heads on each toss, so you would expect to get 50 heads in the 100 attempts. You might get exactly 50 but, if you actually try it, you're more likely to get something a bit different; you could even get 100 heads (very unlikely, but not impossible). Run the trial repeatedly and see how the result varies. To avoid a blistered thumb, I simulated this in an Excel spreadsheet with 100,000 runs of 100 coin tosses each. Before I give you the result, write down how often you think I got exactly 50 heads. The answer is below…

.

.

.

.

.

.

My 100,000 trials of 100 coin tosses gave exactly 50 heads (and exactly 50 tails) just 8% of the time. Were you close? In the long run, of course, we would expect the numbers of heads and tails to come very close to each other; in my 10,000,000 coin tosses there were 4,998,942 heads. That's close, only 0.02% short of the expected 5,000,000, but it demonstrates that even 10 million is not yet “in the long run”. So let's not be surprised when the weather forecast isn't always right.
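For anyone who would rather not build the spreadsheet, the sketch below re-creates the experiment in Python instead of Excel; it is an assumed reconstruction, not my original workbook. It compares the exact binomial probability of 50 heads in 100 fair tosses, C(100, 50) / 2^100, with a simulation of 100,000 runs. The seed (2011) and the function names are arbitrary, so the simulated counts will differ from mine in the detail.

```python
import random
from math import comb, sqrt


def p_exactly_k_heads(n: int, k: int) -> float:
    """Exact binomial probability of k heads in n tosses of a fair coin."""
    return comb(n, k) / 2 ** n


def simulate(runs: int, tosses: int, seed: int = 2011):
    """Simulate `runs` sessions of `tosses` fair coin tosses each.

    Returns (share of sessions with exactly tosses/2 heads, total heads).
    Each random bit from getrandbits() stands in for one toss (1 = heads).
    """
    rng = random.Random(seed)  # arbitrary seed, chosen only for repeatability
    exact_half = 0
    total_heads = 0
    for _ in range(runs):
        heads = bin(rng.getrandbits(tosses)).count("1")
        total_heads += heads
        if heads * 2 == tosses:
            exact_half += 1
    return exact_half / runs, total_heads


if __name__ == "__main__":
    print(f"Exact P(50 heads in 100): {p_exactly_k_heads(100, 50):.4f}")  # 0.0796
    share, heads = simulate(100_000, 100)
    n = 100_000 * 100
    print(f"Simulated share of runs with exactly 50 heads: {share:.4f}")
    print(f"Total heads in {n:,} tosses: {heads:,}")
    # Binomial standard deviation of the head count: sqrt(n * 0.5 * 0.5)
    print(f"Typical spread around {n // 2:,}: about ±{sqrt(n * 0.25):,.0f}")
```

The exact figure comes out at about 7.96%, which squares with the 8% above. The last line gives the binomial standard deviation for 10 million tosses, roughly 1,581 heads, so a shortfall of about a thousand from 5,000,000 is quite ordinary: the *proportion* of heads settles towards 50% as the tosses pile up, but the raw gap from an exact half tends to grow rather than shrink.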

It's a big topic with a lot of twists and turns, and most people get it wrong (even the experts, at times). I'm certainly not an expert, but it's a subject that fascinates me. That's as far as I'm going for now, but I'll be coming back to it quite a lot. However, if I've whetted your appetite and you'd rather read more than wait for me, I recommend a couple of authors and their books (where the talking is done in English, not equations):

Tim Harford (the Financial Times' “Undercover Economist”) presents the BBC Radio 4 programme “More or Less”; he's the author of “The Undercover Economist”, “The Logic of Life” and “Adapt: Why Success Always Starts with Failure”. Tim starts from an economist's viewpoint but manages to apply the principles to many other walks of life; he also writes primarily from a British viewpoint (although he doesn't confine his examples to the UK). Steven D. Levitt does the same from a US viewpoint in “Freakonomics” and “SuperFreakonomics” (both co-written with Stephen J. Dubner).

*(Posted as a blog on 18th September 2011)*