The power of prediction

By Dr. Stephen Gorard. Dr. Gorard, author of How to Make Sense of Statistics, was a Methodspace Mentor in Residence in 2021.


Imagine a stage ‘magician’ who takes a standard pack of playing cards, shuffles them and asks a member of the audience to pick one card. The magician asks to see the card, and correctly identifies it as the 8 of hearts. The magician can do this repeatedly, each time correctly identifying the selected card after the member of the audience has revealed it. This would be a very poor stage act, with no surprises for the audience. A much more impressive act would be one where the magician correctly identified the card before it was chosen by the member of the audience. It is important for analysts to envisage and recall their reaction to the first version of the act in order to contrast it with their reaction to the better version. The two acts are very different. The first is trivial and meaningless; the second would be impressive (and perhaps more or less convincing depending upon a range of other performance factors). The same contrasting reactions should occur to social science researchers when reading data analyses. Only some numeric analyses check whether a predicted result is in evidence. These are intrinsically much more impressive than most published analyses which simply report what was found, after it has been found.

Dredging

All of the standard analytical techniques used by statisticians, with all of their failings, were clearly devised for testing pre-specified ideas or ‘hypotheses’. The testing would be one-off in the sense that one prediction was made and found to be either supported or not supported by the data collected subsequently. All of the probability calculations made by statisticians, and by software like SPSS, are based mathematically on this predictive situation. Once data has been collected, there are no real probabilities. After all, the probability of rolling a 3 followed by a 4 with an unbiased die is 1/36 beforehand but always 1 after it has occurred. Predicting that a die would produce a 3 then a 4 would be impressive, whatever it meant. The low probability of 1/36 might offer support or evidence for whatever reasoning was used to make the prediction. But rolling a 3 followed by a 4 first, then saying that the probability of this was low (1/36 is less than the 0.05 used in the obsolete idea of significance testing), and concluding that the die is therefore likely to be biased, is clearly nonsense. Every possible pair of rolls has the same prior probability of 1/36, so this reasoning would declare the die biased whatever it produced. Yet this is what most researchers do, whether they work with numbers or not.
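The dice argument can be checked with a short simulation. What follows is only an illustrative sketch in Python, not part of the original argument: the function names, the number of trials and the random seed are all assumptions made for the example. It compares how often a pair named before rolling actually turns up with how often a pair "identified" after rolling matches the roll, and it counts how many possible outcomes would fall below the 0.05 threshold if their prior probability were read off after the event.

```python
import random
from fractions import Fraction


def exact_pair_probability():
    """Prior probability of one pre-specified ordered pair from a fair die."""
    return Fraction(1, 6) * Fraction(1, 6)


def predicted_hit_rate(prediction, trials, rng):
    """Share of trials in which a pair named BEFORE rolling actually occurs."""
    hits = 0
    for _ in range(trials):
        rolled = (rng.randint(1, 6), rng.randint(1, 6))
        if rolled == prediction:
            hits += 1
    return hits / trials


def post_hoc_hit_rate(trials, rng):
    """Share of trials in which a pair named AFTER rolling matches the roll."""
    hits = 0
    for _ in range(trials):
        rolled = (rng.randint(1, 6), rng.randint(1, 6))
        named_afterwards = rolled  # the 'prediction' is simply read off the data
        if rolled == named_afterwards:
            hits += 1
    return hits / trials


def pairs_below_threshold(threshold=0.05):
    """Count the ordered pairs whose prior probability is below the threshold."""
    p = exact_pair_probability()
    return sum(1 for first in range(1, 7) for second in range(1, 7) if p < threshold)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
    print("Exact prior probability:", exact_pair_probability())
    print("Hit rate when predicting (3, 4) beforehand:",
          predicted_hit_rate((3, 4), 100_000, rng))
    print("Hit rate when naming the pair afterwards:",
          post_hoc_hit_rate(100_000, rng))
    print("Outcomes 'significant' if judged after the event:",
          pairs_below_threshold(), "of 36")
```

The genuine prediction succeeds only about once in 36 attempts, which is why a success would be worth noticing. The after-the-fact version succeeds every time, and every one of the 36 possible outcomes would pass the 0.05 threshold, so passing it tells us nothing about the die.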

Irrespective of method of data collection

Almost exactly the same comments apply however data is collected and whatever form it takes. In information theory, ‘information’ is evidence or data that reduces uncertainty about something. A tautology such as ‘either today is Tuesday or it is not Tuesday’ reduces no uncertainty and so the statement contains no information. The more uncertainty a fact reduces, the more informative it is. But this information comes within a structure. If you want to know whether today is Tuesday, then being told that it is raining is not very informative. Being told that it is either Tuesday or Wednesday is more informative, and being told that it is not Tuesday is even more informative again (as long as it is true). Information comes in many forms, such as words, numbers and primary sense data. The ridiculous notion of ‘quantitative’ and ‘qualitative’ research makes no difference here. A prediction, or a hypothesis, provides the uncertainty that gives future research evidence one context. Without that context, data of any kind is not really information at all. Testing a prediction from a theory is a context, and it amounts to a test of the theory.

The role of theory

A theory is a possible explanation for a set of observations, or a proposed mechanism for how something works. A good theory must explain existing observations but it must also yield accurate predictions about things not yet observed. A theory is only any good if it predicts the future (just like a good stage ‘magician’ appears to). A great theory makes totally unexpected predictions that turn out to be true (for the present at least).

Summary

Imagine again how disappointed you would be with a stage magician who saw a playing card and then told the audience what it was. And contrast this with how much more impressive it would be to state what the card would be beforehand. This level of distinction is the one you should also make between those widespread dredged results and descriptions you read so often and the results of a genuine theory- or prediction-driven piece of research.


Stephen Gorard is the author of How to Make Sense of Statistics, Professor of Education and Public Policy, and Director of the Evidence Centre for Education, at Durham University. He is a Fellow of the Academy of Social Sciences, and a member of the Cabinet Office Trials Advice Panel as part of the Prime Minister’s Implementation Unit. His work concerns the robust evaluation of education as a lifelong process. He is author of around 30 other books and over 1,000 other publications. Stephen is currently funded by the British Academy to look at the impact of schooling in India and Pakistan, by the Economic and Social Research Council to work out how to improve the supply and retention of teachers, and by the Education Endowment Foundation to evaluate the impact of reduced teacher marking in schools. Follow him on Twitter @SGorard.

