What is randomness?

by Stephen Gorard, PhD. Dr. Gorard is Professor of Education and Public Policy, and Director of the Evidence Centre for Education, at Durham University. He is the author of How to Make Sense of Statistics, and served as a Methodspace Mentor in Residence in 2021.


In statistics and the philosophy of science, “randomness” means something rather more than its everyday meaning of merely haphazard, or without apparent pattern. It is stronger, in the sense that something either is or is not random. There are no degrees of randomness, because a random event is one that is completely unpredictable in form, outcome or timing. Randomness is the quality of such an event – its unpredictability, and lack of intention.

Applied to a set of numbers, randomness means that each number in the set is unrelated to any other. Knowing one or more numbers in the set will not help identify any other numbers in the set. By analogy, if a standard six-sided die roll has a random outcome, then knowing the previous 10, 100 or 1,000 results from that die will not assist you in predicting the next one. The chances of guessing the result of the next roll correctly remain at 1 in 6, however many rolls have been seen previously. Randomness is the characteristic of chance, as illustrated by a fair die.
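The die example can be explored with a short simulation. What follows is only a sketch, written in Python, and it rests on assumptions that are not part of the argument above: a seeded pseudo-random generator stands in for a fair die, and the guessing strategy (always guess the face seen most often so far) and the function name guess_hit_rate are invented for illustration. If the rolls behave like chance, knowing the history should not lift the success rate noticeably above 1 in 6.

```python
import random
from collections import Counter


def guess_hit_rate(n_rolls=60_000, seed=1):
    """Guess each roll from the history of earlier rolls; return the hit rate."""
    # Assumption: a seeded pseudo-random generator stands in for a fair die.
    rng = random.Random(seed)
    history = Counter()
    hits = 0
    guesses = 0
    for _ in range(n_rolls):
        roll = rng.randint(1, 6)
        if history:
            # Hypothetical strategy: guess the most frequent face so far.
            guess = history.most_common(1)[0][0]
            hits += guess == roll
            guesses += 1
        history[roll] += 1
    return hits / guesses


if __name__ == "__main__":
    print(f"hit rate using history: {guess_hit_rate():.4f}")
    print(f"chance alone (1 in 6):  {1 / 6:.4f}")
```

Other strategies based on past rolls can be swapped in at the marked line. With a die that behaves randomly, none of them should do better than chance in the long run.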

Over time, there have been various attempts to define randomness more formally than this: in terms of unpredictability, the non-computability of random events, and the inability to describe a set of random elements more efficiently than by repeating the entire set.
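The last of these ideas, that a random set cannot be described more briefly than by repeating it, can be illustrated loosely with a general-purpose compressor. This is a rough sketch and not a test of randomness. The function names are hypothetical, zlib is only a crude stand-in for "the most efficient description", and the "random" bytes come from a seeded pseudo-random generator, so in the strict sense they could be described very briefly (by the seed and the algorithm). The compressor simply cannot find that description.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller = more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)


def patterned_bytes(n: int) -> bytes:
    # A fully predictable sequence: 0, 1, 2, ..., 255, 0, 1, 2, ...
    return bytes(i % 256 for i in range(n))


def pseudo_random_bytes(n: int, seed: int = 1) -> bytes:
    # Assumption: seeded pseudo-random bytes stand in for random ones.
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))


if __name__ == "__main__":
    n = 10_000
    print("patterned:    ", round(compression_ratio(patterned_bytes(n)), 3))
    print("pseudo-random:", round(compression_ratio(pseudo_random_bytes(n)), 3))
```

The patterned sequence should shrink to a small fraction of its length, while the pseudo-random one should not shrink at all.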

Randomness and social science


The term ‘random’ is widely used in social science in the context of sampling and statistical analysis. A sample is random if all of the cases in it were selected by chance from a larger set of cases known as the population, and if all of the cases in the population had a genuine chance of being in the sample. A population itself is clearly not a random sample, and nor is a sample selected by other means (ad hoc, convenience, purposive etc.). A sample that was selected at random, but from which some cases cannot be found, do not respond, or are otherwise not recorded, is no longer a random sample. And there is no reason to believe that such missing cases are a random subset of the planned sample. Those refusing to take part in a piece of research, for example, can be predicted on the basis of their prior characteristics with more success than chance alone.
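A small simulation can show why missing cases matter. It is a sketch under invented assumptions, not a description of any real study: the population scores, the response probabilities (0.9 for cases scoring 50 or above, 0.5 for the rest), and the function name simulate_nonresponse are all hypothetical. The planned sample is drawn by chance, but whether a case responds depends on its own score, so the achieved sample is no longer a random subset of the population.

```python
import random
from statistics import mean


def simulate_nonresponse(pop_size=100_000, sample_size=5_000, seed=1):
    """Return (population mean, planned-sample mean, achieved mean, achieved n)."""
    rng = random.Random(seed)
    # Hypothetical population: one score per case, centred on 50.
    population = [rng.gauss(50, 10) for _ in range(pop_size)]
    # The planned sample is selected by chance from the whole population.
    planned = rng.sample(population, sample_size)
    # Assumption: higher-scoring cases are more likely to respond.
    achieved = [
        score for score in planned
        if rng.random() < (0.9 if score >= 50 else 0.5)
    ]
    return mean(population), mean(planned), mean(achieved), len(achieved)


if __name__ == "__main__":
    pop_mean, planned_mean, achieved_mean, n = simulate_nonresponse()
    print(f"population mean:      {pop_mean:.2f}")
    print(f"planned sample mean:  {planned_mean:.2f}")
    print(f"achieved sample mean: {achieved_mean:.2f} (n = {n})")
```

Under these assumptions the planned sample should track the population closely, while the achieved sample should drift upwards, because the cases that drop out are not a random subset of those selected.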

None of the statistical techniques predicated on working with a random sample can or should be used with populations, or with samples that are non-random or incomplete. This means that significance tests, p-values, standard errors, and confidence intervals should not be calculated or reported in such cases. It also means that results of this kind should be ignored in the work of others unless the study is based on a complete random sample. I have never seen a real study that had a complete random sample, which means that such statistical techniques should rarely, if ever, be reported. However, their abuse remains widespread in the literature.

Real-life samples cannot and should not be used with any technique predicated on random samples. Given that randomness is an ideal not often seen in practice in social science samples, it is most important that the much more common non-random (and incomplete “random”) samples are analysed properly. We need to teach more about the appropriate techniques for handling these real-life samples that are not random, even though these approaches are largely ignored in most statistical texts. What these techniques are will be addressed in future blogs, and are covered in my new book.

