Effect Sizes


Null Hypothesis Significance Testing (NHST)

When you read an empirical paper, the first question you should ask is 'how important is the effect obtained?'. When carrying out research we collect data and carry out some form of statistical analysis on them (for example, a t-test or ANOVA), which gives us a value known as a test statistic. This test statistic is then compared to a known distribution of values of that statistic, which enables us to work out how likely it is to get a value like the one we have if there were no effect in the population (i.e., if the null hypothesis were true). If it is very unlikely that we would get a test statistic of the magnitude we have (typically, if the probability of getting a test statistic at least as extreme as the one observed is less than .05) then we attribute this unlikely event to an effect in our data. We say the effect is 'statistically significant'. This is known as Null Hypothesis Significance Testing (NHST for short).
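To make the procedure concrete, here is a minimal sketch of those steps in Python. It is only an illustration under stated assumptions: it uses a one-sample z-test (not the t-test or ANOVA mentioned above) so that the 'known distribution' is the standard normal and only the standard library is needed. The function name z_test, the scores, the null-hypothesis mean of 100 and the population standard deviation of 15 are all invented for this example.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test(sample, mu0, sigma):
    """Two-sided one-sample z-test (hypothetical helper); returns (z, p)."""
    n = len(sample)
    # Step 1: turn the data into a test statistic.
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Step 2: compare it to the known distribution of that statistic
    # under the null hypothesis (here, the standard normal).
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Made-up scores, for illustration only.
scores = [104, 110, 98, 115, 107, 112, 101, 109, 106, 113]
z, p = z_test(scores, mu0=100, sigma=15)

# Step 3: apply the conventional .05 criterion.
verdict = "statistically significant" if p < .05 else "not statistically significant"
print(f"z = {z:.2f}, p = {p:.3f}: {verdict}")
```

Note that the last step throws away everything except whether p fell below .05; it says nothing about how large the effect is.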

NHST is used throughout psychology (and most other sciences) and is what you have been taught for the past 2 courses. It may, therefore, surprise you to know that it is a deeply flawed process for many reasons. Here is what some much-respected statistics experts have to say about NHST:

Schmidt & Hunter (2002): "Significance testing almost invariably retards the search for knowledge by producing false conclusions about research literature" (p. 65). "Significance tests are a disastrous method for testing hypotheses" (p. 65).

Meehl (1978): "The almost universal reliance on merely refuting the null hypothesis is a terrible mistake, is basically unsound, poor scientific strategy, and one of the worst things that ever happened in the history of psychology" (p. 817).

Cohen (1994): "NHST; I resisted the temptation to call it Statistical Hypothesis Inference Testing" (p. 997).

Reason 1: NHST is Misunderstood

Many social scientists (not just students) misunderstand what the p value in NHST actually represents. If I were to ask you what p actually means, which answer would you pick?

a) p is the probability that the results are due to chance, the probability that the null hypothesis (H0) is true.

b) p is the probability that the results are not due to chance, the probability that the null hypothesis (H0) is false.

c) p is the probability of observing results as extreme as (or more extreme than) those observed, if the null hypothesis (H0) is true.

d) p is the probability that the results would be replicated if the experiment were conducted a second time.

e) None of these

Someone did actually ask undergraduates this question on a questionnaire and 80% chose (a) although the correct answer is ...