Null Hypothesis Significance Testing (NHST)
When you read an empirical paper, the first question you should ask is 'how
important is the effect obtained?'. When carrying out research we collect data and
carry out some form of statistical analysis on them (for example, a t-test
or ANOVA), which gives us a value known as a test statistic. This test statistic
is then compared to a known distribution of values of that statistic that
enables us to work out how likely it is to get the value we have if there were
no effect in the population (i.e., if the null hypothesis were true). If it is
very unlikely that we would get a test statistic of the magnitude we have
(typically, if the probability of getting a test statistic at least as extreme as
the one observed is less than .05) then we attribute this unlikely event to an
effect in our data. We
say the effect is 'statistically significant'. This is known as Null Hypothesis
Significance Testing (NHST for short).
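To make the logic above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the procedure a statistics package would follow. The scores are made up, the function names (t_statistic, permutation_p_value) are hypothetical, and instead of looking the test statistic up in the theoretical t distribution it approximates the 'no effect' distribution by repeatedly shuffling the group labels (a permutation test).

```python
import random
import statistics


def t_statistic(a, b):
    """Independent-samples t statistic using the pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    standard_error = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def permutation_p_value(a, b, n_shuffles=5000, seed=0):
    """Proportion of label shuffles whose |t| is at least the observed |t|.

    Shuffling the labels mimics a world in which the null hypothesis is true:
    group membership has no bearing on the scores.
    """
    rng = random.Random(seed)
    observed = abs(t_statistic(a, b))
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled_t = abs(t_statistic(pooled[:len(a)], pooled[len(a):]))
        if shuffled_t >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles


# Hypothetical scores, invented purely for illustration.
treatment = [23, 25, 28, 30, 27, 26, 29, 31]
control = [20, 22, 21, 24, 23, 19, 25, 22]

p_value = permutation_p_value(treatment, control)
print("t =", round(t_statistic(treatment, control), 2))
print("p =", p_value)
print("statistically significant at .05:", p_value < 0.05)
```

The .05 cut-off in the last line is the conventional criterion mentioned above; nothing in the calculation itself singles it out.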
NHST is used throughout psychology (and most other sciences) and is what you
have been taught for the past two courses. It may, therefore, surprise you to know
that it is a deeply flawed process for many reasons. Here is what some
well-respected statistics experts have to say about NHST:
Schmidt & Hunter (2002): "Significance testing almost invariably retards the search
for knowledge by producing false conclusions about research literature" (p. 65).
"Significance tests are a disastrous method for testing hypotheses" (p. 65).
Meehl (1978): "The almost universal reliance on merely refuting the null hypothesis
is a terrible mistake, is basically unsound, poor scientific strategy, and one of
the worst things that ever happened in the history of psychology" (p. 817).
Cohen (1994): "NHST; I resisted the temptation to call it Statistical Hypothesis
Inference Testing" (p. 997).
Reason 1: NHST is Misunderstood
Many social
scientists (not just students) misunderstand what the p value in NHST actually
represents. If I were to ask you what p actually means, which answer would you
pick?
a) p is the probability that the results are due to chance,
the probability that the null hypothesis (H0) is true.
b) p is the probability that the results are not due to
chance, the probability that the null hypothesis (H0) is false.
c) p is the probability of observing results as extreme as (or
more extreme than) those observed, if the null hypothesis (H0) is true.
d) p is the probability that the results would be replicated
if the experiment were conducted a second time.
e) None of these
Someone did actually ask undergraduates this question on a questionnaire
and 80% chose (a) although the correct answer is
...