Correlation (Bivariate or Zero-Order)
We are going to start by analyzing some data regarding undergraduate exam performance. The data for several examples are stored in a single file called ExamAnx.sav. If you open this data file you will see that the variables are laid out in the spreadsheet as separate columns and that gender has been coded appropriately.

Preliminary Analysis of the Data: the Scatterplot
Before conducting any kind of correlational analysis it is essential to plot a scatterplot and look at the shape of your data. A scatterplot is simply a graph that displays each subject’s scores on two variables (or three variables if you do a 3-D scatterplot). A scatterplot can tell you a number of things about your data, such as whether there seems to be a relationship between the variables, what kind of relationship it might be and whether there are any cases that are markedly different from the others. A case that differs substantially from the general trend of the data is known as an outlier, and if there are such cases in your data they can severely bias the correlation coefficient. Therefore, we can use a scatterplot to show us if any data points are grossly incongruent with the rest of the data set.
Drawing a scatterplot using SPSS is dead easy. Simply use the menus as follows: Graphs-->Scatter …. This activates the dialogue box in the figure, which in turn gives you four options for the different types of scatterplot available. By default a simple scatterplot is selected, as is shown by the black rim around the picture. If you wish to draw a different scatterplot then move the on-screen arrow over one of the other pictures and click with the left button of the mouse.
When you have selected a scatterplot click on .
                                  
Simple scatterplots are used to look at just two variables. For example, a psychologist was interested in the effects of exam stress on exam performance. So, she devised and validated a questionnaire to assess state anxiety relating to exams (called the Exam Anxiety Questionnaire, or EAQ). This scale produced a measure of anxiety scored out of 100. Anxiety was measured before an exam, and the percentage mark of each student on the exam was used to assess exam performance. Before testing whether these variables were correlated, the psychologist would draw a scatterplot of the two variables (her data are in the file ExamAnx.sav and you should have this file loaded into SPSS). To plot these two variables you can leave the default setting of simple in the main scatterplot dialogue box and click on . This process brings up another dialogue box, which is shown in the figure. In this dialogue box all of the variables in the spreadsheet are displayed on the left-hand side and there are several empty spaces on the right-hand side. You simply click on a variable from the list on the left and move it to the appropriate box by using one of the buttons.
             
Y Axis: Specify the variable that you wish to be plotted on the y axis (ordinate) of the graph. This should be the dependent variable, which in this case is exam performance. Use the mouse to select exam from the list (which will become highlighted) and then click on to transfer it to the space under where it says Y Axis.
X Axis: Specify the variable you wish to be plotted on the x axis (abscissa) of the scatterplot. This should be the independent variable, which in this case is anxiety. You can highlight this variable and transfer it to the space underneath where it says X axis. At this stage, the dialogue box should look like the figure.
Set Markers by: You can use a grouping variable to define different categories on the scatterplot (it will display each category in a different color). This function is useful, for example, for looking at the relationship between two variables for different age groups. In the current example, we have data relating to whether the student was male or female, so it might be worth using the variable gender in this option. If you would like to display the male and female data separately on the same graph, then select gender from the list and transfer it to the appropriate space.
Label Cases by: If you have a variable that distinguishes each case, then you can use this function to display that label on the scatterplot. So, you could have the subject’s name, in which case each point on the scatterplot will be labeled with the name of the subject who contributed that data point. In situations where there are lots of data points this function has limited use.
When you have completed these options you can click on , which displays a dialogue box that gives you space to type in a title for the scatterplot. You can also click on , which allows you to decide how you want to treat missing values.
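For readers who want to see the same choices outside SPSS, the following is a minimal Python sketch of the dialogue box settings described above: exam performance on the y axis, anxiety on the x axis, and a different marker for each gender. It is not part of the SPSS procedure. The six records are made-up illustrative values, not the contents of ExamAnx.sav, and the function names `group_by_marker` and `draw_scatterplot` and the output file name are hypothetical. Plotting assumes the third-party matplotlib package is installed.

```python
from collections import defaultdict

# Made-up illustrative records (NOT the values in ExamAnx.sav):
# (anxiety score out of 100, exam mark in percent, gender)
records = [
    (86, 40, "male"),
    (88, 65, "female"),
    (70, 80, "male"),
    (61, 87, "female"),
    (89, 50, "male"),
    (62, 85, "female"),
]


def group_by_marker(rows):
    """Split (x, y, group) rows into {group: ([x, ...], [y, ...])}."""
    groups = defaultdict(lambda: ([], []))
    for x, y, group in rows:
        groups[group][0].append(x)
        groups[group][1].append(y)
    return dict(groups)


def draw_scatterplot(rows, filename="exam_anxiety_scatter.png"):
    """Plot exam mark (y axis) against anxiety (x axis), one marker per group."""
    # matplotlib is imported here so that the grouping step runs without it.
    import matplotlib
    matplotlib.use("Agg")  # write to a file instead of opening a window
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    grouped_rows = sorted(group_by_marker(rows).items())
    for (group, (xs, ys)), marker in zip(grouped_rows, "ox^s"):
        ax.scatter(xs, ys, marker=marker, label=group)  # "Set Markers by"
    ax.set_xlabel("Exam anxiety (EAQ score out of 100)")  # "X Axis"
    ax.set_ylabel("Exam performance (%)")                 # "Y Axis"
    ax.set_title("Exam performance against exam anxiety")
    ax.legend(title="Gender")
    fig.savefig(filename)
    plt.close(fig)


grouped = group_by_marker(records)
for group, (xs, ys) in sorted(grouped.items()):
    print(group, list(zip(xs, ys)))
```

Calling `draw_scatterplot(records)` would then write the graph to a PNG file. Keeping the grouping step separate from the drawing step mirrors the dialogue box, where the axis variables and the marker variable are chosen independently.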

The resulting scatterplot is shown in the figure. The scatterplot on your screen will display the male and female data in different colors, but I have replaced the markers with different symbols. The scatterplot shows that the majority of students suffered from high levels of anxiety (there are very few cases that had anxiety levels below 60). Also, there are no obvious outliers in that most points seem to fall within the vicinity of other points. There also seems to be some general trend in the data such that higher levels of anxiety are associated with lower exam scores and low levels of anxiety are almost always associated with high examination marks. The gender markers show that anxiety seems to affect males and females in the same way (because the · and o symbols are fairly evenly interspersed). Another noticeable trend in these data is that there were no cases having low anxiety and low exam performance; in fact, most of the data are clustered in the upper region of the anxiety scale.
Had there been any data points which obviously didn’t fit the general trend of the data then it would be necessary to try to establish if there was a good reason why these subjects responded so differently, and also to consider what to do with these outliers. Sometimes outliers are just errors of data entry (i.e., you mistyped a value) and so it is wise to double-check the data in the spreadsheet for that case. If an outlier can’t be explained by incorrect data entry, then it is important to try to establish whether there might be a third variable affecting this person’s score. For example, a student could be experiencing anxiety about something other than the exam and their score on the anxiety questionnaire might have picked up on this anxiety, but it may be only anxiety specific to the exam that interferes with performance. Hence, this subject’s unrelated anxiety did not affect their performance.
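The data-entry check mentioned above can be partly automated. As a rough sketch, and assuming only what the example states (anxiety is scored out of 100 and exam performance is a percentage), any value outside 0 to 100 must be a typing error. The function name `find_out_of_range` and the four cases below are hypothetical, and the values are made up rather than taken from ExamAnx.sav.

```python
def find_out_of_range(cases, low=0, high=100):
    """Return (case_id, variable, value) for every score outside [low, high]."""
    problems = []
    for case_id, scores in cases.items():
        for variable, value in scores.items():
            if not low <= value <= high:
                problems.append((case_id, variable, value))
    return problems


# Made-up illustrative cases; case 3 contains a deliberate typing error.
cases = {
    1: {"anxiety": 86, "exam": 40},
    2: {"anxiety": 88, "exam": 65},
    3: {"anxiety": 700, "exam": 80},  # impossible on a scale scored out of 100
    4: {"anxiety": 61, "exam": 87},
}

flagged = find_out_of_range(cases)
for case_id, variable, value in flagged:
    print(f"Case {case_id}: {variable} = {value} is outside 0-100")
```

A check like this only catches impossible values. A mistyped score that still falls between 0 and 100 has to be spotted on the scatterplot and then verified against the original records.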
If there is a good reason why a subject responds differently from everyone else then you can consider eliminating that subject from the analysis in the interest of building an accurate model. However, subjects’ data should not be eliminated because they don’t fit with your hypotheses; they should be eliminated only if there is a good explanation of why they behaved so oddly.
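To see why a single outlier matters so much, it can help to compute a correlation coefficient with and without one discrepant case. The sketch below uses the standard formula for Pearson’s correlation coefficient on made-up scores (not the values in ExamAnx.sav) and then adds one hypothetical student with very low anxiety and a very low exam mark. The function name `pearson_r` is an assumption made for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up illustrative scores: higher anxiety goes with lower exam marks.
anxiety = [60, 65, 70, 75, 80, 85, 90]
exam = [90, 84, 80, 73, 70, 66, 60]

r_without_outlier = pearson_r(anxiety, exam)

# One hypothetical extra case far from the trend: low anxiety AND low mark.
r_with_outlier = pearson_r(anxiety + [10], exam + [5])

print(f"r without the outlier: {r_without_outlier:.2f}")
print(f"r with the outlier:    {r_with_outlier:.2f}")
```

In this made-up example the strong negative relationship turns into a positive one once the single extra case is included, which is the kind of bias a scatterplot lets you spot before you run the analysis.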

On to ... Pearson’s Correlation Coefficient