Regression Analysis
An Introduction to Regression
In regression analysis we fit a predictive model to our
data: we use a model to predict values of the dependent variable (DV) from one
or more independent variables (IVs). Simple regression seeks to predict an
outcome from a single predictor whereas multiple regression seeks to predict an
outcome from several predictors. This is an incredibly useful tool because it
allows us to go a step beyond the data that we actually possess, predicting
outcomes for predictor values we have not observed. The model that
we fit to our data is a linear one and can be imagined by trying to summarize a
data set with a straight line. With any data set there are a number of lines
that could be used to summarize the general trend and so we need a way to decide
which of many possible lines to choose. For the sake of drawing accurate
conclusions we want to fit a model that best describes the data. The simplest
way to fit a line is to use your eye to gauge a line that looks as though it
summarizes the data well. However, the "eyeball" method is very subjective and
so offers no assurance that the model is the best one that could have been
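chosen. Instead, we use a mathematical technique called the method of least
squares to find the line that best describes the data collected: the line for
which the sum of the squared differences between the observed and predicted
values of the outcome is as small as possible.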
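The least-squares idea can be sketched in a few lines of Python. This is a minimal illustration, assuming a single predictor and a small, hypothetical data set; the helper names `least_squares_fit` and `sum_squared_residuals` are our own and do not come from any library. It uses the standard closed-form estimates for simple regression, b1 = sum((x - mean_x)*(y - mean_y)) / sum((x - mean_x)**2) and b0 = mean_y - b1*mean_x, and compares the fitted line with an "eyeballed" one.

```python
# Least-squares fit of a simple regression line, Y_i = b0 + b1*X_i + e_i.
# The data below are hypothetical, chosen only for illustration.

def least_squares_fit(x, y):
    """Return (b0, b1) minimising the sum of squared residuals."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sum of squared X deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variance; gradient is undefined")
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Sum of squared differences between observed and predicted Y."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares_fit(x, y)
print(f"least squares: b0 = {b0:.3f}, b1 = {b1:.3f}")

# A hypothetical "eyeballed" line (b0 = 0, b1 = 2) for comparison
ss_fit = sum_squared_residuals(x, y, b0, b1)
ss_eye = sum_squared_residuals(x, y, 0.0, 2.0)
print(f"SS (least squares) = {ss_fit:.4f}, SS (eyeball) = {ss_eye:.4f}")
```

For these data the eyeballed line looks almost as good, but the least-squares line still has the smaller sum of squared residuals; by construction, no other straight line can do better on that criterion.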
Some Important Information about Straight Lines
Any straight line can be drawn if you know: (1) the slope (or gradient) of the
line, and (2) the point at which the line crosses the vertical axis of the
graph (the intercept of the line). The equation of a straight line is given in
equation (1), in which $Y_i$ is the $i$th subject's score on the outcome
variable that we want to predict and $X_i$ is the $i$th subject's score on the
predictor variable. $b_1$ is the gradient and $b_0$ is the intercept of the
straight line fitted to the data. There is also a residual term,
$\varepsilon_i$, which represents the difference between the score predicted by
the line for subject $i$ and the score that subject $i$ actually obtained. The
equation is often conceptualized without this residual term (so, ignore it if
it's upsetting you); however, it is worth knowing that this term represents the
fact that our model will not fit the collected data perfectly.
$$Y_i = b_0 + b_1 X_i + \varepsilon_i \qquad (1)$$
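To make the residual term in equation (1) concrete, the sketch below takes an assumed intercept and gradient (hypothetical values, not estimated from anything) and splits each observed score into the part the line predicts and the residual left over. The data are likewise hypothetical.

```python
# Decompose each observed score into prediction + residual, following
# Y_i = b0 + b1*X_i + e_i. Coefficients and data are hypothetical.

b0, b1 = 0.5, 2.0          # assumed intercept and gradient
x = [1.0, 2.0, 3.0, 4.0]   # predictor scores X_i
y = [2.7, 4.2, 6.9, 8.1]   # observed outcome scores Y_i

predicted = [b0 + b1 * xi for xi in x]
# Residual e_i: observed score minus the score the line predicts
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]

for i, (xi, yi, yhat, e) in enumerate(zip(x, y, predicted, residuals), start=1):
    print(f"subject {i}: X={xi}, Y={yi}, predicted={yhat:.2f}, residual={e:+.2f}")

# Adding each residual back to its prediction recovers the observed score
assert all(abs((yhat + e) - yi) < 1e-12
           for yi, yhat, e in zip(y, predicted, residuals))
```

Dropping the residual term, as the equation is often written, amounts to keeping only the `predicted` values.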
A particular line has a specific intercept and
gradient. Figure 1 shows a set of lines that have the same intercept but
different gradients, and a set of lines that have the same gradient but
different intercepts. Figure 1 also illustrates another useful point: the
gradient of the line tells us something about the nature of the relationship
being described. A line with a positive gradient describes a positive
relationship, whereas a line with a negative gradient describes a negative
relationship. So, in the graph in Figure 1 in which the gradients differ but
the intercepts are the same, the thicker line describes a positive relationship
whereas the thinner line describes a negative relationship.
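The link between the sign of the gradient and the direction of the relationship can be checked numerically. This is a small sketch with hypothetical values (intercept 5, gradients of +1.5 and -1.5); `line` and `direction` are illustrative helper names, not part of any library. Lines with a shared gradient but different intercepts stay parallel, differing by a constant amount at every X.

```python
# Lines sharing an intercept but differing in gradient, and vice versa.
# The particular values are hypothetical; only the signs matter here.

def line(b0, b1):
    """Return a function computing b0 + b1*x for a straight line."""
    return lambda x: b0 + b1 * x


def direction(b1):
    """Describe the relationship implied by the sign of the gradient."""
    if b1 > 0:
        return "positive"
    if b1 < 0:
        return "negative"
    return "none (flat line)"


# Same intercept (b0 = 5), different gradients
for b1 in (1.5, -1.5):
    f = line(5.0, b1)
    print(f"b1={b1:+}: Y(0)={f(0)}, Y(10)={f(10)} -> {direction(b1)} relationship")

# Same gradient (b1 = 1.5), different intercepts: parallel lines
for b0 in (0.0, 5.0):
    f = line(b0, 1.5)
    print(f"b0={b0}: Y(0)={f(0)}, Y(10)={f(10)}")
```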
Because a line can be described completely by its gradient and intercept, the model that we fit to our data in linear regression (a straight line) can also be described mathematically by equation (1). With regression we strive to find the line that best describes the data collected and then estimate the gradient and intercept of that line. Having estimated these values, we can insert different values of our predictor variable into the model to estimate the value of the outcome variable.
