| Factor Analysis - Example & Additional Considerations |
It has already been noted that factor analysis is primarily used for purposes of grouping variables. The procedure has three possible purposes in this regard:
| ! One purpose is to show the structure underlying a large number of variables. This provides insight into the data. For example, a factor analysis of V6 to V16 in the Sportdat data will show whether there are eleven different dimensions of store choice or a smaller number of underlying dimensions. It will also illuminate the nature of individual variables by showing how they fit into the bigger picture.
! A second purpose is to simplify discussion of the data. For example, instead of talking to managers about eleven different store choice variables, it might be better to focus on four broad factors.
! A third purpose is to create new, combined variables for use in other analysis procedures. For example, multicollinearity (correlation among the independent variables) can cause problems in analysis procedures such as multiple regression and discriminant analysis. One way of reducing multicollinearity is to combine related variables into broader scales.
There are two ways of combining related variables into broader scales. One method is to save factor scores from the factor analysis, then use the factors as variables in subsequent analyses. Another method is simply to add together the variables that are found to be related (i.e., SCALE = V1 + V2 +. . . + Vk). This latter approach often produces more reliable results because factor scores may be unstable. |
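The summated-scale approach can be sketched in a few lines of Python with NumPy. The ratings below are invented for illustration and are not taken from Sportdat; the assumption is simply that a factor analysis has already shown these three variables to be related.

```python
import numpy as np

# Hypothetical responses: 5 respondents x 3 related variables (1-5 ratings).
# These numbers are made up for illustration only.
responses = np.array([
    [4, 5, 4],
    [2, 1, 2],
    [3, 3, 4],
    [5, 5, 5],
    [1, 2, 1],
])

# Summated scale: SCALE = V1 + V2 + ... + Vk, computed for each respondent.
scale = responses.sum(axis=1)

# Averaging instead of summing keeps the scale on the original 1-5 range.
scale_mean = responses.mean(axis=1)

print(scale)
```

Either version of the scale can then be used as a single variable in a later regression or discriminant analysis.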
In addition to being used to group variables, factor analysis can be used to group respondents who share similar response patterns across some set of variables. A factor analysis used for this purpose is called a Q-type factor analysis. The usual goal of Q-type factor analysis is the identification of market segments.
Q-type factor analysis does not differ from regular factor analysis in its basic method or interpretation. However, since Q-type factor analysis requires that the data be analyzed across rows of the data set (the observations) instead of across columns (the variables), it constitutes a special option that is not available in all software packages.
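The difference between the two orientations can be seen in the correlation matrix that each analysis starts from. The sketch below uses randomly generated numbers in place of real survey data, and the array names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in data set: 6 respondents (rows) x 11 variables (columns).
data = rng.normal(size=(6, 11))

# Regular factor analysis starts from correlations among variables (columns).
r_type_corr = np.corrcoef(data, rowvar=False)     # shape (11, 11)

# Q-type factor analysis starts from correlations among respondents (rows),
# which is the same as correlating the columns of the transposed data.
q_type_corr = np.corrcoef(data.T, rowvar=False)   # shape (6, 6)

print(r_type_corr.shape, q_type_corr.shape)
```

In other words, a package that lacks a Q-type option can often be given the transposed data set instead.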
| An Example... |
What follows are the results of a factor analysis applied to V6 to V16 in the Sportdat data. As with the analyses we've seen elsewhere, this output was generated with SPSS for Windows. The overview of the dataset can be found here.

| The Commands |
From the Statistics menu, choose Data Reduction and then Factor. Move variables V6 through V16 to "Variables:". Then click the Rotation... button.

Next, indicate that you want to use "Varimax". Then click Continue.

Finally, under Extraction, choose Principal components (as your Method), Analyze Correlation matrix, and check under Display both Unrotated factor solution and Scree plot. Then click Continue and OK.

| The Results |
First, the results are obtained from a principal components analysis, which is the most common type of factor analysis. In principal components analysis (PCA), the goal is to develop factors that explain the maximum amount of the total variance in the variables being analyzed.
Next comes a listing of factors, eigenvalues for each factor, and variance explained. There are eleven factors because there were eleven variables in the analysis, and a factor analysis will generate as many factors as variables (or as many factors as the number of observations, if that is smaller than the number of variables). The eigenvalue associated with the first factor is 2.16949 (hint: click on the cell in the output to show more places behind the decimal), the eigenvalue associated with the second factor is 1.52651, and so on.


The sum of the eigenvalues, which is not shown in the output but can easily be calculated, is 11.00 (the sum of the initial communalities). This represents the total amount of variance to be explained in the eleven variables used in the analysis. The first factor accounts for 2.16949/11.00 = .197 or 19.7% of that total variance. The second factor accounts for 1.52651/11.00 = .139 or 13.9% of the variance. Together, the first two factors account for a cumulative 19.7% + 13.9% = 33.6% of the variance.
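The same arithmetic can be checked with a short Python snippet. Only the two eigenvalues quoted above are used; the variable names are arbitrary.

```python
# Eigenvalues of the first two factors, as quoted in the text.
eigenvalues = [2.16949, 1.52651]
n_variables = 11  # total variance equals the number of standardized variables

proportions = [ev / n_variables for ev in eigenvalues]
cumulative = sum(proportions)

for i, p in enumerate(proportions, start=1):
    print(f"Factor {i}: {p:.1%} of total variance")
print(f"Cumulative: {cumulative:.1%}")
# Factor 1: 19.7% of total variance
# Factor 2: 13.9% of total variance
# Cumulative: 33.6%
```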
Four factors have eigenvalues larger than 1.0. According to a default decision rule built into the software package, only these four factors (highlighted in yellow in the output) are retained for further analysis and display. This default could have been overridden if desired.
Collectively, the four retained factors account for only 55.1% of the total variance in the eleven variables. This figure is somewhat lower than desired. It indicates that the variables contain substantial information that is not captured by the four factors.
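For readers working outside SPSS, the eigenvalue-greater-than-1 rule can be sketched with NumPy. The data here are synthetic (two blocks of three related variables), not Sportdat, so the numbers will not match the output discussed above.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic data: 200 respondents x 6 variables, built so that
# columns 0-2 share one underlying dimension and columns 3-5 another.
base = rng.normal(size=(200, 2))
data = np.repeat(base, 3, axis=1) + rng.normal(scale=0.8, size=(200, 6))

# Principal components of the correlation matrix.
corr = np.corrcoef(data, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # sorted largest first

retained = eigenvalues > 1.0                   # default retention rule
pct_explained = eigenvalues[retained].sum() / eigenvalues.sum()

print("Eigenvalues:", np.round(eigenvalues, 3))
print("Factors retained:", int(retained.sum()))
print(f"Variance explained by retained factors: {pct_explained:.1%}")
```

The eigenvalues always sum to the number of variables, which is why the share of variance explained is found by dividing by that number.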
The scree plot (below) tells us much the same thing.

Next comes the factor matrix, which is the matrix of loadings for the eleven variables on the four retained factors. These loadings show that all of the variables load moderately well on the first factor, with loadings ranging from .29 for V13 to .58 for V7. The second factor has strong negative loadings for V9 and V10 (importance of taking third-party credit cards and having a store credit card) and strong positive loadings for V11 and V12 (importance of merchandise quality and merchandise selection). The third factor has strong negative loadings for V15 and V16 (importance of helpful people and speedy service), and the fourth factor has strong negative loadings for V7 and V8 (importance of everyday prices and sale prices).

Overall, the results shown in this initial factor matrix are not easily interpretable. The first factor is undifferentiated, indicating that all of these variables move together to some extent. The second factor is a mixture of disparate elements. The third and fourth factors are cleaner, though negative. This type of result (somewhat messy and not easy to interpret) is often obtained from the first stage of a factor analysis.
The purple area above (i.e., the "Extraction" column) shows the final communalities for the eleven variables. These communalities indicate the percentage of variance in each variable that is explained by the four retained factors. They are calculated as the sums of the squared loadings for each variable across the four factors.
The communality for V6 (importance of location) is .26071, which indicates that the four retained factors account for only 26% of the variance in this variable. The communality for V14 (importance of having merchandise in stock) is even lower, at .15829. The low communalities mean that these variables are not well captured by the four retained factors. If we wish to retain these variables for purposes of interpretation or analysis, then we will need to retain them as individual variables in addition to the factors obtained from the factor analysis.
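The communality calculation is simple enough to sketch in Python. The loadings below are hypothetical values for three variables on two retained factors, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical loadings: rows are variables, columns are retained factors.
loadings = np.array([
    [0.80,  0.10],
    [0.70, -0.20],
    [0.30,  0.40],
])

# Communality of each variable: sum of its squared loadings across factors.
communalities = (loadings ** 2).sum(axis=1)

# Variance accounted for by each factor: sum of squared loadings down the column.
factor_variance = (loadings ** 2).sum(axis=0)

print(np.round(communalities, 2))    # [0.65 0.53 0.25]
print(np.round(factor_variance, 2))  # [1.22 0.21]
```

In this made-up example the third variable, with a communality of .25, would be in the same position as V6 above: poorly captured by the retained factors.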
The next section shows a "rotated" factor matrix. The concept of factor rotation is explained here.
The rotated factors are easier to interpret than the initial factors. The first factor is clearly a price factor, with very high loadings for V7 and V8 (importance of everyday prices and sale prices) and low loadings for all other variables. The second factor is clearly a merchandise factor, with high loadings for V11, V12, and V13 (importance of merchandise quality, merchandise selection, and brand names). The third factor is clearly a credit factor, with high loadings for V9 and V10 (importance of taking credit cards and having a store card). The fourth factor is a service factor, with high loadings for V15 and V16 (importance of helpful people and speedy service). The names given to these factors are a judgment call by the analyst.
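For those curious about what a Varimax rotation does numerically, here is a minimal sketch of one commonly used SVD-based iteration for the varimax criterion. It is not claimed to be the routine SPSS uses (for example, it applies no Kaiser normalization), and the unrotated loadings are hypothetical.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Return (rotated_loadings, rotation_matrix) for an orthogonal varimax rotation."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Target matrix derived from the varimax criterion.
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                  # nearest orthogonal matrix
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break                          # no meaningful improvement
        criterion = new_criterion
    return loadings @ rotation, rotation

# Hypothetical unrotated loadings: every variable loads on both factors.
unrotated = np.array([
    [0.6,  0.6],
    [0.7,  0.5],
    [0.6, -0.6],
    [0.5, -0.7],
])

rotated, T = varimax(unrotated)
print(np.round(rotated, 2))
```

Because the rotation matrix is orthogonal, each variable's communality is the same before and after rotation; only the way the variance is divided among the factors changes.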

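Given this rotated factor matrix, the eleven original variables can be grouped as follows:
| ! Price: V7 (everyday prices) and V8 (sale prices)
! Merchandise: V11 (merchandise quality), V12 (merchandise selection), and V13 (brand names)
! Credit: V9 (taking credit cards) and V10 (having a store card)
! Service: V15 (helpful people) and V16 (speedy service)
! Location: V6, kept as a single variable
! In-stock merchandise: V14, kept as a single variable |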
Keep in mind that this list does not reflect the importance of these factors in determining store choice for any given store. The list simply represents a grouping of the original eleven variables.
| Things to Think About When You Do Factor Analysis |
Issues that arise in doing or interpreting factor analyses include the following:
| ! All of the variables in a factor analysis should be quantitative in nature, so that correlations are meaningful. Strictly speaking, this means that the variables should have interval or ratio scaling. However, as a practical matter, ordinal variables (such as V6 to V16) and dummy (0-1) variables also produce satisfactory results.
! Factor analysis relies on linear relationships among variables. If some variables are expected to have nonlinear relationships, and you want the analysis to reflect these relationships, then some type of data transformation may be appropriate. In practice, this is seldom done.
! A factor analysis can be either a principal components analysis or a common factor analysis, as discussed earlier in this chapter. A principal components analysis will explain a lower percentage of the available variance because it has been asked to explain unique as well as common variance, but the two methods usually produce similar interpretation patterns in other regards. It is best not to worry about the distinction between these methods. It is best just to run principal components analysis when doing factor analysis; this is the method used in most market research applications of factor analysis.
! Since there is no formal significance test to indicate whether the overall results of a factor analysis are meaningful, it is necessary to make a judgment in this regard. One way of making this judgment is by using a rule of thumb that a meaningful factor analysis should account for at least 70% of the total variance in participating variables. However, this is not a hard-and-fast rule. For example, the factor analysis of V6 to V16 seems meaningful even though the four "significant" factors explain only 55% of the variance.
! Judgment is also necessary to determine the number of factors that should be retained and/or interpreted. A good rule of thumb is to retain only as many factors as have eigenvalues larger than 1.0. However, this approach may fail to account for meaningful items. For example, in the interpretation of the V6 to V16 results, two single-variable factors (for V6 and V14) were added to account for variables not captured in the four "significant" multivariate factors.
! Judgment is also needed to label or interpret factors. As stated earlier, the usual rule of thumb is to ignore variables with loadings less than .50 (in absolute value) and to name the factor based on the variables with high loadings. However, this rule is open to judgment.
! If the purpose of the factor analysis is to simplify the data by reducing the number of variables, there are various ways of using the results. Two options were discussed earlier in this chapter. One option is to use the factors themselves as the new variables. Another option is to use the factor analysis simply to identify variables that should be combined and to combine them with simple summated scales. A third option is to choose one representative variable from each group and simply ignore other variables. In general, summated scales tend to work the best.
! The final issue concerns the difference between exploratory factor analysis and confirmatory factor analysis. The focus in this discussion has been on "exploratory" analysis, in which the analyst makes no prior specification of groups and the computer forms the groups according to purely mathematical criteria. In the alternative, "confirmatory" approach, the analyst specifies variable groupings and tests whether this grouping scheme seems to provide adequate fit to the data. Confirmatory factor analysis can be done as a subset of structural equation modeling. In general, confirmatory factor analysis is more appropriate when the purpose of the analysis is to test a theory of multivariate relationships. If the purpose of the analysis is simply to describe or simplify the data, exploratory analysis is appropriate. |
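The .50 loading rule of thumb mentioned in the list above can be applied mechanically, as in the sketch below. The variable names and rotated loadings are hypothetical, and the final labels would still be a judgment call by the analyst.

```python
import numpy as np

# Hypothetical rotated loadings: rows are variables, columns are factors.
variables = ["A", "B", "C", "D", "E"]
loadings = np.array([
    [ 0.85,  0.10],
    [ 0.78, -0.05],
    [ 0.12,  0.81],
    [-0.08,  0.74],
    [ 0.30,  0.25],   # loads weakly on every factor
])

threshold = 0.50
groups = {f"Factor {j + 1}": [] for j in range(loadings.shape[1])}
unassigned = []

for name, row in zip(variables, loadings):
    j = int(np.argmax(np.abs(row)))        # factor with the largest |loading|
    if abs(row[j]) >= threshold:
        groups[f"Factor {j + 1}"].append(name)
    else:
        unassigned.append(name)            # candidate single-variable factor

print(groups)       # {'Factor 1': ['A', 'B'], 'Factor 2': ['C', 'D']}
print(unassigned)   # ['E']
```

Variables that end up unassigned play the same role as V6 and V14 in the example: they can be carried forward as individual variables alongside the factors.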
| Oh!, and One More Thing... |
It is almost always useful to rotate the initial results obtained from a factor analysis. The decision between orthogonal and oblique rotations depends on the objectives of the analysis. If the purpose of a factor analysis is to remove multicollinearity from a set of variables and produce uncorrelated factors that can be used in subsequent analyses, orthogonal rotations are the way to go. If the purpose of a factor analysis is to get the sharpest possible definition of factors, oblique rotations are appropriate (though orthogonal rotations may also produce satisfactory results). In SPSS for Windows, the "Varimax" option produces an orthogonal rotation and the "Direct oblimin" option produces an oblique rotation.
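The claim that orthogonal rotations leave the factors uncorrelated can be checked numerically. The sketch below uses synthetic, strongly correlated predictors and a plain 30-degree rotation as a stand-in for whatever orthogonal rotation a package would choose; none of it comes from the Sportdat output.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic predictors that share a common component (strong multicollinearity).
common = rng.normal(size=(300, 1))
x = common + 0.5 * rng.normal(size=(300, 4))

z = (x - x.mean(axis=0)) / x.std(axis=0)          # standardize each variable
corr = np.corrcoef(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)

# Standardized principal component scores for the two largest components.
top = np.argsort(eigvals)[::-1][:2]
scores = z @ eigvecs[:, top] / np.sqrt(eigvals[top])

# An orthogonal rotation of the standardized scores (30 degrees here).
theta = np.deg2rad(30)
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rotated_scores = scores @ T

print(np.round(np.corrcoef(x, rowvar=False), 2))               # original variables
print(np.round(np.corrcoef(rotated_scores, rotated_scores.T.shape and False or False, rowvar=False) if False else np.corrcoef(rotated_scores, rowvar=False), 2))  # rotated scores
```

The rotated scores have a correlation matrix equal to the identity (up to rounding), which is what makes them safe to use as independent variables in a later regression. An oblique rotation gives up this property in exchange for sharper factor definitions.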
Where applicable, © 2005, David M. Compton
Last modified: March 26, 2005