Factor Analysis

The following is an example of factor analysis taken from Biber (1993).

Biber wanted to identify word senses and uses from a large number of concordance lines extracted from corpora. He found, however, that collocational techniques such as mutual information could not relate the different collocations of a word to one another, and so could not separate its different senses. For example, mutual information might identify riding, cowboy, disk and PC as significant collocates of boot, but it could not tell us that riding and cowboy belong to a group representing footwear, while disk and PC belong to a group representing computers. To overcome this problem, Biber suggests the use of factor analysis. Here we'll consider one of his examples: that of the word right.

Biber counted the frequencies of all left-hand and right-hand collocates of right in the Longman-Lancaster corpus and selected those which occurred more than 30 times as the most important collocates. He then counted how often each of these words collocated with right in those texts in his corpus that were longer than 20,000 words. From these counts he constructed a cross-tabulation, from which he computed an intercorrelation matrix. This matrix was then factor analysed.
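The step from cross-tabulation to intercorrelation matrix can be sketched in a few lines of Python. This is only an illustration, not Biber's procedure: it assumes the table has one row per text and one column per collocation, it assumes Pearson correlation, and the counts and the helper names (pearson, correlation_matrix) are invented for the example.

```python
from math import sqrt

# Hypothetical toy counts (NOT Biber's data): one row per text,
# one column per collocation of "right".
collocations = ["right hand", "right side", "right now", "right away"]
counts = [
    [12, 9, 1, 0],
    [10, 11, 0, 2],
    [1, 0, 8, 7],
    [0, 2, 9, 10],
    [5, 4, 4, 5],
]

def pearson(x, y):
    """Pearson correlation between two equal-length sequences of counts."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(table):
    """Correlate every column of the table with every other column."""
    columns = list(zip(*table))  # transpose: one tuple per collocation
    return [[pearson(ci, cj) for cj in columns] for ci in columns]

corr = correlation_matrix(counts)
for name, row in zip(collocations, corr):
    print(f"{name:<11}", " ".join(f"{r:6.2f}" for r in row))
```

In this toy table, collocations that tend to occur in the same texts correlate positively and those that occur in different texts correlate negatively. It is this pattern of shared variation that the factor analysis then summarises.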

The factor analysis suggested that four factors best accounted for the data in the original table. Each item (collocation) received a loading on each factor, which signified its contribution to that factor. By looking down the list of loadings, it was possible to see which items received the highest loadings on each factor and hence which were most characteristic of that factor. Biber was able to see that each factor appeared to represent a different usage of the word right.

Factor 1 gave high loadings to collocations such as right hemisphere, right sided, right hander and so on. This factor thus appeared to identify the locational sense of right.
Factor 2 gave high loadings to collocations such as right now, right away and right here, thus identifying the sense of immediately or exactly.
Factor 3 had high loadings for collocations such as that's right, you're right and not right, indicating the sense of correct.
Finally, Factor 4 appeared to mark a somewhat less clearly defined stylistic usage of right at the end of a clause.
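Reading a table of loadings in this way can also be sketched in Python. The loadings below are invented purely for illustration and are not Biber's figures. The function name group_by_factor and the cut-off of 0.35 are likewise assumptions made for this example. The sketch assigns each collocation to the factor on which its absolute loading is highest and ignores loadings below the cut-off.

```python
# Hypothetical loadings (invented for illustration, NOT Biber's figures).
# Each list holds one collocation's loadings on Factors 1 to 4.
loadings = {
    "right hemisphere": [0.81, 0.05, -0.02, 0.10],
    "right sided":      [0.77, 0.02, 0.04, 0.01],
    "right now":        [0.03, 0.72, 0.11, 0.08],
    "right away":       [-0.04, 0.68, 0.06, 0.12],
    "that's right":     [0.01, 0.09, 0.74, 0.15],
    "you're right":     [0.06, 0.12, 0.70, 0.05],
}

def group_by_factor(loadings, threshold=0.35):
    """Group items under the factor on which they load most strongly.

    The threshold is an assumed cut-off: items whose strongest loading
    falls below it are left out of every group.
    """
    groups = {}
    for item, values in loadings.items():
        best = max(range(len(values)), key=lambda i: abs(values[i]))
        if abs(values[best]) >= threshold:
            groups.setdefault(best + 1, []).append((item, values[best]))
    # Within each factor, list the most characteristic items first.
    for members in groups.values():
        members.sort(key=lambda m: abs(m[1]), reverse=True)
    return groups

groups = group_by_factor(loadings)
for factor in sorted(groups):
    items = ", ".join(f"{name} ({value:.2f})" for name, value in groups[factor])
    print(f"Factor {factor}: {items}")
```

With these made-up values no collocation reaches the cut-off on Factor 4, so that factor has no group. In a real analysis, interpreting each group as a sense or usage remains a judgement made by the analyst.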

Hopefully, this example shows how factor analysis can take a large number of variables (such as the different collocations in the example) and reduce them to a much smaller number of reference factors, with loadings indicating the degree of association of each variable with each factor.