Lancaster University Department of Linguistics and Modern English Language

Comparing frequencies for corpora of different sizes


We cannot directly compare the raw frequencies from the previous exercise, because the sections of the corpora differ in size.

A common solution to this problem is to convert each frequency into a value per million words, or per thousand words. This is called normalizing the frequency scores.

$$\text{Frequency per million words} = \frac{\text{frequency}}{\text{no. of words in text}} \times 1{,}000{,}000$$
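The formula can be checked with a short Python sketch. The function name `normalize` and its `base` parameter are illustrative choices of our own, not part of WordSmith or any corpus tool. Passing `base=1000` gives the per-thousand variant. The counts in the example are hypothetical and are not taken from the exercise.

```python
def normalize(frequency: int, text_words: int, base: int = 1_000_000) -> float:
    """Return a frequency normalized to `base` words (per million by default)."""
    if text_words <= 0:
        raise ValueError("text_words must be a positive word count")
    # Multiply before dividing so that whole-number results stay exact.
    return frequency * base / text_words


# Hypothetical figures: a word occurring 25 times in a 50,000-word section.
print(normalize(25, 50_000))             # per million words: 500.0
print(normalize(25, 50_000, base=1000))  # per thousand words: 0.5
```

Per-million and per-thousand figures differ only by a factor of 1,000, so either scale allows a fair comparison as long as every section uses the same one.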

Now try filling in the "per million" column of the table, and think about the patterns.
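As a sketch of what filling in the column involves, the Python below adds a per-million value to each row of a small table. The section names and counts are hypothetical placeholders, not the figures from the exercise table, and `fill_per_million` is an illustrative helper of our own.

```python
def per_million(frequency: int, text_words: int) -> float:
    """Frequency per million words = frequency / words in text x 1,000,000."""
    return frequency * 1_000_000 / text_words


def fill_per_million(rows):
    """Append a per-million value to each (name, frequency, words) row."""
    return [(name, freq, words, per_million(freq, words))
            for name, freq, words in rows]


# Hypothetical sections of different sizes (not data from the exercise).
rows = [("Section A", 120, 400_000), ("Section B", 90, 150_000)]
for name, freq, words, pm in fill_per_million(rows):
    print(f"{name}: raw = {freq}, per million = {pm:.1f}")
```

In this made-up case Section B has the lower raw count (90 against 120) but twice the normalized rate (600.0 against 300.0 per million), which is exactly the kind of pattern that raw frequencies hide.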

Use the computer's calculator if you don't have your own pocket calculator:

[ Start - Programs - Accessories - Calculator ]