25 September 2017 12:50

Language experts at Lancaster University and Cambridge University Press have today published the largest ever public collection of transcribed British conversations, totalling 11.5 million words of spontaneous British English collected between 2012 and 2016.

The study has revealed that use of the word ‘like’ at the beginning of sentences has risen substantially in the last few decades, from 160 per million sentences in the 1990s to 625 per million in the 2010s.

Use of the split infinitive, as in the infamous Star Trek line ‘To boldly go’, has almost tripled over the last three decades. This grammatical construction sees the word ‘to’ and the verb broken up by an intervening word, usually an adverb.

Linguists working on the project found the split infinitive had risen from a mere 44 instances per million words in the early 1990s to a staggering 117 per million in the 2010s, with common examples including ‘to just go’, ‘to actually get’ and ‘to really want’.
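For readers who want to check the size of these increases, the short Python sketch below computes the ratio between the earlier and later rates quoted above. The function name `fold_change` is purely illustrative and is not part of any corpus tool. The sketch assumes only that each pair of figures is measured against the same base in both periods.

```python
def fold_change(before: float, after: float) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before


# Rates quoted in the article (occurrences per million in each period).
split_infinitive = fold_change(44, 117)        # early 1990s vs 2010s
sentence_initial_like = fold_change(160, 625)  # 1990s vs 2010s

print(f"split infinitive: {split_infinitive:.2f}x")
print(f"sentence-initial 'like': {sentence_initial_like:.2f}x")
```

On these figures the split infinitive comes out at roughly 2.7 times its earlier rate, consistent with ‘almost tripled’, while sentence-initial ‘like’ is roughly 3.9 times more frequent.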

The split infinitive and the word ‘like’ at the beginning of sentences are just two examples of language that is becoming a normal part of speech. Researchers on the Spoken British National Corpus 2014 project have previously identified that words such as ‘marvellous’ and ‘marmalade’ were out of fashion and that newcomers such as ‘awesome’ and ‘massively’ were bang on trend.

The recordings used for the project were made between 2012 and 2016. They were gathered by members of the British public, who used their smartphones to record everyday conversations with their families and friends. These included a newlywed couple reminiscing about their recent honeymoon, students drinking in their halls, a father and daughter chatting in the car and grandparents visiting family for the day.

In a landmark moment for social science, the anonymised transcripts of these recordings have today (Monday, 25 September) been released, free of charge, to the public. This is the largest collection, or ‘corpus’, of British English conversations ever made freely available.

The creators, including Lancaster University’s Professor Tony McEnery and Cambridge University Press’s Dr Claire Dembry, intend for the transcripts to be used by linguists and language educators around the world.

These conversations will help linguists to understand what influences language change over short periods of time, as well as how best to teach learners of English.

Professor McEnery, who set up the research project, said: “The launch of the Spoken British National Corpus 2014 is an important moment for the study of spoken English. Never before has it been possible to compare millions of words of spoken English across decades in this way. This will help linguists to understand better the changing nature of English speech and help a new generation of learners of English in the modern world.”

Principal Research Manager at Cambridge University Press, Dr Dembry, highlighted the importance of keeping up with language change. She said: “Learners of English deserve to be taught in a way which is informed by the most up-to-date research into how the language is used in the real world.

“The rise of the split infinitive is just one example of language phenomena which some commentators might not like, but which are becoming a normal part of everyday speech. Language teaching should reflect these changes, which can only be observed in a corpus such as this.”

The corpus will also make it possible to compare how different social groups talk, including men vs. women, young vs. old and north vs. south, as well as to study how the British public discusses topics including politics, religion, immigration and the economy.

The Spoken British National Corpus 2014 was gathered by the ESRC-funded Centre for Corpus Approaches to Social Science (CASS) at Lancaster University and Cambridge University Press.

To access the corpus, please visit www.corpora.lancs.ac.uk/bnc2014.