19 January 2016 17:27

Software developed at Lancaster shows that many popular new words have passed through social media sites long before going mainstream.

Computer scientists Dr Matthew Rowe and Daniel Kershaw have developed software which allows them to track the source and popularity of new words.

The researchers were able to analyse the data at a level that has not been feasible before, looking at which communities were using certain words, when they were first used and in what context – for example, new words emerging in individual subreddits and regions in the UK on Twitter.

The team analysed two datasets: around 7 million words' worth of Twitter posts and around 15 million words' worth of Reddit posts.

They found that:

  • Binge-watch, the Collins Dictionary word of the year in 2015, was first used in the r/netflix, r/doctorwho and r/houseofcards subreddits in early 2013 and has since grown rapidly in popularity across more general subreddits;
  • Manspreading – when a man sits with his legs wide apart on public transport, encroaching on other seats – has its origins in 2015, when there was a spike of usage on Reddit in February, followed by a general decline;
  • Bootyful (beautiful) is one of the top 5 new words on Twitter in South Wales, while cyw (coming your way) is popular in North Wales.

Nationally, on Reddit, the words lamo (someone possessing the quality of ‘lameness’) and bruh (a variation of the slang term bro) have rapidly increased in popularity, while on Twitter, fleek (a word invented by a teenager in Chicago to describe her perfect eyebrows) has also seen a dramatic increase in use in the UK.

Dr Matthew Rowe explained: “Using large-scale social media data, of approximately 184 million Reddit and Twitter posts, we were able to track the growth of words longitudinally and identify key high growth ‘innovative’ terms.

“This provides valuable insights to researchers studying language who wish to know where such terms originate from and how language evolves, and to digital marketing agencies keen to understand what terms will become popular in the future.

“We are now building on this work to model the diffusion of language and to understand the conditions under which people choose to adopt a new term and then begin using it.”

Reddit is an online news and entertainment forum made up of ‘subreddits’ – subject-specific mini communities – where users can post content (text or direct links), as well as comment on and rate articles.

The researchers created a computer program, based on the same methods lexicographers use to identify the acceptance of new words, which allowed them to study huge datasets from Twitter and Reddit.
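
The article does not describe the software's internals, so the following is only a minimal sketch of the general idea of tracking where a word first appears and how its use grows over time. The sample posts, the `track_word` function and the tokenising pattern are all assumptions made for illustration; none of them comes from the Lancaster software or its data.

```python
from collections import Counter
from datetime import date
import re

# Invented sample posts: (community, posting date, text).
# These are illustrative only and are NOT data from the study.
POSTS = [
    ("r/netflix", date(2013, 1, 14), "Going to binge-watch the whole season tonight"),
    ("r/doctorwho", date(2013, 2, 3), "Anyone else binge-watch series five?"),
    ("r/television", date(2013, 6, 21), "I binge-watch everything now, binge-watch is life"),
]

# Assumed tokeniser: lowercase words, keeping internal hyphens ("binge-watch").
TOKEN = re.compile(r"[a-z]+(?:-[a-z]+)*")


def track_word(posts, word):
    """Return (first_seen, monthly_counts) for `word`.

    first_seen maps each community to the earliest date the word appeared there;
    monthly_counts maps (year, month) to the number of occurrences in that month.
    """
    first_seen = {}
    monthly_counts = Counter()
    for community, day, text in posts:
        hits = TOKEN.findall(text.lower()).count(word)
        if hits == 0:
            continue
        # Keep the earliest sighting per community.
        if community not in first_seen or day < first_seen[community]:
            first_seen[community] = day
        monthly_counts[(day.year, day.month)] += hits
    return first_seen, dict(monthly_counts)


first_seen, monthly = track_word(POSTS, "binge-watch")
earliest_community = min(first_seen, key=first_seen.get)
```

Here `first_seen` shows which community used the word earliest, and `monthly` gives a simple month-by-month usage series. A real system would also need to normalise counts by the total volume of posts in each period before calling a word "high growth".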