4 February 2014 09:17

As Facebook reaches its tenth birthday today, it is a good time to reflect on how this social networking site rose to world domination, with 1.3 billion active users as of this year.

I like to think that I have been there from the start, or rather close to it: I began using Facebook when it was restricted to those with university email addresses. At the time (circa 2005) the market for social networking sites was beginning to grow; sites such as MySpace and Bebo (remember them?) were becoming popular, and photo-sharing on Flickr and video-blogging on YouTube were the de facto resources for content sharing. Facebook became the dominant force in social networking because it offered two things that other sites did not: (i) an enhanced user experience that left users feeling as though they were using an application rather than a web site; and (ii) an online identity for users, which could be used around the Web – this is now used as a ‘sign-in’ mechanism for many web sites.

The provision of online identity-building functionality formed a core piece of my Ph.D. research (undertaken at the University of Sheffield). I was interested in how personal information was being spread around the Web and the damage that this could cause – e.g. identity theft and lateral surveillance. I therefore explored machine learning techniques that could automatically tell whether or not a web page contained a user’s personal information – I termed such techniques ‘disambiguation techniques’. One problem with these approaches, however, was that they needed to be told what to look for. I realised that data mined from social web platforms, including Facebook, could be used to train the disambiguation techniques. This resulted in an effective, highly accurate and scalable approach to detecting pages that contained personal information. What this showed was that users were sharing their personal, and thus identifiable, information on Facebook, and that this information could then be used to detect sites elsewhere that contained it. In essence, users were constructing reflections of their true (offline) identities in an online environment.
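
To make the idea concrete, here is a minimal sketch in Python of how seed data from a social profile might tell a detector ‘what to look for’. It is not the technique from my research, only a toy stand-in that scores a page by how many seed terms it mentions; the profile fields, the function names and the 0.5 threshold are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lower-case a string and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def seed_terms(profile):
    """Collect tokens from every field of a (hypothetical) social profile."""
    terms = set()
    for value in profile.values():
        # A field holds either a single string or a list of strings.
        text = " ".join(value) if isinstance(value, list) else value
        terms |= tokenize(text)
    return terms


def mention_score(page_text, seeds):
    """Fraction of the seed terms that also appear in the page text."""
    if not seeds:
        return 0.0
    return len(seeds & tokenize(page_text)) / len(seeds)


def refers_to_person(page_text, seeds, threshold=0.5):
    """Toy decision rule: the threshold is an arbitrary assumption."""
    return mention_score(page_text, seeds) >= threshold


# Invented example data: a profile and two candidate web pages.
profile = {
    "name": "Alice Example",
    "employer": "Lancaster Widgets",
    "friends": ["Bob Sample", "Carol Test"],
}
seeds = seed_terms(profile)
page_a = "Alice Example of Lancaster Widgets spoke alongside Bob Sample."
page_b = "Alice Example wins the regional chess title."

print(refers_to_person(page_a, seeds))  # many seed terms co-occur
print(refers_to_person(page_b, seeds))  # only the name matches
```

A page that shares only a name with the profile scores low, while a page that also mentions the same employer and friends scores high; that extra context is what the social web data supplies.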

Since my Ph.D. work, Facebook has continually strived to update and improve its platform. The addition of ‘like’ buttons allowed web site owners to embed buttons on their pages, which users could then click to tell their friends that they had an interest in a given page. This page could be about a movie, a music artist or, in a bizarre twist, a health condition (not joking!). Web site owners would specify within the web page some basic machine-readable tags that would feed information back to Facebook about what the user had ‘liked’; Facebook could then ‘understand’ what the user had liked and update the user’s profile with that information. The harvesting of web content through ‘like’ buttons has been a major source of revenue for Facebook: in essence, every time you like something you are enhancing the profile that you have on the site. This in turn makes you easier for advertisers to market to, as they know your interests – both from those you define on Facebook and from the pages you browse on the Web.
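
As a rough illustration of what such machine-readable tags look like, the sketch below reads Open Graph style `<meta property="og:..." content="...">` tags out of a page using Python's standard `html.parser` module. The sample page, its URL and the function name `extract_liked_object` are invented for the example, and this shows only how a consumer of the tags might read them, not how Facebook actually processes them.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect og:* meta tags from a page into a dictionary."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        # Keep only Open Graph properties that carry a content value.
        if prop.startswith("og:") and "content" in attributes:
            self.properties[prop[3:]] = attributes["content"]


def extract_liked_object(html):
    """Return the og:* properties describing the thing a page is about."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


# Invented sample page carrying three tags.
sample_page = """
<html><head>
<meta property="og:title" content="Nineteen Eighty-Four" />
<meta property="og:type" content="book" />
<meta property="og:url" content="https://example.com/books/1984" />
</head><body>A page about a novel.</body></html>
"""

liked = extract_liked_object(sample_page)
print(liked["type"], "-", liked["title"])
```

With tags like these, a ‘like’ is no longer just a click on an opaque URL: it becomes a statement that the user is interested in a book with a particular title.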

The current problem that Facebook is addressing is making sense of the huge volume of posts and links that are shared on the platform every day. When we communicate with one another we express information that can be used to qualify the semantics of our relationships: for example, if I often talk to one of my friends about dystopian literature then we share an affinity for that genre of literature. For a human, reading and understanding what users are communicating to one another is, generally, a straightforward process, and we can therefore infer relationship semantics. However, given the volume of interactions that occur on Facebook, using humans to parse and decipher every interaction is impossible; machines must be used instead.
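
To show what ‘inferring relationship semantics’ could mean in its very simplest form, here is a sketch that labels a relationship with the topic most often mentioned in the messages two people exchange. The keyword lists, messages and function names are all invented, and a keyword lookup is a deliberately crude stand-in for the far more capable learned models that a problem of this scale calls for.

```python
import re
from collections import Counter

# Hypothetical topic lexicon: each topic maps to a few trigger words.
TOPIC_KEYWORDS = {
    "dystopian literature": {"orwell", "huxley", "dystopia", "dystopian"},
    "football": {"match", "goal", "league"},
}


def topic_counts(messages):
    """Count how many messages mention each topic at least once."""
    counts = Counter()
    for message in messages:
        tokens = set(re.findall(r"[a-z]+", message.lower()))
        for topic, keywords in TOPIC_KEYWORDS.items():
            if tokens & keywords:
                counts[topic] += 1
    return counts


def dominant_affinity(messages):
    """Return the most frequently mentioned topic, or None if none match."""
    counts = topic_counts(messages)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Invented conversation between two friends.
conversation = [
    "Have you read the new Orwell biography?",
    "Huxley's dystopia feels closer every year.",
    "Did you see the match last night?",
]

print(dominant_affinity(conversation))
```

Even this toy version hints at the difficulty: a fixed keyword list misses paraphrase, sarcasm and new topics, which is where learned models come in.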

Facebook has recently recruited several high-profile machine learning researchers to work on ‘deep learning’ problems. The aim here is to build automated techniques that can parse what you are communicating with other users and infer the semantic affinity that you share with your peers. Should they achieve this, two things would follow. First, content, page, and thus product recommendations would be enhanced on the site: if you can tap into and understand the rich interaction graph, and get one user who communicates a lot about a given topic to endorse a product from that topic, then this endorsement will spread through their network. Second, it would lead to better information management, by filtering the news feed to show content that a user is most likely to engage with – i.e. by understanding their behaviour and the topical affinity they share with other users. This is only possible through machine intelligence, something that Facebook sees as a real possibility and as its next challenge.

What do you think? Share your comments with us below.

Dr Matthew Rowe teaches on our BSc Computer Science programme.


The opinions expressed by our bloggers and those providing comments are personal, and may not necessarily reflect the opinions of Lancaster University. Responsibility for the accuracy of any of the information contained within blog posts belongs to the blogger.