In a drive-by download attack, a user's computer is infected simply by visiting a malicious web page. It is one of the most common attacks in use today. An attacker lures users to such pages in order to hijack their systems by exploiting a system vulnerability.

One method cybercriminals use to attract traffic to these malicious web pages is to post their URLs on Twitter. Over the years, Twitter has emerged as one of the go-to places for updates on news, current affairs, entertainment, sporting events and celebrity activities. The popularity of Twitter, together with its built-in URL shortening (a consequence of the 140-character limit), gives a cybercriminal an opportunity to obfuscate the URL of a malicious web page.

Cybercriminals carry out a drive-by download attack by tweeting a shortened URL that points to a malicious website, attached to a trending topic. The rationale is to post a tweet that stands out from the others and makes a user curious enough to click on the shortened URL. In this paper, we build a machine learning model that uses machine activity data and tweet metadata to detect such URLs with 99% accuracy under 10-fold cross-validation and 82% accuracy on an unseen test set, 1 second into the interaction with a URL.
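To make the two accuracy figures concrete, the sketch below shows how a 10-fold cross-validation score and an unseen-test-set score are each computed. It is only an illustration under stated assumptions: the data are synthetic, the two features are hypothetical stand-ins for machine activity measurements, and the nearest-centroid classifier is a placeholder, not the model built in the paper.

```python
import random

def make_synthetic(n, seed=0):
    """Generate toy (features, label) rows; NOT real machine-activity data."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # 1 = malicious, 0 = benign (hypothetical labels)
        centre = 5.0 if label else 0.0
        # Two hypothetical features, e.g. CPU load and bytes sent at t = 1 s.
        X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        y.append(label)
    return X, y

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_centroids(X, y):
    """Placeholder 'training': compute the mean feature vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(centroids, row):
    """Return the class whose centroid is nearest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return min(centroids, key=dist)

def accuracy(centroids, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(predict(centroids, row) == label for row, label in zip(X, y))
    return hits / len(y)

def cross_val_accuracy(X, y, k=10, seed=0):
    """Train on k-1 folds, score on the held-out fold, and average over folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(accuracy(model, [X[i] for i in test_idx], [y[i] for i in test_idx]))
    return sum(scores) / k

# Cross-validation reuses one dataset; the test set is drawn separately.
X_train, y_train = make_synthetic(200, seed=1)
X_test, y_test = make_synthetic(100, seed=2)
cv_score = cross_val_accuracy(X_train, y_train, k=10)
test_score = accuracy(fit_centroids(X_train, y_train), X_test, y_test)
print(f"10-fold CV accuracy: {cv_score:.2f}, unseen test accuracy: {test_score:.2f}")
```

Cross-validation scores the model on held-out slices of the same collection it was trained on, whereas the unseen test set is gathered separately, so the two numbers can differ on real data even though they are close on this toy example.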

About the Speaker

I am a Senior Lecturer (Associate Professor) at Cardiff University and lead the Social Computing research priority area in the School of Computer Science & Informatics’ Complex Systems research group. I have developed a reputation for data-driven, innovative, and interdisciplinary research that broadly contributes to the growing field of Data Science, working closely with the Cardiff School of Social Sciences and School of Engineering. I am an applied computer scientist with a principal focus on data and computational methods to improve understanding, operations and decision making outside of academia while contributing to the academic fields of Social Computing, Web Science and Cybersecurity.

These three fields are integrated within my research through the analysis and understanding of Web-enabled human and software behaviour, with a particular interest in emerging and future risks posed to civil society, business, economies and governments. I achieve this using computational methods such as machine learning and statistical data modelling, together with interaction and behaviour mining, opinion mining and sentiment analysis, to derive key features of interest.

My research outcomes, which include more than 50 academic articles stemming from funded research projects worth over £7.2 million, are organised and disseminated via the Social Data Science Lab, of which I am a director and the computational lead. The Lab’s core funding comes from a £450k ESRC grant and it forms part of the £64m ‘Big Data Network’. Core funding runs between 2017 and 2020, during which time the Lab will host 5 post-doctoral researchers and 9 PhD students, all studying topics related to Risk, Safety & Human/Cybersecurity.
