About Me

Dr. Mahmoud El-Haj is a Senior Research Associate at the School of Computing and Communications at Lancaster University. Mahmoud received his PhD in Computer Science from The University of Essex working on Arabic Multi-document Summarization. His research interests include Arabic and multilingual NLP, Machine Learning, Information Extraction, Financial Narratives Processing and Corpus and Computational Linguistics. Mahmoud worked on multidisciplinary research projects at Lancaster University collaborating with big financial firms in London and has previously worked as a Data Mining developer and researcher at the UK Data Archive.


► Research Associate at the School of Computing and Communications - Lancaster University.

► Worked as a Developmental Systems and Data Mining Developer at the UK Data Archive at Essex University.


► PhD in Computer Science, Essex University 2012.
► MSc in Information Systems, Jordan University 2008.
► BSc in Computer Information Systems, Jordan University 2005.

Research Interests

Big Data, Natural Language Processing (NLP), Data Visualisation, Corpus and Computational Linguistics, Analysing Financial Narratives and Disclosures, data trustworthiness, interdisciplinary research, machine translation, text classification, crowd-sourcing, information extraction and creating language resources.

PhD Thesis

Thesis Topic: Multi-document Arabic Text Summarisation.
Candidacy: Researching and investigating the field of Arabic Natural Language Processing for both Single and Multi-Document Text Summarisation, and providing resources and corpora that could help advance research in this field.

Download my PhD Thesis


1- Bio Text Mining, Data Science, Lancaster Uni, UK

Comparing Medical Journals Using Corpus and Computational Linguistics in addition to applying NLP tools and techniques.

Dr Jo Knight, Data Science, Lancaster University.
Dr Paul Rayson, SCC, Lancaster University.
Dr Mahmoud El-Haj, SCC, Lancaster University.
Dr Scott Piao, SCC, Lancaster University.

2- Financial Narratives Processing, SCC and Acc&Fin, Lancaster Uni, UK

Analysing UK financial narratives using NLP, Corpus and Computational Linguistics.

Professor Steven Young, Lancaster University.
Dr Paul Rayson, SCC, Lancaster University.
Dr Mahmoud El-Haj, SCC, Lancaster University.

3- Understanding Corporate Communications (UCC), Lancaster Uni, UK

A comprehensive analysis of the form, content and impact of communications between large, publicly traded corporations and their key stakeholder groups concerning the following three key aspects of corporate governance: i) compliance with governance requirements and recommendations (e.g. The Combined Code in the UK); ii) executive remuneration; and iii) senior management turnover.

Professor Tony McEnery, LAEL Lancaster University.
Professor Steven Young, LUMS Lancaster University.
Dr Paul Rayson, SCC Lancaster University.
Dr Mahmoud El-Haj, SCC/LAEL Lancaster University.
Dr Andrew Hardie, LAEL Lancaster University.

4- VardSourcing and SenseSourcing, Lancaster Uni, UK

The use of crowd-sourcing to build lexicons and check spelling variation in historical data.
Dr Paul Rayson, SCC Lancaster University.
Dr Mahmoud El-Haj, SCC/LAEL Lancaster University.
Dr Alistair Barron, SCC Lancaster University.

5- Corporate Financial Information Environment (CFIE), Lancaster Uni, UK

To advance research on the lexical properties and narrative aspects of corporate disclosures by developing a suite of statistical natural language processing (NLP) tools for analysing firms' narrative communication practices.
Professor Martin Walker, The University of Manchester.
Professor Steven Young, Lancaster University.
Dr Paul Rayson, Lancaster University.
Dr Mahmoud El-Haj, Lancaster University.
Dr Vasiliki Athanasakou, London School of Economics.
Dr Thomas Schleicher, The University of Manchester.

6- SKOS-HASSET Project at the UK Data Archive, Essex, UK.

The objective of this project is to bring HASSET, the leading and well-respected English language social science thesaurus, into the Linked Data web. Its aims are twofold: firstly, it will apply SKOS to HASSET, thus creating SKOS-HASSET, a Linked Open Data product for the use of the wider social science community; secondly, it will test SKOS-HASSET's automatic indexing capabilities in relation to survey data resources. The project is funded by the Joint Information Systems Committee (JISC).
My role is to automatically index the HASSET thesaurus, publications and questionnaires, and to evaluate the automatic indexing against manual indexing by humans.
I also apply Natural Language Processing tools to connect the thesaurus index terms with the related terms in the index of the publications and questionnaires, to improve the retrieval of these documents.

7- Updating Digital Preservation and Systems (DPS) at the UK Data Archive, Essex, UK.

The objective of this project is to build applications to help organise and manage the current DPS systems. The project is funded by the Economic and Social Research Council (ESRC).
My role is to write PowerShell scripts to improve the process of organising the Archive's studies and to manage the creation and downloading of the studies' zip bundles. This requires security and validation checks to ensure that the uploaded studies and zip bundles meet the Archive's required specifications and standards.

Professional Services

► Programme Committee for the Third International Conference on Arabic Computational Linguistics (ACLing 2017), Dubai, UAE

► Programme Committee for the 6th International Conference on Arabic Language Processing (ICALP 2017), Fez, Morocco

► Organiser of the Third Arabic Natural Language Processing Workshop co-located with EACL 2017, Valencia, Spain

► Programme Committee for MultiLing 2017: Summarization and summary evaluation across source types and genres co-located with EACL 2017, Valencia, Spain

► Programme Committee for the Big Data and NLP workshop hosted at IEEE Big Data 2016

► Summer School Tutor UCREL NLP Summer School 2016 Lancaster University

► Programme Committee for Corpus Linguistics 2015, Lancaster, UK.

► Coordinator of the 7th LSE/LUMS/MBS Conference 2013. London, UK.

► Organiser of the UCREL Corpus Research Seminars (CRS) at Lancaster University.

► Coordinator of the MultiLing Workshop at the ACL 2013 Conference in Sofia, Bulgaria.

► Coordinator of the MultiLing Pilot at the Text Analysis Conference (TAC) 2011.

► Organiser of the interdisciplinary Language And Computation (LAC) group at Essex University.

► Organiser of the FlatLands 2012 Workshop on NLP at Essex University, UK.

Journal and Conference Reviewing

Reviewer for:

International Journal of Corpus Linguistics 2017

Big Data and NLP workshop hosted at IEEE Big Data 2016

Digging into Data Challenge grant program (project proposal) 2016.

Computational Linguistics journal 2016.

ESRC Research Project Proposal (RCUK) 2015.

International Journal of Corpus Linguistics 2015.

Journal of Natural Language Engineering 2014, 2015.

MDPI Future Internet Journal 2014.

15th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing) 2014.
LRE-Rel Workshop at the 8th LREC Conference 2012.
Fourth Computer Science & Electronic Engineering Conference (CEEC) 2012.

2nd IEEE Conference on Computer and Communication Technology (ICCCT) 2011.

32nd European Conference on Information Retrieval (ECIR) 2010.


► Winning team for the best audience-facing tool - BBC NewsHack event, London, 2016.

► Fully funded Internship at the National Institute of Informatics, Tokyo, Japan, 2011.

► Best Paper Award at the 4th LTC Conference, Poznan, Poland, 2009. The paper was then selected to appear at the