About Me

NLP Research Associate and co-organiser of the UCREL Corpus Research Seminars (CRS) at Lancaster University. Working on the ESRC-funded Corporate Financial Information Environment (CFIE) project at the School of Computing and Communications.

Previously

Worked as a Developmental Systems and Data Mining Developer at the UK Data Archive at Essex University.

Education

PhD in Computer Science, Essex University 2012.
MSc in Information Systems, Jordan University 2008.
BSc in Computer Information Systems, Jordan University 2005.

Research Interests

Natural Language Processing (NLP), mainly multi-document text summarisation for both Arabic and English, information retrieval, question answering, machine translation, text classification, crowd-sourcing, information extraction and the creation of NLP resources.

PhD Thesis

Thesis Topic: Multi-document Arabic Text Summarisation.
Candidacy: Researching Arabic Natural Language Processing for both single- and multi-document text summarisation, and providing resources and corpora that could help advance research in this field.

http://serlib0.essex.ac.uk/record=b1807018~S5

Bibtex Reference:
@PHDTHESIS{ElHaj2012,
AUTHOR = {Mahmoud El-Haj},
TITLE = {{Arabic Multi-document Text Summarisation}},
SCHOOL = {{University of Essex}},
YEAR = {2012},
ADDRESS = {{The Albert Sloman Library: University of Essex}},
PAGES = {165},
BOOKNUMBER = {139000488},
NOTE = {Thesis (Ph.D.), School of Computer Science and Electronic Engineering, University of Essex, 2012},
URL = {http://serlib0.essex.ac.uk/record=b1807018~S5}
}


Download my PhD Thesis

Projects

1- Corporate Financial Information Environment (CFIE), Lancaster University, UK

The project has five primary objectives:
1. To advance research on the lexical properties and narrative aspects of corporate disclosures by developing a suite of statistical natural language processing (NLP) tools for analysing firms' narrative communication practices.
2. To use the methods developed in objective 1 to measure the linguistic characteristics of key corporate disclosures (both mandatory and voluntary), to identify determinants of cross-sectional variation in these characteristics, and to relate these characteristics to disclosure informativeness. Analysis of the content of interim management statements will form a specific application of these methods.
3. To apply the methods developed in objective 1 to advance research on the interactions between corporate voluntary disclosures and accounting quality, including new work on the joint effects of corporate disclosure and earnings management practices on share price anticipation of earnings and companies' standing in published rankings of investor relations quality.
4. To apply NLP scoring methods to UK corporate news stories in the financial media with the aim of developing a more complete measure of corporate financial communications quality.
5. To use the methods and insights from objectives 1 to 4 to provide new evidence on the links between earnings quality, disclosure quality, and cost of capital.

Team:
Professor Martin Walker, The University of Manchester.
Professor Steven Young, Lancaster University.
Dr Paul Rayson, Lancaster University.
Dr Mahmoud El-Haj, Lancaster University.
Dr Vasiliki Athanasakou, London School of Economics.
Dr Thomas Schleicher, The University of Manchester.

2- SKOS-HASSET Project at the UK Data Archive, Essex, UK.
The objective of this project is to bring HASSET, the leading and well-respected English language social science thesaurus, into the Linked Data web. Its aims are twofold: firstly, it will apply SKOS to HASSET, thus creating SKOS-HASSET, a Linked Open Data product for the use of the wider social science community; secondly, it will test SKOS-HASSET's automatic indexing capabilities in relation to survey data resources. The project is funded by the Joint Information Systems Committee (JISC).
My role is to automatically index the HASSET thesaurus, publications and questionnaires, and to evaluate the automatic indexing against manual indexing by human indexers.
I also apply Natural Language Processing tools to connect the thesaurus index terms with related terms in the index of the publications and questionnaires, to improve the retrieval of these documents.


3- Updating Digital Preservation and Systems (DPS) at the UK Data Archive, Essex, UK
The objective of this project is to build applications that help organise and manage the current DPS systems. The project is funded by the Economic and Social Research Council (ESRC).
My role is to write PowerShell scripts that improve the process of organising the Archive's studies and that manage the creation and downloading of the studies' zip bundles. This requires security and validation checks to ensure that the uploaded studies and zip bundles meet the Archive's required specifications and standards.

Professional Services

- Co-organiser of the UCREL Corpus Research Seminars (CRS)

- Coordinator of the MultiLing Workshop at the ACL 2013 Conference in Sofia, Bulgaria.

- Coordinator of the MultiLing Pilot at the Text Analysis Conference (TAC) 2011 in Maryland, USA.

- Organiser of the disciplinary Language And Computation (LAC) group at Essex University.

- Organiser of the FlatLands 2012 Workshop on Natural Language Processing Research for postgraduate students at Cambridge, Essex, Open, and Oxford Universities, held on Friday, 29th June 2012 at Essex University, Wivenhoe Park, Colchester, Essex, UK.

Conference Reviewing

Reviewer for the 32nd European Conference on Information Retrieval (ECIR) 2010.
Reviewer for the 2nd IEEE International Conference on Computer and Communication Technology (ICCCT) 2011.
Reviewer for the LRE-Rel Workshop at the 8th International Conference on Language Resources and Evaluation (LREC) 2012.
Reviewer for the 4th Computer Science and Electronic Engineering Conference (CEEC) 2012.

Awards

Best Paper Award at the 4th LTC Conference, Poznan, Poland, 2009. The paper was then selected to appear in Springer's Lecture Notes in Computer Science.