NLP Tools

The following is a list of freely available tools and resources from the projects I worked on at Lancaster University.


1- CFIE-FRSE

Corporate Financial Information Environment (CFIE)-Final Report Structure Extractor (FRSE) is a desktop application that detects the structure of UK annual reports and extracts their contents at section level.
CFIE-FRSE

2- OSMAN Readability Metric

Open-source Java tool for Arabic text readability. The tool calculates readability for Arabic text with and without diacritics (Tashkeel). It works better when diacritics are present (we provide a method that lets you add diacritics to plain Arabic text).
OSMAN Readability

3- NLP & ML Visualization Code and Tutorial

This is a step-by-step tutorial for text analysts who want an easy start with basic and common techniques in NLP, Text Analysis, Machine Learning, Topic Modelling and Corpus Linguistics. The tutorial is part of the "Visualise My Corpus" UCREL and DSG Seminar and Tutorial as well as the "Data Visualisation Workshop for Critical Computational Discourse" at the Data Science Institute at Lancaster University, UK.
NLP_ML_Visualization_Tutorial

4- Java word cloud with Log Likelihood

Java tool to create word clouds using log-likelihood and word frequencies (or any other weight values). Log-likelihood is calculated for each word by comparing its frequency in two large input corpora, which can be in any language. The tool is language independent and was tested on Arabic and English.
Java word cloud with Log Likelihood
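
The description above does not say which formula the tool implements. As a minimal sketch, assuming the standard corpus-linguistics log-likelihood for comparing a word's frequency in two corpora, the weights for a word cloud could be computed as follows. The class and method names are hypothetical and are not taken from the tool's source, and the whitespace tokenizer is a simplification.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; not the word cloud tool's actual code.
final class LogLikelihood {

    // a, b: frequency of the word in corpus 1 and corpus 2.
    // c, d: total number of words in corpus 1 and corpus 2.
    static double score(long a, long b, long c, long d) {
        double e1 = (double) c * (a + b) / (c + d); // expected frequency in corpus 1
        double e2 = (double) d * (a + b) / (c + d); // expected frequency in corpus 2
        double sum = 0.0;
        if (a > 0) sum += a * Math.log(a / e1);     // a zero count contributes nothing
        if (b > 0) sum += b * Math.log(b / e2);
        return 2.0 * sum;
    }

    // Naive whitespace tokenizer (an assumption), so it stays language independent.
    static Map<String, Long> countWords(String text) {
        Map<String, Long> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) counts.merge(token, 1L, Long::sum);
        }
        return counts;
    }

    // Log-likelihood weight for every word of the study corpus against the reference corpus.
    static Map<String, Double> weights(Map<String, Long> study, Map<String, Long> reference) {
        long c = study.values().stream().mapToLong(Long::longValue).sum();
        long d = reference.values().stream().mapToLong(Long::longValue).sum();
        Map<String, Double> out = new HashMap<>();
        for (Map.Entry<String, Long> e : study.entrySet()) {
            long b = reference.getOrDefault(e.getKey(), 0L);
            out.put(e.getKey(), score(e.getValue(), b, c, d));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Long> study = countWords("the report the chairman the board");
        Map<String, Long> reference = countWords("the cat sat on the mat");
        weights(study, reference).forEach((word, ll) -> System.out.println(word + "\t" + ll));
    }
}
```

Log-likelihood is large for words that are overused and for words that are underused in the first corpus, so a word cloud built this way may also need to compare relative frequencies to tell the two cases apart.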

5- Machine Learning Java code

Java code that trains classifiers for chairman's statements, governance and remuneration sections from 1,000 annual financial reports (part of the UCREL NLP Summer School 2016-2018, Lancaster University).
Machine Learning Tutorial

6- Gene Ontology Semantic Tagger (GOST)

Our code allowed us to generate a USAS tagger dictionary file in which each entry in the OBO ontology is tagged with the GO IDs shown in its path. Taking the “mucosal immune response” OBO entry shown in Figure 1 as an example, there are two paths starting from the child node towards the “biological process” root.
GOST Semantic Tagger GitHub
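
As a rough illustration of the idea of tagging an entry with the IDs on its paths, the sketch below enumerates every path from a term up to the root of a small parent map and collects the IDs seen along the way. It is not the GOST code: the class name, method names and the `T:` identifiers are all hypothetical placeholders, not real GO IDs, and the parent map is assumed to be acyclic.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the GOST implementation.
final class OntologyPaths {

    // Returns every path from the given term up to a term that has no parents.
    static List<List<String>> pathsToRoot(String term, Map<String, List<String>> parents) {
        List<List<String>> result = new ArrayList<>();
        walk(term, parents, new ArrayList<>(), result);
        return result;
    }

    private static void walk(String term, Map<String, List<String>> parents,
                             List<String> current, List<List<String>> result) {
        current.add(term);
        List<String> next = parents.getOrDefault(term, List.of());
        if (next.isEmpty()) {
            result.add(new ArrayList<>(current)); // reached a root: record a copy of the path
        } else {
            for (String parent : next) walk(parent, parents, current, result);
        }
        current.remove(current.size() - 1);       // backtrack before trying the next branch
    }

    // Distinct IDs over all paths, in first-seen order; these would serve as the entry's tags.
    static Set<String> tagsFor(String term, Map<String, List<String>> parents) {
        Set<String> tags = new LinkedHashSet<>();
        for (List<String> path : pathsToRoot(term, parents)) tags.addAll(path);
        return tags;
    }

    public static void main(String[] args) {
        // Toy ontology with placeholder IDs: one child with two parents that share a root.
        Map<String, List<String>> parents = Map.of(
                "T:child", List.of("T:parentA", "T:parentB"),
                "T:parentA", List.of("T:root"),
                "T:parentB", List.of("T:root"));
        System.out.println(pathsToRoot("T:child", parents));
        System.out.println(tagsFor("T:child", parents));
    }
}
```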

7- Welsh Summary Creator (ACC)

A simple tool for extracting and summarizing free Welsh text. It allows users to paste, drag and drop, or upload text files, and to choose the size of the summary.
Welsh Summary Creator (ACC) GitHub

Welsh Summaries Dataset on GitHub

Welsh Summary Creator (ACC) Demo