Data Science MSc - 2019 Entry
Entry Year: 2019
Duration: Full time, 12 months
Course Overview
From business and finance to health and medicine, infrastructure and education, data science plays a vital role in all aspects of the modern world. Our MSc programme will ensure you have an advanced level of skills, knowledge, and experience to achieve your career aspirations.
Studying for an MSc in Data Science at Lancaster will provide you with the perfect environment to develop an expertise in the discipline. Your study will build upon the fundamentals, and our specialist pathways will allow you to practise and enhance technical skills, while gaining professional knowledge that will support and advance your career aspirations.
Over the year, you will explore five core Data Science modules. These will ensure you have a solid advanced grounding in the subject, to support your choice of specialism.
You can choose from two specialisms according to your background and interests:
- Computing
- Statistical Inference
In taking one of these routes, you will gain access to a range of exciting, advanced pathway-specific modules. These modules will allow you either to enhance your understanding of data science technologies, or to gain expertise in the application of data science to business intelligence, bioinformatics, population health, the environment, or the study of society. Our specialist modules will provide you with detailed, expert knowledge and will enhance your employability. This format means that you will be equipped to apply for any data science-related career, while giving you an advantage in many industries.
In addition to these taught modules, you will also have the opportunity to undertake a 12-week placement either within industry or as part of an academic research project. This will provide you with a fantastic opportunity to apply your skills and knowledge to real-world situations and challenges, allowing you to gain valuable professional experience and demonstrate a working grasp of the discipline.
The placement project represents a substantial, independent research project. Supervised by an academic, you will develop your ability to gather and analyse data, draw valuable conclusions, and present findings in a professional environment. This research will be an opportunity to bring together everything you have learnt over the year, exercise your ability to solve problems and manage a significant project. This will be great experience for you to draw upon in an interview and in your career.
Assessment
We offer an excellent range of learning environments, which include traditional lectures, laboratories, and workshops. We are also committed to providing timely feedback for all submitted work and projects.
Assessment varies across modules, allowing students to demonstrate their capabilities in a range of ways, including laboratory reports, essays, exercises, literature reviews, short tests, poster sessions, oral presentations, and formal examination.
Community
We have a great relationship with our students and alumni, who have praised the School for its ambition, positivity and friendly atmosphere. By providing a number of support methods, accessible at any stage of your degree, we strive to give our students the best opportunity to fulfil their potential and attract the very best opportunities for a successful career. Our academics are welcoming and helpful; you will be assigned an academic advisor who can offer advice and recommended reading; and our open door policy has been a popular feature among our students. We believe in encouraging and inspiring our computing and communications scientists of the future.
Career
The gathering, interpretation and evaluation of data is fundamental to all aspects of modern life. As a result, data science can lead to a career in a wide range of industries. The core modules of this programme will ensure you are properly equipped to apply yourself to any data role, while your specialist pathway will enhance your opportunities in specific industries, should that be the route you wish to pursue.
Studying at Masters level will further enhance your career prospects and open up opportunities for progression.
In addition, many of our Data Science graduates also elect to study for a PhD.
We provide careers advice and host a range of events throughout the year, including our annual careers fair, attended by exhibitors who are interested in providing placements and vacancies to computer science students and graduates. You can speak face-to-face with employers such as Network Rail, Oracle, and Johnson and Johnson, in addition to a large range of SMEs.
Course Structure
You will study a range of modules as part of your course, some examples of which are listed below.
- Data Mining
Students are provided with comprehensive coverage of the problems related to data representation, manipulation and processing in terms of extracting information from data, including big data. They will apply their working understanding to the data primer, data processing and classification. They will also enhance their familiarity with dynamic data space partitioning using evolving clustering and data clouds, and with monitoring the quality of the self-learning system online.
Students will also gain the ability to develop software scripts that implement advanced data representation and processing, and demonstrate their impact on performance. In addition, they will develop a working knowledge in listing, explaining and generalising the trade-offs of performance, as well as the complexity in designing practical solutions for problems of data representation and processing in terms of storage, time and computing power.
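For a flavour of the basic idea of partitioning a data space, here is a minimal, hedged sketch in base R using k-means clustering on simulated data. The evolving clustering and data cloud methods taught on the module are considerably more sophisticated; this is purely illustrative and not drawn from the module materials.

```r
# Minimal illustration: partitioning a simulated data space with k-means (base R).
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 3), ncol = 2))
fit <- kmeans(x, centers = 2)         # partition the points into two clusters
table(fit$cluster)                    # size of each cluster
plot(x, col = fit$cluster, pch = 19)  # visualise the partition
points(fit$centers, col = 1:2, pch = 4, cex = 2)  # cluster centres
```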
- Data Science Fundamentals
This module will help you understand what the data science role entails and how that individual performs their job within an organisation on a day-to-day basis. You will look at how research is performed, in terms of formulating a hypothesis and the implications of research findings, and become aware of different research strategies and when they should be applied. You will gain an understanding of data processing, preparation and integration, and how this enables research to be performed. You will also learn how data science problems are tackled in an industrial setting, and how findings are communicated to people within the organisation.
- Programming for Data Scientists
This module is designed both for students who are completely new to programming and for experienced programmers, bringing both groups to the level of skill needed to handle complex data science problems. Beginners will learn the fundamentals of programming, while experienced students will have the opportunity to sharpen and further develop their programming skills. Students will learn data-processing techniques, including visualisation and statistical data analysis. To provide a broad foundation for handling the most complex data science tasks, we will also cover problem solving and the development of graphical applications. Two open-source programming languages will be used: R and Python.
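As a small, illustrative taste of the kind of data processing and visualisation involved, here is a sketch in R (one of the two languages the module uses). The dataset and operations are our own choice for illustration, not the module syllabus.

```r
# Basic data processing and visualisation in base R, using the built-in mtcars data.
data(mtcars)
summary(mtcars$mpg)                          # statistical summary of fuel economy
agg <- aggregate(mpg ~ cyl, data = mtcars,   # mean mpg by number of cylinders
                 FUN = mean)
print(agg)
hist(mtcars$mpg, main = "Fuel economy", xlab = "Miles per gallon")
```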
- Likelihood Inference
This module presents the key tools for statistical inference, stressing the fundamental role of the likelihood function. It addresses how the likelihood function, that is, the probability of the observed data viewed as a function of unknown parameters, can be used to make inference about those parameters, in addition to working with models which do not assume the data are independent and identically distributed. Students will also be introduced to basic computational aspects of likelihood inference that are required in many practical applications.
Students will engage with a range of topics, including the definition of the likelihood function for multi-parameter models and how it is used to calculate point estimates, the asymptotic distribution of the maximum likelihood estimator, the definition and use of orthogonality, and the use of simple computational methods to calculate maximum likelihood estimates.
On completion of this module, students will be able to appreciate how information about the unknown parameters is obtained and summarised via the likelihood function. They will also have the skills required to calculate the likelihood function for some statistical models which do not assume independent and identically distributed data, as well as a developed understanding of the inter-relationships between parameters and the concept of orthogonality.
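By way of illustration only, the sketch below shows the computational side of likelihood inference described above: a two-parameter likelihood maximised numerically in R, with asymptotic standard errors taken from the observed information. The data are simulated and the example is not taken from the module materials.

```r
# Numerical maximum likelihood for a normal sample (parameters mu and sigma).
set.seed(42)
y <- rnorm(200, mean = 5, sd = 2)
negloglik <- function(par) {
  mu <- par[1]; sigma <- exp(par[2])   # optimise on log(sigma) to keep sigma > 0
  -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}
fit <- optim(c(mean(y), log(sd(y))), negloglik, hessian = TRUE)
mle <- c(mu = fit$par[1], sigma = exp(fit$par[2]))
se  <- sqrt(diag(solve(fit$hessian)))  # asymptotic SEs for (mu, log sigma)
mle; se
```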
- Generalised Linear Models
Generalised linear models are now one of the most frequently used statistical tools of the applied statistician. They extend the ideas of regression analysis to a wider class of problems involving the relationship between a response and one or more explanatory variables. This module discusses applications of generalised linear models to a diverse range of practical problems, involving data from biology, the social sciences and time series to name a few, and aims to explore the theoretical basis of these models. The syllabus covers formulating sensible models for the relationship between a response and one or more explanatory variables, taking account of the motivation for data collection, checking these models in the statistical package R, producing confidence intervals and tests corresponding to questions of interest, and stating conclusions in everyday language.
On successful completion of this module, students will be able to apply their knowledge of model formulation to judge, for example, how the probability of success depends on a patient's age, weight, blood pressure and so on. Finally, students will become familiar with the 'iteratively reweighted least squares' algorithm, which is commonly used to estimate the parameters.
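To make the example above concrete, here is a hedged sketch of a logistic regression (one kind of generalised linear model) in R, modelling a probability of success from age, weight and blood pressure. The data are simulated and the variable names are invented for illustration.

```r
# Logistic regression with glm(); fitting uses iteratively reweighted least squares.
set.seed(1)
n       <- 300
age     <- rnorm(n, 50, 10)
weight  <- rnorm(n, 75, 12)
bp      <- rnorm(n, 120, 15)
p       <- plogis(-4 + 0.05 * age + 0.01 * weight + 0.01 * bp)  # true model
success <- rbinom(n, 1, p)
fit <- glm(success ~ age + weight + bp, family = binomial)
summary(fit)          # coefficients, standard errors, Wald tests
confint.default(fit)  # Wald confidence intervals for questions of interest
```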
- MSc Data Science Dissertation
A large part of the masters involves completing the industry or research related project. This starts with the students selecting an industry or research partner, undertaking a placement in June - July, and then submitting a written dissertation of up to 20,000 words in early September.
This is primarily a self-study module designed to provide the foundation of the main dissertation, at a level considered to be of publishable quality. The project also offers students the opportunity to apply their technical skills and knowledge to current, world-class research problems and to develop expert knowledge in a specific area.
The topic of the project will vary from student to student, depending on the data science specialism (e.g. a computing project may involve the design of a system, while projects in data analytics, health or the environment are likely to be more applied, perhaps focusing upon inherent data structure and processes).
Core
- Statistical Inference
This module aims to provide an in-depth understanding of statistics as a general approach to the problem of making valid inferences about relationships from observational and experimental studies. Examples from social science and environmental science are used to illustrate this approach. The emphasis will be on the principle of Maximum Likelihood as a unifying theory for estimating parameters.
- Statistical Methods and Modelling
The aim of this module is to address the fundamentals of statistics for those who do not have a mathematics and statistics background. The module is delivered over three intensive two-day sessions of lectures and practicals. You will develop an understanding of the theory behind core statistical topics: sampling, hypothesis testing, and modelling.
- Distributed Artificial Intelligence
Distributed artificial intelligence is fundamental in contemporary data analysis. Large volumes of data and computation call for multiple computers in problem solving, and being able to understand and use those resources efficiently is an important skill for a data scientist. A distributed approach is also important for fault tolerance and robustness, as the loss of a single component must not significantly compromise the whole system. Additionally, contemporary and future distributed systems go beyond computer clusters and networks: they often comprise multiple agents (software, humans and/or robots) that all interact in problem solving. As a data scientist, we may have control of the full distributed system, or of only one piece, and we have to decide how it should behave in the face of the others in order to accomplish our goals.
- Systems Architecture and Integration
In this module we explore the architectural approaches, techniques and technologies that underpin today's Big Data system infrastructure and particularly large-scale enterprise systems. It is one of two complementary modules that comprise the Systems stream of the Computer Science MSc, which together provide a broad knowledge and context of systems architecture enabling students to assess new systems technologies, to know where technologies fit in the larger scheme of enterprise systems and state of the art research thinking, and to know what to read to go deeper.
The principal ethos of the module is to focus on the principles of Big Data systems, and applying those principles using state of the art technology to engineer and lead data science projects. Detailed case studies and invited industrial speakers will be used to provide supporting real-world context and a basis for interactive seminar discussions.
- Applied Data Mining
This module provides students with up-to-date information on current applications of data in both industry and research. Expanding on the module ‘Fundamentals of Data’, students will gain a more detailed level of understanding about how data is processed and applied on a large scale across a variety of different areas.
Students will develop knowledge of different areas of science and will recognise their relation to big data, in addition to understanding how large-scale challenges are being addressed with current state-of-the-art techniques. The module will cover recommendation on the Social Web and its roots in social network theory and analysis, together with its adaptation and extension to large-scale problems, focusing on user-generated content and crowd-sourced data, social networks (theories and analysis), and recommendation (collaborative filtering, content recommendation challenges, and friend recommendation/link prediction).
On completion of this module, students will be able to create scalable solutions to problems involving data from the semantic, social and scientific web, and will be able to process networks and perform network analysis in order to identify key factors in information flow.
- Clinical Trials
Clinical trials are planned experiments on human beings designed to assess the relative benefits of one or more forms of treatment. For instance, we might be interested in studying whether aspirin reduces the incidence of pregnancy-induced hypertension, or we may wish to assess whether a new immunosuppressive drug improves the survival rate of transplant recipients.
This module combines the study of technical methodology with discussion of more general research issues, beginning with a discussion of the relative advantages and disadvantages of different types of medical studies. The module will provide a definition and estimation of treatment effects. Furthermore, cross-over trials, issues of sample size determination, and equivalence trials are covered. There is an introduction to flexible trial designs that allow a sample size re-estimation during the ongoing trial. Finally, other relevant topics such as meta-analysis and accommodating confounding at the design stage are briefly discussed.
Students will gain knowledge of the basic elements of clinical trials. They will develop the ability to recognise and use principles of good study design, and will also be able to analyse and interpret study results to make correct scientific inferences.
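As an illustration of the sample size determination mentioned above, here is a hedged sketch using base R's power.t.test(). The numbers are purely illustrative and are not taken from the module.

```r
# Required sample size per arm for a two-arm trial comparing means,
# at 5% two-sided significance and 90% power.
power.t.test(delta     = 5,     # clinically relevant difference in means
             sd        = 12,    # assumed standard deviation of the outcome
             sig.level = 0.05,
             power     = 0.90)  # prints n per group
```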
- Principles of Epidemiology
Introducing epidemiology, the study of the distribution and determinants of disease in human populations, this module presents its main principles and statistical methods. The module addresses the fundamental measures of disease, such as incidence, prevalence, risk and rates, including indices of morbidity and mortality.
Students will also develop awareness of epidemiological study designs, such as ecological studies, surveys, cohort and case-control studies, and diagnostic test studies. Epidemiological concepts will be addressed, such as bias and confounding, matching and stratification, and the module will also address the calculation of rates, standardisation and adjustment, as well as issues in screening.
This module provides students with a historical and general overview of epidemiology and related strategies for study design, and should enable students to conduct appropriate methods of analysis for rates and risk of disease. Students will develop skills in critical appraisal of the literature and, in completing this module, will have developed an appreciation of epidemiology and an ability to describe the key statistical issues in the design of ecological studies, surveys, case-control studies, cohort studies and RCTs, whilst recognising their advantages and disadvantages.
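Two of the fundamental measures named above can be computed by hand. The sketch below is a minimal base R illustration with made-up numbers, not module material.

```r
# Incidence rate: new cases per person-time at risk.
cases        <- 24
person_years <- 12000
rate <- cases / person_years * 1000   # cases per 1,000 person-years

# Odds ratio from a 2x2 table in a case-control study.
tab <- matrix(c(30, 70,    # exposed:   cases, controls
                15, 85),   # unexposed: cases, controls
              nrow = 2, byrow = TRUE)
or <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

c(rate_per_1000py = rate, odds_ratio = or)
```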
- Longitudinal Data Analysis
This module presents an approach to the analysis of longitudinal data based on statistical modelling and likelihood methods of parameter estimation and hypothesis testing. Among other topics, students will learn about exploratory and simple analysis strategies, the independence working assumption, the normal linear model with correlated errors, and generalised estimating equations.
Students will develop an understanding of how to deal with the correlated data that commonly arise in longitudinal studies, as well as an awareness of issues associated with collecting and analysing longitudinal data, and knowledge of the different modelling assumptions used in the analysis and their relation to the scientific aims of the study.
On module completion, students will gain the ability to explain the difference between longitudinal studies and cross-sectional studies, in addition to the knowledge required to select appropriate techniques to explore data, and the ability to compare different approaches to estimation and their usage in the analysis. Finally, students will obtain the skill level required to build statistical models for longitudinal data and draw valid conclusions from their models.
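One of the topics listed above, the normal linear model with correlated errors, can be illustrated with gls() from the nlme package that ships with R. This is a hedged sketch on the built-in Orthodont data (repeated dental measurements on children); the choice of data and correlation structure is ours, not necessarily the module's.

```r
library(nlme)
data(Orthodont)
# Exchangeable (compound symmetry) correlation within each child.
fit <- gls(distance ~ age + Sex, data = Orthodont,
           correlation = corCompSymm(form = ~ 1 | Subject))
summary(fit)   # estimates, standard errors and the within-subject correlation
```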
- Environmental Epidemiology
This module introduces students to the kinds of statistical methods commonly used by epidemiologists and statisticians to investigate the relationship between disease risk and environmental factors. Students will encounter motivating examples for the methods in the course, and will engage with spatial point processes, including the theory and methods for the analysis of point processes in two-dimensional space.
A number of published studies will be used to illustrate the methods described, and students will learn how to perform similar analyses using the statistical package R. Students will learn methods and theory for analysing point patterns, such as univariate and bivariate K-functions, and methods for analysing case-control data, including kernel intensity estimation, binary regression and generalised additive models. Students will also explore spatial generalised linear mixed models, including Poisson models for counts of a disease in regions and the concept of ecological bias, along with modelling elevated disease risk due to the presence of a point source and continuous spatial variation, including the Gaussian geostatistical model, variograms and spatial prediction.
Students will develop the ability to recognise the difference between point process data, area-level data and geostatistical data, and the skills required to define and estimate the intensity and K-function of a spatial point process. Finally, successful students will be able to perform basic analyses of case-control and geostatistical data, and will gain a broad understanding of the practical issues involved in undertaking environmental epidemiology studies.
- Survival and Event History Analysis
This module addresses a range of topics relating to survival data; censoring, hazard functions, Kaplan-Meier plots, parametric models and likelihood construction will be discussed in detail. Students will engage with the Cox proportional hazard model, partial likelihood, Nelson-Aalen estimation and survival time prediction and will also focus on counting processes, diagnostic methods, and frailty models and effects.
The module provides an understanding of the unique features and statistical challenges surrounding the analysis of survival and event history data, in addition to an understanding of how non-parametric methods can aid in the identification of modelling strategies for time-to-event data, and recognition of the range and scope of survival techniques that can be implemented within standard statistical software.
General skills will be developed, including the ability to express scientific problems in a mathematical language, improved scientific writing skills, and an enhanced range of computing skills related to the manipulation and analysis of data.
On successful completion of this module, students will be able to apply a range of appropriate statistical techniques to survival and event history data using statistical software, to accurately interpret the output of statistical analyses using survival models, fitted using standard software, and the ability to construct and manipulate likelihood functions from parametric models for censored data. Students will also gain observation skills, such as the ability to identify when particular models are appropriate, through the application of diagnostic checks and model building strategies.
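To show what "standard statistical software" looks like for this material, here is a hedged sketch using R's survival package and its built-in lung cancer dataset: Kaplan-Meier curves and a Cox proportional hazards model. The example is illustrative and not drawn from the module.

```r
library(survival)
# Kaplan-Meier survival curves by sex.
km <- survfit(Surv(time, status) ~ sex, data = lung)
plot(km, xlab = "Days", ylab = "Survival probability")
# Cox proportional hazards model with age and sex as covariates.
cox <- coxph(Surv(time, status) ~ age + sex, data = lung)
summary(cox)   # hazard ratios, confidence intervals and tests
```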
- Forecasting
Every managerial decision concerned with future actions is based upon a prediction of some aspect of the future. Forecasting therefore plays an important role in enhancing managerial decision-making.
After introducing the topic of forecasting in organisations, time series patterns and simple forecasting methods (naïve and moving averages) are explored. Then, the extrapolative forecasting methods of exponential smoothing and ARIMA models are considered. A detailed treatment of causal modelling follows, with a full evaluation of the estimated models. Forecasting applications in operations and marketing are then discussed. The module ends with an examination of judgmental forecasting and how forecasting can best be improved in an organisational context. Assessment is through a report aimed at extending and evaluating student learning in causal modelling and time series analysis.
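Two of the extrapolative methods named above, exponential smoothing and ARIMA, can be illustrated in base R on the built-in AirPassengers series. This is a hedged sketch, not the module's own material or software.

```r
# Seasonal (Holt-Winters) exponential smoothing and a 12-month-ahead forecast.
hw <- HoltWinters(AirPassengers)
predict(hw, n.ahead = 12)

# A seasonal ARIMA ("airline") model on the log scale.
fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1)))
predict(fit, n.ahead = 12)$pred   # forecasts on the log scale
```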
- Introduction to Intelligent Data Analysis (Data Mining)
This module develops modelling skills on synthetic and empirical data by showing simple statistical methods and introducing novel methods from artificial intelligence and machine learning.
The module will cover a wide range of data mining methods, including simple algorithms such as decision trees all the way to state of the art algorithms of artificial neural networks, support vector regression, k-nearest neighbour methods etc. We will consider both Data Mining methods for descriptive modelling, exploration & data reduction that aim to simplify and add insights to large, complex data sets, and Data Mining methods for predictive modelling that aim to classify and cluster individuals into distinct, disjoint segments with different patterns of behaviour.
The module will also include a series of workshops in which you will learn how to use the SAS Enterprise Miner software for data mining (a software skill much sought after in the job market) and how to use it on real datasets in a real world scenario.
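For context, two of the simpler methods mentioned above (decision trees and k-nearest neighbours) can be tried in a few lines of R. This sketch uses the rpart and class packages on the built-in iris data; it is illustrative only, and SAS Enterprise Miner itself is a separate, GUI-driven tool.

```r
library(rpart)
library(class)
# A classification tree for species.
tree <- rpart(Species ~ ., data = iris)
print(tree)
# A 5-nearest-neighbour classifier on a random train/test split.
set.seed(2)
train <- sample(nrow(iris), 100)
pred  <- knn(train = iris[train, 1:4], test = iris[-train, 1:4],
             cl = iris$Species[train], k = 5)
table(pred, iris$Species[-train])   # confusion matrix on the held-out data
```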
- Optimisation and Heuristics
Optimisation, sometimes called mathematical programming, has applications in many fields, including operational research, computer science, statistics, finance, engineering and the physical sciences. Commercial optimisation software is now capable of solving many industrial-scale problems to proven optimality.
The module is designed to enable students to apply optimisation techniques to business problems. Building on the introduction to optimisation in the first term, students will be introduced to different problem formulations and algorithmic methods to guide decision making in business and other organisations.
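As a small taste of problem formulation, here is a hedged sketch of a tiny production-planning linear programme solved in R. The lpSolve package is our assumption for illustration (any LP solver would do), and the numbers are invented.

```r
library(lpSolve)
# Maximise profit 3x + 5y subject to two resource constraints.
obj    <- c(3, 5)
constr <- rbind(c(1, 2),   # machine hours used per unit of x and y
                c(3, 1))   # labour hours used per unit of x and y
rhs    <- c(14, 15)        # hours available
sol <- lp(direction = "max", objective.in = obj,
          const.mat = constr, const.dir = c("<=", "<="), const.rhs = rhs)
sol$solution   # optimal production quantities
sol$objval     # optimal profit
```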
- Extreme Value Theory
This module aims to develop the asymptotic theory, and the associated techniques for modelling and inference, used in the analysis of extreme values of random processes. The module focuses on the mathematical basis of the models, the statistical principles for implementation, and the computational aspects of data modelling.
Students will develop an appreciation of, and facility in, the various asymptotic arguments and models, and will also gain the ability to fit appropriate models to data using specially developed R software, in addition to a working understanding of fitted models. Knowledge of computing in R is an essential skill that is transferable across a wide range of modules on the mathematics programme, and beyond.
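The standard model for block maxima is the generalised extreme value (GEV) distribution. The sketch below fits it by direct maximum likelihood with base R's optim() on simulated "annual" maxima; it is a hedged illustration and does not use or stand in for the specially developed course software.

```r
# Fit a GEV distribution to 50 simulated annual maxima by maximum likelihood.
set.seed(7)
x <- apply(matrix(rnorm(365 * 50), ncol = 50), 2, max)  # 50 block maxima

gev_nll <- function(par, x) {
  mu <- par[1]; sigma <- exp(par[2]); xi <- par[3]
  z  <- 1 + xi * (x - mu) / sigma
  if (any(z <= 0)) return(1e10)                    # outside the support
  sum(log(sigma) + (1 + 1 / xi) * log(z) + z^(-1 / xi))
}

fit <- optim(c(mean(x), log(sd(x)), 0.1), gev_nll, x = x, hessian = TRUE)
c(mu = fit$par[1], sigma = exp(fit$par[2]), xi = fit$par[3])  # location, scale, shape
```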
- Multi Level Models
Introducing data analysis, this module focuses on data that have a multi-level, hierarchical structure. Students will explore the use of multi-level structures in real datasets, working with statistical software such as Stata, R and MLwiN. The module will also provide an understanding of the classical ANOVA model.
Students will develop the ability to perform model diagnostics whilst applying appropriate notation and will develop skills in the area of presenting and interpreting output from statistical models through preparation of the coursework.
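In R, a common way to fit a multi-level (mixed-effects) model is with the lme4 package; the hedged sketch below uses its built-in sleepstudy data, where repeated reaction-time measurements are nested within subjects. The choice of lme4 is our assumption and not necessarily the module's software.

```r
library(lme4)
data(sleepstudy)
# Random intercept and slope for each subject: reaction time vs days of sleep deprivation.
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
summary(fit)   # fixed effects plus between-subject variance components
```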
- Methods for Missing Data
This module offers students an advanced understanding of statistics, and will explore the idea of missingness as a stochastic process. Students will develop and apply their knowledge of missing data methods, focusing on the imputation model and the model of interest. More naive methods will be introduced, such as single imputation and listwise deletion, and students will develop the ability to recognise the limitations of each method and the knowledge required to identify situations where their use may be appropriate.
A portion of the module will introduce VIM software and explore its uses for finding missingness patterns.
This module will enhance deduction skills, and students will become accustomed to the differences between sampling and parameter uncertainty, in addition to noticing similarities between the Bayesian and imputation approaches.
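The VIM package named above can be used to inspect missingness patterns; for contrast with single imputation, the sketch below also runs a multiple imputation with the mice package, pooling estimates across imputed datasets so that imputation uncertainty is reflected in the standard errors. The use of mice and the built-in airquality data are assumptions for illustration, not module material.

```r
library(VIM)
library(mice)
data(airquality)                  # built-in data with missing Ozone and Solar.R values
aggr(airquality)                  # plot of missingness patterns per variable (VIM)

imp  <- mice(airquality, m = 5, printFlag = FALSE)  # 5 imputed datasets
fits <- with(imp, lm(Ozone ~ Wind + Temp))          # fit the model of interest to each
summary(pool(fits))               # pooled estimates combining imputation uncertainty
```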
Optional
Information contained on the website with respect to modules is correct at the time of publication, but changes may be necessary, for example as a result of student feedback, Professional Statutory and Regulatory Bodies' (PSRB) requirements, staff changes, and new research.
Key Information
Please email Dr Chris Edwards with general questions about this MSc.
To discuss a specific data science specialism, please contact the relevant Programme Director:
MSc Data Science - Statistical Inference - Dr Deborah Costain
MSc Data Science - Computing - Dr Chris Edwards
Duration: 12 months.
Entry requirements: An upper-second class honours degree, or its equivalent, in a subject relevant to Computer Science, Mathematics or Statistics
IELTS: 6.5 or equivalent
Assessment: Coursework and examination
Funding: All applicants should consult our information on fees and funding.
For further information please see our website
Fees
| | Full Time (per year) | Part Time (per year) |
|---|---|---|
| UK/EU | £9,500 | £4,750 |
| Overseas | £20,500 | n/a |
The University will not increase the Tuition Fee you are charged during the course of an academic year.
If you are studying on a programme of more than one year's duration, the tuition fees for subsequent years of your programme are likely to increase each year. The way in which continuing students' fee rates are determined varies according to an individual's 'fee status' as set out on our fees webpages.
What are tuition fees for?
Studying at a UK University means that you need to pay an annual fee for your tuition, which covers the costs associated with teaching, examinations, assessment and graduation.
The fee that you will be charged depends on whether you are considered to be a UK, EU or overseas student. Visiting students will be charged a pro-rata fee for periods of study less than a year.
Our annual tuition fee is set for a 12 month session, which usually runs from October to September the following year.
How does Lancaster set overseas tuition fees?
Overseas fees, alongside all other sources of income, allow the University to maintain its full range of activities and services. Each year the University's Finance Committee considers recommendations for fee increases for all categories of student, taking into account a range of factors including projected cost inflation for the University, comparisons against other high-quality institutions, and external financial factors such as projected exchange rate movements.
What support is available towards tuition fees?
Lancaster University's priority is to support every student in making the most of their education. Many of our students each year will be entitled to bursaries or scholarships to help with the cost of fees and/or living expenses. You can find out more about financial support, studentships, and awards for postgraduate study on our website.
Related Courses
- Applied Social Statistics : PhD
- Communication Systems : MSc by Research
- Communication Systems : PhD
- Computer Science : MPhil/PhD
- Computer Science : MSc
- Computer Science (by research) : MSc
- Cyber Security : MSc
- Data Science : PgCert
- Data Science : PgDip
- Electronic Engineering : MSc
- Mathematics : PhD
- Quantitative Finance : MSc
- Statistics : MSc
- Statistics : PGDip
- Statistics : PhD
- Statistics and Operational Research (STOR-i) : MRes
- Statistics and Operational Research (STOR-i) : PhD
-
Course Overview
Course Overview
From business and finance or health and medicine, to infrastructure or education, data science plays a vital role in all aspects of the modern world. Our MSc programme will ensure you have an advanced level of skills, knowledge, and experience to achieve your career aspirations.
Studying for an MSc in Data Science at Lancaster will provide you with the perfect environment to develop an expertise in the discipline. Your study will build upon the fundamentals, and our specialist pathways will allow you to practise and enhance technical skills, while gaining professional knowledge that will support and advance your career aspirations.
Over the year, you will explore five core Data Science modules. These will ensure you have a solid advanced grounding in the subject, to support your choice of specialism.
You can choose from two specialisms according to your background and interests:
- Computing
- Statistical Inference
In taking one of these routes, you will gain access to a range of exciting, advanced pathway-specific modules. These modules will allow you to either enhance your understanding of data science technologies; or to gain expertise in the application of data science to business intelligence, bioinformatics, population health, the environment, or the study of society. Our specialist modules will provide you with detailed, expert knowledge and will enhance your employability. This format means that you will be equipped to apply for any data science related career, while providing you with an advantage in many industries.
In addition to these taught modules, you will also have the opportunity to undertake a 12-week placement either within industry or as part of an academic research project. This will provide you with a fantastic opportunity to apply your skills and knowledge to real-world situations and challenges, allowing you to gain valuable professional experience and demonstrate a working grasp of the discipline.
The placement project represents a substantial, independent research project. Supervised by an academic, you will develop your ability to gather and analyse data, draw valuable conclusions, and present findings in a professional environment. This research will be an opportunity to bring together everything you have learnt over the year, exercise your ability to solve problems and manage a significant project. This will be great experience for you to draw upon in an interview and in your career.
Assessment
We offer an excellent range of learning environments, which include traditional lectures, laboratories, and workshops. We are also committed to providing timely feedback for all submitted work and projects.
Assessment varies across modules, allowing students to demonstrate their capabilities in a range of ways, including laboratory reports, essays, exercises, literature reviews, short tests, poster sessions, oral presentations, and formal examination.
Community
We have a great relationship with our students and alumni, who have praised the School for its ambition, positivity and friendly atmosphere. By providing a number of support methods, accessible at any stage of your degree, we strive to give our students the best opportunity to fulfil their potential and attract the very best opportunities for a successful career. Our academics are welcoming and helpful; you will be assigned an academic advisor who can offer advice and recommended reading; and our open door policy has been a popular feature among our students. We believe in encouraging and inspiring our computing and communications scientists of the future.
Career
The gathering, interpretation and evaluation of data is fundamental to all aspects of modern life. As a result, data science can lead to a career in a wide range of industries. The core modules of this programme will ensure you are properly equipped to apply yourself to any data role, while your specialist pathway will enhance your opportunities in specific industries, should that be the route you wish to pursue.
Studying at Masters level will further enhance your career prospects, opening up opportunities to progress further in your career.
In addition, many of our Data Scientists also elect to study a PhD qualification.
We provide careers advice and host a range of events throughout the year, including our annual careers fair, attended by exhibitors who are interested in providing placements and vacancies to computer science students and graduates. You can speak face-to-face with employers such as Network Rail, Oracle, and Johnson and Johnson, in addition to a large range of SMEs.
-
Course Structure
Course Structure
You will study a range of modules as part of your course, some examples of which are listed below.
-
Data Mining
Students are provided with a comprehensive coverage of the problems related to data representation, manipulation and processing in terms of extracting information from data, including big data. They will apply their working understanding to the data primer, data processing and classification. They will also enhance their familiarity with dynamic data space partitioning, using evolving, clustering and data clouds, and monitoring the quality of the self-learning system online.
Students will also gain the ability to develop software scripts that implement advanced data representation and processing, and demonstrate their impact on performance. In addition, they will develop a working knowledge in listing, explaining and generalising the trade-offs of performance, as well as the complexity in designing practical solutions for problems of data representation and processing in terms of storage, time and computing power.
-
Data Science Fundamentals
This module will help you understand what the data science role entails and how that individual performs their job within an organisation on a day-to-day basis. You will look at how research is performed in terms of formulating a hypothesis and the implication of research finidngs, and be aware of different research strategies and when these should be applied. You will gain an understanding of data processing, preparation and integration, and how this enables research to be performed and you will learn how data science problems are tackled in an industrial setting, and how such findings are communicated to people within the organisation.
-
Programming for Data Scientists
This module is designed for students that are completely new to programming, and for experienced programmers, bringing them both to a high-skilled level to handle complex data science problems. Beginner students will learn the fundamentals of programming, while experienced students will have the opportunity to sharpen and further develop their programming skills. The students are going to learn data-processing techniques, including visualisation and statistical data analysis. For a broad formation, in order to handle the most complex data science tasks, we will also cover problem solving, and the development of graphical applications. Two open source programming languages will be used, R and Python.
-
Likelihood Inference
This module presents the key tools for statistical inference, stressing the fundamental role of the likelihood function. It addresses how the likelihood function, that is, the probability of the observed data viewed as a function of unknown parameters, can be used to make inference about those parameters, in addition to working with models which do not assume the data are independent and identically distributed. Students will also be introduced to basic computational aspects of likelihood inference that are required in many practical applications.
Students will engage with a range of features, including the definition of the likelihood model for multi-parameter models, and how it is used to calculate point estimates, the asymptotic distribution of the maximum likelihood estimator, the definition and use of orthantology, and the simple use for computational methods to calculate maximum likelihood estimates.
On completion of this module, students will be able to appreciate how information about the unknown parameters is obtained and summarised via the likelihood function, in addition to the level of skill required to calculate the likelihood function for some statistical models which do not assume independently identically distributed data, as well as a developed understanding of the inter-relationships between parameters, and the concept of orthogonality.
-
Generalised Linear Models
Generalised linear models are now one of the most frequently used statistical tools of the applied statistician. They extend the ideas of regression analysis to a wider class of problems such as the relationship between a response and one or more explanatory variables. This module discusses applications of the generalised linear models to a diverse range of practical problems involving data from the area of biology, social sciences and time series to name a few, and aims to explore the theoretical basis of these models. The syllabus consists of formulating sensible models for a relationship between a response and one or more explanatory variables, taking account of the motivation for data collection, whilst checking these models in the statistical package R, producing confidence intervals and tests corresponding to questions of interest, and stating conclusions in everyday language.
On successful completion of this module, students will develop the ability to apply their knowledge of model formulation in order to judge how the probability of success will depend on the patient’s age, weight, blood pressure and so on. Finally, students will become familiar with a common algorithm called ‘iteratively reweighted last squares’ algorithm, which is intended for the attention of parameters.
-
MSc Data Science Dissertation
A large part of the masters involves completing the industry or research related project. This starts with the students selecting an industry or research partner, undertaking a placement in June - July, and then submitting a written dissertation of up to 20,000 words in early September.
This is primarily a self-study module designed to provide the foundation of the main dissertation, at a level considered to be of publishable quality. The project also offers students the opportunity to apply their technical skills and knowledge on current world class research problems and to develop an expert knowledge on a specific area.
The topic of the project will vary from student to student, depending on the data science specialism (eg computing may involve the design of a system, while specialism in data analytics, health or environment, are likely to be more applied, perhaps focusing upon inherent data structure and processes).
Core
-
Statistical Inference
This modules aims to provide an in-depth understanding of statistics as a general approach to the problem of making valid inferences about relationships from observational and experimental studies. Examples from social science and environmental science are used to illustrate this approach. The emphasis will be on the principle of Maximum Likelihood as a unifying theory for estimating parameters.
-
Statistical Methods and Modelling
The aim of this module will be to address the fundamentals of statistics for those who do not have a mathematics and statistics background. The module is delivered over three intensive two-day sessions of lectures and practicals. You will develop an understanding of the theory behind core statistical topics; sampling, hypothesis testing, and modelling.
-
Distributed Artificial Intelligence
Distributed artificial intelligence is fundamental in contemporary data analysis. Large volumes of data and computation call for multiple computers in problem solving. Being able to understand and use those resources efficiently is an important skill for a data scientist. A distributed approach is also important for fault tolerance and robustness, as the loss of a single component must not significantly compromise the whole system. Additionally, contemporary and future distributed systems go beyond computer clusters and networks. Distributed systems are often comprised of multiple agents -- multiple software, humans and/or robots that all interact in problem solving. As a data scientist, we may have control of the full distributed system, or we may have control of only one piece, and we have to decide how it must behave in face of others in order to accomplish our goals.
-
Systems Architecture and Integration
In this module we explore the architectural approaches, techniques and technologies that underpin today's Big Data system infrastructure and particularly large-scale enterprise systems. It is one of two complementary modules that comprise the Systems stream of the Computer Science MSc, which together provide a broad knowledge and context of systems architecture enabling students to assess new systems technologies, to know where technologies fit in the larger scheme of enterprise systems and state of the art research thinking, and to know what to read to go deeper.
The principal ethos of the module is to focus on the principles of Big Data systems, and applying those principles using state of the art technology to engineer and lead data science projects. Detailed case studies and invited industrial speakers will be used to provide supporting real-world context and a basis for interactive seminar discussions.
-
Applied Data Mining
This module provides students with up-to-date information on current applications of data in both industry and research. Expanding on the module ‘Fundamentals of Data’, students will gain a more detailed level of understanding about how data is processed and applied on a large scale across a variety of different areas.
Students will develop knowledge in different areas of science and will recognise their relation to big data, in addition to understanding how large-scale challenges are being addressed with current state-of-the-art techniques. The module will provide recommendations on the Social Web and their roots in social network theory and analysis, in addition their adaption and extension to large-scale problems, by focusing on primer, user-generated content and crowd-sourced data, social networks (theories, analysis), recommendation (collaborative filtering, content recommendation challenges, and friend recommendation/link prediction).
On completion of this module, students will be able to create scalable solutions to problems involving data from the semantic, social and scientific web, in addition to abilities gained in processing networks and performing of network analysis in order to identify key factors in information flow.
-
Clinical Trials
Clinical trials are planned experiments on human beings designed to assess the relative benefits of one or more forms of treatment. For instance, we might be interested in studying whether aspirin reduces the incidence of pregnancy-induced hypertension, or we may wish to assess whether a new immunosuppressive drug improves the survival rate of transplant recipients.
This module combines the study of technical methodology with discussion of more general research issues, beginning with a discussion of the relative advantages and disadvantages of different types of medical studies. The module will provide a definition and estimation of treatment effects. Furthermore, cross-over trials, issues of sample size determination, and equivalence trials are covered. There is an introduction to flexible trial designs that allow a sample size re-estimation during the ongoing trial. Finally, other relevant topics such as meta-analysis and accommodating confounding at the design stage are briefly discussed.
Students will gain knowledge of the basic elements of clinical trials. They will develop the ability to recognise and use principles of good study design, and will also be able to analyse and interpret study results to make correct scientific inferences.
-
Principles of Epidemiology
Introducing epidemiology, the study of the distribution and determents of disease in human population, this module presents its main principles and statistical methods. The module addresses the fundamental measures of disease, such as indicence, prevalence, risk and rates, including indices of morbidity and mortality.
Students will also develop awareness in epidemiologic study design, such as ecological studies, surveys, and cohort and case-control studies, in addition to diagnostic test studies. Epidemiological concepts will be addressed, such as bias and confounding, matching and stratification, and the module will also address calculation of rates, standardisation and adjustment, as well as issues in screening.
This module provides students with a historical and general overview of epidemiology and related strategies for study design, and should enable students to conduct appropriate methods of analysis for rates and risk of disease. Students will develop skills in critical appraisal of the literature and, in completing this module, will have developed an appreciation for epidemiology and an ability to describe the key statistical issues in the design of ecological studies, surveys, case-control studies, cohort studies and RCT, whilst recognising their advantages and disadvantages.
-
Longitudinal Data Analysis
This module presents an approach to the analysis of longitudinal data, based on statistical modelling and likelihood methods of parameter estimation and hypothesis testing. Among other topics, students will learn about the exploratory and simple analysis strategies, the independence working assumption, normal linear model with correlated errors and generalised estimation questions.
Students will develop an understanding in dealing with correlated data commonly arising in longitudinal studies, as well as an awareness of issues associated with collecting and analysing longitudinal data, whilst gaining a higher level of knowledge different modelling assumptions used in the analysis and their relations to the scientific aims of the study.
On module completion, students will gain the ability to explain the difference between longitudinal studies and cross-sectional studies, in addition to the knowledge required to select appropriate techniques to explore data, and the ability to compare different approaches to estimation and their usage in the analysis. Finally, students will obtain the skill level required to build statistical models for longitudinal data and draw valid conclusions from their models.
-
Environmental Epidemiology
This module introduces students to the kinds of statistical methods commonly used by epidemiologists and statisticians to investigate the relationship between risks of disease with environment factors. Students will discover motivation examples for methods in course, and will engage with spatial point-processes, including the theory and methods for the analysis of point-processes in two-dimensional space.
A number of published studies will be used to illustrate the methods described, and students will learn how to perform similar analyses using the statistical package, R. Students will learn methods and theory for analysing point-patterns, such as univariate and bivariate K-functions, methods for analysing care control data, including kernel intensity estimation, binary regression and generalised additive models. Students will also explore spatial generalised linear-mixed models including Poisson models for counts of a disease in regions and the concept of ecological bias, along with modelling elevated disease risk due to the presence of a point source and continuous spatial variation including the Gaussian geostatistical model, variograms and spatial prediction.
Students will develop the ability to recognise the difference between point process data, area-level data and geostatistical data, and the skills required to define and estimate the intensity of K functions for a spatial point process. Finally, successful students will be able to perform basic analyses of case-control and geostatistical data, along with a broad understanding of practical issues involved in undertaking environmental epidemiology studies.
-
Survival and Event History Analysis
This module addresses a range of topics relating to survival data; censoring, hazard functions, Kaplan-Meier plots, parametric models and likelihood construction will be discussed in detail. Students will engage with the Cox proportional hazard model, partial likelihood, Nelson-Aalen estimation and survival time prediction and will also focus on counting processes, diagnostic methods, and frailty models and effects.
The module provides an understanding of the unique features and statistical challenges surrounding the analysis of survival avant history data, in addition to an understanding of how non-parametric methods can aid in the identification of modelling strategies for time-to-event data, and recognition of the range and scope of survival techniques that can be implemented within standard statistical software.
General skills will be developed, including the ability to express scientific problems in a mathematical language, improvement of scientific writing skills, and an enhanced range of computing skills related to the manipulation on analysis of data.
On successful completion of this module, students will be able to apply a range of appropriate statistical techniques to survival and event history data using statistical software, to accurately interpret the output of statistical analyses using survival models, fitted using standard software, and the ability to construct and manipulate likelihood functions from parametric models for censored data. Students will also gain observation skills, such as the ability to identify when particular models are appropriate, through the application of diagnostic checks and model building strategies.
-
Forecasting
Every managerial decision concerned with future actions is based upon a prediction of some aspects of the future. Therefore Forecasting plays an important role in enhancing managerial decision making.
After introducing the topic of forecasting in organisations, time series patterns and simple forecasting methods (naïve and moving averages) are explored. Then, the extrapolative forecasting methods of exponential smoothing and ARIMA models are considered. A detailed treatment of causal modelling follows, with a full evaluation of the estimated models. Forecasting applications in operations and marketing are then discussed. The module ends with an examination of judgmental forecasting and how forecasting can best be improved in an organisational context. Assessment is through a report aimed at extending and evaluating student learning in causal modelling and time series analysis.
-
Introduction to Intelligent Data Analysis (Data Mining)
This module develops modelling skills on synthetic and empirical data by showing simple statistical methods and introducing novel methods from artificial intelligence and machine learning.
The module will cover a wide range of data mining methods, including simple algorithms such as decision trees all the way to state of the art algorithms of artificial neural networks, support vector regression, k-nearest neighbour methods etc. We will consider both Data Mining methods for descriptive modelling, exploration & data reduction that aim to simplify and add insights to large, complex data sets, and Data Mining methods for predictive modelling that aim to classify and cluster individuals into distinct, disjoint segments with different patterns of behaviour.
The module will also include a series of workshops in which you will learn how to use the SAS Enterprise Miner software for data mining (a software skill much sought after in the job market) and how to use it on real datasets in a real world scenario.
-
Optimisation and Heuristics
Optimisation, sometimes called mathematical programming, has applications in many fields, including operational research, computer science, statistics, finance, engineering and the physical sciences. Commercial optimisation software is now capable of solving many industrial-scale problems to proven optimality.
The module is designed to enable students to apply optimisation techniques to business problems. Building on the introduction to optimisation in the first term, students will be introduced to different problem formulations and algorithmic methods to guide decision making in business and other organisations.
-
Extreme Value Theory
This module aims to develop the asymptotic theory, and associated techniques for modelling and inference, associated with the analysis of extreme values of random processes. The module focuses on the mathematic basis of the models, the statistical principles for implementation and the computational aspects of data modelling.
Students will develop an appreciation of, and facility in, the various asymptotic arguments and models, and will also gain the ability to fit appropriate models to data using specially developed R software, in addition to a working understanding of fitted models. Knowledge in R software computing is an essential skill that is transferrable with a wide range of modules on the mathematics programme, and beyond.
-
Multi Level Models
Introducing data analysis, this module focuses on data that has a multi-level, hierarchical structure. Students will explore the use of multi-level structures in real dataset, working with statistical software such as Stata, R and MLwiN. The module will also provide an understanding in the classical ANOVA model.
Students will develop the ability to perform model diagnostics whilst applying appropriate notation and will develop skills in the area of presenting and interpreting output from statistical models through preparation of the coursework.
-
Methods for Missing Data
This module offers students an advanced understanding of statistics, exploring the idea of missingness as a stochastic process. Students will develop and apply their knowledge of missing-data methods, focusing on the imputation model and the model of interest. More naive methods, such as single imputation and listwise deletion, will also be introduced; students will learn to recognise the limitations of each method and to identify situations where their use may be appropriate.
A portion of the module introduces the VIM software and explores its use for identifying patterns of missingness.
The module will enhance deduction skills: students will become accustomed to the difference between sampling and parameter uncertainty, and will recognise the similarities between the Bayesian and imputation approaches.
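To illustrate the contrast between the naive approaches named above and a model-based alternative, the sketch below applies listwise deletion, single mean imputation and iterative imputation to a small simulated dataset; pandas and scikit-learn are stand-ins chosen for a self-contained example and are not the module's own tools.

```python
# Comparing naive handling of missing data with model-based imputation (illustration only).
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "y"])
df["y"] = df["x1"] + 0.5 * df["x2"] + rng.normal(scale=0.2, size=200)
df.loc[rng.random(200) < 0.3, "y"] = np.nan      # introduce roughly 30% missingness in y

# Listwise deletion: discard any row containing a missing value.
complete_cases = df.dropna()

# Single imputation: replace each missing value with the observed column mean.
mean_imputed = df.fillna(df.mean())

# Model-based imputation: each incomplete variable is imputed from the others.
iterative = pd.DataFrame(IterativeImputer(random_state=0).fit_transform(df),
                         columns=df.columns)

for name, data in [("listwise deletion", complete_cases),
                   ("mean imputation", mean_imputed),
                   ("iterative imputation", iterative)]:
    print(f"{name}: n = {len(data)}, mean(y) = {data['y'].mean():.3f}")
```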
Optional
Information contained on the website with respect to modules is correct at the time of publication, but changes may be necessary, for example as a result of student feedback, Professional Statutory and Regulatory Bodies' (PSRB) requirements, staff changes, and new research.
-
Data Mining
-
Key Information
Please email Dr Chris Edwards with general questions about this MSc.
To discuss a specific data science specialism, please contact the relevant Programme Director:
MSc Data Science - Statistical Inference - Dr Deborah Costain
MSc Data Science - Computing - Dr Chris Edwards
Duration: 12 months.
Entry requirements: An upper-second class honours degree, or its equivalent, in a subject relevant to Computer Science, Mathematics or Statistics
IELTS: 6.5 or equivalent
Assessment: Coursework and examination
Funding: All applicants should consult our information on fees and funding.
For further information please see our website
-
Fees
- UK/EU: Full time £9,500 per year; Part time £4,750 per year
- Overseas: Full time £20,500 per year; Part time not available
The University will not increase the Tuition Fee you are charged during the course of an academic year.
If you are studying on a programme of more than one year's duration, the tuition fees for subsequent years of your programme are likely to increase each year. The way in which continuing students' fee rates are determined varies according to an individual's 'fee status' as set out on our fees webpages.
What are tuition fees for?
Studying at a UK University means that you need to pay an annual fee for your tuition, which covers the costs associated with teaching, examinations, assessment and graduation.
The fee that you will be charged depends on whether you are considered to be a UK, EU or overseas student. Visiting students will be charged a pro-rata fee for periods of study less than a year.
Our annual tuition fee is set for a 12 month session, which usually runs from October to September the following year.
How does Lancaster set overseas tuition fees?
Overseas fees, alongside all other sources of income, allow the University to maintain the full range of its activities and services. Each year the University's Finance Committee considers recommendations for fee increases across all categories of student, taking into account a range of factors including projected cost inflation for the University, comparisons against other high-quality institutions, and external financial factors such as projected exchange rate movements.
What support is available towards tuition fees?
Lancaster University's priority is to support every student in making the most of their education. Many of our students each year will be entitled to bursaries or scholarships to help with the cost of fees and/or living expenses. You can find out more about financial support, studentships, and awards for postgraduate study on our website.
-
Related Courses
- Applied Social Statistics : PhD
- Communication Systems : MSc by Research
- Communication Systems : PhD
- Computer Science : MPhil/PhD
- Computer Science : MSc
- Computer Science (by research) : MSc
- Cyber Security : MSc
- Data Science : PgCert
- Data Science : PgDip
- Electronic Engineering : MSc
- Mathematics : PhD
- Quantitative Finance : MSc
- Statistics : MSc
- Statistics : PGDip
- Statistics : PhD
- Statistics and Operational Research (STOR-i) : MRes
- Statistics and Operational Research (STOR-i) : PhD
Flexible Course Structure
The Data Science MSc is structured into three distinct terms. Your path through the course has a series of options depending on both your academic background and your choice of modules.
Work Placements
Lancaster's Data Science programme has been developed in co-operation with industry to give our students a great start to their data science careers. Preparation for a career in data science is at the heart of Lancaster’s MSc. We have worked with a wide range of organisations to provide our students with insight and experience of real-world data science. Throughout their course our students have contact with representatives from industry through group projects on live data, company talks and skills workshops.

Dom Clarke
WWL NHS Trust
"The whole process was great, everyone at the office welcomed me with open arms and helped me with anything I wanted. I was provided total flexibility to use any methods I desired, it was really MY project. I was gifted with the reward of seeing the predictions implemented and seeing the work I had done presented and discussed in front of executives from trusts around the north-west."

Hwan Lee
AstraZeneca
"This experience was unique and valuable since it allowed me to learn from experts in academia and industry and taught me how to approach a data science project from a business perspective. The placement made me easier to be incorporated into the industry as a future data scientist by providing a clear understanding of a data scientist’s role."

Sean Sheehan
The Behavioural Insights Team
"I learned an enormous amount over course of the project as well as putting into practice and developing the technical skills and knowledge learned on the MSc course, and being able to contribute to a substantial project in this way has increased my confidence in my own ability. The experience has proved to be especially valuable in securing work after completion of the course."

Ioannis Tsalamanis
Uniper Energy
"Personally, the placement provided a transition from the academic environment to the industrial environment and gave me the opportunity to prepare for the job interviews that came afterwards. The fact that there was a short industrial placement after the MSc was highly appreciated by most of the interviewers and offered a great means of describing how my data scientist skills were applied and enhanced."

Placement Hosts
Our students have taken their work placements in a wide range of businesses, from global corporations to local SMEs. These are just some of the places that have provided students with opportunities to gain experience in real-world data science.
Careers and Employability
The demand for people with data science skills is predicted to double over the next five years. This rising need is reflected in the average salary for data scientists, which is now £60,000 per annum.
Designed with Industry
If you wish to pursue a career in data science, you need to be able to demonstrate technical knowledge, an understanding of the role of data in modern enterprises, and an ability to communicate the meaning of data. Our MSc programme has been designed in collaboration with industry to give students the opportunity both to learn the necessary skills and to use them in real-world settings through industry-hosted placements.
Career Options
Our programme opens the door to many possible careers, including:
- Data scientist or data science consultant
- Financial modeller
- Clinical and pharmaceutical analyst
- Data technologies specialist
Our alumni have gone on to data science roles at Amazon, Deloitte, Santander, Bloomberg, The Office of National Statistics, The Environment Agency and more.
Lifetime Support
As a student at Lancaster, you will gain access to our excellent careers service, which offers lifetime support, help and friendly advice to all of our students. This includes one-to-one support and advice on work experience, employability skills and careers.
Enterprise Education
Lancaster University is committed to providing its entrepreneurial students with the support they need to launch their own enterprises. If you wish to start your own company, you can incorporate an Enterprise Project into your studies instead of a work placement, completing the course with a project that will form the basis of a future enterprise; we will help you to develop your ideas.