Data Science MSc course structure

You will learn core skills in the first term, while the second and third terms allow you to shape the degree according to your interests and background.

You will begin with a series of core modules that are studied by all students. These core modules are augmented with other modules depending on your academic background. You can then further tailor the course with Specialism modules - Computing and Statistical Inference - each with its own range of designated pathways.

We encourage you to conclude your studies with a 3-month placement project with an external organisation. These generally attract a stipend of £3,000 and will provide you with the professional experience to stand out from the crowd.

If you would like to apply for one of our Data Science Masters degrees, you need to use Lancaster University's My Applications website.

Term 1: Core modules

Term 1 provides core data science knowledge and skills training and is divided into five compulsory modules delivered by the Department of Mathematics and Statistics and the School of Computing and Communications. These modules seek to introduce, develop and consolidate core techniques which will form a strong foundation for the specialist pathways modules in Term 2.

  • Data Science Fundamentals SCC460

    This module teaches students how data science is performed in academia and industry (via invited talks), covers research methods and how different research strategies are applied across disciplines, and introduces data science techniques for processing and analysing data. Students will engage in group project work based on briefs provided by industrial speakers, working within multi-skilled teams (e.g. computing, statistics and environmental science students) to apply their data science skills to researching and solving an industrial data science problem.

    Topics covered will include

    • The role of the data scientist and the evolving epistemology of data science
    • The language of research, how to form research questions, writing literature reviews, and variance of research strategies across disciplines
    • Ethics surrounding data collection and re-sharing, and unwanted inferences
    • Identifying potential data sources and the data acquisition processes
    • Defining and quantifying biases, and data preparation (e.g. cleaning, standardisation, etc.)
    • Choosing a potential model for data, understanding model requirements and constraints, specifying model properties a priori, and fitting models
    • Inspection of data and results using plots, and hypothesis and significance tests
    • Writing up and presenting findings

    Learning

    Students will learn through a series of group exercises around research studies and projects related to data science topics. Invited speakers from industry will give talks on tackling data science problems, showing how data science skills are applied in industry and academia. Students will gain knowledge of:

    • Defining a research question and a hypothesis to be tested, and choosing an appropriate research strategy to test that hypothesis
    • Analysing datasets provided in heterogeneous forms using a range of statistical techniques
    • How to relate potential data sources to a given research question, acquire such data and integrate it together
    • Designing and performing appropriate experiments given a research question
    • Implementing appropriate models for experiments and ensuring that the model is tested in the correct manner
    • Analysing experimental findings and relating these findings back to the original research goal

    Recommended texts and other learning resources

    • O'Neil, C. and Schutt, R. (2013) Doing Data Science: Straight Talk from the Frontline. O'Reilly
    • Trochim, W. (2006) The Research Methods Knowledge Base. Cengage Learning
  • Programming for Data Scientists SCC461

    This module is designed both for students who are completely new to programming and for experienced programmers, bringing both groups to a level where they can handle complex data science problems. Beginners will learn the fundamentals of programming, while experienced students will have the opportunity to sharpen and further develop their skills. Students will learn data-processing techniques, including visualisation and statistical data analysis. To provide a broad foundation for handling the most complex data science tasks, we will also cover problem solving and the development of graphical applications.

    In particular, students will gain experience with two very important open-source languages: R and Python. R is a leading language for statistical analysis, widely applied in academia and industry to handle a variety of different problems. Being able to program in R gives data scientists access to the best and most up-to-date libraries for handling a variety of classical and state-of-the-art statistical methods. Python, on the other hand, is a general-purpose programming language, also widely used for three main reasons: it is easy to learn, being recommended as a "first" programming language; it allows easy and quick development of applications; and it has a great variety of useful and open libraries. For those reasons, Python has also been widely applied in scientific computing and data analysis. Additionally, Python enables the data scientist to easily develop other kinds of useful applications: for example, searching for optimal decisions given a dataset, graphical applications for data gathering, or even programming Raspberry Pi devices to create sensors or robots for data collection. Learning these two languages will therefore not only enable students to develop programming skills, but will also give them direct access to two fundamental languages for contemporary data analysis, scientific computing, and general programming.

    Additionally, students will gain experience by working through exercise tasks and discussing their work with their peers, thereby fostering interpersonal communication skills. Students who are new to programming will find help in their experienced peers, and experienced programmers will learn how to assist beginners and explain fundamental concepts to them.

    Topics covered will include

    • Fundamental programming concepts (statements, variables, functions, loops, etc)
    • Data abstraction (modules, classes, objects, etc)
    • Problem-solving
    • Using libraries for developing applications (e.g., SciPy, PyGames)
    • Performing statistical analysis and data visualisation

    On successful completion of this module, students will be able to

    • Solve data science problems in an automatic fashion
    • Handle complex data-sets, which cannot be easily analysed "by hand"
    • Use existing libraries and/or develop their own libraries
    • Learn new programming languages, given the background knowledge of two important ones
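
    As a flavour of the fundamentals covered, a short Python sketch computing descriptive statistics with the standard library (the readings are invented for illustration):

    ```python
    import statistics

    # A small sample of measurements (hypothetical data for illustration)
    readings = [12.1, 11.8, 12.5, 13.0, 11.9, 12.4, 12.2]

    def summarise(values):
        """Return basic descriptive statistics for a list of numbers."""
        return {
            "n": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "stdev": statistics.stdev(values),  # sample standard deviation
        }

    summary = summarise(readings)
    print(summary)
    ```

    The same analysis could equally be written in R; the point is that a few lines of either language replace repetitive by-hand calculation.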

  • Data Mining SCC403

    This module provides comprehensive coverage of the problems related to data representation, storage, manipulation, retrieval and processing in terms of extracting information from data. It has been designed to provide a fundamental theoretical level of knowledge, together with skills developed in the related laboratory sessions, in this specific aspect of Data Science, which plays an important role in any system and application. In this way it prepares students for the second module on the topic of data, as well as for their projects.

    Topics to be covered will include

    • Data Primer: Setting the scene: Big Data, Cloud Computing; The time, storage and computing power compromise: off-line versus on-line
    • Data Representations
    • Storage Paradigms
    • Vector-space models
    • Hierarchical (agglomerative/divisive)
    • k-means
    • SQL and Relational Data Structures (short refresher)
    • NoSQL: Document stores, graph databases
    • Inference and reasoning
    • Associative and Fuzzy Rules
    • Inference mechanisms
    • Data Processing
    • Clustering
    • Density-based, on-line, evolving
    • Classification
    • Randomness and determinism, frequentist and belief based approaches, probability density, recursive density estimation, averages and moments, important random signals, response of linear systems to random signals, random signal models
    • Discriminative (Linear Discriminant Analysis, Single Perceptron, Multi-layer Perceptron, Learning Vector Classifier, Support Vector Machines), Generative (Naive Bayes)
    • Supervised and unsupervised learning, online and offline systems, adaptive and evolving systems, evolving versus evolutionary systems, normalisation and standardisation
    • Fuzzy Rule-based Classifiers, Regression- or Label-based classifiers
    • Self-learning Classifiers, evolving Classifiers, dynamic data space partitioning using evolving clustering and data clouds, monitoring the quality of the self-learning system online, evolving multi-model predictive systems
    • Semi-supervised Learning (Self-learning, evolving, Bootstrapping, Expectation-Maximisation, ensemble classifiers)
    • Information Extraction vs Retrieval
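
    As a concrete illustration of the clustering techniques listed above, here is a minimal k-means (Lloyd's algorithm) sketch in pure Python; the toy data, seed and iteration count are illustrative choices, not part of the module:

    ```python
    import math
    import random

    def kmeans(points, k, iters=20, seed=0):
        """Lloyd's algorithm: alternate assignment and centroid update."""
        rng = random.Random(seed)
        centroids = rng.sample(points, k)
        for _ in range(iters):
            # Assignment step: attach each point to its nearest centroid
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
                clusters[nearest].append(p)
            # Update step: move each centroid to the mean of its cluster
            for i, members in enumerate(clusters):
                if members:
                    centroids[i] = tuple(sum(c) / len(members)
                                         for c in zip(*members))
        return centroids, clusters

    # Two well-separated 2-D blobs (toy data)
    data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
            (5.0, 5.1), (5.2, 5.0), (5.1, 5.2)]
    centroids, clusters = kmeans(data, k=2)
    ```

    On these data the algorithm recovers the two blobs; real applications add convergence checks and multiple restarts.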

    On successful completion of this module students will

    • Demonstrate understanding of the concepts and specific methodologies for data representation and processing and their applications to practical problems
    • Analyse and synthesise effective methods and algorithms for data representation and processing
    • Develop software scripts that implement advanced data representation and processing, and demonstrate their impact on performance
    • List, explain and generalise the trade-offs of performance and complexity in designing practical solutions for problems of data representation and processing in terms of storage, time and computing power

Term 2: Optional modules

Term 2 allows for specialism in either advanced technical or application areas. The applications offered are those for which there is considerable demand for data scientists. Subject-specific experts from across the University deliver the specific pathways. We currently provide pathways in the areas listed below.

The available modules build upon the core set and assume a prerequisite level of skills and knowledge.

Business intelligence

  • Forecasting

    Every managerial decision concerned with future actions is based upon a prediction of some aspect of the future. Forecasting therefore plays an important role in enhancing managerial decision making.

    After introducing the topic of forecasting in organisations, time series patterns and simple forecasting methods (naïve and moving averages) are explored. Then, the extrapolative forecasting methods of exponential smoothing and ARIMA models are considered. A detailed treatment of causal modelling follows, with a full evaluation of the estimated models. Forecasting applications in operations and marketing are then discussed. The module ends with an examination of judgmental forecasting and how forecasting can best be improved in an organisational context. Assessment is through a report aimed at extending and evaluating student learning in causal modelling and time series analysis.
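
    As a flavour of the extrapolative methods covered, a minimal simple-exponential-smoothing sketch in Python (the demand series and smoothing parameter are invented for illustration):

    ```python
    def exponential_smoothing(series, alpha):
        """Simple exponential smoothing: the forecast is a weighted blend of
        the latest observation and the previous forecast."""
        forecast = series[0]              # initialise with the first observation
        for y in series[1:]:
            forecast = alpha * y + (1 - alpha) * forecast
        return forecast                   # one-step-ahead forecast

    demand = [100, 102, 101, 105, 107, 106]
    print(round(exponential_smoothing(demand, alpha=0.3), 2))
    ```

    With alpha = 1 the method reduces to the naïve forecast (the last observation); smaller alpha gives smoother forecasts that react more slowly to change.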

  • Optimisation and Heuristics

    Optimisation, sometimes called mathematical programming, has applications in many fields, including operational research, computer science, statistics, finance, engineering and the physical sciences. Commercial optimisation software is now capable of solving many industrial-scale problems to proven optimality.

    The module is designed to enable students to apply optimisation techniques to business problems. Building on the introduction to optimisation in the first term, students will be introduced to different problem formulations and algorithmic methods to guide decision making in business and other organisations.

  • Introduction to Intelligent Data Analysis (Data Mining)

    This module develops modelling skills on synthetic and empirical data by showing simple statistical methods and introducing novel methods from artificial intelligence and machine learning.

    The module will cover a wide range of data mining methods, from simple algorithms such as decision trees all the way to state-of-the-art algorithms such as artificial neural networks, support vector regression and k-nearest neighbour methods. We will consider both Data Mining methods for descriptive modelling, exploration and data reduction, which aim to simplify and add insight to large, complex data sets, and Data Mining methods for predictive modelling, which aim to classify and cluster individuals into distinct, disjoint segments with different patterns of behaviour.

    The module will also include a series of workshops in which you will learn how to use the SAS Enterprise Miner software for data mining (a software skill much sought after in the job market) and how to use it on real datasets in a real-world scenario.

Health

  • Principles of Epidemiology

    Introducing epidemiology, the study of the distribution and determinants of disease in human populations, this module presents its main principles and statistical methods. The module addresses the fundamental measures of disease, such as incidence, prevalence, risk and rates, including indices of morbidity and mortality.
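
    To make the fundamental measures concrete, a small worked sketch in Python (the cohort numbers are hypothetical):

    ```python
    def incidence_risk(new_cases, population_at_risk):
        """Cumulative incidence: proportion of an initially disease-free
        population that develops the disease over the study period."""
        return new_cases / population_at_risk

    def incidence_rate(new_cases, person_years):
        """Incidence rate: new cases per unit of person-time at risk."""
        return new_cases / person_years

    # Hypothetical cohort: 50 new cases among 1,000 people followed
    # for a total of 4,500 person-years
    risk = incidence_risk(50, 1000)    # 0.05, i.e. 5% over the period
    rate = incidence_rate(50, 4500)    # cases per person-year
    ```

    The distinction matters because risk ignores follow-up time, whereas the rate accounts for how long each person was actually observed.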

    Students will also develop awareness of epidemiological study designs, such as ecological studies, surveys, and cohort and case-control studies, in addition to diagnostic test studies. Epidemiological concepts will be addressed, such as bias and confounding, matching and stratification, and the module will also address calculation of rates, standardisation and adjustment, as well as issues in screening.

    This module provides students with a historical and general overview of epidemiology and related strategies for study design, and should enable students to conduct appropriate methods of analysis for rates and risk of disease. Students will develop skills in critical appraisal of the literature and, in completing this module, will have developed an appreciation for epidemiology and an ability to describe the key statistical issues in the design of ecological studies, surveys, case-control studies, cohort studies and randomised controlled trials (RCTs), whilst recognising their advantages and disadvantages.

  • Longitudinal Data Analysis

    This module presents an approach to the analysis of longitudinal data, based on statistical modelling and likelihood methods of parameter estimation and hypothesis testing. Among other topics, students will learn about exploratory and simple analysis strategies, the independence working assumption, the normal linear model with correlated errors, and generalised estimating equations.

    Students will develop an understanding of how to deal with correlated data commonly arising in longitudinal studies, as well as an awareness of issues associated with collecting and analysing longitudinal data, whilst gaining a higher level of knowledge of the different modelling assumptions used in the analysis and their relation to the scientific aims of the study.

    On module completion, students will gain the ability to explain the difference between longitudinal studies and cross-sectional studies, in addition to the knowledge required to select appropriate techniques to explore data, and the ability to compare different approaches to estimation and their usage in the analysis. Finally, students will obtain the skill level required to build statistical models for longitudinal data and draw valid conclusions from their models.

  • Survival and Event History Analysis

    This module addresses a range of topics relating to survival data; censoring, hazard functions, Kaplan-Meier plots, parametric models and likelihood construction will be discussed in detail. Students will engage with the Cox proportional hazard model, partial likelihood, Nelson-Aalen estimation and survival time prediction and will also focus on counting processes, diagnostic methods, and frailty models and effects.
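
    As an illustration of the Kaplan-Meier estimator mentioned above, a minimal Python sketch (it assumes distinct event times, i.e. no ties, and the toy data are invented):

    ```python
    def kaplan_meier(times, events):
        """Kaplan-Meier estimate of the survival function.
        times:  observed times; events: 1 = event occurred, 0 = censored."""
        order = sorted(range(len(times)), key=lambda i: times[i])
        at_risk = len(times)
        survival = 1.0
        curve = []
        for i in order:
            if events[i] == 1:               # event: survival steps down
                survival *= (at_risk - 1) / at_risk
                curve.append((times[i], survival))
            at_risk -= 1                     # event or censoring leaves the risk set
        return curve

    # Toy data: 6 subjects, two censored (event = 0)
    times  = [2, 3, 4, 5, 6, 7]
    events = [1, 1, 0, 1, 0, 1]
    km = kaplan_meier(times, events)
    ```

    Note how the censored observations at times 4 and 6 shrink the risk set without producing a step in the curve; that is precisely how censoring is handled.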

    The module provides an understanding of the unique features and statistical challenges surrounding the analysis of survival and event history data, in addition to an understanding of how non-parametric methods can aid in the identification of modelling strategies for time-to-event data, and recognition of the range and scope of survival techniques that can be implemented within standard statistical software.

    General skills will be developed, including the ability to express scientific problems in a mathematical language, improvement of scientific writing skills, and an enhanced range of computing skills related to the manipulation and analysis of data.

    On successful completion of this module, students will be able to apply a range of appropriate statistical techniques to survival and event history data using statistical software, to accurately interpret the output of statistical analyses using survival models fitted with standard software, and to construct and manipulate likelihood functions from parametric models for censored data. Students will also develop the ability to identify when particular models are appropriate, through the application of diagnostic checks and model-building strategies.

  • Bioinformatics

    This course will equip students with a working knowledge of the main themes in bioinformatics. On successful completion, students should be confident and competent in all aspects of bioinformatics that can be executed via the web or on software running on Windows/Mac systems. They will have an understanding of the theoretical algorithms that underpin the various software applications that they use, and be able to perform bioinformatics within their own biological sub-field. More generally, this module also aims to encourage students to access and evaluate information from a variety of sources and to communicate the principles in a way that is well-organised, topical and recognises the limits of current hypotheses. It also aims to equip students with practical techniques including data collection, analysis and interpretation.

  • Modelling of Infectious Diseases

    This module aims to provide students with the necessary knowledge, and analytical and modelling skills to develop and fit mathematical transmission models to understand infection dynamics, explore interventions, and to inform control policy. It will also provide students with the ability to analyse outbreak information, and to implement transmission models using the R programming language. Students will gain experience of handling and linking epidemiological data relevant to infectious disease outbreaks. They will gain hands-on experience of developing transmission models, appropriate to a specific research question or epidemiological application, and of using those models for scenario exploration. Students will also gain experience in communicating and presenting epidemic models and their outputs.

  • Clinical Trials

    Clinical trials are planned experiments on human beings designed to assess the relative benefits of one or more forms of treatment. For instance, we might be interested in studying whether aspirin reduces the incidence of pregnancy-induced hypertension, or we may wish to assess whether a new immunosuppressive drug improves the survival rate of transplant recipients.

    This module combines the study of technical methodology with discussion of more general research issues, beginning with a discussion of the relative advantages and disadvantages of different types of medical studies. The module will cover the definition and estimation of treatment effects. Furthermore, cross-over trials, issues of sample size determination, and equivalence trials are covered. There is an introduction to flexible trial designs that allow sample size re-estimation during an ongoing trial. Finally, other relevant topics such as meta-analysis and accommodating confounding at the design stage are briefly discussed.

    Students will gain knowledge of the basic elements of clinical trials. They will develop the ability to recognise and use principles of good study design, and will also be able to analyse and interpret study results to make correct scientific inferences.

Environmental

  • Geoinformatics

    This module introduces students to the fundamental principles of Geographical Information Systems (GIS) and Remote Sensing (RS) and shows how these complementary technologies may be used to capture/derive, manipulate, integrate, analyse and display different forms of spatially-referenced environmental data. The module is highly vocational with theory-based lectures complemented by hands-on practical sessions using state-of-the-art software (ArcGIS & ERDAS Imagine).

    In addition to the subject-specific aims, the module provides students with a range of generic skills to synthesise geographical data, develop suitable approaches to problem-solving, undertake independent learning (including time management) and present the results of the analysis in novel graphical formats.

  • Modelling Environmental Processes

    This module provides an introduction to the basic principles and approaches to computer-aided modelling of environmental processes, with applications to real environmental problems such as catchment modelling, pollutant dispersal in rivers and estuaries, and population dynamics. More generally, the module provides an introduction to general aspects of dynamic systems modelling, including the role of uncertainty and data in the modelling process.

  • Extreme Value Theory

    This module aims to develop the asymptotic theory, and associated techniques for modelling and inference, associated with the analysis of extreme values of random processes. The module focuses on the mathematical basis of the models, the statistical principles for implementation and the computational aspects of data modelling.

    Students will develop an appreciation of, and facility in, the various asymptotic arguments and models, and will also gain the ability to fit appropriate models to data using specially developed R software, in addition to a working understanding of fitted models. Knowledge of R is an essential skill that is transferable across a wide range of modules on the mathematics programme, and beyond.
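
    As a flavour of the modelling workflow (the module itself uses R; this Python sketch shows only the first step, extracting block maxima to which a GEV distribution would then be fitted; the rainfall figures are invented):

    ```python
    def block_maxima(series, block_size):
        """Split a series into consecutive blocks and keep each block's maximum.
        These maxima are the data to which a GEV distribution would be fitted."""
        return [max(series[i:i + block_size])
                for i in range(0, len(series), block_size)]

    # Hypothetical daily rainfall over three "years" of 5 days each
    rainfall = [3, 7, 2, 9, 4,   1, 12, 5, 3, 6,   8, 2, 10, 4, 7]
    annual_maxima = block_maxima(rainfall, block_size=5)
    print(annual_maxima)  # → [9, 12, 10]
    ```

    The asymptotic theory covered in the module justifies why these block maxima, rather than the raw series, follow a GEV distribution in the limit.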

Societal

  • Methods for Missing Data

    This module offers students an advanced understanding of statistics and explores the idea of missingness as a stochastic process. Students will develop and apply their knowledge of missing data methods, focusing on the imputation model and the model of interest. More naive methods will be introduced, such as single imputation and listwise deletion, and students will develop the ability to recognise the limitations of each method and to identify situations where their use may be appropriate.
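
    To illustrate the naive methods mentioned above, a minimal Python sketch of listwise deletion and single mean imputation (the toy data are invented):

    ```python
    def listwise_deletion(rows):
        """Drop any row containing a missing value (None)."""
        return [r for r in rows if None not in r]

    def mean_imputation(rows):
        """Replace each missing value with its column mean (a naive method:
        it shrinks variance and ignores uncertainty about the filled-in value)."""
        cols = list(zip(*rows))
        means = [sum(v for v in col if v is not None) /
                 sum(1 for v in col if v is not None) for col in cols]
        return [[means[j] if v is None else v for j, v in enumerate(r)]
                for r in rows]

    data = [[1.0, 10.0], [2.0, None], [3.0, 30.0], [None, 20.0]]
    complete = listwise_deletion(data)   # keeps 2 of 4 rows
    imputed = mean_imputation(data)
    ```

    Both methods are easy to apply but, as the module discusses, each rests on assumptions about the missingness mechanism that often fail in practice.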

    A portion of the module will introduce VIM software and explore its uses for finding missingness patterns.

    This module will enhance deduction skills, and students will become accustomed to the differences between sampling and parameter uncertainty, in addition to noticing similarities between the Bayesian and imputation approaches.

  • Principles of Epidemiology

    Introducing epidemiology, the study of the distribution and determinants of disease in human populations, this module presents its main principles and statistical methods. The module addresses the fundamental measures of disease, such as incidence, prevalence, risk and rates, including indices of morbidity and mortality.

    Students will also develop awareness of epidemiological study designs, such as ecological studies, surveys, and cohort and case-control studies, in addition to diagnostic test studies. Epidemiological concepts will be addressed, such as bias and confounding, matching and stratification, and the module will also address calculation of rates, standardisation and adjustment, as well as issues in screening.

    This module provides students with a historical and general overview of epidemiology and related strategies for study design, and should enable students to conduct appropriate methods of analysis for rates and risk of disease. Students will develop skills in critical appraisal of the literature and, in completing this module, will have developed an appreciation for epidemiology and an ability to describe the key statistical issues in the design of ecological studies, surveys, case-control studies, cohort studies and randomised controlled trials (RCTs), whilst recognising their advantages and disadvantages.

  • Longitudinal Data Analysis

    This module presents an approach to the analysis of longitudinal data, based on statistical modelling and likelihood methods of parameter estimation and hypothesis testing. Among other topics, students will learn about exploratory and simple analysis strategies, the independence working assumption, the normal linear model with correlated errors, and generalised estimating equations.

    Students will develop an understanding of how to deal with correlated data commonly arising in longitudinal studies, as well as an awareness of issues associated with collecting and analysing longitudinal data, whilst gaining a higher level of knowledge of the different modelling assumptions used in the analysis and their relation to the scientific aims of the study.

    On module completion, students will gain the ability to explain the difference between longitudinal studies and cross-sectional studies, in addition to the knowledge required to select appropriate techniques to explore data, and the ability to compare different approaches to estimation and their usage in the analysis. Finally, students will obtain the skill level required to build statistical models for longitudinal data and draw valid conclusions from their models.

  • Introduction to Intelligent Data Analysis (Data Mining)

    This module develops modelling skills on synthetic and empirical data by showing simple statistical methods and introducing novel methods from artificial intelligence and machine learning.

    The module will cover a wide range of data mining methods, from simple algorithms such as decision trees all the way to state-of-the-art algorithms such as artificial neural networks, support vector regression and k-nearest neighbour methods. We will consider both Data Mining methods for descriptive modelling, exploration and data reduction, which aim to simplify and add insight to large, complex data sets, and Data Mining methods for predictive modelling, which aim to classify and cluster individuals into distinct, disjoint segments with different patterns of behaviour.

    The module will also include a series of workshops in which you will learn how to use the SAS Enterprise Miner software for data mining (a software skill much sought after in the job market) and how to use it on real datasets in a real-world scenario.

Computing

  • Applied Data Mining

    This module provides students with up-to-date information on current applications of data in both industry and research. Expanding on the module ‘Fundamentals of Data’, students will gain a more detailed level of understanding about how data is processed and applied on a large scale across a variety of different areas.

    Students will develop knowledge in different areas of science and will recognise their relation to big data, in addition to understanding how large-scale challenges are being addressed with current state-of-the-art techniques. The module will cover recommendation on the Social Web and its roots in social network theory and analysis, as well as its adaptation and extension to large-scale problems, focusing on user-generated content and crowd-sourced data, social networks (theories and analysis), and recommendation (collaborative filtering, content recommendation challenges, and friend recommendation/link prediction).

    On completion of this module, students will be able to create scalable solutions to problems involving data from the semantic, social and scientific web, and will be able to process networks and perform network analysis in order to identify key factors in information flow.

  • Building Big Data Systems

    In this module we explore the architectural approaches, techniques and technologies that underpin today's Big Data system infrastructure and particularly large-scale enterprise systems. It is one of two complementary modules that comprise the Systems stream of the Computer Science MSc, which together provide a broad knowledge and context of systems architecture enabling students to assess new systems technologies, to know where technologies fit in the larger scheme of enterprise systems and state of the art research thinking, and to know what to read to go deeper.

    The principal ethos of the module is to focus on the principles of Big Data systems, and applying those principles using state of the art technology to engineer and lead data science projects. Detailed case studies and invited industrial speakers will be used to provide supporting real-world context and a basis for interactive seminar discussions.

  • Distributed Artificial Intelligence

    Distributed artificial intelligence is fundamental in contemporary data analysis. Large volumes of data and computation call for multiple computers in problem solving, and being able to understand and use those resources efficiently is an important skill for a data scientist. A distributed approach is also important for fault tolerance and robustness, as the loss of a single component must not significantly compromise the whole system. Additionally, contemporary and future distributed systems go beyond computer clusters and networks: they are often composed of multiple agents - software, humans and/or robots - that all interact in problem solving. As data scientists, we may have control of the full distributed system, or of only one piece, and we have to decide how that piece must behave in the face of others in order to accomplish our goals.

Term 3: Dissertation and work placement

We have arranged placement projects for over 200 students over the last 6 years of our programme. Our students have undertaken projects at many leading companies, including Unilever, Siemens and the Bank of England.

If you do not choose to work in a partner organisation, you also have the opportunity to base your dissertation on either:

  • A research project based at the University - if you are interested in building a career in data science research.
  • An enterprise project - if you are interested in starting your own data science business.

We have worked with a wide range of organisations to provide our students with insight and experience of real-world data science. Throughout the course, our students have contact with representatives from industry through group projects on live data, company talks and skills workshops. Our students have undertaken placement projects at Unilever, Raytheon, the Bank of England, Siemens, The Co-operative Insurance and many more. Within our programme, we also emphasise helping students develop the transferable skills that will enable them to make the most of career opportunities. In addition to our industry seminars, we work closely with students to help them present themselves to potential employers most effectively.

Our placement process has been consistently praised both by our partner organisations and by our students as offering exceptional opportunities for students to launch their data science careers.
