Course Structure
The Data Science MSc is structured into three distinct terms. Your path through the course has a series of options depending on both your academic background and your choice of modules.
Term 1 provides core data scientific knowledge and skills training and is divided into five study modules, worth a total of 75 credits - 15 credits per module. You will study three Common Core data science modules that are compulsory, together with two Core statistics modules that are Specialism-specific and tailored according to your academic background - Statistics I or Statistics II.
This module teaches students about how data science is performed within academia and industry (via invited talks), about research methods and how different research strategies are applied across disciplines, and about data science techniques for processing and analysing data. Students will engage in group project work, based on project briefs provided by industrial speakers, in multi-skilled teams (e.g. computing students, statistics students, environmental science students) in order to apply their data science skills to researching and solving an industrial data science problem.
Students will learn through a series of group exercises around research studies and projects related to data science topics. Invited talks from industry tackling data science problems will be given to teach the students about the application of data science skills in industry and academia. Students will gain knowledge of:
This module is designed both for students who are completely new to programming and for experienced programmers, bringing both groups to a high level of skill for handling complex data science problems. Beginner students will learn the fundamentals of programming, while experienced students will have the opportunity to sharpen and further develop their programming skills. Students will learn data-processing techniques, including visualisation and statistical data analysis. To give students a broad grounding for the most complex data science tasks, we will also cover problem solving and the development of graphical applications.
In particular, students will gain experience with two very important open source languages: R and Python. R is the best language for statistical analysis, being widely applied in academia and industry to handle a variety of different problems. Being able to program in R gives data scientists access to the best and most up-to-date libraries for a variety of classical and state-of-the-art statistical methods. Python, on the other hand, is a general-purpose programming language, also widely used for three main reasons: it is easy to learn, being recommended as a "first" programming language; it allows easy and quick development of applications; and it has a great variety of useful and open libraries. For those reasons, Python has also been widely applied for scientific computing and data analysis. Additionally, Python enables the data scientist to easily develop other kinds of useful applications: for example, searching for optimal decisions given a data set, graphical applications for data gathering, or even programming Raspberry Pi devices in order to create sensors or robots for data collection. Therefore, learning these two languages will not only enable students to develop programming skills, but will also give them direct access to two fundamental languages for contemporary data analysis, scientific computing, and general programming.
Additionally, students will gain experience by working through exercise tasks and discussing their work with their peers, thereby fostering interpersonal communication skills. Students who are new to programming will find help from their experienced peers, and experienced programmers will learn how to assist beginners and explain the fundamental concepts to them.
This module will provide comprehensive coverage of the problems related to data representation, storage, manipulation, retrieval and processing in terms of extracting information from the data. It has been designed to provide a fundamental theoretical level of knowledge, and skills (in the related laboratory sessions), in this specific aspect of Data Science, which plays an important role in any system and application. In this way it prepares students for the second module on the topic of Data as well as for their projects.
Statistics Modules I is for students with a degree in Mathematics and/or Statistics. Statistics Modules II is for students with A-level Mathematics or equivalent.
The aim of this module is to address the fundamentals of statistics for those who do not have a mathematics and statistics undergraduate degree. Building upon the pre-learning ‘mathematics for statistics’ module, it is delivered over five weeks via a series of lectures and practicals. Students will develop an understanding of the theory behind core statistical topics: sampling, hypothesis testing, and modelling. They will also put this knowledge into practice by applying it to real data to address research questions.
The module is an amalgamation of three short courses and is taught in weeks 1-5.
There will be three pieces of coursework:
This module aims to provide an in-depth understanding of statistics as a general approach to the problem of making valid inferences about relationships from observational and experimental studies. The emphasis will be on the principle of Maximum Likelihood as a unifying theory for estimating parameters. The module is delivered as a combination of lectures and practicals over four weeks.
Students will learn by applying the concepts and techniques covered in the module to real data sets. Students will be encouraged to examine issues of substantive interest in these studies. Students will acquire knowledge of:
Generalised linear models are now one of the most frequently used statistical tools of the applied statistician. They extend the ideas of regression analysis to a wider class of problems that involve exploring the relationship between a response and one or more explanatory variables. In this course we aim to discuss applications of generalised linear models to a diverse range of practical problems involving data from areas such as biology, the social sciences and time series, and to explore the theoretical basis of these models.
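As a rough illustration of what a generalised linear model looks like in practice, the sketch below fits a Poisson regression with a log link using Newton-Raphson (Fisher scoring). It is not taken from the course materials: the function name `fit_poisson_glm` and the toy counts are hypothetical, and the data are chosen so that the response doubles with each unit of the explanatory variable. In practice a statistical library would normally be used instead of hand-written code.

```python
import math


def fit_poisson_glm(x, y, n_iter=25, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson (Fisher scoring).

    Hypothetical helper for illustration: one explanatory variable only.
    """
    # Start from the intercept-only model: b0 = log(mean of y), b1 = 0.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (2 x 2, symmetric).
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system (information) * step = score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Toy data (assumed for illustration): counts that double with each unit of x.
x = [0, 1, 2, 3, 4]
y = [1, 2, 4, 8, 16]
intercept, slope = fit_poisson_glm(x, y)
print(f"intercept = {intercept:.4f}, slope = {slope:.4f}")
```

Because the toy response doubles at every step, the fitted slope should come out close to log 2, which on the log-link scale means each unit increase in x multiplies the expected count by about two.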
This course considers the idea of statistical models and how the likelihood function, defined to be the probability of the observed data viewed as a function of unknown model parameters, can be used to make inference about those parameters. This inference includes both estimates of the values of these parameters, and measures of the uncertainty surrounding these estimates. We consider single and multi-parameter models, and models which do not assume the data are independent and identically distributed. We also cover computational aspects of likelihood inference that are required in many practical applications, including numerical optimization of likelihood functions and bootstrap methods to estimate uncertainty.
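The ideas in this description can be sketched in a few lines of Python. The example below is an illustration only, not course material: it assumes a simple exponential model with an unknown rate, simulates hypothetical data, maximises the likelihood numerically with a golden-section search, and uses a nonparametric bootstrap to estimate the standard error of the estimate. All function names are made up for the sketch.

```python
import math
import random


def neg_log_likelihood(rate, data):
    """Negative log-likelihood of i.i.d. exponential data for a given rate."""
    if rate <= 0:
        return math.inf
    return -(len(data) * math.log(rate) - rate * sum(data))


def minimise_golden(f, lo, hi, tol=1e-6):
    """Golden-section search for the minimiser of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2


def fit_rate(data):
    """Maximum likelihood estimate of the rate, found numerically."""
    return minimise_golden(lambda r: neg_log_likelihood(r, data), 1e-6, 100.0)


def bootstrap_se(data, n_boot=200, seed=0):
    """Bootstrap standard error: refit the model on resampled data sets."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]
        estimates.append(fit_rate(resample))
    mean = sum(estimates) / n_boot
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / (n_boot - 1))


# Simulated data (assumed for illustration): 200 draws with true rate 2.0.
rng = random.Random(42)
data = [rng.expovariate(2.0) for _ in range(200)]

rate_hat = fit_rate(data)
se_hat = bootstrap_se(data)
print(f"estimated rate = {rate_hat:.3f}, bootstrap SE = {se_hat:.3f}")
```

For this particular model the maximum likelihood estimate has a closed form, the number of observations divided by their sum, so the numerical answer can be checked directly. The numerical approach is shown because it carries over to models where no closed form exists.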