Statistical Learning

The world's most successful companies collect and use data as never before: Amazon's product suggestions are tailored to each customer, Google's advertising is targeted at specific individuals, and Tesco's Clubcard offers are matched to a shopper's particular situation. The computational methods used to do so are usually called learning algorithms, since they "learn" from data about the environment and the users.

Statistical learning analyses these algorithms, and develops new ones, from the perspective of statistical theory, using various techniques of pure mathematics. In particular, functional analysis, probability theory and combinatorics have had a profound impact on the analysis and development of learning algorithms. With ever more data and computational power available, expertise in statistical learning is in high demand in industry, and the field is at the forefront of research that will undoubtedly impact every individual's life in the years to come.

The statistical learning group in Lancaster has strong links with industry, particularly in the area of so-called "bandit algorithms". Each time a company has an opportunity to display an advert, it may choose one of several possible adverts to display. Each display opportunity can be used to exploit an advert currently believed to be the best, or to explore an option about which insufficient information is yet known. Managing this exploration-exploitation trade-off is a cornerstone of bandit research, and Lancaster researchers have made fundamental contributions to the area, most recently providing the first theoretical performance guarantees on a popular method called Thompson sampling.
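
To make the exploration-exploitation trade-off concrete, the following is a minimal sketch of Thompson sampling for the advert-display problem. It is an illustration only, not the group's own implementation. It assumes each advert is either clicked or not (a Bernoulli reward), places a uniform Beta(1, 1) prior on each advert's unknown click rate, and simulates users with made-up click rates. The function name and parameters are hypothetical.

```python
import random


def thompson_sampling(click_rates, n_rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling.

    click_rates: hidden true click probability of each advert (assumed,
                 used only to simulate user behaviour).
    Returns the number of clicks and non-clicks recorded for each advert.
    """
    rng = random.Random(seed)
    n_ads = len(click_rates)
    clicks = [0] * n_ads      # observed successes per advert
    non_clicks = [0] * n_ads  # observed failures per advert

    for _ in range(n_rounds):
        # Draw one plausible click rate per advert from its Beta posterior.
        # Poorly explored adverts have wide posteriors, so they sometimes
        # produce the largest draw and get displayed (exploration).
        draws = [
            rng.betavariate(clicks[i] + 1, non_clicks[i] + 1)
            for i in range(n_ads)
        ]
        # Display the advert whose sampled click rate is highest (exploitation
        # of current beliefs).
        chosen = max(range(n_ads), key=lambda i: draws[i])

        # Simulate the user's response and update that advert's counts.
        if rng.random() < click_rates[chosen]:
            clicks[chosen] += 1
        else:
            non_clicks[chosen] += 1

    return clicks, non_clicks


if __name__ == "__main__":
    clicks, non_clicks = thompson_sampling([0.05, 0.5], n_rounds=2000)
    displays = [c + n for c, n in zip(clicks, non_clicks)]
    print("displays per advert:", displays)
```

No explicit exploration schedule is needed in this sketch: as evidence accumulates, the posteriors narrow and the inferior advert is displayed less and less often.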

Furthermore, in modern applications of statistical learning, any method must be sufficiently computationally tractable to run in real time. Both for these bandit algorithms, and for statistical learning algorithms more generally, our research also focusses on making algorithms scale computationally to large-data applications whilst retaining high statistical efficiency.