Statistical Learning Workshop

The statistical learning workshop, organised by Azadeh Khaleghi and funded by an LMS Scheme 1 Grant, will take place at Lancaster University on 23rd March 2017. The workshop is centred on statistical learning theory and machine learning, an interdisciplinary research topic at the intersection of mathematics, statistics and computer science. The focus of the meeting will be on the theoretical foundations of learning, where the objective is to construct efficient automatic inference, learning and decision-making algorithms which come with theoretical guarantees and require little to no human supervision to function as desired.

Participation

Everybody is welcome to attend this event and there is no registration fee. Please note that the LMS runs a Caring Supplementary Grant scheme, which allows UK-based mathematicians with caring duties who take part in meetings like ours to apply for help covering caring costs.

Venue

All talks will be held in the Postgraduate Statistics Centre Lecture Theatre (Room A54, building 48 on the campus map). Information on travelling to Lancaster University, together with maps, can be found here.

Schedule

  • 12:00 - 13:00: Azadeh Khaleghi
  • 13:00 - 14:00: Lunch break
  • 14:00 - 15:00: Quentin Berthet
  • 15:00 - 16:00: Andras Gyorgy
  • 16:00 - 16:30: Coffee break
  • 16:30 - 17:30: Dino Sejdinovic
  • 18:30: Conference Dinner

Abstracts

Exact recovery in the Ising blockmodel

Quentin Berthet

We consider the problem of recovering the block structure of an Ising model given independent observations on the binary hypercube. This new model, called the Ising blockmodel, is a perturbation of the mean-field approximation of the Ising model known as the Curie-Weiss model: the sites are partitioned into two blocks of equal size, and the interaction between sites of the same block is stronger than that across blocks, to account for more order within each block. We study probabilistic, statistical and computational aspects of this model in the high-dimensional case, when the number of sites may be much larger than the sample size.
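
For concreteness, here is a minimal sketch of the form such a distribution takes (the exact parametrisation used in the talk may differ): with the n sites split into two equal blocks and coupling parameters alpha > beta, a spin configuration sigma in {-1, +1}^n is assigned probability

    \[
      p(\sigma) \;\propto\; \exp\!\Bigg(
          \frac{\alpha}{2n} \sum_{i \neq j \ \text{in the same block}} \sigma_i \sigma_j
        \;+\; \frac{\beta}{2n} \sum_{i,\, j \ \text{in different blocks}} \sigma_i \sigma_j
      \Bigg).
    \]

The stronger within-block coupling alpha encourages spins inside a block to agree, and exact recovery asks for the partition to be identified, with high probability, from independent samples of sigma.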

A Modular Analysis of Generalized Online and Stochastic Optimization with Applications to Non-Convex Optimization

Andras Gyorgy

Recently, much work has been done on extending the scope of online learning and incremental stochastic optimization algorithms. In this work we contribute to this effort in two ways. First, based on a new regret decomposition and a generalization of Bregman divergences that requires only directional differentiability of the functions involved, we provide a self-contained, modular analysis of the two workhorses of online learning: (general) adaptive versions of Mirror Descent (MD) and the Follow-the-Regularized-Leader (FTRL) algorithms. The analysis is done with extra care so as not to introduce assumptions not needed in the proofs. This way we are able to reprove, extend and refine a large body of the literature while keeping the proofs concise. The second contribution is a byproduct of this careful analysis: whereas the algorithms mentioned above have previously been analyzed in the convex setting, we provide results for several practically relevant non-convex problem settings, with essentially no extra effort. Some consequences of our general analysis for distributed asynchronous optimization will also be discussed.
Based on joint work with Pooria Joulani and Csaba Szepesvari.
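
As a rough, hedged illustration of the kind of algorithm covered by this type of analysis, the sketch below implements online mirror descent on the probability simplex with the negative-entropy mirror map (exponentiated gradient). The constant step size and the toy linear losses are illustrative assumptions, not the adaptive schemes analysed in the talk.

    import numpy as np

    def exponentiated_gradient(loss_grads, d, eta=0.1):
        """Online mirror descent on the probability simplex with the
        negative-entropy mirror map (exponentiated gradient).

        loss_grads: iterable of functions g_t(w) returning the gradient of
                    the t-th loss at the current point w.
        d:          dimension of the decision variable.
        eta:        fixed step size (adaptive schedules are beyond this sketch).
        """
        w = np.full(d, 1.0 / d)           # uniform starting point on the simplex
        iterates = [w.copy()]
        for g_t in loss_grads:
            g = g_t(w)                    # gradient of the current loss at w
            w = w * np.exp(-eta * g)      # mirror (multiplicative) update
            w /= w.sum()                  # Bregman projection back onto the simplex
            iterates.append(w.copy())
        return iterates

    # Toy usage: linear losses <z_t, w> with random z_t (illustrative only).
    rng = np.random.default_rng(0)
    grads = [lambda w, z=rng.normal(size=5): z for _ in range(100)]
    final_iterate = exponentiated_gradient(grads, d=5)[-1]

Choosing the Euclidean mirror map instead recovers projected online gradient descent; the modular analysis mentioned in the abstract treats such variants within one framework.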

Consistent Sequential Learning Algorithms for Highly Dependent Data

Azadeh Khaleghi

One of the main challenges in statistical learning today is to make sense of complex sequential data, which typically represent interesting, unknown phenomena to be inferred. To address the problem from a mathematical perspective, it is usually assumed that the data have been generated by some random process, and the goal is to make inferences about the stochastic mechanisms that produce the samples. Since little is usually known about the nature of the data, it is important to address inference beyond parametric and modelling assumptions. One approach is to assume that the process distributions are stationary ergodic but do not belong to any simpler class of processes. This paradigm has proved useful in a number of learning problems involving dependent sequential data. At the same time, many natural problems turn out to be impossible to solve under this assumption alone. In this talk, I will discuss the possibilities and limitations of sequential inference in the stationary ergodic framework, specifically in the context of change-point estimation, time-series clustering and the restless bandit problem.
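
As a hedged illustration of the nonparametric flavour of such methods (an assumption about the general approach, not necessarily the estimators used in the talk), one simple way to compare two discrete-valued sequences without modelling assumptions is through the empirical frequencies of short patterns, with longer patterns down-weighted:

    from collections import Counter
    import numpy as np

    def empirical_distance(x, y, max_len=3):
        """A simple empirical distance between two discrete sequences:
        total-variation gap between pattern frequencies, summed over
        pattern lengths 1..max_len with geometrically decaying weights.
        (Illustrative construction, not the talk's estimator.)
        """
        d = 0.0
        for m in range(1, max_len + 1):
            fx = Counter(tuple(x[i:i + m]) for i in range(len(x) - m + 1))
            fy = Counter(tuple(y[i:i + m]) for i in range(len(y) - m + 1))
            nx, ny = sum(fx.values()), sum(fy.values())
            tv = sum(abs(fx[p] / nx - fy[p] / ny) for p in set(fx) | set(fy))
            d += 2.0 ** (-m) * tv
        return d

    # Toy usage: two binary sequences with different marginals.
    rng = np.random.default_rng(1)
    x = rng.integers(0, 2, size=1000)
    y = (rng.random(1000) < 0.7).astype(int)
    print(empirical_distance(x, y))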

Learning with Kernel Embeddings

Dino Sejdinovic

Embeddings into reproducing kernel Hilbert spaces (RKHS) provide flexible representations of probability measures. They have been used to construct powerful nonparametric hypothesis tests and association measures, and they give rise to the Maximum Mean Discrepancy (MMD), a nonparametric distance between probability measures that is popular in the machine learning literature. I will give an overview of recent developments within this framework, including an extension that models differences between probability measures which are invariant to additive symmetric noise, useful for learning on distributions under covariate shift, and a Bayesian method for estimating kernel embeddings, leading to a new approach to learning kernel hyperparameters and detecting multiscale properties in the data.
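
To make the MMD concrete, here is a minimal sketch of its standard biased empirical estimate between two samples; the Gaussian kernel and the fixed bandwidth are illustrative choices, not the particular constructions discussed in the talk.

    import numpy as np

    def gaussian_kernel(A, B, sigma=1.0):
        """Gaussian (RBF) kernel matrix between the rows of A and B."""
        sq = (np.sum(A**2, axis=1)[:, None]
              + np.sum(B**2, axis=1)[None, :]
              - 2.0 * A @ B.T)
        return np.exp(-sq / (2.0 * sigma**2))

    def mmd_squared_biased(X, Y, sigma=1.0):
        """Biased empirical estimate of MMD^2 between samples X and Y:
        mean k(X,X) + mean k(Y,Y) - 2 * mean k(X,Y)."""
        return (gaussian_kernel(X, X, sigma).mean()
                + gaussian_kernel(Y, Y, sigma).mean()
                - 2.0 * gaussian_kernel(X, Y, sigma).mean())

    # Toy usage: samples from two Gaussians with shifted means.
    rng = np.random.default_rng(0)
    X = rng.normal(0.0, 1.0, size=(200, 2))
    Y = rng.normal(0.5, 1.0, size=(200, 2))
    print(mmd_squared_biased(X, Y))

With a characteristic kernel such as the Gaussian, the population MMD is zero exactly when the two distributions coincide, which is what makes it usable as a test statistic.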