Statistical Learning Workshop
Titles and Abstracts
We consider the problem of recovering the block structure of an Ising model given independent observations on the binary hypercube. This new model, called the Ising blockmodel, is a perturbation of the mean-field approximation of the Ising model known as the Curie-Weiss model: the sites are partitioned into two blocks of equal size, and the interaction between sites of the same block is stronger than the interaction across blocks, to account for more order within each block. We study probabilistic, statistical and computational aspects of this model in the high-dimensional case, when the number of sites may be much larger than the sample size.
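The two-block structure described above can be illustrated with a small Gibbs sampler. This is a minimal sketch, not the authors' construction: the function name, the coupling parameters `alpha` (within-block) and `beta` (across-block, with `alpha > beta`), and the 1/p scaling are illustrative choices in the Curie-Weiss spirit.

```python
import numpy as np

def gibbs_ising_blockmodel(p=20, alpha=1.2, beta=0.3, n_sweeps=200, rng=None):
    """Gibbs sampler for a two-block Ising model (illustrative sketch).

    Sites 0..p/2-1 form block 1, the rest block 2. The pairwise coupling
    is alpha/p within a block and beta/p across blocks, with alpha > beta
    so that spins within a block are more strongly aligned.
    """
    rng = np.random.default_rng(rng)
    assert p % 2 == 0
    half = p // 2
    block = np.array([0] * half + [1] * half)
    # Coupling matrix: alpha/p on same-block pairs, beta/p across blocks.
    J = np.where(block[:, None] == block[None, :], alpha, beta) / p
    np.fill_diagonal(J, 0.0)
    sigma = rng.choice([-1.0, 1.0], size=p)
    for _ in range(n_sweeps):
        for i in range(p):
            field = J[i] @ sigma  # local field at site i
            # Conditional probability that sigma_i = +1 given the rest.
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))
            sigma[i] = 1.0 if rng.random() < p_plus else -1.0
    return sigma, block
```

Independent draws from such a sampler are the kind of observations from which the block partition would have to be recovered.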
Recently, much work has been done on extending the scope of online learning and incremental stochastic optimization algorithms. In this work we contribute to this effort in two ways. First, based on a new regret decomposition and a generalization of Bregman divergences that requires only directional differentiability of the functions involved, we provide a self-contained, modular analysis of the two workhorses of online learning: (general) adaptive versions of the Mirror Descent (MD) and Follow-the-Regularized-Leader (FTRL) algorithms. The analysis is done with extra care so as not to introduce assumptions not needed in the proofs. This way we are able to reprove, extend and refine a large body of the literature, while keeping the proofs concise. The second contribution is a byproduct of this careful analysis: whereas previously the algorithms mentioned have been analyzed in the convex setting, we provide results for several practically relevant non-convex problem settings, with essentially no extra effort. Some consequences of our general analysis for distributed asynchronous optimization will also be discussed.
Based on joint work with Pooria Joulani and Csaba Szepesvari.
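A classical instance of Mirror Descent with a Bregman divergence is exponentiated gradient on the probability simplex, where the entropic regularizer induces the KL divergence. The sketch below is a textbook illustration of that special case, not the adaptive variants analyzed in the talk; the function name and the fixed step size `eta` are illustrative.

```python
import numpy as np

def exponentiated_gradient(loss_grads, d, eta=0.1):
    """Online Mirror Descent with the entropic regularizer on the simplex.

    loss_grads: iterable of gradient vectors g_t, one per round.
    The mirror step w_{t+1} proportional to w_t * exp(-eta * g_t) is the
    Bregman (KL) projection step for the negative-entropy regularizer.
    """
    w = np.full(d, 1.0 / d)  # uniform starting point
    iterates = [w.copy()]
    for g in loss_grads:
        w = w * np.exp(-eta * np.asarray(g))
        w /= w.sum()  # renormalize: projection back onto the simplex
        iterates.append(w.copy())
    return iterates
```

Swapping the regularizer (e.g. squared Euclidean norm) recovers online gradient descent, which is why a modular Bregman-based analysis covers both algorithms at once.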
One of the main challenges in statistical learning today is to make sense of complex sequential data, which typically represent interesting, unknown phenomena to be inferred. To address the problem from a mathematical perspective, it is usually assumed that the data have been generated by some random process, and the goal is to make inference about the stochastic mechanisms that produce the samples. Since little is usually known about the nature of the data, it is important to address inference beyond parametric and modelling assumptions. One approach is to assume that the process distributions are stationary ergodic but do not belong to any simpler class of processes. This paradigm has proved useful in a number of learning problems that involve dependent sequential data. At the same time, many natural problems turn out to be impossible to solve under this assumption alone. In this talk I will discuss the possibilities and limitations of sequential inference in the stationary ergodic framework. This is specifically analysed in the context of change-point estimation, time-series clustering and the restless bandit problem.
Learning with Kernel Embeddings
Abstract: Embeddings into reproducing kernel Hilbert spaces (RKHS) provide flexible representations of probability measures. They have been used to construct powerful nonparametric hypothesis tests and association measures, and they lead to a notion of Maximum Mean Discrepancy (MMD), a nonparametric distance between probability measures that is popular in the machine learning literature. I will overview recent developments within this framework, including an extension that models differences between probability measures which are invariant to additive symmetric noise, useful for learning on distributions under covariate shift, and a Bayesian method to estimate kernel embeddings, leading to a new approach to learning kernel hyperparameters and detecting multiscale properties in the data.
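The MMD between two distributions is the RKHS distance between their kernel embeddings, and it admits a simple unbiased estimator from samples. The sketch below, with a Gaussian RBF kernel and a hypothetical function name and `bandwidth` parameter, illustrates the standard estimator rather than the extensions discussed in the talk.

```python
import numpy as np

def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased estimate of the squared MMD between samples X and Y,
    using the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 h^2))."""
    def gram(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = gram(X, X), gram(Y, Y), gram(X, Y)
    # Off-diagonal averages give the unbiased within-sample terms.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * Kxy.mean()
```

An MMD two-sample test rejects equality of distributions when this statistic is large relative to a threshold, typically calibrated by permutation.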