Concentration of variational approximations of posterior distributions

While Bayesian methods are extremely popular in statistics and machine learning, their application to massive datasets is often challenging, when possible at all. Indeed, classical MCMC algorithms are prohibitively slow when both the model dimension and the sample size are large. Variational Bayesian (VB) methods aim at approximating the posterior by a distribution in a tractable family. Thus, MCMC sampling is replaced by an optimization algorithm which is orders of magnitude faster. VB methods have been applied in such computationally demanding applications as collaborative filtering, image and video processing, NLP and text processing. However, despite very good results in practice, the theoretical properties of these approximations are usually not known. In this talk I will present a general approach to prove concentration of variational approximations of (tempered) posteriors. Our approach also provides a new look at the assumptions usually required to derive concentration of the posterior in Bayesian statistics. I will illustrate the method on several examples, including Bayesian matrix completion.
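For readers unfamiliar with the setup, the objects in question can be sketched as follows; this is the standard formulation of variational approximations of tempered posteriors, and the particular notation is an assumption rather than taken from the talk:

```latex
% Tempered posterior: likelihood L_n raised to a power alpha in (0,1],
% combined with the prior pi (alpha = 1 recovers the usual posterior).
\pi_{n,\alpha}(\theta \mid X_1,\dots,X_n) \;\propto\; L_n(\theta)^{\alpha}\,\pi(\theta)

% Variational approximation: the KL projection of the (tempered)
% posterior onto a tractable family F of distributions.
\tilde{\pi}_{n,\alpha}
  \;=\; \operatorname*{arg\,min}_{q \in \mathcal{F}}
        \ \mathrm{KL}\!\left( q \,\middle\Vert\, \pi_{n,\alpha}(\cdot \mid X_1,\dots,X_n) \right)
```

Concentration results then ask whether $\tilde{\pi}_{n,\alpha}$ puts most of its mass near the true parameter as $n$ grows, in the same way one asks of the posterior itself.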

(Joint work with James Ridgway)