Thursday, June 22, 2017

Note: Marginal likelihood for Bayesian models with a Gaussian-approximated posterior

I first learned this solution from Botond Cseke. I'm not sure where it originates; it is essentially Laplace's method for approximating integrals using a Gaussian distribution, where the parameters of the Gaussian distribution might come from any of several approximate inference approaches.

If I have a Bayesian statistical model with hyperparameters Θ, whose posterior has no closed form, how can I optimize Θ?

Consider a Bayesian statistical model with observed data y ∈ Y and hidden (latent) variables z ∈ Z, which we infer. We have a prior on z, Pr(z;Θ), and a model for the probability of y given z (the likelihood), Pr(y|z;Θ). The prior and likelihood are controlled by "hyperparameters" Θ, which we would like to estimate. Recall that Bayes' theorem states:

$$\Pr(z|y;\Theta)=\frac{\Pr(y|z;\Theta)\,\Pr(z;\Theta)}{\Pr(y;\Theta)}\tag{1}$$

It is common for the posterior Pr(z|y;Θ) to lack a closed-form solution. In this case, one typically approximates the posterior with a more tractable distribution Q(z) ≈ Pr(z|y;Θ). Common ways of estimating Q(z) include the Laplace approximation, variational Bayes, expectation propagation, and expectation-maximization algorithms. The only approximating distribution in common use for high-dimensional z is the multivariate Gaussian (or some nonlinear transformation thereof), which succinctly captures joint statistics with limited computational overhead. Assume we have an inference procedure which returns the approximate posterior Q(z) = N(μ_q, Σ_q).
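As a concrete illustration of such a procedure, here is a minimal sketch of the Laplace approximation in Python/NumPy/SciPy: find the mode of the log-joint and invert the numerical Hessian of its negative there. The `log_joint` below is a hypothetical toy example (a quadratic whose exact posterior is N([1, 1], I)), not a model from this post.

```python
import numpy as np
from scipy.optimize import minimize

def log_joint(z):
    # Hypothetical toy log-joint ln Pr(y|z) + ln Pr(z; Theta),
    # chosen so the true posterior is N([1, 1], I).
    return -0.5 * np.sum((z - 1.0) ** 2)

def laplace_approximation(log_joint, z0, eps=1e-4):
    """Fit Q(z) = N(mu_q, Sigma_q) at the mode of log_joint."""
    mu_q = minimize(lambda z: -log_joint(z), z0).x  # posterior mode
    d = mu_q.size
    # Hessian of -log_joint at the mode, by central differences.
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei, ej = np.eye(d)[i] * eps, np.eye(d)[j] * eps
            H[i, j] = (-log_joint(mu_q + ei + ej) + log_joint(mu_q + ei - ej)
                       + log_joint(mu_q - ei + ej) - log_joint(mu_q - ei - ej)
                       ) / (4 * eps ** 2)
    Sigma_q = np.linalg.inv(H)  # covariance = inverse curvature at the mode
    return mu_q, Sigma_q

mu_q, Sigma_q = laplace_approximation(log_joint, np.zeros(2))
```

For this toy quadratic the fit is exact: the mode is (1, 1) and the inverse Hessian is the identity.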

We optimize the hyperparameters Θ of the prior to maximize the marginal likelihood of the observations y:

$$\Theta\leftarrow\operatorname*{argmax}_{\Theta}\Pr(y;\Theta),\qquad \Pr(y;\Theta)=\int_{Z}\Pr(y,z;\Theta)\,dz=\int_{Z}\Pr(y|z)\,\Pr(z;\Theta)\,dz\tag{2}$$
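In one dimension the integral in (2) can simply be evaluated by quadrature; it is in high dimensions (and for non-conjugate likelihoods) that it becomes intractable. A sketch with a hypothetical conjugate toy model, z ~ N(0, θ) and y|z ~ N(z, 1), where the closed form y ~ N(0, θ+1) is available as a check:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Hypothetical 1D toy model: prior z ~ N(0, theta), likelihood y|z ~ N(z, 1).
y, theta = 2.0, 1.0

# Numerically integrate Pr(y; theta) = ∫ Pr(y|z) Pr(z; theta) dz.
marginal, _ = quad(
    lambda z: norm.pdf(y, loc=z, scale=1.0) * norm.pdf(z, loc=0.0, scale=np.sqrt(theta)),
    -10.0, 10.0)

# Conjugacy gives the closed form Pr(y; theta) = N(y; 0, theta + 1).
closed_form = norm.pdf(y, loc=0.0, scale=np.sqrt(theta + 1.0))
```

The two values agree to quadrature precision; it is exactly this kind of closed form that is unavailable in general.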

Except in rare special cases, this integral does not have a closed form. However, we have already obtained a Gaussian approximation to the posterior distribution, Q(z) ≈ Pr(z|y;Θ). If we replace Pr(z|y;Θ) with our approximation Q(z) in Bayes' theorem, we can solve for (an approximation of) Pr(y;Θ):

$$Q(z)\approx\frac{\Pr(y|z)\,\Pr(z;\Theta)}{\Pr(y;\Theta)}\quad\Longrightarrow\quad \Pr(y;\Theta)\approx\frac{\Pr(y|z)\,\Pr(z;\Theta)}{Q(z)}\tag{3}$$

Assume the prior is also Gaussian, Pr(z;Θ) = N(μ_z, Σ_z). Working in log-probability, and evaluating the expression at the (approximated) posterior mean z = μ_q, we get

$$\begin{aligned}
\ln\Pr(z{=}\mu_q;\Theta)&=-\tfrac{1}{2}\left\{\ln|2\pi\Sigma_z|+(\mu_q-\mu_z)^\top\Sigma_z^{-1}(\mu_q-\mu_z)\right\}\\
\ln Q(z{=}\mu_q)&=-\tfrac{1}{2}\left\{\ln|2\pi\Sigma_q|+(\mu_q-\mu_q)^\top\Sigma_q^{-1}(\mu_q-\mu_q)\right\}=-\tfrac{1}{2}\ln|2\pi\Sigma_q|\\
\ln\Pr(y;\Theta)&\approx\ln\Pr(y|z{=}\mu_q;\Theta)+\ln\Pr(z{=}\mu_q;\Theta)-\ln Q(z{=}\mu_q)\\
&=\ln\Pr(y|z{=}\mu_q;\Theta)-\tfrac{1}{2}\left\{\ln|\Sigma_q^{-1}\Sigma_z|+(\mu_q-\mu_z)^\top\Sigma_z^{-1}(\mu_q-\mu_z)\right\}.
\end{aligned}\tag{4}$$

This is quite tractable to compute.
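A minimal NumPy sketch of the final expression, assuming a Gaussian prior N(μ_z, Σ_z) and approximate posterior N(μ_q, Σ_q) (the variable names are mine, not from the post). In the conjugate Gaussian case, where Q is the exact posterior, equation (3) holds with equality, so the approximation recovers the exact log marginal likelihood; that makes a convenient sanity check:

```python
import numpy as np

def approx_log_marginal(loglik_at_mu_q, mu_q, mu_z, Sigma_z, Sigma_q):
    """Equation (4): ln Pr(y|z=mu_q; Theta)
    - 1/2 { ln|Sigma_q^-1 Sigma_z| + (mu_q-mu_z)^T Sigma_z^-1 (mu_q-mu_z) }."""
    _, logdet_z = np.linalg.slogdet(Sigma_z)  # ln|Sigma_z|, numerically stable
    _, logdet_q = np.linalg.slogdet(Sigma_q)  # ln|Sigma_q|
    diff = mu_q - mu_z
    quad_term = diff @ np.linalg.solve(Sigma_z, diff)
    return loglik_at_mu_q - 0.5 * ((logdet_z - logdet_q) + quad_term)

# Sanity check on a 1D conjugate model: z ~ N(0, 1), y|z ~ N(z, 1), y = 2,
# so the posterior is N(1, 1/2) and the exact marginal is y ~ N(0, 2).
y = 2.0
mu_z, Sigma_z = np.array([0.0]), np.array([[1.0]])
mu_q, Sigma_q = np.array([1.0]), np.array([[0.5]])
loglik = -0.5 * np.log(2 * np.pi) - 0.5 * (y - mu_q[0]) ** 2
approx = approx_log_marginal(loglik, mu_q, mu_z, Sigma_z, Sigma_q)
```

Here `approx` matches the exact log marginal ln N(2; 0, 2); for non-Gaussian likelihoods the same function gives the approximation, with `loglik` replaced by the model's log-likelihood at μ_q.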
