These notes provide the derivatives of the KL divergence between two multivariate Gaussian distributions $q$ and $p$ with respect to a few parameterizations of the covariance matrix of $q$. This is useful for variational Gaussian process inference, where clever parameterizations of the posterior covariance are required to make the problem tractable. Tables for differentiating matrix-valued functions can be found in The Matrix Cookbook.
Consider two multivariate Gaussian distributions in $n$ dimensions,

$$q = \mathcal N(\mu, \Sigma), \qquad p = \mathcal N(\mu_0, \Sigma_0).$$

In variational Bayesian inference, we minimize a loss consisting of the expected negative log-likelihood plus the KL divergence from the approximating posterior $q$ to the prior $p$,

$$D_{\mathrm{KL}}(q \,\|\, p) = \tfrac{1}{2}\left[ \ln\frac{|\Sigma_0|}{|\Sigma|} - n + \operatorname{tr}\!\left(\Sigma_0^{-1}\Sigma\right) + (\mu_0 - \mu)^\top \Sigma_0^{-1} (\mu_0 - \mu) \right].$$

Here we focus on the KL term.
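For reference, the KL divergence between two Gaussians can be implemented and sanity-checked directly. The sketch below uses NumPy; the function and variable names are our own, and the checks only confirm that the divergence vanishes when $q = p$ and is nonnegative otherwise.

```python
import numpy as np

def kl_gaussian(mu, S, mu0, S0):
    """KL( N(mu, S) || N(mu0, S0) ) for multivariate Gaussians."""
    n = len(mu)
    S0inv = np.linalg.inv(S0)
    d = mu0 - mu
    return 0.5 * (np.linalg.slogdet(S0)[1] - np.linalg.slogdet(S)[1]
                  - n + np.trace(S0inv @ S) + d @ S0inv @ d)

# Random well-conditioned test covariances
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)); S0 = A @ A.T + n * np.eye(n)
B = rng.standard_normal((n, n)); S  = B @ B.T + n * np.eye(n)
mu, mu0 = rng.standard_normal(n), rng.standard_normal(n)

# KL(q || q) = 0, and KL is nonnegative in general.
assert abs(kl_gaussian(mu, S, mu, S)) < 1e-10
assert kl_gaussian(mu, S, mu0, S0) >= 0
```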
We evaluate the following parameterizations for the posterior covariance $\Sigma$:
- optimizing the full covariance $\Sigma$ directly
- optimizing a factor $Q$ with $\Sigma = Q Q^\top$, in both the low-rank and full-rank cases
- optimizing the precision matrix $\Lambda = \Sigma^{-1}$
- an inverse-diagonal approximation, $\Sigma = \left( \Sigma_0^{-1} + \operatorname{diag}[v] \right)^{-1}$
- optimizing $\Sigma = X A X^\top$, where $X$ is a fixed basis, and $A$ is a low-dimensional covariance
Optimizing $\Sigma$ directly

We first obtain the gradient of $D_{\mathrm{KL}}$ with respect to $\Sigma$. Only the trace and log-determinant terms depend on $\Sigma$, giving

$$\frac{\partial D_{\mathrm{KL}}}{\partial \Sigma} = \tfrac{1}{2}\left( \Sigma_0^{-1} - \Sigma^{-1} \right).$$

The Hessian in this parameterization, applied to a symmetric perturbation $M$ of $\Sigma$, is

$$\mathbf H[M] = \tfrac{1}{2}\, \Sigma^{-1} M\, \Sigma^{-1},$$

where we have used the identity $\partial(\Sigma^{-1})[M] = -\Sigma^{-1} M\, \Sigma^{-1}$.
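These expressions are easy to verify by finite differences. The sketch below (NumPy; the test problem and all names are our own) checks the gradient against a directional derivative of $D_{\mathrm{KL}}$, and the Hessian-vector product against a central difference of the gradient.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n)); S0 = A @ A.T + n * np.eye(n)
B = rng.standard_normal((n, n)); S  = B @ B.T + n * np.eye(n)
mu, mu0 = rng.standard_normal(n), rng.standard_normal(n)
S0inv = np.linalg.inv(S0)

def D(S):
    """KL( N(mu, S) || N(mu0, S0) ) as a function of Sigma."""
    d = mu0 - mu
    return 0.5 * (np.linalg.slogdet(S0)[1] - np.linalg.slogdet(S)[1]
                  - n + np.trace(S0inv @ S) + d @ S0inv @ d)

# Analytic gradient and Hessian-vector product in Sigma
Si = np.linalg.inv(S)
G  = 0.5 * (S0inv - Si)
M  = rng.standard_normal((n, n)); M = M + M.T   # symmetric direction
HM = 0.5 * Si @ M @ Si

# Central-difference checks
h = 1e-5
num_dir = (D(S + h*M) - D(S - h*M)) / (2*h)
assert abs(num_dir - np.trace(G @ M)) < 1e-6
Gp = 0.5 * (S0inv - np.linalg.inv(S + h*M))
Gm = 0.5 * (S0inv - np.linalg.inv(S - h*M))
assert np.allclose((Gp - Gm) / (2*h), HM, atol=1e-5)
```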
Optimizing a low-rank factor $Q$

We consider an approximate posterior covariance of the form

$$\Sigma = Q Q^\top,$$

where $Q$ is an $n \times k$ matrix with $k \le n$. Since $\Sigma$ can be rank-deficient in this parameterization, inverses of $\Sigma$ are interpreted as Moore-Penrose pseudoinverses, and the log-determinant as a pseudo-determinant. The gradient is

$$\frac{\partial D_{\mathrm{KL}}}{\partial Q} = \left( \Sigma_0^{-1} - \left( Q Q^\top \right)^{+} \right) Q.$$

The Hessian-vector product requires the derivative of the pseudoinverse $(Q Q^\top)^{+}$. Golub and Pereyra (1973), Eq. 4.12, give the derivative of a fixed-rank pseudoinverse:

$$dA^{+} = -A^{+}\,(dA)\,A^{+} + A^{+} A^{+\top} (dA)^\top \left( I - A A^{+} \right) + \left( I - A^{+} A \right) (dA)^\top A^{+\top} A^{+}.$$

Since $Q$ has fixed rank along the optimization, this formula applies directly, with $A = Q Q^\top$ and $dA = M Q^\top + Q M^\top$ for a perturbation $M$ of $Q$. Since the derivative of a trace of a matrix-valued function is just the (transpose of the) corresponding scalar derivative, the remaining terms follow from the same identities. Overall, combining these expressions yields the Hessian-vector product for this parameterization.
Optimizing $Q$ when $Q$ is full-rank

Equations simplify considerably when $Q$ is square and invertible: the pseudoinverse reduces to the ordinary inverse, $Q^{+} = Q^{-1}$, and the gradient becomes

$$\frac{\partial D_{\mathrm{KL}}}{\partial Q} = \left( \Sigma_0^{-1} - \Sigma^{-1} \right) Q, \qquad \Sigma = Q Q^\top,$$

with Hessian-vector product

$$\mathbf H[M] = \left( \Sigma_0^{-1} - \Sigma^{-1} \right) M + \Sigma^{-1} \left( M Q^\top + Q M^\top \right) \Sigma^{-1} Q.$$
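Assuming the square-root parameterization $\Sigma = Q Q^\top$ with a square, invertible $Q$, the gradient $\partial D_{\mathrm{KL}}/\partial Q = (\Sigma_0^{-1} - \Sigma^{-1})Q$ and the corresponding Hessian-vector product can be verified by finite differences. This is a sketch in NumPy with our own notation and test problem.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)); S0 = A @ A.T + n * np.eye(n)
S0inv = np.linalg.inv(S0)
mu, mu0 = rng.standard_normal(n), rng.standard_normal(n)
Q = rng.standard_normal((n, n)) + n * np.eye(n)   # full-rank factor

def D(Q):
    """KL divergence as a function of Q, with Sigma = Q Q^T."""
    S = Q @ Q.T
    d = mu0 - mu
    return 0.5 * (np.linalg.slogdet(S0)[1] - np.linalg.slogdet(S)[1]
                  - n + np.trace(S0inv @ S) + d @ S0inv @ d)

S    = Q @ Q.T
Si   = np.linalg.inv(S)
grad = (S0inv - Si) @ Q                           # analytic gradient in Q

M  = rng.standard_normal((n, n))                  # arbitrary direction
HM = (S0inv - Si) @ M + Si @ (M @ Q.T + Q @ M.T) @ Si @ Q

h = 1e-5
assert abs((D(Q + h*M) - D(Q - h*M)) / (2*h) - np.sum(grad * M)) < 1e-6
gp = (S0inv - np.linalg.inv((Q + h*M) @ (Q + h*M).T)) @ (Q + h*M)
gm = (S0inv - np.linalg.inv((Q - h*M) @ (Q - h*M).T)) @ (Q - h*M)
assert np.allclose((gp - gm) / (2*h), HM, atol=1e-4)
```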
Optimizing the precision matrix $\Sigma^{-1}$

Let $\Lambda = \Sigma^{-1}$ denote the posterior precision. Writing $D_{\mathrm{KL}}$ in terms of $\Lambda$ and differentiating,

$$\frac{\partial D_{\mathrm{KL}}}{\partial \Lambda} = \tfrac{1}{2}\left( \Lambda^{-1} - \Lambda^{-1} \Sigma_0^{-1} \Lambda^{-1} \right) = \tfrac{1}{2}\left( \Sigma - \Sigma\, \Sigma_0^{-1}\, \Sigma \right).$$

The Hessian in this parameterization, applied to a symmetric perturbation $M$ of $\Lambda$, is

$$\mathbf H[M] = \tfrac{1}{2}\left( \Sigma M \Sigma\, \Sigma_0^{-1}\, \Sigma + \Sigma\, \Sigma_0^{-1}\, \Sigma M \Sigma - \Sigma M \Sigma \right).$$

This parameterization is useful for spatiotemporal inference problems, where the precision matrix is typically sparse.
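The precision-matrix gradient and Hessian-vector product can be verified the same way. In this NumPy sketch (our notation), the precision $\Lambda$ is written as `L` in the code.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n)); S0 = A @ A.T + n * np.eye(n)
B = rng.standard_normal((n, n)); L  = B @ B.T + n * np.eye(n)  # precision Lambda
S0inv = np.linalg.inv(S0)
mu, mu0 = rng.standard_normal(n), rng.standard_normal(n)

def D(L):
    """KL divergence as a function of the precision Lambda = Sigma^{-1}."""
    S = np.linalg.inv(L)
    d = mu0 - mu
    return 0.5 * (np.linalg.slogdet(S0)[1] + np.linalg.slogdet(L)[1]
                  - n + np.trace(S0inv @ S) + d @ S0inv @ d)

def g(L):
    """Analytic gradient dD/dLambda = (Sigma - Sigma S0inv Sigma)/2."""
    S = np.linalg.inv(L)
    return 0.5 * (S - S @ S0inv @ S)

S    = np.linalg.inv(L)
grad = g(L)
M    = rng.standard_normal((n, n)); M = M + M.T   # symmetric direction
HM   = 0.5 * (S @ M @ S @ S0inv @ S + S @ S0inv @ S @ M @ S - S @ M @ S)

h = 1e-5
assert abs((D(L + h*M) - D(L - h*M)) / (2*h) - np.trace(grad @ M)) < 1e-6
assert np.allclose((g(L + h*M) - g(L - h*M)) / (2*h), HM, atol=1e-5)
```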
Inverse-diagonal approximation

Let the posterior covariance be parameterized through its precision as

$$\Sigma = \left( \Sigma_0^{-1} + \operatorname{diag}[v] \right)^{-1},$$

where $v$ is a nonnegative vector. From the gradient of $D_{\mathrm{KL}}$ with respect to $\Sigma$ derived above, we can obtain the gradient in $v$ by the chain rule. We also need the derivative of $\Sigma$ with respect to each $v_i$,

$$\frac{\partial \Sigma}{\partial v_i} = -\Sigma\, e_i e_i^\top\, \Sigma,$$

where $e_i$ is the $i$-th standard basis vector. Applying the chain rule,

$$\frac{\partial D_{\mathrm{KL}}}{\partial v_i} = \operatorname{tr}\!\left[ \frac{\partial D_{\mathrm{KL}}}{\partial \Sigma}\, \frac{\partial \Sigma}{\partial v_i} \right] = -\tfrac{1}{2} \left[ \Sigma\, \Sigma_0^{-1}\, \Sigma - \Sigma \right]_{ii},$$

where $[\cdot]_{ii}$ denotes the $i$-th diagonal element. The Hessian-vector product is cumbersome, since each term in the expression above depends on $v$ through $\Sigma$, but it can be assembled from the same identities. This parameterization resembles the closed-form covariance update for a linear, Gaussian model, where the posterior precision is the prior precision plus a diagonal contribution from conditionally independent observations.
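The gradient in $v$ can likewise be checked coordinate-wise by finite differences. The sketch below (NumPy) assumes the parameterization $\Sigma = (\Sigma_0^{-1} + \operatorname{diag}[v])^{-1}$; the test problem and names are our own.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n)); S0 = A @ A.T + n * np.eye(n)
S0inv = np.linalg.inv(S0)
mu, mu0 = rng.standard_normal(n), rng.standard_normal(n)
v = rng.uniform(0.5, 2.0, n)                      # diagonal precision boost

def D(v):
    """KL divergence as a function of v, with Sigma = (S0inv + diag v)^{-1}."""
    S = np.linalg.inv(S0inv + np.diag(v))
    d = mu0 - mu
    return 0.5 * (np.linalg.slogdet(S0)[1] - np.linalg.slogdet(S)[1]
                  - n + np.trace(S0inv @ S) + d @ S0inv @ d)

# Analytic gradient: dD/dv_i = -[Sigma S0inv Sigma - Sigma]_{ii} / 2
S    = np.linalg.inv(S0inv + np.diag(v))
grad = -0.5 * np.diag(S @ S0inv @ S - S)

# Coordinate-wise central differences
h = 1e-5
num = np.array([(D(v + h*e) - D(v - h*e)) / (2*h) for e in np.eye(n)])
assert np.allclose(num, grad, atol=1e-6)
```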
Optimizing $\Sigma$ in a fixed subspace

Let the covariance be restricted to a fixed subspace,

$$\Sigma = X A X^\top,$$

where $X$ is a fixed $n \times k$ basis and $A$ is a $k \times k$ covariance to be optimized. Since the trace is invariant under cyclic permutation,

$$\operatorname{tr}\!\left( \Sigma_0^{-1} X A X^\top \right) = \operatorname{tr}\!\left( X^\top \Sigma_0^{-1} X\, A \right),$$

so the trace term, and likewise the gradient and Hessian-vector product, can be evaluated entirely in the $k$-dimensional subspace. This form is convenient for spatiotemporal inference problems that are sparse in frequency space. In this application, the columns of $X$ can be taken to be a subset of Fourier components.
Conclusion

These notes provide the gradients and Hessian-vector products for four simplified parameterizations of the posterior covariance matrix for variational Gaussian process inference. Combined with the gradients and Hessian-vector products of the expected log-likelihood, these expressions can be used with Krylov-subspace solvers to compute the Newton-Raphson update for the variational objective.
We evaluated the following parameterizations for the posterior covariance:
- $\Sigma$: the full covariance, optimized directly
- $\Sigma = Q Q^\top$: a matrix square-root factor, in both the low-rank and full-rank cases
- $\Lambda = \Sigma^{-1}$: the precision matrix
- $\Sigma = \left( \Sigma_0^{-1} + \operatorname{diag}[v] \right)^{-1}$: the inverse-diagonal approximation
- $\Sigma = X A X^\top$: a covariance restricted to a fixed subspace
In future notes, we will consider the full derivatives required for variational latent Gaussian-process inference for the Poisson and probit generalized linear models.