Wednesday, October 31, 2012

Sometimes the spike-triggered average is just as good as Poisson GLMs for spike-train analysis

(hopefully no major mistakes in this one; get PDF here)

The Poisson GLM for spiking data

Generalized Linear Models (GLMs) are similar to linear regression, but account for nonlinearities and non-uniform noise in the observations. In neuroscience, it is common to predict a sequence of spikes $Y=\{y_1,..,y_T\}$, $y_i\in\{0,1\}$, from a series of observations $X=\{\mathbf x_1,..,\mathbf x_T\}$, using a Poisson GLM:

$$ \begin{aligned} y_i &\sim \operatorname{Poisson}(\lambda_i\cdot\Delta t) \\ \lambda_i &= \exp\left( \mathbf a^\top \mathbf x_i + b \right) \end{aligned} $$
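To make this concrete, here is a minimal simulation of this model in Python. Everything below (the dimensions, weights, bin size, and NumPy as the library) is an illustrative assumption for the example, not part of the original analysis:

```python
import numpy as np

rng = np.random.default_rng(0)

T, D = 50_000, 5                    # number of time bins and covariate dimensions (arbitrary)
dt = 0.001                          # bin width, chosen small so y is essentially binary
a_true = 0.5 * rng.normal(size=D)   # example regression weights
b_true = np.log(10.0)               # baseline log-rate, ~10 spikes/s

X = rng.normal(size=(T, D))         # uncorrelated zero-mean Gaussian covariates
lam = np.exp(X @ a_true + b_true)   # conditional intensity lambda_i
y = rng.poisson(lam * dt)           # spike counts per bin (almost all 0 or 1)
```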

These models are fit by minimizing the negative log-likelihood of the observations with respect to the vector of regression weights $\mathbf a$ and mean offset parameter $b$ (dropping terms that do not depend on the parameters):

$$ \underset{\mathbf a,b}{\operatorname{argmin}} \left< \lambda - y \ln \lambda \right> = \underset{\mathbf a,b}{\operatorname{argmin}} \left< e^{\mathbf a^\top \mathbf x + b}- y [\mathbf a^\top \mathbf x + b] \right> $$

Such models are usually optimized via gradient descent or iteratively re-weighted least squares. The gradient of this objective is:

$$ \begin{aligned} \partial_b (...) &= \left< \lambda - y \right> \\ \partial_{\mathbf a} (...) &= \left< (\lambda - y) \mathbf x^\top \right> \end{aligned} $$
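As a sketch (continuing the simulated `X`, `y` above, with $\Delta t$ absorbed into the offset), the loss and its gradient translate directly into NumPy, and plain gradient descent already works:

```python
def nll(a, b, X, y):
    # mean negative log-likelihood <lambda - y ln lambda>, constants dropped
    eta = X @ a + b
    return np.mean(np.exp(eta) - y * eta)

def gradients(a, b, X, y):
    # the gradients above: <(lambda - y) x> and <lambda - y>
    r = np.exp(X @ a + b) - y
    return X.T @ r / len(y), r.mean()

a_hat, b_hat = np.zeros(X.shape[1]), np.log(y.mean())
for _ in range(2000):   # plain gradient descent; IRLS (sketched at the end) is faster
    ga, gb = gradients(a_hat, b_hat, X, y)
    a_hat -= 1.0 * ga
    b_hat -= 1.0 * gb
```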

This objective is convex, and the optimum occurs at a stationary point where the gradient is zero, i.e.:

$$ \begin{aligned} \left< \lambda \right> &= \left< y \right> \\ \left< \lambda \mathbf x^\top \right> &= \left< y \mathbf x^\top \right> \end{aligned} $$

For simplicity, assume that the components of $\mathbf x$ are uncorrelated zero-mean Gaussian variables with unit variance. (If $\mathbf x$ has a multivariate Gaussian distribution, we can always transform into coordinates where this is true by subtracting the mean and whitening the data.)
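A minimal sketch of that preprocessing step, assuming NumPy and a data matrix `X` with one observation per row:

```python
# Mean-center and whiten so that <x> = 0 and <x x^T> = I.
Xc = X - X.mean(axis=0)
L = np.linalg.cholesky(np.cov(Xc, rowvar=False))   # Sigma = L L^T
Xw = np.linalg.solve(L, Xc.T).T                    # cov(Xw) is ~identity
```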

Linear models are often fine

What happens if we expand our likelihood to first order and solve for a zero of the gradient? We start by expanding around the point $b_0 = \ln \left<y\right>$:

$$ \begin{aligned} \lambda &= e^{\mathbf a^\top \mathbf x + b} = e^{\mathbf a^\top \mathbf x + b - b_0} e^{b_0} \\ &\approx [1+\mathbf a^\top \mathbf x + b - b_0] e^{b_0} \\ &= [1+\mathbf a^\top \mathbf x + b - b_0] \left< y \right> \end{aligned} $$

Substituting this approximation into the identity for the optimal coefficients:

$$ \begin{aligned} &\left< [1+\mathbf a^\top \mathbf x + b - b_0] \right> = 1 \\ &\left< [1+\mathbf a^\top \mathbf x + b - b_0] \mathbf x^\top \right> = \left< y \mathbf x^\top \right>/\left<y\right> \end{aligned} $$

If we have chosen $\Delta t\ll 1$, so that $y$ is indeed binary, then the right-hand side of the second equation is simply the spike-triggered average. Denote this as $\mathbf x_1^\top = \left< y \mathbf x^\top \right>/\left<y\right>$. Since $\left<\mathbf x\right> = \mathbf 0$ and $\left<\mathbf x \mathbf x^\top\right> = I$, the first equation forces $b = b_0$, and the second then gives $\mathbf a = \mathbf x_1$:

$$ \begin{aligned} b &= b_0 \\ \mathbf a &= \mathbf x_1 \end{aligned} $$

Which is to say: to a first approximation, the coefficients $\mathbf a$ of the Poisson GLM are simply the spike-triggered average (of the mean-centered and whitened $\mathbf x$).

Intuitively, this means that if you're just trying to demonstrate some basic correlation between a variable and neuronal spiking, linear methods like the spike-triggered average are fine. Moving to a GLM might give you more accuracy, since it has a better model of the observation process, but it often doesn't tell you much more, qualitatively.
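In code, the comparison is one line each way. Continuing the simulation above (where `X` was generated already white, so the whitening step is essentially a no-op):

```python
# Spike-triggered average <y x>/<y> of the (whitened) covariates:
sta = (y[:, None] * Xw).sum(axis=0) / y.sum()

# Direction of the STA vs. the GLM weights fit earlier:
cos = sta @ a_hat / (np.linalg.norm(sta) * np.linalg.norm(a_hat))
print(f"cosine similarity between STA and GLM weights: {cos:.3f}")
```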

For Gaussian covariates, GLMs capture how noise and nonlinearity interact to influence firing rate

Let's see if we can get further intuition, beyond first order, when $\mathbf x$ is Gaussian. Work in terms of the normalized firing rate by defining $\bar \lambda = \lambda / \left< y \right>$. The optimum can then be written as:

$$ \begin{aligned} \left< \bar \lambda \right> &= 1 \\ \left< \bar \lambda \mathbf x \right> &= \mathbf x_1 \end{aligned} $$

If $\mathbf x$ is Gaussian, the expectation $\left< \bar \lambda \mathbf x \right>$ has a closed form. Defining $\bar b = b - b_0 = b - \ln \left<y\right>$:

$$ \left< \bar \lambda \mathbf x \right> = \int_{d\mathbf x} \mathbf x e^{\mathbf a^\top \mathbf x + \bar b} (2 \pi)^{-D/2} |\Sigma_{\mathbf x}|^{-1/2} \exp\left( -\tfrac 1 2 (\mathbf x - \mu_{\mathbf x})^\top \Sigma_{\mathbf x}^{-1} (\mathbf x - \mu_{\mathbf x}) \right), $$

where $D$ is the dimensionality of $\mathbf x$. Working in coordinates where $\Sigma_{\mathbf x}=I$ and $\mu_{\mathbf x}=0$, this simplifies to:

$$ \left< \bar \lambda \mathbf x \right> = (2 \pi)^{-D/2} \int_{d \mathbf x} \mathbf x e^{\mathbf a^\top \mathbf x + \bar b - \tfrac 1 2 {\mathbf x}^\top \mathbf x}. $$

This integral can be solved by completing the square:

$$ \begin{aligned} \mathbf a^\top \mathbf x + \bar b - \tfrac 1 2 {\mathbf x}^\top \mathbf x &= - \tfrac 1 2 \left[ {\mathbf x}^\top \mathbf x -2\mathbf a^\top \mathbf x -2 \bar b \right] \\&= - \tfrac 1 2 \left[{\mathbf x}^\top \mathbf x -2\mathbf a^\top \mathbf x + \mathbf a^\top \mathbf a - \mathbf a^\top \mathbf a -2 \bar b \right] \\&= - \tfrac 1 2 \left[(\mathbf x - \mathbf a)^\top (\mathbf x - \mathbf a) \right] + \left[ \tfrac 1 2 \mathbf a^\top \mathbf a + \bar b \right] \end{aligned} $$

Substituting this back into the integral:

$$ \left< \bar \lambda \mathbf x \right> = (2 \pi)^{-D/2}\, e^{\tfrac 1 2 \mathbf a^\top \mathbf a + \bar b} \int_{d\mathbf x} \mathbf x\, e^{- \tfrac 1 2 (\mathbf x - \mathbf a)^\top (\mathbf x - \mathbf a)}. $$

This is a Gaussian integral: the remaining integral is $(2\pi)^{D/2}$ times the mean of a unit-covariance Gaussian centered at $\mathbf a$. It therefore evaluates to:

$$ \left< \bar \lambda \mathbf x \right> = \mathbf ae^{\tfrac 1 2 \mathbf a^\top \mathbf a+ \bar b }. $$
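This identity is easy to sanity-check by Monte Carlo (an illustrative NumPy snippet; the particular $\mathbf a$ and $\bar b$ values are arbitrary):

```python
rng = np.random.default_rng(1)
a, b_bar = np.array([0.3, -0.5, 0.2]), -0.1
Z = rng.normal(size=(1_000_000, 3))            # standard normal x
lam_bar = np.exp(Z @ a + b_bar)
lhs = (lam_bar[:, None] * Z).mean(axis=0)      # <lambda_bar x>
rhs = a * np.exp(0.5 * a @ a + b_bar)          # a exp(a'a/2 + b_bar)
print(lhs, rhs)                                # agree to ~2-3 decimals
```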

At the optimum, this expectation must equal the spike-triggered average $\mathbf x_1$, which implies that the weights $\mathbf a$ are proportional to $\mathbf x_1$:

$$ \begin{aligned} \mathbf x_1 &= \mathbf ae^{\tfrac 1 2 \mathbf a^\top \mathbf a+ \bar b} \\&= \mathbf ae^{\tfrac 1 2 \|\mathbf a\|^2 + \bar b} \\ \Rightarrow \mathbf a &\propto \mathbf x_1 \end{aligned} $$

Solving similarly for $\langle\bar\lambda\rangle$ gives $\langle\bar\lambda\rangle = \exp(\tfrac 1 2 \mathbf a^\top \mathbf a+ \bar b) = 1$, which forces the exponent to vanish. Together, these conditions imply that the Poisson GLM has a closed-form solution in the limit where $y$ is sparse and $\mathbf x$ is Gaussian:

$$ \begin{aligned} \mathbf a &= \mathbf x_1 \\ b &= \ln\left<y\right> -\tfrac 1 2 \| \mathbf x_1 \|^2 \end{aligned} $$
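Continuing the running example, the closed-form estimate is two lines, and can be compared against the iteratively fitted weights (remembering that $\Delta t$ was absorbed into the offset in the fitting code):

```python
a_closed = sta                                  # a = x_1, the STA
b_closed = np.log(y.mean()) - 0.5 * sta @ sta   # b = ln<y> - ||x_1||^2 / 2

print(np.round(a_closed, 3))
print(np.round(a_hat, 3))                       # close when <y> is small
```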

This further supports the idea that the spike-triggered average is a fast and adequate approach to spike-train statistics. In some cases, it may be unnecessary to train a GLM at all.

Of course, Poisson GLMs can be fit almost as quickly as linear regressions or spike-triggered averages, using iteratively re-weighted least squares (sketched below). They can also gracefully and automatically handle cases where the covariates $\mathbf x$ are not jointly Gaussian. So perhaps the question should be: why not use a GLM?
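For completeness, here is a minimal IRLS/Newton sketch for this model (the function name and iteration count are my own choices; production code would use an established GLM implementation such as statsmodels instead):

```python
def fit_poisson_glm_irls(X, y, n_iter=20):
    # Newton's method / IRLS for the Poisson GLM with exponential link.
    T, D = X.shape
    Xb = np.hstack([X, np.ones((T, 1))])   # last column carries the offset b
    w = np.zeros(D + 1)
    w[-1] = np.log(y.mean())               # start at b0 = ln<y>
    for _ in range(n_iter):
        lam = np.exp(Xb @ w)
        grad = Xb.T @ (lam - y) / T        # <(lambda - y) x>
        H = (Xb * lam[:, None]).T @ Xb / T # Hessian <lambda x x^T>
        w -= np.linalg.solve(H, grad)
    return w[:-1], w[-1]                   # (a, b)

a_irls, b_irls = fit_poisson_glm_irls(X, y)
```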
