Tuesday, November 8, 2011

Point-Process Generalized Linear Models (PPGLMs) for spike train analysis

Get these notes as PDF here.

In neuroscience, we often want to understand how neuronal spiking relates to other variables. For this, tools like linear regression and correlation often work well enough. However, linear approaches presume that the underlying relationships are linear and that the noise is Gaussian, neither of which holds for neural data. The Generalized Linear Model (GLM) extends linear methods to better account for nonlinearity and non-Gaussian noise.

GLMs for spike train analysis

We're interested in understanding how a neural spike train $y(t)$ relates to some other covariates $x(t)$. In continuous time, a spike train is modeled as a train of impulses at each spike time $t_i$, $y(t) = \sum_i \delta(t - t_i)$. In practice, we always work in discrete time. It is common to choose a time step $\Delta t$ small enough so that at most one spike occurs in each time bin.
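For concreteness, here is a minimal sketch of this binning step in Python (NumPy); the spike times, duration, and bin width are made-up example values:

```python
import numpy as np

# Made-up spike times (seconds), recording duration, and bin width.
spike_times = np.array([0.012, 0.305, 0.831, 1.210])
T_total, dt = 2.0, 0.001

# Bin the spike train: y[t] = number of spikes in bin t.
edges = np.arange(0.0, T_total + dt, dt)
y, _ = np.histogram(spike_times, bins=edges)

# With dt small enough, each bin holds at most one spike.
assert y.max() <= 1
```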

Two types of GLM are common in spike train analysis (a code sketch of both follows the list):

  1. The Poisson GLM assumes that spikes arise from an inhomogeneous Poisson process with time-varying rate $\lambda(t)$ that depends log-linearly on the covariates $x(t)$. In practice, it is common to treat the Poisson GLM in discrete time by assuming that the rate $\lambda(t)$ is constant within each time-bin:
$$y_t \sim \operatorname{Poisson}(\lambda_t\, \Delta t), \qquad \lambda_t = \exp(w^\top x_t)$$
  2. The Bernoulli GLM models each time-bin of a binary, discrete-time spike train $y_t \in \{0, 1\}$ as a Bernoulli "coin flip" with $\Pr(y_t = 1) = p_t$:
$$y_t \sim \operatorname{Bernoulli}(p_t), \qquad p_t = \frac{1}{1 + \exp(-w^\top x_t)}$$
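Below is a minimal sketch of both observation models in Python, simulating spikes from made-up weights and Gaussian covariates; the variable names (`X`, `w_true`, `y_poisson`, `y_bernoulli`) are my own and are reused in the later sketches, and none of the numbers come from real data:

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, dt = 10_000, 3, 0.001          # number of bins, covariates, bin width
X = rng.standard_normal((T, K))      # covariates x_t, one row per time bin
w_true = np.array([0.5, -0.3, 0.1])  # made-up "true" weights

# Poisson GLM: y_t ~ Poisson(lambda_t * dt), lambda_t = exp(w . x_t)
lam = np.exp(X @ w_true)
y_poisson = rng.poisson(lam * dt)

# Bernoulli GLM: y_t ~ Bernoulli(p_t), p_t = 1 / (1 + exp(-w . x_t))
p = 1.0 / (1.0 + np.exp(-(X @ w_true)))
y_bernoulli = rng.binomial(1, p)
```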

Maximum likelihood

GLMs are typically estimated using maximum likelihood when testing how covariates $x(t)$ influence spiking. (Other fitting procedures may be more appropriate when GLMs are used to model spiking dynamical systems.) The maximum likelihood procedure finds the weights $w$ that maximize the likelihood of observing the spike train $y = \{y_1, \ldots, y_T\}$, given covariates $X = \{x_1, \ldots, x_T\}$, over all $T$ recorded time-bins.

$$\hat{w} = \underset{w}{\operatorname{argmax}}\; \Pr(y \mid w, X)$$

In practice, likelihood maximization is typically phrased in terms of minimizing the negative log-likelihood. Working with the log-likelihood is more numerically stable, and the problem is phrased as minimization so that it is more straightforward to plug into off-the-shelf minimization code.

$$\hat{w} = \underset{w}{\operatorname{argmin}}\; \left[-\ln \Pr(y \mid w, X)\right]$$

Assuming observations are independent, the negative log-likelihood factors into a sum over time-points. Equivalently, one can minimize the average negative log-likelihood over samples, $\mathcal{L} = \left\langle -\ln \Pr(y_t \mid w, x_t) \right\rangle$, where $\langle\cdot\rangle$ denotes an average over time-bins. The maximum-likelihood parameters are typically solved for via gradient descent or the Newton-Raphson method. This requires computing the gradient and Hessian of the negative log-likelihood.
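As a rough illustration, the negative log-likelihood can be handed to a generic optimizer. The sketch below does this for the Poisson GLM (dropping terms constant in $w$) using `scipy.optimize.minimize`; `X`, `y_poisson`, and `dt` are assumed to come from the simulation sketch above, and the function name is my own:

```python
import numpy as np
from scipy.optimize import minimize

def poisson_negloglik(w, X, y, dt):
    """Average negative log-likelihood of the Poisson GLM, up to constants in w."""
    u = X @ w                               # w . x_t for every time bin
    return np.mean(np.exp(u) * dt - y * u)

w0 = np.zeros(X.shape[1])                   # start from zero weights
result = minimize(poisson_negloglik, w0, args=(X, y_poisson, dt), method="BFGS")
w_hat = result.x
```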

Gradient and Hessian for the Poisson GLM

We can calculate the gradient and Hessian for the Poisson GLM by substituting the Poisson probability density function, $\Pr(y) = \lambda^y e^{-\lambda} / y!$, into our expression for the negative log-likelihood:

$$-\ln \Pr(y_t \mid w, x_t) = \lambda_t - y_t \ln(\lambda_t) + \ln(y_t!)$$

Because the term $\ln(y_t!)$ does not depend on $w$, we can ignore it without affecting the location of our optimum. Substituting in $\lambda_t = \exp(w^\top x_t)$, we get:

$$\mathcal{L} = \left\langle \exp(w^\top x_t) - y_t\, w^\top x_t \right\rangle$$

Finding the weights $w$ that minimize $\mathcal{L}$ is equivalent to solving for the maximum-likelihood weights. The gradient and Hessian of $\mathcal{L}$ in $w$ are:

$$\frac{\partial \mathcal{L}}{\partial w_i} = \left\langle \left[\exp(w^\top x_t) - y_t\right] x_{t,i} \right\rangle = \left\langle (\lambda_t - y_t)\, x_{t,i} \right\rangle
\qquad
\frac{\partial^2 \mathcal{L}}{\partial w_i\, \partial w_j} = \left\langle \exp(w^\top x_t)\, x_{t,i}\, x_{t,j} \right\rangle = \left\langle \lambda_t\, x_{t,i}\, x_{t,j} \right\rangle$$

In matrix notation, these derivatives can then be written as:

$$\nabla\mathcal{L} = \left\langle x_t (\lambda_t - y_t) \right\rangle, \qquad H = \left\langle \lambda_t\, x_t x_t^\top \right\rangle$$
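A minimal Newton-Raphson sketch using these expressions might look as follows, with `X` and `y` assumed to be the design matrix and binned spike counts from the earlier simulation; `dt` folds the bin width into the per-bin rate, and the function name is my own:

```python
import numpy as np

def newton_raphson_poisson(X, y, dt=1.0, n_iter=20):
    T, K = X.shape
    w = np.zeros(K)
    for _ in range(n_iter):
        lam = np.exp(X @ w) * dt                  # expected spike count per bin
        grad = X.T @ (lam - y) / T                # <x (lambda - y)>
        H = (X * lam[:, None]).T @ X / T          # <lambda x x^T>
        w = w - np.linalg.solve(H, grad)          # Newton-Raphson step
    return w

w_hat = newton_raphson_poisson(X, y_poisson, dt=dt)
```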

Gradient and Hessian for the Bernoulli GLM

The Bernoulli GLM is similar, with the observation probability given by the Bernoulli distribution $\Pr(y) = p^y (1 - p)^{1 - y}$:

$$-\ln \Pr(y_t \mid w, x_t) = -y_t \ln(p_t) - (1 - y_t) \ln(1 - p_t) = -y_t \ln\!\left(\frac{p_t}{1 - p_t}\right) - \ln(1 - p_t)$$

Then, using $p_t = \left[1 + \exp(-w^\top x_t)\right]^{-1}$, i.e. $w^\top x_t = \ln\!\left[p_t / (1 - p_t)\right]$, we get:

$$\mathcal{L} = \left\langle \ln\!\left[1 + \exp(w^\top x_t)\right] - y_t\, w^\top x_t \right\rangle$$

The gradient and Hessian of $\mathcal{L}$ in $w$ are:

$$\frac{\partial \mathcal{L}}{\partial w_i} = \left\langle \left[\frac{\exp(w^\top x_t)}{1 + \exp(w^\top x_t)} - y_t\right] x_{t,i} \right\rangle = \left\langle (p_t - y_t)\, x_{t,i} \right\rangle
\qquad
\frac{\partial^2 \mathcal{L}}{\partial w_i\, \partial w_j} = \left\langle p_t (1 - p_t)\, x_{t,i}\, x_{t,j} \right\rangle$$

In matrix notation:

$$\nabla\mathcal{L} = \left\langle x_t (p_t - y_t) \right\rangle, \qquad H = \left\langle p_t (1 - p_t)\, x_t x_t^\top \right\rangle$$
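The same Newton-Raphson loop works here, swapping in the Bernoulli gradient and Hessian; again a sketch, with `X` and `y_bernoulli` assumed from the earlier simulation and a function name of my own:

```python
import numpy as np

def newton_raphson_bernoulli(X, y, n_iter=20):
    T, K = X.shape
    w = np.zeros(K)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))            # p_t for every time bin
        grad = X.T @ (p - y) / T                      # <x (p - y)>
        H = (X * (p * (1 - p))[:, None]).T @ X / T    # <p(1-p) x x^T>
        w = w - np.linalg.solve(H, grad)              # Newton-Raphson step
    return w

w_hat = newton_raphson_bernoulli(X, y_bernoulli)
```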

Iteratively reweighted least squares

PPGLMs can also be fit to spike train data using Iteratively Reweighted Least Squares (IRLS). Recall that for a linear model $y = w^\top x$, the Ordinary Least Squares (OLS) solution is:

$$\hat{w} = \left\langle x_t x_t^\top \right\rangle^{-1} \left\langle x_t y_t \right\rangle.$$

The IRLS approach phrases optimization of the GLM parameters as repeated iterations of a reweighted least-squares problem. To derive this, first recall the definition of the Newton-Raphson update:

$$w_{n+1} = w_n - H(w_n)^{-1}\, \nabla\mathcal{L}(w_n)$$

For the Poisson GLM, this is:

$$w_{n+1} = w_n + \left\langle \lambda_t\, x_t x_t^\top \right\rangle^{-1} \left\langle x_t (y_t - \lambda_t) \right\rangle$$

For the Poisson GLM, for example, IRLS rewrites this as a least-squares problem by defining weights $\lambda_t$ and pseudo-variables $z_t = w^\top x_t + \frac{1}{\lambda_t}(y_t - \lambda_t)$. We can confirm that the IRLS update is equivalent to the Newton-Raphson update:

$$
\begin{aligned}
w_{n+1} &= \left\langle \lambda_t\, x_t x_t^\top \right\rangle^{-1} \left\langle \lambda_t\, x_t z_t \right\rangle \\
&= \left\langle \lambda_t\, x_t x_t^\top \right\rangle^{-1} \left\langle \lambda_t\, x_t \left[ x_t^\top w_n + \tfrac{1}{\lambda_t}\left(y_t - \lambda_t\right) \right] \right\rangle \\
&= \left\langle \lambda_t\, x_t x_t^\top \right\rangle^{-1} \left[ \left\langle \lambda_t\, x_t x_t^\top \right\rangle w_n + \left\langle x_t (y_t - \lambda_t) \right\rangle \right] \\
&= w_n + \left\langle \lambda_t\, x_t x_t^\top \right\rangle^{-1} \left\langle x_t (y_t - \lambda_t) \right\rangle
\end{aligned}
$$
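A minimal IRLS sketch for the Poisson GLM, following the pseudo-variable construction above; it makes the same assumptions about `X`, `y_poisson`, and `dt` as the earlier sketches, and the function name is my own:

```python
import numpy as np

def irls_poisson(X, y, dt=1.0, n_iter=20):
    K = X.shape[1]
    w = np.zeros(K)
    for _ in range(n_iter):
        lam = np.exp(X @ w) * dt                    # weights: expected count per bin
        z = X @ w + (y - lam) / lam                 # pseudo-variables z_t
        Xw = X * lam[:, None]                       # rows of X scaled by lambda_t
        w = np.linalg.solve(Xw.T @ X, Xw.T @ z)     # <lam x x^T>^{-1} <lam x z>
    return w

w_hat = irls_poisson(X, y_poisson, dt=dt)
```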

Expected log-likelihoods

It's also possible to approximately fit $w$ knowing only the mean and covariance of $x$. This reduces computational complexity, since it avoids having to process the whole data matrix when optimizing $w$. Previously, we calculated the negative log-likelihood as an expectation over the data time-points. Here, we instead calculate these expectations based on a Gaussian model of the covariates, $x \sim \mathcal{N}(\mu_x, \Sigma_x)$. For example, in the Poisson case the gradient of the negative log-likelihood is:

$$\nabla\mathcal{L} = \left\langle x_t (\lambda_t - y_t) \right\rangle = \left\langle x_t \exp(w^\top x_t) \right\rangle - \left\langle x_t y_t \right\rangle$$

The term $\left\langle x_t y_t \right\rangle$ does not depend on $w$, and can be computed in advance. The term $\left\langle x_t \exp(w^\top x_t) \right\rangle$ has a closed-form solution based on the log-normal distribution:

$$\left\langle x \lambda \right\rangle = \left[\mu_x + \Sigma_x w\right] \left\langle \lambda \right\rangle, \qquad \left\langle \lambda \right\rangle = \left\langle \exp(w^\top x) \right\rangle = \exp\!\left(w^\top \mu_x + \tfrac{1}{2} w^\top \Sigma_x w\right)$$

The Hessian also has a closed form. This avoids having to recompute the re-weighted means and covariances on every iteration, although one must still calculate the mean and covariance of the covariates once at the start. The approximation is only exact when $x$ is truly Gaussian; even when it is not, it can be used to pick an initial $w_0$ before continuing with Newton-Raphson.
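As a sketch, the moment-based gradient needs only $\mu_x$, $\Sigma_x$, and the precomputed $\left\langle x_t y_t \right\rangle$; the helper below is illustrative, its name is my own, and the example moments are taken from the simulated data above:

```python
import numpy as np

def expected_gradient(w, mu_x, Sigma_x, xy_mean):
    """Moment-based gradient <x lambda> - <x y> under a Gaussian model of x."""
    # <lambda> = exp(w . mu_x + 0.5 w . Sigma_x . w)   (log-normal mean)
    mean_lam = np.exp(w @ mu_x + 0.5 * w @ Sigma_x @ w)
    # <x lambda> = (mu_x + Sigma_x w) <lambda>
    x_lam = (mu_x + Sigma_x @ w) * mean_lam
    return x_lam - xy_mean

# These moments are all that is needed, e.g. from the simulated data above:
mu_x, Sigma_x = X.mean(axis=0), np.cov(X.T)
xy_mean = X.T @ y_poisson / X.shape[0]
```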

Edit: it seems like the Poisson GLM for Gaussian x might reduce to something like the spike-triggered average in the limit where spikes are rare?