These notes provide a function for calculating the AUC for a Poisson regression that works when the spike counts are given only up to some (unknown) normalization constant. Skip to the end for the Python code.
Poisson regression and point processes
Neuroscientists use Poisson regression (in the continuous limit: Poisson point process ) to assess the correlation between neuronal spiking and other experimental covariates. These models are well suited to the scenario in which spikes are binary all-or-nothing events, recorded based on electrical potentials.
The area under the receiver operating characteristic curve
It is common to assess the performance of a Poisson GLM using the Area Under the receiver-operator-characteristic Curve (AUC) .
Definition of the AUC: The AUC is defined as the probability that a randomly-sampled (true-negative, true-positive) pair will be correctly ranked by the predicted firing-rates. We can calculate it by integrating over all such pairs. We will work in continuous time
Denote the sets of true positives and true negatives as
[auc0]
where
Why use the AUC?
Because it depends only on the relative ranks of predicted firing rates, AUC is insensitive to errors in predicting a cell's mean rate, firing-rate variance, or any other strictly increasing transformation of the true rates. These properties make the AUC a desirable performance metric if we only care how well a GLM predicts neuronal tuning in a very general sense. It can be viewed as a type of nonlinear (rank) correlation between the test data and model predictions.
Calcium imaging yields fractional spike counts
Calcium fluorescence imaging has emerged as a standard method for simultaneously recording from large populations of neurons. Since calcium concentrations change relatively slowly following spiking, spike times are recovered indirectly by deconvolving calcium fluorescence signals.
These time-series of deconvolved spikes are sparse, with most entries zero. However, when spikes are detected, they are often detected with a fractional (non integer) value. This introduces some subtleties when working with point-process models, which are defined in terms of all-or-nothing (integer) events.
In practice, non-integer spike counts are no issue for the Poisson regression. Although a rigorous definition of the Poisson distribution is defined only on integers, it can be evaluated for any non-negative real number. This is technically improper, but works just as well and is commonly used in quasi-Poisson regression .
However, integrating commonly used measurements of model performance with fractional spike counts is slightly more subtle.
The AUC for continuous-time spiking data
For spiking data, the "true positives" are the spike times, and the "true negatives" are times when a neuron does not spike. We will work in continuous time, in which a spike train can be treated as a sum of delta distributions at each spike time,
In this continuous case, the duration of spikes is infinitesimal. This means that the set of true positives (i.e. times containing a spike) is of measure zero, compared to the measure of the time-duration of the experiment. In other words, the probability that a given time-point
In contrast, the true positives are a discrete set of
[auc0b]
The AUC for deconvolved spikes
Deconvolved spikes from fluorescence data do not recover the true spikes counts
This is no problem in practice. To compute the AUC, we only need the density of true positive events per unit time, not their absolute number. The density of true positives per unit time is simply
[pt]
To calculate the AUC, we could sample
[auc1]
We'd like to transform this equation into a nicer form, getting rid of that
Define
[s]
Using
[auc2]
In discrete time,
def auc(λ,y):
'''
Area Under the ROC Curve (AUC) for deconvolved spike data.
Parameters
----------
λ: 1D np.float32, length NSAMPLES
Predicted scores for each datapoint
y: 1D np.float32, length NSAMPLES
Estimated spike counts for each datapoint
Returns
-------
AUC: float
Area under the receiver-operator-characteristic curve.
'''
# Cast to numpy arrays
λ = np.array(λ).ravel()
y = np.array(y).ravel()
# Ignore NaNs
bad = ~np.isfinite(λ)|~np.isfinite(y)
λ = λ[~bad]
y = y[~bad]
# AUC is undefined if there are no positive examples
d = np.sum(y)
if d<=0: return NaN
# Density of true positives per timepoint
py = y/d
# Convert scores to fractional ranks
rs = scipy.stats.rankdata(λ)/len(λ)
AUC = np.average(rs,weights=py)
return AUC