Saturday, December 10, 2011

Failed exploration: Fourier-related transforms and 2D Turing patterns

Edit: These notes are old, didn't work out, and are a bit embarrassing. I was looking for alternative perspectives on symmetric bifurcation theory for 2D Turing patterns. I encountered some entertaining Fourier-transform-related integrals, but didn't get anywhere. In retrospect, I think I simply re-derived a representation in terms of cylindrical harmonics, nothing new. Maybe, some day, someone will point out where things went wrong.

Context: 2D pattern formation in the spatial frequency domain

For context, I started by exploring some integro-differential equations in the 2D plane. These were neural-field-type equations in which some collection of 2D "fields" (functions, really) $u(\boldsymbol x)$ evolve over time, coupled through a convolution kernel $\mathbf K$ and some non-negative nonlinearity $f(\cdot)$:

\begin{equation}\begin{aligned} \dot u_{t, \boldsymbol x} &= -u_{t, \boldsymbol x} + f( [\mathbf K * u]_{t, \boldsymbol x} ) \\ \boldsymbol x &= \{ x_1, x_2 \} \in \mathbb R^2 \end{aligned}\label{1}\tag{1}\end{equation}

For simplicity, I've dropped many details you normally see in neural field models, such as inputs, thresholds, etc.; these can be absorbed into $\mathbf K$ and $f(\cdot)$. I've also denoted functions like $u(t,\boldsymbol x)$ using subscripts, rather than function arguments in parentheses, in analogy with a discrete system in which $t$ and $\boldsymbol x$ index temporal and spatial bins.
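To make this concrete, here is a minimal numerical sketch of $\eqref{1}$. The kernel and nonlinearity are hypothetical stand-ins (a difference-of-Gaussians $\mathbf K$ and a logistic $f$), chosen only to illustrate the setup; the convolution is evaluated with the FFT on a periodic grid.

```python
import numpy as np

# Minimal sketch of Eq. (1): du/dt = -u + f(K*u) on a periodic 2D grid.
# The difference-of-Gaussians kernel and logistic nonlinearity are
# arbitrary stand-ins for illustration.
N, dt, n_steps = 128, 0.1, 500
x = np.arange(N) - N // 2

def gaussian2d(sigma):
    g = np.exp(-(x[:, None]**2 + x[None, :]**2) / (2 * sigma**2))
    return g / g.sum()

K = gaussian2d(2.0) - 0.9 * gaussian2d(4.0)       # local excitation, broader inhibition
Khat = np.fft.fft2(np.fft.ifftshift(K))           # kernel in the spatial frequency domain
f = lambda a: 1 / (1 + np.exp(-20 * (a - 0.05)))  # non-negative (logistic) nonlinearity

u = 0.01 * np.random.default_rng(0).random((N, N))  # small random initial state
for _ in range(n_steps):
    Ku = np.fft.ifft2(Khat * np.fft.fft2(u)).real   # evaluate K*u via the FFT
    u += dt * (-u + f(Ku))                          # forward-Euler step
```

With local excitation and broader inhibition, simulations like this typically settle into spots or stripes at a preferred spatial frequency set by $\mathbf K$.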

A ring at the critical frequency $\omega_c$

I wanted to know what types of Turing patterns might emerge for various choices of $\mathbf K$ and $f(\cdot)$. In the course of examining which patterns might be stable, one might consider 2D solutions that are composed of various sinusoidal plane waves at some fixed critical frequency "$\omega_c$"

\begin{equation}\begin{aligned} u_{\boldsymbol x} &\propto \sum_i \cos\{ \omega_c [ \mathbf c_{\theta_i} x_1 + \mathbf s_{\theta_i} x_2 ] + \phi_i\}, \end{aligned}\label{2}\tag{2}\end{equation}

where $\mathbf c_\theta$, $\mathbf s_\theta$ abbreviate cosine and sine, respectively (since I'm 90% sure no one will read this, we may as well have a little fun with the notation). Individual plane waves at various orientations $\theta$ and phases $\phi$ can sum up and interfere to form interesting patterns. More generally, we might imagine that such a pattern could be composed of infinitely many waves, with various weightings:

\begin{equation}\begin{aligned} u_{\boldsymbol x} &\propto \int_{d\theta} r_\theta \cos\{ \omega_c [ \mathbf c_{\theta} x_1 + \mathbf s_{\theta} x_2 ] + \phi_\theta\}. \end{aligned}\label{3}\tag{3}\end{equation}

This is a special case of the (inverse) Fourier transform. We will denote the unitary 2D Fourier transform as $\mathcal F$ and its inverse as $\mathcal F^\dagger$. In general, we can define (most) solutions $u(\boldsymbol x)$ as an integral over their Fourier components, denoted here as $\xi_{\boldsymbol\omega} \in \mathbb C$, where $\boldsymbol\omega = \{\omega_1, \omega_2\}$ are the two frequency components analogous to our two spatial components:

\begin{equation}\begin{aligned} u_{\boldsymbol x} &= [\mathcal F^\dagger \xi]_{\boldsymbol x} \\ &\propto \int_{d\boldsymbol\omega} \xi_{\boldsymbol\omega} e^{i \langle \boldsymbol\omega, \boldsymbol x \rangle}, \end{aligned}\label{4}\tag{4}\end{equation}

where $\langle\cdot,\cdot\rangle$ is the dot product. $\eqref{4}$ reduces to $\eqref{3}$ when we restrict the integral to spatial frequencies $\| \boldsymbol\omega \| = \omega_c$ at the critical frequency.
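As a quick illustration of $\eqref{2}$ and $\eqref{3}$, here is a sketch that superimposes a few plane waves at a shared frequency; the orientations, phases, and frequency below are arbitrary choices.

```python
import numpy as np

# Sketch of Eq. (2): superimpose plane waves at a single critical frequency w_c.
# One wave gives stripes, two a checkerboard-like pattern, three a
# hexagonal/triangular lattice of spots.
w_c, N = 0.5, 256
x1, x2 = np.meshgrid(np.arange(N), np.arange(N))

def monochromatic(n_waves, phases=None):
    thetas = np.pi * np.arange(n_waves) / n_waves    # evenly spaced orientations
    phases = np.zeros(n_waves) if phases is None else phases
    return sum(np.cos(w_c * (np.cos(t) * x1 + np.sin(t) * x2) + p)
               for t, p in zip(thetas, phases))

stripes, checkers, hexagons = (monochromatic(n) for n in (1, 2, 3))
```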

What if the ring of components at critical frequency is all we care about? We may parameterize our solutions in terms of the 1D function $\eta_\theta = \xi(\mathbf c_\theta, \mathbf s_\theta)$, corresponding to evaluating the Fourier transform of $u_{\boldsymbol x}$ along the ring at critical frequency. Let's denote this truncated (inverse) Fourier transform as $\mathfrak F^\dagger$, to remind us that it transforms a 1D rather than 2D representation of spatial frequency:

\begin{equation}\begin{aligned} u_{\boldsymbol x} &= [\mathfrak F^\dagger \eta]_{\boldsymbol x} \\ &\propto \int_{d\boldsymbol\omega} \delta_{\|\boldsymbol\omega\| = \omega_c} \xi_{\boldsymbol\omega} e^{i \langle \boldsymbol\omega, \boldsymbol x\rangle} \\ &\propto \int_{d\theta} \eta_{\theta} e^{i \omega_c (\mathbf c_\theta x_1 + \mathbf s_\theta x_2) } \end{aligned}\label{5}\tag{5}\end{equation}

where $\eta_\theta$ are the Fourier coefficients for $\boldsymbol\omega = \omega_c e^{i\theta}$. If we consider only the real part, we recover $\eqref{3}$ (recall that $\exp(i\theta) = \mathbf c_\theta + i \mathbf s_\theta$). The "proportional to" $\propto$ notation in $\eqref{5}$ is a reminder that I've summarily ignored some factors of $\sqrt{2\pi}$.

*Aside:* Recall that, for the Fourier transform of a 1D real-valued signal, the negative-frequency components should be the complex conjugate of the positive-frequency components. For real-valued $u$ in 2D, there is a similar symmetry, but under 180° rotations of the 2D frequency plane. This constrains which $\xi_{\boldsymbol\omega}$ correspond to real-valued $u_{\boldsymbol x}.$

A ring in the spatial domain

So what? We can represent a solution $u_{\boldsymbol x}$ in terms of its Fourier transform. In the special cases when all the spatial frequency components have the same frequency $\omega_c$, we can describe the evolution of the 2D $u_{\boldsymbol x}$ in terms of a 1D ring $\eta_\theta$ in Fourier space. This seems to suggest that the seemingly 2D system is, effectively, 1D. Does this mean that a 1D description in the spatial domain might also suffice? Well, yes.

For simplicity going forward, re-scale the system so that $\omega_c = 1$. We can consider $u$ evaluated along the unit circle, parameterized by angle $\vartheta$, i.e. $u$ evaluated at $\boldsymbol x_\vartheta = \{ \mathbf c_\vartheta, \mathbf s_\vartheta \}$. Let's define $\upsilon_\vartheta := u_{\boldsymbol x_\vartheta}$ to distinguish the original 2D spatial domain $u$ from this reduced, circular, 1D spatial domain $\upsilon$. We can recover the spatial solution along this ring using the inverse Fourier transform $\eqref{5}$:

\begin{equation}\begin{aligned} \upsilon_\vartheta &\propto \int_{d\theta} \eta_{\theta} e^{i (\mathbf c_\theta \mathbf c_\vartheta + \mathbf s_\theta \mathbf s_\vartheta) } \end{aligned}\label{6}\tag{6}\end{equation}

Recalling the trigonometric identity $\mathbf c_\theta \mathbf c_\vartheta + \mathbf s_\theta \mathbf s_\vartheta = \cos(\theta - \vartheta)$, we can write $\eqref{6}$ as

\begin{equation}\begin{aligned} \upsilon_\vartheta &\propto \int_{d\theta} \eta_{\theta} e^{i \cos(\theta - \vartheta)} \end{aligned}\label{7}\tag{7}\end{equation}

This looks suspiciously like a circular convolution. Indeed, if we define $g_\theta = e^{i \mathbf c_\theta}$ (using the fact that cosine is even), we can write $\eqref{7}$ as

\begin{equation}\begin{aligned} \upsilon &\propto g * \eta \end{aligned}\label{8}\tag{8}\end{equation}

Now, we could give the Fourier-related transform in Equations $\eqref{6}$--$\eqref{8}$ a name, something like $\upsilon = \mathcal G^\dagger \eta := g * \eta$. But $\mathcal G^\dagger$ isn't much more succinct than $g\,*$, so I won't use this except perhaps as part of the summary at the end.

In words, this is saying that our spatial pattern evaluated on the unit circle ("$\upsilon_\vartheta$") can be expressed in terms of the Fourier coefficients at the critical frequency ("$\eta_\theta$"), convolved with this strange periodic function $g_\theta$.
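Equations $\eqref{7}$ and $\eqref{8}$ are easy to check numerically, comparing direct quadrature of $\eqref{7}$ against circular convolution with $g$ evaluated via the FFT; the test function $\eta$ below is arbitrary.

```python
import numpy as np

# Check that Eq. (7) is the circular convolution in Eq. (8), with
# g(theta) = exp(i*cos(theta)). The ring components eta are an arbitrary
# test function.
M = 512
theta = 2 * np.pi * np.arange(M) / M
dtheta = 2 * np.pi / M
eta = np.exp(1j * theta) + 0.3 * np.exp(-2j * theta)

# Direct quadrature of Eq. (7)
upsilon_direct = np.array([np.sum(eta * np.exp(1j * np.cos(theta - v))) * dtheta
                           for v in theta])

# Circular convolution with g, evaluated via the convolution theorem
g = np.exp(1j * np.cos(theta))
upsilon_conv = np.fft.ifft(np.fft.fft(g) * np.fft.fft(eta)) * dtheta

assert np.allclose(upsilon_direct, upsilon_conv)
```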

*von Mises Aside 1: Where have I seen $g_\theta$ before?*

(This is, as far as I can tell, completely useless, but…) It might be fun to note that $g_\theta$ is related to the von Mises distribution. For reference, the probability density function of the von Mises distribution centered at $\mu = 0$ is

\begin{equation}\begin{aligned}\Pr(\theta) &= \tfrac{1}{2\pi \operatorname I_0(\kappa)} \exp(\kappa \,\mathbf c_\theta),\end{aligned}\label{9}\tag{9}\end{equation}

where $\kappa$ is the shape parameter (larger = more concentrated), and $\operatorname I_0$ is the modified Bessel function of the first kind of order 0. We can take the shape parameter to be imaginary, i.e. $\kappa = i$:

\begin{equation}\begin{aligned}\int_{d\theta} g_\theta =\int_{d\theta}e^{i \mathbf c_\theta}&=2\pi \operatorname I_0(i)\end{aligned}\label{10}\tag{10}\end{equation}

This connection to the von Mises distribution provides no useful intuition, beyond this: The partition functions for commonly used distributions double as a table of integrals, and there is some playful connection to the idea of a distribution with an imaginary shape parameter. Perhaps the mathematical physicists have done something playful with this, who knows.

To and from frequency space via convolution

Convolution can be evaluated in the Fourier domain as pointwise multiplication (the convolution theorem). Recall the discrete-time Fourier transform (DTFT). The DTFT takes a discrete-time signal and maps it to a Fourier transform that is $2\pi$ periodic. The inverse DTFT maps this periodic function back to a discrete time-series. Denote the unitary DTFT $\mathbf F$, such that $\mathbf F^\dagger$ is the inverse DTFT.

In our case, the functions $\upsilon$, $g$, and $\eta$ are all defined on the unit circle, so we need a periodic (i.e. circular) convolution. Converting to the Fourier domain therefore involves the inverse DTFT operation, $\mathbf F^\dagger$. Using the convolution theorem, then, we can convert from the ring at critical frequency $\eta$ to the spatial domain on the unit circle $\upsilon$ as follows:

\begin{equation}\begin{aligned} \upsilon &\propto g * \eta = \mathbf F\{ (\mathbf F^\dagger g)(\mathbf F^\dagger \eta) \} \end{aligned}\label{11}\tag{11}\end{equation}

Let's be explicit about this, and give names to these periodic Fourier transforms. Remember, these are now discrete sequences with integer indices $\alpha\in\mathbb Z$:

\begin{equation}\begin{aligned} \zeta_\alpha &:= \mathbf F^\dagger \eta_\theta \\ \rho_\alpha &:= \mathbf F^\dagger \upsilon_\vartheta \\ \gamma_\alpha &:= \mathbf F^\dagger g_\theta. \end{aligned}\label{12}\tag{12}\end{equation}

I will refer to $\alpha$ as "angular frequency" or "angular mode".

In words, $\zeta_\alpha$ is the circular Fourier transform of the Fourier coefficients at the ring at $\|\boldsymbol\omega\|=1$, $\rho_\alpha$ is the circular Fourier transform of the spatial solution along the unit circle, and $\gamma_\alpha$ is the circular Fourier transform of… something.

*von Mises Aside 2: What is $\gamma_\alpha$?*

Does $\gamma_\alpha$ reduce to any other familiar objects? Remembering that $g_\theta$ is related to the von Mises distribution, we can actually get an "analytic" expression for $\gamma_\alpha = \mathbf F^\dagger g_\theta$. The Fourier transform of a distribution is called its characteristic function. Using the characteristic function of the von Mises distribution (analytically continued with the shape parameter $\kappa = i$), we obtain

\begin{equation}\begin{aligned}\gamma_\alpha = 2\pi \operatorname I_\alpha(i),\end{aligned}\label{13}\tag{13}\end{equation}

where $\operatorname I_\alpha$ is the modified Bessel function of the first kind of order $\alpha$.

\begin{equation}\begin{aligned}\operatorname I_\alpha(x) &= i^{-\alpha} \operatorname J_\alpha(ix) = \sum_{m=0}^\infty \frac{1}{m!\, \Gamma(m+\alpha+1)}\left(\frac{x}{2}\right)^{2m+\alpha}\end{aligned}\label{14}\tag{14}\end{equation}

This isn't exactly an elementary function, hence why "analytic" was in quotes above. The series definition of the Bessel function is no simpler than saying "the circular Fourier transform of $g_\theta$". Substituting $\eqref{14}$ into $\eqref{13}$, we get

\begin{equation}\begin{aligned}\gamma_\alpha &= 2\pi\, i^{\alpha} \sum_{m=0}^\infty \frac{(-1)^{m}}{m!\, (m+\alpha)!}\left(\frac{1}{2}\right)^{2m+\alpha}\end{aligned}\label{15}\tag{15}\end{equation}

Since $\alpha$ is always an integer, these functions are actually "cylinder functions" (a.k.a. cylindrical harmonics); indeed, the sum in $\eqref{15}$ is just $\operatorname J_\alpha(1)$, so $\gamma_\alpha = 2\pi\, i^\alpha \operatorname J_\alpha(1)$. Perhaps if I were more familiar with special functions and their series definitions, I'd spot something interesting here. For now, it seems to me that the most compact description of $\gamma$ is simply "the circular Fourier transform of $g_\theta = e^{i \mathbf c_\theta}$".
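For what it's worth, the closed form $\gamma_\alpha = 2\pi\, i^\alpha \operatorname J_\alpha(1)$ is easy to verify numerically (here ignoring the $\sqrt{2\pi}$ normalization, as in the text):

```python
import numpy as np
from scipy.special import jv

# Check gamma_alpha = 2*pi*I_alpha(i) = 2*pi*i^alpha*J_alpha(1), where gamma
# is the circular Fourier transform of g(theta) = exp(i*cos(theta)).
M = 1024
theta = 2 * np.pi * np.arange(M) / M
dtheta = 2 * np.pi / M
g = np.exp(1j * np.cos(theta))

for alpha in range(6):
    gamma = np.sum(g * np.exp(1j * alpha * theta)) * dtheta  # quadrature
    assert np.isclose(gamma, 2 * np.pi * 1j**alpha * jv(alpha, 1.0))
```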

With $\zeta$, $\rho$, and $\gamma$, we can write the conversion from spatial-frequency $\zeta$ to spatial-domain $\rho$ as pointwise multiplication:

\begin{equation}\begin{aligned} \rho &\propto \gamma\cdot\zeta \end{aligned}\label{16}\tag{16}\end{equation}

This also implies that we can convert from the spatial domain $\rho$ back to the frequency domain $\zeta$ by dividing by $\gamma$, confirming that, for solutions of this type, we only need to track the evolution along the unit circle:

\begin{equation}\begin{aligned} \zeta &\propto \gamma^{-1} \cdot \rho. \end{aligned}\label{17}\tag{17}\end{equation}

Note that the spatial solution along the unit circle, $\upsilon_\vartheta$, must be real-valued. This means that $\rho_\alpha$ needs to have conjugate symmetry, namely $\rho_{-\alpha} = \rho_\alpha^*$. Inspecting $\eqref{15}$ reveals that the argument (angle in the complex plane) of $\gamma_\alpha$ is determined by $i^{\alpha}$. The argument of $\gamma_\alpha$ therefore repeats as $\{1,i,-1,-i\}$ for $\alpha \bmod 4 \in \{0,1,2,3\}$. The spatial Fourier components $\eta_\theta$ do have an interesting symmetry inherited from the conjugate symmetry of the 2D Fourier transform of a real-valued function: $\eta_{\theta+\pi} = \eta_\theta^*$. So, the real component of $\eta$ is $\pi$-periodic and affects only even angular modes $\alpha$, while the imaginary component of $\eta$ is $\pi$-anti-periodic and affects only odd angular modes.

What did we learn?

We learned that 2D solutions consisting entirely of plane waves at a single frequency $\omega_c$ ("monochromatic") can be described entirely by a 1D ring of spatial frequency components $\eta_\theta$. Amusingly, this implies that we can also describe these solutions using only their values along the unit circle in the spatial domain. The 2D pattern is defined by the dynamics along a 1D boundary.

To review, we considered 2D solutions $u_{\boldsymbol x}$, and their Fourier transform $\xi_{\boldsymbol\omega}$. We converted from $\xi_{\boldsymbol\omega}$ to $u_{\boldsymbol x}$ using the inverse 2D Fourier transform. If $u_{\boldsymbol x}$ is "monochromatic", i.e. consists entirely of waves at a single frequency (which we will take to be $1$), we only need to consider frequency components $\eta_{\theta} = \xi(\mathbf c_\theta, \mathbf s_\theta)$ along the ring at this frequency. The frequency components are related to the spatial solution evaluated on the unit circle as $\upsilon_\vartheta \propto\int_{d\theta}\eta_{\theta} e^{i \cos(\theta - \vartheta)}$. This is a 1D circular convolution with a function $g_\theta = e^{i \mathbf c_\theta}$, which is connected to the von Mises distribution and cylindrical harmonics.

Applying the convolution theorem, we see that the spatial-domain angular-frequency components are related to the frequency-domain angular-frequency components by pointwise multiplication with a fixed sequence we called $\gamma$. The low-order angular-frequency components may provide a compact low-order approximation of the solution.

Various transforms

The 1D unitary DTFT

\begin{equation}\begin{aligned} \mathbf F \zeta &= \frac {1}{\sqrt{2\pi}} \sum_{\alpha\in\mathbb Z} \zeta_\alpha e^{-i\theta\alpha} \\ \mathbf F^\dagger \eta &= \frac {1}{\sqrt{2\pi}} \int_{d\theta} \eta_\theta e^{i\theta\alpha} \end{aligned}\label{f1}\end{equation}

The 2D unitary FT

\begin{equation}\begin{aligned} \mathcal F u &= \frac 1 {2\pi} \iint_{d\boldsymbol x} u_{\boldsymbol x} e^{-i\langle \boldsymbol\omega, \boldsymbol x\rangle} \\ \mathcal F^\dagger \xi &= \frac 1 {2\pi} \iint_{d\boldsymbol\omega} \xi_{\boldsymbol\omega} e^{i\langle \boldsymbol\omega, \boldsymbol x\rangle} \end{aligned}\label{f2}\end{equation}

The 2D FT truncated to a 1D ring at unit frequency (double check normalization here, this might be wrong)

\begin{equation}\begin{aligned} \mathfrak F u &= \frac 1 {2\pi} \iint_{d\boldsymbol x} u_{\boldsymbol x} e^{-i\langle \boldsymbol\omega_\theta, \boldsymbol x\rangle} \\ \mathfrak F^\dagger \eta &= \frac 1 {\sqrt{2\pi}} \int_{d\theta} \eta_{\theta} e^{i \langle \boldsymbol\omega_\theta, \boldsymbol x\rangle } \\ \boldsymbol\omega_\theta &:= \{\mathbf c_\theta, \mathbf s_\theta\} \end{aligned}\label{f3}\end{equation}

The 1D transformation relating the spatial domain on the unit circle to the ring at unit frequency for monochromatic solutions:

\begin{equation}\begin{aligned} \mathcal G \upsilon &= \frac 1 {\sqrt{2\pi}} \int_{d\vartheta} \upsilon_{\vartheta} e^{-i \cos(\theta - \vartheta)} \\ \mathcal G^\dagger \eta &= \frac 1 {\sqrt{2\pi}} \int_{d\theta} \eta_{\theta} e^{i \cos(\theta - \vartheta)}. \end{aligned}\label{f4}\end{equation}

Application to pattern formation in neural field equations?

For the most part, this is wandering in the wilderness, but there is one hint of helpful intuition. In many cases, Turing patterns take on some interesting symmetries, for example, forming stripes (1 wave), checkerboards (2 waves), spots with the translational symmetry of a hexagonal/triangular lattice (3 waves), or perhaps even quasicrystals (4 or more waves). In these cases, rotational symmetry is broken. Instead of a full ring at $\omega_c$, we might get solutions with some discrete rotational symmetry. In this case, the circular Fourier transform of the frequency components along the ring at critical frequency may tell us all we need to know. This hints that perhaps we only need the low-order components, perhaps $\alpha \in \{0,\pm 1,\pm 2,\pm 3\}$, to capture most patterns (and maybe a handful more if we want to capture quasicrystalline solutions).

I am tempted to return to $\eqref{1}$ and explore how we might use this low order "angular mode" description to analyze the dynamics. The procedure looks something like this:

First, we consider a spatially homogeneous steady-state solution $u_0$:

\begin{equation}\begin{aligned} u_0 &= f( k_0 u_0 ), \end{aligned}\label{q1}\end{equation}

where $k_0$ is the DC (zero-frequency, mean) term in the convolution $\mathbf K*$. We then consider the evolution of solutions $u_{t, \boldsymbol x} = \epsilon_{t, \boldsymbol x} + u_0$ that are an $\epsilon_{t, \boldsymbol x}$ perturbation away from this homogeneous steady-state. We usually then consider the low-order terms in a Taylor expansion of the nonlinearity $f(\cdot)$:

\begin{equation}\begin{aligned} \dot \epsilon &= \dot u - \dot u_0 = \dot u \\&= -u + f( [\mathbf K * u]_{t, \boldsymbol x} ) \\&= -u+ \sum_{n=0}^\infty \frac 1 {n!} f^{(n)}(k_0 u_0) (\mathbf K * \epsilon)^n \\&= -\epsilon+ \sum_{n=1}^\infty a_n \cdot (\mathbf K * \epsilon)^n, \end{aligned}\label{q2}\end{equation}

where we have abbreviated the coefficients of the Taylor expansion, $a_n = f^{(n)}(k_0 u_0)/n!$ (note that the expansion is around the steady-state input $k_0 u_0$, in powers of $\mathbf K * \epsilon$).
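As a sanity check, the homogeneous steady state $u_0 = f(k_0 u_0)$ is easy to find by root-finding; the logistic $f$ and gain $k_0$ below are hypothetical stand-ins.

```python
import numpy as np
from scipy.optimize import brentq

# Solve u0 = f(k0*u0) for the homogeneous steady state by bracketing a root
# of f(k0*u) - u. The nonlinearity and DC gain are arbitrary stand-ins.
k0 = 0.5
f = lambda a: 1 / (1 + np.exp(-10 * (a - 0.2)))
u0 = brentq(lambda u: f(k0 * u) - u, 0.0, 10.0)
```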

Next, we relate the spatial domain $\epsilon$ to the spatial frequency domain $\xi = \mathcal F \epsilon$. Define $k_{\boldsymbol\omega}$ as the Fourier coefficients of the convolution $\mathbf K *$. We are faced with the awkward situation that the exponentiation $(\cdot)^n$ is diagonal in the spatial basis, but the convolution $\mathbf K *$ is diagonal in the spatial frequency basis.

We can therefore either transform to the spatial domain to exponentiate, or use the somewhat unconventional iterated-convolution notation $(\cdot)^{*n}$ in the frequency domain, reflecting the fact that integer powers $n$ in the spatial domain become iterated convolutions in the frequency domain:

\begin{equation}\begin{aligned} \dot \xi &= -\xi+ \sum_{n=1}^\infty a_n \mathcal F\{\mathcal F^\dagger\{k \xi\}^n \} \\ &= -\xi+ \sum_{n=1}^\infty a_n [k \xi]^{*n}. \end{aligned}\label{q3}\end{equation}

Now, maybe it's OK to assume that $\epsilon$ is a "monochromatic" solution with an especially simple low-order description. In this case, we care only about the components $\eta_\theta = \xi(\mathbf c_\theta, \mathbf s_\theta)$. Did I mention the kernel $\mathbf K$ is isotropic, i.e. the same in all directions? Let's assume that it is. In this case, the only component of $k$ that we care about is $k_c$, the Fourier coefficient of $\mathbf K *$ at the critical frequency. This is the same regardless of the direction $\theta$. We can write this reduced-order representation of $\eqref{q3}$ as:

\begin{equation}\begin{aligned} \dot \eta &\propto -\eta + \sum_{n=1}^\infty a_n \mathfrak F\{\mathfrak F^\dagger \{k_c \eta\}^n \}, \end{aligned}\label{q4}\end{equation}

where we've switched the notation for the Fourier transform to $\mathfrak F$ to remind us that our "spatial frequency domain" is now the 1D object $\eta$, as opposed to the full 2D $\xi$.

We need to be careful. So far, our equations $\eqref{q2}$ and $\eqref{q3}$ really do describe the full nonlinear dynamics of $u$, in the spatial and spatial-frequency domains respectively (assuming the Taylor series converges). We haven't truncated the series expansion yet. However, when we restrict our solutions to those that can be described as a sum of plane waves at a fixed frequency, we are modifying the dynamics. In general, the nonlinearity will generate some frequencies outside $\omega_c$, so this low-order approximation is not closed. Dropping these terms may change things, and it remains to be seen whether this simplified system can capture the dynamics we care about. In particular, restricting to $\eta_\theta$ from $\xi_{\boldsymbol\omega}$ is tantamount to assuming that the only nonzero Fourier coefficients of $\mathbf K$ are the DC term and the term at $\omega_c$.

Now, we could go a step further and change coordinates to $\zeta = \mathbf F^\dagger \eta$ and again to $\rho=\gamma\zeta$, but this will get messy. Let's try to make our lives easier. Our equations are defined by the following:

  1. A convolution, which is diagonal in the spatial frequency domain.
  2. A pointwise nonlinearity in the spatial domain, which we've expanded as a power series.

If we restrict ourselves to unit frequency, step (1) is just scalar multiplication by a single Fourier component $k_c$. This commutes with any integral transform (these are linear). The only "hard" step is the pointwise nonlinearity. For this, we should convert to the spatial domain (and back again). If we work with $\rho$, this is just the circular Fourier transform of the spatial domain, so we can convert to/from the spatial domain using the DTFT. The dynamics on $\rho$ can therefore be written as:

\begin{equation}\begin{aligned} \dot \rho &\propto -\rho+ \sum_{n=1}^\infty a_n \mathbf F^\dagger\{[\mathbf F\{k_c \rho \}]^n\} \\ &= -\rho+ \sum_{n=1}^\infty a_n \{k_c\rho\}^{*n} \\ &= -\rho + \sum_{n=1}^\infty a_n k_c^n \rho^{*n}. \end{aligned}\label{q6}\end{equation}

This looks worryingly simple, and may indicate that something has gone wrong (we've thrown away too much by assuming a single spatial frequency). We can also convert back to $\upsilon_\vartheta = \mathbf F \rho$, the spatial solution on the unit circle, and find:

\begin{equation}\begin{aligned} \dot \upsilon_\vartheta &\propto -\upsilon_\vartheta + \sum_{n=1}^\infty a_n (k_c \upsilon_\vartheta)^n \\ & = -\upsilon_\vartheta+ f[ k_0 u_0 + k_c \upsilon_\vartheta ] - u_0. \end{aligned}\label{q7}\end{equation}

This is strange, and obviously wrong: We have lost all spatial coupling! Is the mistake conceptual (this 1D ring parameterization is no good), or did I make a calculation error?

I think the mistake is conceptual. All solutions composed of only $\omega_c$ are eigenfunctions of $\mathbf K*$ with eigenvalue $k_c$. We made the very general assumption that our solution can be any function $\eta_\theta$ in the frequency domain confined to a ring at $\omega_c$. This is too permissive: arbitrary $\eta_\theta$ means that we can also have arbitrary $\upsilon_\vartheta$ in the spatial domain, even quite strange ones. Because all of these solutions are still eigenfunctions of $\mathbf K*$, the convolution kernel does not act on them to change their (angular) frequency structure. This removes the spatial coupling.

In order to recover a 1D approximation that includes coupling, we need to limit the solutions $\eta_\theta$ in some way. In retrospect, this makes sense. In symmetric bifurcation theory for Turing patterns, we typically consider a scenario with positive eigenvalues at more frequencies than just $\omega_c$. So I suppose I will come back to this some day: Can you find a 1D neural field equation that describes the evolution of rotational symmetries of emerging Turing patterns in 2D neural field equations?

Tuesday, November 8, 2011

Point-Process Generalized Linear Models (PPGLMs) for spike train analysis

Get these notes as PDF here.

In neuroscience, we often want to understand how neuronal spiking relates to other variables. For this, tools like linear regression and correlation often work well enough. However, linear approaches presume that the underlying relationships are linear and the noise is Gaussian, neither of which is true for neural datasets. The Generalized Linear Model (GLM) extends linear methods to better account for nonlinearity and non-Gaussian noise.

GLMs for spike train analysis

We're interested in understanding how a neural spike train $y(t)$ relates to some other covariates $\mathbf x(t)$. In continuous time, a spike train is modeled as a train of impulses at each spike time $t_i$, $y(t)=\sum_i \delta(t-t_i)$. In practice, we always work in discrete time. It is common to choose a time step $\Delta t$ small enough so that at most one spike occurs in each time bin.

Two types of GLM are common in spike train analysis (a simulation sketch follows this list):

  1. The Poisson GLM assumes that spikes arise from an inhomogeneous Poisson process with time-varying rate $\lambda(t)$ that depends log-linearly on the covariates $\mathbf x(t)$. In practice, it is common to treat the Poisson GLM in discrete time by assuming that the rate $\lambda(t)$ is constant within each time-bin:
\begin{equation*} \begin{aligned} y_t &\sim \operatorname{Poisson}(\lambda_t \cdot \Delta t) \\ \lambda_t &= \exp\left(\mathbf w^\top \mathbf x_t\right) \end{aligned} \end{equation*}
  2. The Bernoulli GLM models each time-bin of a binary, discrete-time spike train $y_t\in\{0,1\}$ as a Bernoulli "coin flip" with $\Pr(y{=}1)=p$.
\begin{equation*} \begin{aligned} y_t &\sim \operatorname{Bernoulli}(p_t) \\ p_t &= \frac 1 {1+\exp[-\mathbf w^\top \mathbf x_t]} \end{aligned} \end{equation*}
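A minimal sketch of sampling synthetic spike trains from both models; the weights, covariates, and baseline terms below are arbitrary stand-ins.

```python
import numpy as np

# Simulate spikes from the Poisson and Bernoulli GLMs above.
rng = np.random.default_rng(0)
T, dt = 10000, 0.001                    # 10 s at 1 ms bins
x = rng.standard_normal((T, 3))         # covariates x_t
w = np.array([1.0, -0.5, 0.25])         # arbitrary weights

lam = np.exp(x @ w + np.log(20.0))      # Poisson rate, ~20 Hz baseline
y_poisson = rng.poisson(lam * dt)

p = 1 / (1 + np.exp(-(x @ w - 4.0)))    # Bernoulli spike probability per bin
y_bernoulli = rng.binomial(1, p)
```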

Maximum likelihood

GLMs are typically estimated using maximum likelihood when testing how covariates $\mathbf x(t)$ influence spiking. (Other fitting procedures may be more appropriate when GLMs are used to model spiking dynamical systems.) The maximum likelihood procedure finds the weights $\mathbf w$ that maximize the likelihood of observing spike train $\mathbf y=\{y_1,..,y_T\}$, given covariates $\mathbf X = \{ \mathbf x_1,..,\mathbf x_T\}$, over all $T$ recorded time-bins.

\begin{equation*} \begin{aligned} \mathbf w = \underset{\mathbf w}{\operatorname{argmax}} \Pr( \mathbf y | \mathbf w, \mathbf X) \end{aligned} \end{equation*}

In practice, likelihood maximization is typically phrased in terms of minimizing the negative log-likelihood. Working with the log-likelihood is more numerically stable, and the problem is phrased as a minimization so that it is more straightforward to plug in to off-the-shelf minimization code.

\begin{equation*} \begin{aligned} \mathbf w = \underset{\mathbf w}{\operatorname{argmin}} -\ln\Pr( \mathbf y | \mathbf w, \mathbf X) \end{aligned} \end{equation*}

Assuming observations are independent, the negative log-likelihood factors into a sum over time-points. Equivalently, one can minimize the average negative log-likelihood over samples, $- \left< \ln\Pr( y_t | \mathbf w, \mathbf x_t) \right>$. The maximum likelihood parameters are typically solved for via gradient descent or the Newton-Raphson method. This requires computing the gradient and Hessian of the negative log-likelihood.

Gradient and Hessian for the Poisson GLM

We can calculate the gradient and Hessian for the Poisson GLM by substituting the Poisson probability mass function, $\Pr(y)=\lambda^y e^{-\lambda}/y!$, into our expression for the negative log-likelihood:

\begin{equation*} \begin{aligned} -\left<\ln\Pr( y_t | \mathbf w, \mathbf x_t)\right> &= \left<\lambda_t - y_t \ln(\lambda_t) + \ln(y_t!)\right> \end{aligned} \end{equation*}

Because the term $\ln(y_t!)$ does not depend on $\mathbf w$, we can ignore it without affecting the location of our optimum. Substituting in $\lambda = \exp(\mathbf w^\top \mathbf x)$, we get:

\begin{equation*} \begin{aligned} \ell &= \left< \exp(\mathbf w^\top \mathbf x_t) - y_t \mathbf w^\top \mathbf x_t\right> \end{aligned} \end{equation*}

Finding the weights $\mathbf w$ that minimize $\ell$ is equivalent to solving for the maximum-likelihood weights. The gradient and Hessian of $\ell$ in $\mathbf w$ are:

\begin{equation*} \begin{aligned} \frac {\partial \ell} {\partial w_i} &= \left<[ \exp(\mathbf w^\top \mathbf x_t) - y_t] x_i \right> = \left<(\lambda_t - y_t) x_i \right> \\ \frac {\partial^2 \ell} {\partial w_i \partial w_j} &= \left<[ \exp(\mathbf w^\top \mathbf x_t) ] x_i x_j \right> = \left< \lambda_t x_i x_j \right> \end{aligned} \end{equation*}

In matrix notation, these derivatives can then be written as:

\begin{equation*} \begin{aligned} \nabla \ell &= \langle \mathbf x( \lambda - y) \rangle \\ \mathbf H \ell &= \langle \lambda \mathbf x \mathbf x^\top \rangle \end{aligned} \end{equation*}
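Putting these together, here is a minimal sketch of Newton-Raphson maximum likelihood for the Poisson GLM on simulated data; the design matrix and true weights are arbitrary stand-ins.

```python
import numpy as np

# Fit a Poisson GLM by Newton-Raphson using the gradient and Hessian above
# (averaged over time bins).
rng = np.random.default_rng(1)
T = 5000
X = np.column_stack([np.ones(T), rng.standard_normal((T, 2))])  # constant + covariates
w_true = np.array([-2.0, 1.0, -0.5])
y = rng.poisson(np.exp(X @ w_true))

w = np.zeros(3)
for _ in range(20):
    lam = np.exp(X @ w)
    grad = X.T @ (lam - y) / T         # <x (lambda - y)>
    H = (X * lam[:, None]).T @ X / T   # <lambda x x^T>
    w -= np.linalg.solve(H, grad)      # Newton-Raphson step
```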

Gradient and Hessian for the Bernoulli GLM

The Bernoulli GLM is similar, with the observation probability given by the Bernoulli distribution $\Pr(y) = p^y (1-p)^{1-y}$:

\begin{equation*} \begin{aligned} \left< -\ln\Pr( y_t | \mathbf w, \mathbf x_t) \right> &=- \left< y_t \ln(p_t) + (1-y_t) \ln(1-p_t) \right> \\&=- \left< y_t \ln\left(\tfrac{p_t}{1-p_t}\right) + \ln(1-p_t) \right> \end{aligned} \end{equation*}

Then, using $p = [1 + \exp(-\mathbf w^\top \mathbf x)]^{-1}$, i.e. $\mathbf w^\top \mathbf x = \ln[p/(1-p)]$, we get:

\begin{equation*} \begin{aligned} \ell &= \left< \ln[1+\exp(\mathbf w^\top \mathbf x_t)] - y_t \mathbf w^\top \mathbf x_t \right> \end{aligned} \end{equation*}

The gradient and Hessian of $\ell$ in $\mathbf w$ are:

\begin{equation*} \begin{aligned} \frac {\partial \ell} {\partial w_i} &= \left< \left[\frac{\exp(\mathbf w^\top \mathbf x_t)}{1+\exp(\mathbf w^\top \mathbf x_t)} - y_t\right] x_i\right> \\&= \left<(p_t - y_t) x_i\right> \\ \frac {\partial^2 \ell} {\partial w_i \partial w_j} &= \left<p_t (1-p_t) x_i x_j\right> \end{aligned} \end{equation*}

In matrix notation:

\begin{equation*} \begin{aligned} \nabla \ell &= \langle \mathbf x (p-y)\rangle \\ \mathbf H \ell &= \langle p(1-p) \cdot \mathbf x \mathbf x^\top \rangle \end{aligned} \end{equation*}

Iteratively reweighted least squares

PPGLMs can also be fit to spike train data using Iteratively Reweighted Least Squares (IRLS). Recall that for a linear model $y = \mathbf w^\top \mathbf x$, the Ordinary Least Squares (OLS) solution is:

$$ \mathbf w = \langle \mathbf x \mathbf x^\top \rangle^{-1} \langle \mathbf x y \rangle. $$

The IRLS approach phrases optimizing the parameters of the GLM in terms of repeated iterations of a reweighted least-squares problem. To derive this, first recall the definition of the Newton-Raphson update:

\begin{equation*} \begin{aligned} \mathbf w_{n+1} = \mathbf w_n - \mathbf H\ell(\mathbf w_n)^{-1} \nabla \ell(\mathbf w_n) \end{aligned} \end{equation*}

For the Poisson GLM, this is

\begin{equation*} \begin{aligned} \mathbf w_{n+1} = \mathbf w_n + \langle \lambda \mathbf x \mathbf x^\top \rangle^{-1}\langle \mathbf x (y-\lambda) \rangle \end{aligned} \end{equation*}

For the Poisson GLM, for example, IRLS rewrites this as a weighted least-squares problem by defining weights $\lambda$ and pseudo-variables $z = \mathbf w^\top \mathbf x + \tfrac 1 \lambda (y - \lambda)$. We can confirm that the IRLS update is equivalent to the Newton-Raphson update:

\begin{equation*} \begin{aligned} \mathbf w_{n+1} &= \langle \lambda \mathbf x \mathbf x^\top \rangle^{-1} \langle \lambda \mathbf x z \rangle \\&= \langle \lambda \mathbf x \mathbf x^\top \rangle^{-1} \left[ \langle \lambda \mathbf x [ \mathbf x^\top \mathbf w_n + \tfrac 1 \lambda (y-\lambda)] \rangle \right] \\&= \langle \lambda \mathbf x \mathbf x^\top \rangle^{-1} \left[ \langle \lambda \mathbf x \mathbf x^\top\rangle \mathbf w_n + \langle \mathbf x (y-\lambda) \rangle \right] \\&= \mathbf w_n + \langle \lambda \mathbf x \mathbf x^\top \rangle^{-1} \langle \mathbf x (y-\lambda) \rangle \end{aligned} \end{equation*}
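A minimal sketch of the IRLS iteration for the Poisson GLM, again on arbitrary simulated data:

```python
import numpy as np

# IRLS for the Poisson GLM: each iteration solves a weighted least-squares
# problem against the pseudo-variables z = Xw + (y - lambda)/lambda.
rng = np.random.default_rng(2)
T = 5000
X = np.column_stack([np.ones(T), rng.standard_normal(T)])
y = rng.poisson(np.exp(X @ np.array([-1.0, 0.5])))

w = np.zeros(2)
for _ in range(20):
    lam = np.exp(X @ w)
    z = X @ w + (y - lam) / lam                # pseudo-variables
    XtWX = (X * lam[:, None]).T @ X            # <lambda x x^T>
    XtWz = (X * lam[:, None]).T @ z            # <lambda x z>
    w = np.linalg.solve(XtWX, XtWz)            # reweighted least squares
```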

Expected log-likelihoods

It's also possible to approximately fit $\mathbf w$ knowing only the mean and covariance of $\mathbf x$. This reduces computational complexity, since it avoids having to process the whole data matrix when optimizing $\mathbf w$. Previously, we calculated the negative log-likelihood as an expectation over the data time-points. Here, we instead calculate these expectations based on a Gaussian model of the covariates, $\mathbf x \sim \mathcal N(\mu_{\mathbf x}, \Sigma_{\mathbf x} )$. For example, in the Poisson case, the gradient of the negative log-likelihood is:

\begin{equation*} \begin{aligned} \nabla \ell &= \langle \mathbf x( \lambda - y) \rangle = \langle \mathbf x \exp(\mathbf w^\top \mathbf x) \rangle - \langle \mathbf x y \rangle \end{aligned} \end{equation*}

The term $\langle\mathbf x y\rangle$ does not depend on $\mathbf w$, and can be computed in advance. The term $\langle \mathbf x \exp(\mathbf w^\top \mathbf x) \rangle$ has a closed-form solution based on the log-normal distribution:

\begin{equation*} \begin{aligned} \langle \mathbf x \lambda \rangle &= [\mu_{\mathbf x} + \Sigma_{\mathbf x} \mathbf w] \langle\lambda\rangle \\ \langle \lambda \rangle &= \langle \exp(\mathbf w^\top \mathbf x) \rangle \\&= \exp( \mathbf w^\top \mu_{\mathbf x} + \tfrac 1 2 \mathbf w^\top \Sigma_{\mathbf x} \mathbf w) \end{aligned} \end{equation*}

The Hessian also has a closed form. This avoids having to recompute the re-weighted means and covariances on every iteration. However, one must still calculate a mean and covariance initially. This approximation remains valid only when $\mathbf x$ is truly Gaussian, but it can still be used to pick an initial $\mathbf w_0$ before continuing with Newton-Raphson.
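A sketch of this moment-based approach for the Poisson GLM, using plain gradient descent on the expected gradient (the data, learning rate, and iteration count are arbitrary; a Newton step with the closed-form Hessian would converge faster):

```python
import numpy as np

# Expected-log-likelihood fitting: only mu_x, Sigma_x, and <x y> are needed,
# so no pass over the full data matrix is required inside the loop.
rng = np.random.default_rng(3)
T = 50000
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=T)
y = rng.poisson(np.exp(X @ np.array([0.5, -0.2])))

mu, Sigma = X.mean(0), np.cov(X.T)      # moments of x, computed once
xy = (X * y[:, None]).mean(0)           # <x y>, computed once

def expected_grad(w):
    mean_lam = np.exp(w @ mu + 0.5 * w @ Sigma @ w)   # <lambda> for Gaussian x
    return (mu + Sigma @ w) * mean_lam - xy           # <x lambda> - <x y>

w = np.zeros(2)
for _ in range(2000):
    w -= 0.1 * expected_grad(w)
```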

Edit: it seems like the Poisson GLM for Gaussian $\mathbf x$ might reduce to something like the spike-triggered average in the limit where spikes are rare?

Wednesday, September 28, 2011

The origin and properties of flicker-induced geometric phosphenes

A Model for the Origin and Properties of Flicker-Induced Geometric Phosphenes (PDF).

Many people see geometric patterns when looking at flickering lights. The patterns depend on the frequency, color, and intensity of the flickering. Different people report seeing similar shapes, which are called “form constants”. 

Flicker hallucinations are best induced using a Ganzfeld (German for “entire field”): an immersive, full-field, uniform visual stimulation. Frequencies ranging from 8 to 30 Hz are most effective. 

This effect is used by numerous sound-and-light machines sold for entertainment purposes. Some of these devices claim to alter the frequency of brain waves. There is no scientific evidence for this. However, the flickering stimulus may increase the amplitude of oscillations that are already present in the brain, to the point where geometric visual hallucinations can occur.

Figure 1. Illustrations of basic phosphene patterns (form constants) as they appear subjectively (left), and their transformation to planar waves in cortical coordinates (right).

Wednesday, July 27, 2011

Online Fourier transforms as filter banks and state-space models


The online Fourier transform

The online Fourier transform tracks a sliding window of the recent history of the time series. This window is updated by advancing one time step, and the FFT is used to get the power spectrum. 

One can speed up computation by making updates recursive: on each time step, first subtract the contribution of the oldest timepoint, then add the contribution from the current timepoint. We can generalize this recursive approach by interpreting the online FT as a bank of band-pass filters.
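Here is a minimal sketch of one such recursive update, the standard sliding DFT. Besides subtracting the oldest sample and adding the newest, each frequency bin is rotated by a phase "twiddle" factor, and each bin behaves like a narrow band-pass filter centered on its frequency.

```python
import numpy as np

# Recursive sliding-window DFT: X_k <- (X_k - oldest + newest) * exp(2*pi*i*k/N).
N = 256                                           # window length
twiddle = np.exp(2j * np.pi * np.arange(N) / N)   # per-bin phase rotation

def sdft_step(spectrum, oldest, newest):
    return (spectrum - oldest + newest) * twiddle

# Agrees with re-running the FFT on the shifted window, up to rounding error
x = np.random.default_rng(4).standard_normal(N + 1)
spectrum = sdft_step(np.fft.fft(x[:N]), x[0], x[N])
assert np.allclose(spectrum, np.fft.fft(x[1 : N + 1]))
```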

Friday, May 13, 2011

Correlation, mean-squared-error, mutual information, and signal-to-noise ratio for Gaussian random variables

I was reading a paper, and encountered a figure that showed the correlation, mutual information, and mean-squared prediction error for a pair of time-series. This seemed a bit redundant. It turns out it was added to the paper at the request of a reviewer. If your data are jointly Gaussian, these all measure the same thing; no need to clutter a figure by showing all of them.

[get these notes as PDF]

For a jointly Gaussian pair of random variables, correlation, root-mean-squared error, mutual information, and signal-to-noise ratio all measure the same thing and can be computed from each other.
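For reference, writing $\rho$ for the correlation coefficient, and adopting one common set of conventions (mean-squared error of the optimal linear predictor of $y$ from $x$, mutual information in nats, and SNR as the ratio of explained to residual variance), the relationships are:

\begin{equation*} \begin{aligned} \operatorname{MSE} &= \sigma_y^2 (1-\rho^2) \\ \operatorname{SNR} &= \frac{\rho^2}{1-\rho^2} \\ I(x;y) &= -\tfrac 1 2 \ln(1-\rho^2) \end{aligned} \end{equation*}

Each quantity is a monotonic function of $\rho^2$, so any one of them determines the others.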

Wednesday, April 13, 2011

Limit of an infinite chain of first-order exponential filters

First-order exponential filter

The simplest model of how the voltage $x$ at a synapse responds to input $u$ is a first-order filter:

$$\tau \dot x = -x + u.$$

This corresponds to convolving the signal $u(t)$ with the exponential filter $\tfrac 1 \tau \operatorname H(t) \exp(-t/\tau)$, where $\operatorname H(\cdot)$ is the Heaviside step function:

$$\begin{aligned}x(t) &= h(t) * u(t)\\h(t)&=\tfrac 1 \tau \operatorname H(t) \exp(-t/\tau).\end{aligned}$$

The alpha function

A first-order filter has a discontinuous jump in response to abrupt inputs (like spikes). A more realistic response is the "alpha function" $t\cdot \exp(-t)$. The alpha function can be obtained by convolving two exponential-decay filters (i.e. chaining together two first-order filters):
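$$\begin{aligned} (h * h)(t) &= \tfrac 1 {\tau^2} \operatorname H(t) \int_0^t e^{-s/\tau}\, e^{-(t-s)/\tau}\, ds = \operatorname H(t)\, \tfrac {t}{\tau^2}\, e^{-t/\tau} \propto t \cdot e^{-t/\tau}. \end{aligned}$$

Up to normalization (and in units where $\tau=1$), this is exactly the alpha function above.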