Wednesday, February 28, 2018

Optimal encoding in stochastic latent-variable models

Update: the review process for this was a long one, but it is now published in Entropy [PDF].

The sensory system encodes the external world using spiking population codes, and must contend with the fixed bandwidth and noise inherent to neural communication. In this work, we explored a machine-learning model that operates under similar constraints, and examined the coding strategies it learned when trained to encode visual input.

We used Restricted Boltzmann Machines (RBMs): two-layer stochastic binary neural networks that learn generative models of their inputs. We explored how optimized encoders handle limited encoding resources by varying the number of binary "neurons" available to represent visual inputs.
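To make the setup concrete, here is a minimal numpy sketch of the kind of binary RBM we mean; the class name, hyperparameters, training loop, and placeholder data are illustrative stand-ins, not the implementation used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

class BinaryRBM:
    """Minimal binary-binary RBM trained with one step of contrastive divergence (sketch)."""
    def __init__(self, n_visible, n_hidden):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible biases
        self.c = np.zeros(n_hidden)    # hidden biases

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sample_h(self, v):
        p = self._sigmoid(v @ self.W + self.c)
        return p, (rng.random(p.shape) < p).astype(float)

    def sample_v(self, h):
        p = self._sigmoid(h @ self.W.T + self.b)
        return p, (rng.random(p.shape) < p).astype(float)

    def cd1_update(self, v0, lr=0.05):
        # One CD-1 step: positive phase on the data, negative phase after one Gibbs step
        ph0, h0 = self.sample_h(v0)
        pv1, v1 = self.sample_v(h0)
        ph1, _  = self.sample_h(v1)
        self.W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
        self.b += lr * (v0 - v1).mean(axis=0)
        self.c += lr * (ph0 - ph1).mean(axis=0)

# Vary n_hidden to change the encoding budget available to the model
rbm = BinaryRBM(n_visible=13, n_hidden=20)
data = (rng.random((500, 13)) < 0.3).astype(float)  # placeholder binary "stimuli"
for _ in range(10):
    rbm.cd1_update(data)
```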

We found several statistical signatures that emerge around an "optimal" model size: one that is just large enough to encode its inputs. Around this optimal size, we saw the emergence of statistical features often observed in neural population codes: sparsity, decorrelation, statistical criticality, and variability suppression.
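As a rough illustration of how two of these signatures can be measured (reusing the hypothetical `rbm` and `data` from the sketch above), sparsity and decorrelation can be read directly off sampled hidden-unit activations:

```python
# Sketch: summary statistics of the hidden-layer code (illustrative only)
ph, h = rbm.sample_h(data)   # hidden probabilities / binary samples per stimulus

# Population sparsity: mean fraction of active hidden units per stimulus
sparsity = h.mean()

# Decorrelation: mean absolute pairwise correlation between hidden units
corr = np.corrcoef(h.T)
off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
mean_abs_corr = np.nanmean(np.abs(off_diag))  # nanmean guards against constant units

print(f"mean activation: {sparsity:.3f}, mean |pairwise correlation|: {mean_abs_corr:.3f}")
```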

We interpret the learned encoding strategies as a way to encode stimuli with variable bit-rates over a noisy channel of fixed bandwidth. Low-noise regions of coding space are reserved for stimuli that take the most bits to describe. In our simulations, the networks learned to suppress noise by strongly silencing some neurons.

Common stimuli don't require very many bits to transmit (from a Shannon coding perspective). Such stimuli are encoded in the noisier regions of the neural code, and exhibit increased variability. Increased neural variability in the presence of limited information is another feature observed in neural population codes.
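To make the Shannon intuition concrete: the ideal code length for a stimulus with probability $p(x)$ is $-\log_2 p(x)$ bits, so frequent stimuli are cheap to transmit and rare ones are expensive. A toy calculation (the probabilities below are made up for illustration):

```python
import numpy as np

# Toy example: ideal code length (surprisal) under the stimuli's empirical frequencies
p_common = 0.25    # a frequent stimulus pattern
p_rare   = 0.001   # a rare, surprising pattern

bits_common = -np.log2(p_common)   # = 2 bits
bits_rare   = -np.log2(p_rare)     # ~ 10 bits

# In the interpretation above, the rare pattern should claim a low-noise
# region of the code, while the common one can tolerate a noisier encoding.
```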

Preview: Figure 5

 

Analyses of parameter sensitivity suggest an optimal model size for encoding sensory statistics: (a) Analysis of the Fisher Information Matrix (FIM) over a range of hidden-layer sizes (top to bottom; 13 visible units). From left to right, (1) FIM eigenvalue spectra $\lambda_i$ (y-axis) over a range of inverse temperatures $\beta$ indicate that model fits ($\beta=1$) past a certain size lie at a peak in their generalized susceptibility. This is a correlate of criticality in Ising spin models. Eigenvalues below $10^{-5}$ are truncated, and the largest and smallest eigenvalues are in red; (2) Important parameters in the leading FIM eigenvector align with individual hidden units, and become sparse for larger hidden layers. The eigenvector is displayed separately for the weights (matrix), and the visible (vertical) and hidden (horizontal) biases; (3) The average sensitivity of each parameter over all FIM eigenvectors, shown here as the square root of the FIM diagonal, also shows sparsity, indicating that beyond a certain size additional hidden units contribute little to model accuracy. Data are shown as in column (2); (4) Variance of the hidden-unit activations as a function of stimulus energy. In larger models, units with sensitive parameters contribute to encoding low-energy, less informative patterns. (b) The average parameter sensitivity, measured by the trace of the FIM normalized by hidden-layer size, decreases as the hidden layer grows. (c) Hidden-unit projective fields from a model with 37 visible and 60 hidden units, ordered by relative sensitivity (rank indicated above each image). More important units (ranks 1–8) encode spatially simple features such as localized patches, while the least important ones (ranks 53–60) have complex features.
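For readers who want to reproduce this kind of analysis: treating the RBM's joint distribution over visible and hidden units as an exponential family, its FIM with respect to the natural parameters equals the covariance of the sufficient statistics under the model distribution. Below is a rough numpy sketch of that estimate, reusing the hypothetical `rbm` and `rng` from the earlier sketch; it is not necessarily the exact procedure used in the paper.

```python
# Sketch: estimate the FIM of a binary RBM as the covariance of its sufficient
# statistics (v_i h_j, v_i, h_j) under the model, sampled by Gibbs sampling.

def gibbs_sufficient_stats(rbm, n_samples=2000, burn_in=200):
    v = (rng.random(rbm.b.shape[0]) < 0.5).astype(float)[None, :]
    stats = []
    for t in range(burn_in + n_samples):
        _, h = rbm.sample_h(v)
        _, v = rbm.sample_v(h)
        if t >= burn_in:
            # One row per joint sample: weight stats, then visible and hidden biases
            stats.append(np.concatenate([np.outer(v[0], h[0]).ravel(), v[0], h[0]]))
    return np.array(stats)

stats = gibbs_sufficient_stats(rbm)
fim = np.cov(stats, rowvar=False)          # FIM estimate, one row/column per parameter
eigvals, eigvecs = np.linalg.eigh(fim)     # spectrum, as in panel (a), column (1)
leading = eigvecs[:, -1]                   # leading eigenvector, as in column (2)
```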

Overall, we showed that machine learning models resembling spiking population codes learn coding strategies tantalizingly similar to those seen in vivo. We also found that the statistical features of the population code can be used to check whether a model is too small, too large, or just the right size to encode its inputs.

Many thanks to Martino Sorbaro and Matthias H. Hennig for sticking with this paper during the long review. This work can be cited as

Rule, M.E., Sorbaro, M. and Hennig, M.H., 2020. Optimal encoding in stochastic latent-variable models. Entropy, 22(7), p.714.