Feed-forward neural networks consist of a series of layers. In each layer, outputs from earlier layers are combined linearly, then passed through a nonlinear transformation. As long as all of these computations are differentiable, the entire network is differentiable as well. This allows artificial neural networks to be trained using gradient-based optimization techniques (backpropagation).
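As a concrete illustration, here is a minimal JAX sketch of such a network; the layer sizes, the tanh nonlinearity, and the toy loss are illustrative choices rather than anything specific to the models discussed here. Because every operation is differentiable, `jax.grad` returns gradients for all of the weights and biases.

```python
import jax
import jax.numpy as jnp

def forward(params, x):
    # Each hidden layer: linear combination of the previous layer's output,
    # followed by a nonlinearity (tanh here, an illustrative choice).
    for W, b in params[:-1]:
        x = jnp.tanh(W @ x + b)
    W, b = params[-1]
    return W @ x + b                               # linear readout

# Illustrative layer sizes and random initialization
sizes = [4, 8, 8, 1]
keys = jax.random.split(jax.random.PRNGKey(0), len(sizes) - 1)
params = [(0.1 * jax.random.normal(k, (n_out, n_in)), jnp.zeros(n_out))
          for n_in, n_out, k in zip(sizes[:-1], sizes[1:], keys)]

x = jnp.ones(4)
loss = lambda p: jnp.sum(forward(p, x) ** 2)       # placeholder loss
grads = jax.grad(loss)(params)                     # gradients for every weight and bias
```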
Methods for training stochastic networks via backpropagation are less well developed, but solutions exist and are the subject of ongoing research (cf. Rezende et al. 2014 and the numerous papers that cite it). In the context of models of neural computation, Echeveste et al. (2019) trained stochastic neural networks with rectified-polynomial nonlinearities.
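One widely used solution from this line of work is the reparameterization trick: draw the noise outside the differentiable computation and write the stochastic unit as a deterministic function of its parameters and that noise. The sketch below is illustrative only; the Gaussian unit and the placeholder loss are my assumptions, not a specific model from the papers above.

```python
import jax
import jax.numpy as jnp

def sample_and_loss(params, key):
    mu, log_sigma = params
    eps = jax.random.normal(key, mu.shape)         # noise drawn outside the differentiable path
    z = mu + jnp.exp(log_sigma) * eps              # stochastic unit, differentiable in (mu, log_sigma)
    return jnp.sum(z ** 2)                         # placeholder downstream loss

params = (jnp.array([0.5, -0.3]), jnp.zeros(2))
grads = jax.grad(sample_and_loss)(params, jax.random.PRNGKey(1))
```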
One advantage of stochastic neural networks is that they allow a non-differentiable binary spiking network to be treated as differentiable. Backpropagation in binary spiking networks is ill-posed, because the derivative of the hard threshold (Heaviside step function) is zero almost everywhere and diverges at the threshold. Deterministic binary networks are typically trained using pseudogradient methods, in which the hard threshold is replaced by a differentiable soft-threshold when propagating gradients backwards.
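A pseudogradient can be implemented as a custom backward rule: the forward pass keeps the Heaviside step, while the backward pass substitutes the derivative of a sigmoid soft-threshold. The sketch below uses JAX's `custom_vjp`; the sigmoid surrogate and its steepness are illustrative choices.

```python
import jax
import jax.numpy as jnp

@jax.custom_vjp
def spike(v):
    # Forward pass: non-differentiable hard threshold (Heaviside step)
    return (v > 0.0).astype(v.dtype)

def spike_fwd(v):
    return spike(v), v                             # save the pre-activation for the backward pass

def spike_bwd(v, g):
    # Backward pass: derivative of a sigmoid soft-threshold stands in for the true one
    beta = 5.0                                     # illustrative steepness
    s = jax.nn.sigmoid(beta * v)
    return (g * beta * s * (1.0 - s),)

spike.defvjp(spike_fwd, spike_bwd)

v = jnp.array([-0.2, 0.1, 0.4])
grads = jax.grad(lambda v: jnp.sum(spike(v)))(v)   # finite pseudogradients instead of a divergent delta
```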
However, in stochastic networks, it is possible for the moments of network activity (e.g. mean $\mu$ and covariance $\Sigma$) to be differentiable, even if the underlying activations are not. This opens up another avenue for training binary networks using backpropagation.
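As a rough sketch of what this looks like for a single layer of threshold (Heaviside) units: if the pre-activations are approximately Gaussian, the mean output is $\Phi(\mu_a/\sigma_a)$, which is a smooth function of the weights. The code below assumes independent inputs and tracks only a diagonal covariance for brevity; a full moment propagation would also carry the cross-covariances.

```python
import jax
import jax.numpy as jnp
from jax.scipy.stats import norm

def binary_layer_moments(W, b, mu_in, var_in):
    # Gaussian approximation of the pre-activation a = W x + b
    mu_a = W @ mu_in + b
    var_a = (W ** 2) @ var_in              # assumes independent inputs (diagonal covariance)
    # Firing probability of a Heaviside unit with Gaussian input: Phi(mu_a / sigma_a)
    p = norm.cdf(mu_a / jnp.sqrt(var_a + 1e-8))
    return p, p * (1.0 - p)                # Bernoulli mean and variance of the binary output

W = 0.1 * jax.random.normal(jax.random.PRNGKey(2), (3, 5))
b, mu_in, var_in = jnp.zeros(3), jnp.full(5, 0.5), jnp.full(5, 0.25)
# The moments, unlike individual binary samples, are differentiable in the weights.
grads = jax.grad(lambda W: jnp.sum(binary_layer_moments(W, b, mu_in, var_in)[0]))(W)
```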
Figure 1: Training stochastic neural networks. (a) Weighted sums over many inputs average together into an approximately Gaussian variable (due to the central limit theorem), so samples from the noisy network can be described in terms of their mean and covariance. (b) Instead of mean firing rates alone, one can propagate both means and covariances through the layers of a neural network using moment approximations. This provides a differentiable representation of stochastic binary networks that can be trained via backpropagation.