Wednesday, May 16, 2012

Note: Differentiating expectations of a function of a random variable with respect to location and scale parameters

Consider a real-valued random variable $z$ with a known probability density $\Pr(z) = \phi(z)$. From $\phi(z)$, one can generate a location/scale family of probability densities by shifting and scaling $\phi(z)$:

$$ \Pr\left( x ; \mu, \sigma \right) = \frac 1 \sigma \phi \left( \frac {x - \mu}{\sigma} \right) $$

The most familiar example of such a family is the univariate Gaussian distribution, for which $\phi(z) = [ 2\pi]^{-1/2}\exp\left(-\tfrac 1 2 z^2\right)$. Now, consider the expectation $\langle f(x) \rangle$ of a function $f(x)$ with respect to $\Pr\left( x ; \mu, \sigma \right)$:

$$ \langle f(x) \rangle = \int_{\mathbb R} f(x) \Pr(x) dx = \int_{\mathbb R} f(x) \frac 1 \sigma \phi\left(\frac{x-\mu}{\sigma}\right) dx $$

What are the derivatives of $\langle f(x) \rangle$ with respect to $\mu$ and $\sigma^2$? The answers are:

\begin{equation}\begin{aligned} \partial_\mu \langle f(x) \rangle &= \langle f'(x) \rangle \\ \partial_{\sigma^2} \langle f(x)\rangle &= \tfrac1{2\sigma^2} \langle (x-\mu) f'(x) \rangle \end{aligned}\end{equation}
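
As a quick sanity check in the Gaussian case, the following Python sketch compares both identities against central finite differences. It is only an illustration: the test function $f(x)=\sin(x)$, the parameter values, the step size `h`, and the helper name `gaussian_expectation` are assumptions chosen for the example. Expectations are computed with Gauss-Hermite quadrature from NumPy.

```python
import numpy as np

def gaussian_expectation(g, mu, sigma, n_nodes=40):
    """Approximate <g(x)> for x ~ N(mu, sigma^2) by Gauss-Hermite quadrature."""
    # Nodes and weights for the weight function exp(-z^2 / 2).
    z, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    # Substitute x = mu + sigma * z and normalize by sqrt(2 * pi).
    return np.sum(w * g(mu + sigma * z)) / np.sqrt(2.0 * np.pi)

# Illustrative (assumed) test function and its derivative.
f = np.sin
df = np.cos

mu, sigma = 0.3, 1.2
var = sigma ** 2
h = 1e-5  # finite-difference step

# Central finite difference in mu.
d_mu_fd = (gaussian_expectation(f, mu + h, sigma)
           - gaussian_expectation(f, mu - h, sigma)) / (2 * h)

# Central finite difference in the variance sigma^2.
d_var_fd = (gaussian_expectation(f, mu, np.sqrt(var + h))
            - gaussian_expectation(f, mu, np.sqrt(var - h))) / (2 * h)

# Right-hand sides of the two identities.
d_mu_formula = gaussian_expectation(df, mu, sigma)
d_var_formula = gaussian_expectation(lambda x: (x - mu) * df(x), mu, sigma) / (2 * var)

print("d/dmu     :", d_mu_fd, d_mu_formula)
print("d/dsigma^2:", d_var_fd, d_var_formula)
```

For this choice of $f$ the expectation has the closed form $\sin(\mu)\,e^{-\sigma^2/2}$, so both derivatives can also be checked by hand.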

This question often appears in the special case where $x$ is normally distributed; you'll find derivations elsewhere online given in terms of the cumulative distribution function of the standard normal distribution.

This note outlines a derivation for any scale/location family using elementary calculus. These derivatives can be obtained by considering how perturbing $\mu$ or $\sigma$ shifts and/or scales the probability density.

For the mean, consider the definition of the derivative:

\begin{equation}\begin{aligned} \frac{d}{d\mu} \langle f(x) \rangle &= \lim_{\epsilon\to0} \frac1\epsilon \left\{ \int_{\mathbb R} f(x) \frac 1 \sigma \phi\left(\frac{x - \epsilon -\mu}{\sigma}\right) dx - \int_{\mathbb R} f(x) \frac 1 \sigma\phi\left(\frac{x-\mu}{\sigma}\right) dx \right\} \\ &= \lim_{\epsilon\to0} \frac1\epsilon \left\{ \int_{\mathbb R} f(x) \frac 1 \sigma \phi\left(\frac{x - \epsilon -\mu}{\sigma}\right) dx - \langle f(x) \rangle \right\} \end{aligned}\end{equation}

Let $y = x-\epsilon$. Then perform a change of variables ($dy = dx$ and $x = y + \epsilon$):

\begin{equation}\begin{aligned} \frac{d}{d\mu} \langle f(x) \rangle &= \lim_{\epsilon\to0} \frac1\epsilon\left\{ \int_{\mathbb R} f(y+\epsilon) \frac1\sigma \phi\left(\frac{y -\mu}{\sigma}\right) dy-\langle f(x)\rangle \right\} \\ &= \lim_{\epsilon\to0} \frac1\epsilon\left\{ \langle f(x+\epsilon) \rangle - \langle f(x)\rangle \right\} \\ &= \lim_{\epsilon\to0} \frac1\epsilon\left\{ \langle f(x)+\epsilon f'(x) + O(\epsilon^2) \rangle - \langle f(x)\rangle \right\} \\ &= \langle f'(x) \rangle \end{aligned}\end{equation}
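
The result for the location parameter does not rely on $\phi$ being Gaussian. A minimal numerical sketch, assuming a logistic base density and $f(x)=\tanh(x)$ (both arbitrary choices for illustration, as are the grid and the helper names), integrates directly over $x$ on a fixed grid and compares a finite difference in $\mu$ with $\langle f'(x)\rangle$:

```python
import numpy as np

def logistic_pdf(z):
    """Standard logistic density, written as 1 / (4 cosh^2(z / 2))."""
    return 0.25 / np.cosh(0.5 * z) ** 2

def expectation(g, mu, sigma, x):
    """Riemann-sum approximation of <g(x)> under (1/sigma) * phi((x - mu) / sigma)."""
    dx = x[1] - x[0]
    density = logistic_pdf((x - mu) / sigma) / sigma
    return np.sum(g(x) * density) * dx

# Fixed integration grid in x; wide enough that the tails are negligible.
x = np.linspace(-100.0, 100.0, 400001)

# Illustrative (assumed) test function and its derivative.
f = np.tanh
df = lambda t: 1.0 - np.tanh(t) ** 2

mu, sigma = 0.7, 1.5
h = 1e-4  # finite-difference step

# Central finite difference in the location parameter mu.
d_mu_fd = (expectation(f, mu + h, sigma, x)
           - expectation(f, mu - h, sigma, x)) / (2 * h)

# Right-hand side of the identity: <f'(x)>.
d_mu_formula = expectation(df, mu, sigma, x)

print("d/dmu:", d_mu_fd, d_mu_formula)
```

Because the grid in $x$ is held fixed while $\mu$ is perturbed, the check does not build in the change of variables used in the derivation.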

For the variance, consider the derivative in $\sigma$:

\begin{equation}\begin{aligned} \frac d {d\sigma} \langle f(x)\rangle &= \lim_{\epsilon\to0} \frac1\epsilon\left\{ \int_{\mathbb R} f(x) \frac 1 {\sigma+\epsilon} \phi\left(\frac{x-\mu}{\sigma+\epsilon}\right) dx - \langle f(x) \rangle \right\} \end{aligned}\end{equation}

Let $y=\frac{\sigma}{\sigma+\epsilon}(x-\mu)+\mu$. This gives the change of variables \begin{equation}\begin{aligned} dx &= \frac{\sigma+\epsilon}{\sigma} dy\\x &= \tfrac{\sigma+\epsilon}{\sigma}(y-\mu) + \mu = y +\frac{\epsilon}{\sigma}(y-\mu) \end{aligned}\end{equation}

Substituting, and simplifying:

\begin{equation}\begin{aligned} \frac d {d\sigma} \langle f(x)\rangle &= \lim_{\epsilon\to0} \frac1\epsilon\left\{ \int_{\mathbb R} f(y +\tfrac{\epsilon}{\sigma}(y-\mu)) \frac 1 {\sigma+\epsilon} \phi\left(\frac{y-\mu}{\sigma}\right) \frac{\sigma+\epsilon}{\sigma} dy - \langle f(x) \rangle \right\} \\ &= \lim_{\epsilon\to0} \frac1\epsilon\left\{ \int_{\mathbb R} f(y +\tfrac{\epsilon}{\sigma}(y-\mu)) \frac 1 {\sigma} \phi\left(\frac{y-\mu}{\sigma}\right) dy - \langle f(x) \rangle \right\} \\ &= \lim_{\epsilon\to0} \frac1\epsilon\left\{ \langle f(x +\tfrac{\epsilon}{\sigma}(x-\mu)) \rangle - \langle f(x) \rangle \right\} \\ &= \lim_{\epsilon\to0} \frac1\epsilon \langle f'(x) \tfrac{\epsilon}{\sigma}(x-\mu) + O(\epsilon^2) \rangle \\ &= \frac{1}{\sigma} \langle f'(x) (x-\mu) \rangle \end{aligned}\end{equation}

To get the derivative in terms of the variance $\sigma^2$, apply the chain rule, using $\frac{d\sigma}{d\sigma^2} = \frac{1}{2\sigma}$:

\begin{equation}\begin{aligned} \frac d {d\sigma^2} \langle f(x)\rangle = \frac {d\sigma} {{d\sigma^2}} \frac d {d\sigma}\langle f(x)\rangle = \frac{1}{2\sigma} \cdot \frac{1}{\sigma} \langle f'(x) (x-\mu) \rangle = \frac{1}{2\sigma^2} \langle f'(x) (x-\mu) \rangle \end{aligned}\end{equation}
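
To illustrate the scale results numerically for a family that is neither Gaussian nor symmetric, here is a hedged sketch using a Gumbel base density $\phi(z)=\exp(-(z+e^{-z}))$ and $f(x)=\arctan(x)$. These choices, the grid, and the helper names are assumptions made for the example. Note that in such a family $\mu$ and $\sigma$ are the location and scale parameters, not the mean and standard deviation of $x$.

```python
import numpy as np

def gumbel_pdf(z):
    """Standard Gumbel density exp(-(z + exp(-z))); skewed, so not symmetric about 0."""
    return np.exp(-(z + np.exp(-z)))

def expectation(g, mu, sigma, x):
    """Riemann-sum approximation of <g(x)> under (1/sigma) * phi((x - mu) / sigma)."""
    dx = x[1] - x[0]
    density = gumbel_pdf((x - mu) / sigma) / sigma
    return np.sum(g(x) * density) * dx

# Fixed integration grid in x; wide enough that the tails are negligible.
x = np.linspace(-100.0, 100.0, 400001)

# Illustrative (assumed) test function and its derivative.
f = np.arctan
df = lambda t: 1.0 / (1.0 + t ** 2)

mu, sigma = 0.5, 2.0
var = sigma ** 2
h = 1e-4  # finite-difference step

# Central finite differences in sigma and in sigma^2.
d_sigma_fd = (expectation(f, mu, sigma + h, x)
              - expectation(f, mu, sigma - h, x)) / (2 * h)
d_var_fd = (expectation(f, mu, np.sqrt(var + h), x)
            - expectation(f, mu, np.sqrt(var - h), x)) / (2 * h)

# Right-hand sides: (1/sigma) <f'(x)(x - mu)> and (1/(2 sigma^2)) <f'(x)(x - mu)>.
weighted = expectation(lambda t: (t - mu) * df(t), mu, sigma, x)
d_sigma_formula = weighted / sigma
d_var_formula = weighted / (2 * var)

print("d/dsigma  :", d_sigma_fd, d_sigma_formula)
print("d/dsigma^2:", d_var_fd, d_var_formula)
```

The two formulas differ only by the chain-rule factor $1/(2\sigma)$, which the finite differences should reproduce up to discretization error.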
