What is the Cramer-Rao Lower Bound?

July 11, 2016 by wacoder. Tags: Bayesian Estimation

In estimation theory and statistics, the Cramer-Rao lower bound (CRLB), named in honor of Harald Cramer and Calyampudi Rao, who were among the first to derive it, expresses a lower bound on the variance of any unbiased estimator of a deterministic parameter. In other words, the CRLB tells us the best we can ever expect to do with an unbiased estimator.

Statement

Suppose \(\theta\) is an unknown deterministic parameter which is to be estimated from measurements \(x\), distributed according to some probability density function \(p(x;\theta)\). The variance of any unbiased estimator \(\hat{\theta}\) of \(\theta\) is then bounded from below by the reciprocal of the Fisher information \(I(\theta)\):

\begin{equation}\label{Eqn:crlb} var(\hat{\theta}) \geq \frac{1}{I(\theta)} \end{equation}

where the Fisher information \(I(\theta)\) is defined by

\begin{equation}\label{Eqn:fisher} I(\theta) = E\left[\left(\frac{\partial l(x;\theta)}{\partial \theta}\right)^2\right] = -E\left[\frac{\partial^2 l(x;\theta)}{\partial \theta^2}\right] \end{equation}

and \(l(x;\theta)=\ln(p(x;\theta))\) is the natural logarithm of the likelihood function and \(E\) denotes the expected value (over \(x\)).
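
As a concrete example, suppose we observe \(N\) i.i.d. samples \(x_1,\dots,x_N\) from a Gaussian \(\mathcal{N}(\mu,\sigma^2)\) with known variance \(\sigma^2\) and unknown mean \(\theta=\mu\). A short calculation gives

\begin{align*} \begin{aligned} I(\mu) = \frac{N}{\sigma^2} \quad \Rightarrow \quad var(\hat{\mu}) \geq \frac{\sigma^2}{N} \end{aligned} \end{align*}

and the sample mean \(\bar{x}\) is an unbiased estimator whose variance equals \(\sigma^2/N\), so it attains the bound.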

Derivation

Generally, given measurements \(x\), we want to find a local maximum of \(p(x;\theta)\) by choosing \(\theta\) such that \(p(x;\theta)\) is high and \(\frac{\partial}{\partial \theta}p(x;\theta)=0\). Since it is often preferable to work with the logarithm, we use the log likelihood function \(l(x;\theta)\). The score function is the derivative of the log likelihood function with respect to \(\theta\).

\begin{equation}\label{Eqn:scorefunc} s(x;\theta) = \frac{\partial}{\partial \theta}l(x;\theta) = \frac{1}{p(x;\theta)}\frac{\partial}{\partial \theta}p(x;\theta) \end{equation}
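
For the Gaussian example above, with a single observation \(x \sim \mathcal{N}(\mu,\sigma^2)\) and known \(\sigma^2\), the log likelihood is \(l(x;\mu) = -\ln(\sqrt{2\pi}\sigma) - \frac{(x-\mu)^2}{2\sigma^2}\), so the score is simply

\begin{align*} \begin{aligned} s(x;\mu) = \frac{\partial}{\partial \mu} l(x;\mu) = \frac{x-\mu}{\sigma^2} \end{aligned} \end{align*}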

To calculate the expected value of the score function, we integrate over all values of \(x\):

\begin{align} \begin{aligned} E[s(x;\theta)]&=\int_{x\in X}s(x;\theta)p(x;\theta)dx \nonumber\\ &=\int_{x \in X} \frac{1}{p(x;\theta)} \frac{\partial}{\partial \theta} p(x;\theta) p(x;\theta) dx\\ & = \int_{x \in X} \frac{\partial}{\partial \theta}p(x;\theta) dx\\ & = \frac{\partial}{\partial \theta} \int_{x \in X} p(x;\theta)dx\\ & = \frac{\partial}{\partial \theta} 1 = 0 \end{aligned} \end{align}

Since the expectation of the score function \(s(x;\theta)\) is 0, the variance is simply: \(var[s(x;\theta)]=E[s^2(x;\theta)]\).
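
The zero-mean property and the identity \(var[s(x;\theta)]=E[s^2(x;\theta)]\) are easy to check numerically. Below is a minimal sketch in Python, assuming NumPy is available; the Gaussian model, its parameters, and the sample size are illustrative choices rather than anything prescribed by the derivation.

    import numpy as np

    # Check the score-function properties for x ~ N(mu, sigma^2) with known sigma.
    rng = np.random.default_rng(0)
    mu, sigma = 2.0, 1.5
    x = rng.normal(mu, sigma, size=1_000_000)

    # Score of a single observation: s(x; mu) = (x - mu) / sigma^2
    s = (x - mu) / sigma**2

    print(s.mean())      # close to 0: the score has zero mean
    print(s.var())       # close to 1/sigma^2, i.e. E[s^2] = I(mu) for one observation
    print(1 / sigma**2)  # Fisher information of a single observation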

Given an unbiased estimator \(g(x)\) of \(\theta\), so that \(E[g(x)]=\theta\):

\begin{align*} \begin{aligned} \int g(x)p(x;\theta)dx & = \theta\\ \frac{\partial}{\partial \theta} \int g(x)p(x;\theta)dx & = 1\\ \int g(x) \frac{\partial}{\partial \theta} p(x;\theta)dx & = 1\\ \int g(x) s(x;\theta) p(x;\theta)dx & = 1\\ E[g(x)s(x;\theta)]& = 1 \end{aligned} \end{align*}

Consider the covariance of \(g(x)\) and \(s(x;\theta)\), using \(E[g(x)]=\theta\) and \(E[s(x;\theta)]=0\):

\begin{align*} \begin{aligned} cov[g(x), s(x;\theta)] &= E[(g(x)-\theta)(s(x,\theta)-0)]\\ & = E[g(x)s(x;\theta)-\theta s(x;\theta)]\\ & = E[g(x)s(x;\theta)]\\ & = 1 \end{aligned} \end{align*}

Recall the Cauchy-Schwarz inequality: \(cov[v(x),w(x)]^2 \leq var[v(x)] var[w(x)]\). Therefore

\begin{align*} \begin{aligned} cov[g(x), s(x;\theta)]^2 = 1 &\leq var[g(x)] var[s(x;\theta)]\\ var[g(x)]& \geq \frac{1}{var[s(x;\theta)]} \end{aligned} \end{align*}
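
Continuing the Gaussian example, the sketch below (again a Python illustration assuming NumPy; the constants are arbitrary) estimates \(\mu\) with the sample mean over many independent trials and compares its empirical variance with the bound \(1/I(\mu)=\sigma^2/N\).

    import numpy as np

    # Compare the variance of the sample mean, an unbiased estimator of mu,
    # with the CRLB sigma^2 / N for N i.i.d. Gaussian observations.
    rng = np.random.default_rng(1)
    mu, sigma, N, trials = 2.0, 1.5, 50, 200_000

    samples = rng.normal(mu, sigma, size=(trials, N))
    estimates = samples.mean(axis=1)   # g(x): the sample mean of each trial

    print(estimates.var())             # empirical variance of the estimator
    print(sigma**2 / N)                # the CRLB for this model

Because the sample mean is efficient for this model, the two printed values agree up to Monte Carlo error; for any other unbiased estimator the empirical variance can only be larger or equal.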

It remains to show that \(var[s(x;\theta)]=E[s^2(x;\theta)]=-E[\frac{\partial^2}{\partial \theta^2} l(x;\theta)]\), i.e. that the variance of the score is exactly the Fisher information \(I(\theta)\). We start with

\begin{align*} \begin{aligned} \int_{x\in X}p(x;\theta)dx &=1\\ \end{aligned} \end{align*}

we differentiate with respect to the parameter \(\theta\)

\begin{align*} \begin{aligned} \frac{\partial}{\partial \theta} \int_{x\in X}p(x;\theta)dx& =\int_{x\in X} \frac{\partial}{\partial \theta} p(x;\theta)dx =0\\ \end{aligned} \end{align*}

we rewrite the above equation by dividing and multiplying by \(p(x;\theta)\)

\begin{align*} \begin{aligned} \int_{x\in X} \frac{\partial}{\partial \theta} p(x;\theta) \frac{1}{p(x;\theta)}p(x;\theta)dx&=0\\ \int_{x\in X} \frac{\partial}{\partial \theta} l(x;\theta) p(x;\theta)dx&=0\\ \end{aligned} \end{align*}

We take the derivative of the above equation with respect to the parameter \(\theta\), applying the product rule and using \(\frac{\partial}{\partial \theta}p(x;\theta)=\frac{\partial}{\partial \theta}l(x;\theta)\,p(x;\theta)\) in the second term

\begin{align*} \begin{aligned} \int_{x\in X} \frac{\partial^2}{\partial \theta^2} l(x;\theta)p(x;\theta)dx+\int_{x\in X} \frac{\partial}{\partial \theta} l(x;\theta)\frac{\partial}{\partial \theta} l(x;\theta) p(x;\theta)dx&=0\\ \end{aligned} \end{align*}

Recognizing both integrals as expectations over \(p(x;\theta)\), we obtain

\begin{align*} \begin{aligned} E[\frac{\partial^2}{\partial \theta^2} l(x;\theta)]+ E[s^2(x;\theta)]&=0\\ \end{aligned} \end{align*}

Finally, we recover the second form of the Fisher information \(I(\theta)\):

\begin{align} \begin{aligned} E[s^2(x;\theta)] &=-E[\frac{\partial^2}{\partial \theta^2} l(x;\theta)] \end{aligned} \end{align}
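
As a check, for the Gaussian example both forms give the same value: with \(l(x;\mu) = -\ln(\sqrt{2\pi}\sigma) - \frac{(x-\mu)^2}{2\sigma^2}\),

\begin{align*} \begin{aligned} -E\left[\frac{\partial^2}{\partial \mu^2} l(x;\mu)\right] = \frac{1}{\sigma^2} = E\left[\left(\frac{x-\mu}{\sigma^2}\right)^2\right] = E[s^2(x;\mu)] \end{aligned} \end{align*}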

References

  1. The score function and Cramer-Rao lower bound, [online]

  2. Cramer-Rao lower bound: deriving the Fisher Information matrix, [online]