#StackBounty: #maximum-likelihood #inference #fisher-information Connection between Fisher information and variance of score function

Bounty: 100

The connection between the Fisher information and the negative expected Hessian at $\theta_{MLE}$ provides insight in the following way: at the MLE, high curvature implies that an estimate of $\theta$ even slightly different from the true MLE would have resulted in a very different likelihood.
$$\mathbf{I}(\theta)=-\frac{\partial^{2}}{\partial\theta_{i}\partial\theta_{j}}l(\theta),~~~~ 1\leq i, j\leq p$$

This is good, as that means that we can be relatively sure about our estimate.

The other connection, between Fisher information and the variance of the score when evaluated at the MLE, is less clear to me.
$$ I(\theta) = E\left[\left(\frac{\partial}{\partial\theta}l(\theta)\right)^2\right]$$
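To make the two expressions concrete, here is a minimal numerical sketch. It assumes a toy model that is not part of the question: a single observation from $N(\theta, \sigma^2)$ with known $\sigma$, for which the score is $(x-\theta)/\sigma^2$ and the second derivative is $-1/\sigma^2$. The function names are hypothetical, and the expectation is approximated by Monte Carlo with data drawn at the same $\theta$ at which the score is evaluated.

```python
import random
import statistics

def score(x, theta, sigma):
    """Derivative of the N(theta, sigma^2) log-density with respect to theta."""
    return (x - theta) / sigma**2

def neg_hessian(x, theta, sigma):
    """Negative second derivative of the same log-density (constant in x here)."""
    return 1.0 / sigma**2

def fisher_two_ways(theta=1.0, sigma=2.0, n_draws=200_000, seed=0):
    """Monte Carlo estimates of E[score^2] and E[-hessian] at the given theta."""
    rng = random.Random(seed)
    # Draw data from the model at the same theta where the score is evaluated.
    xs = [rng.gauss(theta, sigma) for _ in range(n_draws)]
    mean_sq_score = statistics.fmean(score(x, theta, sigma) ** 2 for x in xs)
    mean_neg_hess = statistics.fmean(neg_hessian(x, theta, sigma) for x in xs)
    return mean_sq_score, mean_neg_hess

sq, nh = fisher_two_ways()
print("E[score^2] estimate:", sq)
print("E[-hessian] estimate:", nh)
```

With $\sigma = 2$, both estimates should land near $1/\sigma^2 = 0.25$, up to Monte Carlo error in the first one.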

The implication is: high Fisher information -> high variance of the score function at the MLE.

Intuitively, this means that the score function is highly sensitive to the sampling of the data, i.e., we are likely to get a non-zero gradient of the likelihood had we drawn a different sample of data. This seems to have a negative implication to me. Don’t we want the score function = 0 to be highly robust to different samplings of the data?
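One way to probe this intuition numerically is to draw many samples and record the score both at a fixed parameter value and at each sample's own MLE. The sketch below is only an illustration under assumptions not in the question: i.i.d. $N(\theta_0, \sigma^2)$ data with known $\sigma$, the sample mean as the MLE, and hypothetical function names.

```python
import random
import statistics

def total_score(xs, theta, sigma):
    """Score of an i.i.d. N(theta, sigma^2) sample: sum of (x - theta) / sigma^2."""
    return sum(x - theta for x in xs) / sigma**2

def simulate_scores(theta0=0.0, sigma=1.0, n=50, n_reps=5_000, seed=1):
    """Return scores at the fixed theta0 and at each sample's MLE."""
    rng = random.Random(seed)
    at_fixed, at_mle = [], []
    for _ in range(n_reps):
        xs = [rng.gauss(theta0, sigma) for _ in range(n)]
        theta_hat = statistics.fmean(xs)  # MLE of the mean for this sample
        at_fixed.append(total_score(xs, theta0, sigma))
        at_mle.append(total_score(xs, theta_hat, sigma))
    return at_fixed, at_mle

for sigma in (1.0, 3.0):
    at_fixed, at_mle = simulate_scores(sigma=sigma)
    print("sigma:", sigma)
    print("  variance of score at fixed theta0:", statistics.variance(at_fixed))
    print("  largest |score| at each sample's MLE:", max(abs(s) for s in at_mle))
```

In this toy model the score evaluated at each sample's own MLE is zero up to floating-point error, whatever the sample, while the score evaluated at the fixed $\theta_0$ varies from sample to sample with variance close to $n/\sigma^2$.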

A lower Fisher information, on the other hand, would indicate that the score function has low variance at the MLE and has mean zero. This implies that, regardless of the sampling distribution, the gradient of the log-likelihood will be zero (which is good!).

What am I missing?
