#StackBounty: #sampling #asymptotics #measure-theory #survey-sampling #influence-function How is the asymptotic justification of the &q…

Bounty: 50

The survey R package recently adopted the "linearization by influence function" method of estimating covariances between domain estimates. The central paper justifying this method is Deville (1999). I’m trying to understand the main asymptotic claim made in the paper and some confusing aspects of the proof.

The result is summarized concisely by Deville and others in this 2009 Biometrika paper:

In Deville’s approach, a population parameter of interest $\Phi$ can be written as a functional $T$ with respect to a finite and discrete measure $M$, namely $\Phi=T(M)$. The substitution estimator $\hat{\Phi}=T(\hat{M})$ is the functional $T$ of a random measure $\hat{M}$ that is associated with sampling weights $w_{k}, k \in U$, and is ‘close’ to $M$. Suppose that $T$ is homogeneous of degree $\alpha$, so that $T(r M)=r^{\alpha} T(M)$, and $\lim _{N \rightarrow \infty} N^{-\alpha} T(M)<\infty$. Under broad assumptions, Deville shows that
$$
\begin{aligned}
\sqrt{n} N^{-\alpha}\{T(\hat{M})-T(M)\} &=\sqrt{n} N^{-\alpha} \int I_{T}(M, z) \, d(\hat{M}-M)(z)+o_{p}(1) \\
&=\sqrt{n} N^{-\alpha} \sum_{k=1}^{N} u_{k}\left(w_{k}-1\right)+o_{p}(1)
\end{aligned}
$$

The linearized variables $u_{k}$ are the influence functions $I_{T}\left(M, z_{k}\right)$, where $z_{k}$ is the value of the variable of interest for the $k$th unit.
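As a sanity check on the quoted expansion, here is a minimal simulation sketch for one degree-0 functional, the ratio $R=\sum_U y_k / \sum_U x_k$, whose influence function is $u_k=(y_k-Rx_k)/X$ with $X=\sum_U x_k$. Everything specific in it is an assumption made for illustration: the synthetic population, simple random sampling without replacement with $w_k=N/n$ for sampled units and $w_k=0$ otherwise, and the function name `simulate_ratio_linearization`. Because $\alpha=0$, the factor $N^{-\alpha}$ equals 1, so the code compares $\sqrt{n}\{T(\hat{M})-T(M)\}$ with $\sqrt{n}\sum_U u_k(w_k-1)$ directly.

```python
import math
import random


def simulate_ratio_linearization(N=2000, n=200, reps=500, seed=0):
    """Compare sqrt(n)*(R_hat - R) with sqrt(n)*sum_k u_k (w_k - 1) under SRSWOR.

    Returns the root-mean-square of the scaled error and of the scaled
    remainder (error minus linear term) over `reps` simulated samples.
    """
    rng = random.Random(seed)

    # Hypothetical finite population: x > 0, y roughly proportional to x.
    x = [rng.uniform(1.0, 3.0) for _ in range(N)]
    y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

    X, Y = sum(x), sum(y)
    R = Y / X

    # Influence function of the ratio evaluated at the population measure M.
    u = [(yi - R * xi) / X for xi, yi in zip(x, y)]
    sum_u = sum(u)  # zero up to floating-point rounding

    w = N / n  # assumed design weight for every sampled unit
    sq_err = 0.0
    sq_rem = 0.0
    for _ in range(reps):
        s = rng.sample(range(N), n)
        # Substitution estimator: the common weight cancels in the ratio.
        R_hat = sum(y[k] for k in s) / sum(x[k] for k in s)
        # sum_k u_k (w_k - 1), with w_k = N/n in the sample and 0 outside it.
        lin = w * sum(u[k] for k in s) - sum_u
        err = R_hat - R
        sq_err += n * err ** 2
        sq_rem += n * (err - lin) ** 2

    return math.sqrt(sq_err / reps), math.sqrt(sq_rem / reps)


rms_error, rms_remainder = simulate_ratio_linearization()
print("RMS of sqrt(n) * error:    ", rms_error)
print("RMS of sqrt(n) * remainder:", rms_remainder)
```

In this setup the remainder should come out much smaller than the error itself, which is what an $o_p(1)$ term looks like at a fixed sample size. It is only a numerical illustration for one design and one functional, not a verification of the general result.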

This exact equation doesn’t show up in Deville’s 1999 paper as far as I can see. Instead, there is a tantalizingly similar-looking result on page 6.

Result: Under broad assumptions, the substitution estimation of a functional $T(M)$ is linearizable. A linearized variable is $z_{k}=IT\left(M ; x_{k}\right)$ where $IT$ is the influence function of $T$ in $M$.

Proof of the result: Let us provide the space of measurements on $\boldsymbol{R}^{q}$ with a metric $d$ accounting for the convergence: $d\left(M_{1}, M_{2}\right) \rightarrow 0$ if and only if $N^{-1}\left(\int y \, dM_{1}-\int y \, dM_{2}\right) \rightarrow 0$ for any variable of interest $y$. The asymptotic postulates mean that $d(\hat{M} / N, M / N)$ tends towards zero. We can visibly ensure that $d(\hat{M} / N, M / N)$ is $O_{p}(1 / \sqrt{n})$ according to the third postulate. Now, let us assume that $T$ can be derived in accordance with Fréchet, i.e., for any direction of the increase, in the space of "useful" measures provided with the abovementioned metric. Thus we have:
$$
N^{-\alpha}(T(\hat{M})-T(M))=\frac{1}{N} \sum_{U} z_{k}\left(w_{k}-1\right)+o\left(d\left(\frac{\hat{M}}{M}, \frac{M}{N}\right)\right)
$$

The result is that:
$$
\sqrt{n} N^{-\alpha}(T(\hat{M})-T(M))=\frac{\sqrt{n}}{N} \sum_{U} z_{k}\left(w_{k}-1\right)+o_{p}(1) .
$$

However, this result in the original 1999 paper is not the same as the result quoted from the 2009 Biometrika paper. Why does the right-hand side of the Deville 1999 equation use $N^{-1}$, while the right-hand side of the quoted equation in the 2009 paper uses $N^{-\alpha}$?

The equation in Deville 1999 doesn’t make sense to me. For example, the mean is a statistic of degree 0 and its influence function is $z_{k}=\frac{1}{N}\left(y_{k}-\bar{Y}\right)$, so with the Deville 1999 equation we would end up with the nonsensical result that
$\sqrt{n}(\hat{\bar{Y}} - \bar{Y}) = \frac{\sqrt{n}}{N} \left[\hat{\bar{Y}} - \bar{Y}\left(\frac{\hat{N}}{N}\right) \right] + o_p(1)$.
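
To make the scaling issue explicit, the algebra for the mean can be written out as follows. The symbols $\hat{Y}=\sum_U w_k y_k$, $\hat{N}=\sum_U w_k$ and $Y=\sum_U y_k$ are notation introduced here, and the substitution estimator of the mean is assumed to be $T(\hat{M})=\hat{Y}/\hat{N}$. With $z_k=(y_k-\bar{Y})/N$ and $Y=N\bar{Y}$:

$$
\begin{aligned}
\sum_{U} z_{k}\left(w_{k}-1\right) &=\frac{1}{N}\left\{\left(\hat{Y}-\bar{Y} \hat{N}\right)-\left(Y-\bar{Y} N\right)\right\}=\frac{\hat{Y}-\bar{Y} \hat{N}}{N}, \\
T(\hat{M})-T(M) &=\frac{\hat{Y}}{\hat{N}}-\bar{Y}=\frac{\hat{Y}-\bar{Y} \hat{N}}{\hat{N}}, \\
\sum_{U} z_{k}\left(w_{k}-1\right) &=\frac{\hat{N}}{N}\left\{T(\hat{M})-T(M)\right\} .
\end{aligned}
$$

So if $\hat{N}/N$ converges to 1 in probability, the sum $\sum_U z_k(w_k-1)$ on its own already agrees with $T(\hat{M})-T(M)$ to first order. That appears consistent with the $N^{-\alpha}$ form for $\alpha=0$, whereas multiplying the sum by a further $N^{-1}$ would shrink the right-hand side by a factor of $N$.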

And the proof seems to contain some hidden steps. How is that first equation in Deville 1999 derived? It seems to involve the following missing step, but it’s not clear how even this equation would be established.

$$
N^{-\alpha}(T(\hat{M})-T(M))= N^{-1} \int I_{T}(M, z) \, d(\hat{M}-M)(z)+o\left(d\left(\frac{\hat{M}}{M}, \frac{M}{N}\right)\right)
$$
