#StackBounty: #mixed-model #biostatistics #asymptotics #auc #asymptotic-covariance Mixed Model in a repeated measurement design and AUC

Bounty: 50

I have data on healthy patients and patients with cancer. My goal is to predict the cancer risk for each patient based on certain biological markers. Since I have repeated measurements, I was told to use a mixed model strategy, i.e. I assume that $$\mathbb P(\text{patient $i$ has cancer}\mid x_i) = \frac{1}{1+ \exp(-x_i\cdot\beta - \mu_i)},$$
where $x_i$ is the vector of biological markers of patient $i$, $\mu_i$ is the random effect to account for repeated measurements, and $\beta$ is the (unknown) coefficient vector. Here $\cdot$ denotes the usual Euclidean inner product.

The quality of my model is assessed by the $AUC$. I know how to compute the $AUC$ and in a previous question here on CV, it was clarified why I can expect the $AUC$ to be asymptotically normal. However, all proofs for asymptotic normality of the $AUC$ assume independent observations, which is not the case here due to repeated measurements.

I suppose that the normality argument still holds, as there are CLTs for dependent data. However, I could not find any proof of asymptotic normality of an $AUC$ in such a setting. This made me wonder whether I even have to worry, since I account for repeated measurements in the model that is used to obtain the $AUC$. I am very confused (mainly because I have spent the last couple of days thinking about this issue). So my key questions are:

  1. Can I expect the $AUC$ to be asymptotically normal in the given setting, and if so, why?
  2. How would I account for the repeated measurements in the variance of the $AUC$? Do I even have to bother, given that I account for repeated measurements in the model?
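As a point of reference for question 2, one commonly used way to respect the dependence is to resample whole patients rather than single measurements (a cluster bootstrap). The following is only a minimal sketch under stated assumptions, not a result established in this question: the function names `auc_mann_whitney` and `cluster_bootstrap_auc` are hypothetical, each patient is assumed to have a single cancer status shared by all of their measurements, and the scores are assumed to be predicted risks from an already fitted model.

```python
import numpy as np


def auc_mann_whitney(scores, labels):
    """AUC as P(case score > control score) + 0.5 * P(tie)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]  # all case/control pairs
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


def cluster_bootstrap_auc(scores, labels, patient_ids, n_boot=500, seed=0):
    """Return (AUC, bootstrap SE), resampling patients, not measurements."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    ids, inverse = np.unique(np.asarray(patient_ids), return_inverse=True)
    # Row indices of the repeated measurements belonging to each patient.
    rows = [np.flatnonzero(inverse == j) for j in range(len(ids))]
    # Assumption: the label is constant within a patient.
    patient_label = np.array([labels[r[0]] for r in rows])
    cases = np.flatnonzero(patient_label)
    controls = np.flatnonzero(~patient_label)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # Stratify by status so every replicate contains both classes.
        drawn = np.concatenate([
            rng.choice(cases, size=cases.size, replace=True),
            rng.choice(controls, size=controls.size, replace=True),
        ])
        idx = np.concatenate([rows[j] for j in drawn])
        stats[b] = auc_mann_whitney(scores[idx], labels[idx])
    return auc_mann_whitney(scores, labels), stats.std(ddof=1)
```

Because entire patients are drawn, the within-patient correlation is carried into every replicate; whether the resulting interval is valid still depends on the asymptotic question raised in point 1.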



#StackBounty: #sampling #asymptotics #measure-theory #survey-sampling #influence-function How is the asymptotic justification of the &q…

Bounty: 50

The survey R package recently adopted the "linearization by influence function" method of estimating covariances between domain estimates. The central paper justifying this method is Deville (1999). I’m trying to understand the main asymptotic claim made in the paper and some confusing aspects of the proof.

The result is summarized concisely by Deville and others in this 2009 Biometrika paper:

In Deville’s approach, a population parameter of interest $\Phi$ can be written as a functional $T$ with respect to a finite and discrete measure $M$, namely $\Phi=T(M)$. The substitution estimator $\hat{\Phi}=T(\hat{M})$ is the functional $T$ of a random measure $\hat{M}$ that is associated with sampling weights $w_{k}, k \in U$, and is ‘close’ to $M$. Suppose that $T$ is homogeneous of degree $\alpha$, so that $T(r M)=r^{\alpha} T(M)$, and $\lim _{N \rightarrow \infty} N^{-\alpha} T(M)<\infty$. Under broad assumptions, Deville shows that
$$
\begin{aligned}
\sqrt{n} N^{-\alpha}\{T(\hat{M})-T(M)\} &=\sqrt{n} N^{-\alpha} \int I_{T}(M, z) d(\hat{M}-M)(z)+o_{p}(1) \\
&=\sqrt{n} N^{-\alpha} \sum_{k=1}^{N} u_{k}\left(w_{k}-1\right)+o_{p}(1)
\end{aligned}
$$

The linearized variables $u_{k}$ are the influence functions $I_{T}\left(M, z_{k}\right)$, where $z_{k}$ is the value of the variable of interest for the $k$th unit.

This exact equation doesn’t show up in Deville’s 1999 paper as far as I can see. Instead, there is a tantalizingly similar-looking result on page 6.

Result: Under broad assumptions, the substitution estimation of a functional $T(M)$ is linearizable. A linearized variable is $z_{k}=IT\left(M ; x_{k}\right)$ where $IT$ is the influence function of $T$ in $M$.

Proof of the result: Let us provide the space of measurements on $\boldsymbol{R}^{q}$ with a metric $d$ accounting for the convergence: $d\left(M_{1}, M_{2}\right) \rightarrow 0$ if and only if $N^{-1}\left(\int y d M_{1}-\int y d M_{2}\right) \rightarrow 0$ for any variable of interest $y$. The asymptotic postulates mean that $d(\hat{M} / N, M / N)$ tends towards zero. We can visibly ensure that $d(\hat{M} / N, M / N)$ is $O_{p}(1 / \sqrt{n})$ according to the third postulate. Now, let us assume that $T$ can be derived in accordance with Fréchet, i.e., for any direction of the increase, in the space of "useful" measures provided with the abovementioned metric. Thus we have:
$$
N^{-\alpha}(T(\hat{M})-T(M))=\frac{1}{N} \sum_{U} z_{k}\left(w_{k}-1\right)+o\left(d\left(\frac{\hat{M}}{M}, \frac{M}{N}\right)\right)
$$

The result is that:
$$
\sqrt{n} N^{-\alpha}(T(\hat{M})-T(M))=\frac{\sqrt{n}}{N} \sum_{U} z_{k}\left(w_{k}-1\right)+o_{p}(1) .
$$

However, this result in the original 1999 paper is not the same result as the 2009 Biometrika paper. Why does the right-hand side of the Deville 1999 equation use $N^{-1}$, while the right-hand side of the quoted equation in the 2009 paper uses $N^{-\alpha}$?

The equation in Deville 1999 doesn’t make sense to me. For example, the mean is a statistic of degree 0 and its influence function is $z_{k}=\frac{1}{N}\left(y_{k}-\bar{Y}\right)$, so with the Deville 1999 equation we would end up with the nonsensical result that
$\sqrt{n}(\hat{\bar{Y}} - \bar{Y}) = \frac{\sqrt{n}}{N} \left[\hat{\bar{Y}} - \bar{Y}(\frac{\hat{N}}{N}) \right] + o_p(1)$.
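The mismatch for the mean can also be checked numerically. The following is a minimal sketch under assumptions that are not in either paper: a hypothetical gamma-distributed population, Poisson sampling with a common inclusion probability `pi`, weights $w_k = 1/\pi$ for sampled units and $0$ otherwise, and the weighted (Hájek) mean as the substitution estimator $T(\hat{M})$. The function name `linearization_check` is made up for illustration.

```python
import numpy as np


def linearization_check(N=20_000, pi=0.05, seed=0):
    """Compare sqrt(n) * (T(M_hat) - T(M)) with both right-hand sides
    for the population mean (homogeneous of degree alpha = 0)."""
    rng = np.random.default_rng(seed)
    y = rng.gamma(shape=2.0, scale=3.0, size=N)  # hypothetical population
    sampled = rng.random(N) < pi                 # Poisson sampling
    w = np.where(sampled, 1.0 / pi, 0.0)         # w_k = 1/pi if sampled, else 0
    n = int(sampled.sum())

    y_bar = y.mean()                             # T(M)
    y_bar_hat = (w * y).sum() / w.sum()          # T(M_hat), weighted mean
    z = (y - y_bar) / N                          # influence function of the mean

    lhs = np.sqrt(n) * (y_bar_hat - y_bar)
    lin = (z * (w - 1.0)).sum()                  # sum over U of z_k (w_k - 1)
    rhs_2009 = np.sqrt(n) * lin                  # prefactor N^{-alpha} = 1
    rhs_1999 = np.sqrt(n) / N * lin              # prefactor N^{-1}
    return lhs, rhs_2009, rhs_1999


lhs, rhs_2009, rhs_1999 = linearization_check()
print(lhs, rhs_2009, rhs_1999)
```

In this setup `lhs` equals `rhs_2009` times $N/\hat{N}$, so the two agree up to a factor that tends to one, whereas `rhs_1999` is smaller than `rhs_2009` by exactly the factor $N$.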

And the proof seems to contain some hidden steps. How is that first equation in Deville 1999 derived? It seems to involve the following missing step, but it’s not clear how even this equation would be established.

$$
N^{-\alpha}(T(\hat{M})-T(M))= N^{-1} \int I_{T}(M, z) d(\hat{M}-M)(z)+o\left(d\left(\frac{\hat{M}}{M}, \frac{M}{N}\right)\right)
$$

