#StackBounty: #r #mathematical-statistics #variance #sampling #mean Right way to compute mean and variance

Bounty: 50

1. Suppose each $a_{\ell m}$ follows a normal distribution with mean zero and variance $C_\ell=\langle a_{\ell m}^2 \rangle=\text{Var}(a_{\ell m})$, and define the random variable $Z$ by:

$$\begin{aligned}
Z = \sum_{\ell=\ell_{min}}^{\ell_{max}} \sum_{m=-\ell}^{\ell} a_{\ell m}^{2}
\end{aligned}$$

Then, the goal is to compute $\langle Z\rangle$:

If I consider the random variable $Y=\sum_{m=-\ell}^{\ell} C_\ell \bigg(\dfrac{a_{\ell m}}{\sqrt{C_\ell}}\bigg)^{2}$, each term of this sum follows a $\chi^2(1)$ distribution weighted by $C_\ell$.

  1. Can I conclude from this that the mean of $Y$ is:

$$\langle Y\rangle =\bigg\langle\sum_{m=-\ell}^{\ell} a_{\ell m}^{2}\bigg\rangle = (2\ell+1)\,C_\ell$$

and, consequently, would we have:

$$\langle Z\rangle = \sum_{\ell=\ell_{min}}^{\ell_{max}}\,C_\ell\,(2\ell+1)$$

I have serious doubts, since the $a_{\ell m}$ do not follow a standard normal distribution $\mathcal{N}(0,1)$.

Shouldn’t it rather be:

$$\begin{align}
Z&\equiv \sum_{\ell=\ell_{min}}^{\ell_{max}} \sum_{m=-\ell}^\ell a_{\ell,m}^2 \\[6pt]
&= \sum_{\ell=\ell_{min}}^{\ell_{max}} \sum_{m=-\ell}^\ell C_\ell \cdot \bigg( \frac{a_{\ell,m}}{\sqrt{C_\ell}} \bigg)^2 \\[6pt]
&\sim \sum_{\ell=\ell_{min}}^{\ell_{max}} \sum_{m=-\ell}^\ell C_\ell \cdot \text{ChiSq}(1) \\[6pt]
&= \sum_{\ell=\ell_{min}}^{\ell_{max}} C_\ell \sum_{m=-\ell}^\ell \text{ChiSq}(1) \\[6pt]
&= \sum_{\ell=\ell_{min}}^{\ell_{max}} C_\ell \cdot \text{ChiSq}(2 \ell + 1).
\end{align}$$
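
A possible way to check both routes against each other is to take expectations term by term. The lines below are only a sketch: the mean uses nothing beyond linearity of expectation and $\langle a_{\ell m}^2\rangle = C_\ell$, while the variance line additionally assumes that the $a_{\ell m}$ are mutually independent, which the question does not state explicitly.

$$\begin{align}
\langle a_{\ell m}^2\rangle &= C_\ell \,\bigg\langle \bigg( \frac{a_{\ell m}}{\sqrt{C_\ell}} \bigg)^2 \bigg\rangle = C_\ell \cdot \langle \text{ChiSq}(1)\rangle = C_\ell \\[6pt]
\langle Z\rangle &= \sum_{\ell=\ell_{min}}^{\ell_{max}} \sum_{m=-\ell}^{\ell} \langle a_{\ell m}^2\rangle = \sum_{\ell=\ell_{min}}^{\ell_{max}} (2\ell+1)\,C_\ell \\[6pt]
\text{Var}(Z) &= \sum_{\ell=\ell_{min}}^{\ell_{max}} C_\ell^2 \,\text{Var}\big(\text{ChiSq}(2\ell+1)\big) = 2\sum_{\ell=\ell_{min}}^{\ell_{max}} (2\ell+1)\,C_\ell^2
\end{align}$$

Under these assumptions the rescaling by $\sqrt{C_\ell}$ only moves the factor $C_\ell$ outside the chi-squared variable, so the two ways of writing $\langle Z\rangle$ agree.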

  1. Now I want to calculate the mean $\langle Z\rangle$ of $Z$:

Do you agree that my case is the computation of the mean of a weighted sum of $\chi^2$ variables?

So the computation is not trivial, is it? Maybe I could compute the mean by starting from the analytical expression:

$$\langle Z\rangle=\sum_{\ell=\ell_{min}}^{\ell_{max}} C_\ell (2\ell + 1)\quad(1)$$

and directly doing the numerical computation:

$$\langle Z\rangle=\sum_{i=1}^{N} C_{\ell_{i}} (2\ell_{i} + 1)\quad(2)$$

  1. What do you think about this direct computation? Is it correct?

I get confused between $(1)$ and $(2)$ above, since each $C_\ell$ corresponds to one $\ell$ (I mean that, from a numerical point of view, each $C_{\ell_{i}}$ value is associated with an $\ell_{i}$ value).
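
To make the pairing between $C_{\ell_i}$ and $\ell_i$ concrete, here is a minimal R sketch of expression $(2)$. The vectors `ell` and `Cl` are hypothetical placeholders (they are not the `array_2D` or `Cl_sp` data of the question); the only point is that the vectorised sum and the explicit loop over $i$ give the same number.

```r
# Hypothetical inputs: multipoles and a made-up spectrum (placeholder values only)
ell <- 2:37                      # 36 multipoles, chosen only for illustration
Cl  <- 1 / (ell * (ell + 1))     # placeholder C_ell, one value per multipole

# Expression (2): element-wise product C_{l_i} * (2 l_i + 1), then a sum over i
mean_Z <- sum(Cl * (2 * ell + 1))

# The same sum written as an explicit loop over i = 1..N
mean_Z_loop <- 0
for (i in seq_along(ell)) {
  mean_Z_loop <- mean_Z_loop + Cl[i] * (2 * ell[i] + 1)
}

print(c(vectorised = mean_Z, loop = mean_Z_loop))
```

As long as `Cl[i]` is stored at the same index as `ell[i]`, expressions $(1)$ and $(2)$ describe the same sum.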

  1. If the direct computation $\langle Z\rangle=\sum_{i=1}^{N} C_{\ell_{i}} (2\ell_{i} + 1)$ is not correct, then I have to treat $Z$ as a random variable following a weighted sum of different chi-squared distributions:

I have tried the following R script, where nRed is the number of bins considered (5), nRow is the number of values of $\ell$ (from $\ell_{min}$ to $\ell_{max}$), and Cl_sp[,i] is the vector of nRow values of $C_\ell$ for bin $i$.

   # Number of bins
   nRed <- 5

   # Number of rows (number of multipoles l)
   nRow <- 36

   # Size of the sample
   nSample_var <- 1000

   # Degrees of freedom 2*l+1 for each of the nRow multipoles l
   L <- 2*(array_2D[,1])+1

   # Weighted sum of chi-squared distributions
   y3_1 <- array(0, dim=c(nSample_var,nRed))
   for (i in 1:nRed) {
     for (j in 1:nRow) {
       y3_1[,i] <- y3_1[,i] + Cl_sp[j,i] * rchisq(nSample_var,df=L[j])
     }
   }

   # Print the mean of Z for each bin (y3_1 is the array filled above)
   for (i in 1:nRed) {
     print(paste0('red=',i,' mean_exp = ', mean(y3_1[,i])))
   }
  1. Is this the right thing to implement to compute the mean of $Z$ if I can’t compute it analytically (see expression $(2)$ above)?

I would also like to compute the variance of $Z$; maybe a simple addition to my R script such as:

   # Print the variance of Z for each bin
   for (i in 1:nRed) {
     print(paste0('red=',i,' var_exp = ', var(y3_1[,i])))
   }

should be enough. What do you think about this?
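
One way to judge whether the simulation does what is intended is to compare it with closed-form values. The sketch below is self-contained and uses hypothetical inputs (`ell`, `Cl`, `nSample` are placeholders, not the question's data). It assumes the chi-squared terms are independent, in which case $E[\text{ChiSq}(k)]=k$ and $\text{Var}[\text{ChiSq}(k)]=2k$ give a theoretical mean and variance for the weighted sum.

```r
set.seed(1)                       # reproducibility of this sketch

# Hypothetical inputs (placeholders, not the question's Cl_sp or array_2D)
ell <- 2:37
Cl  <- 1 / (ell * (ell + 1))
df  <- 2 * ell + 1                # degrees of freedom of each chi-squared term
nSample <- 20000

# Draw nSample realisations of Z = sum_l C_l * ChiSq(2l+1)
Z <- numeric(nSample)
for (j in seq_along(ell)) {
  Z <- Z + Cl[j] * rchisq(nSample, df = df[j])
}

# Closed-form values, assuming independent terms
mean_theory <- sum(Cl * df)
var_theory  <- 2 * sum(Cl^2 * df)

print(c(mean_mc = mean(Z), mean_theory = mean_theory))
print(c(var_mc  = var(Z),  var_theory  = var_theory))
```

If the Monte Carlo and closed-form values agree within sampling noise, `mean()` and `var()` applied column by column are doing the intended job; the sample variance typically needs a larger sample than the mean to stabilise.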



#StackBounty: #sampling #asymptotics #measure-theory #survey-sampling #influence-function How is the asymptotic justification of the &q…

Bounty: 50

The survey R package recently adopted the "linearization by influence function" method of estimating covariances between domain estimates. The central paper justifying this method is Deville (1999). I’m trying to understand the main asymptotic claim made in the paper and some confusing aspects of the proof.

The result is summarized concisely by Deville and others in this 2009 Biometrika paper:

In Deville’s approach, a population parameter of interest $\Phi$ can be written as a functional $T$ with respect to a finite and discrete measure $M$, namely $\Phi=T(M)$. The substitution estimator $\hat{\Phi}=T(\hat{M})$ is the functional $T$ of a random measure $\hat{M}$ that is associated with sampling weights $w_{k}, k \in U$, and is ‘close’ to $M$. Suppose that $T$ is homogeneous of degree $\alpha$, so that $T(r M)=r^{\alpha} T(M)$, and $\lim _{N \rightarrow \infty} N^{-\alpha} T(M)<\infty$. Under broad assumptions, Deville shows that
$$
\begin{aligned}
\sqrt{n} N^{-\alpha}\{T(\hat{M})-T(M)\} &=\sqrt{n} N^{-\alpha} \int I_{T}(M, z) d(\hat{M}-M)(z)+o_{p}(1) \\
&=\sqrt{n} N^{-\alpha} \sum_{k=1}^{N} u_{k}\left(w_{k}-1\right)+o_{p}(1)
\end{aligned}
$$

The linearized variables $u_{k}$ are the influence functions $I_{T}\left(M, z_{k}\right)$, where $z_{k}$ is the value of the variable of interest for the $k$th unit.

This exact equation doesn’t show up in Deville’s 1999 paper as far as I can see. Instead, there is a tantalizingly similar-looking result on page 6.

Result: Under broad assumptions, the substitution estimation of a functional $T(M)$ is linearizable. A linearized variable is $z_{k}=I T\left(M ; x_{k}\right)$ where $I T$ is the influence function of $T$ in $M$.

Proof of the result: Let us provide the space of measurements on $\boldsymbol{R}^{q}$ with a metric $d$ accounting for the convergence: $d\left(M_{1}, M_{2}\right) \rightarrow 0$ if and only if $N^{-1}\left(\int y d M_{1}-\int y d M_{2}\right) \rightarrow 0$ for any variable of interest $y$. The asymptotic postulates mean that $d(\hat{M} / N, M / N)$ tends towards zero. We can visibly ensure that $d(\hat{M} / N, M / N)$ is $O_{p}(1 / \sqrt{n})$ according to the third postulate. Now, let us assume that $T$ can be derived in accordance with Fréchet, i.e., for any direction of the increase, in the space of "useful" measures provided with the abovementioned metric. Thus we have:
$$
N^{-\alpha}(T(\hat{M})-T(M))=\frac{1}{N} \sum_{U} z_{k}\left(w_{k}-1\right)+o\left(d\left(\frac{\hat{M}}{N}, \frac{M}{N}\right)\right)
$$

The result is that:
$$
\sqrt{n} N^{-\alpha}(T(\hat{M})-T(M))=\frac{\sqrt{n}}{N} \sum_{U} z_{k}\left(w_{k}-1\right)+o_{p}(1) .
$$

However, this result in the original 1999 paper is not the same result as the 2009 Biometrika paper. Why does the right-hand side of the Deville 1999 equation use $N^{-1}$, while the right-hand side of the quoted equation in the 2009 paper uses $N^{-\alpha}$?
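
As a sanity check on the two scalings, it may help to work through the simplest case, a population total. The sketch below assumes that $\hat{M}$ places mass $w_{k}$ on each sampled unit (with $w_{k}=0$ for unsampled units), which is how the sums over $U$ above appear to be meant; that reading is an assumption, not something the quoted passages state.

$$
\begin{aligned}
T(M) &= \int y \, dM = \sum_{U} y_{k}, \qquad T(rM) = r\,T(M) \;\Rightarrow\; \alpha = 1, \\
I_{T}(M, z_{k}) &= \lim_{t \rightarrow 0} \frac{T(M+t\,\delta_{z_{k}})-T(M)}{t} = y_{k}, \\
N^{-1}\left(T(\hat{M})-T(M)\right) &= N^{-1}\left(\sum_{U} w_{k} y_{k}-\sum_{U} y_{k}\right) = \frac{1}{N} \sum_{U} y_{k}\left(w_{k}-1\right).
\end{aligned}
$$

For a total the linearization is exact and, because $\alpha=1$, the factors $N^{-1}$ and $N^{-\alpha}$ coincide, so this case cannot distinguish the 1999 form from the 2009 form; the difference only shows up for functionals of degree $\alpha \neq 1$.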

The equation in Deville 1999 doesn’t make sense to me. For example, the mean is a statistic of degree 0 and its influence function is $z_{k}=\frac{1}{N}\left(y_{k}-\bar{Y}\right)$, so with the Deville 1999 equation we would end up with the nonsensical result that
$\sqrt{n}(\hat{\bar{Y}} - \bar{Y}) = \frac{\sqrt{n}}{N} \left[\hat{\bar{Y}} - \bar{Y}(\frac{\hat{N}}{N}) \right] + o_p(1)$.
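
The scaling issue for the mean can also be examined numerically. The following R sketch is illustrative only: the population `y`, the inclusion probabilities `pi_k`, and the use of Poisson sampling are all assumptions introduced here, not taken from either paper. It compares the exact error of the substitution estimator of the mean with $\sum_{U} u_{k}(w_{k}-1)$, using $u_{k}=(y_{k}-\bar{Y})/N$ and no additional $N^{-1}$ factor (that is, $N^{-\alpha}$ with $\alpha=0$).

```r
set.seed(42)

# Hypothetical finite population (made-up values, for illustration only)
N <- 5000
y <- rgamma(N, shape = 2, scale = 10)
Ybar <- mean(y)

# Poisson sampling with unequal inclusion probabilities (an assumption of this sketch)
pi_k <- 0.02 + 0.08 * y / max(y)
sampled <- rbinom(N, size = 1, prob = pi_k)
w <- sampled / pi_k               # w_k = 1/pi_k if unit k is sampled, 0 otherwise

# Substitution estimator of the mean: T(M_hat) = sum(w * y) / sum(w)
Nhat <- sum(w)
Ybar_hat <- sum(w * y) / Nhat

# Linearized variable for the mean (degree alpha = 0): u_k = (y_k - Ybar) / N
u <- (y - Ybar) / N

exact_error <- Ybar_hat - Ybar
linear_term <- sum(u * (w - 1))   # N^(-alpha) * sum_U u_k (w_k - 1), with alpha = 0

print(c(exact = exact_error, linear = linear_term, Nhat_over_N = Nhat / N))
```

Algebraically, `linear_term` equals `(Nhat / N) * exact_error`, so the two agree up to a factor that tends to 1 when $\hat{N}/N \rightarrow 1$. Dividing the sum by a further $N$ would make it smaller by a factor of $N$, which is the mismatch described above.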

And the proof seems to contain some hidden steps. How is that first equation in Deville 1999 derived? It seems to involve the following missing step, but it’s not clear how even this equation would be established.

$$
N^{-\alpha}(T(\hat{M})-T(M))= N^{-1} \int I_{T}(M, z) d(\hat{M}-M)(z)+o\left(d\left(\frac{\hat{M}}{N}, \frac{M}{N}\right)\right)
$$


Get this bounty!!!

#StackBounty: #sampling #asymptotics #measure-theory #survey-sampling #influence-function How is the asymptotic justification of the &q…

Bounty: 50

The survey R package recently adopted the "linearization by influence function" method of estimating covariances between domain estimates. The central paper justifying this method is Deville (1999). I’m trying to understand the main asymptotic claim made in the paper and some confusing aspects of the proof.

The result is summarized concisely by Deville and others in this 2009 Biometrika paper:

In Deville’s approach, a population parameter of interest $Phi$ can be written as a functional $T$ with respect to a finite and discrete measure $M$, namely $Phi=T(M)$. The substitution estimator $hat{Phi}=T(hat{M})$ is the functional $T$ of a random measure $hat{M}$ that is associated with sampling weights $w_{k}, k in U$, and is ‘close’ to $M$. Suppose that $T$ is homogeneous of degree $alpha$, so that $T(r M)=r^{alpha} T(M)$, and $lim _{N rightarrow infty} N^{-alpha} T(M)<infty$. Under broad assumptions, Deville shows that
$$
begin{aligned}
sqrt{n} N^{-alpha}{T(hat{M})-T(M)} &=sqrt{n} N^{-alpha} int I_{T}(M, z) d(hat{M}-M)(z)+o_{p}(1) \
&=sqrt{n} N^{-alpha} sum_{k=1}^{N} u_{k}left(w_{k}-1right)+o_{p}(1)
end{aligned}
$$

The linearized variables $u_{k}$ are the influence functions $I_{T}left(M, z_{k}right)$, where $z_{k}$ is the value of the variable of interest for the $k$ th unit.

This exact equation doesn’t show up in Deville’s 1999 paper as far as I can see. Instead, there is a tantalizingly similar-looking result on page 6.

Result: Under broad assumptions, the substitution estimation of a functional $T(M)$ is linearizable. A linearized variable is $z_{k}=I Tleft(M ; x_{k}right)$ where $I T$ is the influence function of $T$ in $M$.

Proof of the result: Let us provide the space of measurements on $boldsymbol{R}^{q}$ with a metric $d$ accounting for the convergence: $dleft(M_{1}, M_{2}right) rightarrow 0$ if and only if $N^{-1}left(int y d M_{1}-int y d M_{2}right) rightarrow 0$ for any variable of interest $y$. The asymptotic postulates mean that $d(hat{M} / N, M / N)$ tends towards zero. We can visibly ensure that $d(hat{M} / N, M / N)$ is $O_{p}(1 / sqrt{n})$ according to the third postulate. Now, let us assume that $T$ can be derived in accordance with Fréchet, i.e., for any direction of the increase, in the space of "useful" measures provided with the abovementioned metric. Thus we have:
$$
N^{-alpha}(T(hat{M})-T(M))=frac{1}{N} sum_{U} z_{k}left(w_{k}-1right)+oleft(dleft(frac{hat{M}}{M}, frac{M}{N}right)right)
$$

The result is that:
$$
sqrt{n} N^{-alpha}(T(hat{M})-T(M))=frac{sqrt{n}}{N} sum_{U} z_{k}left(w_{k}-1right)+o_{p}(1) .
$$

However, this result in the original 1999 paper is not the same result as the 2009 Biometrika paper. Why does the right-hand side of the Deville 1999 equation use $N^{-1}$, while the right-hand side of the quoted equation in the 2009 paper uses $N^{-alpha}$?

The equation in Deville 1999 doesn’t make sense to me. For example, the mean is a statistic of degree 0 and its influence function is $z_{k}=frac{1}{N}left(y_{k}-bar{Y}right)$, so with the Deville 1999 equation we would end up with the nonsensical result that
$sqrt{n}(hat{bar{Y}} – bar{Y}) = frac{sqrt{n}}{N} left[hat{bar{Y}} – bar{Y}(frac{hat{N}}{N}) right] + o_p(1)$.

And the proof seems to contain some hidden steps. How is that first equation in Deville 1999 derived? It seems to involve the following missing step, but it’s not clear how even this equation would be established.

$$
N^{-alpha}(T(hat{M})-T(M))= N^{-1} int I_{T}(M, z) d(hat{M}-M)(z)+oleft(dleft(frac{hat{M}}{M}, frac{M}{N}right)right)
$$


Get this bounty!!!

#StackBounty: #sampling #asymptotics #measure-theory #survey-sampling #influence-function How is the asymptotic justification of the &q…

Bounty: 50

The survey R package recently adopted the "linearization by influence function" method of estimating covariances between domain estimates. The central paper justifying this method is Deville (1999). I’m trying to understand the main asymptotic claim made in the paper and some confusing aspects of the proof.

The result is summarized concisely by Deville and others in this 2009 Biometrika paper:

In Deville’s approach, a population parameter of interest $Phi$ can be written as a functional $T$ with respect to a finite and discrete measure $M$, namely $Phi=T(M)$. The substitution estimator $hat{Phi}=T(hat{M})$ is the functional $T$ of a random measure $hat{M}$ that is associated with sampling weights $w_{k}, k in U$, and is ‘close’ to $M$. Suppose that $T$ is homogeneous of degree $alpha$, so that $T(r M)=r^{alpha} T(M)$, and $lim _{N rightarrow infty} N^{-alpha} T(M)<infty$. Under broad assumptions, Deville shows that
$$
begin{aligned}
sqrt{n} N^{-alpha}{T(hat{M})-T(M)} &=sqrt{n} N^{-alpha} int I_{T}(M, z) d(hat{M}-M)(z)+o_{p}(1) \
&=sqrt{n} N^{-alpha} sum_{k=1}^{N} u_{k}left(w_{k}-1right)+o_{p}(1)
end{aligned}
$$

The linearized variables $u_{k}$ are the influence functions $I_{T}left(M, z_{k}right)$, where $z_{k}$ is the value of the variable of interest for the $k$ th unit.

This exact equation doesn’t show up in Deville’s 1999 paper as far as I can see. Instead, there is a tantalizingly similar-looking result on page 6.

Result: Under broad assumptions, the substitution estimation of a functional $T(M)$ is linearizable. A linearized variable is $z_{k}=I Tleft(M ; x_{k}right)$ where $I T$ is the influence function of $T$ in $M$.

Proof of the result: Let us provide the space of measurements on $boldsymbol{R}^{q}$ with a metric $d$ accounting for the convergence: $dleft(M_{1}, M_{2}right) rightarrow 0$ if and only if $N^{-1}left(int y d M_{1}-int y d M_{2}right) rightarrow 0$ for any variable of interest $y$. The asymptotic postulates mean that $d(hat{M} / N, M / N)$ tends towards zero. We can visibly ensure that $d(hat{M} / N, M / N)$ is $O_{p}(1 / sqrt{n})$ according to the third postulate. Now, let us assume that $T$ can be derived in accordance with Fréchet, i.e., for any direction of the increase, in the space of "useful" measures provided with the abovementioned metric. Thus we have:
$$
N^{-alpha}(T(hat{M})-T(M))=frac{1}{N} sum_{U} z_{k}left(w_{k}-1right)+oleft(dleft(frac{hat{M}}{M}, frac{M}{N}right)right)
$$

The result is that:
$$
sqrt{n} N^{-alpha}(T(hat{M})-T(M))=frac{sqrt{n}}{N} sum_{U} z_{k}left(w_{k}-1right)+o_{p}(1) .
$$

However, this result in the original 1999 paper is not the same result as the 2009 Biometrika paper. Why does the right-hand side of the Deville 1999 equation use $N^{-1}$, while the right-hand side of the quoted equation in the 2009 paper uses $N^{-alpha}$?

The equation in Deville 1999 doesn’t make sense to me. For example, the mean is a statistic of degree 0 and its influence function is $z_{k}=frac{1}{N}left(y_{k}-bar{Y}right)$, so with the Deville 1999 equation we would end up with the nonsensical result that
$sqrt{n}(hat{bar{Y}} – bar{Y}) = frac{sqrt{n}}{N} left[hat{bar{Y}} – bar{Y}(frac{hat{N}}{N}) right] + o_p(1)$.

And the proof seems to contain some hidden steps. How is that first equation in Deville 1999 derived? It seems to involve the following missing step, but it’s not clear how even this equation would be established.

$$
N^{-alpha}(T(hat{M})-T(M))= N^{-1} int I_{T}(M, z) d(hat{M}-M)(z)+oleft(dleft(frac{hat{M}}{M}, frac{M}{N}right)right)
$$


Get this bounty!!!

#StackBounty: #sampling #asymptotics #measure-theory #survey-sampling #influence-function How is the asymptotic justification of the &q…

Bounty: 50

The survey R package recently adopted the "linearization by influence function" method of estimating covariances between domain estimates. The central paper justifying this method is Deville (1999). I’m trying to understand the main asymptotic claim made in the paper and some confusing aspects of the proof.

The result is summarized concisely by Deville and others in this 2009 Biometrika paper:

In Deville’s approach, a population parameter of interest $Phi$ can be written as a functional $T$ with respect to a finite and discrete measure $M$, namely $Phi=T(M)$. The substitution estimator $hat{Phi}=T(hat{M})$ is the functional $T$ of a random measure $hat{M}$ that is associated with sampling weights $w_{k}, k in U$, and is ‘close’ to $M$. Suppose that $T$ is homogeneous of degree $alpha$, so that $T(r M)=r^{alpha} T(M)$, and $lim _{N rightarrow infty} N^{-alpha} T(M)<infty$. Under broad assumptions, Deville shows that
$$
begin{aligned}
sqrt{n} N^{-alpha}{T(hat{M})-T(M)} &=sqrt{n} N^{-alpha} int I_{T}(M, z) d(hat{M}-M)(z)+o_{p}(1) \
&=sqrt{n} N^{-alpha} sum_{k=1}^{N} u_{k}left(w_{k}-1right)+o_{p}(1)
end{aligned}
$$

The linearized variables $u_{k}$ are the influence functions $I_{T}left(M, z_{k}right)$, where $z_{k}$ is the value of the variable of interest for the $k$ th unit.

This exact equation doesn’t show up in Deville’s 1999 paper as far as I can see. Instead, there is a tantalizingly similar-looking result on page 6.

Result: Under broad assumptions, the substitution estimation of a functional $T(M)$ is linearizable. A linearized variable is $z_{k}=I Tleft(M ; x_{k}right)$ where $I T$ is the influence function of $T$ in $M$.

Proof of the result: Let us provide the space of measurements on $boldsymbol{R}^{q}$ with a metric $d$ accounting for the convergence: $dleft(M_{1}, M_{2}right) rightarrow 0$ if and only if $N^{-1}left(int y d M_{1}-int y d M_{2}right) rightarrow 0$ for any variable of interest $y$. The asymptotic postulates mean that $d(hat{M} / N, M / N)$ tends towards zero. We can visibly ensure that $d(hat{M} / N, M / N)$ is $O_{p}(1 / sqrt{n})$ according to the third postulate. Now, let us assume that $T$ can be derived in accordance with Fréchet, i.e., for any direction of the increase, in the space of "useful" measures provided with the abovementioned metric. Thus we have:
$$
N^{-alpha}(T(hat{M})-T(M))=frac{1}{N} sum_{U} z_{k}left(w_{k}-1right)+oleft(dleft(frac{hat{M}}{M}, frac{M}{N}right)right)
$$

The result is that:
$$
sqrt{n} N^{-alpha}(T(hat{M})-T(M))=frac{sqrt{n}}{N} sum_{U} z_{k}left(w_{k}-1right)+o_{p}(1) .
$$

However, this result in the original 1999 paper is not the same result as the 2009 Biometrika paper. Why does the right-hand side of the Deville 1999 equation use $N^{-1}$, while the right-hand side of the quoted equation in the 2009 paper uses $N^{-alpha}$?

The equation in Deville 1999 doesn’t make sense to me. For example, the mean is a statistic of degree 0 and its influence function is $z_{k}=frac{1}{N}left(y_{k}-bar{Y}right)$, so with the Deville 1999 equation we would end up with the nonsensical result that
$sqrt{n}(hat{bar{Y}} – bar{Y}) = frac{sqrt{n}}{N} left[hat{bar{Y}} – bar{Y}(frac{hat{N}}{N}) right] + o_p(1)$.

And the proof seems to contain some hidden steps. How is that first equation in Deville 1999 derived? It seems to involve the following missing step, but it’s not clear how even this equation would be established.

$$
N^{-alpha}(T(hat{M})-T(M))= N^{-1} int I_{T}(M, z) d(hat{M}-M)(z)+oleft(dleft(frac{hat{M}}{M}, frac{M}{N}right)right)
$$


Get this bounty!!!

#StackBounty: #sampling #asymptotics #measure-theory #survey-sampling #influence-function How is the asymptotic justification of the &q…

Bounty: 50

The survey R package recently adopted the "linearization by influence function" method of estimating covariances between domain estimates. The central paper justifying this method is Deville (1999). I’m trying to understand the main asymptotic claim made in the paper and some confusing aspects of the proof.

The result is summarized concisely by Deville and others in this 2009 Biometrika paper:

In Deville’s approach, a population parameter of interest $Phi$ can be written as a functional $T$ with respect to a finite and discrete measure $M$, namely $Phi=T(M)$. The substitution estimator $hat{Phi}=T(hat{M})$ is the functional $T$ of a random measure $hat{M}$ that is associated with sampling weights $w_{k}, k in U$, and is ‘close’ to $M$. Suppose that $T$ is homogeneous of degree $alpha$, so that $T(r M)=r^{alpha} T(M)$, and $lim _{N rightarrow infty} N^{-alpha} T(M)<infty$. Under broad assumptions, Deville shows that
$$
begin{aligned}
sqrt{n} N^{-alpha}{T(hat{M})-T(M)} &=sqrt{n} N^{-alpha} int I_{T}(M, z) d(hat{M}-M)(z)+o_{p}(1) \
&=sqrt{n} N^{-alpha} sum_{k=1}^{N} u_{k}left(w_{k}-1right)+o_{p}(1)
end{aligned}
$$

The linearized variables $u_{k}$ are the influence functions $I_{T}left(M, z_{k}right)$, where $z_{k}$ is the value of the variable of interest for the $k$ th unit.

This exact equation doesn’t show up in Deville’s 1999 paper as far as I can see. Instead, there is a tantalizingly similar-looking result on page 6.

Result: Under broad assumptions, the substitution estimation of a functional $T(M)$ is linearizable. A linearized variable is $z_{k}=I Tleft(M ; x_{k}right)$ where $I T$ is the influence function of $T$ in $M$.

Proof of the result: Let us provide the space of measurements on $boldsymbol{R}^{q}$ with a metric $d$ accounting for the convergence: $dleft(M_{1}, M_{2}right) rightarrow 0$ if and only if $N^{-1}left(int y d M_{1}-int y d M_{2}right) rightarrow 0$ for any variable of interest $y$. The asymptotic postulates mean that $d(hat{M} / N, M / N)$ tends towards zero. We can visibly ensure that $d(hat{M} / N, M / N)$ is $O_{p}(1 / sqrt{n})$ according to the third postulate. Now, let us assume that $T$ can be derived in accordance with Fréchet, i.e., for any direction of the increase, in the space of "useful" measures provided with the abovementioned metric. Thus we have:
$$
N^{-alpha}(T(hat{M})-T(M))=frac{1}{N} sum_{U} z_{k}left(w_{k}-1right)+oleft(dleft(frac{hat{M}}{M}, frac{M}{N}right)right)
$$

The result is that:
$$
sqrt{n} N^{-alpha}(T(hat{M})-T(M))=frac{sqrt{n}}{N} sum_{U} z_{k}left(w_{k}-1right)+o_{p}(1) .
$$

However, this result in the original 1999 paper is not the same result as the 2009 Biometrika paper. Why does the right-hand side of the Deville 1999 equation use $N^{-1}$, while the right-hand side of the quoted equation in the 2009 paper uses $N^{-alpha}$?

The equation in Deville 1999 doesn’t make sense to me. For example, the mean is a statistic of degree 0 and its influence function is $z_{k}=frac{1}{N}left(y_{k}-bar{Y}right)$, so with the Deville 1999 equation we would end up with the nonsensical result that
$sqrt{n}(hat{bar{Y}} – bar{Y}) = frac{sqrt{n}}{N} left[hat{bar{Y}} – bar{Y}(frac{hat{N}}{N}) right] + o_p(1)$.

And the proof seems to contain some hidden steps. How is that first equation in Deville 1999 derived? It seems to involve the following missing step, but it’s not clear how even this equation would be established.

$$
N^{-alpha}(T(hat{M})-T(M))= N^{-1} int I_{T}(M, z) d(hat{M}-M)(z)+oleft(dleft(frac{hat{M}}{M}, frac{M}{N}right)right)
$$


Get this bounty!!!

#StackBounty: #sampling #asymptotics #measure-theory #survey-sampling #influence-function How is the asymptotic justification of the &q…

Bounty: 50

The survey R package recently adopted the "linearization by influence function" method of estimating covariances between domain estimates. The central paper justifying this method is Deville (1999). I’m trying to understand the main asymptotic claim made in the paper and some confusing aspects of the proof.

The result is summarized concisely by Deville and others in this 2009 Biometrika paper:

In Deville’s approach, a population parameter of interest $Phi$ can be written as a functional $T$ with respect to a finite and discrete measure $M$, namely $Phi=T(M)$. The substitution estimator $hat{Phi}=T(hat{M})$ is the functional $T$ of a random measure $hat{M}$ that is associated with sampling weights $w_{k}, k in U$, and is ‘close’ to $M$. Suppose that $T$ is homogeneous of degree $alpha$, so that $T(r M)=r^{alpha} T(M)$, and $lim _{N rightarrow infty} N^{-alpha} T(M)<infty$. Under broad assumptions, Deville shows that
$$
begin{aligned}
sqrt{n} N^{-alpha}{T(hat{M})-T(M)} &=sqrt{n} N^{-alpha} int I_{T}(M, z) d(hat{M}-M)(z)+o_{p}(1) \
&=sqrt{n} N^{-alpha} sum_{k=1}^{N} u_{k}left(w_{k}-1right)+o_{p}(1)
end{aligned}
$$

The linearized variables $u_{k}$ are the influence functions $I_{T}left(M, z_{k}right)$, where $z_{k}$ is the value of the variable of interest for the $k$ th unit.

This exact equation doesn’t show up in Deville’s 1999 paper as far as I can see. Instead, there is a tantalizingly similar-looking result on page 6.

Result: Under broad assumptions, the substitution estimation of a functional $T(M)$ is linearizable. A linearized variable is $z_{k}=I Tleft(M ; x_{k}right)$ where $I T$ is the influence function of $T$ in $M$.

Proof of the result: Let us provide the space of measurements on $boldsymbol{R}^{q}$ with a metric $d$ accounting for the convergence: $dleft(M_{1}, M_{2}right) rightarrow 0$ if and only if $N^{-1}left(int y d M_{1}-int y d M_{2}right) rightarrow 0$ for any variable of interest $y$. The asymptotic postulates mean that $d(hat{M} / N, M / N)$ tends towards zero. We can visibly ensure that $d(hat{M} / N, M / N)$ is $O_{p}(1 / sqrt{n})$ according to the third postulate. Now, let us assume that $T$ can be derived in accordance with Fréchet, i.e., for any direction of the increase, in the space of "useful" measures provided with the abovementioned metric. Thus we have:
$$
N^{-alpha}(T(hat{M})-T(M))=frac{1}{N} sum_{U} z_{k}left(w_{k}-1right)+oleft(dleft(frac{hat{M}}{M}, frac{M}{N}right)right)
$$

The result is that:
$$
sqrt{n} N^{-alpha}(T(hat{M})-T(M))=frac{sqrt{n}}{N} sum_{U} z_{k}left(w_{k}-1right)+o_{p}(1) .
$$

However, this result in the original 1999 paper is not the same result as the 2009 Biometrika paper. Why does the right-hand side of the Deville 1999 equation use $N^{-1}$, while the right-hand side of the quoted equation in the 2009 paper uses $N^{-alpha}$?

The equation in Deville 1999 doesn’t make sense to me. For example, the mean is a statistic of degree 0 and its influence function is $z_{k}=frac{1}{N}left(y_{k}-bar{Y}right)$, so with the Deville 1999 equation we would end up with the nonsensical result that
$sqrt{n}(hat{bar{Y}} – bar{Y}) = frac{sqrt{n}}{N} left[hat{bar{Y}} – bar{Y}(frac{hat{N}}{N}) right] + o_p(1)$.

And the proof seems to contain some hidden steps. How is that first equation in Deville 1999 derived? It seems to involve the following missing step, but it’s not clear how even this equation would be established.

$$
N^{-alpha}(T(hat{M})-T(M))= N^{-1} int I_{T}(M, z) d(hat{M}-M)(z)+oleft(dleft(frac{hat{M}}{M}, frac{M}{N}right)right)
$$


Get this bounty!!!

#StackBounty: #sampling #asymptotics #measure-theory #survey-sampling #influence-function How is the asymptotic justification of the &q…

Bounty: 50

The survey R package recently adopted the "linearization by influence function" method of estimating covariances between domain estimates. The central paper justifying this method is Deville (1999). I’m trying to understand the main asymptotic claim made in the paper and some confusing aspects of the proof.

The result is summarized concisely by Deville and others in this 2009 Biometrika paper:

In Deville’s approach, a population parameter of interest $Phi$ can be written as a functional $T$ with respect to a finite and discrete measure $M$, namely $Phi=T(M)$. The substitution estimator $hat{Phi}=T(hat{M})$ is the functional $T$ of a random measure $hat{M}$ that is associated with sampling weights $w_{k}, k in U$, and is ‘close’ to $M$. Suppose that $T$ is homogeneous of degree $alpha$, so that $T(r M)=r^{alpha} T(M)$, and $lim _{N rightarrow infty} N^{-alpha} T(M)<infty$. Under broad assumptions, Deville shows that
$$
begin{aligned}
sqrt{n} N^{-alpha}{T(hat{M})-T(M)} &=sqrt{n} N^{-alpha} int I_{T}(M, z) d(hat{M}-M)(z)+o_{p}(1) \
&=sqrt{n} N^{-alpha} sum_{k=1}^{N} u_{k}left(w_{k}-1right)+o_{p}(1)
end{aligned}
$$

The linearized variables $u_{k}$ are the influence functions $I_{T}left(M, z_{k}right)$, where $z_{k}$ is the value of the variable of interest for the $k$ th unit.

This exact equation doesn’t show up in Deville’s 1999 paper as far as I can see. Instead, there is a tantalizingly similar-looking result on page 6.

Result: Under broad assumptions, the substitution estimation of a functional $T(M)$ is linearizable. A linearized variable is $z_{k}=I Tleft(M ; x_{k}right)$ where $I T$ is the influence function of $T$ in $M$.

Proof of the result: Let us provide the space of measurements on $boldsymbol{R}^{q}$ with a metric $d$ accounting for the convergence: $dleft(M_{1}, M_{2}right) rightarrow 0$ if and only if $N^{-1}left(int y d M_{1}-int y d M_{2}right) rightarrow 0$ for any variable of interest $y$. The asymptotic postulates mean that $d(hat{M} / N, M / N)$ tends towards zero. We can visibly ensure that $d(hat{M} / N, M / N)$ is $O_{p}(1 / sqrt{n})$ according to the third postulate. Now, let us assume that $T$ can be derived in accordance with Fréchet, i.e., for any direction of the increase, in the space of "useful" measures provided with the abovementioned metric. Thus we have:
$$
N^{-alpha}(T(hat{M})-T(M))=frac{1}{N} sum_{U} z_{k}left(w_{k}-1right)+oleft(dleft(frac{hat{M}}{M}, frac{M}{N}right)right)
$$

The result is that:
$$
sqrt{n} N^{-alpha}(T(hat{M})-T(M))=frac{sqrt{n}}{N} sum_{U} z_{k}left(w_{k}-1right)+o_{p}(1) .
$$

However, this result in the original 1999 paper is not the same result as the 2009 Biometrika paper. Why does the right-hand side of the Deville 1999 equation use $N^{-1}$, while the right-hand side of the quoted equation in the 2009 paper uses $N^{-alpha}$?

The equation in Deville 1999 doesn’t make sense to me. For example, the mean is a statistic of degree 0 and its influence function is $z_{k}=frac{1}{N}left(y_{k}-bar{Y}right)$, so with the Deville 1999 equation we would end up with the nonsensical result that
$sqrt{n}(hat{bar{Y}} – bar{Y}) = frac{sqrt{n}}{N} left[hat{bar{Y}} – bar{Y}(frac{hat{N}}{N}) right] + o_p(1)$.

And the proof seems to contain some hidden steps. How is that first equation in Deville 1999 derived? It seems to involve the following missing step, but it’s not clear how even this equation would be established.

$$
N^{-alpha}(T(hat{M})-T(M))= N^{-1} int I_{T}(M, z) d(hat{M}-M)(z)+oleft(dleft(frac{hat{M}}{M}, frac{M}{N}right)right)
$$


Get this bounty!!!

#StackBounty: #sampling #asymptotics #measure-theory #survey-sampling #influence-function How is the asymptotic justification of the &q…

Bounty: 50

The survey R package recently adopted the "linearization by influence function" method of estimating covariances between domain estimates. The central paper justifying this method is Deville (1999). I’m trying to understand the main asymptotic claim made in the paper and some confusing aspects of the proof.

The result is summarized concisely by Deville and others in this 2009 Biometrika paper:

In Deville’s approach, a population parameter of interest $Phi$ can be written as a functional $T$ with respect to a finite and discrete measure $M$, namely $Phi=T(M)$. The substitution estimator $hat{Phi}=T(hat{M})$ is the functional $T$ of a random measure $hat{M}$ that is associated with sampling weights $w_{k}, k in U$, and is ‘close’ to $M$. Suppose that $T$ is homogeneous of degree $alpha$, so that $T(r M)=r^{alpha} T(M)$, and $lim _{N rightarrow infty} N^{-alpha} T(M)<infty$. Under broad assumptions, Deville shows that
$$
begin{aligned}
sqrt{n} N^{-alpha}{T(hat{M})-T(M)} &=sqrt{n} N^{-alpha} int I_{T}(M, z) d(hat{M}-M)(z)+o_{p}(1) \
&=sqrt{n} N^{-alpha} sum_{k=1}^{N} u_{k}left(w_{k}-1right)+o_{p}(1)
end{aligned}
$$

The linearized variables $u_{k}$ are the influence functions $I_{T}left(M, z_{k}right)$, where $z_{k}$ is the value of the variable of interest for the $k$ th unit.

This exact equation doesn’t show up in Deville’s 1999 paper as far as I can see. Instead, there is a tantalizingly similar-looking result on page 6.

Result: Under broad assumptions, the substitution estimation of a functional $T(M)$ is linearizable. A linearized variable is $z_{k}=I Tleft(M ; x_{k}right)$ where $I T$ is the influence function of $T$ in $M$.

Proof of the result: Let us provide the space of measurements on $boldsymbol{R}^{q}$ with a metric $d$ accounting for the convergence: $dleft(M_{1}, M_{2}right) rightarrow 0$ if and only if $N^{-1}left(int y d M_{1}-int y d M_{2}right) rightarrow 0$ for any variable of interest $y$. The asymptotic postulates mean that $d(hat{M} / N, M / N)$ tends towards zero. We can visibly ensure that $d(hat{M} / N, M / N)$ is $O_{p}(1 / sqrt{n})$ according to the third postulate. Now, let us assume that $T$ can be derived in accordance with Fréchet, i.e., for any direction of the increase, in the space of "useful" measures provided with the abovementioned metric. Thus we have:
$$
N^{-\alpha}(T(\hat{M})-T(M))=\frac{1}{N} \sum_{U} z_{k}\left(w_{k}-1\right)+o\left(d\left(\frac{\hat{M}}{N}, \frac{M}{N}\right)\right)
$$

The result is that:
$$
\sqrt{n} N^{-\alpha}(T(\hat{M})-T(M))=\frac{\sqrt{n}}{N} \sum_{U} z_{k}\left(w_{k}-1\right)+o_{p}(1) .
$$

However, this result in the original 1999 paper is not the same result as the 2009 Biometrika paper. Why does the right-hand side of the Deville 1999 equation use $N^{-1}$, while the right-hand side of the quoted equation in the 2009 paper uses $N^{-\alpha}$?

The equation in Deville 1999 doesn’t make sense to me. For example, the mean is a statistic of degree 0 and its influence function is $z_{k}=\frac{1}{N}\left(y_{k}-\bar{Y}\right)$, so with the Deville 1999 equation we would end up with the nonsensical result that
$\sqrt{n}(\hat{\bar{Y}} - \bar{Y}) = \frac{\sqrt{n}}{N} \left[\hat{\bar{Y}} - \bar{Y}\left(\frac{\hat{N}}{N}\right) \right] + o_p(1)$.
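
To spell out the substitution behind that expression, here is a sketch under assumed notation that does not appear in either paper: $\hat{Y}=\sum_{U} w_{k} y_{k}$, $\hat{N}=\sum_{U} w_{k}$, and $\sum_{U} y_{k}=N\bar{Y}$. Plugging $z_{k}=\frac{1}{N}(y_{k}-\bar{Y})$ into the right-hand side of the 1999 equation gives

$$
\begin{aligned}
\frac{1}{N} \sum_{U} z_{k}\left(w_{k}-1\right)
&=\frac{1}{N^{2}} \sum_{U}\left(y_{k}-\bar{Y}\right)\left(w_{k}-1\right) \\
&=\frac{1}{N^{2}}\left[\left(\hat{Y}-\bar{Y} \hat{N}\right)-\left(N \bar{Y}-N \bar{Y}\right)\right] \\
&=\frac{1}{N}\left[\frac{\hat{Y}}{N}-\bar{Y}\, \frac{\hat{N}}{N}\right] .
\end{aligned}
$$

Multiplying by $\sqrt{n}$ reproduces the right-hand side above, provided $\hat{\bar{Y}}$ inside the bracket is read as $\hat{Y}/N$ (an assumption about the notation). The bracketed term is itself a difference of two estimated means, so the extra factor $\frac{1}{N}$ makes the right-hand side smaller than the left-hand side by roughly a factor of $N$.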

And the proof seems to contain some hidden steps. How is that first equation in Deville 1999 derived? It seems to involve the following missing step, but it’s not clear how even this equation would be established.

$$
N^{-\alpha}(T(\hat{M})-T(M))= N^{-1} \int I_{T}(M, z) \, d(\hat{M}-M)(z)+o\left(d\left(\frac{\hat{M}}{N}, \frac{M}{N}\right)\right)
$$


