#StackBounty: #r #mathematical-statistics #variance #sampling #mean Right way to compute mean and variance

Bounty: 50

1. Suppose $a_{\ell m}$ follows a normal distribution with zero mean and variance $C_\ell=\langle a_{\ell m}^2\rangle=\text{Var}(a_{\ell m})$, and consider the random variable $Z$ defined by:

$$\begin{aligned}
Z = \sum_{\ell=\ell_{\min}}^{\ell_{\max}} \sum_{m=-\ell}^{\ell} a_{\ell m}^{2}
\end{aligned}$$

The goal is to compute $\langle Z\rangle$.

Consider the random variable $Y=\sum_{m=-\ell}^{\ell} C_\ell \left(\dfrac{a_{\ell m}}{\sqrt{C_\ell}}\right)^{2}$. Each term $C_\ell\,(a_{\ell m}/\sqrt{C_\ell})^2$ of $Y$ follows a $\chi^2(1)$ distribution scaled by $C_\ell$.
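A quick Monte Carlo check of the standardization step may help here. It assumes a hypothetical value $C_\ell = 0.5$ (not from the question): if $a_{\ell m}\sim\mathcal{N}(0, C_\ell)$, then $(a_{\ell m}/\sqrt{C_\ell})^2$ should show the $\chi^2(1)$ mean 1 and variance 2, while $a_{\ell m}^2$ itself should have mean $C_\ell$. Note that `rnorm` takes the standard deviation, so the variance $C_\ell$ enters as `sd = sqrt(C_l)`.

```r
# Hypothetical variance C_l for one multipole (illustrative value only)
C_l <- 0.5
n <- 1e5

set.seed(42)
# a_lm ~ N(0, C_l): rnorm takes the standard deviation, not the variance
a_lm <- rnorm(n, mean = 0, sd = sqrt(C_l))

# Standardized square: should behave like ChiSq(1) (mean 1, variance 2)
u <- (a_lm / sqrt(C_l))^2
mean_u <- mean(u)
var_u <- var(u)

# Unstandardized square: its mean should be C_l itself
mean_a2 <- mean(a_lm^2)

print(c(mean_u = mean_u, var_u = var_u, mean_a2 = mean_a2))
```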

2. From this, can I write that the mean of $Y$ is the following?

$$\langle Y\rangle =\left\langle\sum_{m=-\ell}^{\ell} a_{\ell m}^{2}\right\rangle = (2\ell+1)\,C_\ell$$

If so, we would have

$$\langle Z\rangle = \sum_{\ell=\ell_{\min}}^{\ell_{\max}} C_\ell\,(2\ell+1).$$

I have serious doubts, though, since $a_{\ell m}$ does not follow a standard normal distribution $\mathcal{N}(0,1)$.
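For reference, the mean of $Y$ only requires linearity of expectation together with $\langle a_{\ell m}\rangle = 0$ and $C_\ell=\text{Var}(a_{\ell m})$; standardizing to $\mathcal{N}(0,1)$ is not needed for this step:

$$\begin{aligned}
\langle a_{\ell m}^2\rangle &= \text{Var}(a_{\ell m}) + \langle a_{\ell m}\rangle^2 = C_\ell,\\
\langle Y\rangle &= \sum_{m=-\ell}^{\ell}\langle a_{\ell m}^2\rangle = (2\ell+1)\,C_\ell,\\
\langle Z\rangle &= \sum_{\ell=\ell_{\min}}^{\ell_{\max}}(2\ell+1)\,C_\ell.
\end{aligned}$$

Independence of the $a_{\ell m}$ is not used here; it only matters for the variance.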

Shouldn't it rather be:

$$\begin{align}
Z&\equiv \sum_{\ell=\ell_{\min}}^{\ell_{\max}} \sum_{m=-\ell}^\ell a_{\ell,m}^2 \\[6pt]
&= \sum_{\ell=\ell_{\min}}^{\ell_{\max}} \sum_{m=-\ell}^\ell C_\ell \cdot \left( \frac{a_{\ell,m}}{\sqrt{C_\ell}} \right)^2 \\[6pt]
&\sim \sum_{\ell=\ell_{\min}}^{\ell_{\max}} \sum_{m=-\ell}^\ell C_\ell \cdot \text{ChiSq}(1) \\[6pt]
&= \sum_{\ell=\ell_{\min}}^{\ell_{\max}} C_\ell \sum_{m=-\ell}^\ell \text{ChiSq}(1) \\[6pt]
&= \sum_{\ell=\ell_{\min}}^{\ell_{\max}} C_\ell \cdot \text{ChiSq}(2 \ell + 1).
\end{align}$$
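The last step uses the fact that a sum of $2\ell+1$ independent $\chi^2(1)$ variables is $\chi^2(2\ell+1)$, which assumes the $a_{\ell m}$ are mutually independent. A small simulation sketch, with a hypothetical $\ell = 3$, compares summed $\chi^2(1)$ draws against direct `rchisq(df = 2*ell + 1)` draws through their first two moments (mean $2\ell+1$, variance $2(2\ell+1)$):

```r
ell <- 3            # hypothetical multipole
k <- 2 * ell + 1    # number of m values, i.e. degrees of freedom
n <- 1e5

set.seed(1)
# Each row holds k independent ChiSq(1) draws; rowSums gives their sum
sum_of_chisq1 <- rowSums(matrix(rchisq(n * k, df = 1), nrow = n, ncol = k))
# Direct draws from ChiSq(k) for comparison
direct <- rchisq(n, df = k)

print(c(mean_sum = mean(sum_of_chisq1), mean_direct = mean(direct),
        var_sum = var(sum_of_chisq1), var_direct = var(direct)))
```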

3. Now I want to calculate the mean $\langle Z\rangle$ of $Z$.

Do you agree that this amounts to computing the mean of a weighted sum of $\chi^2$ variables?

So the computation is not trivial, is it? Maybe I could start from the analytical expression

$$\langle Z\rangle=\sum_{\ell=\ell_{\min}}^{\ell_{\max}} C_\ell\,(2\ell + 1)\quad(1)$$

and evaluate it directly numerically:

$$\langle Z\rangle=\sum_{i=1}^{N} C_{\ell_i}\,(2\ell_i + 1)\quad(2)$$

4. What do you think about this direct computation? Is it correct?

I am confused about the relation between $(1)$ and $(2)$, since each $C_\ell$ corresponds to one $\ell$ (numerically, each value $C_{\ell_i}$ is associated with one value $\ell_i$).
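Expressions $(1)$ and $(2)$ can be read as the same finite sum under two indexings: with $N=\ell_{\max}-\ell_{\min}+1$ and $\ell_i=\ell_{\min}+i-1$, the entry $C_{\ell_i}$ is simply $C_\ell$ evaluated at $\ell=\ell_i$. A small R sketch with hypothetical inputs (`ell_min = 2`, `ell_max = 37`, so $N = 36$, and made-up `Cl` values that are not from the question) compares a loop over $\ell$ with the vectorized sum over $i$:

```r
# Hypothetical multipole range and C_l values, for illustration only
ell_min <- 2
ell_max <- 37
ell <- ell_min:ell_max            # ell_i, i = 1..N
N <- length(ell)                  # N = ell_max - ell_min + 1
Cl <- 1 / (ell * (ell + 1))       # made-up C_{ell_i}, one per ell_i

# Expression (1): loop over ell itself, mapping ell to index i = ell - ell_min + 1
mean_loop <- 0
for (l in ell_min:ell_max) {
  mean_loop <- mean_loop + Cl[l - ell_min + 1] * (2 * l + 1)
}

# Expression (2): vectorized sum over the index i
mean_vec <- sum(Cl * (2 * ell + 1))

print(c(loop = mean_loop, vectorized = mean_vec))
```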

5. If the direct computation $\langle Z\rangle=\sum_{i=1}^{N} C_{\ell_i}(2\ell_i+1)$ is not correct, then I have to treat $Z$ as a weighted sum of different chi-squared distributions.

I tried the following R script, where `nRed` is the number of bins considered (5), `nRow` is the number of values of $\ell$ (from $\ell_{\min}$ to $\ell_{\max}$), and `Cl_sp[,i]` is the vector of `nRow` values of $C_\ell$ for bin $i$.

    # Number of bins
    nRed <- 5

    # Number of rows (multipoles l from l_min to l_max)
    nRow <- 36

    # Monte Carlo sample size
    nSample_var <- 1000

    # Degrees of freedom 2l+1 for each of the nRow multipoles
    # (array_2D[,1] holds the l values)
    L <- 2*(array_2D[,1])+1

    # Draws of Z = sum_l C_l * ChiSq(2l+1), one column per bin
    y3_1 <- array(0, dim=c(nSample_var, nRed))
    for (i in 1:nRed) {
      for (j in 1:nRow) {
        y3_1[,i] <- y3_1[,i] + Cl_sp[j,i] * rchisq(nSample_var, df=L[j])
      }
    }

    # Print the Monte Carlo mean of Z for each bin
    for (i in 1:nRed) {
      print(paste0('red=', i, ' mean_exp = ', mean(y3_1[,i])))
    }
6. Is this the right way to compute the mean of $Z$ if I can't compute it analytically (see expression $(2)$ above)?
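A self-contained sketch of the same Monte Carlo approach, with synthetic inputs standing in for `array_2D` and `Cl_sp` (the multipole range `2:37` and the `Cl_sp` values below are hypothetical), compares the simulated mean of $Z$ in each bin with the analytic value $\sum_\ell C_\ell(2\ell+1)$:

```r
set.seed(123)

nRed <- 5            # number of bins
ell <- 2:37          # hypothetical multipoles l_min..l_max
nRow <- length(ell)  # 36 rows
nSample_var <- 1000  # Monte Carlo sample size

# Hypothetical C_l table: one column per bin, one row per l
Cl_sp <- outer(1 / (ell * (ell + 1)), seq_len(nRed))

# Degrees of freedom 2l+1 for each multipole
L <- 2 * ell + 1

# Monte Carlo draws of Z = sum_l C_l * ChiSq(2l+1), one column per bin
y3_1 <- matrix(0, nrow = nSample_var, ncol = nRed)
for (i in 1:nRed) {
  for (j in 1:nRow) {
    y3_1[, i] <- y3_1[, i] + Cl_sp[j, i] * rchisq(nSample_var, df = L[j])
  }
}

# Analytic mean per bin: sum_l C_l (2l+1); L is recycled down each column
mean_analytic <- colSums(Cl_sp * L)
mean_mc <- colMeans(y3_1)

print(rbind(analytic = mean_analytic, monte_carlo = mean_mc))
```

The Monte Carlo means should agree with the analytic ones up to sampling noise, so the simulation mainly serves as a check on expression $(2)$ rather than a replacement for it.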

I would also like to compute the variance of $Z$. Would simply adding the following to my R script be enough?

    # Print the Monte Carlo variance of Z for each bin
    for (i in 1:nRed) {
      print(paste0('red=', i, ' var_exp = ', var(y3_1[,i])))
    }

What do you think?
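For comparison with `var(y3_1[,i])`, an analytic variance follows from $\text{Var}\big(\chi^2(k)\big) = 2k$, assuming all $a_{\ell m}$ are mutually independent (independence is what lets the variances of the terms add):

$$\begin{aligned}
\text{Var}\big(C_\ell\,\chi^2(2\ell+1)\big) &= 2\,(2\ell+1)\,C_\ell^2,\\
\text{Var}(Z) &= 2\sum_{\ell=\ell_{\min}}^{\ell_{\max}} (2\ell+1)\,C_\ell^2.
\end{aligned}$$

In the script's notation this is `2 * sum(L * Cl_sp[,i]^2)` for bin `i`; the sample variance from `nSample_var` draws should match it only up to Monte Carlo noise.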
