# #StackBounty: #hypothesis-testing #anova #chi-squared #f-distribution #non-central Derive the distribution of the ANOVA F-statistic und…

### Bounty: 50

Say we have \$k\$ samples of data, where sample \$i\$ has size \$n_i\$ and is written \$x_{i1}, \dots, x_{in_i}\$. Let the total sample size be \$N = \sum_i n_i\$.

The ANOVA model is \$X_{ij} \sim N(\mu_i, \sigma^2)\$ independently. The null hypothesis is that the \$\mu_i\$ are all equal. The alternative hypothesis is that they are not all equal.

The ANOVA F-statistic is

\$\$F = \frac{S_2/(k-1)}{S_1/(N - k)},\$\$

where

\$\$S_1 = \sum_{i, j}(x_{ij} - \bar{x}_{i\bullet})^2\$\$

is the within samples sum of squares and

\$\$S_2 = \sum_i n_i(\bar{x}_{i\bullet} - \bar{x}_{\bullet\bullet})^2\$\$

is the between samples sum of squares.

We know that \$S_1\$ and \$S_2\$ are independent and \$S_0 = S_1 + S_2\$ \$(*)\$, where

\$\$S_0 = \sum_{i, j}(x_{ij} - \bar{x}_{\bullet\bullet})^2\$\$

is the total sum of squares.
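
For concreteness, here is a minimal Python sketch of these three sums of squares and the resulting \$F\$, using standard library features only. The toy data and the helper name `anova_sums` are made up for illustration and are not part of the question.

```python
# Hypothetical toy data: k = 3 samples of unequal size (illustration only).
samples = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0], [5.0, 5.0]]


def anova_sums(samples):
    """Return (S1, S2, S0, F) for a list of samples (lists of floats)."""
    k = len(samples)
    N = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / N
    group_means = [sum(s) / len(s) for s in samples]
    # Within-samples sum of squares S1.
    S1 = sum((x - m) ** 2 for s, m in zip(samples, group_means) for x in s)
    # Between-samples sum of squares S2.
    S2 = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, group_means))
    # Total sum of squares S0.
    S0 = sum((x - grand_mean) ** 2 for s in samples for x in s)
    F = (S2 / (k - 1)) / (S1 / (N - k))
    return S1, S2, S0, F


S1, S2, S0, F = anova_sums(samples)
print(S1, S2, S0, F)
```

On this toy data the decomposition \$(*)\$ can be checked by hand: \$S_1 = 22\$, \$S_2 = 18\$ and \$S_0 = 40\$.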

It is straightforward to show that under both the null and the alternative hypotheses, \$S_1 \sim \sigma^2\chi^2_{N - k}\$.

Also, under the null hypothesis, the \$X_{ij}\$ are identically distributed, and so \$S_0 \sim \sigma^2\chi^2_{N-1}\$. It follows from \$(*)\$ that under the null hypothesis \$S_2 \sim \sigma^2\chi^2_{k-1}\$ and thus \$F \sim F_{k-1, N-k}\$.
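
To spell out the "it follows from \$(*)\$" step, here is a sketch using moment generating functions. It assumes the independence of \$S_1\$ and \$S_2\$ stated above and \$t < 1/(2\sigma^2)\$:

\$\$M_{S_0}(t) = M_{S_1}(t)\,M_{S_2}(t) \quad\Longrightarrow\quad M_{S_2}(t) = \frac{(1 - 2\sigma^2 t)^{-(N-1)/2}}{(1 - 2\sigma^2 t)^{-(N-k)/2}} = (1 - 2\sigma^2 t)^{-(k-1)/2},\$\$

which is the moment generating function of a scaled central chi-squared variable with \$k-1\$ degrees of freedom and scale \$\sigma^2\$.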

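It is claimed that under the alternative hypothesis, \$F\$ follows a non-central \$F\$-distribution \$F_{k-1, N-k}(\lambda)\$, where \$\lambda = \sum_i n_i(\mu_i - \bar\mu)^2/\sigma^2\$ and \$\bar\mu = \sum_i n_i\mu_i/N\$. Equivalently, the claim is that \$S_2\$ follows a (scaled) non-central \$\chi^2\$ distribution, \$\sigma^2\chi^2_{k-1}(\lambda)\$. (The factor \$1/\sigma^2\$ in \$\lambda\$ is needed because the non-centrality parameter refers to the standardised variable \$S_2/\sigma^2\$.)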
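
As a sanity check on this claim (not a proof), here is a small Monte Carlo sketch in standard-library Python. It takes \$\lambda\$ normalised by \$\sigma^2\$ and compares the simulated mean and variance of \$S_2/\sigma^2\$ with those of a non-central \$\chi^2_{k-1}(\lambda)\$, namely \$(k-1) + \lambda\$ and \$2\big((k-1) + 2\lambda\big)\$. The group means, sizes, \$\sigma\$, seed and the helper name `simulate_S2` are arbitrary choices made for illustration.

```python
import random


def simulate_S2(mus, ns, sigma, rng):
    """Draw one ANOVA data set and return the between-samples sum of squares."""
    samples = [[rng.gauss(mu, sigma) for _ in range(n)] for mu, n in zip(mus, ns)]
    N = sum(ns)
    grand_mean = sum(sum(s) for s in samples) / N
    return sum(n * (sum(s) / n - grand_mean) ** 2 for s, n in zip(samples, ns))


# Hypothetical parameters, chosen only for illustration.
mus, ns, sigma = [0.0, 1.0, 3.0], [4, 6, 5], 2.0
k, N = len(ns), sum(ns)
mu_bar = sum(n * m for n, m in zip(ns, mus)) / N
# Non-centrality parameter, normalised by sigma^2.
lam = sum(n * (m - mu_bar) ** 2 for n, m in zip(ns, mus)) / sigma ** 2

rng = random.Random(0)  # fixed seed so the run is repeatable
reps = 20000
draws = [simulate_S2(mus, ns, sigma, rng) / sigma ** 2 for _ in range(reps)]
mean = sum(draws) / reps
var = sum((d - mean) ** 2 for d in draws) / (reps - 1)

# Theoretical moments of a non-central chi-squared with k - 1 degrees of freedom.
expected_mean = (k - 1) + lam
expected_var = 2 * ((k - 1) + 2 * lam)
print("mean:", mean, "expected:", expected_mean)
print("variance:", var, "expected:", expected_var)
```

Matching the first two moments is only consistent with the claimed distribution and does not establish it. A fuller check would compare the whole empirical distribution against the non-central \$\chi^2\$.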

My tentative approach to proving this mirrors the derivation under the null hypothesis: since \$S_1\$ and \$S_2\$ are independent and \$S_1 \sim \sigma^2\chi^2_{N-k}\$, it is sufficient to prove that \$S_0\$ follows a (scaled) non-central \$\chi^2\$ distribution, \$\sigma^2\chi^2_{N-1}(\lambda)\$.
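
To see why this would be sufficient, here is a sketch with moment generating functions. It assumes the convention in which \$\chi^2_\nu(\lambda)\$ has moment generating function \$(1-2t)^{-\nu/2}\exp\{\lambda t/(1-2t)\}\$, with \$\lambda\$ normalised by \$\sigma^2\$, and it takes \$t < 1/(2\sigma^2)\$. If \$S_0 \sim \sigma^2\chi^2_{N-1}(\lambda)\$, then by independence of \$S_1\$ and \$S_2\$:

\$\$M_{S_2}(t) = \frac{M_{S_0}(t)}{M_{S_1}(t)} = \frac{(1 - 2\sigma^2 t)^{-(N-1)/2}\exp\left\{\dfrac{\lambda\sigma^2 t}{1 - 2\sigma^2 t}\right\}}{(1 - 2\sigma^2 t)^{-(N-k)/2}} = (1 - 2\sigma^2 t)^{-(k-1)/2}\exp\left\{\frac{\lambda\sigma^2 t}{1 - 2\sigma^2 t}\right\},\$\$

which is the moment generating function of \$\sigma^2\chi^2_{k-1}(\lambda)\$.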

I’ve shown that this would follow from a slightly more general statement, namely that if \$Y_i \sim N(\mu_i, \sigma^2)\$ independently for \$i = 1, \dots, N\$, then \$S_0 = \sum_i(Y_i - \bar{Y})^2 \sim \sigma^2\chi^2_{N-1}(\lambda)\$, where \$\lambda = \sum_i(\mu_i - \bar\mu)^2/\sigma^2\$ and \$\bar\mu = \sum_i\mu_i/N\$.

Is this the best approach? And what is the simplest proof of the final statement above?

Thanks.
