*Bounty: 50*

Say we have $k$ samples of data, where sample $i$ is of size $n_i$ and we write it as $x_{i1}, \dots, x_{in_i}$. Let the total sample size be $N = \sum_i n_i$.

The ANOVA model is $X_{ij} \sim N(\mu_i, \sigma^2)$ independently. The null hypothesis is that the $\mu_i$ are all equal. The alternative hypothesis is that the null hypothesis is not true.

The ANOVA F-statistic is

$$F = \frac{S_2/(k-1)}{S_1/(N - k)},$$

where

$$S_1 = \sum_{i, j}(x_{ij} - \bar{x}_{i\bullet})^2$$

is the within samples sum of squares and

$$S_2 = \sum_i n_i(\bar{x}_{i\bullet} - \bar{x}_{\bullet\bullet})^2$$

is the between samples sum of squares.

We know that $S_1$ and $S_2$ are independent and $S_0 = S_1 + S_2$ $(*)$, where

$$S_0 = \sum_{i, j}(x_{ij} - \bar{x}_{\bullet\bullet})^2$$

is the total sum of squares.
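To make the definitions concrete, here is a minimal Python sketch (standard library only) that computes $S_1$, $S_2$, $S_0$ and $F$ and checks the decomposition $(*)$ numerically. The function name `anova_sums` and the toy data are illustrative choices of mine, not part of any library.

```python
def anova_sums(samples):
    """Return (S1, S2, S0, F) for a list of samples (each a list of floats)."""
    k = len(samples)
    N = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / N
    group_means = [sum(s) / len(s) for s in samples]
    # Within-samples sum of squares S1
    S1 = sum((x - m) ** 2 for s, m in zip(samples, group_means) for x in s)
    # Between-samples sum of squares S2
    S2 = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, group_means))
    # Total sum of squares S0
    S0 = sum((x - grand_mean) ** 2 for s in samples for x in s)
    F = (S2 / (k - 1)) / (S1 / (N - k))
    return S1, S2, S0, F


# Toy data: k = 3 samples of sizes 3, 2 and 4 (made up for illustration)
samples = [[1.0, 2.0, 3.0], [2.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
S1, S2, S0, F = anova_sums(samples)
print(S1, S2, S0, F)
# The decomposition S0 = S1 + S2 should hold up to rounding error
print(abs(S0 - (S1 + S2)) < 1e-9)
```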

It is straightforward to show that under both the null and the alternative hypotheses, $S_1 \sim \sigma^2\chi^2_{N - k}$.

Also, under the null hypothesis, the $X_{ij}$ are identically distributed, and so $S_0 \sim \sigma^2\chi^2_{N-1}$. It follows from $(*)$ that under the null hypothesis $S_2 \sim \sigma^2\chi^2_{k-1}$ and thus $F \sim F_{k-1, N-k}$.
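As a rough sanity check of the null distribution (not a proof), one can simulate $F$ and compare its average with the mean of an $F_{k-1, N-k}$ distribution, which is $(N-k)/(N-k-2)$ when $N-k > 2$. The sketch below uses only the Python standard library. The group sizes, $\mu$, $\sigma$, the seed and the helper name `f_statistic` are arbitrary assumptions of mine.

```python
import random


def f_statistic(samples):
    """ANOVA F-statistic for a list of samples (each a list of floats)."""
    k = len(samples)
    N = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / N
    means = [sum(s) / len(s) for s in samples]
    S1 = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    S2 = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    return (S2 / (k - 1)) / (S1 / (N - k))


rng = random.Random(0)   # fixed seed so the run is reproducible
sizes = [4, 6, 5]        # hypothetical n_i
mu, sigma = 1.0, 2.0     # common mean under the null, and sigma
k, N = len(sizes), sum(sizes)
reps = 20000

total = 0.0
for _ in range(reps):
    samples = [[rng.gauss(mu, sigma) for _ in range(n)] for n in sizes]
    total += f_statistic(samples)
mean_F = total / reps

# Mean of the central F distribution with (k - 1, N - k) degrees of freedom
expected_mean = (N - k) / (N - k - 2)
print(mean_F, expected_mean)
```

The two printed numbers should be close. Agreement of the mean is only a consistency check, of course.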

It is claimed that under the alternative hypothesis, $F$ follows a non-central $F$-distribution $F_{k-1, N-k}(\lambda)$, where $\lambda = \sum_i n_i(\mu_i - \bar\mu)^2/\sigma^2$ and $\bar\mu = \sum_i n_i\mu_i/N$. Equivalently, the claim is that $S_2$ follows a (scaled) non-central $\chi^2$ distribution, $\sigma^2\chi^2_{k-1}(\lambda)$.
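A quick simulation is at least consistent with this claim. A non-central $\chi^2_{\nu}(\lambda)$ variable has mean $\nu + \lambda$, so $S_2/\sigma^2$ should average about $k - 1 + \lambda$. The sketch below assumes the noncentrality parameter is taken on the standardized scale, $\lambda = \sum_i n_i(\mu_i - \bar\mu)^2/\sigma^2$. The group sizes, means, $\sigma$, seed and the helper name `between_ss` are made-up values for illustration.

```python
import random

rng = random.Random(1)   # fixed seed so the run is reproducible
sizes = [4, 6, 5]        # hypothetical n_i
mus = [0.0, 1.0, 2.0]    # hypothetical group means (alternative hypothesis)
sigma = 2.0
k, N = len(sizes), sum(sizes)

# Weighted grand mean and noncentrality parameter (standardized by sigma^2)
mu_bar = sum(n * m for n, m in zip(sizes, mus)) / N
lam = sum(n * (m - mu_bar) ** 2 for n, m in zip(sizes, mus)) / sigma ** 2


def between_ss(samples):
    """Between-samples sum of squares S2."""
    total_n = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / total_n
    return sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)


reps = 20000
total = 0.0
for _ in range(reps):
    samples = [[rng.gauss(m, sigma) for _ in range(n)]
               for n, m in zip(sizes, mus)]
    total += between_ss(samples) / sigma ** 2
mean_S2 = total / reps

# Mean of a non-central chi-square with k - 1 d.f. and noncentrality lam
expected_mean = (k - 1) + lam
print(mean_S2, expected_mean)
```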

My tentative approach to proving this is similar to the derivation under the null hypothesis: it is sufficient to prove that $S_0$ follows a (scaled) non-central $\chi^2$ distribution, $\sigma^2\chi^2_{N-1}(\lambda)$.

I’ve shown that this would follow from a slightly more general statement, namely that if $Y_i \sim N(\mu_i, \sigma^2)$ independently (sample size $N$), then $S_0 = \sum_i(Y_i - \bar{Y})^2 \sim \sigma^2\chi^2_{N-1}(\lambda)$, where $\lambda = \sum_i(\mu_i - \bar\mu)^2/\sigma^2$ and $\bar\mu = \sum_i\mu_i/N$.
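For what it's worth, the general statement also passes a simple moment check. A $\chi^2_{\nu}(\lambda)$ variable has mean $\nu + \lambda$ and variance $2(\nu + 2\lambda)$, so $S_0/\sigma^2$ should match these with $\nu = N - 1$. The sketch below again assumes $\lambda = \sum_i(\mu_i - \bar\mu)^2/\sigma^2$. The means, $\sigma$ and seed are arbitrary choices of mine.

```python
import random

rng = random.Random(2)             # fixed seed so the run is reproducible
mus = [0.0, 1.0, 2.0, 3.0, 4.0]    # hypothetical means mu_i
sigma = 1.5
N = len(mus)

mu_bar = sum(mus) / N
lam = sum((m - mu_bar) ** 2 for m in mus) / sigma ** 2

reps = 20000
draws = []
for _ in range(reps):
    y = [rng.gauss(m, sigma) for m in mus]
    y_bar = sum(y) / N
    # Standardized total sum of squares S0 / sigma^2
    draws.append(sum((v - y_bar) ** 2 for v in y) / sigma ** 2)

sim_mean = sum(draws) / reps
sim_var = sum((d - sim_mean) ** 2 for d in draws) / (reps - 1)

# Moments of a non-central chi-square with N - 1 d.f. and noncentrality lam
theory_mean = (N - 1) + lam
theory_var = 2 * ((N - 1) + 2 * lam)
print(sim_mean, theory_mean)
print(sim_var, theory_var)
```

Matching the first two moments does not establish the distribution, which is why I am still looking for a proof.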

Is this the best approach? And what is the simplest proof of the final statement above?

Thanks.