*Bounty: 100*

I have a different number of measurements from each of several classes. I used one-way ANOVA to test whether the mean of the observations differs between classes. This uses the ratio of the between-class variance to the within-class variance.

Now I want to test whether some classes (basically those with more observations) have a larger variance than expected by chance. Which statistical test should I use? I can calculate the sample variance for each class, and then find the $R^2$ and p-value for the regression of the sample variance on class size. In R, I could do

```
summary(lm(sampleVar ~ classSize))
```

But the variance of the estimator of the variance (the sample variance) depends on the sample size, even for random data.
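To make this concrete: for i.i.d. normal data with variance $\sigma^2$, the sample variance $s^2$ has mean $\sigma^2$ at every sample size $n$, but its own variance is $2\sigma^4/(n-1)$. The following is only a minimal sketch that checks this by simulation; the sample sizes and the number of replicates are arbitrary choices made for illustration.

```r
# Sketch: mean and spread of the sample variance for i.i.d. N(0, 1) data.
set.seed(1)
sizes <- c(5, 10, 20, 40)   # illustrative sample sizes (assumption)
nrep  <- 20000              # Monte Carlo replicates per sample size (assumption)

sim <- sapply(sizes, function(n) {
  s2 <- replicate(nrep, var(rnorm(n)))    # nrep sample variances at size n
  c(mean = mean(s2), variance = var(s2))  # location and spread of the estimator
})
colnames(sim) <- sizes

# Theoretical Var(s^2) = 2 * sigma^4 / (n - 1), here with sigma = 1
theory <- 2 / (sizes - 1)
result <- rbind(sim, theory = theory)
print(round(result, 3))
```

The `mean` row should stay close to 1 for every size, while the `variance` row should track the `theory` row and shrink as the sample size grows.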

For example, I generate some random data:

```
library(data.table)  # provides data.table() and the := operator used below
dt <- data.table(obs = rnorm(4000), clabel = as.factor(sample(1:200, size = 4000, replace = TRUE, prob = 5 + 1:200)))
```

I compute the sample variance and the size of each class:

```
dt[, classSize := .N, by = clabel]         # number of observations per class
dt[, sampleVar := var(obs), by = clabel]   # sample variance per class
```

and then test to see if variance depends on the class size

```
summary(lm(data=unique(dt[,.(sampleVar, classSize),by=clabel]),formula = sampleVar ~ classSize))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.858047   0.056605  15.159   <2e-16 ***
classSize   0.006035   0.002393   2.521   0.0125 *
```

There seems to be a dependence of the variance on the class size, but this is simply because the variance of the estimator depends on the sample size. How do I construct a statistical test of whether the variances in the different classes actually depend on the class sizes?
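One way to check how much the naive regression can be trusted is to repeat the null simulation many times and record how often it reports p < 0.05; a well-calibrated test would do so in roughly 5% of runs. The following is only a sketch of that check, written in base R so it runs on its own. The helper name `one_pvalue` and the number of repetitions `nsim` are hypothetical choices, not part of the analysis above.

```r
# Sketch: null calibration of the naive regression sampleVar ~ classSize.
set.seed(42)
nsim <- 500   # number of simulated null data sets (assumption)

# Hypothetical helper: simulate one data set in which every class has the
# same variance, then return the p-value for the classSize slope.
one_pvalue <- function(n_obs = 4000, n_class = 200) {
  obs    <- rnorm(n_obs)
  clabel <- sample(seq_len(n_class), size = n_obs, replace = TRUE,
                   prob = 5 + seq_len(n_class))
  classSize <- as.numeric(tapply(obs, clabel, length))
  sampleVar <- as.numeric(tapply(obs, clabel, var))  # NA for single-observation classes
  fit <- lm(sampleVar ~ classSize)                   # lm() drops rows with NA
  summary(fit)$coefficients["classSize", "Pr(>|t|)"]
}

pvals <- replicate(nsim, one_pvalue())
rejection_rate <- mean(pvals < 0.05)   # compare against the nominal 0.05
print(rejection_rate)
```

A rejection rate well away from 0.05 would indicate that the p-value from `lm` cannot be taken at face value here.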

If the variable I was regressing against were continuous instead of the ordinal variable classSize, then I could have used the Breusch-Pagan test.

For example, I could do

```
fit <- lm(data=dt, formula= obs ~ clabel)
```