#StackBounty: #hypothesis-testing #variance #heteroscedasticity #breusch-pagan Test of heteroscedasticity for a categorical/ordinal pre…

Bounty: 100

I have a different number of measurements in each of several classes. I used a one-way ANOVA to test whether the mean of the observations in each class differs from the others. Its F statistic is the ratio of the between-class variance (mean square) to the within-class variance (mean square).
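As a quick check of that ratio, the sketch below computes the one-way ANOVA F statistic by hand and compares it with R's `anova()` table. The data (`y`, `grp`) are simulated placeholders, not the question's data.

```r
# Sketch: one-way ANOVA F statistic computed by hand.
# Hypothetical simulated data: 300 observations in 10 classes of unequal size.
set.seed(1)
y   <- rnorm(300)
grp <- factor(sample(1:10, size = 300, replace = TRUE))

n_total <- length(y)
k       <- nlevels(grp)
n_i     <- tapply(y, grp, length)   # class sizes
mean_i  <- tapply(y, grp, mean)     # class means

# Between-class and within-class sums of squares
ss_between <- sum(n_i * (mean_i - mean(y))^2)
ss_within  <- sum((y - ave(y, grp))^2)   # ave() gives each observation's class mean

# Mean squares and their ratio
ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (n_total - k)
f_manual   <- ms_between / ms_within
p_manual   <- pf(f_manual, df1 = k - 1, df2 = n_total - k, lower.tail = FALSE)

# Compare with R's own ANOVA table
tab <- anova(lm(y ~ grp))
c(F_manual = f_manual, F_anova = tab[["F value"]][1])
```

Dividing the between-class sum of squares by the total sum of squares instead gives $\eta^2 = SS_\text{between}/SS_\text{total}$, the proportion of variance explained, which equals the $R^2$ of `lm(y ~ grp)`.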

Now I want to test whether some classes (specifically, those with more observations) have a larger variance than expected by chance. Which statistical test should I use? I can calculate the sample variance of each class and then compute the $R^2$ and p-value for the correlation between sample variance and class size. Or, in R, I could do:

```r
summary(lm(sampleVar ~ classSize))
```

But the variance of the estimator of the variance (the sample variance) depends on the sample size, even for purely random data with the same variance in every class.
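For i.i.d. normal observations with variance $\sigma^2$, $(n-1)s^2/\sigma^2$ follows a $\chi^2_{n-1}$ distribution, so $E[s^2] = \sigma^2$ for every $n$ while $\operatorname{Var}(s^2) = 2\sigma^4/(n-1)$. A small simulation, assuming $N(0, 1)$ data (so the theoretical value is $2/(n-1)$), makes this concrete:

```r
# Sketch: the sample variance is unbiased at every n, but its spread shrinks with n.
# Assumption: i.i.d. N(0, 1) data, so theory gives E[s^2] = 1 and Var(s^2) = 2 / (n - 1).
set.seed(42)
sizes <- c(3, 5, 10, 20, 50)   # class sizes to try
n_rep <- 20000                 # simulated classes per size

sims <- sapply(sizes, function(n) {
  s2 <- replicate(n_rep, var(rnorm(n)))  # sample variance of each simulated class
  c(mean = mean(s2), var = var(s2))
})

result <- data.frame(n = sizes,
                     mean_s2 = sims["mean", ],
                     var_s2 = sims["var", ],
                     var_theory = 2 / (sizes - 1))
print(round(result, 3))
```

The `mean_s2` column should stay close to 1 at every size, while `var_s2` should track `var_theory` and fall as the class size grows.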

For example, I generate some random data:

```r
library(data.table)
dt <- as.data.table(data.frame(obs = rnorm(4000), clabel = as.factor(sample(x = 1:200, size = 4000, replace = TRUE, prob = 5 + (1:200)))))
```

Then I compute the class sizes and the sample variance of each class:

```r
dt[, classSize := length(obs), by = clabel]
dt[, sampleVar := var(obs), by = clabel]
```

and then test whether the variance depends on the class size:

```r
summary(lm(data = unique(dt[, .(sampleVar, classSize), by = clabel]), formula = sampleVar ~ classSize))
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.858047   0.056605  15.159   <2e-16 ***
classSize   0.006035   0.002393   2.521   0.0125 *  
```

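There seems to be a dependence of the variance on the class size, even though every class was drawn from the same $N(0, 1)$ distribution. This is simply because the variance of the estimator depends on the sample size: the sample variance is unbiased at any class size, but its spread is not constant, so the residuals of this regression are heteroscedastic and the ordinary least-squares p-value is not reliable. How do I construct a statistical test to see whether the variances in the different classes actually depend on the class sizes?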
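Because every class in the simulation above has the same variance, one way to gauge how far to trust that single p-value is to rerun the naive test on many fresh null datasets and count how often it rejects at the 5% level. This is a minimal sketch, assuming the question's generating process rewritten in base R; the function name `naive_p_value` is hypothetical. A rejection rate well above 0.05 would indicate that the naive test is miscalibrated, while a rate near 0.05 would suggest the single significant result was chance.

```r
# Sketch: rejection rate of the naive lm(sampleVar ~ classSize) test on null data.
# Assumption: the question's generating process (4000 N(0, 1) draws, 200 classes
# drawn with probabilities proportional to 5 + class index), in base R.
set.seed(7)

naive_p_value <- function() {
  obs    <- rnorm(4000)
  clabel <- sample(1:200, size = 4000, replace = TRUE, prob = 5 + (1:200))
  class_size <- as.vector(tapply(obs, clabel, length))
  sample_var <- as.vector(tapply(obs, clabel, var))  # NA for classes of size 1
  fit <- lm(sample_var ~ class_size)                 # NA rows are dropped
  summary(fit)$coefficients["class_size", "Pr(>|t|)"]
}

p_values <- replicate(500, naive_p_value())
mean(p_values < 0.05)   # compare with the nominal 5% level
```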

If the variable I am regressing against were continuous instead of the ordinal variable classSize, I could have used the Breusch-Pagan test.

For example, I could do:

```r
fit <- lm(data = dt, formula = obs ~ clabel)
```
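For reference, the studentized (Koenker) form of the Breusch-Pagan test regresses the squared OLS residuals on the candidate variance regressors and compares $nR^2$ from that auxiliary regression with a $\chi^2$ distribution whose degrees of freedom equal the number of variance regressors. The sketch below computes it by hand on a hypothetical continuous regressor `x` whose error standard deviation grows with `x`; it is an illustration of the mechanics, not the question's data.

```r
# Sketch: Koenker's studentized Breusch-Pagan statistic, computed by hand.
# Hypothetical data: a continuous regressor x whose error SD grows with x.
set.seed(2024)
n <- 500
x <- runif(n, min = 0, max = 10)
y <- 1 + 2 * x + rnorm(n, sd = 0.5 + 0.3 * x)   # heteroscedastic errors

fit <- lm(y ~ x)                 # mean model
e2  <- residuals(fit)^2          # squared OLS residuals

# Auxiliary regression: do the squared residuals trend with the regressor?
aux     <- lm(e2 ~ x)
bp_stat <- n * summary(aux)$r.squared         # LM statistic = n * R^2
bp_df   <- length(coef(aux)) - 1              # number of variance regressors
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

c(statistic = bp_stat, df = bp_df, p.value = bp_p)
```

If the lmtest package is available, `lmtest::bptest(fit)` (studentized by default) should report the same statistic. One caution before reusing this recipe on residuals from class means: for a class of size $n_j$, such a residual has variance $\sigma^2(1 - 1/n_j)$ even when all classes share the same $\sigma^2$, so raw squared residuals trend with class size on their own; dividing them by $1 - 1/n_j$ (one minus the leverage) and dropping classes of size 1 removes that built-in trend.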

Get this bounty!!!