#StackBounty: #hypothesis-testing #variance #heteroscedasticity #breusch-pagan Test of heteroscedasticity for a categorical/ordinal pre…

Bounty: 100

I have different numbers of measurements from various classes. I used a one-way ANOVA to test whether the class means differ from one another; this uses the ratio of the between-class variance to the total variance.

Now, I want to test whether some classes (basically those with more observations) have a larger variance than expected by chance. What statistical test should I use? I could calculate the sample variance for each class, and then find the $R^2$ and p-value for the regression of the sample variance on class size. Or in R, I could do

summary(lm(sampleVar ~ classSize))

But the variance of the estimator of the variance (the sample variance) depends on the sample size, even for random data.
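Concretely, for normally distributed data $Var(s^2) = 2\sigma^4/(n-1)$, so small classes produce much noisier variance estimates. A quick illustrative check (a Python sketch, not part of the original analysis):

```python
import numpy as np

def var_of_sample_variance(n, sigma=1.0, reps=20000, seed=0):
    """Empirical variance of the sample variance across many
    size-n normal samples."""
    rng = np.random.default_rng(seed)
    s2 = rng.normal(0.0, sigma, size=(reps, n)).var(axis=1, ddof=1)
    return s2.var()

for n in (5, 20, 100):
    theory = 2 * 1.0**4 / (n - 1)  # Var(s^2) = 2*sigma^4/(n-1) for normal data
    print(n, var_of_sample_variance(n), theory)
```

The empirical values track the $2\sigma^4/(n-1)$ formula closely, shrinking as $n$ grows.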

For example, I generate some random data:

library(data.table)
dt <- data.table(obs = rnorm(4000),
                 clabel = as.factor(sample(1:200, size = 4000, replace = TRUE, prob = 5 + 1:200)))

I compute the sample variance and class sizes

dt[, classSize := .N, by = clabel]; dt[, sampleVar := var(obs), by = clabel]

and then test to see if variance depends on the class size

summary(lm(sampleVar ~ classSize, data = unique(dt[, .(clabel, sampleVar, classSize)])))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.858047   0.056605  15.159   <2e-16 ***
classSize   0.006035   0.002393   2.521   0.0125 *  

There seems to be a dependence of the variance on the class size, but this is simply because the variance of the estimator depends on the sample size. How do I construct a statistical test for whether the variances in the different classes actually depend on the class sizes?
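One possible approach (a sketch, not an established recipe): a permutation test that keeps the class sizes fixed and shuffles observations across classes. Under the null, every class is then a random subsample, so the observed slope of sample variance on class size can be compared against the slopes produced by the permutations, which automatically incorporate the estimator-noise effect. An illustrative Python sketch (function names are made up here):

```python
import numpy as np

def var_size_slope(obs, labels, classes, sizes):
    # OLS slope of per-class sample variance on class size
    # (every class must have at least 2 observations for ddof=1)
    v = np.array([obs[labels == c].var(ddof=1) for c in classes])
    x = sizes - sizes.mean()
    return x.dot(v) / x.dot(x)

def perm_test_var_vs_size(obs, labels, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the variance-vs-size slope."""
    rng = np.random.default_rng(seed)
    obs = np.asarray(obs, dtype=float)
    labels = np.asarray(labels)
    classes, sizes = np.unique(labels, return_counts=True)
    observed = var_size_slope(obs, labels, classes, sizes)
    null = np.array([var_size_slope(rng.permutation(obs), labels, classes, sizes)
                     for _ in range(n_perm)])
    return (1 + np.sum(np.abs(null) >= abs(observed))) / (n_perm + 1)
```

Applied to homoscedastic data like the simulated dt above, this should return an unremarkable p-value; when variances genuinely grow with class size, it returns a small one.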

If the variable I was regressing against were continuous instead of the ordinal variable classSize, then I could have used the Breusch-Pagan test.

For example, I could do
fit <- lm(obs ~ clabel, data = dt)
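For reference, the Breusch-Pagan test amounts to regressing the squared residuals on the auxiliary regressor and referring $n R^2$ to a $\chi^2_1$ distribution. Nothing prevents mechanically applying that recipe with classSize as the auxiliary regressor; whether it is valid here, given the estimator-noise issue above, is exactly the open question. A hedged Python sketch of the mechanics:

```python
import numpy as np
from scipy import stats

def breusch_pagan_lm(resid, z):
    """Breusch-Pagan LM test: regress squared residuals on a single
    auxiliary regressor z; LM = n * R^2 is chi-square(1) under the null."""
    u2 = resid ** 2
    X = np.column_stack([np.ones_like(z, dtype=float), z])
    beta, *_ = np.linalg.lstsq(X, u2, rcond=None)
    fitted = X @ beta
    r2 = 1.0 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    lm = len(u2) * r2
    return lm, stats.chi2.sf(lm, df=1)
```

Here resid would be the residuals of obs ~ clabel (i.e., per-class demeaning) and z the per-observation class size.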


Get this bounty!!!

#StackBounty: #hypothesis-testing #t-test #p-value Estimating "population p-value" $\Pi$ using an observed p-value

Bounty: 100

I asked a similar question last month, but from the responses, I see how the question can be asked more precisely.

Let’s suppose a population of the form

$$X \sim \mathcal{N}\left(100 + t_{n-1} \times \sigma / \sqrt{n},\ \sigma\right)$$

in which $t_{n-1}$ is the Student $t$ quantile based on a specific value of a parameter $\Pi$ ($0 < \Pi < 1$). For the sake of the illustration, we could suppose that $\Pi$ is 0.025.

When performing a one-sided $t$ test of the null hypothesis $H_0: \mu = 100$ on a sample taken from that population, the expected $p$ value is $\Pi$, irrespective of sample size (as long as simple random sampling is used).

I have 4 questions:

  1. Is the $p$ value a maximum likelihood estimator (MLE) of $\Pi$? (Conjecture: yes, because it is based on a $t$ statistic, which is based on a likelihood ratio test.)

  2. Is the $p$ value a biased estimator of $\Pi$? (Conjecture: yes, because (i) MLEs tend to be biased, and (ii) in simulations, I noted that the median of many $p$s is close to $\Pi$ but their mean is much larger.)

  3. Is the $p$ value a minimum-variance estimator of $\Pi$? (Conjecture: yes in the asymptotic case, but with no guarantee for a given sample size.)

  4. Can we get a confidence interval around a given $p$ value by using the confidence interval of the observed $t$ value (obtained from the non-central Student $t$ distribution with $n-1$ degrees of freedom and non-centrality parameter $t$) and computing the $p$ values of the lower- and upper-bound $t$ values? (Conjecture: yes, because both the non-central Student $t$ quantiles and the $p$ values of a one-sided test are continuous increasing functions.)
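Conjecture 2 is easy to probe by simulation. A Python sketch under the setup above, with the assumed values $\Pi = 0.025$, $n = 20$, $\sigma = 1$:

```python
import numpy as np
from scipy import stats

def simulate_pvalues(Pi=0.025, n=20, sigma=1.0, reps=20000, seed=0):
    """Simulate one-sided t-test p-values when the true mean is shifted
    by t_{n-1}(1 - Pi) * sigma / sqrt(n) above the null value 100."""
    rng = np.random.default_rng(seed)
    shift = stats.t.ppf(1 - Pi, df=n - 1) * sigma / np.sqrt(n)
    x = rng.normal(100 + shift, sigma, size=(reps, n))
    tstat = (x.mean(axis=1) - 100) / (x.std(axis=1, ddof=1) / np.sqrt(n))
    return stats.t.sf(tstat, df=n - 1)  # one-sided p-values

p = simulate_pvalues()
print(np.median(p), p.mean())  # median near Pi; mean noticeably larger
```

The right-skewed distribution of $p$ is what makes the median land near $\Pi$ while the mean exceeds it, consistent with the conjecture.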


Get this bounty!!!
