#StackBounty: #hypothesis-testing #statistical-significance #estimation #bias #lognormal Measuring accuracy of estimates from lognormal…

Bounty: 50

Our org needs to assess movie box office results relative to the estimates we make pre-release.

We know that, generally, box office results are lognormally distributed.
We can fit a lognormal distribution to a large portfolio of actual box office results, and the fit matches pretty well.

My question has to do with diagnosing the cause of errors in both individual estimates and portfolio-total estimates.

For example, based on factors such as budget, cast, director, genre and size of release, we make an estimate of the box office to be obtained and decide the amount of marketing spend to support that estimate.
So if we estimate that a film will do 50MM in box office, and we spend marketing dollars accordingly, but the film only does 22MM, does that error look like an “outlier” (signalling that we were over-optimistic in our estimation) or not? Put another way, is there some p-value we can measure against, i.e. if our estimate is unbiased, then the actual result should be within x% of the estimate? Or is there no way to judge from a single trial like this whether our “estimation engine” (i.e. a bunch of people sitting around talking) is biased?
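For concreteness, here is a minimal sketch in R of that kind of single-film check, assuming the actual result is lognormal around a median-unbiased estimate; the sdlog value is purely a placeholder for the spread of log(actual/estimate) estimated from past films.

## Minimal sketch under the lognormal assumption (sdlog = 0.6 is a placeholder).
estimate <- 50   # MM, pre-release estimate
actual   <- 22   # MM, observed box office
sdlog    <- 0.6  # assumed sd of log(actual/estimate) from historical films

## Probability of a result at least this low if the estimate is an unbiased median:
plnorm(actual, meanlog = log(estimate), sdlog = sdlog)

## Central 95% interval for the actual result implied by the estimate --
## note it is asymmetric on the raw (MM) scale:
qlnorm(c(0.025, 0.975), meanlog = log(estimate), sdlog = sdlog)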

Likewise, for a portfolio of, say, 10 movies, how do we figure out whether the delta between the estimated portfolio total box office and the actual portfolio total demonstrates that our estimates are biased high? For the portfolio we currently just count how often we exceeded the estimate versus fell short as a measure of bias, and feel OK if we were high roughly half the time and low roughly half the time, but I’m sure there is a better measure. However, given a history of only 10 films, I wonder whether that is enough for the portfolio distribution to be treated as symmetric, given the asymmetry of the sampling distribution and the relatively low n. So would we expect the, say, 95% confidence interval to be tighter on the low side and wider on the high side due to the asymmetry of the lognormal distribution?
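And a rough portfolio simulation under the same assumptions (the ten estimates and sdlog below are made up), illustrating both the asymmetry of the interval around the portfolio total and the over/under counting idea:

## Rough sketch with made-up numbers: 10 films, each actual lognormal around
## its estimate with an assumed sdlog.
set.seed(1)
estimates <- c(50, 30, 80, 20, 45, 60, 25, 35, 70, 40)  # hypothetical, in MM
sdlog     <- 0.6
totals <- replicate(10000,
  sum(rlnorm(length(estimates), meanlog = log(estimates), sdlog = sdlog)))

sum(estimates)                           # estimated portfolio total
quantile(totals, c(0.025, 0.5, 0.975))   # interval is wider on the high side

## The over/under count is a sign test; e.g. if 3 of 10 films beat their estimate:
binom.test(3, 10, p = 0.5)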

Many thanks!


Get this bounty!!!

#StackBounty: #hypothesis-testing #correlation #multiple-comparisons #spearman-rho How to compare two Spearman correlation matrices?

Bounty: 50

I have two non-parametric rank correlation matrices, emp and sim (for example, based on Spearman’s $\rho$ rank correlation coefficient):

emp <- matrix(c(
1.0000000, 0.7771328, 0.6800540, 0.2741636,
0.7771328, 1.0000000, 0.5818167, 0.2933432,
0.6800540, 0.5818167, 1.0000000, 0.3432396,
0.2741636, 0.2933432, 0.3432396, 1.0000000), ncol=4)

sim <- matrix(c(
1.0000000, 0.7616454, 0.6545774, 0.3081403,
0.7616454, 1.0000000, 0.5360392, 0.3146167,
0.6545774, 0.5360392, 1.0000000, 0.3739758,
0.3081403, 0.3146167, 0.3739758, 1.0000000), ncol=4)

The emp matrix contains the correlations between the empirical values (time series); the sim matrix contains the correlations between the simulated values.

I have read the Q&A How to compare two or more correlation matrices?; in my case it is known that the empirical values do not come from a normal distribution, so I can’t use Box’s M test.

I need to test the null hypothesis $H_0$: matrices emp and sim are drawn from the same distribution.

Question. What test can I use? Is it possible to use the Wishart statistic?

Edit.
Following Stephan Kolassa’s comment, I have run a simulation.

I have compared the two Spearman correlation matrices emp and sim with Box’s M test. The test returned

# Chi-squared statistic = 2.6163, p-value = 0.9891

Then I simulated the correlation matrix sim 1000 times and plotted the distribution of the Chi-squared statistic $M(1-c)\sim\chi^2(df)$.

[Figure: histogram of the simulated Chi-squared statistics $(M(1-c))_i$]

After that I computed the 5% quantile of the simulated Chi-squared statistics $M(1-c)\sim\chi^2(df)$. It equals

quantile(dfr$stat, probs = 0.05)
#       5% 
# 1.505046

One can see that the 5% quantile is less than the obtained Chi-squared statistic: 1.505046 < 2.6163 (blue line on the figure); therefore, my emp’s statistic $M(1-c)$ does not fall in the left tail of the $(M(1-c))_i$.

Edit 2.
Following Stephan Kolassa’s second comment, I have calculated the 95% quantile of the simulated Chi-squared statistics $M(1-c)\sim\chi^2(df)$ (blue line on the figure). It equals

quantile(dfr$stat, probs = 0.95)
#      95% 
# 7.362071

One can see that emp’s statistic $M(1-c)$ does not fall in the right tail of the $(M(1-c))_i$.

Edit 3. I have calculated the exact $p$-value (green line on the figure) through the empirical cumulative distribution function:

ecdf(dfr$stat)(2.6163)
[1] 0.239

One can see that the $p$-value $= 0.239$ is greater than $0.05$.
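For reference, a rough end-to-end sketch of the kind of simulation described above. The data-generating step and the statistic are placeholders on my part (multivariate normal draws with sim as the covariance, and a simple Frobenius-norm distance instead of the actual Box's M statistic), since the original simulation code is not shown:

library(MASS)  # for mvrnorm

n    <- 200                                  # assumed length of the series
stat <- function(a, b) sqrt(sum((a - b)^2))  # Frobenius distance (stand-in statistic)

dfr_stat <- replicate(1000, {
  x <- mvrnorm(n, mu = rep(0, 4), Sigma = sim)   # draws under the sim model
  stat(cor(x, method = "spearman"), sim)
})

obs <- stat(emp, sim)                        # observed distance between emp and sim
quantile(dfr_stat, probs = c(0.05, 0.95))    # simulated 5% and 95% quantiles
mean(dfr_stat >= obs)                        # empirical p-value, cf. ecdf()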

Edit 4.

Dominik Wied (2014), A Nonparametric Test for a Constant Correlation Matrix, Econometric Reviews, DOI: 10.1080/07474938.2014.998152

Joël Bun, Jean-Philippe Bouchaud and Mark Potters (2016), Cleaning correlation matrices, Risk.net, April 2016

Li, David X. (1999), On Default Correlation: A Copula Function Approach. Available at SSRN: https://ssrn.com/abstract=187289 or http://dx.doi.org/10.2139/ssrn.187289

G. E. P. Box (1949), A General Distribution Theory for a Class of Likelihood Criteria, Biometrika, Vol. 36, No. 3/4, pp. 317-346

M. S. Bartlett (1937), Properties of Sufficiency and Statistical Tests, Proc. R. Soc. Lond. A, 160, 268-282

Robert I. Jennrich (1970), An Asymptotic χ² Test for the Equality of Two Correlation Matrices, Journal of the American Statistical Association, 65:330, 904-912

Edit 5.

The first paper I have found that does not assume a normal distribution:

Reza Modarres & Robert W. Jernigan (1993) A robust test for comparing correlation matrices, Journal of Statistical Computation and Simulation, 46:3-4, 169-181


Get this bounty!!!

#StackBounty: #hypothesis-testing #statistical-significance #ordinal-data statistical test for order of movements

Bounty: 50

I have 10 data sets in which I have to identify the order of movement of particles. For example, the first particle to move is ranked 1, the second to move is ranked 2, and so on.

So for each data set, I have a list of particles and the order of their movements. The maximum number of particles is 6. There are cases where particles move at the same time (or where the order is not clear) and are given the same rank.

I want to know whether there is a statistical test to check whether the order of movement found across the data sets might be ‘random’ or not, i.e. whether the order is ‘significant’.

Please point me to the correct statistical test for this. How would you formulate the hypotheses in this case? Your insights would be extremely helpful.
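Purely as an illustration of one possible direction (not necessarily the right test for this design): if the same particles are ranked in every data set, a Friedman test on the data-sets-by-particles rank matrix asks whether the orderings agree more than expected under a random order. The data below are made up:

## Hypothetical example: rows = 10 data sets, columns = 6 particles,
## entries = rank at which the particle moved (ties are allowed).
set.seed(42)
ranks <- t(replicate(10, rank(runif(6))))   # here the order really is random
colnames(ranks) <- paste0("particle", 1:6)

## H0: all orderings are equally likely (no systematic order across data sets).
friedman.test(ranks)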


Get this bounty!!!

#StackBounty: #hypothesis-testing #anova #sample-size #levenes-test Minimal sample size per group in Levene's test

Bounty: 50

I recently learned that Levene’s test is a one-way (equal-variance) ANOVA performed on the absolute values of the residuals from each group’s mean.

For one-way ANOVA the minimal sample size requirement seems to be at least one group with more than one observation. But in the case of Levene’s test it gets a bit tricky for me.

For example, if all groups have 2 observations each, the two absolute residuals within every group are identical, so the within-group variance of the Levene residuals is 0. The requirement would therefore seem to be at least one group with at least 3 observations.

However, what about situations where one group has only one observation? I did a few simulations in R using car::leveneTest(), and it seems the p-values are not uniformly distributed under the null when there are 2 groups and one of them has only one observation. Here is a demonstration:

library(car)
groups <- factor(c(rep("A", 999), "B"))                      # group B has a single observation
ps <- replicate(100, leveneTest(rnorm(1000), groups)[1, 3])  # collect the p-values

> range(ps)
[1] 0.1681269 0.2107370

Basically, after simulating 100 scenarios where group 1 has 999 observations and group 2 has 1 observation, the p-values range from 0.16 to 0.22. The leveneTest() function didn’t complain, but that might be an oversight in the implementation.
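For contrast, the same kind of simulation with a less degenerate design (just a quick check, not an answer to the question): with two groups of 500, the null p-values look roughly uniform rather than confined to a narrow band.

library(car)
groups2 <- factor(rep(c("A", "B"), each = 500))            # 500 + 500 design
ps2 <- replicate(1000, leveneTest(rnorm(1000), groups2)[1, 3])

hist(ps2)     # approximately flat under the null
range(ps2)    # spans (0, 1) rather than a narrow interval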

Question: what are the minimum sample size requirements for Levene’s test to be valid?

My current take: 2 observations per group, with at least one group having 3, but I might have missed something.


Get this bounty!!!

#StackBounty: #hypothesis-testing #panel-data #causality Causal inference when individuals are observed at multiple points in time

Bounty: 50

I am trying to estimate the causal effect of a treatment on an outcome. I would like to consider one control variable and a time dimension (week). The sample looks as follows:

+------------+--------+---------+-----------+---------+
| individual | week   | control | treatment | outcome |
+------------+--------+---------+-----------+---------+
|          1 | 201701 | a       |         1 |       0 |
|          1 | 201702 | a       |         0 |       0 |
|          1 | 201703 | b       |         0 |       0 |
|          1 | 201704 | b       |         1 |       1 |
|          2 | 201703 | d       |         0 |       0 |
|          2 | 201704 | d       |         1 |       0 |
|          2 | 201705 | e       |         1 |       1 |
|          3 | 201801 | a       |         1 |       1 |
+------------+--------+---------+-----------+---------+

As you can see, I observe the same individual multiple times for the same control and also multiple times across controls. Once the outcome is 1, the individual drops out. So, I do not observe all individuals all the time.

As I said, I would like to measure the effect of the treatment by control and by week, and if the effect does not differ across controls or across weeks, I would like to pool across the respective dimension.

I am somewhat puzzled about which individuals to consider in the test (and possibly also regarding what test to apply) since I am observing individuals at multiple points in time.

  1. I would like to check, for instance, whether the effect differs across weeks. But then the same individual might show up in two different weeks. Does this matter, and if so, how should I deal with it? Should I (randomly) consider each individual only once? Is there a test that deals with this kind of scenario?

  2. Suppose I would like to pool across controls (i.e. not consider weeks separately). How would I best approach this issue then?

So far, I have used a chi-squared test to assess the effect for each week-control combination.
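For reference, a minimal sketch of that kind of test on the example data (the data frame simply reproduces the table above; pooling over all cells is only for illustration, and with counts this small a Fisher exact test would usually be preferred):

d <- data.frame(
  individual = c(1, 1, 1, 1, 2, 2, 2, 3),
  week       = c(201701, 201702, 201703, 201704, 201703, 201704, 201705, 201801),
  control    = c("a", "a", "b", "b", "d", "d", "e", "a"),
  treatment  = c(1, 0, 0, 1, 0, 1, 1, 1),
  outcome    = c(0, 0, 0, 1, 0, 0, 1, 1)
)

## Chi-squared test of treatment vs outcome, pooled over weeks and controls
## (R will warn that the chi-squared approximation may be poor here):
chisq.test(table(d$treatment, d$outcome))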


Get this bounty!!!

#StackBounty: #hypothesis-testing #statistical-significance #multiple-regression #diagnostic What is the sample distribution of DFBETA …

Bounty: 50

I am wondering how the threshold for determining statistical significance of DFBETA is computed. From the famous book “Regression Diagnostics: Identifying Influential Data and Sources of Collinearity”, it seems the derivation is not rigorous and is just a “suggestion”, and I cannot see why the suggested cutoffs (-2 and +2) are reasonable. Could anybody help me clarify this? Thanks!
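As a small illustration of what the suggested cutoffs are applied to (the model and data below are arbitrary): base R's dfbetas() returns the scaled DFBETAS values such rules of thumb refer to, and the size-adjusted cutoff $2/\sqrt{n}$ is another suggestion often quoted alongside the absolute one.

set.seed(1)
n   <- 50
x   <- rnorm(n)
y   <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)

db <- dfbetas(fit)                     # one row per case, one column per coefficient
which(abs(db[, "x"]) > 2)              # absolute rule of thumb
which(abs(db[, "x"]) > 2 / sqrt(n))    # size-adjusted rule of thumb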


Get this bounty!!!

#StackBounty: #hypothesis-testing #statistical-significance #normal-distribution #optimization #central-limit-theorem Picking a signifi…

Bounty: 50

Suppose you are running a casino and that you are responsible for ensuring that all the dice are fair to avoid lawsuits. In order to do this, you take the mean of 1000 throws of each die and perform a hypothesis test [using the central limit theorem, CLT] to see whether it is likely biased.

The average cost of a lawsuit is £240000, whilst the cost of a die is £3, so in order to minimise costs you would aim to have $240000\,P(\textrm{Type II Error}) = 3\alpha$ where $\alpha$ is the significance level of the hypothesis test (and also the probability of a type I error). The cost of testing the die may be ignored.

Now, in order to find the optimal $\alpha$ value, one must know the value of $P(\textrm{Type II Error})$, something that can only be calculated if the actual mean of the die (which is what we are testing for in the first place) is known, so the optimal solution cannot be found. That being said, however, I’m sure scenarios like this arise rather often, so how are they usually dealt with?

tldr: How would you find a threshold value for the mean of a die above (or below) which it should be considered biased, whilst also keeping $240000\,P(\textrm{Type II Error}) \approx 3\,P(\textrm{Type I Error})$?
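A small numerical illustration of why an assumed alternative is needed (all the numbers below are assumptions): with $n = 1000$ throws the CLT gives the sampling distribution of the mean of a fair die, but the Type II error probability can only be computed once a specific biased mean is posited.

n     <- 1000
mu0   <- 3.5                   # mean under H0 (fair die)
sigma <- sqrt(35 / 12)         # sd of one throw of a fair die
se    <- sigma / sqrt(n)

beta <- function(alpha, mu1) { # P(Type II error) of a two-sided z-test
  crit <- qnorm(1 - alpha / 2) * se
  pnorm(mu0 + crit, mean = mu1, sd = se) - pnorm(mu0 - crit, mean = mu1, sd = se)
}

## The Type II error depends strongly on the assumed true mean:
sapply(c(3.55, 3.60, 3.70), function(m) beta(alpha = 0.05, mu1 = m))

## Given an assumed mu1, one could then search over alpha for the value at
## which 240000 * beta(alpha, mu1) is closest to 3 * alpha.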

Edit: It seems my choice of example is rather poor, as a die shouldn’t even be tested for fairness with a test like this. That being said, however, my question really concerns the tradeoff between Type I and Type II error, not the die in particular.


Get this bounty!!!