#StackBounty: #r #time-series #hypothesis-testing #statistical-significance #granger-causality How to better use and interpret granger …

Bounty: 50

I have the following code and I want to show the connection of two different factors with a specific one. I want to use grangertest in R and I have the following questions:

  1. how can I interpret the results based on different levels of significance?
  2. how can I interpret non-significant results?
  3. is there a way to visualise the results?

library(lmtest)  # provides grangertest()

my_example <- data.frame(matrix(ncol = 3, nrow = 10))
my_example$X1 <- c(0.8619616, 1.1818621, 0.5530410, 0.6255634, 
       0.9971764, 1.3464298, 2.0889985, 1.5303893, 2.9503790, 
my_example$X2 <- c( -5.7333332, -4.7000000, -7.7000000, 
     -2.5000000,  1.5666667,  0.2666667, -2.7000000, -6.2000000, 
      0.2333333  ,0.5333333)
my_example$X3 <- c( 0.2200000, 0.3625000, 0.2100000, 0.3750000, 
      0.4966667, 0.4133333, 0.3800000, 0.2133333, 0.3733333, 

grangertest(X1 ~ X2, order = 2, data = my_example)

grangertest(X1 ~ X3, order = 2, data = my_example)

Get this bounty!!!

#StackBounty: #hypothesis-testing #sample-size How to calculate sample size for a financial risk experiment

Bounty: 100

I want to experiment with a change to a financial transaction limit. The focus of the experiment will be the change of loss per at-risk transaction. I want to calculate the sample size required to detect a given effect size.

I know very little about statistics, so I’m not sure where to start with this. I think these are the important/relevant details:

  • There are very few at-risk transactions. ~19 monthly average for the last 5 months.
  • Loss is infrequent. 0.8% of at-risk amount per month, arising from 2 events in the last 5 months.

I want to know when a larger loss in the experimental group will be attributable to a change in the limit. I don’t know what method(s) to use here.
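
As a rough starting point, the sketch below treats each at-risk transaction as a yes/no loss event and applies the textbook normal-approximation sample size formula for comparing two proportions (one-sided). This is a simplification: it ignores loss amounts, and the baseline rate of 2 events in roughly 95 transactions (19 per month over 5 months) is itself very uncertain. The doubled loss rate used as the effect size, the 5% significance level, the 80% power, and the function name `n_per_group` are all illustrative assumptions, not values from the question.

```python
import math
from statistics import NormalDist

def n_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate transactions needed per group for a one-sided
    two-proportion z-test (normal approximation, unpooled variance)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)

# Baseline from the question: 2 loss events in about 19 * 5 transactions.
baseline = 2 / (19 * 5)
# Hypothetical effect size: the loss-event rate doubles under the new limit.
hypothetical = 2 * baseline

n = n_per_group(baseline, hypothetical)
months_per_group = n / 19  # if all ~19 monthly transactions fed one group
print(n, round(months_per_group, 1))
```

With so few at-risk transactions per month, a calculation like this mainly shows how long the experiment would have to run. With rates this small, an exact or simulation-based method would likely be more trustworthy than the normal approximation.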

#StackBounty: #hypothesis-testing #self-study #normal-distribution #t-test #likelihood-ratio Likelihood Ratio Test Equivalent with $t$ …

Bounty: 50


Problem Statement: Suppose that independent random samples of sizes $n_1$ and $n_2$
are to be selected from normal populations with means $\mu_1$ and $\mu_2,$
respectively, and common variance $\sigma^2.$ For testing $H_0:\mu_1=\mu_2$
versus $H_a:\mu_1-\mu_2>0$ ($\sigma^2$ unknown), show that the likelihood
ratio test reduces to the two-sample $t$ test presented in Section 10.8.

Note: This is Exercise 10.94 from Mathematical Statistics with Applications, 5th. Ed., by Wackerly, Mendenhall, and Scheaffer.

My Work So Far: We have the likelihood as
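$$L(\mu_1, \mu_2,\sigma^2)=\left(\frac{1}{2\pi\sigma^2}\right)^{\!(n_1+n_2)/2}\exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i-\mu_1)^2+\sum_{j=1}^{n_2}(y_j-\mu_2)^2\right)\right].$$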

To compute $L\big(\hat\Omega_0\big),$ we need to find the MLE for $\sigma^2:$

This is the MLE for $\sigma^2$ regardless of what $\mu_1$ and $\mu_2$ are.
Thus, under $H_0,$ we have that

and the unrestricted case is

Under $H_0,\;\mu_1=\mu_2=\mu_0,$ so that

and the likelihood ratio is given by

It follows that the rejection region, $\lambda\le k,$ is equivalent to


My Question: The goal is to get this expression somehow to look like
But I don’t see how I can convert my expression, with the same sign for $\overline{x}$ and $\overline{y},$ to the desired formula with its opposite signs. What am I missing?
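
A numerical sanity check can help locate a sign slip of this kind. The sketch below assumes the pooled-mean estimate $\hat\mu_0=(n_1\overline{x}+n_2\overline{y})/(n_1+n_2)$ under $H_0$ and checks the identity
$$\sum_i(x_i-\hat\mu_0)^2+\sum_j(y_j-\hat\mu_0)^2=\sum_i(x_i-\overline{x})^2+\sum_j(y_j-\overline{y})^2+\frac{n_1n_2}{n_1+n_2}\,(\overline{x}-\overline{y})^2,$$
which is one place where the difference $\overline{x}-\overline{y}$ can enter. The data are simulated and the function name `lrt_pieces` is made up for illustration; this is not the textbook's own derivation.

```python
import math
import random

def lrt_pieces(x, y):
    """Return the null and unrestricted sums of squares, the cross term,
    and the pooled two-sample t statistic."""
    n1, n2 = len(x), len(y)
    xbar, ybar = sum(x) / n1, sum(y) / n2
    mu0 = (n1 * xbar + n2 * ybar) / (n1 + n2)  # MLE of the common mean under H0
    ss_alt = sum((v - xbar) ** 2 for v in x) + sum((v - ybar) ** 2 for v in y)
    ss_null = sum((v - mu0) ** 2 for v in x) + sum((v - mu0) ** 2 for v in y)
    extra = n1 * n2 / (n1 + n2) * (xbar - ybar) ** 2
    sp2 = ss_alt / (n1 + n2 - 2)  # pooled variance estimate
    t = (xbar - ybar) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return ss_null, ss_alt, extra, t

random.seed(1)
x = [random.gauss(1.0, 2.0) for _ in range(8)]    # made-up sample 1
y = [random.gauss(0.0, 2.0) for _ in range(11)]   # made-up sample 2
ss_null, ss_alt, extra, t = lrt_pieces(x, y)
df = len(x) + len(y) - 2

# Identity for the null sum of squares.
print(math.isclose(ss_null, ss_alt + extra))
# Ratio of variance MLEs written in terms of t.
print(math.isclose(ss_alt / ss_null, 1 / (1 + t ** 2 / df)))
```

If both checks hold, then $\hat\sigma^2/\hat\sigma_0^2=\big(1+t^2/(n_1+n_2-2)\big)^{-1},$ so a small $\lambda$ corresponds to a large $t^2.$ For the one-sided alternative one would additionally note that $\lambda=1$ whenever $\overline{x}\le\overline{y}.$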

#StackBounty: #hypothesis-testing #self-study #mathematical-statistics #statistical-power #weibull-distribution Uniformly Most Powerful…

Bounty: 50

$\newcommand{\szdp}[1]{\!\left(#1\right)} \newcommand{\szdb}[1]{\!\left[#1\right]}$
Problem Statement: Let $Y_1,\dots,Y_n$ be a random sample from the probability
density function given by
$$f(y\mid\theta)=\frac{1}{\theta}\,m\,y^{m-1}e^{-y^m/\theta},\qquad y>0,$$
with $m$ denoting a known constant.

  1. Find the uniformly most powerful test for testing
    $H_0:\theta=\theta_0$ against $H_a:\theta>\theta_0.$
  2. If the test in 1. is to have $\theta_0=100, \alpha=0.05,$ and
    $\beta=0.05$ when $\theta_a=400,$ find the appropriate sample size and
    critical region.

Note: This is Problem 10.80 in Mathematical Statistics with Applications, 5th. Ed., by Wackerly, Mendenhall, and Scheaffer.

My Work So Far:

  1. This is a Weibull distribution. We construct the
    likelihood function

    Now we form the inequality indicated in the Neyman-Pearson Lemma:
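    $$\frac{L(\theta_0)}{L(\theta_a)}=
    \frac{\displaystyle \left(\frac{m}{\theta_0}\right)^{\!\!n}\prod_{i=1}^ny_i^{m-1}\exp\left[-\frac{1}{\theta_0}\sum_{i=1}^ny_i^m\right]}
    {\displaystyle \left(\frac{m}{\theta_a}\right)^{\!\!n}\prod_{i=1}^ny_i^{m-1}\exp\left[-\frac{1}{\theta_a}\sum_{i=1}^ny_i^m\right]}
    =\frac{\displaystyle \theta_a^n\exp\left[-\frac{1}{\theta_0}\sum_{i=1}^ny_i^m\right]}
    {\displaystyle \theta_0^n\exp\left[-\frac{1}{\theta_a}\sum_{i=1}^ny_i^m\right]}<k.$$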

    The end result is

  2. We have to discover the distribution of $\displaystyle \sum_{i=1}^ny_i^m.$
    I claim that the random variable $W=Y^m$ is exponentially distributed with
    parameter $\theta.$ Proof:

    which is the distribution of an exponential with parameter $\theta,$ as I
    claimed. It follows, then, that $\displaystyle\sum_{i=1}^ny_i^m$ is
    $\Gamma(n,\theta)$ distributed, and hence that
    $\displaystyle\frac{2}{\theta}\sum_{i=1}^ny_i^m$ is $\chi^2$ distributed with
    $2n$ d.o.f. So the RR we can write as that region where
    with the $2n$ d.o.f. Let
    Then we have

    So now we solve

    So we choose $n$ so that the $\chi^2$ values corresponding to the ratio given
    work out. The ratio of $\theta_a/\theta_0=4,$ and we choose $\chi_\alpha^2$ on
    the high end, and $\chi_\beta^2$ on the low end so that their ratio is $4,$
    by varying $n$. This happens at d.o.f. $13=2n,$ which means we must choose
    $n=7.$ For this choice of $n,$ we have the critical region as

My Question: This is one of the most complicated stats problems I’ve encountered yet in this textbook, and I just want to know if my solution is correct. I feel like I’m "out on a limb" with complex reasoning depending on complex reasoning. I’m fairly confident that part 1 is correct, but what about part 2?
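
One way to double-check part 2 numerically is to search over $n$ directly. The sketch below assumes the rejection region has the form $\frac{2}{\theta_0}\sum y_i^m>\chi^2_\alpha$ with $2n$ d.o.f., so the power requirement at $\theta_a$ becomes: upper-$\alpha$ point divided by lower-$\beta$ point at most $\theta_a/\theta_0.$ To stay within the Python standard library it uses the closed-form CDF of a $\chi^2$ variable with an even number of degrees of freedom; the helper names are made up for illustration.

```python
import math

def chi2_cdf_even(x, n):
    """CDF of a chi-square variable with 2n d.o.f. (n a positive integer)."""
    if x <= 0:
        return 0.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= half / k
        total += term
    return 1.0 - math.exp(-half) * total

def chi2_quantile_even(p, n):
    """Quantile of chi-square with 2n d.o.f., found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, n) < p:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def smallest_n(theta0, theta_a, alpha, beta):
    """Smallest n whose upper-alpha / lower-beta quantile ratio is <= theta_a/theta0."""
    n = 1
    while True:
        upper = chi2_quantile_even(1 - alpha, n)  # cut-off under H0
        lower = chi2_quantile_even(beta, n)       # beta quantile, used under Ha
        if upper / lower <= theta_a / theta0:
            return n, upper, lower
        n += 1

n, upper, lower = smallest_n(100, 400, 0.05, 0.05)
critical_sum = 100 * upper / 2  # reject H0 when sum(y_i^m) exceeds this
print(n, round(upper, 3), round(lower, 3), round(critical_sum, 1))
```

Under these assumptions the search stops at $n=7$ (that is, $14$ d.o.f.), with a cut-off of roughly $1184$ for $\sum y_i^m,$ which agrees with the sample size stated above.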

#StackBounty: #hypothesis-testing #self-study #mathematical-statistics #binomial-distribution #likelihood-ratio Binomial Distribution: …

Bounty: 50


Problem Statement: A survey of voter sentiment was conducted in four midcity
political wards to compare the fraction of voters favoring candidate $A.$
Random samples of $200$ voters were polled in each of the four wards. The numbers of voters favoring $A$ in the four samples can be regarded as four independent binomial random variables. Construct a likelihood ratio test of the hypothesis that the
fractions of voters favoring candidate $A$ are the same in all four wards.
Use $\alpha=0.05.$

Note 1: This is essentially Exercise 10.88 in Mathematical Statistics with Applications, 5th Ed., by Wackerly, Mendenhall, and Scheaffer.

Note 2: I have looked at several threads asking the same question. This thread has no viable answer. This thread has a solution done mostly in R and is not a theoretical derivation of the needed result. This thread works out exactly zero details: and as you’ll see, I’m definitely in the weeds on this one.

My Work So Far: Let $p_i$ be the proportion of voters favoring $A$ in Ward $i.$ So the
null hypothesis is that $p_1=p_2=p_3=p_4,$ while the alternative hypothesis
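is that at least one proportion is different from the others. We have
$$f(y_i)=\binom{n}{y_i}p_i^{y_i}(1-p_i)^{n-y_i},\qquad i=1,\dots,4,$$
as the probability function for ward $i,$ with $n=200.$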
It follows that the likelihood function is

Then we construct $L\big(\hat\Omega_0\big)$ and $L\big(\hat\Omega\big).$ Note
that under the null hypothesis, we will set $p_1=p_2=p_3=p_4=p.$ Hence,

The one remaining parameter $p$ we will replace with its MLE, which we can
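confidently say is $\big(\sum y_i\big)/(4n).$ Hence
$$L\big(\hat\Omega_0\big)=\prod_{i=1}^4\left[\binom{n}{y_i}\left(\frac{\sum y_j}{4n}\right)^{\!\!y_i}\left(1-\frac{\sum y_j}{4n}\right)^{\!\!n-y_i}\right]
=\left(\frac{1}{4n}\right)^{\!\!4n}\prod_{i=1}^4\left[\binom{n}{y_i}\left(\sum y_j\right)^{\!y_i}\left(4n-\sum y_j\right)^{\!n-y_i}\right].$$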

Next, we turn our attention to $L\big(\hat\Omega\big):$

Next we form the likelihood ratio:
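$$\lambda=\frac{L\big(\hat\Omega_0\big)}{L\big(\hat\Omega\big)}
=\frac{\left(\frac{1}{4n}\right)^{\!4n}\prod_{i=1}^4\left[\binom{n}{y_i}\left(\sum y_j\right)^{\!y_i}\left(4n-\sum y_j\right)^{\!n-y_i}\right]}
{\left(\frac{1}{n}\right)^{\!4n}\prod_{i=1}^4\left[\binom{n}{y_i}\,y_i^{y_i}\,(n-y_i)^{n-y_i}\right]}
=\left(\frac{1}{4}\right)^{\!4n}\prod_{i=1}^4\left[\left(\frac{\sum y_j}{y_i}\right)^{\!y_i}
\left(\frac{4n-\sum y_j}{n-y_i}\right)^{\!n-y_i}\right].$$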

My Questions:

  1. This looks wrong to me, because I’m told (and it totally makes sense) that $0\le\lambda\le 1,$ whereas everything in sight is greater than $1.$
  2. Supposing this expression can be salvaged, what are the next steps? Should I take logs and try to simplify somehow?
  3. I’m expecting to be able to obtain a test something along the lines of
    although this test doesn’t strike me as sensitive enough. We could have $y_1/n$ much too low, and $y_4/n$ much too high, and this test could still mark them down as equal because they "average out" to the right thing. What’s the right generalization to the standard difference of proportions test?
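
A quick numerical experiment can help with questions 1 and 2. The sketch below works on the log scale (the binomial coefficients cancel in the ratio) and uses the standard large-sample result that $-2\ln\lambda$ is approximately $\chi^2$ distributed, here with $4-1=3$ d.o.f., so the $\alpha=0.05$ cut-off is about $7.815.$ The ward counts are hypothetical, chosen only for illustration, and the function names are made up.

```python
import math

def xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def neg2_log_lambda(counts, n):
    """-2 ln(lambda) for H0: all binomial proportions equal, n trials per group."""
    k = len(counts)
    p_pooled = sum(counts) / (k * n)  # MLE of the common p under H0
    log_l0 = sum(xlogy(y, p_pooled) + xlogy(n - y, 1 - p_pooled) for y in counts)
    log_l1 = sum(xlogy(y, y / n) + xlogy(n - y, 1 - y / n) for y in counts)
    return -2.0 * (log_l0 - log_l1)

CHI2_CRIT_3DF = 7.815  # upper 5% point of chi-square with 3 d.o.f.

hypothetical_counts = [76, 53, 59, 48]  # made-up counts out of n = 200 each
stat = neg2_log_lambda(hypothetical_counts, 200)
lam = math.exp(-stat / 2.0)
print(round(stat, 3), lam, stat > CHI2_CRIT_3DF)

# Counts that "average out": one ward low, one high, same overall proportion.
print(neg2_log_lambda([40, 100, 100, 160], 200) > CHI2_CRIT_3DF)
```

Since the restricted maximum can never exceed the unrestricted one, the statistic is non-negative and $\lambda$ computed this way stays in $(0,1].$ Each ward contributes its own term, so deviations in opposite directions add up rather than cancel.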

#StackBounty: #time-series #probability #hypothesis-testing #statistical-significance #missing-data (Sudden deafness ended?) How can I …

Bounty: 50

I think I found a major (conclusion-flipping) statistical error in a paper in an AMA journal. Did I?

If I messed up, I’d like to know how; I hope someone can point me in the direction of my errors. If I messed up, I must have made at least two major errors, as I came to the same conclusion in two independent ways. I communicated with the journal editor and corresponding author.

Here, you can find the paper and the correspondence. I reproduce it below.

To try to make this question fully self-contained, I’ll summarize the issue.

The authors calculate the background rate of sudden sensorineural hearing loss (SSNHL) per year and compare it to the rate of SSNHL over a three-week post-intervention period, and graph "Estimated incidence of SSNHL, per 100 000 per y". Their conclusion is that the data indicate the intervention does not increase the incidence of SSNHL; a substantial and significant reduction is indicated.
They state that "We then estimated the incidence of SSNHL that occurred after vaccination on an annualized basis." But this cannot be what they calculated. It is incompatible with what they report is the data their research yielded.

  1. It’s an error to limit the possible adverse side effect window to 3 weeks post-vaccination (excluding adverse events outside that window) but then spread the remaining adverse events over a year to calculate risk on an annualized basis. It’s unjustifiable. A reasonable start for comparison would be to compare risk over the 3 weeks to the annual (52-week) risk, scaled to a 3-week period. So the correct finding, based on their research, appears to be no difference in SSNHL risk between groups over a 3-week period (0.6-4.4 vs 0.3-4.1, n.s.).
  2. Their conclusion implies that the authors have discovered that the intervention reduced SSNHL by about 94%, which would be a groundbreaking discovery if confirmed. As no plausible mechanism is presented for such a miraculous treatment effect, this is more evidence of grave error. It does not pass a basic plausibility test.
  3. As I finished writing this up, I found further concerns, which I’ll put in an answer. I put them in chat: https://chat.stackexchange.com/rooms/18/ten-fold because they are preliminary.

[end summary]
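
The scaling argument in point 1 can be made concrete with a few lines of arithmetic. The sketch below uses only the ranges quoted in the summary and assumes, as argued there, that 0.6-4.4 is the annual background rate scaled to a 3-week window and that 0.3-4.1 is a 3-week post-vaccination figure (both per 100 000). The function names are made up for illustration.

```python
WEEKS_PER_YEAR = 52

def annual_to_window(rate_per_year, weeks):
    """Scale an annual incidence to a window of the given length in weeks."""
    return rate_per_year * weeks / WEEKS_PER_YEAR

def window_to_annual(rate_in_window, weeks):
    """Annualize an incidence observed over a window of the given length."""
    return rate_in_window * WEEKS_PER_YEAR / weeks

background_3wk = (0.6, 4.4)  # per 100 000, annual background scaled to 3 weeks
post_3wk = (0.3, 4.1)        # per 100 000, 3-week post-intervention figure

# Undo the scaling to get the implied annual background range.
background_annual = tuple(window_to_annual(r, 3) for r in background_3wk)

# Like-for-like comparison: both ranges on a 3-week basis.
print(background_3wk, post_3wk)

# Mismatched comparison: a 3-week figure set against an annual one.
apparent_reduction = 1 - post_3wk[1] / background_annual[1]
print(background_annual, round(apparent_reduction, 3))
```

On these assumptions the mismatched comparison reproduces a reduction of roughly 94-95% at the upper end of the ranges, while the like-for-like 3-week ranges overlap almost entirely.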

Again, here, you can find the paper and the correspondence. I reproduce it immediately below.

I wrote:


I write in respect to *.

This study should be withdrawn. It’s an error to limit the possible
adverse side effect window to 3 weeks post-vaccination (excluding
adverse events outside that window) but then spread the remaining
adverse events over a year to calculate risk. It’s unjustifiable. A
reasonable start for comparison would be to compare risk over the 3
weeks to the annual (52-week) risk, scaled to 3 week period. So the
correct finding appears to be that risk in a 3 week period of SSNHL,
whether vaccinated or unvaccinated, is the same (0.6-4.4 vs 0.3-4.1,
n.s.). A closer look at adverse events within shorter periods after
vaccination would be an appropriate topic for further research.
Another way to see this error is to consider whether the original
results pass a basic plausibility test. They do not. If the results
shown in the figure accurately reflected Incidence Range / 100k of
SSNHL between the vaccinated and unvaccinated, then it would suggest
that the authors had discovered that vaccination reduced SSNHL by
about 94%. Which would be a groundbreaking discovery, and a there’s no
plausible mechanism presented for such a miraculous treatment effect,
this is more evidence of grave error.

*Formeister EJ, et. al. JAMA Otolaryngol Head Neck Surg. 2021;147(7):674–676. doi:10.1001/jamaoto.2021.0869


Assuming I have, this won’t be my first time spotting a major error in a peer-reviewed publication. (I think the first was in the 2010 article High-fructose corn syrup causes characteristics of obesity in rats: Increased body weight, body fat and triglyceride levels. This HFCS paper (Bocarsly, Princeton) was wildly popular in the lay press.)

Yet I received this non-response response (emphasis mine):


Thank you very much for your recent communication about the
paper we published in JAMA Otolaryngology ("Preliminary Analysis of
Association Between COVID-19 Vaccination and Sudden Hearing Loss Using
VAERS"). As a peer reviewed publication this manuscript was vetted in
a process that includes assessment and validation of hypotheses,
methodologies, and conclusions. Readers and scientists can have faith
in the integrity of these robust processes. We look forward to seeing
this important field expand and would encourage all interested
scientists to consider peer reviewed publication of their work in the
We would encourage a thoughtful re-read of the manuscript to
understand the methodology, and additional reading on the topics of
idiopathic sudden sensorineural hearing loss and principles of
epidemiology, for your understanding. Respectfully, Dr. Eric
Formeister, MD, MS on behalf of the authors.

If I messed up, I’d like to know how; I hope someone can point me in the direction of my error(s).

#StackBounty: #hypothesis-testing #sample-size #statistical-power Continuous sample size determination based on the control group

Bounty: 50

I am curious if and where the following reasoning breaks down.

Traditionally, sample size determination is done as part of the design phase. To this end, one has to have an understanding of the baseline performance, such as the mean and standard deviation of the metric in question. One might use recent historical data to obtain reasonable estimates.

Suppose we are not necessarily interested in knowing the minimal sample size ahead of time and simply launch the experiment. The control group is the baseline. Each day, we take the statistics of the control group and perform sample size determination. With each new day passed, we get a better and better understanding of when to stop.

Apart from the fact that one might launch an experiment that is doomed to fail, are there any other problems with the above logic? Of course, we are not assuming a clinical setting but rather a service of some kind.

It should be noted that this question is not about whether it is sound to check some p-value on a daily basis and stop whenever it goes below a predefined level. The inadequacy of this procedure is well understood.
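
To make the procedure concrete, here is a minimal simulation of the daily re-estimation described above. It assumes a two-sided comparison of two means with equal group sizes and the usual normal-approximation formula $n=2\,(z_{1-\alpha/2}+z_{1-\beta})^2\sigma^2/\delta^2$ per group; the simulated control data, the minimal effect `delta`, and the function name are all illustrative assumptions.

```python
import math
import random
from statistics import NormalDist, stdev

def required_n(sd, delta, alpha=0.05, power=0.80):
    """Per-group sample size for detecting a mean difference `delta`
    (two-sided z-test, equal group sizes, common standard deviation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(2 * (z * sd / delta) ** 2)

random.seed(0)
control = []
for day in range(1, 8):
    # Hypothetical control-group observations arriving each day.
    control.extend(random.gauss(10.0, 3.0) for _ in range(50))
    target = required_n(stdev(control), delta=0.5)
    print(day, len(control), target)
```

One thing the sketch makes visible is that the stopping target is itself a random quantity that moves with the running estimate of the standard deviation, which is worth keeping in mind when judging the procedure.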

#StackBounty: #regression #hypothesis-testing #diagnostic $H_0$ vs $H_1$ in diagnostic testing

Bounty: 50

Consider diagnostic testing of a fitted model, e.g. testing whether regression residuals are autocorrelated (a violation of an assumption) or not (no violation). I have a feeling that the null hypothesis and the alternative hypothesis in diagnostic tests often tend to be exchanged/flipped w.r.t. what we would ideally like to have.

If we are interested in persuading a sceptic that there is a (nonzero) effect, we usually take the null hypothesis to be that there is no effect, and then we try to reject it. Rejecting $H_0$ at a sufficiently low significance level produces convincing evidence that $H_0$ is incorrect, and we therefore are comfortable in concluding that there is a nonzero effect. (There are of course a bunch of other assumptions which must hold, as otherwise the rejection of $H_0$ may result from a violation of one of those assumptions rather than $H_0$ actually being incorrect. And we never have 100% confidence but only, say, 95% confidence.)

Meanwhile, in diagnostic testing of a model, we typically have $H_0$ that the model is correct and $H_1$ that there is something wrong with the model. E.g. $H_0$ is that regression residuals are not autocorrelated while $H_1$ is that they are autocorrelated. However, if we want to persuade a sceptic that our model is valid, we would have $H_0$ consistent with a violation and $H_1$ consistent with validity. Thus the usual setup in diagnostic testing seems to exchange $H_0$ with $H_1$, and so we do not get to control the probability of the relevant error.

Is this a valid concern (philosophically and/or practically)? Has it been addressed and perhaps resolved?

#StackBounty: #hypothesis-testing #bayesian Hypothesis testing via separate inference for each group and then combining

Bounty: 50

Suppose there are two groups, A and B, and we are interested in inferring a certain parameter for each one and also the difference between the two parameters. Here we can take a Bayesian perspective and strive for a posterior distribution in each case. I am wondering if the following is a sound way of doing this:

  1. estimate the posterior for group A,
  2. estimate the posterior for group B, and
  3. estimate the posterior of the difference by sampling extensively the first two posteriors and taking the difference.

I am specifically unsure about this kind of divide-and-conquer approach where each group is treated separately, and then the results are combined. Usually, it is done in one take where, perhaps, a linear model is fitted with an indicator for the group membership.

Let me give a simple example. Say, the outcome is binary. One can then use a Bernoulli–beta model to infer the posterior of the success probability, which will be a beta distribution for each group. As the last step, one can sample the two betas and get a posterior for the difference.
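
The three steps can be sketched for the Bernoulli–beta example as follows. The sketch assumes independent Beta(1, 1) priors for the two groups, in which case the joint posterior factorizes and sampling each beta posterior separately is the same as sampling from the joint. The counts, prior, and function name are hypothetical.

```python
import random

def posterior_difference(successes_a, trials_a, successes_b, trials_b,
                         prior=(1.0, 1.0), draws=20000, seed=0):
    """Draw from the posterior of p_B - p_A under independent beta priors."""
    rng = random.Random(seed)
    a0, b0 = prior
    diffs = []
    for _ in range(draws):
        # Steps 1 and 2: each group's posterior is a beta distribution.
        p_a = rng.betavariate(a0 + successes_a, b0 + trials_a - successes_a)
        p_b = rng.betavariate(a0 + successes_b, b0 + trials_b - successes_b)
        # Step 3: the difference of independent posterior draws.
        diffs.append(p_b - p_a)
    return diffs

# Hypothetical data: 30/100 successes in group A, 45/100 in group B.
diffs = posterior_difference(30, 100, 45, 100)
mean_diff = sum(diffs) / len(diffs)
prob_b_better = sum(d > 0 for d in diffs) / len(diffs)
print(round(mean_diff, 3), round(prob_b_better, 3))
```

If the groups shared parameters or a hierarchical prior, the posteriors would no longer be independent and this separate-then-combine shortcut would not apply as written.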

#StackBounty: #probability #hypothesis-testing #distributions #statistical-significance #p-value How to determine the 'significance…

Bounty: 100

Problem: I have carried out a series of biological experiments where the output of the experiment is a N x N matrix of counts. I then created a custom distance metric that takes in two rows of counts and calculates the ‘difference’ between them (I will call this difference metric D). I calculated D for all pairwise comparisons and now have an array of difference metrics D called D_array.

My assumption based on biology is that the majority of D in D_array represent that there is no significant difference between the two rows of counts and only the >= 95% interval of D metrics actually represent real differences between two rows of counts. Let us assume that this is true, even if it doesn’t make sense.

So this means if D_array = [0, 1, 2, 3, 4 … 99] (100 metrics) then only D scores of 95-99 are actually representative of a real difference between two rows of counts.

Note: D_array is not representative of my data. My actual data has a distribution of values like this (black line represents the mean): https://imgur.com/usvvIgB

Given D_array I want to be able to determine whether a newly calculated distance value D’ is "significant" based on my previous data: the distribution of my D_array. Ideally, I would like to provide some sort of metric of ‘significance’ such as a p-value. By significance I mean the probability / significance of having gotten a result as extreme as D’.

After a bit of reading, I found that I can use bootstrapping to calculate a 95% confidence interval for D_array, and then essentially ask if D’ is outside of the 95% CI range. However, I am unsure if there is a way to determine how significant having obtained a value of D’ is based on D_array.

My questions are:

  1. Does asking if D’ is outside of the 95% CI of bootstrapped D_array in order to determine whether D’ represents a ‘real’ difference between two rows of counts make sense?

  2. Given D’ and D_array how can I determine the significance of having gotten a value as extreme as D’ as a result. I have seen bootstrapping used to calculate P-values, but this usually requires the mean of two different distributions which I do not have in this case.

  3. Is there a better way to determine whether a new observation is ‘significantly’ different from my prior distribution of ‘null’ (D_array) data. If so, how?
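
One simple option for question 2 is an empirical (upper-tail) p-value: the fraction of reference values at least as large as D’. The sketch below assumes D_array can be treated as draws from the "no difference" distribution and uses the common add-one correction so the p-value is never exactly zero; the function name is made up, and the toy D_array is the one from the example above.

```python
def empirical_p_value(d_new, d_array):
    """Upper-tail empirical p-value of d_new against reference values,
    with an add-one correction: (1 + #{d >= d_new}) / (N + 1)."""
    exceed = sum(1 for d in d_array if d >= d_new)
    return (exceed + 1) / (len(d_array) + 1)

d_array = list(range(100))  # toy example: [0, 1, ..., 99]
p = empirical_p_value(95, d_array)
print(p)
```

The smallest p-value this approach can return is 1/(N+1), so the resolution of the "significance" is limited by the size of D_array.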
