## #StackBounty: #hypothesis-testing #permutation-test #model-comparison Permutation test for model comparison?

### Bounty: 50

I have two nested models for the same data, and I want to test whether the more complex model explains significantly more variance. Due to a necessary smoothing step, my data aren’t independent, so I can’t use standard tests for this (e.g. an F-test or likelihood-ratio test). I naturally thought of using a permutation test because I can permute the data before the smoothing step. This way I would introduce the same dependencies in the permuted data that also exist in the observed data, s.t. the simulated null distribution is fair. However, I can’t quite come up with the right way to do this.

I can think of a way to test whether either model explains significant variance. In this case, my algorithm would be:

1. Fit the model to the smoothed observed data, and calculate \$R^2\$
2. For 10,000 iterations, repeat steps 3-5:
3. Randomly permute the observed data (i.e. randomly shuffle predictor and response values s.t. their true relationship is destroyed)
4. Apply smoothing to this permuted data to introduce the same dependencies that the observed \$R^2\$-value suffers from
5. Fit the model to the smoothed permuted data and calculate \$R^2\$
6. Compare the observed \$R^2\$-value to the thus-constructed null distribution of \$R^2\$-values for this data & model

The point being that in this case it’s easy to construct the null distribution, because the null hypothesis holds that there is no relationship between the predictors in the model and the outcome variable, and it is obvious how we can permute the data to emulate this. (Formally, I guess the null hypothesis is that the values of the predictors are exchangeable w.r.t. the values of the outcome variable.)
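The algorithm above can be sketched in code. Everything concrete here is a stand-in: the `smooth` function (a simple moving average) and the single-predictor linear fit are hypothetical placeholders for the actual smoothing step and model.

```python
import numpy as np

rng = np.random.default_rng(0)

def smooth(y, width=5):
    # Placeholder smoothing step (assumption): moving average.
    kernel = np.ones(width) / width
    return np.convolve(y, kernel, mode="same")

def r_squared(x, y):
    # R^2 of a simple linear fit of y on x (stand-in for the real model).
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return 1 - resid.var() / y.var()

def permutation_pvalue(x, y, n_perm=10_000):
    # Step 1: fit to the smoothed observed data.
    observed = r_squared(x, smooth(y))
    null = np.empty(n_perm)
    # Steps 2-5: permute, THEN smooth, then fit.
    for i in range(n_perm):
        y_perm = rng.permutation(y)            # destroy the x-y relationship
        null[i] = r_squared(x, smooth(y_perm)) # smoothing after permuting
    # Step 6: compare the observed R^2 to the null distribution
    # (one-sided p-value with the usual +1 correction).
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```

The crucial detail is the ordering inside the loop: smoothing is applied after permuting, so each null draw carries the same induced dependence as the observed statistic.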

However, what I want to do is estimate a null distribution for the increase in \$R^2\$ from one model to the next. The null hypothesis here is that the added parameter in the more complex model is meaningless, i.e. that the two models are exchangeable. However, it seems to me that this is an exchangeability hypothesis on the models, rather than on (some aspect of) the data, so I just don’t see what I would permute in a permutation test in order to simulate this. Can anyone help me out? Am I missing something, or is this just not possible, and can I perhaps only do a bootstrap here?

Get this bounty!!!

## #StackBounty: #regression #hypothesis-testing #interaction #regression-coefficients #permutation-test How to do permutation test on mod…

### Bounty: 100

Given the following model as an example:

\$\$Y=\beta_0+\beta_A\cdot A+\beta_B\cdot B+\beta_{AB}\cdot A\cdot B+\epsilon\$\$

In alternative notation:

\$\$Y\sim A + B + A:B\$\$

## The main question:

When permuting entries of variable \$A\$ to test its coefficient (\$\beta_A\$) in a model, should an interaction term that includes it, such as \$B\cdot A\$, be recomputed as well?

## Secondary question:

And what about testing the coefficient of the \$B\cdot A\$ interaction term (\$\beta_{AB}\$)? Are its permutations computed regardless of the variables \$A\$ and \$B\$?

## A bit of context:

I want to perform a test on the coefficients of a model (it’s a canonical correlation analysis, but the question is applicable to any linear model including interactions).

I’m trying my hands with permutation tests. While it’s fairly straightforward to test the canonical correlation itself, how to do the same with the variable scores, or coefficients, is a bit unclear to me when including an interaction term.

I’ve read “How to test an interaction effect with a non-parametric test (e.g. a permutation test)?”, but my question is much more practical.
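To make the two options concrete, here is a sketch of what each permutation scheme would look like for a plain linear model. The data, names, and OLS fit are all illustrative, and this is only a side-by-side of the two schemes the question contrasts, not a claim about which one is correct.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
# Synthetic data with a genuine interaction (coefficients are illustrative).
A = rng.normal(size=n)
B = rng.normal(size=n)
Y = 1.0 + 0.5 * A + 0.3 * B + 0.2 * A * B + rng.normal(size=n)

def fit_coefs(A, B, Y):
    # OLS for Y ~ 1 + A + B + A*B; returns (b0, bA, bB, bAB).
    X = np.column_stack([np.ones_like(A), A, B, A * B])
    return np.linalg.lstsq(X, Y, rcond=None)[0]

A_perm = rng.permutation(A)

# Scheme 1: permute A and recompute the interaction from the permuted A.
coefs_recomputed = fit_coefs(A_perm, B, Y)

# Scheme 2: permute A but keep the original A*B column fixed.
X_fixed = np.column_stack([np.ones(n), A_perm, B, A * B])
coefs_fixed = np.linalg.lstsq(X_fixed, Y, rcond=None)[0]
```

The two schemes differ only in whether the design column for \$A\cdot B\$ is rebuilt from the shuffled \$A\$, which is exactly the choice the question is asking about.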

Get this bounty!!!

## #StackBounty: #hypothesis-testing #maximum-likelihood #likelihood-ratio #wald-estimator MLE-based hypothesis tests

### Bounty: 50

I recently encountered the three MLE-based tests: the Wald test, the likelihood ratio test, and the Lagrange multiplier test. Although it seemed at first like the usual hypothesis testing I already know from statistics, I ran into trouble with the actual application. In an attempt to fully understand all three, I got my hands on some practice problems. Since all three tests are built on MLE, are asymptotically equivalent, and the points I don’t get are similar for each, I will ask about them jointly here. Hence the post will be longer; apologies. If I should pose them separately, I will, of course, do so on request.

First, the Wald test. Take for example 100 realizations of a normally distributed random variable with mean \$\mu = 0.36\$ and \$\sigma^2 = 4\$. Define \$c_1(\mu) = \mu\$. How would you conduct a Wald test for \$H_0: c(\mu) = c_1(\mu) - 0.8 = 0\$ at the 5% significance level in practice? Also, what I don’t understand is why one usually defines another function \$c_1(\mu)\$.
What I understood here is that one tests the restriction \$\$c(\mu) = 0,\$\$ and the closer the value of the test statistic (and of \$c(\mu)\$) is to zero, the more plausible the restriction. Since the data are normally distributed, one can plug in the MLE of the mean for \$\mu\$. But how does it go from here exactly? I calculate \$\$c(0.36) = 0.36 - 0.8 = -0.44\$\$

and, according to my materials,

\$\$W = (-0.44)\,\mathrm{Var}(-0.44)^{-1}\,(-0.44)\$\$

But what is \$\mathrm{Var}(-0.44)\$? And what is the underlying distribution from which I get the p-value?
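Under the standard asymptotics, the numbers above could presumably be worked through as follows. This is a sketch assuming \$\mathrm{Var}(\hat{\mu}) = \sigma^2/n\$ and a \$\chi^2(1)\$ reference distribution, which is my reading of the exercise rather than necessarily the intended solution.

```python
import math

n, sigma2 = 100, 4.0
mu_hat = 0.36
c = mu_hat - 0.8          # c(mu_hat) = -0.44
# Delta method: dc/dmu = 1, so Var(c(mu_hat)) = Var(mu_hat) = sigma^2/n.
var_c = sigma2 / n
# Wald statistic: c * Var(c)^{-1} * c, i.e. c^2 / Var(c).
W = c * (1 / var_c) * c
# One restriction -> chi-squared with 1 df.
# P(chi2_1 > W) = erfc(sqrt(W/2)), using only the standard library.
p = math.erfc(math.sqrt(W / 2))
```

With these assumptions \$W = 0.44^2 / 0.04 = 4.84\$, which exceeds the 5% critical value of \$\chi^2(1)\$ (about 3.84), so the restriction would be rejected.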

Second, the likelihood ratio test. Take again a sample of 100 observations, but this time from the Poisson distribution. The sample mean here is \$\bar{x} = 1.7\$, so the MLE of \$\lambda\$ is also \$\hat{\lambda} = 1.7\$. This time consider \$\$c(\lambda) = \lambda^2 - 3\lambda + 2.\$\$ How to test \$c(\lambda) = 0\$ at the 5% level? Here I understand even less how to get along with so little information, since I thought one would need to evaluate the log-likelihood function and then decide based on the difference between the log-likelihood values of the restricted and unrestricted MLE estimates. And again, which distribution does the statistic (and thus the p-value) follow?
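A sketch of how the LR statistic could in fact be computed from just \$n\$ and the sample mean: the Poisson log-likelihood depends on the data only through \$\bar{x}\$, up to an additive constant that cancels in the ratio. The assumption here is that the restricted MLE is the root of \$c(\lambda)=0\$ (i.e. \$\lambda \in \{1, 2\}\$) with the higher likelihood.

```python
import math

n, xbar = 100, 1.7
lam_hat = xbar                        # unrestricted MLE for Poisson

def loglik(lam):
    # Poisson log-likelihood up to an additive constant:
    # sum_i (x_i log(lam) - lam) = n * (xbar * log(lam) - lam) + const.
    return n * (xbar * math.log(lam) - lam)

# c(lambda) = lambda^2 - 3*lambda + 2 = (lambda - 1)(lambda - 2),
# so the restricted parameter space is {1, 2}; the restricted MLE
# is whichever root has the higher likelihood.
lam_r = max([1.0, 2.0], key=loglik)

LR = 2 * (loglik(lam_hat) - loglik(lam_r))
p = math.erfc(math.sqrt(LR / 2))      # chi2(1) tail probability
```

With these numbers the restricted MLE is \$\lambda_r = 2\$ and \$LR \approx 4.74\$, again referred to \$\chi^2(1)\$.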

Finally, the Lagrange multiplier test. I thought here as well that I would need the log-likelihood function, since I have to insert the restricted estimate into its derivative, don’t I? Take the same distribution as before but with the function \$\$c(\lambda) = \frac{1}{\lambda^2} - 0.1.\$\$ What is the restricted MLE estimate that I insert into the log-likelihood derivative? Is it \$\frac{1}{1.7^2} - 0.1\$? How do I go about it without having the actual sample and the log-likelihood function at my disposal?
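Analogously, a sketch of the LM (score) computation, assuming the Poisson score \$n(\bar{x}/\lambda - 1)\$ and Fisher information \$n/\lambda\$. On this reading the restricted MLE is the value of \$\lambda\$ that satisfies the constraint, i.e. \$\lambda_r = \sqrt{10}\$, the positive solution of \$1/\lambda^2 = 0.1\$.

```python
import math

n, xbar = 100, 1.7
# Restricted MLE: solve c(lambda) = 1/lambda^2 - 0.1 = 0 for lambda > 0.
lam_r = math.sqrt(10)
# Score of the Poisson log-likelihood at lam_r:
# d/dlambda [n*(xbar*log(lam) - lam)] = n*(xbar/lam - 1).
score = n * (xbar / lam_r - 1)
# Fisher information for n Poisson observations: n / lambda.
info = n / lam_r
LM = score**2 / info                  # LM statistic, referred to chi2(1)
p = math.erfc(math.sqrt(LM / 2))      # chi2(1) tail probability
```

As with the other two tests, only \$n\$ and \$\bar{x}\$ are needed, because the Poisson score and information depend on the data only through the sample mean.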

Get this bounty!!!

## #StackBounty: #hypothesis-testing #correlation #statistical-significance #experiment-design #binary-data How to determine a 'strong…

### Bounty: 50

I have a set of drivers that are binary and a concept to measure that takes natural numbers between 1 and 10.

I’m currently using Kruskal’s key driver analysis to determine the relative contribution of each of the drivers. It’s discussed as being more robust than Pearson’s correlation because it takes into consideration the complete set of drivers and their relative contributions.

However, is Kruskal’s approach still valid when the drivers are binary and the concept to measure takes natural numbers between 1 and 10? I thought about switching to the point-biserial correlation; however, this is identical to Pearson’s r.
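As a quick numerical check of that last point, the point-biserial correlation computed from its textbook formula matches Pearson’s r exactly when one variable is binary. The data and names below are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
driver = rng.integers(0, 2, size=500)            # binary driver
score = 3 + 2 * driver + rng.normal(0, 1, 500)   # concept score (illustrative)

# Pearson's r between the binary driver and the score.
pearson_r = np.corrcoef(driver, score)[0, 1]

# Point-biserial from its textbook formula:
# r_pb = (M1 - M0) * sqrt(p * q) / s, with s the population std of the score.
m1 = score[driver == 1].mean()
m0 = score[driver == 0].mean()
p1 = driver.mean()
r_pb = (m1 - m0) * np.sqrt(p1 * (1 - p1)) / score.std()
```

The two quantities agree to floating-point precision, which is why switching from Pearson’s r to the point-biserial changes nothing here.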

My question is: where do I set the threshold between a ‘good’ driver and a ‘not so good’ driver? It depends on the size of the data and also on its properties. Calculating significance using t-tests (ignoring the fact that the data may not meet the t-test’s assumptions; that test is bundled into scipy’s pearsonr) marks all of the drivers as significant, as they usually will be, because even weak drivers have some correlation and aren’t ‘random’. Do I therefore require ‘strong’ drivers to have a very low p-value, which seems rather arbitrary? Or is there a better algorithm that can distinguish between strong and weak drivers?

Or is it that no algorithm can really determine what a strong driver is? Is it dependent upon other factors relating to the context of the data that is being analysed?

Get this bounty!!!
