#StackBounty: #hypothesis-testing #likelihood-ratio Why does a better null model increase the power of the LRT?

Bounty: 50

Related question: Hypothesis testing: Why is a null model that fits the data well better than one that doesn't?

I simulated a response, y, that is influenced by a covariate but not by the “treatment” (the thing I care about) as follows:

n <- 1e3
covar_effect <- 1   # the covariate affects y
trt_effect <- 0     # the treatment has no effect on y

covar <- rnorm(n = n)
trt <- rnorm(n = n)
y <- rnorm(n = n, mean = covar*covar_effect + trt*trt_effect)

I computed the likelihood ratio statistic (LRS) between a null model that omits trt and an alternative model that includes trt in two ways (a sketch of the computation follows the list below):

  1. both the null and alternative model omit covar
  2. both the null and alternative model include covar
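A minimal sketch of what this computation could look like in R, assuming ordinary lm fits to the simulated covar, trt, and y above (an illustration only, not the exact code I used):

# Sketch of the two LRS computations
# 1) null and alternative both omit covar
null_1 <- lm(y ~ 1)
alt_1  <- lm(y ~ trt)
lrs_1  <- as.numeric(2 * (logLik(alt_1) - logLik(null_1)))

# 2) null and alternative both include covar
null_2 <- lm(y ~ covar)
alt_2  <- lm(y ~ covar + trt)
lrs_2  <- as.numeric(2 * (logLik(alt_2) - logLik(null_2)))

c(lrs_1, lrs_2)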

In both cases, the distribution of the LRS was $\chi^2_1$, as you can see in this figure:
[figure: null_scenario]
Here’s the gist showing how I ran this simulation in R.

Then, I repeated the process, but simulated a situation where there actually is a treatment effect:

n <- 1e3
covar_effect <- 1     # the covariate still affects y
trt_effect <- 0.1     # now there is a small treatment effect

covar <- rnorm(n = n)
trt <- rnorm(n = n)
y <- rnorm(n = n, mean = covar*covar_effect + trt*trt_effect)

Consistent with intuition, the LRS was greater between the null and alternative that include the covariate than between the null and alternative that omit it:
[figure: alt_scenario]

If I were thinking about things from a parameter estimation perspective and using a Wald test to test whether the effect of trt is zero, I could readily understand why including the covariate in the null and alternative models would increase my power to reject the null: the standard error of the effect estimate is $\frac{\hat{\sigma}}{V(x)}$, so anything that decreases $\sigma$ will increase the precision of the estimate.

But I am not thinking about things from the perspective of a Wald test. I am thinking about a likelihood ratio. There are some likelihood relationships that are obvious to me (I hope this notation is clear):

LRSs will be positive:

  • L(y|trt) > L(y)
  • L(y|trt, covar) > L(y|covar)

These comparisons are directly relevant, but since the models are nested, the inequalities seem obvious:

  • L(y|covar) > L(y)
  • L(y|trt, covar) > L(y|trt)

But the increased LRS with the covariate modeled doesn’t follow directly from any of that. It relates to this inequality:

(L(y|trt, covar) - L(y|covar)) > (L(y|trt) - L(y))

which I can rearrange into:

L(y|trt, covar) + L(y) > L(y|covar) + L(y|trt)
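For a single simulated data set, one can compute all four quantities directly and compare the two sides; a quick sketch in R, interpreting L(·) as the maximized log-likelihood of an lm fit (for any one data set the comparison can of course go either way; the question is about the typical behaviour):

# Quick numerical check of the inequality above
ll_0  <- as.numeric(logLik(lm(y ~ 1)))            # L(y)
ll_t  <- as.numeric(logLik(lm(y ~ trt)))          # L(y|trt)
ll_c  <- as.numeric(logLik(lm(y ~ covar)))        # L(y|covar)
ll_tc <- as.numeric(logLik(lm(y ~ covar + trt)))  # L(y|trt, covar)

(ll_tc - ll_c) > (ll_t - ll_0)   # the inequality in question
(ll_tc + ll_0) > (ll_c + ll_t)   # its rearranged form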

Now why would that be true? Can anyone offer a mathematical or conceptual explanation of why it’s beneficial to model a covariate in both the null and alternative models?


Get this bounty!!!

#StackBounty: #hypothesis-testing #anova #mixed-model #lme4-nlme #degrees-of-freedom Satterthwaite degrees of freedom in lmerTest for a…

Bounty: 50

I have traditionally been running my analyses using aov and would like to switch to lmer. My question is whether this is the correct procedure (see below).

Assume a data set from a 3 (A1 vs. A2 vs. A3) × 2 (B vs. C) × 2 (Y vs. Z) mixed design, with the first factor being between-subjects and the other two within-subjects. The two within-subjects factors are crossed with each other. As far as I know, the only random factor is participant id. I am interested in the fixed effects.

Here is a reproducible example:

library(data.table)
library(tidyr)
library(lme4)
library(lmerTest)

# -- Data in wide format --
set.seed(33)

X <- data.table( id = paste0("P", 1:12),                  # participant unique ID
                 A  = rep(c("A1", "A2", "A3"), each = 4),
                 BY = rnorm(12, 1),
                 BZ = rnorm(12, 3),
                 CY = rnorm(12, 2),
                 CZ = rnorm(12, 0) )

# -- Orthogonal contrast codes --
X[A == "A1", A1vsA2 := +1] # A1 vs. A2
X[A == "A2", A1vsA2 := -1]
X[A == "A3", A1vsA2 :=  0]

X[A == "A1", A3vsA1A2 := +1] # A3 vs. A1&A2
X[A == "A2", A3vsA1A2 := +1]
X[A == "A3", A3vsA1A2 := -2]

# -- Convert to long format --
XL <- data.table( gather(X, KEY, Y, BY, BZ, CY, CZ) )

XL[KEY %in% c("BY", "BZ"), BvsC := +1]
XL[KEY %in% c("CY", "CZ"), BvsC := -1]

XL[KEY %in% c("BY", "CY"), YvsZ := +1]
XL[KEY %in% c("BZ", "CZ"), YvsZ := -1]

# -- Test model --
aovMdl   <-  aov(Y ~ A1vsA2*A3vsA1A2*BvsC*YvsZ + Error(id/(BvsC*YvsZ)), data = XL)
lmerMdl1 <- lmer(Y ~ A1vsA2*A3vsA1A2*BvsC*YvsZ + (1|id),                data = XL, REML = FALSE)

>summary(aovMdl)

Error: id
          Df Sum Sq Mean Sq F value Pr(>F)
A1vsA2     1  1.774  1.7737   2.132  0.178
A3vsA1A2   1  0.147  0.1467   0.176  0.684
Residuals  9  7.487  0.8319               

Error: id:BvsC
              Df Sum Sq Mean Sq F value  Pr(>F)   
BvsC           1 10.005  10.005  12.670 0.00612 **
A1vsA2:BvsC    1  0.339   0.339   0.429 0.52895   
A3vsA1A2:BvsC  1  0.132   0.132   0.167 0.69206   
Residuals      9  7.107   0.790                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: id:YvsZ
              Df Sum Sq Mean Sq F value Pr(>F)
YvsZ           1  0.061  0.0606   0.096  0.764
A1vsA2:YvsZ    1  0.793  0.7927   1.250  0.293
A3vsA1A2:YvsZ  1  0.000  0.0000   0.000  0.996
Residuals      9  5.708  0.6342               

Error: id:BvsC:YvsZ
                   Df Sum Sq Mean Sq F value   Pr(>F)    
BvsC:YvsZ           1  64.67   64.67  95.985 4.24e-06 ***
A1vsA2:BvsC:YvsZ    1   1.37    1.37   2.034    0.188    
A3vsA1A2:BvsC:YvsZ  1   0.04    0.04   0.064    0.806    
Residuals           9   6.06    0.67                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

>anova(lmerMdl1)

fixed-effect model matrix is rank deficient so dropping 4 columns / coefficients
Analysis of Variance Table of type III  with  Satterthwaite 
approximation for degrees of freedom
                   Sum Sq Mean Sq NumDF DenDF F.value   Pr(>F)    
A1vsA2              1.491   1.491     1                            
A3vsA1A2            0.123   0.123     1    12   0.235 0.6364995    
BvsC               10.005  10.005     1    36  19.079 0.0001017 ***
YvsZ                0.061   0.061     1    36   0.116 0.7358099    
A1vsA2:BvsC         0.339   0.339     1                            
A3vsA1A2:BvsC       0.132   0.132     1    36   0.252 0.6187426    
A1vsA2:YvsZ         0.793   0.793     1                            
A3vsA1A2:YvsZ       0.000   0.000     1    36   0.000 0.9956444    
BvsC:YvsZ          64.667  64.667     1    36 123.316 3.515e-13 ***
A1vsA2:BvsC:YvsZ    1.370   1.370     1                            
A3vsA1A2:BvsC:YvsZ  0.043   0.043     1    36   0.082 0.7765143    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Additional questions (assuming the procedure is correct):

  • Why do the degrees of freedom differ so widely between the two procedures?
  • Why are some values not being displayed in the lmerMdl result?
  • Do I have to care about the warnings (e.g. that the fixed-effect model matrix is rank deficient)?

UPDATE:

I figured out how to display all the values in the anova table by writing the fixed effects out explicitly and excluding the interaction between the two orthogonal contrast variables (A1vsA2:A3vsA1A2), as in lmerMdl3 below; lmerMdl2 keeps the full factorial specification for comparison, and both now also use by-participant random slopes for BvsC and YvsZ:

lmerMdl2 <- lmer(Y ~ A1vsA2*A3vsA1A2*BvsC*YvsZ + (BvsC + YvsZ | id),   data = XL, REML = FALSE)

lmerMdl3 <- lmer(Y ~ A1vsA2 + A3vsA1A2 + BvsC + YvsZ +
                     A1vsA2:BvsC + A3vsA1A2:BvsC + A1vsA2:YvsZ + A3vsA1A2:YvsZ + BvsC:YvsZ +
                     A1vsA2:BvsC:YvsZ + A3vsA1A2:BvsC:YvsZ + (BvsC + YvsZ | id), data = XL, REML = FALSE)

> anova(lmerMdl2)
                   Sum Sq Mean Sq NumDF  DenDF F.value    Pr(>F)    
A1vsA2              0.894   0.894     1                             
A3vsA1A2            0.082   0.082     1 13.358   0.217  0.648891    
BvsC                6.342   6.342     1 12.043  16.869  0.001443 ** 
YvsZ                0.041   0.041     1 15.093   0.110  0.744851    
A1vsA2:BvsC         0.417   0.417     1                             
A3vsA1A2:BvsC       0.084   0.084     1 12.043   0.223  0.645364    
A1vsA2:YvsZ         0.540   0.540     1                             
A3vsA1A2:YvsZ       0.000   0.000     1 15.093   0.000  0.995795    
BvsC:YvsZ          64.667  64.667     1 24.000 171.995 1.945e-12 ***
A1vsA2:BvsC:YvsZ    1.370   1.370     1                             
A3vsA1A2:BvsC:YvsZ  0.043   0.043     1 24.000   0.114  0.738474   


> anova(lmerMdl3)
                   Sum Sq Mean Sq NumDF  DenDF F.value    Pr(>F)    
A1vsA2              0.986   0.986     1 13.358   2.623  0.128673    
A3vsA1A2            0.082   0.082     1 13.358   0.217  0.648891    
BvsC                6.342   6.342     1 12.043  16.869  0.001443 ** 
YvsZ                0.041   0.041     1 15.093   0.110  0.744851    
A1vsA2:BvsC         0.215   0.215     1 12.043   0.571  0.464417    
A3vsA1A2:BvsC       0.084   0.084     1 12.043   0.223  0.645364    
A1vsA2:YvsZ         0.540   0.540     1 15.093   1.436  0.249203    
A3vsA1A2:YvsZ       0.000   0.000     1 15.093   0.000  0.995795    
BvsC:YvsZ          64.667  64.667     1 24.000 171.995 1.945e-12 ***
A1vsA2:BvsC:YvsZ    1.370   1.370     1 24.000   3.644  0.068306 .  
A3vsA1A2:BvsC:YvsZ  0.043   0.043     1 24.000   0.114  0.738474  

I don't understand why lmer returns denominator degrees of freedom ranging from about (1, 12) to (1, 24) for the within-subject terms and about (1, 13) for the between-subject contrasts, whereas aov uses the same (1, 9) for every effect in the output above, no matter whether it is within- or between-subjects.


Get this bounty!!!

#StackBounty: #hypothesis-testing #statistical-significance #anova Understanding two-way repeated measures anova

Bounty: 50

I have the following data:

  • The same 25 subjects have participated in 3 tasks, once in city A and once in city B. So for each city I have 75 data points: one for each combination of task and subject.

If I use a one-way repeated measures ANOVA for city B to compare the subjects' task performance, there is no significant difference (p = 0.48).

Now, when I use a two-way repeated measures ANOVA on the data from both cities, the main effect of city proves significant (p < 0.0001). How is this possible? Can the p-value for the main effect be so low even if task is only relevant in one of the two levels of the city factor (i.e., task performances differ significantly in only one of the cities)?
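To make the setup concrete, the two analyses could be specified in R roughly as follows, assuming a long-format data frame d with factor columns subject, city, and task and a numeric column score (these names are placeholders, not the actual data):

# One-way repeated measures ANOVA on the city B data only
aov_B <- aov(score ~ task + Error(subject/task), data = subset(d, city == "B"))
summary(aov_B)

# Two-way repeated measures ANOVA using both cities
aov_AB <- aov(score ~ city * task + Error(subject/(city * task)), data = d)
summary(aov_AB)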


Get this bounty!!!

#StackBounty: #hypothesis-testing #p-value #multiple-comparisons #uniform #false-discovery-rate On the generality of using empirical FD…

Bounty: 50

Given the discussion here:
Why are p-values uniformly distributed under the null hypothesis?

And, particularly the point that @whuber brought up on the composite hypothesis (link here).

I'm wondering: even if the distribution of p-values is not as well behaved as it should be, would it be fine to just empirically estimate the distribution of the p-values (or of the test statistics) under the null and calculate an empirical FDR based on that?
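To be explicit about what I mean, here is a rough sketch of such a permutation-based empirical FDR estimate; stat_obs and perm_stat() are placeholders (a vector of observed test statistics, one per feature, and a function that recomputes them after permuting the group labels):

# Sketch of an empirical FDR estimate from permutations (placeholders only)
B <- 1000
perm_mat <- replicate(B, perm_stat())   # matrix of null statistics: features x B

fdr_hat <- sapply(stat_obs, function(t) {
  expected_null <- mean(colSums(perm_mat >= t))  # avg. number of null stats >= t
  n_called      <- sum(stat_obs >= t)            # number of observed stats >= t
  min(1, expected_null / n_called)
})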

So, does this mean we can generalize this approach to all the cases where this happens?

Of course, it's always good to find the root of the problem, and to acknowledge that this test suffers from a Type I error rate below the selected $\alpha$. However, if one can get a good estimate of the FDR by permutation at the expense of more computation, that seems like a good idea, at least to me.


Get this bounty!!!

#StackBounty: #hypothesis-testing #p-value #permutation-test #permutation How is the exact permutation test procedure carried out: iter…

Bounty: 150

I've tried to find an article that explains the procedure of permutation tests based on exhaustive sampling of all permutations (not the Monte Carlo method) and couldn't find a resource that was specific enough to resolve the ambiguity outlined below. For example, the Wikipedia article (https://en.wikipedia.org/wiki/Resampling_(statistics)#Permutation_tests) says:

[image: quoted passage from the Wikipedia article on permutation tests]

For example, given a combined dataset (1, 2, 3) where group A has length 2 and group B has length 1 for simplicity, it is not clear to me whether "all possible ways to divide it" counts {(1, 2), (3)} and {(2, 1), (3)} as different divisions or as the same division.

Looking at various code examples (e.g., the Python, R, and Julia implementations on https://rosettacode.org/wiki/Permutation_test), I see that the permutation test is usually implemented as follows:


Given two samples A and B

  1. Record the test statistic (e.g., $|\bar{A} - \bar{B}|$)
  2. Combine samples A and B into one large sample AB
  3. for combination of length(A) from AB:
    3a) compute permutation statistic (e.g., $|\bar{A'} - \bar{B'}|$, where A' is the combination from 3. and B' are all the samples in AB that are not in A').
    3b) record permutation statistic
  4. Compute the p-value as the number of times the permutation statistic from 3a) was at least as extreme as the test statistic from 1., divided by the number of combinations sampled (see the R sketch after this list)
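For instance, in R the combinations-based version could be sketched as follows (my own illustration, not code from the linked implementations):

# Exhaustive permutation (combinations) test, two-sided difference in means
perm_test_comb <- function(A, B) {
  obs   <- abs(mean(A) - mean(B))
  AB    <- c(A, B)
  idx   <- combn(length(AB), length(A))   # every way to pick the "new" group A
  stats <- apply(idx, 2, function(i) abs(mean(AB[i]) - mean(AB[-i])))
  mean(stats >= obs)                      # proportion at least as extreme
}

perm_test_comb(rnorm(6), rnorm(6) + 1)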

However, shouldn't we be sampling permutations instead of combinations of length(A)? For example, as outlined below (the differences from the previous procedure are in steps 3 and 3a):


Given two samples A and B

  1. Record the test statistic (e.g., $|\bar{A} - \bar{B}|$)
  2. Combine samples A and B into one large sample AB
  3. for permutation of length(AB) from AB:
    3a) compute the permutation statistic (e.g., $|\bar{A'} - \bar{B'}|$, where A' are the first length(A) samples in the permuted AB sequence and B' are the remaining samples in AB)

  4. Compute the p-value as the number of times the permutation statistic from 3a) was at least as extreme as the test statistic from 1., divided by the number of permutations sampled


Or, to provide a simple numeric example, consider the following 2 samples

a = [1, 3]
b = [2]

with the observed difference (using group sums rather than means so the arithmetic below stays simple):
obs = |sum(a) - sum(b)| = |4 - 2| = 2

Using the “combinations” procedure, we would be sampling the following:

(1, 2), (3) => diff 0
(1, 3), (2) => diff 2
(2, 3), (1) => diff 4

where in 2 out of 3 cases we would observe a difference equal to or more extreme than the observed statistic (i.e., p = 2/3)

Now, using permutations, we would get the following:

(1, 2), (3) => diff 0
(1, 3), (2) => diff 2
(2, 1), (3) => diff 0
(2, 3), (1) => diff 4
(3, 1), (2) => diff 2
(3, 2), (1) => diff 4

Here, we observe a difference that is equal to or more extreme than the observed statistic in 4 out of 6 cases (p = 4/6)

Does anyone know more about the exact procedure and have a reliable resource at hand? Thanks!


Get this bounty!!!

#StackBounty: #hypothesis-testing #generalized-linear-model #chi-squared #cochran-mantel-haenszel Contradicting p-value and confidence …

Bounty: 50

This is part of the appendix to a past paper I have, but it is essentially just R code.

For the Mantel-Haenszel test, the p-value is greater than 5%, so I would fail to reject the null hypothesis that there is no partial association at any level of the third variable.

However, the 95% confidence interval [0.53009009, 0.9981651] does not contain 1, although it almost does. On that basis it seems I would reject the null hypothesis.

Am I interpreting this correctly? I can see that the result is only borderline at the 5% level, but I wouldn't expect the test and the confidence interval to lead to different conclusions.
[images: R output for the Mantel-Haenszel test and confidence interval]


Get this bounty!!!

#StackBounty: #hypothesis-testing #permutation-test #model-comparison Permutation test for model comparison?

Bounty: 50

I have two nested models for the same data, and I want to test whether the more complex model explains significantly more variance. Due to a necessary smoothing step, my data aren't independent, so I can't use standard tests for this (e.g., an F-test or likelihood-ratio test). I naturally thought of using a permutation test, because I can permute the data before the smoothing step; this way I would introduce the same dependencies in the permuted data that also exist in the observed data, so that the simulated null distribution is fair. However, I can't quite come up with the right way to do this.

I can think of a way to test whether either model explains significant variance. In this case, my algorithm would be as follows (a rough R sketch is given after the list):

  1. Fit the model to the smoothed observed data, and calculate $R^2$
  2. For 10,000 iterations, repeat steps 3-5:
  3. Randomly permute the observed data (i.e., randomly shuffle predictor and response values so that their true relationship is destroyed)
  4. Apply smoothing to this permuted data to introduce the same dependencies that the observed $R^2$-value suffers from
  5. Fit the model to the smoothed permuted data and calculate $R^2$
  6. Compare the observed $R^2$-value to the thus-constructed null distribution of $R^2$-values for this data & model
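A rough R sketch of steps 1-6, where smooth_fn() stands in for my smoothing step and the lm formula is a placeholder for the model of interest:

# Sketch of the permutation procedure for a single model's R^2
r2 <- function(d) summary(lm(y ~ x1 + x2, data = d))$r.squared

r2_obs  <- r2(smooth_fn(d_raw))        # step 1: fit to smoothed observed data

r2_null <- replicate(10000, {          # steps 2-5
  d_perm   <- d_raw
  d_perm$y <- sample(d_perm$y)         # step 3: destroy the true relationship
  r2(smooth_fn(d_perm))                # steps 4-5: smooth, then fit
})

p_value <- mean(r2_null >= r2_obs)     # step 6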

The point is that in this case it's easy to construct the null distribution, because the null hypothesis holds that there is no relationship between the predictors in the model and the outcome variable, and it is obvious how we can permute the data to emulate this. (Formally, I guess the null hypothesis is that the values of the predictors are exchangeable with respect to the values of the outcome variable.)

However, what I want to do is estimate a null distribution for the increase in $R^2$ from one model to the next. The null hypothesis here is that the added parameter in the more complex model is meaningless, i.e. that the two models are exchangeable. However, it seems to me that this is an exchangeability hypothesis on the models, rather than on (some aspect of) the data, so I just don't see what I would permute in a permutation test in order to simulate this. Can anyone help me out? Am I missing something, or is this just not possible, and perhaps I can only do a bootstrap here?


Get this bounty!!!

#StackBounty: #regression #hypothesis-testing #interaction #regression-coefficients #permutation-test How to do permutation test on mod…

Bounty: 100

Given the following model as an example:

$$Y=\beta_0+\beta_A\cdot A+\beta_B\cdot B+\beta_{AB}\cdot A\cdot B+\epsilon$$

In alternative notation:

$$Y\sim A + B + A:B$$

The main question:

When permuting entries of variable $A$ to test its coefficient ($\beta_A$) in a model, should an interaction term that includes it, such as $B\cdot A$, be recomputed as well?

Secondary question:

And what about testing the $B\cdot A$ interaction term coefficient ($\beta_{AB}$)? Are its permutations computed regardless of the variables $A$ and $B$?
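To make the two options concrete, here is roughly what they would look like for an ordinary linear model (a sketch only; lm stands in for the actual CCA, and d is a data frame with columns Y, A, and B):

# Option 1: permute A and let the interaction be rebuilt from the permuted A
d1   <- d
d1$A <- sample(d1$A)
fit1 <- lm(Y ~ A + B + A:B, data = d1)   # A:B recomputed from permuted A

# Option 2: permute A but keep the interaction computed from the original A
d2    <- d
d2$AB <- d2$A * d2$B                     # interaction built before permuting
d2$A  <- sample(d2$A)
fit2  <- lm(Y ~ A + B + AB, data = d2)   # interaction does not follow the permutation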

A bit of context:

I want to perform a test on the coefficients of a model (it’s a canonical correlation analysis, but the question is applicable to any linear model including interactions).

I'm trying my hand at permutation tests. While it's fairly straightforward to test the canonical correlation itself, how to do the same with the variable scores, or coefficients, is a bit unclear to me when an interaction term is included.

I’ve read How to test an interaction effect with a non-parametric test (e.g. a permutation test)?, but my question is much more practical.


Get this bounty!!!

#StackBounty: #hypothesis-testing #maximum-likelihood #likelihood-ratio #wald-estimator MLE-based hypothesis tests

Bounty: 50

I recently encountered the three MLE-based tests: the Wald test, the likelihood ratio test, and the Lagrange multiplier test. Although it seemed at first like the usual hypothesis testing I already know from statistics, I ran into some trouble with the actual application. In an attempt to fully understand all three, I got my hands on some practice problems. Since all three tests are built on MLE and are asymptotically equivalent, and the points I don't get are similar for all of them, I will ask about them jointly here. Hence the post will be longer; apologies. If I should pose them separately, I will, of course, do so on request.

First, the Wald test. Take, for example, 100 realizations of a normally distributed random variable with mean $\mu = 0.36$ and $\sigma^2 = 4$. Define $c_1(\mu) = \mu$. How would you conduct a Wald test for $H_0: c(\mu) = c_1(\mu) - 0.8 = 0$ at the 5% significance level in practice? Also, what I don't understand is why one usually defines another function $c_1(\mu)$.
What I understood here is that one tests the restriction $$c(\mu) = 0$$ and that the closer the value of the test statistic, as well as the value of $c(\mu)$, is to zero, the more plausible the restriction. Since the variable is normally distributed, one can use the MLE of the mean for $\mu$. But how does it go from here exactly? I calculate $$c(0.36) = 0.36 - 0.8 = -0.44$$

and, according to my materials,

$$W = -0.44 \cdot \mathrm{Var}(-0.44)^{-1} \cdot (-0.44)$$

But what is $Var(-0.44)$? And what is the underlying distribution from which I get the p-value?
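Presumably the statistic is an instance of the general Wald form for a (here scalar) restriction,

$$W = c(\hat{\theta})^{\top}\,\widehat{\mathrm{Var}}\big(c(\hat{\theta})\big)^{-1}\,c(\hat{\theta}),$$

which would be compared to a $\chi^2$ distribution with as many degrees of freedom as there are restrictions, but that still leaves me unsure how $\mathrm{Var}(-0.44)$ is obtained from the information given.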

Second, the likelihood ratio test. Take again a sample of 100 observations, but this time from a Poisson distribution. The sample mean here is $\bar{x} = 1.7$; therefore, the MLE of $\lambda$ is also $\hat{\lambda} = 1.7$. This time consider $$c(\lambda) = \lambda^2 - 3\lambda + 2$$ How do I test $c(\lambda) = 0$ at the 5% level? Here I understand even less how to get by with so little information, since I thought that one would need to evaluate the log-likelihood function and then decide based on the difference between the log-likelihood values at the restricted and unrestricted MLE estimates. And again, which distribution does the statistic (and thus the p-value) follow?

Finally, the Lagrange multiplier test. Here as well, I thought I would need the log-likelihood function, since I have to insert the restricted estimate into its derivative, don't I? Take the same distribution as before, but with the function $$c(\lambda) = \frac{1}{\lambda^2} - 0.1$$ What is the restricted MLE estimate that I insert into the log-likelihood derivative? Is it $\frac{1}{1.7^2} - 0.1$? How do I go about it without having the actual sample and the log-likelihood function at my disposal?


Get this bounty!!!

#StackBounty: #hypothesis-testing #interaction #regression-coefficients #permutation-test How to do permutation test on the coefficient…

Bounty: 100

The (basic) question:

Given the following model as an example:

$$Y=\beta_0+\beta_A\cdot A+\beta_B\cdot B+\beta_{A\times B}\cdot A\cdot B+\epsilon$$

In alternative notation:

$$Y\sim A + B + A\times B$$

When permuting entries of variable $A$ to test its coefficient ($\beta_A$) in a model, should an interaction term that includes it, such as $B\times A$, be recomputed as well?

And what about testing the $B\times A$ interaction term coefficient ($\beta_{A\times B}$)? Are its permutations computed regardless of the variables $A$ and $B$?

A bit of context:

I want to perform a test on the coefficients of a model (it’s a canonical correlation analysis, but the question is applicable to any linear model including interactions).

I'm trying my hand at permutation tests. While it's fairly straightforward to test the canonical correlation itself, how to do the same with the variable scores, or coefficients, is a bit unclear to me when an interaction term is included.

I’ve read How to test an interaction effect with a non-parametric test (e.g. a permutation test)?, but my question is much more practical.


Get this bounty!!!