## #StackBounty: #repeated-measures #glmm #effect-size Calculating effect size for glmm with repeated measures, what is the sample size?

### Bounty: 50

I am new to effect sizes and trying to calculate it for a repeated measure GLMM that looks like this:

`variable ~ treatment * sampling_occasion * year + (1 | subject)`

The variable is continuous, the treatment is categorical (two independent groups), the sampling occasion is discrete, the year is discrete, and subject is used as a random effect. The variable is measured at each sampling occasion on the same subject, hence the repeated measures.

Following Nakagawa & Cuthill 2007, I believe I should use the equation from Rosenthal (1994):

$$d = t_{\text{unpaired}} \sqrt{\dfrac{n_1+n_2}{n_1 n_2}}$$

In this case, should I use the total $n$ across all repeated measures for each group, or the actual number of subjects in my experiment?

For example, if I have 10 subjects in my first group and 3 sampling occasions, is my $n_1$ value 30 or 10?
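The choice matters numerically. A quick sketch (in Python, with a purely hypothetical $t$-value) shows that the same $t$ yields a noticeably smaller $d$ when the repeated measurements are counted as if they were independent observations:

```python
import math

def cohens_d_from_t(t, n1, n2):
    # Rosenthal (1994): d = t * sqrt((n1 + n2) / (n1 * n2))
    return t * math.sqrt((n1 + n2) / (n1 * n2))

t = 2.5  # hypothetical unpaired t-statistic

# n = number of subjects per group (10 each)
print(round(cohens_d_from_t(t, 10, 10), 3))  # 1.118

# n = subjects x sampling occasions (30 each)
print(round(cohens_d_from_t(t, 30, 30), 3))  # 0.645
```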

Nakagawa, S., & Cuthill, I. C. (2007). Effect size, confidence interval and statistical significance: a practical guide for biologists. Biological Reviews, 82(4), 591–605.


## #StackBounty: #variance #effect-size #threshold #subset Is there a name for the increase in variance upon remeasurement after subsettin…

### Bounty: 50

Context: My problem relates to estimating effect sizes, such as Cohen’s d, when looking at a subset of the population defined by a cut-off threshold. This effect size is the difference in two population means divided by the (assumed equal) population standard deviation.

Suppose there is a sample from a population with a variable $Y$ with "true" values $Y_{i0}$ that will be measured with error at two time points, $t_1$ and $t_2$, giving measurements $Y_{i1} = Y_{i0} + \epsilon_{i1}$ and $Y_{i2} = Y_{i0} + \epsilon_{i2}$. At time $t_1$ we define a subset $J$ of the population by "$i \in J$ if $Y_{i1} > a$" for some fixed $a$. The objective is to estimate the variance of the subset at $t_2$, $V[Y_{j2} \mid j \in J]$ (or, equivalently, the variance of $Y$ in the subset measured at any time other than $t_1$). We cannot use the subset's estimated variance at $t_1$, because the variance at $t_2$ will be larger.

Example R code showing that the standard deviation of the subset at $t_2$ is greater than the standard deviation at $t_1$:

```r
set.seed(1)
N <- 1000
Y0 <- rnorm(N, mean = 0, sd = 1)         # true values
Y1 <- Y0 + rnorm(N, mean = 0, sd = 0.5)  # measurement at t1
Y2 <- Y0 + rnorm(N, mean = 0, sd = 0.5)  # measurement at t2
indx <- Y1 > 1                           # subset selected on Y1
sd(Y1[indx])
#  0.6007802
sd(Y2[indx])
#  0.8145581
```

Does this phenomenon, the variance of a thresholded subset increasing upon re-measurement, have a name? Can anyone share any references to help understand it either generally or in the specific context of effect sizes?
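One way to see the mechanism (a sketch I am adding, not part of the original question): selecting on $Y_1$ shrinks the within-subset variance of both $Y_0$ and, especially, $Y_1$, but the fresh measurement error in $Y_2$ is independent of the selection, so $V[Y_2 \mid J] = V[Y_0 \mid J] + \sigma_\epsilon^2$. A quick numerical check in Python (larger $N$ for stability):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
y0 = y_true = rng.normal(0.0, 1.0, N)     # true values
y1 = y0 + rng.normal(0.0, 0.5, N)         # measurement at t1
y2 = y0 + rng.normal(0.0, 0.5, N)         # measurement at t2
sel = y1 > 1.0                            # subset defined by thresholding Y1

# Selecting on Y1 truncates Y1 directly, so its within-subset spread
# is small; the noise in Y2 is independent of the selection, so
# Var(Y2 | J) ~= Var(Y0 | J) + 0.25.
print(y1[sel].std(), y2[sel].std())       # Y2 spread exceeds Y1 spread
print(abs(y2[sel].var() - (y0[sel].var() + 0.25)))  # near zero
```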


## #StackBounty: #mixed-model #multilevel-analysis #meta-analysis #effect-size #meta-regression What’s the right multilevel model to addre…

### Bounty: 50

I have a sample of about 4,000 $r$ (that is, Pearson correlation), $t$-, or $F$-tests reported in psychology journals. These tests were drawn randomly from a larger dataset of about 500,000 statistical tests extracted from ~33,000 articles in 132 psychology journals.

For each statistical test I have the following data:
– Test statistic value
– Category of test statistic ($$t$$, $$F$$, or $$r$$)
– Degrees of freedom (both degrees of freedom in the case of an $F$-test)
– Estimate of standard error (since tests with missing degrees of freedom were excluded, I can generate this from the test statistic and degrees of freedom)
– Reported p-value
– Whether the reported p-value is consistent with the test statistic and degrees of freedom (with inconsistency likely indicating a reporting error)
– Year of publication (ranging from 1980 to 2019, though weighted towards more recent articles)
– Journal of publication
– Pre-registration status (either the article underwent some sort of pre-registration, or not)
– Classification of the statistical test as either “central” or “peripheral”

That last point relates to a classification of whether the statistic test was central to the main aims of the article, or whether it was instead peripheral to those aims (e.g. a statistical test done in the course of assumption-checking). These judgments were made by human raters, who have been shown to have good reliability/validity in relation to this task.

All the test statistics are converted to Fisher Z-transformed correlation coefficients so that they can be compared.
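For reference, the standard conversions behind that step can be sketched in a few lines (assuming a $t$ with $\text{df}$ degrees of freedom; an $F(1, \text{df})$ corresponds to $t = \sqrt{F}$):

```python
import math

def r_from_t(t, df):
    # Convert a t statistic to a correlation-scale effect size:
    # r = t / sqrt(t^2 + df)
    return t / math.sqrt(t**2 + df)

def fisher_z(r):
    # Fisher's variance-stabilizing transform: z = atanh(r)
    return math.atanh(r)

# e.g. t(40) = 2.5  ->  r ~= 0.368  ->  z ~= 0.386
t, df = 2.5, 40
r = r_from_t(t, df)
print(round(r, 3), round(fisher_z(r), 3))
```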

There are two main research questions of interest.

1. Are “central” effect sizes declining over time? From a prior analysis (in which no distinction could be made between central and peripheral tests) we already suspect that overall test statistics are slightly declining over time.
2. Are statistical reporting errors more common in central tests, or peripheral tests?

I’d originally planned to address these questions using two multilevel models,

1. A multilevel regression in which tests are nested inside journals and the outcome variable is the test effect size (the Fisher Z-transformed correlation coefficient). Predictors would be statistic type ($F$, $t$, $r$), central/peripheral status, and year of publication.

2. A multilevel logistic regression in which tests are nested inside journals and the outcome variable is whether the test contains a reporting error. Predictors would be statistic type ($F$, $t$, $r$) and central/peripheral status.

It’s been suggested to me that I should instead be doing “a multilevel meta-regression”. This is not a concept I was previously familiar with, but looking at the Cochrane handbook I read that

Meta-regressions usually differ from simple regressions in two ways. First, larger studies have more influence on the relationship than smaller studies, since studies are weighted by the precision of their respective effect estimate. Second, it is wise to allow for the residual heterogeneity among intervention effects not modelled by the explanatory variables. This gives rise to the term 'random-effects meta-regression', since the extra variability is incorporated in the same way as in a random-effects meta-analysis.

It wasn’t obvious to me how either of those things would be relevant in my context.

Regarding the first research question (effect sizes over time), I understand that weighting large-$N$ studies more heavily makes sense if the meta-analysis is interested in the size of the underlying effects being studied by psychologists. However, if the meta-analysis is only interested in the effect sizes psychologists report over time, I don't see why large-$N$ studies should be weighted more heavily.

Regarding the second research question (statistical reporting errors), I likewise don't see why large-$N$ studies should be weighted more heavily.

What analysis should I be doing?
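The weighting issue can be made concrete. In a meta-regression, each estimate is typically weighted by its precision ($1/\text{SE}^2$; for a Fisher $z$, $\text{SE} = 1/\sqrt{n-3}$), whereas the originally planned regression counts every test equally. A sketch with simulated data (all values hypothetical, including the assumed decline of the $z$-scale effect over time):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 500
n = rng.integers(20, 400, k)              # per-test sample sizes
year = rng.uniform(0, 39, k)              # years since 1980
true_z = 0.3 - 0.002 * year               # hypothetical decline over time
se = 1.0 / np.sqrt(n - 3)                 # SE of a Fisher z
z = true_z + rng.normal(0.0, se)          # observed effect sizes

X = np.column_stack([np.ones(k), year])

# Unweighted OLS: every reported test counts equally.
b_ols = np.linalg.lstsq(X, z, rcond=None)[0]

# Meta-regression-style precision weighting: w = 1/se^2, so
# large-N tests dominate the fitted trend.
sw = np.sqrt(1.0 / se**2)
b_wls = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)[0]

print(b_ols[1], b_wls[1])                 # both recover a negative year slope
```

Here both estimators recover the simulated decline; they diverge when effect size and sample size are correlated, which is exactly the situation the weighting decision should be judged against.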


## #StackBounty: #anova #t-test #spss #effect-size #post-hoc What effect size should I report on post hoc analyses that compared estimated…

### Bounty: 50

I am using SPSS, and have run an ANOVA model including three factors (two within and one between subjects).

One of these factors has three levels, so I conducted post hoc tests comparing the different levels of this factor. SPSS does this by comparing estimated marginal means, producing a $t$-statistic and a $p$-value. But I'm not sure what sort of effect size I can report for this analysis. Any ideas?


## #StackBounty: #hypothesis-testing #effect-size #power-analysis #intuition #methodology What's the difference between using a compos…

### Bounty: 50

Probably a weird place to quote from, but I just stumbled across this discussion on Reddit, where one commenter said (with typos fixed):

The effect size in a power calculation has no relationship whatsoever to the minimal effect size under a non-point null. These are two completely unrelated things. If you consider an effect size in your power calculation against a point null, you're still testing a point null.

(……) Of course you can have a null of zero in a power simulation. You usually do. You compute power at the alternative, but the test statistic is computed under the assumption of the null. You shouldn’t be running around changing your null.

(……) You can use composite nulls, but this is in general very uncommon. Using a composite null is also different from running a power calculation for a minimal effect size.

As indicated, the most obvious difference between (A) using a composite null and (B) running a power calculation for a minimal effect size is that the sampling distributions of the test statistics are calculated under the assumption of the respective null hypothesis. However, I'm not sure if that is the only difference. More importantly, what are the implications, and when should we use (A), (B), or both?

To put it more concretely, given that I specify the expected minimum effect size as "the actual difference in means is at least $0.8\sigma$", what is the difference among the following configurations?

$$
\begin{align}
H_0&: \mu = 0 & H_1&: \mu \ne 0 \tag 1\\
H_0&: -0.8\sigma \le \mu \le 0.8\sigma & H_1&: \mu \ne 0 \tag 2\\
H_0&: \mu = 0 & H_1&: \mu < -0.8\sigma \lor \mu > 0.8\sigma \tag 3
\end{align}
$$
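For intuition (a sketch I am adding, not from the quoted thread): a power calculation under configuration (1) keeps the point null $\mu = 0$ and simply evaluates the rejection probability at the alternative $\mu = 0.8\sigma$. With known $\sigma$ and a two-sided $z$-test, this takes a few lines of Python using only the standard library:

```python
from statistics import NormalDist

def power_point_null(delta=0.8, n=25, alpha=0.05):
    # Two-sided z-test of H0: mu = 0; power evaluated at mu = delta*sigma.
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = delta * n**0.5              # noncentrality of the z statistic
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)

print(round(power_point_null(), 3))     # ~0.979 at delta = 0.8, n = 25
```

The null stays $\mu = 0$ throughout; $0.8\sigma$ enters only as the point at which power is evaluated, which is the distinction the quoted commenter is drawing.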

By the way, I'm not familiar with the terms "power analysis" and "effect size", so please don't assume I have a strong background. Believe it or not, my professor didn't even mention them when teaching us hypothesis testing. (Yeah, as a Statistics major, this really sucks.)
