#StackBounty: #interaction #mediation Why do mediation and moderation analysis give substantively different results?

Bounty: 50

I have a study where I pair two people up and have them play a behavioral game together. I measure some change score before and after the game. I create a four-level factor variable called treatment that has the following values: MM, MF, FM, FF which describe your biological sex, and the biological sex of your partner.

Let’s say I also have some index moderator_idx where I believe, for people in the MF and FM category, high levels of moderator_idx will be associated with high levels of the outcome. So I model the following:

library(estimatr)
m1 <- lm_robust(outcome ~ treatment / moderator_idx - 1,
                clusters = team_id,
                se_type = "stata",
                data = data_full)

                             Estimate  Std. Error   t value    Pr(>|t|)      CI Lower     CI Upper  DF
treatmentFF                 9.6698108  4.31703853  2.239918 0.025776158   1.176747161 18.162874409 323
treatmentFM                -2.4975363  6.76639155 -0.369109 0.712288286 -15.809299377 10.814226686 323
treatmentMF                -6.5241575  5.42255272 -1.203152 0.229798459 -17.192138536  4.143823553 323
treatmentMM               -20.0332461 13.66149199 -1.466403 0.143511539 -46.909985777  6.843493496 323
treatmentFF:moderator_idx  -0.1041088  0.05135792 -2.027123 0.043470833  -0.205147088 -0.003070541 323
treatmentFM:moderator_idx   0.1580438  0.07965394  1.984131 0.048087366   0.001337787  0.314749868 323
treatmentMF:moderator_idx   0.1845383  0.06667904  2.767562 0.005973407   0.053358306  0.315718388 323
treatmentMM:moderator_idx   0.2405057  0.15315809  1.570310 0.117322055  -0.060807677  0.541819060 323

Examining only the interaction terms (e.g. treatmentFM:moderator_idx), I get the result I expected: for people in the different-sex conditions (FM, MF), an increase in the moderator is associated with a significant increase in the outcome of interest.

However, I can recast this analysis as a mediation analysis like so (where different_sex is a dummy variable set to 1 if you’re paired with someone of the opposite sex, and sex is your own biological sex). Note that this is equivalent to the four-level treatment factor above…
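As a side note on reading this output: the `treatment / moderator_idx - 1` formula fits a separate intercept and a separate `moderator_idx` slope within each treatment level, so each `treatmentXX:moderator_idx` row is a within-group slope rather than a difference from a reference group. A minimal sketch with simulated data, where the data frame `d` and the values in `true_slopes` are made up for illustration and plain `lm` stands in for `lm_robust`:

```r
set.seed(7)

# Simulated stand-in data: four treatment groups, 50 people each (assumption)
d <- data.frame(
  treatment = factor(rep(c("FF", "FM", "MF", "MM"), each = 50)),
  moderator_idx = rnorm(200, mean = 80, sd = 10)
)

# Invented group-specific slopes, used only to generate an outcome
true_slopes <- c(FF = -0.10, FM = 0.15, MF = 0.20, MM = 0.25)
d$outcome <- unname(true_slopes[as.character(d$treatment)]) * d$moderator_idx +
  rnorm(200)

# Nested specification: one intercept and one slope per treatment level
nested <- lm(outcome ~ treatment / moderator_idx - 1, data = d)

# The same slope estimated from the FM subset alone
fm_only <- lm(outcome ~ moderator_idx, data = subset(d, treatment == "FM"))

slope_nested <- unname(coef(nested)["treatmentFM:moderator_idx"])
slope_subset <- unname(coef(fm_only)["moderator_idx"])
print(c(nested = slope_nested, subset = slope_subset))
```

The two point estimates coincide; the standard errors can differ, because the pooled model shares one residual variance and, in the original call, clusters by team.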

med.fit <- lm(moderator_idx ~ different_sex * sex, data = data_full)
out.fit <- lm(outgroup_feelings_diff ~ sex * different_sex * moderator_idx, data = data_full)
med.out <- mediation::mediate(med.fit, out.fit, treat = "different_sex", mediator = "moderator_idx", robustSE = TRUE, sims = 1000)

Quasi-Bayesian Confidence Intervals

                         Estimate 95% CI Lower 95% CI Upper p-value    
ACME (control)            -0.0484      -0.5168         0.37    0.84    
ACME (treated)            -0.6666      -1.7546         0.20    0.14    
ADE (control)             11.2765       7.7503        14.58  <2e-16 ***
ADE (treated)             10.6583       7.2103        13.88  <2e-16 ***
Total Effect              10.6099       7.2444        13.87  <2e-16 ***
Prop. Mediated (control)  -0.0032      -0.0525         0.04    0.84    
Prop. Mediated (treated)  -0.0620      -0.1759         0.02    0.14    
ACME (average)            -0.3575      -0.9851         0.13    0.15    
ADE (average)             10.9674       7.5714        14.19  <2e-16 ***
Prop. Mediated (average)  -0.0326      -0.1000         0.01    0.15    
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

And I find that the ACME for the treated is non-significant. My question is: how can the first analysis show large and significant moderation effects while the second shows non-significant mediation effects? What are the substantive differences between the two results, and which should I trust?
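One way to see how the two analyses can disagree: the ACME depends on the treatment shifting the level of moderator_idx, whereas the interaction terms measure whether the slope of moderator_idx differs across conditions. The simulation below is only an illustrative sketch with invented variables (`treat`, `m`, `y`) and a product-of-coefficients approximation in place of `mediation::mediate`; it is not a re-analysis of the data above.

```r
set.seed(1)
n <- 2000

treat <- rbinom(n, 1, 0.5)            # hypothetical binary treatment
m <- rnorm(n, mean = 50, sd = 10)     # index generated independently of treat

# Slope of m is -0.1 for controls and +0.2 for treated units (assumed values)
y <- 5 + 10 * treat - 0.1 * m + 0.3 * treat * m + rnorm(n, sd = 5)

# "a" path: does the treatment move the index? (close to zero by construction)
a_path <- unname(coef(lm(m ~ treat))["treat"])

# Outcome model with a treatment-by-index interaction
out_fit <- lm(y ~ treat * m)
interaction_est <- unname(coef(out_fit)["treat:m"])
b_treated <- unname(coef(out_fit)["m"]) + interaction_est

# Product-of-coefficients approximation to the indirect effect for the treated
acme_treated <- a_path * b_treated

print(round(c(a_path = a_path,
              interaction = interaction_est,
              acme_treated = acme_treated), 3))
```

In this setup the interaction is estimated precisely while the indirect effect stays small, because the index does not respond to the treatment. Whether the same mechanism explains the output above cannot be read off the simulation.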

Get this bounty!!!

#StackBounty: #interaction #panel-data #stationarity #unit-root what should I do about a non-stationary variable in a panel-data intera…

Bounty: 50

We have panel data on immigration stocks, immigration flows, and immigration policy for 30 countries and 10-30 years. We would like to test the theory that the effect of immigration flows (i.e., annual numbers of incoming immigrants as % of pop) on immigration policy depends on immigrant stocks (i.e., non-citizens as % of pop). In other words, immigration flows affect policy, but only when there are few existing immigrants to begin with.

It seems to me that an interaction between immigration stocks and flows will allow a test of this theory. However, while our dependent variable (immigration policy) and our main independent variable (immigration flows) appear to be stationary, immigrant stocks are not. Standard solutions like first-differencing immigrant stocks won’t help because that would transform stocks into another measure of annual flows, which will not allow us to test the theory.
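For concreteness, the specification described here could be sketched as follows. Everything in the block is an assumption made for illustration: the simulated panel, the variable names `policy`, `flows` and `stocks`, and the use of country and year dummies in plain `lm`. It only shows how the interaction enters the model with stocks kept in levels; it does not address whether inference is valid when stocks are non-stationary.

```r
set.seed(123)
n_country <- 30
n_year <- 20

# One row per country-year; year varies fastest within each country
panel <- expand.grid(year = 1:n_year, country = 1:n_country)

# flows: stationary noise around a common mean (% of population, invented)
panel$flows <- rnorm(nrow(panel), mean = 1, sd = 0.5)

# stocks: a random walk within each country, so non-stationary in levels
panel$stocks <- ave(rnorm(nrow(panel), sd = 0.5), panel$country, FUN = cumsum) + 5

# policy responds to flows, less so when stocks are high (true interaction = -0.5)
panel$policy <- 3 * panel$flows - 0.5 * panel$flows * panel$stocks +
  rnorm(nrow(panel))

# Interaction model with country and year fixed effects entered as dummies
fit <- lm(policy ~ flows * stocks + factor(country) + factor(year), data = panel)
interaction_est <- unname(coef(fit)["flows:stocks"])
print(interaction_est)
```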

Another way of putting this is to ask: does stationarity matter only for the dependent variable? Or also for all independent variables?

Advice on how to proceed will be greatly appreciated!


#StackBounty: #r #categorical-data #interaction #instrumental-variables #2sls A 2SLS when the instrumented variable has two interaction…

Bounty: 50

I am using ivreg and ivmodel in R to apply a 2SLS.

I would like to instrument one variable, namely $x_1$, present in two interaction terms. In this example $x_1$ is a factor variable. The regression is specified in this manner because the ratio between $a$ and $b$ is of importance.

$$y = ax_1 x_2 + bx_1x_3 + cx_4 + e$$

For this instrumented variable I have two instruments, $z_1$ and $z_2$. For both, the following causal diagram applies (Z only has an indirect effect on Y through X).

[Figure: causal diagram in which Z affects Y only through X]

What is the correct way to instrument $x_1$ for this problem?

In the data

Translated to some (fake) sample data the problem looks like:

$$happiness = a(factor:income) + b(factor:sales) + c(educ) + e$$
$$(y = ax_1 x_2 + bx_1x_3 + cx_4 + e)$$

Where the instrument $z_1$ is urban and $z_2$ is size. Here, however, I start to get confused about how to write the regression.

For the first stage:

What is my dependent variable here?

For the second stage, should I do:

$$happiness = a(urban:income) + b(urban:sales) + c(educ) + e$$
$$happiness = a(size:income) + b(size:sales) + c(educ) + e$$

Or should I just do:

$$happiness = urban(a:income+b:sales) + c(educ) + e$$
$$happiness = size(a:income+b:sales) + c(educ) + e$$

Nevertheless, how should I specify this in R?

library(data.table)
panelID = c(1:50)
year= c(2001:2010)
country = c("NLD", "BEL", "GER")
urban = c("A", "B", "C")
indust = c("D", "E", "F")
sizes = c(1,2,3,4,5)
n <- 2
DT <- data.table(panelID = rep(sample(panelID), each = n),
                    country = rep(sample(country, length(panelID), replace = T), each = n),
                    year = c(replicate(length(panelID), sample(year, n))),
                    some_NA = sample(0:5, 6),                                             
                    Factor = sample(0:5, 6), 
                    industry = rep(sample(indust, length(panelID), replace = T), each = n),
                    urbanisation = rep(sample(urban, length(panelID), replace = T), each = n),
                    size = rep(sample(sizes, length(panelID), replace = T), each = n),
                    income = round(runif(100)/10,2),
                    Y_Outcome= round(rnorm(10,100,10),2),
                    sales= round(rnorm(10,10,10),2),
                    happiness = sample(10,10),
                    Sex = round(rnorm(10,0.75,0.3),2),
                    Age = sample(100,100),
                    educ = round(rnorm(10,0.75,0.3),2))        
DT [, uniqueID := .I]                                                         # Creates a unique ID     
DT <- as.data.frame(DT)

To make it slightly easier for someone who is not familiar with the packages to help, I have added what the call structure of the two packages I use looks like.

The structure of the second stage of ivreg is as follows:

second_stage <- ivreg(happiness ~ Factor:income + Factor:sales + educ | urbanisation:income + urbanisation:sales + educ, data = DT)

The structure for ivmodel is:

second_stage <- ivmodel(Y = DT$happiness, D = DT$Factor, Z = DT[, c("urbanisation", "size")], X = DT$educ, na.action = na.omit)
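One commonly described approach, sketched here under strong simplifying assumptions, treats each interaction term as its own endogenous regressor and instruments it with the interactions of $z_1$ and $z_2$ with the corresponding exogenous variable. The block below uses simulated numeric data and hand-rolled matrix 2SLS in base R rather than `ivreg` or `ivmodel`; all variable names and coefficient values are invented, and $x_1$ is numeric here rather than a factor.

```r
set.seed(42)
n <- 5000

z1 <- rnorm(n)                              # instrument 1 (assumed valid)
z2 <- rnorm(n)                              # instrument 2 (assumed valid)
u  <- rnorm(n)                              # unobserved confounder
x1 <- 0.8 * z1 + 0.5 * z2 + u + rnorm(n)    # endogenous variable
x2 <- rnorm(n, mean = 2)                    # exogenous
x3 <- rnorm(n, mean = 1)                    # exogenous
x4 <- rnorm(n)                              # exogenous control

# True a = 1.5, b = 0.5, c = 0.7; u enters the error and causes endogeneity
y <- 1.5 * x1 * x2 + 0.5 * x1 * x3 + 0.7 * x4 + 2 * u + rnorm(n)

# Regressors of the structural equation (intercept added for the sketch)
X <- cbind(1, x1 * x2, x1 * x3, x4)

# Instrument matrix: each instrument interacted with x2 and x3, plus x4
Z <- cbind(1, z1 * x2, z2 * x2, z1 * x3, z2 * x3, x4)

# First stage: project every column of X onto the instruments
X_hat <- Z %*% solve(crossprod(Z), crossprod(Z, X))

# Second stage: 2SLS estimator (X_hat' X)^{-1} X_hat' y
beta_2sls <- drop(solve(crossprod(X_hat, X), crossprod(X_hat, y)))
names(beta_2sls) <- c("(Intercept)", "x1:x2", "x1:x3", "x4")

ratio_ab <- unname(beta_2sls["x1:x2"] / beta_2sls["x1:x3"])
print(round(beta_2sls, 3))
print(round(ratio_ab, 3))
```

Under this reading there is one first stage per interaction term, with the interaction itself as the dependent variable. Whether the same construction is appropriate when $x_1$ is a factor is a separate question that the sketch does not settle.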

Any help with figuring out how to do this properly would be greatly appreciated!


#StackBounty: #interaction Interpretation of interaction coefficient in PPML with fixed effects

Bounty: 50

I am estimating a gravity model to evaluate how environmental policies affect trade patterns. I am using a PPML model, estimated with the ppmlhdfe command in Stata.

As an identification strategy, I am using dyadic fixed effects. The model I am estimating is the following:

$ex_{ij,t} = \exp[\alpha_{i,t} + \alpha_{j,t} + \alpha_{ij} + \boldsymbol{\beta_1 D_{eu,j} \times Policy_{i,t}}] \times \varepsilon_{ij,t}$

Where exports between countries $i$ and $j$ ($ex_{ij,t}$) are a function of exporter-year ($\alpha_{i,t}$), importer-year ($\alpha_{j,t}$), and dyadic ($\alpha_{ij}$) fixed effects. My main variable of interest is the interaction between $D_{eu,j}$, a dummy variable indicating whether the importer country $j$ is part of the EU (1 = EU, 0 = otherwise), and $Policy_{i,t}$, the log of a continuous variable that measures the stringency of environmental policies in the exporter country $i$. My aim is to assess whether stringent environmental policies favour exports towards the EU, thereby affecting trade patterns. The interaction is identified because it varies for every dyad-year. The problem is how to interpret the coefficient on the interaction ($\beta_1$).
The issue is that, because of collinearity with the fixed effects, I cannot estimate the individual coefficients for $D_{eu,j}$ and $Policy_{i,t}$. Hence I cannot produce plots, nor can I say what the reference level is.

For instance, what would a significant coefficient of 0.4 mean?
Would it be correct to say that for a 1% increase in the policy score, exports towards the EU increase by 49% ($[e^{0.4}-1] \times 100 = 49\%$) relative to exports to non-EU members?
Is there any way to estimate the significance level of these coefficients? I am not sure how I could produce margin plots.
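Because $Policy_{i,t}$ enters in logs, the arithmetic behind the two candidate readings of a coefficient of 0.4 can be laid out explicitly. This is only a numerical sketch (the name `beta1` is introduced here): $e^{0.4} - 1$ is the relative change implied by a one-unit increase in the log policy score, while a 1% increase in the score itself implies a much smaller relative change, roughly equal to the coefficient read as a percentage. Both are relative to exports to non-EU destinations.

```r
beta1 <- 0.4   # hypothetical interaction coefficient from the question

# Relative change for a one-unit increase in log(policy), i.e. policy times e
pct_per_log_unit <- (exp(beta1) - 1) * 100

# Relative change for a 1% increase in the policy score itself
pct_per_one_percent <- (1.01^beta1 - 1) * 100

print(round(c(per_log_unit = pct_per_log_unit,
              per_one_percent = pct_per_one_percent), 2))
```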

If anyone can help to understand the coefficient it would be very much appreciated.


#StackBounty: #r #anova #repeated-measures #interaction #power-analysis R: Power analysis for a 2 by 2 within-within ANOVA interaction …

Bounty: 50


I would like to perform a power analysis for the interaction effect in a 2 by 2 within-within ANOVA design. I would like to crosscheck that I am performing this correctly using the pwr::pwr.f2.test() function in R.

Required inputs

The pwr::pwr.f2.test() function requires the following parameters:

  • u: degrees of freedom for numerator

  • v: degrees of freedom for denominator

  • f2: the effect size Cohen’s $f^2$

  • sig.level: $\alpha$ level

  • power: the desired power ($1 - \beta$)

We will omit v so the function will estimate this value.

Inputs for a $2 \times 2$ within-within design

  1. u

The numerator degrees of freedom for an interaction effect in this design is given as:

$$u = A \times B = (a - 1)(b - 1)$$

Where $a$ is the number of levels of Factor $A$, and $b$ is the number of levels of Factor $B$. Therefore, u is:

$$u = (2-1) \times (2-1) = 1$$

  2. f2

Let’s assume I have estimated that the interaction effect should be $\eta^2_p$ = 0.2. I can convert this to $f^2$ using the following formula:

$$f^2 = \frac{\eta^2_p}{1 - \eta^2_p}$$

Therefore, f2 is:

$$f^2 = \frac{0.2}{1-0.2} = 0.25$$

  3. sig.level

I will set $\alpha = .05$

  4. power

I will set $power = 0.80$

Perform the analysis

We use the above inputs in the following power analysis:

pwr::pwr.f2.test(u = 1, f2 = 0.25, sig.level = 0.05, power = 0.8)

 Multiple regression power calculation 

          u = 1
          v = 31.42944
         f2 = 0.25
  sig.level = 0.05
      power = 0.8

The required denominator degrees of freedom to detect our effect with 80% power is 31.43. To convert v to the total $N$ required we will do some simple algebra.

The denominator degrees of freedom for a $2 \times 2$ interaction effect is given as:

$$v = A \times B \times S = (a - 1)(b - 1)(N - 1)$$

Where $N$ is the total sample size. In our study this is:

$$v = (2 - 1)(2 - 1)(N - 1) = N - 1$$

$$N = v + 1$$

This means we simply add 1 to v to estimate the required sample size, and round up:

ceiling(31.42944 + 1)
[1] 33

The required sample size is therefore $N = 33$.
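The steps above can be re-traced in base R without the pwr package. The sketch assumes the noncentrality parameter $\lambda = f^2(u + v + 1)$, which is taken here to match what `pwr.f2.test` uses; the helper `power_for_v` and the other variable names are introduced for illustration. It only reproduces the arithmetic as stated and does not settle whether this degrees-of-freedom formula is the right one for a within-within design.

```r
eta2_p <- 0.2
f2 <- eta2_p / (1 - eta2_p)     # Cohen's f^2 from partial eta squared
u <- 1                          # numerator df for the 2 x 2 interaction
sig_level <- 0.05
target_power <- 0.80

# Power of the F test for a given denominator df v
# (assumes lambda = f2 * (u + v + 1))
power_for_v <- function(v) {
  lambda <- f2 * (u + v + 1)
  crit <- qf(sig_level, u, v, lower.tail = FALSE)
  pf(crit, u, v, ncp = lambda, lower.tail = FALSE)
}

# Solve for the v that gives the target power
v_required <- uniroot(function(v) power_for_v(v) - target_power,
                      interval = c(2, 1000), tol = 1e-8)$root

# Convert v to N using N = v + 1, as derived above, and round up
n_required <- ceiling(v_required + 1)

print(c(f2 = f2, v = v_required, N = n_required))
```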


Have I performed these steps correctly?
