#StackBounty: #r #categorical-data #interaction #instrumental-variables #2sls A 2SLS when the instrumented variable has two interaction…

Bounty: 50

I am using ivreg and ivmodel in R to apply a 2SLS.

I would like to instrument one variable, namely $x_1$, present in two interaction terms. In this example $x_1$ is a factor variable. The regression is specified in this manner because the ratio between $a$ and $b$ is of importance.

$$y = ax_1 x_2 + bx_1x_3 + cx_4 + e$$

For this instrumented variable I have two instruments $z_1$ and $z_2$. For both the following causal diagram is applicable (Z only has an indirect effect on Y through X).

enter image description here

What is for this problem the correct way to instrument $x_1$?

In the data

Translated to some (fake) sample data the problem looks like:

$$happiness = a(factor:income) + b(factor:sales) + c(educ) + e$$
$$=$$
$$(y = ax_1 x_2 + bx_1x_3 + cx_4 + e)$$

Where the instrument $z_1$ is urban and $z_2$ is size. Here I however become to get confused about how to write the regression.

For the first stage:

What is my dependent variable here?

For the second stage, should I do:

$$happiness = a(urban:income) + b(urban:sales) + c(educ) + e$$
$$happiness = a(size:income) + b(size:sales) + c(educ) + e$$

Or should I just do:

$$happiness = urban(a:income+b:sales) + c(educ) + e$$
$$happiness = size
(a:income+b:sales) + c(educ) + e$$

Nevertheless, how should I specify this in R ?

library(data.table)
library(ivmodel)
library(AER)
panelID = c(1:50)   
year= c(2001:2010)
country = c("NLD", "BEL", "GER")
urban = c("A", "B", "C")
indust = c("D", "E", "F")
sizes = c(1,2,3,4,5)
n <- 2
library(data.table)
set.seed(123)
DT <- data.table(panelID = rep(sample(panelID), each = n),
                    country = rep(sample(country, length(panelID), replace = T), each = n),
                    year = c(replicate(length(panelID), sample(year, n))),
                    some_NA = sample(0:5, 6),                                             
                    Factor = sample(0:5, 6), 
                    industry = rep(sample(indust, length(panelID), replace = T), each = n),
                    urbanisation = rep(sample(urban, length(panelID), replace = T), each = n),
                    size = rep(sample(sizes, length(panelID), replace = T), each = n),
                    income = round(runif(100)/10,2),
                    Y_Outcome= round(rnorm(10,100,10),2),
                    sales= round(rnorm(10,10,10),2),
                    happiness = sample(10,10),
                    Sex = round(rnorm(10,0.75,0.3),2),
                    Age = sample(100,100),
                    educ = round(rnorm(10,0.75,0.3),2))        
DT [, uniqueID := .I]                                                         # Creates a unique ID     
DT <- as.data.frame(DT)

To make it slightly easier for someone to help who is not familiar with the packages, I have added how the structure of the two packages I use looks.

The structure of the second stage of ivreg is as follows:

second_stage <- ivreg(Happiness ~ factor:income + factor:sales + educ | urban:income + urban:sales + educ, data=DT)

The structure for ivmodel is:

second_stage<- ivmodel(Y=DT$Happiness,D=DT$factor,Z=DT[,c("urban","size")],X=DT$educ, na.action = na.omit) 

Any help with figuring out how to do this properly would be greatly appreciated!


Get this bounty!!!

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.