I am using R to run a two-stage least squares (2SLS) regression.
I would like to instrument one variable, $x_1$, which appears in two interaction terms. In this example $x_1$ is a factor variable. The regression is specified this way because the ratio between $a$ and $b$ is of interest.
$$y = ax_1 x_2 + bx_1x_3 + cx_4 + e$$
For this instrumented variable I have two instruments, $z_1$ and $z_2$. The same causal structure applies to both: $Z$ affects $Y$ only indirectly, through $X$.
What is the correct way to instrument $x_1$ in this setting?
In the data
Translated to some (fake) sample data the problem looks like:
$$happiness = a(factor:income) + b(factor:sales) + c(educ) + e$$
$$(y = ax_1 x_2 + bx_1x_3 + cx_4 + e)$$
where the instrument $z_1$ is urban and $z_2$ is size. This is where I start to get confused about how to write the regression.
For the first stage:
What is my dependent variable here?
For the second stage, should I do:
$$happiness = a(urban:income) + b(urban:sales) + c(educ) + e$$
$$happiness = a(size:income) + b(size:sales) + c(educ) + e$$
Or should I just do:
$$happiness = urban(a:income+b:sales) + c(educ) + e$$
$$happiness = size(a:income+b:sales) + c(educ) + e$$
In either case, how should I specify this in R?
library(data.table)
panelID = c(1:50)
year = c(2001:2010)  # assumed values: `year` was not defined in the original snippet
country = c("NLD", "BEL", "GER")
urban = c("A", "B", "C")
indust = c("D", "E", "F")
sizes = c(1,2,3,4,5)
n <- 2  # observations per panel unit, so 100 rows in total
DT <- data.table(panelID = rep(sample(panelID), each = n),
                 country = rep(sample(country, length(panelID), replace = TRUE), each = n),
                 year = c(replicate(length(panelID), sample(year, n))),
                 some_NA = sample(0:5, 100, replace = TRUE),  # drawn at full length to avoid uneven recycling
                 Factor = sample(0:5, 100, replace = TRUE),
                 industry = rep(sample(indust, length(panelID), replace = TRUE), each = n),
                 urbanisation = rep(sample(urban, length(panelID), replace = TRUE), each = n),
                 size = rep(sample(sizes, length(panelID), replace = TRUE), each = n),
                 income = round(runif(100)/10,2),
                 sales = round(runif(100)/10,2),  # assumed column: used in the formulas but missing from the original snippet
                 happiness = sample(10,10),
                 Sex = round(rnorm(10,0.75,0.3),2),
                 Age = sample(100,100),
                 educ = round(rnorm(10,0.75,0.3),2))
DT[, uniqueID := .I] # Creates a unique ID
DT <- as.data.frame(DT)
To make it easier for someone who is not familiar with the packages to help, I have added the call structure of the two functions I use.
The structure of the second stage of ivreg is as follows:
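library(AER)  # provides ivreg()
# Column names follow the sample data: happiness, Factor, urbanisation
second_stage <- ivreg(happiness ~ Factor:income + Factor:sales + educ |
                        urbanisation:income + urbanisation:sales + educ, data = DT)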
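Written out as equations, my reading of the ivreg specification above is the following. This is only a sketch of my own understanding, not a confirmed answer. It assumes that each interaction term is treated as a separate endogenous regressor, that only $z_1$ is used as an instrument (as in the formula), that all variables are numeric (a factor would expand into one column per level), and that the intercept ivreg adds by default is present. The coefficients $\pi$ and the errors $v_1$, $v_2$ are notation I am introducing here.

First stages, one per interaction term:
$$x_1 x_2 = \pi_{10} + \pi_{11} z_1 x_2 + \pi_{12} z_1 x_3 + \pi_{13} x_4 + v_1$$
$$x_1 x_3 = \pi_{20} + \pi_{21} z_1 x_2 + \pi_{22} z_1 x_3 + \pi_{23} x_4 + v_2$$
Second stage, using the fitted values from the first stages:
$$y = \alpha + a\,\widehat{x_1 x_2} + b\,\widehat{x_1 x_3} + c x_4 + e$$

Under this reading the first-stage dependent variables would be the interaction terms themselves rather than $x_1$ alone, which is exactly the part I am unsure about.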
The structure for ivmodel is as follows:
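library(ivmodel)  # provides ivmodel()
# Column names follow the sample data: happiness, Factor, urbanisation, size
second_stage <- ivmodel(Y = DT$happiness, D = DT$Factor, Z = DT[, c("urbanisation", "size")], X = DT$educ, na.action = na.omit)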
Any help with figuring out how to do this properly would be greatly appreciated!