I am using R to apply 2SLS.
I would like to instrument one variable, namely $x_1$, present in two interaction terms. In this example $x_1$ is a factor variable. The regression is specified in this manner because the ratio between $a$ and $b$ is of importance.
$$y = ax_1 x_2 + bx_1x_3 + cx_4 + e$$
For this instrumented variable I have two instruments, $z_1$ and $z_2$. The following causal diagram applies to both ($Z$ affects $Y$ only indirectly, through $X$).
What is the correct way to instrument $x_1$ in this setting?
In the data
Translated to some (fake) sample data the problem looks like:
$$happiness = a(factor:income) + b(factor:sales) + c(educ) + e$$
$$(y = ax_1 x_2 + bx_1x_3 + cx_4 + e)$$
where the instrument $z_1$ is `urban` and $z_2$ is `size`. Here, however, I start to get confused about how to write the regression.
For the first stage:
What is my dependent variable here?
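My tentative guess, which I have not been able to confirm, is that each interaction term containing $x_1$ becomes its own first-stage dependent variable, with the instruments interacted in the same way. The coefficients $\pi_{jk}$ and errors $v_j$ are names I made up for this sketch, and $x_1$ is written as a single numeric variable for simplicity:

$$x_1 x_2 = \pi_{11} z_1 x_2 + \pi_{12} z_1 x_3 + \pi_{13} z_2 x_2 + \pi_{14} z_2 x_3 + \pi_{15} x_4 + v_1$$
$$x_1 x_3 = \pi_{21} z_1 x_2 + \pi_{22} z_1 x_3 + \pi_{23} z_2 x_2 + \pi_{24} z_2 x_3 + \pi_{25} x_4 + v_2$$

If that is right, the second stage would keep the original form, with the fitted values in place of the interactions:

$$y = a\,\widehat{x_1 x_2} + b\,\widehat{x_1 x_3} + c x_4 + e$$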
For the second stage, should I do:
$$happiness = a(urban:income) + b(urban:sales) + c(educ) + e$$
$$happiness = a(size:income) + b(size:sales) + c(educ) + e$$
Or should I just do:
$$happiness = urban(a:income+b:sales) + c(educ) + e$$
$$happiness = size(a:income+b:sales) + c(educ) + e$$
Either way, how should I specify this in R? Sample data:
    library(data.table)
    library(ivmodel)
    library(AER)

    panelID <- c(1:50)
    year    <- c(2001:2010)
    country <- c("NLD", "BEL", "GER")
    urban   <- c("A", "B", "C")
    indust  <- c("D", "E", "F")
    sizes   <- c(1, 2, 3, 4, 5)
    n <- 2

    set.seed(123)
    DT <- data.table(
      panelID      = rep(sample(panelID), each = n),
      country      = rep(sample(country, length(panelID), replace = TRUE), each = n),
      year         = c(replicate(length(panelID), sample(year, n))),
      some_NA      = sample(0:5, 6),
      Factor       = sample(0:5, 6),
      industry     = rep(sample(indust, length(panelID), replace = TRUE), each = n),
      urbanisation = rep(sample(urban, length(panelID), replace = TRUE), each = n),
      size         = rep(sample(sizes, length(panelID), replace = TRUE), each = n),
      income       = round(runif(100) / 10, 2),
      Y_Outcome    = round(rnorm(10, 100, 10), 2),
      sales        = round(rnorm(10, 10, 10), 2),
      happiness    = sample(10, 10),
      Sex          = round(rnorm(10, 0.75, 0.3), 2),
      Age          = sample(100, 100),
      educ         = round(rnorm(10, 0.75, 0.3), 2)
    )
    DT[, uniqueID := .I]  # Creates a unique ID
    DT <- as.data.frame(DT)
To make it slightly easier for someone who is not familiar with these packages to help, I have added the call structure of the two functions I use.
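The structure of the second stage in `ivreg` is as follows (column names as in `DT`, where the factor is `Factor` and the `urban` instrument is stored as `urbanisation`):

    second_stage <- ivreg(happiness ~ Factor:income + Factor:sales + educ |
                            urbanisation:income + urbanisation:sales + educ,
                          data = DT)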
The structure for `ivmodel` is:

    second_stage <- ivmodel(Y = DT$happiness, D = DT$Factor,
                            Z = DT[, c("urbanisation", "size")],
                            X = DT$educ, na.action = na.omit)
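To make the question concrete, here is a minimal base-R sketch of what I think a manual 2SLS would look like if the two interaction terms, rather than $x_1$ itself, are treated as the endogenous regressors. This is an assumption on my part, not a confirmed answer. The simulated data, the coefficients, and all names (`sim`, `x1x2`, `fs1`, `fs2`, `second_stage_manual`) are hypothetical and unrelated to `DT`, and $x_1$ is numeric here rather than a factor.

```r
# Hypothetical simulated data: every name and coefficient below is made up
# for illustration and is unrelated to DT.
set.seed(1)
n  <- 5000
z1 <- rnorm(n)                             # instrument 1
z2 <- rnorm(n)                             # instrument 2
x2 <- rnorm(n, mean = 1)                   # exogenous, interacted with x1
x3 <- rnorm(n, mean = 1)                   # exogenous, interacted with x1
x4 <- rnorm(n)                             # exogenous control
u  <- rnorm(n)                             # unobserved confounder
x1 <- 0.8 * z1 + 0.5 * z2 + u + rnorm(n)   # endogenous: depends on u
y  <- 2 * x1 * x2 + 1 * x1 * x3 + 0.5 * x4 + u + rnorm(n)

sim <- data.frame(y, x1, x2, x3, x4, z1, z2)

# Assumption: the endogenous regressors are the interaction terms themselves.
sim$x1x2 <- sim$x1 * sim$x2
sim$x1x3 <- sim$x1 * sim$x3

# First stage: one regression per endogenous term, on the instruments
# interacted with x2 and x3 plus the exogenous control x4.
fs1 <- lm(x1x2 ~ z1:x2 + z1:x3 + z2:x2 + z2:x3 + x4, data = sim)
fs2 <- lm(x1x3 ~ z1:x2 + z1:x3 + z2:x2 + z2:x3 + x4, data = sim)
sim$x1x2_hat <- fitted(fs1)
sim$x1x3_hat <- fitted(fs2)

# Second stage: replace the endogenous terms with their fitted values.
# The standard errors from this lm() call are NOT valid 2SLS standard errors.
second_stage_manual <- lm(y ~ x1x2_hat + x1x3_hat + x4, data = sim)
print(coef(second_stage_manual))

# If this framing is right, the AER equivalent would presumably be:
# ivreg(y ~ x1:x2 + x1:x3 + x4 | z1:x2 + z1:x3 + z2:x2 + z2:x3 + x4, data = sim)
```

The manual two-step version is only meant to show which variable sits on the left-hand side of each stage. For actual inference I would still want to use `ivreg` or `ivmodel`, since they compute the standard errors correctly.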
Any help with figuring out how to do this properly would be greatly appreciated!