## #StackBounty: #regression #econometrics #probit #endogeneity #2sls 2SLS with a boolean regressor

### Bounty: 50

So, I have the following linear model:
$$y = \alpha + \beta x + u$$
where $$x \in \{0,1\}$$, i.e. the variable $$x$$ is boolean. Moreover, $$x$$ may be endogenous, and I have a set of exogenous instrumental variables $$\boldsymbol{z}$$. In this situation one usually runs a simple 2SLS regression and that's it. But I was wondering whether one could instead first regress $$x$$ on $$\boldsymbol{z}$$ through a probit, and then use the fitted values $$\hat{x}$$ as the instrument for $$x$$ in a second-step IV regression.
So I have replaced the OLS regression of the first step with a probit regression.
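
To make the procedure concrete, here is a minimal sketch in R on simulated data. Everything in it is an assumption made for illustration: the data-generating process, the single instrument `z`, the true values `alpha = 1` and `beta = 2`, and all variable names. It fits the probit first stage, then uses the fitted probabilities `x_hat` as the instrument for `x` (not as a replacement for `x` in the second-stage regression), and compares the result with plain OLS.

```r
set.seed(1)
n <- 5000

# Simulated data (hypothetical): x is boolean and endogenous
z <- rnorm(n)                      # exogenous instrument
v <- rnorm(n)                      # first-stage error
u <- 0.8 * v + rnorm(n)            # outcome error, correlated with v
x <- as.integer(0.5 + z + v > 0)   # boolean regressor
y <- 1 + 2 * x + u                 # assumed truth: alpha = 1, beta = 2

# Step 1: probit of x on z, keep the fitted probabilities
probit_fit <- glm(x ~ z, family = binomial(link = "probit"))
x_hat <- fitted(probit_fit)

# Step 2: just-identified IV with x_hat as the instrument for x
beta_iv  <- cov(x_hat, y) / cov(x_hat, x)
alpha_iv <- mean(y) - beta_iv * mean(x)

# For comparison: OLS that ignores the endogeneity of x
beta_ols <- unname(coef(lm(y ~ x))["x"])

print(c(alpha_iv = alpha_iv, beta_iv = beta_iv, beta_ols = beta_ols))
```

In this just-identified case the IV slope reduces to the ratio of covariances used above. A packaged routine such as `ivreg` from `AER` could be called as `ivreg(y ~ x | x_hat)` to obtain the same point estimate together with standard errors.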

Is the result of this kind of two step regression consistent? Does it make sense to do so?

Thanks!


## #StackBounty: #r #categorical-data #interaction #instrumental-variables #2sls A 2SLS when the instrumented variable has two interaction…

### Bounty: 50

I am using `ivreg` and `ivmodel` in `R` to apply a 2SLS.

I would like to instrument one variable, namely $$x_1$$, present in two interaction terms. In this example $$x_1$$ is a factor variable. The regression is specified in this manner because the ratio between $$a$$ and $$b$$ is of importance.

$$y = ax_1 x_2 + bx_1x_3 + cx_4 + e$$

For this instrumented variable I have two instruments, $$z_1$$ and $$z_2$$. For both, the standard IV causal structure applies: Z affects Y only indirectly, through X.

What is the correct way to instrument $$x_1$$ in this problem?

# In the data

Translated to some (fake) sample data the problem looks like:

$$happiness = a(factor:income) + b(factor:sales) + c(educ) + e$$
which corresponds to
$$y = ax_1 x_2 + bx_1x_3 + cx_4 + e$$

Here the instrument $$z_1$$ is `urban` (the `urbanisation` column in the data) and $$z_2$$ is `size`. However, this is where I start to get confused about how to write the regression.

# For the first stage:

What is my dependent variable here?
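
One common way to frame this, offered as a sketch and not a definitive answer, is to treat each interaction term as an endogenous regressor in its own right. Under that assumption there are two first-stage regressions, with $$x_1 x_2$$ and $$x_1 x_3$$ as the dependent variables and the instruments interacted with $$x_2$$ and $$x_3$$ on the right-hand side. The simulated data, the binary stand-in for the factor $$x_1$$, and all names below are hypothetical.

```r
set.seed(42)
n <- 2000

# Simulated data (hypothetical names and coefficients)
z1 <- rnorm(n); z2 <- rnorm(n)                 # instruments
x2 <- rnorm(n, mean = 1)                       # exogenous
x3 <- rnorm(n, mean = 1)                       # exogenous
x4 <- rnorm(n)                                 # exogenous control
v  <- rnorm(n)
x1 <- as.integer(0.7 * z1 + 0.5 * z2 + v > 0)  # binary stand-in for the factor

d <- data.frame(z1, z2, x2, x3, x4, x1)
d$x1x2 <- d$x1 * d$x2   # first endogenous term
d$x1x3 <- d$x1 * d$x3   # second endogenous term

# One first-stage regression per endogenous term, same instrument set in both
fs_x1x2 <- lm(x1x2 ~ z1:x2 + z1:x3 + z2:x2 + z2:x3 + x4, data = d)
fs_x1x3 <- lm(x1x3 ~ z1:x2 + z1:x3 + z2:x2 + z2:x3 + x4, data = d)

# Joint F test of the excluded instruments in the first of the two regressions
fs_restricted <- lm(x1x2 ~ x4, data = d)
print(anova(fs_restricted, fs_x1x2))
```

The F test at the end compares the first stage with and without the interacted instruments, which is one way to check instrument relevance for each endogenous term separately.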

# For the second stage, should I do:

$$happiness = a(urban:income) + b(urban:sales) + c(educ) + e$$
$$happiness = a(size:income) + b(size:sales) + c(educ) + e$$

Or should I just do:

$$happiness = urban(a:income+b:sales) + c(educ) + e$$
$$happiness = size(a:income+b:sales) + c(educ) + e$$

Either way, how should I specify this in `R`?

``````
library(data.table)
library(ivmodel)
library(AER)

# Building blocks for the fake panel
panelID <- 1:50
year <- 2001:2010
country <- c("NLD", "BEL", "GER")
urban <- c("A", "B", "C")
indust <- c("D", "E", "F")
sizes <- 1:5
n <- 2  # observations per panel unit

set.seed(123)
DT <- data.table(
  panelID = rep(sample(panelID), each = n),
  country = rep(sample(country, length(panelID), replace = TRUE), each = n),
  year = c(replicate(length(panelID), sample(year, n))),
  # Length-6 vectors are recycled explicitly, since 6 does not divide the 100 rows
  some_NA = rep_len(sample(0:5, 6), length(panelID) * n),
  Factor = rep_len(sample(0:5, 6), length(panelID) * n),
  industry = rep(sample(indust, length(panelID), replace = TRUE), each = n),
  urbanisation = rep(sample(urban, length(panelID), replace = TRUE), each = n),
  size = rep(sample(sizes, length(panelID), replace = TRUE), each = n),
  income = round(runif(100) / 10, 2),
  Y_Outcome = round(rnorm(10, 100, 10), 2),
  sales = round(rnorm(10, 10, 10), 2),
  happiness = sample(10, 10),
  Sex = round(rnorm(10, 0.75, 0.3), 2),
  Age = sample(100, 100),
  educ = round(rnorm(10, 0.75, 0.3), 2)
)
DT[, uniqueID := .I]  # Creates a unique ID
DT <- as.data.frame(DT)
``````

To make it slightly easier for someone who is not familiar with the packages to help, I have added what the calls to the two packages I use look like.

The structure of the second stage of `ivreg` is as follows:

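``````
# Column names as they appear in DT (`urban` in the text is the `urbanisation` column)
second_stage <- ivreg(happiness ~ Factor:income + Factor:sales + educ |
                        urbanisation:income + urbanisation:sales + educ,
                      data = DT)
``````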
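
With the interaction terms treated as two endogenous regressors (an assumption, not something the packages impose), a full `ivreg` call might look like the sketch below. It runs on simulated data with hypothetical names (`x1` to `x4`, `z1`, `z2`) and does not use `DT`, uses a binary numeric `x1` in place of the factor, and interacts both instruments with `x2` and `x3` on the instrument side of the formula. The true coefficients are assumed values chosen for the simulation.

```r
library(AER)  # provides ivreg()

set.seed(7)
n <- 5000

# Simulated data (hypothetical): x1 is endogenous through v
z1 <- rnorm(n); z2 <- rnorm(n)                 # instruments
x2 <- rnorm(n, mean = 1)
x3 <- rnorm(n, mean = 1)
x4 <- rnorm(n)
v  <- rnorm(n)
u  <- 0.8 * v + rnorm(n)                       # outcome error, correlated with v
x1 <- as.integer(0.7 * z1 + 0.5 * z2 + v > 0)  # binary stand-in for the factor

# Assumed truth: a = 1.5, b = -0.5, c = 0.3 (no intercept in the true model)
y <- 1.5 * x1 * x2 - 0.5 * x1 * x3 + 0.3 * x4 + u
d <- data.frame(y, x1, x2, x3, x4, z1, z2)

# Left of "|": structural equation; right of "|": all instruments,
# including the exogenous regressor x4, which instruments itself
fit <- ivreg(y ~ x1:x2 + x1:x3 + x4 |
               z1:x2 + z1:x3 + z2:x2 + z2:x3 + x4,
             data = d)

print(summary(fit, diagnostics = TRUE))

# The quantity of interest in the question: the ratio a / b
ratio_ab <- unname(coef(fit)["x1:x2"] / coef(fit)["x1:x3"])
print(ratio_ab)
```

`ivreg` includes an intercept by default, so the fitted model here estimates one even though the simulated truth has none. Because the ratio is a nonlinear function of two estimates, its standard error would need a separate step such as the delta method.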

The structure for `ivmodel` is:

``````
second_stage <- ivmodel(Y = DT$happiness, D = DT$Factor,
                        Z = DT[, c("urbanisation", "size")],
                        X = DT$educ, na.action = na.omit)
``````

Any help with figuring out how to do this properly would be greatly appreciated!
