Bounty: 50
I am trying to use a Control Function (CF) / Two Stage Residual Inclusion (2SRI) approach, because the modeled relationship that I am trying to estimate is nonlinear (my dependent variable has a percentage interpretation).
model < glm(y ~ x + z1 + z2, family="quasibinomial", data=df1)
Because my endogenous variable is an ordinal variable, I use an ordinal probit model in the first stage (polr
).
reduced.form < polr(as.ordered(x) ~ z1 + z2, data=df1, method='probit')
For the Control Function (CF) / Two Stage Residual Inclusion (2SRI) approach, I need residuals. But an ordinal probit model does not have any traditional residuals (see also this post).
I used the package sure
(CRAN, R journal), to calculate surrogate residuals of a polr
with resids
(see data below). The package is based on this paper, in the Journal of the American Statistical Association. The problem is that: "Note: Surrogate residuals require sampling from a continuous distribution; consequently, the result will be different with every call to resids."
My problem now, is that my output changes significantly, every time I run the model.
df1$residuals < resids(reduced.form, nsim=1000, method="latent")
var(df1$residuals)
Is different from:
df1$residuals_II < resids(reduced.form, nsim=1000, method="latent")
var(df1$residuals_II)
What are my options to fix my estimation problem?

I thought I could maybe simply run the model 100 times and take the average (see this post), but @gungReinstateMonica commented that that was a "shoehorn it" solution.

I thought of getting different residuals that are stable, but I did not find any (post).

I thought of recoding my ordinal variable to a dummy variable (But according to Peter Flom, that is throwing away information, "Collapsing the variable will only very rarely be correct. It throws away information, and that’s rarely a good thing to do." link).

I thought of using 2SLS instead. But then I would have to argue that the relationship I model is linear. The reason that I am using CF/2SRI, is because including fitted values (which polr
has) into a nonlinear second stage produces inconsistent results (Wooldridge (2010) mentions that you can still use 2SLS in this case, but not mimic it by using fitted values! See chapter 9.5.2 titled "Estimation").

I thouht of using Stata’s ivpois
(in this link Wooldridge explains: "IVPOIS has nothing to do with the Poisson distribution. It is a method of moments procedure for an exponential mean function. It is the most robust of ALL procedures"). But I am not sure if I can then also apply it to my quasibinomial
model (see also this post).
DATA
library(sure) # for residual function and sample data sets
library(MASS) # for polr function
df1 < df1
df1$z1 < df1$x
df1$x < NULL
df1$x < df2$y
df1$z2 < df2$x
df1$y < df3$x/10
mod1 < polr(as.ordered(x) ~ z1 + z2, data=df1, method='probit')
df1$residuals < resids(mod1, nsim=100, method="latent")
mod2 < glm(y ~ x + residuals + z1 + z2, family="quasibinomial", data=df1)
summary(mod2)
References
On 2SRI:
Terza, J. V., Basu, A., & Rathouz, P. J. (2008). Twostage residual inclusion estimation: addressing endogeneity in health econometric modeling. Journal of health economics, 27(3), 531–543. https://doi.org/10.1016/j.jhealeco.2007.09.009
On surrogate residuals:
Dungang Liu & Heping Zhang (2018) Residuals and Diagnostics for Ordinal Regression Models: A Surrogate Approach, Journal of the American Statistical Association, 113:522, 845854, DOI: 10.1080/01621459.2017.1292915
On 2SLS/CF
Wooldridge, J. (2010). Econometric Analysis of Cross Section and Panel Data. Cambridge, Massachusetts; London, England: The MIT Press. Retrieved April 15, 2021, from http://www.jstor.org/stable/j.ctt5hhcfr
Get this bounty!!!