#StackBounty: #r #generalized-linear-model #stata #quasi-likelihood How flexible is Stata's ivpois? Could I use it for a (quasi) bi…

Bounty: 50

According to this post on Statalist, Stata’s ivpois (an instrumental-variables approach) is quite flexible and relies on very few assumptions.

The problem mentioned in the post is:

"I have a database with counts as dependent variable. This variable suffers from over-dispersion problem."

Wooldridge mentions:

"I would strongly recommend trying IVPOIS, too. Regrettably, this command name is a misnomer. It should be called something like IVEXPON, as it works for any exponential model with multiplicative error. It does not care whether the EEV is continuous or discrete, so it produces consistent estimators under much weaker assumptions."

My thought was: if it works for any exponential model, does it also work for my model? My model is technically quasibinomial, but according to this fine post, that boils down to a binomial model with robust standard errors.
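
The premise about quasibinomial models can be checked directly in R. Here is a sketch on simulated proportions, with hypothetical data and object names: `quasibinomial` and `binomial` give identical coefficients, and the quasibinomial standard errors are the binomial ones multiplied by sqrt(phi), where phi is the Pearson dispersion estimate. Strictly, that dispersion scaling is not the same thing as sandwich (robust) standard errors, although neither changes the point estimates. The ivpois question also depends on the mean function. Under the logit link, the (quasi)binomial mean is exp(Xb) / (1 + exp(Xb)), not exp(Xb).

```r
# Simulated proportions (hypothetical data, not from the linked post)
set.seed(99)
n <- 500
x <- rnorm(n)
y <- plogis(-0.3 + 0.8 * x + rnorm(n, sd = 0.5))   # values strictly in (0, 1)

fit_qb <- glm(y ~ x, family = quasibinomial)
# binomial() warns about non-integer "successes" for a proportion, but the
# IRLS fit is the same, so the warning is suppressed here
fit_b  <- suppressWarnings(glm(y ~ x, family = binomial))

phi   <- summary(fit_qb)$dispersion                 # Pearson dispersion estimate
se_qb <- sqrt(diag(vcov(fit_qb)))
se_b  <- sqrt(diag(vcov(fit_b)))                    # dispersion fixed at 1

all.equal(coef(fit_qb), coef(fit_b))   # TRUE: identical point estimates
all.equal(se_qb, se_b * sqrt(phi))     # TRUE: SEs differ only by sqrt(phi)
```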

Hence my question: How flexible is Stata’s ivpois? Could I use it for a (quasi) binomial distribution?
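
Wooldridge's description can be made concrete with a minimal base-R sketch. It is a just-identified method-of-moments estimator for an exponential mean with a multiplicative error: y = exp(Xb) * u with E[u | Z] = 1, which gives the moment conditions E[Z'(y * exp(-Xb) - 1)] = 0. This is my reading of the kind of estimator he describes, not ivpois's actual code. `iv_expmean` is a hypothetical name, and the data are simulated. Nothing beyond the mean function and the instrument condition is used, which is where the robustness comes from. The mean is still exp(Xb), though, so a logit (quasibinomial) mean would need different moment conditions.

```r
# Just-identified GMM for an exponential conditional mean with a
# multiplicative error:  y = exp(X b) * u,  E[u | Z] = 1,
# which implies the moment conditions  E[ Z' (y * exp(-X b) - 1) ] = 0.
# Assumes the first column of X is a constant (used for the starting value).
iv_expmean <- function(y, X, Z, tol = 1e-10, maxit = 100) {
  b <- c(log(mean(y)), rep(0, ncol(X) - 1))   # start: intercept-only fit
  converged <- FALSE
  for (it in seq_len(maxit)) {
    w    <- y * exp(-drop(X %*% b))           # y_i * exp(-x_i'b)
    g    <- crossprod(Z, w - 1)               # sample moment vector
    J    <- -crossprod(Z, X * w)              # Jacobian of g with respect to b
    step <- drop(solve(J, g))                 # Newton step
    b    <- b - step
    if (max(abs(step)) < tol) { converged <- TRUE; break }
  }
  if (!converged) warning("Newton iterations did not converge")
  setNames(b, colnames(X))
}

# Simulated check (hypothetical data): x is endogenous because it shares v with u
set.seed(123)
n <- 5000
z <- rnorm(n); v <- rnorm(n)
x <- 0.5 * z + v
u <- exp(0.5 * v - 0.125)                     # E[u] = 1 and u is independent of z
y <- rpois(n, exp(0.5 + 0.3 * x) * u)         # overdispersed counts

X <- cbind(const = 1, x = x)
Z <- cbind(const = 1, z = z)
b_iv    <- iv_expmean(y, X, Z)
b_naive <- coef(glm(y ~ x, family = poisson))

b_iv      # should be close to (0.5, 0.3)
b_naive   # slope pulled toward roughly 0.7 by the endogeneity
```

Only the mean enters the moment conditions, so y can be any non-negative outcome. The Poisson draw here just generates overdispersed counts.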

Background: How to do a Control Function (CF) / Two Stage Residual Inclusion (2SRI) with an ordinal dependent variable in the first stage and a glm in the second


#StackBounty: #r #residuals #2sls #quasi-likelihood #ordered-probit How to do a Control Function (CF) / Two Stage Residual Inclusion (2…

Bounty: 50

I am trying to use a Control Function (CF) / Two Stage Residual Inclusion (2SRI) approach because the relationship I am trying to estimate is non-linear (my dependent variable is a proportion).

model <- glm(y ~ x + z1 + z2, family="quasibinomial", data=df1)

Because my endogenous variable is ordinal, I use an ordered probit model (polr) in the first stage.

reduced.form <- polr(as.ordered(x) ~ z1 + z2, data=df1, method='probit')

For the Control Function (CF) / Two Stage Residual Inclusion (2SRI) approach, I need residuals. But an ordered probit model has no conventional residuals (see also this post).
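
One deterministic candidate is the generalized residual of the ordered probit, sketched here under my own assumptions. It is the conditional mean of the latent error given the observed category. polr parameterizes the model as a latent y* = η + ε with ε ~ N(0, 1), and category k is observed when ζ[k-1] < y* ≤ ζ[k]. Under that parameterization the residual is E[ε | y = k, z] = (φ(ζ[k-1] - η) - φ(ζ[k] - η)) / (Φ(ζ[k] - η) - Φ(ζ[k-1] - η)). The helper `generalized_resid` is hypothetical and is not part of MASS or sure. It uses the `zeta` (cut points) and `lp` (linear predictor) components of a polr fit, and the data are simulated. At the maximum-likelihood estimate these residuals sum to zero and are orthogonal to the first-stage regressors. The last lines check this numerically.

```r
library(MASS)  # polr()

# Closed-form ("generalized") residual for an ordered probit fitted by polr().
# Latent model: y* = eta + eps, eps ~ N(0, 1); category k is observed when
# zeta[k-1] < y* <= zeta[k]. The residual is E[eps | observed category, z].
generalized_resid <- function(fit, y) {
  k    <- as.integer(y)                # observed category, 1..K
  cuts <- c(-Inf, fit$zeta, Inf)       # K + 1 cut points
  lo   <- cuts[k]     - fit$lp         # eps must exceed this ...
  hi   <- cuts[k + 1] - fit$lp         # ... and not exceed this
  (dnorm(lo) - dnorm(hi)) / (pnorm(hi) - pnorm(lo))
}

# Simulated example (hypothetical data, not the poster's df1)
set.seed(1)
n   <- 2000
z1  <- rnorm(n); z2 <- rnorm(n)
eps <- rnorm(n)                        # true latent error
x   <- cut(0.8 * z1 - 0.5 * z2 + eps, c(-Inf, -1, 0, 1, Inf),
           ordered_result = TRUE)
fit <- polr(x ~ z1 + z2, method = "probit")
r   <- generalized_resid(fit, x)       # same values on every call

mean(r)          # close to 0 at the ML estimate
mean(r * z1)     # close to 0: orthogonal to a first-stage regressor
cor(r, eps)      # positively related to the true latent error
```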

I used the sure package (CRAN, R Journal) to calculate surrogate residuals for a polr fit with resids() (see data below). The package is based on this paper in the Journal of the American Statistical Association. The problem is this: "Note: Surrogate residuals require sampling from a continuous distribution; consequently, the result will be different with every call to resids."
My problem now is that my output changes substantially every time I run the model.

df1$residuals <- resids(reduced.form, nsim=1000, method="latent")
var(df1$residuals)

Is different from:

df1$residuals_II <- resids(reduced.form, nsim=1000, method="latent")
var(df1$residuals_II)

What are my options to fix my estimation problem?

  1. I thought I could simply run the model 100 times and take the average (see this post), but @gung-ReinstateMonica commented that this was a "shoehorn it" solution.

  2. I looked for a different type of residual that is stable across runs, but did not find one (post).

  3. I thought of recoding my ordinal variable as a dummy variable, but according to Peter Flom that throws away information: "Collapsing the variable will only very rarely be correct. It throws away information, and that’s rarely a good thing to do." (link).

  4. I thought of using 2SLS instead, but then I would have to argue that the relationship I model is linear. I am using CF/2SRI because plugging fitted values (which polr provides) into a non-linear second stage produces inconsistent estimates (Wooldridge (2010) notes that you can still use 2SLS in this case, but you cannot mimic it by plugging in fitted values! See Section 9.5.2, "Estimation").

  5. I thought of using Stata’s ivpois (in this link Wooldridge explains: "IVPOIS has nothing to do with the Poisson distribution. It is a method of moments procedure for an exponential mean function. It is the most robust of ALL procedures"). But I am not sure whether I can also apply it to my quasibinomial model (see also this post).
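
Options 1 and 2 may lead to the same place. If I read Liu and Zhang's latent construction correctly, a surrogate residual for a probit link is a single draw of ε from a standard normal truncated to the interval implied by the observed category. This is up to sign and centering conventions in sure, which I have not checked. Averaging many such draws then converges to the closed-form mean of that truncated normal, so the averaged residual is a Monte Carlo approximation of a deterministic quantity. The sketch below uses a hand-rolled draw, not `sure::resids`, on simulated data. `draw_surrogate` is a hypothetical helper.

```r
library(MASS)  # polr()

# Simulated ordinal variable (hypothetical data)
set.seed(7)
n   <- 300
z1  <- rnorm(n); z2 <- rnorm(n)
x   <- cut(0.8 * z1 - 0.5 * z2 + rnorm(n), c(-Inf, -1, 0, 1, Inf),
           ordered_result = TRUE)
fit <- polr(x ~ z1 + z2, method = "probit")

# Interval (lo, hi] for the latent error implied by each observed category
k    <- as.integer(x)
cuts <- c(-Inf, fit$zeta, Inf)
lo   <- cuts[k]     - fit$lp
hi   <- cuts[k + 1] - fit$lp

# One surrogate-style draw: eps ~ N(0, 1) truncated to (lo, hi], sampled by
# inverting the normal CDF (a hand-rolled stand-in, not sure::resids)
draw_surrogate <- function() qnorm(runif(n, pnorm(lo), pnorm(hi)))

s1 <- draw_surrogate()
s2 <- draw_surrogate()   # differs from s1, the instability in the question

# Option 1: average many draws per observation
nsim  <- 20000
s_sum <- numeric(n)
for (i in seq_len(nsim)) s_sum <- s_sum + draw_surrogate()
s_bar <- s_sum / nsim

# Closed-form mean of the truncated normal (the generalized residual)
g <- (dnorm(lo) - dnorm(hi)) / (pnorm(hi) - pnorm(lo))

max(abs(s_bar - g))      # Monte Carlo error, shrinking like 1 / sqrt(nsim)
```

If this reading is right, averaging (option 1) and a stable residual (option 2) end up at the same deterministic quantity. That quantity can be computed directly instead of simulated.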

DATA

library(sure) # for residual function and sample data sets
library(MASS) # for polr function

df1 <- df1
df1$z1 <- df1$x
df1$x <- NULL
df1$x <- df2$y
df1$z2 <- df2$x
df1$y <- df3$x/10

mod1 <- polr(as.ordered(x) ~ z1 + z2, data=df1, method='probit')
df1$residuals <- resids(mod1, nsim=100, method="latent")
mod2 <- glm(y ~ x + residuals + z1 + z2, family="quasibinomial", data=df1)
summary(mod2)
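
With a deterministic residual, the whole CF/2SRI pipeline returns the same estimates on every run. This is a minimal sketch under stated assumptions. The data are simulated rather than taken from sure's df1 to df3. z2 is treated as the excluded instrument, a hypothetical choice that gives the second stage an exclusion restriction. `generalized_resid` and `two_stage` are hypothetical helpers. summary() on the second stage ignores that the control function is itself estimated, so the standard errors here come from a pairs bootstrap over both stages.

```r
library(MASS)  # polr()

# Closed-form ordered-probit residual E[eps | observed category, z]
generalized_resid <- function(fit, y) {
  k    <- as.integer(y)
  cuts <- c(-Inf, fit$zeta, Inf)
  lo   <- cuts[k]     - fit$lp
  hi   <- cuts[k + 1] - fit$lp
  (dnorm(lo) - dnorm(hi)) / (pnorm(hi) - pnorm(lo))
}

# Both stages in one function so they can be bootstrapped together
two_stage <- function(d) {
  d$xo <- factor(d$x, ordered = TRUE)                       # ordinal endogenous variable
  fs   <- polr(xo ~ z1 + z2, data = d, method = "probit")   # first stage
  d$cf_resid <- generalized_resid(fs, d$xo)                 # deterministic control function
  coef(glm(y ~ x + z1 + cf_resid, family = quasibinomial, data = d))
}

# Simulated data (hypothetical): x is ordinal 1..4 and shares v with the outcome
set.seed(2021)
n   <- 1500
z1  <- rnorm(n); z2 <- rnorm(n); v <- rnorm(n)
x   <- as.integer(cut(0.9 * z1 - 0.6 * z2 + v, c(-Inf, -1, 0, 1, Inf)))
y   <- plogis(-0.5 + 0.4 * x + 0.2 * z1 + 0.5 * v + rnorm(n, sd = 0.3))
dat <- data.frame(y, x, z1, z2)

est <- two_stage(dat)
identical(est, two_stage(dat))     # no simulation step, so no run-to-run change

# Pairs bootstrap over both stages for standard errors
B    <- 50                         # small for illustration; use more in practice
boot <- t(replicate(B, two_stage(dat[sample.int(n, replace = TRUE), ])))
cbind(estimate = est, boot_se = apply(boot, 2, sd))
```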

References

On 2SRI:

Terza, J. V., Basu, A., & Rathouz, P. J. (2008). Two-stage residual inclusion estimation: Addressing endogeneity in health econometric modeling. Journal of Health Economics, 27(3), 531–543. https://doi.org/10.1016/j.jhealeco.2007.09.009

On surrogate residuals:

Liu, D., & Zhang, H. (2018). Residuals and diagnostics for ordinal regression models: A surrogate approach. Journal of the American Statistical Association, 113(522), 845–854. https://doi.org/10.1080/01621459.2017.1292915

On 2SLS/CF:

Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2nd ed.). Cambridge, MA: The MIT Press. Retrieved April 15, 2021, from http://www.jstor.org/stable/j.ctt5hhcfr


#StackBounty: #count-data #negative-binomial #poisson-regression #endogeneity #quasi-likelihood Testing for endogeneity in a negative b…

Bounty: 150

I’m trying to fit a negative binomial model to my data because the dependent variable exhibits overdispersion. However, one of my reviewers insists that I also test for endogeneity: he or she is worried that two independent variables are potentially endogenous (at least one of them plausibly is…). My question is how one goes about testing for endogeneity in a negative binomial model, ideally in R. Can it be done simultaneously for two variables? I have already found a potential instrument for the more problematic of these two variables (correlated with the endogenous independent variable but uncorrelated with the error term of the outcome equation). I’m just not sure how to proceed from here… I see papers that implement a two-step Heckman procedure, running the negative binomial regression with the inverse Mills ratio. However, I have also read that this might not be appropriate…
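
One common route is the regression-based (control-function) test, which is widely used for exponential-mean models such as Poisson and negative binomial. Regress each suspect variable on the instrument(s) and all exogenous regressors, add the first-stage residuals to the count model, and test whether their coefficients are zero. Under the null of exogeneity the usual z-test is valid. Testing two variables at once needs at least one instrument per variable and a joint Wald test. The sketch below uses simulated data, and `w1`, `w3`, `c1` and the residual names are hypothetical stand-ins for the real instruments and controls.

```r
library(MASS)  # glm.nb(), rnegbin()

# Simulated data (hypothetical): X1 and X3 are endogenous via v1 and v3
set.seed(2024)
n  <- 4000
w1 <- rnorm(n); w3 <- rnorm(n)          # instruments for X1 and X3
c1 <- rnorm(n)                          # an exogenous control
v1 <- rnorm(n); v3 <- rnorm(n)          # first-stage errors
X1 <- 0.8 * w1 + 0.3 * c1 + v1
X3 <- 0.8 * w3 + v3
mu <- exp(0.1 + 0.3 * X1 - 0.2 * X3 + 0.2 * c1 + 0.4 * v1 + 0.3 * v3)
Y  <- rnegbin(n, mu = mu, theta = 1.5)  # overdispersed counts
d  <- data.frame(Y, X1, X3, c1, w1, w3)

# Stage 1: each suspect variable on all instruments and exogenous controls
d$v1_hat <- resid(lm(X1 ~ w1 + w3 + c1, data = d))
d$v3_hat <- resid(lm(X3 ~ w1 + w3 + c1, data = d))

# Stage 2: add the first-stage residuals to the negative binomial model
cf    <- glm.nb(Y ~ X1 + X3 + c1 + v1_hat + v3_hat, data = d)
coefs <- summary(cf)$coefficients
coefs[c("v1_hat", "v3_hat"), ]          # individual tests, H0: variable exogenous

# Joint Wald test that both residual coefficients are zero
b       <- coef(cf)[c("v1_hat", "v3_hat")]
V       <- vcov(cf)[c("v1_hat", "v3_hat"), c("v1_hat", "v3_hat")]
W       <- drop(t(b) %*% solve(V) %*% b)
p_joint <- pchisq(W, df = 2, lower.tail = FALSE)
p_joint
```

With only one instrument, as in the question, only the corresponding residual can be added and tested. If endogeneity is detected, the second-stage standard errors should account for the estimated residuals, for example by bootstrapping both stages together.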

My current model looks like this (in R). I’m pooling three years of data from two countries and am primarily interested in the differences between these two countries. I have 2 control variables and 10 independent variables of interest (X1 to X10, as in the model below); X1 and X3 are the potentially problematic ones. Y is the number of countries in which a firm is present, and the independent variables are things like international experience, international education, board independence, etc. Endogeneity arises, for instance, because international firms might hire people with more international experience/education than their local counterparts.

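library(MASS)  # glm.nb()

# X1*Country expands to X1 + Country + X1:Country (likewise for X2 to X10)
negbin <- glm.nb(Y ~ Control1 + Control2 + Year + Country
                 + X1*Country
                 + X2*Country
                 + X3*Country
                 + X4*Country
                 + X5*Country
                 + X6*Country
                 + X7*Country
                 + X8*Country
                 + X9*Country
                 + X10*Country, data = mydata)
summary(negbin)
car::vif(negbin)  # variance inflation factors (needs the car package)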

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.04651  -1.16581  -0.56598   0.01105   3.00675  

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)    
(Intercept)              1.588771   1.742045   0.912 0.361761    
Control1                 0.240602   0.086086   2.795 0.005191 ** 
Control2                -0.013200   0.003732  -3.537 0.000404 ***
YearThree                0.152904   0.277186   0.552 0.581203    
YearTwo                  0.085071   0.276648   0.308 0.758459    
Country                 -1.899136   2.604823  -0.729 0.465950    
X1                       1.609189   0.652992   2.464 0.013727 *  
X2                       0.146868   0.111476   1.317 0.187674    
X3                      -4.792707   0.748956  -6.399 1.56e-10 ***
X4                       4.352965   0.677561   6.424 1.32e-10 ***
X5                      -0.054561   0.015381  -3.547 0.000389 ***
X6                      -1.497622   0.374987  -3.994 6.50e-05 ***
X7                      -2.689511   0.768235  -3.501 0.000464 ***
X8                      -0.078919   0.069243  -1.140 0.254394    
X9                       4.237630   1.544278   2.744 0.006068 ** 
X10                      3.333337   1.258869   2.648 0.008100 ** 
Country:X1               0.584704   0.992207   0.589 0.555662    
Country:X2              -0.635671   0.332893  -1.910 0.056193 .  
Country:X3               4.508881   0.884777   5.096 3.47e-07 ***
Country:X4              -7.823156   1.411851  -5.541 3.01e-08 ***
Country:X5              -0.003909   0.032332  -0.121 0.903779    
Country:X6               1.001702   0.570836   1.755 0.079294 .  
Country:X7               4.870946   0.991810   4.911 9.05e-07 ***
Country:X8               0.403581   0.100593   4.012 6.02e-05 ***
Country:X9              -2.151496   1.953145  -1.102 0.270655    
Country:X10            -21.951529   4.102211  -5.351 8.74e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

