## #StackBounty: #r #generalized-linear-model #stata #quasi-likelihood How flexible is Stata's ivpois? Could I use it for a (quasi) bi…

### Bounty: 50

According to this post on Statalist, Stata's `ivpois` (an instrumental variable approach) is pretty flexible and requires very few assumptions.

The problem mentioned in the post is:

"I have a database with counts as dependent variable. This variable suffers from over-dispersion problem."

Wooldridge mentions:

"I would strongly recommend trying IVPOIS, too. Regrettably, this command name is a misnomer. It should be called something like IVEXPON, as it works for any exponential model with multiplicative error. It does not care whether the EEV is continuous or discrete, so it produces consistent estimators under much weaker assumptions."

My thought was, if it works for any exponential model, does that mean it also works for my model, which is technically a `quasibinomial`, but according to this fine post, that boils down to a `binomial` with robust standard errors?

Hence my question: How flexible is Stata’s ivpois? Could I use it for a (quasi) binomial distribution?
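
For reference, here is how I understand the "exponential model with multiplicative error" in the quote. The notation ($$x$$ for the regressors, $$z$$ for the instruments, $$u$$ for the error) is my own assumption and is not taken from the `ivpois` documentation:

$$y = \exp(x'\beta)\,u, \qquad E[u \mid z] = 1 \quad\Longrightarrow\quad E\left[z\left(\frac{y}{\exp(x'\beta)} - 1\right)\right] = 0$$

Written this way, the moment condition restricts only the conditional mean, which has to be of the exponential (log-link) form; it says nothing about the distribution of $$y$$.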


## #StackBounty: #nonparametric #stata Which one is the correct specification to estimate Nonparametric regressions with discrete and cont…

### Bounty: 50

I am trying to manually implement nonparametric regression by local-linear estimation with a mix of discrete and continuous regressors.
Consider a simple model:
$$y=f(xc,xd)$$
where xc is continuous and xd is discrete.

Say that I want to estimate this model nonparametrically. Which of the following two regressions is the correct one (assuming local-linear estimation)?

1:
$$y=a0+a1*(xc-c)+e$$
2:
$$y=a0+a1*(xc-c)+a2*xd +e$$

Assume that both models are estimated using the correct kernel weights and that xd is a dummy.

I thought the correct model was (1), but `npregress` in Stata uses (2). Which one is correct?

Thank you

EDIT:
Perhaps a different way to ask the same question.
Say that you have three variables, y, xc (continuous) and xd (discrete), and that you want to estimate the following nonparametrically, using local-linear kernel estimation:
$$y=f(xc,xd)$$
Empirically, how would you estimate this model using WLS? Which one is the correct specification, equation 1 or equation 2 (assuming the weights are obtained appropriately)?
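
To make the two candidates concrete, here is a minimal Python sketch (standard library only) of both as kernel-weighted least squares. Everything in it is an assumption for illustration: the product kernel (Gaussian in xc, weight 1 for the same xd cell and `lam` for the other cell), the bandwidths `h` and `lam`, the helper names, and the noise-free toy data. It is not meant to reproduce what `npregress` does internally.

```python
import math

def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def wls(X, y, w):
    """Weighted least squares: solve (X'WX) beta = X'Wy."""
    p = len(X[0])
    xtwx = [[sum(wi * r[a] * r[b] for r, wi in zip(X, w)) for b in range(p)]
            for a in range(p)]
    xtwy = [sum(wi * r[a] * yi for r, yi, wi in zip(X, y, w)) for a in range(p)]
    return solve(xtwx, xtwy)

def weights(xc, xd, c, d, h, lam):
    """Assumed product kernel: Gaussian in xc; 1 if xd == d, else lam."""
    return [math.exp(-0.5 * ((a - c) / h) ** 2) * (1.0 if b == d else lam)
            for a, b in zip(xc, xd)]

def fit_eq1(y, xc, xd, c, d, h, lam):
    """Equation 1: y = a0 + a1*(xc - c) + e, with kernel weights."""
    X = [[1.0, a - c] for a in xc]
    return wls(X, y, weights(xc, xd, c, d, h, lam))

def fit_eq2(y, xc, xd, c, d, h, lam):
    """Equation 2: y = a0 + a1*(xc - c) + a2*xd + e, with the same weights."""
    X = [[1.0, a - c, float(b)] for a, b in zip(xc, xd)]
    return wls(X, y, weights(xc, xd, c, d, h, lam))

# Noise-free toy data (made up): y = 1 + 2*xc + 3*xd, so f(1, 1) = 6.
xc = [i / 10 for i in range(21) for _ in (0, 1)]
xd = [d for _ in range(21) for d in (0, 1)]
y = [1 + 2 * a + 3 * b for a, b in zip(xc, xd)]

b1_cell = fit_eq1(y, xc, xd, c=1.0, d=1, h=0.5, lam=0.0)    # only the xd == 1 cell
b1_smooth = fit_eq1(y, xc, xd, c=1.0, d=1, h=0.5, lam=0.5)  # both cells, no xd term
b2 = fit_eq2(y, xc, xd, c=1.0, d=1, h=0.5, lam=0.5)         # both cells, with xd term

print("eq. 1, lam = 0.0, fit at (1, 1):", b1_cell[0])
print("eq. 1, lam = 0.5, fit at (1, 1):", b1_smooth[0])
print("eq. 2, lam = 0.5, fit at (1, 1):", b2[0] + b2[2] * 1)
```

In this toy data, equation 1 recovers f(1, 1) when only the matching cell gets weight, but averages across the two cells once `lam` is positive, while equation 2 absorbs the cell difference in a2. With `lam = 0` the xd column is constant among the observations that receive positive weight, so equation 2 is not identified there, which is why the sketch only runs it with `lam = 0.5`.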


## #StackBounty: #r #stata #instrumental-variables #endogeneity #hausman What are the differences between tests for overidentification in …

### Bounty: 50

I am using 2SLS for my research and I want to test for overidentification. I started out with the Hausman test, of which I have a reasonable grasp.

The problem is that the Hausman test and the Sargan test give me very different results.

The Sargan test is done by `ivmodel` from `library(ivmodel)`. I copied the Hausman test from “Using R for Introductory Econometrics” page 226, by Florian Heiss.

```
[1] "############################################################"
[1] "***Hausman Test for Overidentification***"
[1] "############################################################"
[1] "***R2***"
[1] 0.0031
[1] "***Number of observations (nobs)***"
[1] 8937
[1] "***nobs*R2***"
[1] 28
[1] "***p-value***"
[1] 0.00000015

Sargan Test Result:

Sargan Test Statistics=0.31, df=1, p-value is 0.6
```

On top of this I am also using `ivtobit` from Stata, which provides a Wald test of exogeneity.

Lastly, I read about a fourth test, the Hansen J statistic.

What is the difference between all of these tests?
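
As a sanity check on the figures above, both reported p-values can be recomputed from the statistics themselves. The minimal Python sketch below assumes each statistic is compared against a chi-square distribution with one degree of freedom: the Sargan output states df=1, while for the nobs*R2 statistic this is my assumption. The function name `chi2_sf_df1` is made up for this example, and because the printed R2 is rounded, the result is only approximate.

```python
import math

def chi2_sf_df1(stat):
    """Upper-tail p-value of a chi-square(1) statistic: P(Z^2 > stat) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))

# Figures copied from the output above (R2 is rounded there).
nobs, r2 = 8937, 0.0031
nr2_stat = nobs * r2

p_nr2 = chi2_sf_df1(nr2_stat)   # df = 1 is an assumption for this test
p_sargan = chi2_sf_df1(0.31)    # df = 1 as printed by the Sargan output

print(f"nobs*R2 = {nr2_stat:.1f}, p = {p_nr2:.2g}")
print(f"Sargan  = 0.31, p = {p_sargan:.2g}")
```

If both p-values reproduce, the gap lies in the statistics themselves (28 versus 0.31) rather than in how they are converted into p-values.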


## #StackBounty: #19.10 #libpng #plot #stata stata very slow to plot graphs in ubuntu 19.10 (gnome)

### Bounty: 50

Those who have tried to use Stata under Linux have probably already run into issues with the icons and the libpng12 library. I got those fixed, so Stata starts normally and displays the icons, but it takes a very long time to draw a scatter plot with some 150,000 observations.

• I first thought it was an issue with Stata itself, but it works fine under Manjaro (Arch-based, with GNOME 3.34.1).

• I also tried multiple editions of Stata (MP, SE) and multiple versions (14.2, 15, 15.1), and the results are the same across all of them: it works as expected in Manjaro and is very slow in Ubuntu.

• Other than plotting, Stata seems to run just as fast as on other systems.
• I then suspected the libpng12 library. I installed it in several ways: using the binary from the Arch repository, from this PPA as in this tutorial, and manually from xenial's repo.
• I also tried elementary OS (5.1) and Ubuntu 16.04, with the same result, so the problem is not specific to 19.10.

Does anyone have any idea what else I could try?

• CPU: i5 5300U
• GPU: Intel HD Graphics 5500 (I think; definitely no dedicated chip)
• RAM: 16 GB
• Comment: on the same computer, it works fine on Windows (and on another, older computer with a 2nd-gen i5, it works fine on Manjaro).
• If I `set graph off`, the plot is created at normal speed, but exporting it to PNG takes a very long time (exporting to EPS is rather quick).

• While plotting, the Xorg process goes to 90~100% CPU usage (of one core). The same happens under Wayland (with the XWayland process instead), even if the graph is not being displayed (the window is hidden, or graph is set off and the plot is only being exported).
• It affects several versions of Stata (14, 15.1, 16).


## #StackBounty: #statistical-significance #stata #fixed-effects-model #clustered-standard-errors Variance-covariance matrix of individual…

### Bounty: 50

For certain reasons, I am interested in the variance-covariance matrix of the individual fixed effects when regressing wages on personal characteristics:
$$Y_{i,t} = X_{i,t} \times \beta + c_i + \epsilon_{i,t}$$ where $$c_i$$ stands for the individual fixed effects and $$\epsilon_{i,t}$$ stands for the perturbations.

1/ Quite surprisingly (to me), when clustering the standard errors (SE) by individual, the variance associated with the coefficients of the individual fixed effects is drastically reduced. For instance, in the simple case without covariates, it goes from $$0.19…$$ to $$2.07 \times 10^{-14}$$.

2/ Moreover, this still holds when I run a Monte Carlo simulation (with i.i.d. draws for the perturbations $$\epsilon_{i,t}$$). Hence, clustering is obviously a bad idea for my purpose. But why?

Note 1: The data and code (in Stata) are available here: https://sites.google.com/view/acazenave-lacroutz/stackexchange_question1

Note 2: I am aware that Stata adjusts the standard errors for small samples. This explains why the SEs of $$\beta$$ I get with `reg` differ from those I get with `xtreg`, but it cannot explain such a difference for the SEs of the $$c_i$$.

Note 3: Point 2 seems to rule out the usual explanation that clustered standard errors can be smaller than unclustered ones because of intraclass correlation (e.g. cluster-robust standard errors are smaller than unclustered ones in fgls with cluster fixed effects).
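
A small numerical sketch may help isolate the mechanism in the no-covariate case. It relies on two facts that hold for OLS on a full set of individual dummies: each estimated fixed effect is that individual's mean, and the residuals sum to zero within each individual. The cluster-robust "meat" is built from within-cluster sums of regressor times residual, which for an individual's own dummy is exactly that residual sum. The panel below is made up, and the formulas deliberately omit Stata's finite-sample corrections (an assumption made for simplicity), so this is an illustration in Python rather than a replication of the Stata output.

```python
# Toy balanced panel (made-up numbers): 3 individuals, 4 periods, no covariates.
y = {
    "a": [1.0, 2.0, 4.0, 3.0],
    "b": [5.0, 7.0, 6.0, 8.0],
    "c": [0.5, 1.5, 1.0, 2.0],
}

# OLS on individual dummies only: each fixed effect is the individual's mean.
c_hat = {i: sum(v) / len(v) for i, v in y.items()}
resid = {i: [yit - c_hat[i] for yit in v] for i, v in y.items()}

# Classical (homoskedastic) variance of each fixed effect: s^2 / T_i.
n_obs = sum(len(v) for v in y.values())
n_ind = len(y)
s2 = sum(e * e for v in resid.values() for e in v) / (n_obs - n_ind)
classical_var = {i: s2 / len(y[i]) for i in y}

# Cluster-robust sandwich, clustering by individual, without finite-sample factors.
# Bread: (X'X)^{-1} = diag(1 / T_i). Meat: diag(score_i^2), where score_i is the
# within-individual sum of residuals, which the OLS normal equations force to zero.
score = {i: sum(v) for i, v in resid.items()}
cluster_var = {i: (score[i] / len(y[i])) ** 2 for i in y}

for i in y:
    print(i, "classical:", classical_var[i], "clustered:", cluster_var[i])
```

Up to floating-point error the clustered variance is zero here, whatever the true error variance is, which is at least consistent with the tiny value reported in point 1 and with the Monte Carlo result in point 2.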
