#StackBounty: #r #inference #lasso #high-dimensional #selectiveinference Inference for quasibinomial GLM with LASSO penalty using selec…

Bounty: 50

I would like to carry out inference on a binomial LASSO model, but take into account the fact that my data are overdispersed and use the quasibinomial family instead.

The R package selectiveInference, which does inference for LASSO models, only seems to support the binomial family, not quasibinomial.

To get around this, I was wondering whether it would be correct to adjust the z-scores and p-values returned by fixedLassoInf (called with family="binomial") for overdispersion, by dividing the z-scores by the square root of the dispersion coefficient estimated from a quasibinomial GLM that includes the selected variables (or perhaps all variables).
Would this be a correct procedure? If so, how should I recalculate or adjust the returned confidence intervals?
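
To make the proposed adjustment concrete, here is a minimal sketch of the mechanics only. It does not show that the procedure is valid, which is exactly what is being asked. The data are simulated and hypothetical, the selected set `sel` is assumed rather than obtained from a LASSO fit, and the z-scores are the ones printed in the output of this post. Note also that fixedLassoInf bases its p-values and intervals on a truncated normal distribution, so the rescaled z-scores cannot simply be converted back to p-values with pnorm.

```r
# Hypothetical overdispersed binomial data, for illustration only
set.seed(1)
n    <- 200
x    <- matrix(rnorm(n * 3), n, 3)
size <- rep(20, n)                                   # trials per observation
eta  <- 0.5 * x[, 1] - 0.8 * x[, 2] + rnorm(n, sd = 0.7)  # extra noise gives overdispersion
succ <- rbinom(n, size, plogis(eta))

# Refit a quasibinomial GLM on the (assumed) selected columns
sel <- c(1, 2)
fit <- glm(cbind(succ, size - succ) ~ x[, sel], family = quasibinomial)

# Estimated dispersion: Pearson chi-square divided by residual degrees of freedom
phi <- summary(fit)$dispersion

# Z-scores as printed by fixedLassoInf in this post
z <- c(10.710, 16.400, 17.219, 10.588, -5.103)

# Proposed adjustment: shrink each z-score by sqrt(phi)
z_adj <- z / sqrt(phi)

print(phi)
print(z_adj)
```

Whether `phi` should come from the model with only the selected variables or from the full model is left open here; swapping `sel` for `1:ncol(x)` gives the second variant.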

[BTW, the hdi package, which has a similar aim, also doesn't support quasibinomial, and I couldn't readily see how it could be interfaced with the glmmLasso package. If that were possible, overdispersion could perhaps be taken into account using an observation-level random effect; if anyone knows how to do this, please let me know too.]

The output I currently get for my data is:

fixedLassoInf(x, y, beta, lambda, family = "binomial", 
              intercept=TRUE, alpha=0.1, type="partial")
# Var   Coef Z-score P-value LowConfPt UpConfPt LowTailArea UpTailArea
# 2  2.596  10.710       0     2.194    2.995       0.048      0.050
# 3  1.224  16.400       0     1.101    1.348       0.049      0.050
# 5  2.608  17.219       0     2.356    2.857       0.049      0.050
# 7  0.776  10.588       0     0.655    0.897       0.048      0.050
# 8 -1.857  -5.103       0     1.229    2.462       0.050      0.048
