#StackBounty: #r #regression #logistic #goodness-of-fit #instrumental-variables How can I get goodness-of-fit measures for "ivglm"…

Bounty: 50

I am trying to get goodness-of-fit measures, such as R-squared and chi-squared statistics, from the "ivglm" function in the "ivtools" package in R.

However, I could not find a way to get these from its output.

For reference, each variable also has a different number of missing values.

For instance, I run the following code and get the output below.

    reg_X.LZ=glm(reg[,5]+reg[,3]+reg[,6]~reg[,14]+reg[,25]+reg[,15]+reg[,46], data=reg)
    reg_Y.LX=glm(reg[,8]~reg[,5]+reg[,7]+reg[,6]+reg[,3]+reg[,4]+reg[,9]+reg[,10]*reg[,13]+reg[,11]+reg[,12]+reg[,14], data=reg, family=binomial(link="logit"))
    reg_logit=ivglm(estmethod="ts", fitX.LZ=reg_X.LZ, fitY.LX=reg_Y.LX, data=reg, family=binomial(link="logit")) 

    > summary(reg_logit)

    Call:  
    ivglm(estmethod = "ts", fitX.LZ = reg_X.LZ, fitY.LX = reg_Y.LX, 
        data = reg, family = binomial(link = "logit"))

    Coefficients: 
                             Estimate   Std. Error z value Pr(>|z|)    
    (Intercept)              2.582e+00  1.673e+00   1.543 0.122738    
    reg[, 5]                -7.177e-02  4.150e-03 -17.293  < 2e-16 ***
    reg[, 7]                 1.666e+00  1.163e-01  14.331  < 2e-16 ***
    reg[, 6]                -1.339e-01  2.393e-02  -5.596 2.19e-08 ***
    reg[, 3]                -1.678e-04  2.763e-05  -6.075 1.24e-09 ***
    reg[, 4]                 1.016e-01  3.873e-03  26.235  < 2e-16 ***
    reg[, 9]                 2.169e-02  6.504e-03   3.335 0.000854 ***
    reg[, 10]               -2.127e-01  1.870e-01  -1.137 0.255463    
    reg[, 13]               -4.391e+00  1.899e+00  -2.313 0.020721 *  
    reg[, 11]                4.420e-02  1.112e-02   3.976 7.01e-05 ***
    reg[, 12]                3.070e-01  6.807e-02   4.510 6.48e-06 ***
    reg[, 14]                1.919e-01  7.351e-02   2.610 0.009046 ** 
    reg[, 10]:reg[, 13]      4.545e-01  2.138e-01   2.126 0.033488 *  
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

However, I could not find any way to extract goodness-of-fit measures from this output.

I even tried some crude approaches to compute these goodness-of-fit measures, but none of them worked:

    1-pchisq(reg_logit$null.deviance-reg_logit$deviance, reg_logit$df.null-reg_logit$df.residual)
    reg_logit$null.deviance-reg_logit$deviance
    1- reg_logit$deviance/reg_logit$null.deviance
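
As a sanity check, I can at least inspect which components the fitted object actually stores; the slots used above belong to "glm" fits and do not appear to be stored on the "ivglm" object, which seems to be why these calls come back empty. A minimal sketch (noting that the deviances on reg_Y.LX describe the second-stage input model fitted on the observed X, not the two-stage IV fit):

    # List the components stored in the ivglm fit; the null.deviance/deviance
    # slots used above are glm components and do not appear to be stored here.
    names(reg_logit)
    str(reg_logit, max.level = 1)

    # Deviance-based quantities do exist on the glm inputs, e.g. the second-stage
    # model, but that model is fitted on the observed X and is therefore not the
    # two-stage IV fit itself.
    reg_Y.LX$deviance
    reg_Y.LX$null.deviance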

Below I show the alternative methods I tried to get (1) an R-squared value and (2) a chi-squared value, which is where I need your help.

(1) R-squared value: I tried the following crude method to get an R-squared value, replacing the missing values (NAs) with zeros, using the code below:

    lin_pred = reg_logit$est[1] + reg_logit$est[2]*reg[, 5] + reg_logit$est[3]*reg[, 7] + reg_logit$est[4]*reg[, 6] +
      reg_logit$est[5]*reg[, 3] + reg_logit$est[6]*reg[, 4] + reg_logit$est[7]*reg[, 9] + reg_logit$est[8]*reg[, 10] +
      reg_logit$est[9]*reg[, 13] + reg_logit$est[10]*reg[, 11] + reg_logit$est[11]*reg[, 12] + reg_logit$est[12]*reg[, 14] +
      reg_logit$est[13]*reg[, 10]*reg[, 13]
    predicted = 1/(1+exp(-lin_pred))
    y = reg[, 8]
    predicted[is.na(predicted)] <- 0
    1 - sum((y-predicted)^2)/sum((y-mean(y))^2)
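
A variant I am also considering is to compute the same squared-error (Efron-style) pseudo-R-squared on the complete cases only, which avoids the NA-to-zero substitution. This is only a sketch: it assumes the coefficients in reg_logit$est are in the same order as the columns of model.matrix(reg_Y.LX), which matches the order shown in the summary above:

    # Pseudo-R-squared on complete cases only (no NA-to-zero substitution).
    # Assumes reg_logit$est is ordered like the columns of the second-stage
    # design matrix (intercept first, interaction term last).
    X_cc <- model.matrix(reg_Y.LX)                # design matrix, complete cases only
    y_cc <- reg_Y.LX$y                            # observed 0/1 outcome for those rows
    p_cc <- plogis(drop(X_cc %*% reg_logit$est))  # predicted probabilities

    1 - sum((y_cc - p_cc)^2) / sum((y_cc - mean(y_cc))^2)  # Efron's pseudo-R-squared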

However, the R-squared value I get from this is -0.8353449, a negative value. I have read that the R-squared calculation for instrumental-variable regressions should be treated differently (if I am correct), so it may be natural to get a negative R-squared when following the basic formula. The source of this information is below:

https://www.stata.com/support/faqs/statistics/two-stage-least-squares/#example

My main concern is how to avoid the negative R-squared value and obtain a positive one, following the advice in the link above.
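
One related measure that cannot go negative (a rough sketch, not something the ivtools package provides) is the squared correlation between the observed outcome and the fitted probability on the complete cases, reusing y_cc and p_cc from the sketch above:

    # Squared correlation between observed outcome and fitted probability;
    # bounded in [0, 1], unlike the squared-error version above.
    cor(y_cc, p_cc)^2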

(2) Chi-squared value: I again tried a crude method to get a chi-squared statistic for my instrumental-variable logistic regression, since ivglm does not report one by default the way some other regression routines do.

    chi = ((y-predicted)^2)/predicted
    chi[is.na(chi)] <- 0
    chisq_rs = sum(chi)
    p_val = pchisq(chisq_rs, df = 11, lower.tail = FALSE)
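
For comparison, here is a hedged sketch of the textbook Pearson statistic for a binary (Bernoulli) outcome, which divides by the variance p*(1-p) rather than by p alone and uses the number of complete cases minus the number of estimated coefficients as the degrees of freedom, again reusing y_cc and p_cc from the complete-case sketch above:

    # Pearson chi-square for binary outcomes on the complete cases,
    # reusing y_cc and p_cc from the earlier sketch.
    chisq_pearson <- sum((y_cc - p_cc)^2 / (p_cc * (1 - p_cc)))
    df_resid      <- length(y_cc) - length(reg_logit$est)  # n minus number of coefficients
    pchisq(chisq_pearson, df = df_resid, lower.tail = FALSE)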

Here I get a large chi-squared value of 2250.203 and a p-value of 0, which sort of makes sense. Unlike the method above, I did not have to replace the missing values (NAs) with zeros beforehand, since this method still worked. However, because the missing values are handled differently here than in the R-squared calculation above (where I replaced NAs with zeros), I still need help deciding which treatment is correct and how to make the missing-data handling consistent.

I would appreciate any help with this R-squared calculation, along with the chi-squared calculation, as goodness-of-fit measures for my instrumental-variable logistic regression.

