#StackBounty: #lme4-nlme #random-effects-model #non-independent Estimating expected values for correlated data using random effects mod…

Bounty: 100

Statement of the problem: in a study, continuous and dichotomous variables were measured for both eyes for 60 individuals. The researchers needs estimates of expected values (means and proportions) for those measurements for all 60 subjects across bot eyes. In order to do this, the 120 eyes from the 60 subjects must be used to provide a pooled estimate.

The proposed random effects models to achieve this are as follows:

$E(y_{ij})=mu+alpha_j+epsilon_{ij}$

and

$Logit(p_{ij})=gamma+alpha_j$

Where $mu$ is the overall mean for a continuous variable $y_{ij}$, $gamma$ is the overall log odds of the probability for the dichotomous variables, $alpha_j, epsilon {ij}$ are random, uncorrelated random effects with normal distributions ($alpha_j sim N(0,sigma{gamma}), ; epsilon_{ij} sim N(0,sigma_{epsilon}), Cov(alpha_j,epsilon_{ij})=0$). Index $j$ stands for subject and index $i$ stands for eye nested in the subject.

A more complex nested random effects model could be appropriate, however, for the sake of simplicity will be ignored.

I have made a github project with the data and code in R to do this (https://github.com/nmolanog/glmer_question).

Now I present the main issue of this post: for dichotomous variables I am observing huge differences in estimates ignoring correlation of eyes nested in subjects vs estimates provided by the random effect models. Those differences are so important, that researchers are questioning and mistrusting the approach and its results. For continuous variables the differences in estimates are almost none existent and (as expected) the main differences are found in confidence intervals, where random effects models provides wider CI’s (see figure).
Estimates and confidence intervals for random effects models and classical estimation procedures

See for example variables M and N, the differences between approaches are huge. In the github repo I explored a nested random effects models for variable K obtaining very similar results to those provided by the simpler random effects model.

How those differences could be explained? Is there any problem with the approach?

Update-Sample code:

###estimate proportion for variable K using glm
mk_glm<-glm(K~1,data = ldf, family = binomial(link = "logit"))
mk_glm_ci<-inv.logit(confint(mk_glm))

##arrange result from glm model
(res_df<-data.frame(method="glm",estimate=inv.logit(mk_glm$coefficients),LCI=mk_glm_ci[1],UCI=mk_glm_ci[2]))

#compare to  raw estimate:
ldf$K%>%table()%>%{.[2]/sum(.)}

###estimate proportion for variable K using glmer model 1
mk_glmer<-glmer(K~1+(1|Id),data = ldf, family = binomial(link = "logit"),control=glmerControl(optimizer = "bobyqa"),nAGQ = 20)
mk_glmer_ci<-confint(mk_glmer)
#add result to res_df
(res_df<-rbind(res_df,data.frame(method="glmer",estimate=inv.logit(fixef(mk_glmer)),LCI=inv.logit(mk_glmer_ci[2,1]),UCI=inv.logit(mk_glmer_ci[2,2]))))

###estimate proportion for variable K using glmer model 2, nested random effects
mk_glmer_2<-glmer(K~1+(1|Id/eye),data = ldf, family = binomial(link = "logit"),control=glmerControl(optimizer = "bobyqa"))
mk_glmer_2_ci<-confint(mk_glmer_2)
(res_df<-rbind(res_df,data.frame(method="glmer2",estimate=inv.logit(fixef(mk_glmer_2)),LCI=inv.logit(mk_glmer_2_ci[3,1]),UCI=inv.logit(mk_glmer_2_ci[3,2]))))

Outuput

             method  estimate       LCI       UCI
(Intercept)     glm 0.7083333 0.6231951 0.7846716
(Intercept)1  glmer 0.9230166 0.7399146 0.9990011
(Intercept)2 glmer2 0.9999539 0.9991883 0.9999995

The dataset and code can be found in https://github.com/nmolanog/glmer_question


Get this bounty!!!

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.