#StackBounty: #r #latent-class Latent Class Analysis: What's the difference between polytomous vs. dichotomous manifest variables?

Bounty: 50

I am using poLCA in R to run a latent class analysis (LCA), and I’d like some help understanding the implications of using polytomous vs. dichotomous manifest variables.

I have three polytomous manifest variables, each with many levels (7, 12, and 14, to be precise). I originally dummy coded them all and ran the LCA on the resulting 33 dichotomous variables, but I'm now re-running it on just the three polytomous variables. The degrees of freedom differ depending on which way I do it (and the fit measures all change too, of course), so these are clearly not the same model expressed two ways, but two completely different models.
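One concrete way to see that these are different models is to count the free parameters. A quick sketch, assuming poLCA's usual parameterization (K − 1 class shares plus, for each class, one multinomial per item with levels − 1 free probabilities):

```r
# Free parameters for a K-class LCA under the standard parameterization:
#   npar = (K - 1) class shares + K * sum(levels - 1) item-response probabilities
levels_poly <- c(7, 12, 14)                      # the three polytomous variables
npar_poly <- function(K) (K - 1) + K * sum(levels_poly - 1)
npar_dich <- function(K) (K - 1) + K * 33        # 33 binary dummies, 1 free prob each
npar_poly(2)  # 61
npar_dich(2)  # 67
```

So even before looking at the likelihood, the two encodings estimate different numbers of parameters for the same nominal information.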

Can someone explain how LCA with polytomous manifest variables differs from LCA on the same information represented in dichotomous variables? Some software requires dichotomous manifest variables, but poLCA can handle polytomous, so I have the option to choose, and I don’t feel like I understand the implications of the choice.
Here’s an example:

> set.seed(24601)
> df <- data.frame(V1=rep(c("a", "b", "c", "d", "e"), 10), 
+                  V2=rep(c("f", "g", "h", "i", "j"), 10), 
+                  V3=rep(c("k", "l", "m", "n", "o"), 10))
> rn <- data.frame(V1=factor(rbinom(1000, 4, .3), labels=c("a", "b", "c", "d", "e")), 
+                  V2=factor(rbinom(1000, 4, .3), labels=c("f", "g", "h", "i", "j")), 
+                  V3=factor(rbinom(1000, 4, .3), labels=c("k", "l", "m", "n", "o")))
> df <- rbind(df, rn) # combine systematic and random data together
> summary(df)
 V1      V2      V3     
 a:234   f:243   k:258  
 b:433   g:437   l:408  
 c:286   h:254   m:280  
 d: 82   i: 91   n: 89  
 e: 15   j: 25   o: 15  
> 

**As polytomous**

> as_poly <- poLCA(cbind(V1, V2, V3) ~ 1, df, nclass=2, maxiter = 10000)
Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 

$V1
               a      b      c      d      e
class 1:  0.2277 0.4202 0.2781 0.0691 0.0049
class 2:  0.0054 0.0649 0.0158 0.4792 0.4347

$V2
               f      g      h      i      j
class 1:  0.2366 0.4244 0.2473 0.0777 0.0139
class 2:  0.0000 0.0506 0.0000 0.4849 0.4645

$V3
               k      l      m      n      o
class 1:  0.2506 0.3973 0.2726 0.0761 0.0034
class 2:  0.0263 0.0000 0.0000 0.4713 0.5024

Estimated class population shares 
 0.9781 0.0219 

Predicted class memberships (by modal posterior prob.) 
 0.9781 0.0219 

Here’s the fit, df, etc.:

========================================================= 
Fit for 2 latent classes: 
========================================================= 
number of observations: 1050 
number of estimated parameters: 25 
residual degrees of freedom: 99 
maximum log-likelihood: -4122.564 

AIC(2): 8295.129
BIC(2): 8419.043
G^2(2): 139.2114 (Likelihood ratio/deviance statistic) 
X^2(2): 139.9829 (Chi-square goodness of fit) 
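The parameter count and residual df reported above can be reproduced by hand. The residual df is consistent with taking min(N, cells − 1) − npar, where "cells" is the number of distinct response patterns — a reading inferred from the output here, not a claim about poLCA internals:

```r
# Reproduce the numbers in the polytomous fit above
K <- 2                                  # latent classes
levs <- c(5, 5, 5)                      # levels of V1, V2, V3
npar <- (K - 1) + K * sum(levs - 1)     # 1 + 2*12 = 25
cells <- prod(levs)                     # 125 possible response patterns
N <- 1050
resid_df <- min(N, cells - 1) - npar    # 124 - 25 = 99
c(npar = npar, resid_df = resid_df)
```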

**As dichotomous**

> library(psych) # for dummy.code()
> library(dplyr) # for pipe and mutate_if
> df_dich <- cbind(dummy.code(df$V1), dummy.code(df$V2), dummy.code(df$V3)) %>% 
+   as.data.frame() %>% 
+   mutate_if(is.numeric, as.factor) 
> 
> as_dich <- poLCA(cbind(a,b,c,d,e,f,g,h,i,j,k,l,m,n,o) ~ 1, df_dich, nclass=2, maxiter = 10000)
Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 

$a
               0      1
class 1:  0.7849 0.2151
class 2:  0.7716 0.2284

$b
               0      1
class 1:  0.5767 0.4233
class 2:  0.5954 0.4046

$c
               0      1
class 1:  0.7208 0.2792
class 2:  0.7325 0.2675

$d
               0      1
class 1:  0.9222 0.0778
class 2:  0.9217 0.0783

$e
               0      1
class 1:  0.9954 0.0046
class 2:  0.9788 0.0212

$f
               0      1
class 1:  1.0000 0.0000
class 2:  0.6036 0.3964

$g
          0 1
class 1:  0 1
class 2:  1 0

$h
               0      1
class 1:  1.0000 0.0000
class 2:  0.5856 0.4144

$i
               0      1
class 1:  1.0000 0.0000
class 2:  0.8515 0.1485

$j
               0      1
class 1:  1.0000 0.0000
class 2:  0.9592 0.0408

$k
               0      1
class 1:  0.7689 0.2311
class 2:  0.7439 0.2561

$l
               0      1
class 1:  0.5652 0.4348
class 2:  0.6444 0.3556

$m
               0      1
class 1:  0.7437 0.2563
class 2:  0.7259 0.2741

$n
               0      1
class 1:  0.9268 0.0732
class 2:  0.9070 0.0930

$o
               0      1
class 1:  0.9954 0.0046
class 2:  0.9788 0.0212

Estimated class population shares 
 0.4162 0.5838 

Predicted class memberships (by modal posterior prob.) 
 0.4162 0.5838 

Here’s the fit, df, etc.:

========================================================= 
Fit for 2 latent classes: 
========================================================= 
number of observations: 1050 
number of estimated parameters: 31 
residual degrees of freedom: 1019 
maximum log-likelihood: -6420.786 

AIC(2): 12903.57
BIC(2): 13057.23
G^2(2): 4735.655 (Likelihood ratio/deviance statistic) 
X^2(2): 315556.7 (Chi-square goodness of fit) 
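The same hand calculation reproduces the dichotomous fit's counts. With 15 binary items the contingency table has 2^15 = 32,768 cells, far more than the 1,050 observations, so under the same min(N, cells − 1) − npar reading the residual df is driven by N rather than by the table size:

```r
# Reproduce the numbers in the dichotomous fit above
K <- 2
n_items <- 15                           # the dummies a, b, ..., o
npar <- (K - 1) + K * n_items           # 1 + 2*15 = 31
cells <- 2^n_items                      # 32768 response patterns
N <- 1050
resid_df <- min(N, cells - 1) - npar    # 1050 - 31 = 1019
c(npar = npar, resid_df = resid_df)
```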

Again, the degrees of freedom differ, so poLCA is clearly fitting two different models (and the fit measures change as well, of course).

