I am using `poLCA` in R to run a latent class analysis (LCA), and I’d like some help understanding the implications of using polytomous vs. dichotomous manifest variables.

I have three polytomous manifest variables, each with many levels (7, 12, and 14, to be precise). I originally dummy-coded them all and fit the model to the resulting 33 dichotomous variables, but I’m now re-running it with just the three polytomous variables. The degrees of freedom differ depending on which coding I use (as do the fit measures), so these are clearly two different models rather than one model fit two ways.
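To make concrete what the dummy coding does to a single variable, here is a small sketch on made-up data. The vector `v` is hypothetical, and the sketch uses base R's `model.matrix` rather than `psych::dummy.code`, so treat it as an illustration rather than my actual pipeline.

```r
# Hypothetical five-level variable, unrelated to my real data
set.seed(1)
v <- factor(sample(c("a", "b", "c", "d", "e"), 200, replace = TRUE))

# One 0/1 indicator column per level (no intercept, so no reference level is dropped)
dummies <- model.matrix(~ v - 1)

# Every row contains exactly one 1: the indicators are mutually exclusive
row_totals <- rowSums(dummies)
print(table(row_totals))

# Distinct response patterns actually observed across the five indicators
patterns <- unique(dummies)
n_patterns <- nrow(patterns)
print(n_patterns)   # cannot exceed the number of levels
print(2^ncol(dummies))  # patterns five unrelated binary items could produce
```

Each row has exactly one 1, so at most 5 of the 32 patterns that five unrelated binary items could produce ever occur. As far as I can tell, nothing in the dichotomous specification tells the model about that constraint.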

**Can someone explain how LCA with polytomous manifest variables differs from LCA on the same information represented in dichotomous variables?** Some software requires dichotomous manifest variables, but `poLCA` can handle polytomous ones, so I have the option to choose, and I don’t feel like I understand the implications of the choice.

Here’s an example:

```
> library(poLCA) # for poLCA()
> set.seed(24601)
> df <- data.frame(V1=rep(c("a", "b", "c", "d", "e"), 10),
+ V2=rep(c("f", "g", "h", "i", "j"), 10),
+ V3=rep(c("k", "l", "m", "n", "o"), 10))
> rn <- data.frame(V1=factor(rbinom(1000, 4, .3), labels=c("a", "b", "c", "d", "e")),
+ V2=factor(rbinom(1000, 4, .3), labels=c("f", "g", "h", "i", "j")),
+ V3=factor(rbinom(1000, 4, .3), labels=c("k", "l", "m", "n", "o")))
> df <- rbind(df, rn) # combine systematic and random data together
> summary(df)
V1 V2 V3
a:234 f:243 k:258
b:433 g:437 l:408
c:286 h:254 m:280
d: 82 i: 91 n: 89
e: 15 j: 25 o: 15
>
```

### as polytomous

```
> as_poly <- poLCA(cbind(V1, V2, V3) ~ 1, df, nclass=2, maxiter = 10000)
Conditional item response (column) probabilities,
by outcome variable, for each class (row)
$V1
a b c d e
class 1: 0.2277 0.4202 0.2781 0.0691 0.0049
class 2: 0.0054 0.0649 0.0158 0.4792 0.4347
$V2
f g h i j
class 1: 0.2366 0.4244 0.2473 0.0777 0.0139
class 2: 0.0000 0.0506 0.0000 0.4849 0.4645
$V3
k l m n o
class 1: 0.2506 0.3973 0.2726 0.0761 0.0034
class 2: 0.0263 0.0000 0.0000 0.4713 0.5024
Estimated class population shares
0.9781 0.0219
Predicted class memberships (by modal posterior prob.)
0.9781 0.0219
```

Here’s the fit, df, etc.:

```
=========================================================
Fit for 2 latent classes:
=========================================================
number of observations: 1050
number of estimated parameters: 25
residual degrees of freedom: 99
maximum log-likelihood: -4122.564
AIC(2): 8295.129
BIC(2): 8419.043
G^2(2): 139.2114 (Likelihood ratio/deviance statistic)
X^2(2): 139.9829 (Chi-square goodness of fit)
```

### as dichotomous

```
> library(psych) # for dummy.code()
> library(dplyr) # for pipe and mutate_if
> df_dich <- cbind(dummy.code(df$V1), dummy.code(df$V2), dummy.code(df$V3)) %>%
+ as.data.frame() %>%
+ mutate_if(is.numeric, as.factor)
>
> as_dich <- poLCA(cbind(a,b,c,d,e,f,g,h,i,j,k,l,m,n,o) ~ 1, df_dich, nclass=2, maxiter = 10000)
Conditional item response (column) probabilities,
by outcome variable, for each class (row)
$a
0 1
class 1: 0.7849 0.2151
class 2: 0.7716 0.2284
$b
0 1
class 1: 0.5767 0.4233
class 2: 0.5954 0.4046
$c
0 1
class 1: 0.7208 0.2792
class 2: 0.7325 0.2675
$d
0 1
class 1: 0.9222 0.0778
class 2: 0.9217 0.0783
$e
0 1
class 1: 0.9954 0.0046
class 2: 0.9788 0.0212
$f
0 1
class 1: 1.0000 0.0000
class 2: 0.6036 0.3964
$g
0 1
class 1: 0 1
class 2: 1 0
$h
0 1
class 1: 1.0000 0.0000
class 2: 0.5856 0.4144
$i
0 1
class 1: 1.0000 0.0000
class 2: 0.8515 0.1485
$j
0 1
class 1: 1.0000 0.0000
class 2: 0.9592 0.0408
$k
0 1
class 1: 0.7689 0.2311
class 2: 0.7439 0.2561
$l
0 1
class 1: 0.5652 0.4348
class 2: 0.6444 0.3556
$m
0 1
class 1: 0.7437 0.2563
class 2: 0.7259 0.2741
$n
0 1
class 1: 0.9268 0.0732
class 2: 0.9070 0.0930
$o
0 1
class 1: 0.9954 0.0046
class 2: 0.9788 0.0212
Estimated class population shares
0.4162 0.5838
Predicted class memberships (by modal posterior prob.)
0.4162 0.5838
```

Here’s the fit, df, etc.:

```
=========================================================
Fit for 2 latent classes:
=========================================================
number of observations: 1050
number of estimated parameters: 31
residual degrees of freedom: 1019
maximum log-likelihood: -6420.786
AIC(2): 12903.57
BIC(2): 13057.23
G^2(2): 4735.655 (Likelihood ratio/deviance statistic)
X^2(2): 315556.7 (Chi-square goodness of fit)
```

You’ll notice that the residual degrees of freedom change (99 vs. 1019), as do the number of estimated parameters (25 vs. 31) and every measure of fit, so it’s clearly running two different models.
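As a sanity check, the parameter counts and residual degrees of freedom in the two summaries above can be reproduced by hand. This is a minimal sketch in base R: `lca_counts` is a hypothetical helper, not a `poLCA` function, and the rule that the degrees of freedom are capped at the number of observations is my assumption, inferred from the 1019 reported for the dichotomous run.

```r
# Hypothetical helper (not part of poLCA): free parameters and residual df
# for an LCA with no covariates.
lca_counts <- function(levels_per_item, nclass, n_obs) {
  # (K_j - 1) free response probabilities per item per class,
  # plus (nclass - 1) free class shares
  npar <- nclass * sum(levels_per_item - 1) + (nclass - 1)
  # number of cells in the full cross-tabulation of the items
  cells <- prod(levels_per_item)
  # ASSUMPTION: df are capped at the number of observations
  resid_df <- min(n_obs, cells - 1) - npar
  c(npar = npar, cells = cells, resid_df = resid_df)
}

# Three 5-level items vs. fifteen 2-level items, as in the example
poly <- lca_counts(c(5, 5, 5), nclass = 2, n_obs = 1050)
dich <- lca_counts(rep(2, 15), nclass = 2, n_obs = 1050)

print(poly)
print(dich)
```

Under these assumptions the polytomous run has 25 parameters over a table of 125 cells (124 - 25 = 99 df), while the dichotomous run has 31 parameters over a table of 32768 cells, with the df limited by the 1050 observations (1050 - 31 = 1019). Both match the output above, so the difference seems to come from the size of the implied contingency table rather than from the handful of extra parameters.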

