#StackBounty: #r #repeated-measures #lme4-nlme #binomial #glmm Accounting for time in repeated measures glmm, R

Bounty: 50

I have count data on advanced-stage juvenile snails in tanks that are sampled every 4 days, for 4 sample points. I want to see how much the snail development stages change with a changing dosage of algae (3 tanks per dosage). So the proportion of advanced juvenile snails out of the total snails (Count) is the response, dosage of algae is a fixed continuous predictor, and tank is included as a random factor.

Here is a sample of the data:

 head(dat)
  Day Tank Dose      Count
1   1    1  100 0.00000000
2   1    8  100 0.08333333
3   1   13  100 0.07692308
4   1   17   75 0.00000000
5   1    4   75 0.00000000
6   1    5   75 0.00000000

Importantly, my data also include a time variable (Day), since the tanks were repeatedly sampled every 4 days. This sampling was destructive, meaning the tank population decreased each time. The developmental stage of the snails also increases with time and (obviously) depends on the stages at the previous time points.

How can I account for this repeated-measures aspect?
The code below is what I have so far, but as I read it, it says I am looking for an interaction between Dose and Day, and I'm not sure that is what I want. The effect of algal Dose is the primary focus here.

glmer(Count ~ Dose * Day + (1 | Tank), data = dat, family = "poisson")
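
Since Count is really a proportion of the total snails, I have also been looking at a binomial error structure. Here is a minimal sketch of what I mean, assuming hypothetical raw-count columns Advanced (number of advanced juveniles) and Total (snails sampled from the tank that day), which are not in the data shown above:

library(lme4)

# Binomial GLMM on the raw counts behind the proportion:
# Day enters as a fixed covariate, and each tank gets a random intercept.
# Advanced and Total are assumed column names, for illustration only.
m <- glmer(cbind(Advanced, Total - Advanced) ~ Dose + Day + (1 | Tank),
           data = dat, family = binomial)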

Any advice would be very much appreciated.


Get this bounty!!!

#StackBounty: #r #repeated-measures #lme4-nlme #binomial #glmm Accounting for time in repeated measures glm, R

Bounty: 50

I have count data on advanced adult snails in tanks that are sampled every 4 days, for 4 sample points. I want to see how much the snail development stages change with a changing dosage of algae (3 tanks per dosage). So the number of advanced adult snails is the response, dosage of algae is a fixed continuous predictor, and tank is included as a random factor.

Here is a sample of the data:

 head(dat)
  Day Tank Dose      Count
1   1    1  100 0.00000000
2   1    8  100 0.08333333
3   1   13  100 0.07692308
4   1   17   75 0.00000000
5   1    4   75 0.00000000
6   1    5   75 0.00000000

This question had a similar experimental design to mine, and based on the answer of Ben Bolker, I have decided to go with a quasibinomial model. However, my data also include a time variable (Day), since the tanks were repeatedly sampled every 4 days. This sampling was destructive, meaning the tank population decreased each time. The developmental stage of the snails also increases with time and (obviously) depends on the stages at the previous time points.

How can I account for this repeated-measures aspect?
The code below is what I have so far, but as I read it, it says I am looking for an interaction between Dose and Day, and I'm not sure that is what I want. The effect of algal Dose is the primary focus here.

library(plyr)    # for ddply()
library(scales)  # assuming rescale() here is scales::rescale()

dat <- ddply(dat, .(Dose), transform, rescale = rescale(Count))

glm(rescale ~ Dose * Day + (1 | Tank), data = dat, family = "quasibinomial")
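
One thing I have since realised is that glm() does not accept random-effect terms like (1 | Tank), so the call above cannot run as written. Here is a minimal sketch of the direction I am considering instead, a quasibinomial mixed model fit by penalized quasi-likelihood with MASS::glmmPQL (an assumption on my part, not a settled choice):

library(MASS)  # provides glmmPQL()

# Quasibinomial GLMM via penalized quasi-likelihood:
# fixed Dose-by-Day effects, random intercept per tank (nlme-style formula).
m <- glmmPQL(rescale ~ Dose * Day, random = ~ 1 | Tank,
             data = dat, family = quasibinomial)
summary(m)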

Any advice would be very much appreciated.


Get this bounty!!!

#StackBounty: #r #regression #time-series #multiple-regression #python Multiple time-series symbolic regression

Bounty: 50

I have a few columns of technical data as time series, let's say column a, column b, and column c. I want to find out the impact of both a and b on c.

If I search for these keywords I find

  • the pandas corr function, which computes (several kinds of) correlation coefficients, but only between two columns at a time.
  • models like ARMA or ARIMA, where the first “A” stands for autoregressive, i.e. a regression of a column on its own time-lagged values.

So, what I am looking for is a kind of symbolic regression (or something similar) that can regress one time-series column on several others.
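
To make that concrete, here is a minimal sketch of the simplest version I can think of, a distributed-lag multiple regression in R (df and the one-step lags are illustrative assumptions; true symbolic regression would also search over functional forms):

# df is assumed to be a data frame with numeric time-series columns a, b, c.
df$a_lag1 <- c(NA, head(df$a, -1))  # a shifted by one time step
df$b_lag1 <- c(NA, head(df$b, -1))  # b shifted by one time step

# Regress c on a, b and their lags; each coefficient estimates that column's impact on c.
fit <- lm(c ~ a + b + a_lag1 + b_lag1, data = df)
summary(fit)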


Get this bounty!!!

#StackBounty: #r #latent-class Latent Class Analysis: What's the difference between polytomous vs. dichotomous manifest variables?

Bounty: 50

I am using poLCA in R to run a latent class analysis (LCA), and I’d like some help understanding the implications of using polytomous vs. dichotomous manifest variables.

I have three polytomous manifest variables, each with many levels (7, 12, and 14, to be precise). I originally dummy-coded them all and ran the analysis on the resulting 33 dichotomous variables, but I'm re-running it now as just three polytomous variables. The degrees of freedom for the model are different depending on how I do this (and measures of fit, etc. all change too, of course), so it's clearly not running the same model two ways, but two completely different models.

Can someone explain how LCA with polytomous manifest variables differs from LCA on the same information represented in dichotomous variables? Some software requires dichotomous manifest variables, but poLCA can handle polytomous, so I have the option to choose, and I don’t feel like I understand the implications of the choice.
Here’s an example:

> set.seed(24601)
> df <- data.frame(V1=rep(c("a", "b", "c", "d", "e"), 10), 
+                  V2=rep(c("f", "g", "h", "i", "j"), 10), 
+                  V3=rep(c("k", "l", "m", "n", "o"), 10))
> rn <- data.frame(V1=factor(rbinom(1000, 4, .3), labels=c("a", "b", "c", "d", "e")), 
+                  V2=factor(rbinom(1000, 4, .3), labels=c("f", "g", "h", "i", "j")), 
+                  V3=factor(rbinom(1000, 4, .3), labels=c("k", "l", "m", "n", "o")))
> df <- rbind(df, rn) # combine systematic and random data together
> summary(df)
 V1      V2      V3     
 a:234   f:243   k:258  
 b:433   g:437   l:408  
 c:286   h:254   m:280  
 d: 82   i: 91   n: 89  
 e: 15   j: 25   o: 15  
> 

as polytomous

> as_poly <- poLCA(cbind(V1, V2, V3) ~ 1, df, nclass=2, maxiter = 10000)
Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 

$V1
               a      b      c      d      e
class 1:  0.2277 0.4202 0.2781 0.0691 0.0049
class 2:  0.0054 0.0649 0.0158 0.4792 0.4347

$V2
               f      g      h      i      j
class 1:  0.2366 0.4244 0.2473 0.0777 0.0139
class 2:  0.0000 0.0506 0.0000 0.4849 0.4645

$V3
               k      l      m      n      o
class 1:  0.2506 0.3973 0.2726 0.0761 0.0034
class 2:  0.0263 0.0000 0.0000 0.4713 0.5024

Estimated class population shares 
 0.9781 0.0219 

Predicted class memberships (by modal posterior prob.) 
 0.9781 0.0219 

Here’s the fit, df, etc.:

========================================================= 
Fit for 2 latent classes: 
========================================================= 
number of observations: 1050 
number of estimated parameters: 25 
residual degrees of freedom: 99 
maximum log-likelihood: -4122.564 

AIC(2): 8295.129
BIC(2): 8419.043
G^2(2): 139.2114 (Likelihood ratio/deviance statistic) 
X^2(2): 139.9829 (Chi-square goodness of fit) 

as dichotomous

> library(psych) # for dummy.code()
> library(dplyr) # for pipe and mutate_if
> df_dich <- cbind(dummy.code(df$V1), dummy.code(df$V2), dummy.code(df$V3)) %>% 
+   as.data.frame() %>% 
+   mutate_if(is.numeric, as.factor) 
> 
> as_dich <- poLCA(cbind(a,b,c,d,e,f,g,h,i,j,k,l,m,n,o) ~ 1, df_dich, nclass=2, maxiter = 10000)
Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 

$a
               0      1
class 1:  0.7849 0.2151
class 2:  0.7716 0.2284

$b
               0      1
class 1:  0.5767 0.4233
class 2:  0.5954 0.4046

$c
               0      1
class 1:  0.7208 0.2792
class 2:  0.7325 0.2675

$d
               0      1
class 1:  0.9222 0.0778
class 2:  0.9217 0.0783

$e
               0      1
class 1:  0.9954 0.0046
class 2:  0.9788 0.0212

$f
               0      1
class 1:  1.0000 0.0000
class 2:  0.6036 0.3964

$g
          0 1
class 1:  0 1
class 2:  1 0

$h
               0      1
class 1:  1.0000 0.0000
class 2:  0.5856 0.4144

$i
               0      1
class 1:  1.0000 0.0000
class 2:  0.8515 0.1485

$j
               0      1
class 1:  1.0000 0.0000
class 2:  0.9592 0.0408

$k
               0      1
class 1:  0.7689 0.2311
class 2:  0.7439 0.2561

$l
               0      1
class 1:  0.5652 0.4348
class 2:  0.6444 0.3556

$m
               0      1
class 1:  0.7437 0.2563
class 2:  0.7259 0.2741

$n
               0      1
class 1:  0.9268 0.0732
class 2:  0.9070 0.0930

$o
               0      1
class 1:  0.9954 0.0046
class 2:  0.9788 0.0212

Estimated class population shares 
 0.4162 0.5838 

Predicted class memberships (by modal posterior prob.) 
 0.4162 0.5838 

Here’s the fit, df, etc.:

========================================================= 
Fit for 2 latent classes: 
========================================================= 
number of observations: 1050 
number of estimated parameters: 31 
residual degrees of freedom: 1019 
maximum log-likelihood: -6420.786 

AIC(2): 12903.57
BIC(2): 13057.23
G^2(2): 4735.655 (Likelihood ratio/deviance statistic) 
X^2(2): 315556.7 (Chi-square goodness of fit) 

You’ll notice the degrees of freedom change, so it’s clearly running two different models (and measures of fit, etc. change as well, of course).
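
As a sanity check, the parameter counts at least line up with the standard LCA bookkeeping, where C is the number of classes and K_j the number of categories of item j: npar = C * sum_j (K_j - 1) + (C - 1).

C <- 2
C * sum(c(5, 5, 5) - 1) + (C - 1)   # 25, matching the polytomous fit
C * sum(rep(2, 15) - 1) + (C - 1)   # 31, matching the dichotomous fit

So the dichotomous recoding not only adds parameters, it also changes the underlying cross-classification table (2^15 cells instead of 5^3 = 125), which presumably is why the residual degrees of freedom and fit statistics differ so much.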


Get this bounty!!!
