#StackBounty: #r #anova #repeated-measures #residuals #normality-assumption Mixed ANOVA normality: which variables should be tested? (i…

Bounty: 50

I have spent a lot of time reading book chapters, articles, online tutorials, etc., but with no clear answer (mostly because they only describe one-way ANOVA or other very specific applications). There have also been many similar questions on this site, but again no satisfactory answer for my purposes.

In essence, I’d like a clear, straightforward (non-technical), fully generalizable, and practically implementable answer on how to test the (in)famous ANOVA normality assumption for a design with any number of within-subject or between-subject factors (each with any number of levels).

At least this tutorial advises testing the normality of every single cell, i.e. every possible combination of the levels of all factors, but it gives no references or detailed reasoning, and the approach seems quite extreme for complex designs. Most other sources (e.g. this answer, this book chapter, or this video tutorial) suggest that only the residuals should be tested (regardless of within/between factors). Even if I assume that the latter is true, the question remains: which residuals should be tested?

In the following, I use the output of the R function stats::aov to illustrate some potential answers with an example.

I have a hypothetical dataset. Each individual subject is denoted with "subject_id". There are two between-subject factors: "btwn_X" and "btwn_Y". There are also two within-subject factors: "wthn_X" and "wthn_Y". The aov object aov_obj returns the following:

Grand Mean: 523.3064

Stratum 1: subject_id

Terms:
                  btwn_X  btwn_Y btwn_X:btwn_Y Residuals
Sum of Squares  393209.0 45184.5        9583.3 1768261.2
Deg. of Freedom        1       1             1       132

Residual standard error: 115.7407
9 out of 12 effects not estimable
Estimated effects may be unbalanced

Stratum 2: subject_id:wthn_X

Terms:
                   wthn_X btwn_X:wthn_X btwn_Y:wthn_X btwn_X:btwn_Y:wthn_X Residuals
Sum of Squares  273876.35     262325.82        192.19               663.69 199702.58
Deg. of Freedom         1             1             1                    1       132

Residual standard error: 38.89599
4 out of 8 effects not estimable
Estimated effects may be unbalanced

Stratum 3: subject_id:wthn_Y

Terms:
                 wthn_Y btwn_X:wthn_Y btwn_Y:wthn_Y btwn_X:btwn_Y:wthn_Y Residuals
Sum of Squares  20514.4       27879.9       85348.2              15667.3  325852.8
Deg. of Freedom       1             1             1                    1       132

Residual standard error: 49.68483
4 out of 8 effects not estimable
Estimated effects may be unbalanced

Stratum 4: subject_id:wthn_X:wthn_Y

Terms:
                wthn_X:wthn_Y btwn_X:wthn_X:wthn_Y btwn_Y:wthn_X:wthn_Y btwn_X:btwn_Y:wthn_X:wthn_Y Residuals
Sum of Squares        1042.83              1070.27              5202.57                     5791.20  78756.74
Deg. of Freedom             1                    1                    1                           1       132

Residual standard error: 24.42626
Estimated effects may be unbalanced
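
For anyone who wants to reproduce output of this shape, here is a minimal sketch. The data are simulated, so the numbers will not match the ones above. The sample size, level labels, effect sizes and the response name "value" are all hypothetical, and the Error() term is an assumption: it is one specification that yields the same four strata, not necessarily the exact call behind the output shown.

```r
# Simulated stand-in data: every name and number below is hypothetical,
# except the factor names taken from the question.
set.seed(1)
n_subj <- 40

# One row per subject and within-subject cell (2 x 2 within design).
dat <- expand.grid(subject_id = seq_len(n_subj),
                   wthn_X = c("x1", "x2"),
                   wthn_Y = c("y1", "y2"))

# Two crossed between-subject factors, 10 subjects per combination.
dat$btwn_X <- factor(ifelse(dat$subject_id <= n_subj / 2, "a", "b"))
dat$btwn_Y <- factor(ifelse(dat$subject_id %% 2 == 0, "c", "d"))

# Response = grand mean + random subject effect + noise.
subj_effect <- rnorm(n_subj, sd = 100)
dat$value <- 500 + subj_effect[dat$subject_id] + rnorm(nrow(dat), sd = 30)
dat$subject_id <- factor(dat$subject_id)

# Assumed model specification: full factorial fixed effects, with
# error strata for subjects and for each within-subject term.
aov_obj <- aov(value ~ btwn_X * btwn_Y * wthn_X * wthn_Y +
                 Error(subject_id / (wthn_X * wthn_Y)),
               data = dat)

print(names(aov_obj))  # names of the error strata
print(aov_obj)         # prints the per-stratum tables
```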

I can access the following residuals (see here for more details):

aov_obj$subject_id$residuals
aov_obj$`subject_id:wthn_X`$residuals
aov_obj$`subject_id:wthn_Y`$residuals
aov_obj$`subject_id:wthn_X:wthn_Y`$residuals

Based on this answer, it would seem that each of these sets of residuals should be tested separately for normality. Alternatively, perhaps only aov_obj$`subject_id:wthn_X:wthn_Y`$residuals should be tested. (Then again, perhaps each cell should be tested separately, and the residuals can be ignored.)
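
To make the three candidates concrete, here is a sketch of how each could be computed. It only shows the mechanics and does not settle which option is appropriate. The data are simulated with hypothetical names and values (including the response "value"), the Error() term is an assumed specification, and the Shapiro-Wilk test is just one possible normality check (a Q-Q plot via qqnorm would be another).

```r
# Simulated stand-in data (hypothetical sizes, labels and effects).
set.seed(42)
n_subj <- 40
dat <- expand.grid(subject_id = seq_len(n_subj),
                   wthn_X = c("x1", "x2"),
                   wthn_Y = c("y1", "y2"))
dat$btwn_X <- factor(ifelse(dat$subject_id <= n_subj / 2, "a", "b"))
dat$btwn_Y <- factor(ifelse(dat$subject_id %% 2 == 0, "c", "d"))
subj_effect <- rnorm(n_subj, sd = 100)
dat$value <- 500 + subj_effect[dat$subject_id] + rnorm(nrow(dat), sd = 30)
dat$subject_id <- factor(dat$subject_id)

# Assumed model specification giving the four strata from the question.
aov_obj <- aov(value ~ btwn_X * btwn_Y * wthn_X * wthn_Y +
                 Error(subject_id / (wthn_X * wthn_Y)),
               data = dat)

# Candidate 1: test the residuals of every error stratum separately.
strata <- c("subject_id", "subject_id:wthn_X",
            "subject_id:wthn_Y", "subject_id:wthn_X:wthn_Y")
p_by_stratum <- sapply(strata, function(s) {
  shapiro.test(as.numeric(aov_obj[[s]]$residuals))$p.value
})

# Candidate 2: test only the highest-order within-subject stratum.
p_last <- p_by_stratum[["subject_id:wthn_X:wthn_Y"]]

# Candidate 3: ignore residuals and test the raw response in every
# cell, i.e. every combination of all between and within levels.
cell <- interaction(dat$btwn_X, dat$btwn_Y, dat$wthn_X, dat$wthn_Y,
                    drop = TRUE)
p_by_cell <- sapply(split(dat$value, cell),
                    function(v) shapiro.test(v)$p.value)

print(p_by_stratum)
print(p_by_cell)
```

Note how quickly the number of tests grows under the per-cell approach: this 2 x 2 x 2 x 2 design already gives 16 cells, against 4 sets of stratum residuals.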

