I have spent a lot of time reading book chapters, articles, online tutorials, etc., but with no clear answer (mostly because they only describe one-way ANOVA or other very specific applications). There have also been many similar questions on this site, but again no satisfactory answer for my purposes.
In essence, I’d like a clear, straightforward (non-technical), completely generalizable, and practically implementable answer on how to test the (in)famous ANOVA normality assumption given any number of within-subject or between-subject factors (with any number of levels).
At least this tutorial advises testing the normality of every single cell, i.e. every possible combination of the levels of all factors, but no references or detailed reasoning are given, and it seems quite extreme for complex designs. Most others (e.g. this answer, this book chapter, or this video tutorial) suggest that only the residuals should be tested (regardless of within/between factors). Even if I assume that the latter is true, the question remains: which residuals should be tested?
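To make the "every cell" option concrete, here is a minimal sketch of what it would involve. The data are simulated, and the response name value, the level labels, and the sample size are all assumptions made only for illustration; the factor names mirror the ones used in my example further down.

```r
# Simulated data (hypothetical): 2 x 2 between-subject and 2 x 2 within-subject design.
set.seed(1)
n_subj <- 40
dat <- expand.grid(
  subject_id = factor(seq_len(n_subj)),
  wthn_X = factor(c("x1", "x2")),
  wthn_Y = factor(c("y1", "y2"))
)
# Between-subject factors are constant within each subject.
subj_num <- as.integer(dat$subject_id)
dat$btwn_X <- factor(ifelse(subj_num %% 2 == 0, "a", "b"))
dat$btwn_Y <- factor(ifelse(subj_num <= n_subj / 2, "c", "d"))
dat$value <- rnorm(nrow(dat), mean = 500, sd = 50)

# One cell per combination of all factor levels: 2 * 2 * 2 * 2 = 16 cells.
cells <- interaction(dat$btwn_X, dat$btwn_Y, dat$wthn_X, dat$wthn_Y)

# Shapiro-Wilk p-value for the raw values in each cell.
cell_p <- tapply(dat$value, cells, function(v) shapiro.test(v)$p.value)
print(round(cell_p, 3))
```

With 16 cells this already means 16 separate tests on small samples, which is part of why the approach seems extreme to me for larger designs.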
In the following, I use the stats::aov output to illustrate some potential answers with an example.
I have a hypothetical dataset. Each individual subject is denoted with "subject_id". There are two between-subject factors: "btwn_X" and "btwn_Y". There are also two within-subject factors: "wthn_X" and "wthn_Y". The aov object aov_obj returns the following:
    Grand Mean: 523.3064

    Stratum 1: subject_id

    Terms:
                       btwn_X    btwn_Y btwn_X:btwn_Y Residuals
    Sum of Squares   393209.0   45184.5        9583.3 1768261.2
    Deg. of Freedom         1         1             1       132

    Residual standard error: 115.7407
    9 out of 12 effects not estimable
    Estimated effects may be unbalanced

    Stratum 2: subject_id:wthn_X

    Terms:
                       wthn_X btwn_X:wthn_X btwn_Y:wthn_X btwn_X:btwn_Y:wthn_X Residuals
    Sum of Squares  273876.35     262325.82        192.19               663.69 199702.58
    Deg. of Freedom         1             1             1                    1       132

    Residual standard error: 38.89599
    4 out of 8 effects not estimable
    Estimated effects may be unbalanced

    Stratum 3: subject_id:wthn_Y

    Terms:
                      wthn_Y btwn_X:wthn_Y btwn_Y:wthn_Y btwn_X:btwn_Y:wthn_Y Residuals
    Sum of Squares   20514.4       27879.9       85348.2              15667.3  325852.8
    Deg. of Freedom        1             1             1                    1       132

    Residual standard error: 49.68483
    4 out of 8 effects not estimable
    Estimated effects may be unbalanced

    Stratum 4: subject_id:wthn_X:wthn_Y

    Terms:
                    wthn_X:wthn_Y btwn_X:wthn_X:wthn_Y btwn_Y:wthn_X:wthn_Y btwn_X:btwn_Y:wthn_X:wthn_Y Residuals
    Sum of Squares        1042.83              1070.27              5202.57                     5791.20  78756.74
    Deg. of Freedom             1                    1                    1                           1       132

    Residual standard error: 24.42626
    Estimated effects may be unbalanced
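For reference, an object with this kind of stratum structure can be obtained roughly as follows. This is only a sketch on simulated data: the response name value, the level labels, the sample size, and the exact Error() specification are assumptions for illustration, not necessarily the call behind the output above.

```r
# Simulated data (hypothetical): 2 x 2 between-subject and 2 x 2 within-subject design.
set.seed(1)
n_subj <- 40
dat <- expand.grid(
  subject_id = factor(seq_len(n_subj)),
  wthn_X = factor(c("x1", "x2")),
  wthn_Y = factor(c("y1", "y2"))
)
# Between-subject factors are constant within each subject.
subj_num <- as.integer(dat$subject_id)
dat$btwn_X <- factor(ifelse(subj_num %% 2 == 0, "a", "b"))
dat$btwn_Y <- factor(ifelse(subj_num <= n_subj / 2, "c", "d"))
dat$value <- rnorm(nrow(dat), mean = 500, sd = 50)

# Mixed-design ANOVA: the Error() term requests separate error strata for
# subjects and for each within-subject effect nested in subjects.
aov_obj <- stats::aov(
  value ~ btwn_X * btwn_Y * wthn_X * wthn_Y +
    Error(subject_id / (wthn_X * wthn_Y)),
  data = dat
)

# The strata appear as named list elements of the fitted object.
print(names(aov_obj))
```

Printing aov_obj itself then gives one block of terms per stratum, in the same layout as the output above.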
I can access the following residuals (see here for more details):
    aov_obj$subject_id$residuals
    aov_obj$`subject_id:wthn_X`$residuals
    aov_obj$`subject_id:wthn_Y`$residuals
    aov_obj$`subject_id:wthn_X:wthn_Y`$residuals
Based on this answer, it would seem that each of these variables should be tested separately for normality. Alternatively, perhaps only aov_obj$`subject_id:wthn_X:wthn_Y`$residuals should be tested. (Then again, perhaps each cell should be tested separately, and the residuals can be ignored.)
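In code, the stratum-wise option would look something like the sketch below. It runs on simulated data with an assumed response name (value) and an assumed Error() specification, and it uses shapiro.test only as one possible normality check. It shows what "testing each stratum separately" would mean in practice; it is not a claim that this is the correct procedure.

```r
# Simulated data (hypothetical): 2 x 2 between-subject and 2 x 2 within-subject design.
set.seed(1)
n_subj <- 40
dat <- expand.grid(
  subject_id = factor(seq_len(n_subj)),
  wthn_X = factor(c("x1", "x2")),
  wthn_Y = factor(c("y1", "y2"))
)
subj_num <- as.integer(dat$subject_id)
dat$btwn_X <- factor(ifelse(subj_num %% 2 == 0, "a", "b"))
dat$btwn_Y <- factor(ifelse(subj_num <= n_subj / 2, "c", "d"))
dat$value <- rnorm(nrow(dat), mean = 500, sd = 50)

aov_obj <- stats::aov(
  value ~ btwn_X * btwn_Y * wthn_X * wthn_Y +
    Error(subject_id / (wthn_X * wthn_Y)),
  data = dat
)

# All error strata except the intercept-only one.
strata <- setdiff(names(aov_obj), "(Intercept)")

# Shapiro-Wilk p-value for the residuals stored in each stratum.
p_by_stratum <- sapply(strata, function(s) {
  shapiro.test(aov_obj[[s]]$residuals)$p.value
})
print(round(p_by_stratum, 3))

# The "last stratum only" alternative uses just one of these vectors.
last_res <- aov_obj[["subject_id:wthn_X:wthn_Y"]]$residuals
qqnorm(last_res)
qqline(last_res)
```

The two options differ only in whether all four p-values (or Q-Q plots) are inspected or just the one for subject_id:wthn_X:wthn_Y.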