I know that there are many related questions and answers, but each setup, including mine, is so specific with LMMs that it is hard to draw reliable analogies.
The data (available here):

- `subject_id`: Participant ID.
- `rt_start`: Response time (RT) to the stimulus (between 150 and 1000 ms).
- `stim_type`: Type of the displayed stimulus, either probe or control.
- `block_number`: Block (1, 2, 3, or 4). (Different blocks contain different specific stimulus words, but their types are always either probe or control.)
- `trial_number`: Trial, numbered 1–648, with ca. 46% of trials "missing" randomly but relatively evenly throughout (too-slow and incorrect responses, and other, irrelevant trial types excluded from the data, originally mixed together with probe and control stimuli in a balanced sequence).
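For reference, the layout described above can be sketched with a small simulated data frame (toy values only; per the post, the real data has 219 participants and 76,647 retained trials):

```r
# Toy data frame mirroring the described columns (values are made up)
set.seed(1)
toy_dat <- expand.grid(
  subject_id   = factor(1:3),
  trial_number = 1:648
)
toy_dat$block_number <- ceiling(toy_dat$trial_number / 162)  # 4 blocks x 162 trials
toy_dat$stim_type    <- factor(sample(c("probe", "control"),
                                      nrow(toy_dat), replace = TRUE))
toy_dat$rt_start     <- runif(nrow(toy_dat), 150, 1000)      # RTs in 150-1000 ms
```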
It is important that the `stim_type` effect is large and expected: probe RT is on average much larger than control RT. What I want to test is just how much this difference is influenced by learning (a practice/habituation effect) throughout the task, expecting that the difference decreases with time. Crucially, I want to see both (a) whether there is a within-block effect (increasing trial number leads to a decreasing probe–control difference; `stim_type` × `trial_number` interaction), and (b) whether there is a between-block effect (increasing block number leads to a decreasing probe–control difference; `stim_type` × `block_number` interaction).
I’m pretty sure that I need an LMM for this, but I’m not sure exactly how.
Some general relevant points:
- I expect an overall learning effect (generally faster responses), hence both `block_number` and `trial_number` main fixed effects are included, but otherwise they are not variables of interest in themselves.
- Since each trial number occurs in each block (in principle), one could perhaps argue that trials in the first block (1–162) are no different from, e.g., those in the fourth block (1–162 within that block, but 487–648 counting over the entire test). However, for now this is not taken into account.
- I assume that the degree of change is the same for each trial and for each block (i.e., linear effects), hence I treat both as numeric (continuous) variables.
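To make that linearity assumption explicit (and often to help `glmer` convergence), the two predictors can be coerced to numeric and mean-centered. A sketch, with a stand-in data frame in place of the real `lgcit_dat`:

```r
# Stand-in for the real lgcit_dat (only the two predictors are sketched here)
lgcit_dat <- data.frame(
  trial_number = rep(1:648, times = 2),
  block_number = rep(rep(1:4, each = 162), times = 2)
)

# Mean-centering: the intercept and stim_type effect then refer to the
# middle of the task rather than to a hypothetical trial 0 in block 0
lgcit_dat$trial_c <- lgcit_dat$trial_number - mean(lgcit_dat$trial_number)
lgcit_dat$block_c <- lgcit_dat$block_number - mean(lgcit_dat$block_number)
```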
- The general RT as well as the `stim_type` factor (probe vs. control) differs by subject, hence I include `(stim_type | subject_id)`, which should, if I understand correctly, account for both the general RT baseline (random intercept) and the probe–control difference (random "slope" of the dummy variable `stim_type`) per individual.
- I’m using `glmer` with `Gamma(link = "identity")`, as this is recommended for RTs, and it seems to improve the residual problem a little (see below), though still not very well. I had no success with robust alternatives (e.g. `robustlmm` and similar), mostly because they run into errors due to the large sample.
- No matter how I vary the approach (e.g. any of the parameters mentioned above, or even blockwise aggregation with a simple ANOVA), the overall learning effect (e.g. the `stim_type` × `trial_number` interaction) seems robust (at least p < .015 for everything I tried so far), so there seems to be little doubt about it. I even tried `robustlmm::rlmer` with decimated data (to avoid the large-data error), and even so the effects were significant and with similar estimates. What’s more, I got the same results (again very low p values) with another, very similar independent dataset (omitted here for brevity), where the residuals look somewhat better.
```r
library('lme4')

# fitting full model
mlm_full = glmer(
  rt_start ~ stim_type + block_number + trial_number +
    stim_type:block_number + stim_type:trial_number +
    (stim_type | subject_id),
  data = lgcit_dat,
  family = Gamma(link = "identity")
)
```
> Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
>  Family: Gamma ( identity )
> Formula: rt_start ~ stim_type + block_number + trial_number + stim_type:block_number +
>     stim_type:trial_number + (stim_type | subject_id)
>    Data: lgcit_dat
>       AIC       BIC    logLik  deviance  df.resid
>  910966.3  911058.7 -455473.1  910946.3     76637
> Random effects:
>  Groups     Name           Std.Dev. Corr
>  subject_id (Intercept)    20.1051
>             stim_typeprobe 28.2413  0.07
>  Residual                   0.1904
> Number of obs: 76647, groups: subject_id, 219
> Fixed Effects:
>                 (Intercept)               stim_typeprobe                 block_number
>                   514.31118                    115.23858                    -18.92389
>                trial_number  stim_typeprobe:block_number  stim_typeprobe:trial_number
>                    -0.19776                     -3.21046                     -0.05767
```r
# without stim_type:trial_number
mlm_xtrial = glmer(
  rt_start ~ stim_type + block_number + trial_number +
    stim_type:block_number +
    (stim_type | subject_id),
  data = lgcit_dat,
  family = Gamma(link = "identity")
)
```
```r
# without stim_type:block_number
mlm_xblock = glmer(
  rt_start ~ stim_type + block_number + trial_number +
    stim_type:trial_number +
    (stim_type | subject_id),
  data = lgcit_dat,
  family = Gamma(link = "identity")
)
```
```r
# test whether stim_type:trial_number is a significant contributor
aov_trials = anova(mlm_full, mlm_xtrial)
```

>            npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
> mlm_xtrial    9 910973 911056 -455477   910955
> mlm_full     10 910966 911059 -455473   910946 8.3873  1   0.003779 **
```r
# test whether stim_type:block_number is a significant contributor
aov_blocks = anova(mlm_full, mlm_xblock)
```

>            npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
> mlm_xblock    9 910965 911048 -455474   910947
> mlm_full     10 910966 911059 -455473   910946 0.9347  1     0.3336
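As a quick sanity check on the model comparisons, the reported p value follows directly from the χ² distribution with 1 degree of freedom (the Chisq value is twice the log-likelihood difference between the nested models):

```r
# Recompute the LRT p value from the reported statistic (Chisq = 8.3873, Df = 1)
chisq_stat <- 8.3873
p_val <- pchisq(chisq_stat, df = 1, lower.tail = FALSE)
round(p_val, 5)   # approximately 0.00378, matching the Pr(>Chisq) column
```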
As noted above, the residuals are (still) problematic (via `ggpubr::ggqqplot(resid(mlm_full), shape = 1)`):

Which is strange, because the underlying data seem normally distributed, see e.g. (via `lmer`/`profile`/`xyplot`):
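Raw residuals from a Gamma GLMM are not expected to look normal even when the model is fine, so one option (a sketch, not part of the original analysis, assuming the DHARMa package is installed) is a simulation-based residual check:

```r
# Simulation-based residual diagnostics for the fitted Gamma GLMM.
# Assumes DHARMa is installed and mlm_full has been fitted as above.
library(DHARMa)

sim_res <- simulateResiduals(fittedModel = mlm_full, n = 250)
plot(sim_res)              # QQ plot and residual-vs-predicted on the uniform scale
testUniformity(sim_res)    # formal test of the scaled (simulated) residuals
```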
Questions:

1. Is this approach generally correct? In particular: is this how I should include and test the significance of the two critical interactions?
2. Given the very strong and repeated (replicated) evidence, is the assumption violation "kind of" permissible? If not, how can I do it better (in practice, not in theory)?
3. Looking at the full model, would it be correct to say that the probe–control difference decreases significantly by 0.06 ms per trial, and nominally (but without statistical significance) by an additional 3 ms per block? (Note that these coefficients too hardly change depending on the settings.)
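To put the two interaction terms on a comparable scale (simple arithmetic on the full-model estimates above, with 162 trials per block):

```r
# Within-block change in the probe-control difference implied by the trial slope
per_trial <- -0.05767                      # stim_typeprobe:trial_number estimate
per_block_from_trials <- per_trial * 162   # about -9.34 ms over one block of trials
per_block_from_trials
# The block interaction (-3.21046 ms per block) would come on top of this,
# but it is not statistically significant in the LRT above
```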