*Bounty: 50*

*Bounty: 50*

I am not sure I fully understand how mixed-effects models (such as mixed-effects PK/PD models) can be used for forecasting.

**Some notations**

Let $p in mathbb{N}$ with $p geq 2$. We assume that for each individual $i in lbrace 1,ldots,p rbrace$, we have $k_i in mathbb{N}^{ast}$ scalar observations $(y_{i,j})*{1 leq j leq k_i}$** obtained at times $(t*{i,j})*{1 leq j leq k_i}$. Therefore, for each individual, the observations are $left( y*{i,j}, t_{i,j} right)_{1 leq j leq k_i}$. We also assume the following model:

$$ y_{i,j} = fleft( t_{i,j}, b_i, theta right) + varepsilon_{i,j} $$

where $theta$ is a vector of parameters which contains *fixed effects* and *variance-covariance parameters* ; $b_i$ is a vector of *individual random effects* ; $f$ is sometimes called the *structural model* ; $varepsilon_{i,j}$ is the observation noise. We assume that:

$$ b_i sim mathcal{N}left( 0, mathbf{D} right), quad text{and} quad varepsilon_i = begin{bmatrix} varepsilon_{i,1} \ vdots \ varepsilon_{i, k_i} end{bmatrix} sim mathcal{N}left( 0, mathbf{Sigma} right). $$

The individual random effects $b_i$ are assumed i.i.d. and independent from $varepsilon_i$.

**The question**

Given $left( y_{i,j}, t_{i,j} right)_{substack{1 leq i leq p \ 1 leq j leq k_i}}$, one can obtain an estimate $hat{theta}$ of the model parameters $theta$ (which contain the unique coefficients in $mathbf{D}$ and $mathbf{Sigma}$) by maximizing the model likelihood. This can be done, for instance, using stochastic versions of the EM algorithm (see link above).

Assume that $hat{theta}$ is available.

If we are given some observations $y_{s}^{mathrm{new}}$ for a new individual $s notin lbrace 1, ldots, p rbrace$, its individual random effects are estimated by:

$$ widehat{b_s} = mathop{mathrm{argmax}} limits_{b_s} pleft( b_s mid y_{s}^{mathrm{new}}, hat{theta} right) $$

where $pleft( cdot mid y_{s}^{mathrm{new}}, hat{theta} right)$ is the *posterior* distribution of the random effects given the new observations $y_{s}^{mathrm{new}}$ and the point estimate of the model parameters $hat{theta}$. Thanks to Bayes’ theorem, this is equivalent to maximizing the product "likelihood $times$ prior:

$$ widehat{b_s} = mathop{mathrm{argmax}} limits_{b_s} pleft( y_{s}^{mathrm{new}} mid b_{s}, hat{theta} right) pleft( b_{s} mid hat{theta} right). $$

Now, if $t , longmapsto , f(t, cdot, cdot)$ is a continuous function of time, we may call it a *growth curve*. It describes the evolution of the measurements with time. Let $i_{0} in lbrace 1, ldots, p rbrace$ and $t$ such that $t_{i_{0},1} < ldots < t_{i_{0},k_i} < t$.

How can we use this mixed-effects model to predict the most likely value $y_{i_{0}}^{ast}$ for individual $i_{0}$ at time $t$? This relates to forecasting since we want to predict the measurement value at a future time.

Naively, I would do as follows. Given $left( y_{i,j}, t_{i,j} right)*{substack{1 leq i leq p \ 1 leq j leq k_i}}$**, I would estimate $hat{theta}$ (we estimate the model parameters using all the data including the past observations for individual $i*{0}$). Then I would estimate $widehat{b_{i_{0}}}$ as described above. Eventually, I would say that:

$$ y_{i_{0}}^{ast} = fleft( t, widehat{b_{i_{0}}}, hat{theta} right). $$

If this is right, I don’t see how I would prove it mathematically. Still, I’m feeling like I’m missing something because this predicted value $y_{i_{0}}^{ast}$ does not take into account the noise distribution. Also, I do not see how I would be able to estimate CIs for $y_{i_{0}}^{ast}$ with this.

In a Bayesian setting (with a prior distribution on $theta$), would I need to use the posterior predictive distribution (see this post and these notes)? From what I understand, if $y_{i_{0}}$ denotes the vector of the past observations for individual $i_{0}$, this posterior predictive distribution is given by:

$$ pleft( y_{i_{0}}^{ast} mid y_{i_{0}} right) = int_{Theta} pleft( y_{i_{0}}^{ast} mid theta, y_{i_{0}} right) pleft( theta mid y_{i_{0}} right) , dtheta. $$

However, I’m not sure it applies here and I’m not sure where the random effects come in.

Any reference, explanation, hint,… is welcome ! 🙂