# #StackBounty: #regression #mixed-model #forecasting #references #random-effects-model Predictions and forecasting with mixed-effects mo…

### Bounty: 50

I am not sure I fully understand how mixed-effects models (such as mixed-effects PK/PD models) can be used for forecasting.

Some notation

Let $$p \in \mathbb{N}$$ with $$p \geq 2$$. We assume that for each individual $$i \in \lbrace 1, \ldots, p \rbrace$$, we have $$k_i \in \mathbb{N}^{\ast}$$ scalar observations $$(y_{i,j})_{1 \leq j \leq k_i}$$ obtained at times $$(t_{i,j})_{1 \leq j \leq k_i}$$. Therefore, for each individual, the observations are $$\left( y_{i,j}, t_{i,j} \right)_{1 \leq j \leq k_i}$$. We also assume the following model:

$$y_{i,j} = f\left( t_{i,j}, b_i, \theta \right) + \varepsilon_{i,j}$$

where $$\theta$$ is a vector of parameters containing the fixed effects and the variance-covariance parameters; $$b_i$$ is a vector of individual random effects; $$f$$ is sometimes called the structural model; $$\varepsilon_{i,j}$$ is the observation noise. We assume that:

$$b_i \sim \mathcal{N}\left( 0, \mathbf{D} \right), \quad \text{and} \quad \varepsilon_i = \begin{bmatrix} \varepsilon_{i,1} \\ \vdots \\ \varepsilon_{i,k_i} \end{bmatrix} \sim \mathcal{N}\left( 0, \mathbf{\Sigma} \right).$$

The individual random effects $$b_i$$ are assumed i.i.d. and independent of $$\varepsilon_i$$.

The question

Given $$\left( y_{i,j}, t_{i,j} \right)_{\substack{1 \leq i \leq p \\ 1 \leq j \leq k_i}}$$, one can obtain an estimate $$\hat{\theta}$$ of the model parameters $$\theta$$ (which include the unique coefficients of $$\mathbf{D}$$ and $$\mathbf{\Sigma}$$) by maximizing the model likelihood. This can be done, for instance, using stochastic versions of the EM algorithm (see link above).
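To make the likelihood maximization concrete, here is a minimal sketch for a linear special case $$f(t, b, \theta) = (\theta + b)\, t$$, where the marginal distribution of $$y_i$$ is Gaussian in closed form and $$\theta$$ can be estimated by direct numerical optimization (no EM needed). All numerical values, and the choice to hold the variance parameters fixed, are illustrative assumptions, not part of the question:

```python
# Sketch: ML estimation of theta in the linear special case
# f(t, b, theta) = (theta + b) * t, so that marginally
# y_i ~ N(theta * t_i, d * t_i t_i^T + sigma^2 * I).
# The variance parameters (d, sigma^2) are held fixed for brevity.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
d_true, sigma2_true, theta_true = 0.09, 0.04, 1.5  # assumed true values

# simulate p individuals, each observed at the same k times
p, k = 40, 5
t = np.tile(np.linspace(0.5, 2.5, k), (p, 1))
b = rng.normal(0.0, np.sqrt(d_true), (p, 1))
y = (theta_true + b) * t + rng.normal(0.0, np.sqrt(sigma2_true), (p, k))

def neg_log_lik(theta):
    # negative marginal log-likelihood, summed over individuals
    total = 0.0
    for i in range(p):
        cov = d_true * np.outer(t[i], t[i]) + sigma2_true * np.eye(k)
        total -= multivariate_normal.logpdf(y[i], mean=theta * t[i], cov=cov)
    return total

theta_hat = minimize_scalar(neg_log_lik, bounds=(0.0, 3.0), method="bounded").x
print(theta_hat)  # should be close to theta_true
```

In the general nonlinear case the marginal likelihood has no closed form, which is exactly why SAEM-type algorithms are used instead.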

Assume that $$\hat{\theta}$$ is available.

If we are given some observations $$y_{s}^{\mathrm{new}}$$ for a new individual $$s \notin \lbrace 1, \ldots, p \rbrace$$, its individual random effects are estimated by:

$$\widehat{b_s} = \mathop{\mathrm{argmax}} \limits_{b_s} \, p\left( b_s \mid y_{s}^{\mathrm{new}}, \hat{\theta} \right)$$

where $$p\left( \cdot \mid y_{s}^{\mathrm{new}}, \hat{\theta} \right)$$ is the posterior distribution of the random effects given the new observations $$y_{s}^{\mathrm{new}}$$ and the point estimate of the model parameters $$\hat{\theta}$$. Thanks to Bayes' theorem, this is equivalent to maximizing the product "likelihood $$\times$$ prior":

$$\widehat{b_s} = \mathop{\mathrm{argmax}} \limits_{b_s} \, p\left( y_{s}^{\mathrm{new}} \mid b_{s}, \hat{\theta} \right) p\left( b_{s} \mid \hat{\theta} \right).$$
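Since both factors are Gaussian densities here, this MAP estimate amounts to minimizing a penalized least-squares criterion. A minimal sketch, using a hypothetical one-compartment decay curve as the structural model $$f$$ and illustrative parameter values throughout:

```python
# Sketch: MAP estimate of the random effect b_s for a new individual,
# by minimizing -log p(y | b, theta_hat) - log p(b | theta_hat)
# (constants dropped). The structural model and all values are assumptions.
import numpy as np
from scipy.optimize import minimize_scalar

def f(t, b, theta):
    # hypothetical structural model: exponential decay
    return theta[0] * np.exp(-(theta[1] + b) * t)

theta_hat = np.array([10.0, 0.5])  # fixed effects (assumed already estimated)
d_hat, sigma2_hat = 0.04, 0.25     # Var(b) and Var(eps) (assumed)

t_new = np.array([0.5, 1.0, 2.0])  # times for the new individual s
y_new = np.array([7.5, 5.3, 2.9])  # its observations

def neg_log_posterior(b):
    # residual sum of squares weighted by sigma^2, plus Gaussian prior penalty
    resid = y_new - f(t_new, b, theta_hat)
    return 0.5 * np.sum(resid**2) / sigma2_hat + 0.5 * b**2 / d_hat

b_hat = minimize_scalar(neg_log_posterior, bounds=(-1.0, 1.0), method="bounded").x
print(b_hat)
```

The prior term $$b^2 / (2d)$$ shrinks the estimate toward zero, which is why these are often called shrinkage (or empirical Bayes) estimates of the random effects.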

Now, if $$t \, \longmapsto \, f(t, \cdot, \cdot)$$ is a continuous function of time, we may call it a growth curve. It describes the evolution of the measurements with time. Let $$i_{0} \in \lbrace 1, \ldots, p \rbrace$$ and $$t$$ be such that $$t_{i_{0},1} < \ldots < t_{i_{0},k_{i_{0}}} < t$$.

How can we use this mixed-effects model to predict the most likely value $$y_{i_{0}}^{\ast}$$ for individual $$i_{0}$$ at time $$t$$? This relates to forecasting since we want to predict the measurement value at a future time.

Naively, I would proceed as follows. Given $$\left( y_{i,j}, t_{i,j} \right)_{\substack{1 \leq i \leq p \\ 1 \leq j \leq k_i}}$$, I would estimate $$\hat{\theta}$$ (the model parameters are estimated using all the data, including the past observations of individual $$i_{0}$$). Then I would estimate $$\widehat{b_{i_{0}}}$$ as described above. Finally, I would say that:

$$y_{i_{0}}^{\ast} = f\left( t, \widehat{b_{i_{0}}}, \hat{\theta} \right).$$

If this is right, I don't see how to prove it mathematically. Still, I feel like I'm missing something, because this predicted value $$y_{i_{0}}^{\ast}$$ does not take the noise distribution into account. Also, I do not see how I could estimate confidence intervals for $$y_{i_{0}}^{\ast}$$ this way.
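One way to see where the noise enters: conditional on $$\left( \widehat{b_{i_{0}}}, \hat{\theta} \right)$$, the model says $$y_{i_{0}}^{\ast} \sim \mathcal{N}\left( f( t, \widehat{b_{i_{0}}}, \hat{\theta} ), \sigma^2 \right)$$, so the plug-in prediction above is the conditional mean, and a naive 95% prediction interval just adds $$\pm 1.96 \sigma$$ around it. A minimal sketch with a hypothetical structural model and illustrative values (note that this interval ignores the uncertainty in $$\widehat{b_{i_{0}}}$$ and $$\hat{\theta}$$, which is precisely the concern raised above):

```python
# Sketch: plug-in point prediction and naive 95% prediction interval.
# Conditional on (b_hat, theta_hat), y* ~ N(f(t, b_hat, theta_hat), sigma^2).
# The structural model and all numbers are illustrative assumptions.
import numpy as np

def f(t, b, theta):
    # hypothetical structural model: exponential decay
    return theta[0] * np.exp(-(theta[1] + b) * t)

theta_hat = np.array([10.0, 0.5])  # assumed parameter estimates
b_hat, sigma2_hat = 0.1, 0.25

t_future = 3.0
y_point = f(t_future, b_hat, theta_hat)          # plug-in prediction
half = 1.96 * np.sqrt(sigma2_hat)                # observation-noise half-width
interval = (y_point - half, y_point + half)      # naive prediction interval
print(y_point, interval)
```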

In a Bayesian setting (with a prior distribution on $$\theta$$), would I need to use the posterior predictive distribution (see this post and these notes)? From what I understand, if $$y_{i_{0}}$$ denotes the vector of the past observations for individual $$i_{0}$$, this posterior predictive distribution is given by:

$$p\left( y_{i_{0}}^{\ast} \mid y_{i_{0}} \right) = \int_{\Theta} p\left( y_{i_{0}}^{\ast} \mid \theta, y_{i_{0}} \right) p\left( \theta \mid y_{i_{0}} \right) \, d\theta.$$
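In practice this integral is approximated by Monte Carlo: for each posterior draw of $$\left( \theta, b_{i_{0}} \right)$$, compute the growth curve at the future time and add a draw of the observation noise; the resulting sample approximates the posterior predictive distribution, and its quantiles give a credible interval. A minimal sketch, pretending the posterior draws are already available (e.g. from an MCMC run); the structural model and all values are illustrative:

```python
# Sketch: Monte Carlo approximation of the posterior predictive at a future
# time t. The "posterior draws" below are simulated stand-ins for samples
# from p(theta, b_i0 | y_i0); in a real analysis they would come from MCMC.
import numpy as np

rng = np.random.default_rng(0)

def f(t, b, theta):
    # hypothetical structural model: exponential decay
    return theta[0] * np.exp(-(theta[1] + b) * t)

n_draws = 4000
theta0 = rng.normal(10.0, 0.3, n_draws)   # posterior draws of theta[0] (assumed)
theta1 = rng.normal(0.5, 0.05, n_draws)   # posterior draws of theta[1] (assumed)
b      = rng.normal(0.1, 0.1, n_draws)    # posterior draws of b_i0 (assumed)
sigma2 = 0.25                              # noise variance (assumed fixed)

t_future = 3.0
# one predictive draw per posterior draw: structural value + observation noise
y_star = f(t_future, b, np.array([theta0, theta1])) \
         + rng.normal(0.0, np.sqrt(sigma2), n_draws)

lo, hi = np.percentile(y_star, [2.5, 97.5])
print(lo, hi)  # 95% posterior predictive interval
```

Note that the random effects enter through the draws of $$b_{i_{0}}$$: integrating over them (rather than plugging in a point estimate) is what propagates the individual-level uncertainty into the predictive interval.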

However, I'm not sure it applies here, and I'm not sure where the random effects come in.

Any reference, explanation, or hint is welcome! 🙂

