#StackBounty: #r #residuals #prediction-interval #lm Is the variation in the residual standard deviation (on sample) accounted for when…

Bounty: 50

This question is somehow related to Is the residual, e, an estimator of the error, $epsilon$?

I also found some information here: Confidence interval of RMSE

Let’s say, I got a model that explains sampled X by mean(X).

Using suggestions from Mr. @whuber I reproduced calculations of the PI from predict.lm in R.

## lm PI

x_dat <- data.frame(x = rnorm(100, 0, 1))

lm_model <- lm(data = x_dat, x ~ 1)


lm_preds <- predict.lm(lm_model
                       , x_dat
                       , interval = "prediction"
                       , level = 0.95

beta.hat <- lm_model$coefficients

# var-covariance matrix

V <- vcov(lm_model)

# mean prediction

Xp <- model.matrix(~ 1, x_dat)
pred <- as.numeric(Xp %*% beta.hat)[1]

# mean prediction variance:

pred_var <- unname(rowSums((Xp %*% V) * Xp))[1]

# confidence

t_stat <- qt((1 - 0.95)/2, df = lm_model$df.residual)

# residual MSE

res_mse <- sum(lm_model$residuals ^ 2) / lm_model$df.residual

# PI

PI <- pred + c(t_stat, -t_stat) * sqrt(pred_var + res_mse)


print(lm_preds[1, ])

> print(PI)
[1] -2.043175  2.572508
> print(lm_preds[1, ])
       fit        lwr        upr 
 0.2646666 -2.0431747  2.5725079 

I only have 2 questions.

Is it right that we assume that the true model error variance is that of sampled residual variance in order to make an unbiased estimate?

Given that we don’t know what the true error variance is, does it mean we make a biased estimate of PI, in particular, by not adjusting for an additional dispersion of the residual variance? If so, can we assume that variance of the residual variance follows chi-square distribution in order to get an upper quantile of the value that can be supplied inside for an exact calculation?

        pred.var = 

Get this bounty!!!

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.