#StackBounty: #r #residuals #lm Is the variation in the residual standard deviation (on sample) accounted for when one builds a predict…

Bounty: 50

This question is somewhat related to "Is the residual, e, an estimator of the error, $\epsilon$?"

I also found some information here: Confidence interval of RMSE

Let’s say I have a model that explains y by the mean of y:

> library(data.table)
> x <- data.table(x = rnorm(100, 0, 1), y = rnorm(100, 0, 1))
> summary(lm(y ~ 1, data = x))

Call:
lm(formula = y ~ 1, data = x)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.41639 -0.84908  0.05192  0.79689  2.82043 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.05278    0.10635   0.496    0.621

Residual standard error: 1.063 on 99 degrees of freedom

That is how I get the so-called residual standard error (a better term would be the sample standard deviation of the residuals, or RMSE):

> sqrt(sum((x[, y] - x[, mean(y)]) ^ 2) / 99)
[1] 1.06346

That is how I get the standard error of the sample mean:

> sd(x[, y]) / sqrt(nrow(x))
[1] 0.106346

What I don’t completely understand is whether the prediction interval for y-hat accounts for all three of the following:

  • variation in the sample mean
  • residual standard deviation (as an estimate of model error standard deviation)
  • variation in the residual standard deviation itself (does the residual variance follow a chi-square distribution?)

    lm_model <- lm(y ~ 1, data = x)
    predict.lm(lm_model, newdata = x, interval = "prediction")[1]
    [1] 0.05277586
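
One way to probe the three items above is to rebuild the interval by hand and compare it with what predict.lm returns. The sketch below assumes i.i.d. normal errors and the textbook formula for an intercept-only model, mean(y) ± t(0.975, n - 1) * s * sqrt(1 + 1/n). The seed, the base data.frame (used instead of data.table to keep the snippet self-contained), and the names d, fit, s, half_width, manual, and builtin are illustrative choices, not part of the original post.

```r
# Sketch under stated assumptions: i.i.d. normal errors, intercept-only model.
set.seed(1)                          # illustrative seed, not from the original post
d   <- data.frame(y = rnorm(100, 0, 1))
fit <- lm(y ~ 1, data = d)

n     <- nrow(d)
y_bar <- mean(d$y)                   # point prediction (the fitted intercept)
s     <- summary(fit)$sigma          # residual standard error, n - 1 degrees of freedom

# Inside the square root:
#   1     -> variance of a new error term, in units of s^2
#   1 / n -> variance of the sample mean, in units of s^2
# The t quantile (rather than a normal quantile) reflects that s is itself
# estimated: under normal errors, (n - 1) * s^2 / sigma^2 is chi-square
# distributed with n - 1 degrees of freedom.
half_width <- qt(0.975, df = n - 1) * s * sqrt(1 + 1 / n)
manual <- c(fit = y_bar, lwr = y_bar - half_width, upr = y_bar + half_width)

# First row of the matrix returned by predict.lm: columns fit, lwr, upr
builtin <- predict(fit, newdata = d, interval = "prediction")[1, ]

print(rbind(manual, builtin))
```

If the two rows agree, the default 95% interval contains a term for each of the three sources listed above. Note that indexing the predict.lm result with [1], as in the snippet before this one, returns only the fitted value; the bounds are in the lwr and upr columns.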
