# #StackBounty: #r #residuals #lm Is the variation in the residual standard deviation (on sample) accounted for when one builds a predict…

### Bounty: 50

This question is somewhat related to "Is the residual, e, an estimator of the error, $\epsilon$?"

I also found some information here: Confidence interval of RMSE

Let’s say I have a model that explains y by the mean of y:

``````
> library(data.table)
> x <- data.table(x = rnorm(100, 0, 1), y = rnorm(100, 0, 1)); summary(lm(y ~ 1, data = x))

Call:
lm(formula = y ~ 1, data = x)

Residuals:
     Min       1Q   Median       3Q      Max
-2.41639 -0.84908  0.05192  0.79689  2.82043

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.05278    0.10635   0.496    0.621

Residual standard error: 1.063 on 99 degrees of freedom
``````

This is how I get the so-called residual standard error (a better term would be the sample standard deviation of the residuals, or RMSE):

``````
> sqrt(sum((x[, y] - x[, mean(y)]) ^ 2) / 99)
[1] 1.06346
``````

And this is how I get the standard error of the sample mean:

``````
> sd(x[, y]) / sqrt(nrow(x))
[1] 0.106346
``````
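Both quantities can also be read directly off the fitted model, which makes the relationship between them explicit. The following is a minimal sketch, assuming a base R `data.frame` named `d` and a fixed seed (both are illustrative choices that are not part of the session above, so the numbers will differ):

```r
# Illustrative data; the seed and the name `d` are assumptions for reproducibility.
set.seed(1)
d <- data.frame(y = rnorm(100, 0, 1))
fit <- lm(y ~ 1, data = d)
n <- nrow(d)

# Residual standard error: sqrt(RSS / residual degrees of freedom).
rse <- sqrt(sum(residuals(fit)^2) / df.residual(fit))

# Standard error of the intercept, i.e. of the sample mean.
se_mean <- coef(summary(fit))["(Intercept)", "Std. Error"]

# For an intercept-only model the residual standard error equals sd(y),
# so the standard error of the mean is rse / sqrt(n).
all.equal(rse, sigma(fit))
all.equal(rse, sd(d$y))
all.equal(se_mean, rse / sqrt(n))
```

All three comparisons should return `TRUE`: in the intercept-only case the standard error of the mean is just the residual standard error divided by `sqrt(n)`.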

What I don’t completely understand is whether the prediction interval around y-hat accounts for all three of the following:

• variation in the sample mean
• residual standard deviation (as an estimate of model error standard deviation)
• variation in the residuals’ standard deviation (is the residual variance chi-square distributed?)
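The chi-square question in the last bullet can be checked by simulation. Under the usual assumption of independent normal errors, the standard result is that $(n-1)s^2/\sigma^2$ follows a chi-square distribution with $n-1$ degrees of freedom. The sketch below is only an empirical check of that statement; the seed, `sigma_true` and `n_rep` are illustrative assumptions:

```r
# Simulation sketch; seed, sample size, true sigma and number of
# replications are illustrative assumptions.
set.seed(42)
n <- 100
sigma_true <- 1
n_rep <- 10000

# For each replication, draw a sample and compute (n - 1) * s^2 / sigma^2.
scaled_var <- replicate(n_rep, (n - 1) * var(rnorm(n, 0, sigma_true)) / sigma_true^2)

# A chi-square variable with n - 1 degrees of freedom has mean n - 1
# and variance 2 * (n - 1).
c(mean = mean(scaled_var), expected = n - 1)
c(var = var(scaled_var), expected = 2 * (n - 1))

# Compare empirical quantiles with the theoretical chi-square quantiles.
probs <- c(0.05, 0.5, 0.95)
rbind(empirical = quantile(scaled_var, probs),
      theoretical = qchisq(probs, df = n - 1))
```

If the empirical and theoretical rows agree closely, that is consistent with the chi-square claim for this intercept-only, normal-error setting; it says nothing about models whose errors are not normal.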

``````
> lm_model <- lm(y ~ 1, data = x)
> predict.lm(lm_model, newdata = x, interval = "prediction")[1]
[1] 0.05277586
``````
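To see which of the three components enter the interval, it can help to rebuild it by hand. For an intercept-only model with normal errors, the textbook 95% prediction interval is $\bar{y} \pm t_{n-1,\,0.975}\, s \sqrt{1 + 1/n}$: the `1` under the square root carries the residual variance, the `1/n` carries the variance of the sample mean, and using a t quantile rather than a normal quantile reflects that $s$ is itself estimated. The sketch below compares this formula with `predict()`; the data frame `d` and the seed are illustrative assumptions, so the numbers differ from the output above:

```r
# Illustrative data; the seed and the name `d` are assumptions.
set.seed(1)
d <- data.frame(y = rnorm(100, 0, 1))
fit <- lm(y ~ 1, data = d)
n <- nrow(d)

s <- sigma(fit)                        # residual standard deviation
se_mean <- s / sqrt(n)                 # standard error of the fitted mean
t_crit <- qt(0.975, df.residual(fit))  # t quantile, not a normal quantile

# Standard error of a new observation around the fitted mean:
# sqrt(s^2 + s^2 / n) = s * sqrt(1 + 1 / n).
se_pred <- sqrt(s^2 + se_mean^2)

y_hat <- unname(coef(fit))
manual <- c(fit = y_hat,
            lwr = y_hat - t_crit * se_pred,
            upr = y_hat + t_crit * se_pred)

# Built-in interval for one row of new data (default level = 0.95).
builtin <- predict(fit, newdata = d[1, , drop = FALSE],
                   interval = "prediction")[1, ]

all.equal(manual, builtin)
```

Note that indexing the `predict()` result with `[1]`, as in the call above, returns only the fitted value; indexing with `[1, ]` returns the `fit`, `lwr` and `upr` columns for the first row.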
