#StackBounty: #model-selection #r-squared #performance #rms Why does the rank order of models differ for R squared and RMSE?

Bounty: 50

I am comparing $R^2$ and RMSE of different models. Interestingly, the rank ordering of the models with respect to $-R^2$ and RMSE is different and I do not understand why.

Here is an example in R:

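```r
library(caret)  # method = "glmnet" also requires the glmnet package

set.seed(0)
d <- SLC14_1(n = 1000)                          # simulated regression data from caret
tc <- trainControl(method = "cv", number = 10)  # 10-fold cross-validation
t1 <- train(y ~ ., data = d, method = "glmnet", trControl = tc)

# Compare the ranking by RMSE (ascending) with the ranking by R^2 (descending)
order(t1$results$RMSE) == order(-t1$results$Rsquared)
```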

Output:

```
[1]  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE
```

Thus, the ordering differs between $-R^2$ and $RMSE$.

The question is, why.

Let $SS_{res}$ be the sum of squared residuals $\sum_i (y_i-f_i)^2$, where $f_i$ is the model's prediction for observation $i$.

$RMSE$ is defined as $\sqrt{SS_{res}/n}$.

$R^2$ is defined as $1-SS_{res}/SS_{tot}$, where $SS_{tot}$ is $\sum_i (y_i-\overline{y})^2$.
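For concreteness, here is a toy example with made-up numbers (four observations $y = (1, 2, 3, 4)$ and predictions $f = (1, 3, 3, 3)$, chosen only for illustration) worked through these definitions:

$$
\begin{aligned}
SS_{res} &= 0^2 + (-1)^2 + 0^2 + 1^2 = 2, & RMSE &= \sqrt{2/4} \approx 0.707,\\
\overline{y} &= 2.5, & SS_{tot} &= 1.5^2 + 0.5^2 + 0.5^2 + 1.5^2 = 5,\\
R^2 &= 1 - 2/5 = 0.6.
\end{aligned}
$$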

Since $SS_{res}=n(RMSE)^2$, we can write $R^2$ as $1-n(RMSE)^2/SS_{tot}$.
Since $n$ and $SS_{tot}$ are constant and the same for all models, $-R^2$ is a strictly increasing function of $RMSE$, so both should rank the models identically. In practice, however, the rankings are not identical (see the example code).
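The argument above relies on two assumptions that may not hold in the caret example. The first is that $R^2$ is computed as $1-SS_{res}/SS_{tot}$. The second is that both metrics are computed on one fixed set of observations. The base-R sketch below uses made-up data and hypothetical helper names (`rmse`, `r2_ss`, `r2_cor`, `mean_rmse`, `mean_r2`) to show that breaking either assumption can change the ranking. caret is understood to report `Rsquared` as the squared correlation between observed and predicted values and to average each metric over the resamples. This is worth verifying against the caret documentation for `postResample()` and `train()` for the installed version.

```r
# Base R only; all data below are made up for illustration.

# Definitions as written in the question (hypothetical helper names)
rmse  <- function(y, f) sqrt(mean((y - f)^2))
r2_ss <- function(y, f) 1 - sum((y - f)^2) / sum((y - mean(y))^2)
# Alternative R^2: squared Pearson correlation of observed vs predicted
r2_cor <- function(y, f) cor(y, f)^2

# Part 1: one fixed set of observations, two hypothetical models
y <- as.numeric(1:10)
preds <- list(
  A = y + rep(c(0.5, -0.5), 5),  # close to y, but not perfectly correlated
  B = 2 * y                      # perfectly correlated with y, but far off
)
res <- data.frame(
  RMSE   = sapply(preds, rmse,   y = y),
  R2_ss  = sapply(preds, r2_ss,  y = y),
  R2_cor = sapply(preds, r2_cor, y = y)
)
print(res)

# With the textbook R^2 the two rankings agree ...
identical(order(res$RMSE), order(-res$R2_ss))   # TRUE
# ... but with the squared correlation they do not
identical(order(res$RMSE), order(-res$R2_cor))  # FALSE

# Part 2: metrics averaged over folds (hypothetical per-fold MSEs).
# Both folds are assumed to have SS_tot / n = 10 for every model.
mse_A <- c(1, 9)        # model A: uneven error across folds
mse_B <- c(4.41, 4.41)  # model B: even error across folds
mean_rmse <- function(mse) mean(sqrt(mse))
mean_r2   <- function(mse) mean(1 - mse / 10)

c(A = mean_rmse(mse_A), B = mean_rmse(mse_B))  # A = 2.0, B = 2.1: A ranks first
c(A = mean_r2(mse_A),   B = mean_r2(mse_B))    # A = 0.5, B = 0.559: B ranks first
```

In part 2, pooling both (equal-sized) folds gives model B the lower overall mean squared error (4.41 versus 5). So it is taking the square root per fold and averaging afterwards that lets model A win on RMSE while B wins on $R^2$.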

What is wrong with my argument?

