#StackBounty: #r #classification #cart #rpart How can I use estimated probabilities of a class from rpart to identify the top N classes?
Using the `rpart` library, I'm trying to predict which class each observation belongs to. Here is a reproducible example showing the steps I am taking:
```r
library(rpart)

# training set
df_train <- data.frame(
  tag = c('123', '123', '124', '124', '125'),
  p1 = c('home', 'work', 'work', 'work', 'home'),
  p2 = c(1, 1, 1, 0, 1)
)

# testing set
df_test <- data.frame(
  tag = c('123', '124', '125'),
  p1 = c('home', 'work', 'home'),
  p2 = c(1, 1, 0)
)

# train model
model.rpart = rpart(tag~p1+p2, data=df_train, method="class")

# predict probabilities of class
pred.rpart = predict(model.rpart, data=df_test, method="prob")

# list out results
pred.rpart
```
My problem is that I don't fully understand the output:

```
> pred.rpart
  123 124 125
1 0.4 0.4 0.2
2 0.4 0.4 0.2
3 0.4 0.4 0.2
4 0.4 0.4 0.2
5 0.4 0.4 0.2
```
I thought it was giving me the probability of each class for every observation in my test dataset, but I don't understand why there are five rows when the test set has only three observations. Why does `pred.rpart` contain five rows of data?
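A likely explanation, worth checking against `?predict.rpart`: its documented arguments are `newdata` (the new observations) and `type` (`"vector"`, `"prob"`, `"class"` or `"matrix"`). `data =` and `method =` are not among them, so they appear to be absorbed silently by `...`; with no `newdata`, `predict()` returns fitted values for the five training rows, and for a classification tree a missing `type` defaults to class probabilities. The identical rows also suggest the tree never split: the default `rpart.control()` uses `minsplit = 20`, more than the five training rows, so every row lands in the root node, whose class probabilities are the training frequencies 2/5, 2/5 and 1/5. The sketch below reruns the example with those two arguments renamed; `stringsAsFactors = TRUE` is an added assumption so it behaves the same before and after R 4.0.

```r
library(rpart)

# Same toy data as in the question (stringsAsFactors = TRUE is an assumption)
df_train <- data.frame(
  tag = c('123', '123', '124', '124', '125'),
  p1  = c('home', 'work', 'work', 'work', 'home'),
  p2  = c(1, 1, 1, 0, 1),
  stringsAsFactors = TRUE
)
df_test <- data.frame(
  tag = c('123', '124', '125'),
  p1  = c('home', 'work', 'home'),
  p2  = c(1, 1, 0),
  stringsAsFactors = TRUE
)

model.rpart <- rpart(tag ~ p1 + p2, data = df_train, method = "class")

# Five rows are below the default minsplit = 20, so no split is attempted:
# the tree consists of the root node only
nrow(model.rpart$frame)            # 1

# The root node's class probabilities are the training class frequencies
prop.table(table(df_train$tag))    # 123: 0.4, 124: 0.4, 125: 0.2

# New observations go in `newdata`, the output kind in `type`
pred_test <- predict(model.rpart, newdata = df_test, type = "prob")
pred_test
nrow(pred_test)                    # 3, one row per test observation
```

With more training data (or a smaller `minsplit` passed through `rpart.control()`), the rows would be expected to differ once the tree actually splits.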
My overall objective is to find the top N predicted classes for each observation. So for the first observation in my `df_test` data frame, I would like to be able to say:

```
Top 2 predictions for the first observation:
#1: '123': 40%
#2: '124': 40%
```
Once I understand the output of `pred.rpart`, I want to summarize it with the following command, which should give me each observation's class predictions ordered by probability:
```r
n_classes <- 2
apply(pred.rpart, 1, function(xx) head(names(sort(xx, decreasing = TRUE)), n_classes))
```
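One thing to be aware of with the `apply()` call: when the function returns `n_classes` values per row, `apply()` returns an `n_classes` × (number of observations) matrix, so each observation ends up in a column; wrapping the call in `t()` puts observations back in rows. The base-R sketch below keeps the probabilities alongside the class names so the "#1: '123': 40%" style can be printed. `top_n_classes()` and `format_top_n()` are hypothetical helpers, not part of rpart, and the probability matrix is made up for illustration (only its first row matches the output above).

```r
# Hypothetical helper: for each row of a class-probability matrix
# (rows = observations, columns = classes, as returned by
# predict(..., type = "prob") for an rpart classification tree),
# return the n most probable classes with their probabilities.
top_n_classes <- function(prob, n = 2) {
  n <- min(n, ncol(prob))
  rows <- lapply(seq_len(nrow(prob)), function(i) {
    p <- prob[i, ]
    # order() is stable, so tied probabilities keep their column order
    idx <- order(-p)[seq_len(n)]
    data.frame(obs = i, rank = seq_len(n), class = colnames(prob)[idx],
               prob = unname(p[idx]), stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Hypothetical helper: "#rank: 'class': pct%" entries for observation i
format_top_n <- function(top, i) {
  sub <- top[top$obs == i, ]
  paste0("#", sub$rank, ": '", sub$class, "': ", round(100 * sub$prob), "%",
         collapse = " ")
}

# Made-up probability matrix; only the first row matches the question's output
pred <- matrix(c(0.4, 0.4, 0.2,
                 0.1, 0.7, 0.2,
                 0.3, 0.3, 0.4),
               nrow = 3, byrow = TRUE,
               dimnames = list(NULL, c("123", "124", "125")))

top <- top_n_classes(pred, n = 2)
format_top_n(top, 1)   # "#1: '123': 40% #2: '124': 40%"
```

With a fitted model, the matrix from `predict(model.rpart, newdata = df_test, type = "prob")` would be passed as `prob`. Because ties are resolved by column order, two classes at 40% are ranked arbitrarily; whether that is acceptable is a modelling choice rather than something the tree decides.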