This is a cross-post of a question I posted several months ago on a different forum.
I am trying to understand how variable importance is calculated, based on the research papers published on the topic. There are two papers to draw from. The first is the very math-heavy original paper and presentation. The other is a less mathematical, more intuitive treatment that is much easier to understand.
The key paragraph from the more intuitive explanation is below:
The set of variables $Z$ to be conditioned on should contain all variables that are correlated with the current variable of interest $X_j$. In the varimp function, this is assured by the small default value 0.2 of the threshold argument: By default, all variables whose correlation with $X_j$ meets the condition 1 – ($p$-value) > 0.2 are used for conditioning. A larger value of threshold would have the effect that only those variables that are strongly correlated with $X_j$ would be used for conditioning, but would also lower the computational burden.
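As I read the quoted paragraph, the selection rule amounts to something like the sketch below. The helper name `conditioning_set` is my own (it is not part of `party`), and I use a Pearson `cor.test` as a stand-in for whatever association test `varimp` runs internally, so treat this as an illustration of the 1 – ($p$-value) > threshold rule and not as the package's code.

```r
# Hypothetical helper (not part of party): pick the conditioning set Z for one
# predictor using the rule 1 - p-value > threshold.
# Assumption: a Pearson correlation test stands in for the association test
# that varimp() actually uses.
conditioning_set <- function(data, xj, threshold = 0.2) {
  others <- setdiff(names(data), xj)
  keep <- vapply(others, function(v) {
    p <- cor.test(data[[xj]], data[[v]])$p.value
    (1 - p) > threshold
  }, logical(1))
  others[keep]
}

predictors <- iris[, 1:4]

# Default threshold: weakly associated variables are still conditioned on
conditioning_set(predictors, "Sepal.Length")

# Stricter threshold: only strongly associated variables remain in Z
conditioning_set(predictors, "Sepal.Length", threshold = 0.95)
```

Under this reading, raising the threshold can only shrink $Z$, which would explain why the paper says it lowers the computational burden.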
The best intuition I have seen for conditional inference trees is this blog post; when I stepped through it, how the trees work made instant sense. Unfortunately, it stops short of variable importance 🙂
My understanding and where it all breaks down:
1. In each tree, compute the OOB prediction accuracy before the permutation.
2. For all variables $Z$ to be conditioned on: extract the cutpoints that split this variable in the current tree and create a grid by bisecting the sample space at each cutpoint.
3. Within this grid, permute the values of $X_j$ and compute the OOB prediction accuracy after the permutation.
4. The difference between the prediction accuracy before and after the permutation again gives the importance of $X_j$ for one tree.

- Where does the variable I want to use fit in with $Z$? Is $Z$ determined by whether the correlation is above some threshold, and are those attributes then permuted together?
- In step 3, what exactly am I permuting within $Z$? Is this similar to partial correlation?
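To make the grid and permutation steps above concrete, here is a minimal sketch of how I currently picture them. The function `permute_within_grid` and its `cutpoints` argument are hypothetical names of mine, not anything from `party`, and I am assuming the cutpoints for each variable in $Z$ have already been extracted from the tree.

```r
# Hypothetical helper: permute x_j only within the cells of a grid built from
# the cutpoints of the conditioning variables Z.
# Assumption: `cutpoints` is a named list giving, for each variable in Z, the
# split values found in the current tree.
permute_within_grid <- function(data, xj, cutpoints) {
  # Bisect each conditioning variable at its cutpoints
  cells <- lapply(names(cutpoints), function(v) {
    cut(data[[v]], breaks = c(-Inf, sort(unique(cutpoints[[v]])), Inf))
  })
  permuted <- data
  # One group of row indices per non-empty grid cell
  for (idx in split(seq_len(nrow(data)), cells, drop = TRUE)) {
    # sample() on a single number would sample from 1:idx, so skip those cells
    if (length(idx) > 1) {
      permuted[[xj]][idx] <- data[[xj]][sample(idx)]
    }
  }
  permuted
}

set.seed(123)
# Shuffle Petal.Width separately on each side of the Petal.Length = 1.9 split
perm <- permute_within_grid(iris, "Petal.Width", list(Petal.Length = 1.9))
```

If this picture is right, the values of $X_j$ are shuffled only among observations that fall in the same cell of $Z$, so the relationship between $X_j$ and $Z$ is roughly preserved while the link between $X_j$ and the response is broken. The importance for one tree would then be the OOB accuracy on the original data minus the OOB accuracy on `perm`.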
I think this would make more sense to me if I could see what $Z$ looks like for a single tree, so I ran the code below. Can anyone help?

    library(party)
    library(janitor)
    library(tidyverse)

    set.seed(123)

    # Create a data frame where we are trying to predict setosa
    mydf <- iris %>%
      mutate(set_tgt = factor(ifelse(Species == 'setosa', 'yes', 'no'))) %>%
      select(-Species)

    # We will try to predict "set_tgt"
    cf_mod <- cforest(set_tgt ~ ., data = mydf,
                      control = cforest_unbiased(mtry = 2, ntree = 3))

    # With conditional = TRUE, variables are permuted conditionally,
    # based on the threshold
    varimp(cf_mod, conditional = TRUE, threshold = 0.2) %>%
      enframe() %>%
      arrange(desc(value))

    # Finding Z
    mod <- ctree(set_tgt ~ ., data = mydf)
    plot(mod)

    # Row names are the label we are trying to predict.
    # The 50 rows with Petal.Length <= 1.9 are all setosa ("yes").
    Z <- tibble("Petal.Length <= 1.9" = 50, "Petal.Length > 1.9" = 0) %>%
      bind_rows(tibble("Petal.Length <= 1.9" = 0, "Petal.Length > 1.9" = 100)) %>%
      data.frame() %>%
      clean_names()
    row.names(Z) <- c("yes", "no")

    # This creates the Z data frame (maybe), if my understanding isn't completely wrong
    Z