I have trained an SVM and a logistic regression classifier on my dataset for binary classification. Both classifiers provide a weight vector whose length equals the number of features. I want to use this weight vector to select the 10 most important features. To do that, I turned the weights into t-scores with a permutation test: I permuted the class labels 1000 times and recomputed the weight vector for each permutation. I then subtracted the mean of the permuted weights from the real weights and divided by the standard deviation of the permuted weights, which gives one t-score per feature.
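A minimal sketch of the procedure described above. It assumes NumPy and scikit-learn's `LogisticRegression`, synthetic data, and hypothetical helper names (`fit_weights`, `permutation_scores`); none of these come from my actual setup, and the demo uses 200 permutations instead of 1000 only to keep it fast.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_weights(X, y):
    """Fit a linear classifier and return its weight vector (one entry per feature)."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)
    return model.coef_.ravel()


def permutation_scores(X, y, n_permutations=1000, seed=0):
    """Standardize the real weights against weights obtained with shuffled labels."""
    rng = np.random.default_rng(seed)
    real_weights = fit_weights(X, y)
    permuted_weights = np.empty((n_permutations, X.shape[1]))
    for i in range(n_permutations):
        # Shuffling the labels breaks any true association between features and class.
        permuted_weights[i] = fit_weights(X, rng.permutation(y))
    null_mean = permuted_weights.mean(axis=0)
    null_std = permuted_weights.std(axis=0, ddof=1)
    return (real_weights - null_mean) / null_std


# Synthetic data (an assumption for illustration): only feature 0 carries signal.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

scores = permutation_scores(X, y, n_permutations=200)
print(scores)
```

The same function could be applied to a linear SVM by swapping the model inside `fit_weights`.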
Should I rank by the absolute values of the t-scores, i.e. select the 10 features with the highest absolute values? Let’s say the features have the following t-scores:
feature 1: 1.3
feature 2: -1.7
feature 3: 1.1
feature 4: -0.5
If I select the 2 most important features by the highest absolute values, features 1 and 2 would win. If I use the signed values instead, features 1 and 3 would win.
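The two rankings can be reproduced with a few lines of plain Python. This is only a sketch of the comparison using the four example scores above; `top_k` is a hypothetical helper name.

```python
t_scores = {
    "feature 1": 1.3,
    "feature 2": -1.7,
    "feature 3": 1.1,
    "feature 4": -0.5,
}


def top_k(scores, k, use_abs):
    """Return the k feature names with the largest (absolute or signed) score."""
    if use_abs:
        key = lambda name: abs(scores[name])
    else:
        key = lambda name: scores[name]
    return sorted(scores, key=key, reverse=True)[:k]


by_magnitude = top_k(t_scores, 2, use_abs=True)
by_signed_value = top_k(t_scores, 2, use_abs=False)

print(by_magnitude)     # ['feature 2', 'feature 1']
print(by_signed_value)  # ['feature 1', 'feature 3']
```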
Second, as far as I have read, this approach only works for an SVM with a linear kernel and not with an RBF kernel. For a non-linear kernel, the weights supposedly no longer relate linearly to the features. What is the exact reason that the weight vector cannot be used to determine feature importance in a non-linear kernel SVM?
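To illustrate the symptom (not the reason), here is a small sketch assuming scikit-learn's `SVC` and synthetic data, neither of which is necessarily what I used. In that library, a fitted linear-kernel model exposes a per-feature weight vector `coef_`, while an RBF-kernel model exposes only coefficients attached to the support vectors.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] > 0).astype(int)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)

# Linear kernel: one weight per feature is available.
linear_has_weights = hasattr(linear_svm, "coef_")
print(linear_has_weights, linear_svm.coef_.shape)

# RBF kernel: accessing coef_ raises AttributeError, so hasattr is False.
rbf_has_weights = hasattr(rbf_svm, "coef_")
print(rbf_has_weights)

# What the RBF model does store: one dual coefficient per support vector.
print(rbf_svm.dual_coef_.shape, rbf_svm.support_vectors_.shape)
```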