I am trying to predict the stand type of a forest 10 years after a clear cut. I am using GBM models for the multinomial classification. The problem is that I have several competing methods for grouping the stands into types. For each stand I have data on the percent coverage of 19 different species.
I have tried two different methods for assigning the stands to stand types. The first uses ecological knowledge to group stands with similar species attributes. The second uses k-medoids clustering to create groups of similar stands. Either method can produce a range of different numbers of stand type classes for the model to predict.
1) Say I have picked a number of clusters based on internal cluster quality and a number of ecological types based on domain knowledge, e.g. 9 clusters and 11 ecological types. How can I compare the performance of the two GBM models created by using these as the response?
2) Would it be any different if I was comparing 2 models based on different numbers of clusters?
I am avoiding accuracy because both models involve imbalanced classes. I have calculated Kappa and log loss, but I wonder how these are affected by having different numbers of classes. I have also calculated AUC and prAUC using caret, which I believe takes the average of the one-vs-all measure for each class.
Are these performance metrics biased by the number of classes in the model? I would imagine that having more classes makes the outcome harder to predict, so if a model with 11 classes in the response performs better than one with 9 classes, can I take that as an indication that the grouping method that produced 11 classes is better?
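To make the concern about class count concrete, here is a minimal Python sketch (not my actual caret/gbm code) of the log loss a no-skill model would get with 9 versus 11 classes. The function names and the model log loss of 1.2 are made up purely for illustration, and the "skill" rescaling is just one possible normalisation I am considering, not an established recommendation.

```python
import math

def uniform_baseline_logloss(n_classes):
    """Log loss of a model that predicts probability 1/K for every class."""
    return math.log(n_classes)

def prior_baseline_logloss(class_proportions):
    """Log loss of a model that always predicts the observed class
    proportions, i.e. the entropy of the class distribution."""
    return -sum(p * math.log(p) for p in class_proportions if p > 0)

def logloss_skill(model_logloss, baseline_logloss):
    """Share of the baseline log loss removed by the model
    (1 = perfect, 0 = no better than the baseline)."""
    return 1.0 - model_logloss / baseline_logloss

# Hypothetical: both models reach the same raw log loss of 1.2.
made_up_model_logloss = 1.2
for k in (9, 11):
    baseline = uniform_baseline_logloss(k)
    skill = logloss_skill(made_up_model_logloss, baseline)
    print(f"{k} classes: baseline={baseline:.3f}, skill={skill:.3f}")
```

If this reasoning is right, the no-skill log loss grows with the number of classes, so the same raw log loss would mean more for the 11-class model than for the 9-class one. With imbalanced classes, `prior_baseline_logloss` is probably the fairer baseline than the uniform one. I am not sure whether a similar adjustment is needed for Kappa or the averaged AUC.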
I am most interested in comparing the resulting models, as opposed to using different clustering or modeling methods, but if you think this methodology is flawed, feel free to present an alternative.