For example, take the image from Sebastian Raschka's post “Machine Learning FAQ”:
I would expect a very similar (if not exactly the same) result for a decision tree: given only two features, it finds the optimal feature (and threshold on that feature) to split the classes. Then the decision tree does the same for each child, considering only the data that arrives at that child. Of course, boosting considers all the data again at each round, but at least in the given sample it leads to exactly the same decision boundary. Could you give an example where a decision tree would produce a different decision boundary on the same training set than boosted decision stumps would?
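For concreteness, here is the kind of comparison I have in mind, sketched with scikit-learn (the dataset choice and parameters are mine, just for illustration; `AdaBoostClassifier`'s default base estimator is a depth-1 tree, i.e. a decision stump):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# A toy dataset where the two boundaries are unlikely to coincide everywhere.
X, y = make_moons(n_samples=200, noise=0.25, random_state=0)

# An unrestricted decision tree vs. AdaBoost over decision stumps.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
stumps = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

# Compare predictions on a dense grid to see where the boundaries differ.
xx, yy = np.meshgrid(np.linspace(-2, 3, 200), np.linspace(-1.5, 2, 200))
grid = np.c_[xx.ravel(), yy.ravel()]
disagree = np.mean(tree.predict(grid) != stumps.predict(grid))
print(f"fraction of the plane where the two models disagree: {disagree:.3f}")
```

On data like this I would expect some disagreement (the full tree carves axis-aligned rectangles down to pure leaves, while boosted stumps form an additive combination of single axis-aligned cuts), but I'd like to understand when and why the boundaries must differ.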
I have the intuition that boosted decision stumps are less likely to overfit because the base classifier is so simple, but I can't pinpoint exactly why.