Decision tree learning presents a very different type of learning method from those we have seen so far in this course. A question that continues to face us is how to gain a better understanding of the inductive biases of the various methods we study. In the course content we discussed the representation bias and preference bias for these methods, so we don't really need to repeat that. Instead, I would like you to discuss the implications of these biases as compared to the biases of other, seemingly statistically based algorithms (e.g., naive Bayes, nearest neighbor, mixture models) we have seen so far. Where do these biases help? Where do they potentially hurt us? Under what circumstances might we expect the resulting algorithms to be well suited or ill suited? What adaptations to the various biases can we apply that might address some of the limitations? How might these adaptations hurt relative to the original methods?

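In the case of decision tree learning algorithms, the heuristic (greedy) search makes the inductive bias more difficult to define, because the bias comes from the search strategy rather than from an explicit restriction on the hypothesis space. In practice it amounts to a preference: shorter trees are favored over longer ones, and in particular trees that place the attributes with the highest information gain closest to the root. For naïve Bayes, the bias shows up through how unbalanced the dataset is; in practice, the more unbalanced the classes are, the more the estimated class priors favor the majority class, and the more false outputs (positives or negatives) might be obtained....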
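To make the preference for high information gain attributes near the root concrete, here is a minimal sketch in Python. The tiny dataset and the attribute names (`outlook`, `windy`) are made up purely for illustration, and the helper functions `entropy` and `information_gain` are my own, not taken from any library. It only shows how an ID3-style learner would pick the root attribute, not a full tree builder.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Reduction in entropy obtained by splitting on attribute `attr`."""
    n = len(labels)
    # Group the labels by the value each row takes for this attribute.
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    # Weighted average entropy of the subsets after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: `outlook` separates the classes perfectly,
# while `windy` carries no information about the label.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

gains = {a: information_gain(rows, labels, a) for a in ("outlook", "windy")}
best = max(gains, key=gains.get)  # attribute a greedy learner would put at the root
print(gains, best)
```

On this toy data the greedy choice puts `outlook` at the root, and the tree stops after one split. Because the search never backtracks, the same rule can also commit to a locally good split that leads to a larger tree than necessary, which is one place where this bias can hurt.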

