In the December 2019 exam, one of the tasks involves using a tree to identify an interaction to include in a GLM. The solution states that one can tell there is an interaction between marital status and education due to the nested nature of the splits in the tree. Would someone please comment further to provide a clearer explanation?
The very first split is on marital status. After that, the next two splits are cap_gain on the left and education_num on the right. Let’s start with cap_gain to see if there is any interaction. If you go further down the splits on the right, cap_gain is used as a split there as well. So regardless of marital status, cap_gain is eventually used as a split. Since the marital_status split does not change whether cap_gain is used, this signals that there is likely no interaction between the two.
Now let’s look at education_num. It is only used as a split on the right branch, not the left. Since this variable is only used when the answer to the marital status split is “no” for the listed categories, this indicates that education_num only matters when marital status is married spouse present, etc. (I can’t recall the other categories). In other words, the effect of education_num depends on marital status, which is what an interaction is.
This is how I have been thinking about interactions in decision trees.
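If it helps, here is a minimal Python sketch of the check described above. The tree is a made-up toy that only mimics the shape discussed in this thread (marital_status at the root, cap_gain on both sides, education_num only on the right); it is not the actual exam tree, and the function names are my own. It is a rough screen, not a formal test: a variable that appears on both sides can still interact if it behaves differently in each branch.

```python
# Toy tree: each internal node is a dict with its split variable and two
# children; leaves are None. Hypothetical shape, NOT the real exam tree.
toy_tree = {
    "var": "marital_status",
    "left": {"var": "cap_gain", "left": None, "right": None},
    "right": {
        "var": "education_num",
        "left": None,
        "right": {"var": "cap_gain", "left": None, "right": None},
    },
}


def split_vars(node):
    """Return the set of variables used for splits at or below this node."""
    if node is None:
        return set()
    return {node["var"]} | split_vars(node["left"]) | split_vars(node["right"])


def branch_usage(tree, var):
    """Report whether var is split on under the root's left and right child."""
    return {
        "left": var in split_vars(tree["left"]),
        "right": var in split_vars(tree["right"]),
    }


def suggests_interaction(tree, var):
    """Heuristic only: var is used under exactly one branch of the root split."""
    usage = branch_usage(tree, var)
    return usage["left"] != usage["right"]


for v in ("cap_gain", "education_num"):
    print(v, branch_usage(toy_tree, v), suggests_interaction(toy_tree, v))
```

On this toy tree, cap_gain shows up under both branches, so it is not flagged, while education_num shows up only under the right branch, so it is flagged as a candidate interaction with the root variable.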