Modelling interactions

August 18, 2009 in Math, Politics

Andrew Gelman's latest post highlights the importance of interactions. He includes this breakdown of where people fall depending on political party, ideology, and income:

Consider the income dimension. Among liberals, the income curves are flat for Democrats, Independents and Republicans alike. Among conservatives, however, income has a large effect; in fact, it becomes a strong predictor of political party. Thus, in modeling the impact of income on party, we must consider the income-ideology interaction. Without it, a single income coefficient would average over the ideological groups, so we would overstate the impact among liberals and understate it among conservatives.

It is not enough, however, to merely include ideology as a separate variable in a linear model. That would be tantamount to replacing the three-way graph above with two separate graphs, one of party vs. income and another of party vs. ideology. Such a model forces the income effect to be the same for every ideology, so the interaction of income and ideology is ignored entirely.
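To make the limitation concrete, the additive model can be written out. This is only a sketch: the symbols are my own notation, not Gelman's, with y_i standing for some numeric party score for person i and g(i) for that person's ideology group.

```latex
% Additive model: one intercept shift per ideology group, one shared income slope.
% y_i = party score, x_i = income, g(i) \in \{lib, mod, con\} (all notation assumed).
y_i = \alpha + \gamma_{g(i)} + \beta \, x_i + \varepsilon_i
```

Here the slope on income is the single number \beta regardless of g(i). Ideology can move the line up or down, but it cannot change how steep it is.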

Instead, one must consider what essentially amounts to three different income variables: one for conservatives, another for moderates, and a third for liberals. Each variable equals the person's income if they belong to that group and zero otherwise. The three variables each get their own coefficient, so the model can properly capture the joint impact of income and ideology.
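The three-income-variables idea can be sketched in a few lines of Python with NumPy. Everything in this example is an assumption made for illustration: the data are synthetic (not the survey data behind Gelman's graph), party is treated as a continuous score so that ordinary least squares applies, and the "true" slopes are numbers I made up to mimic a flat curve for liberals and a steep one for conservatives.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3000

# Synthetic data, invented purely for illustration.
ideology = rng.integers(0, 3, size=n)    # 0 = liberal, 1 = moderate, 2 = conservative
income = rng.uniform(-1.0, 1.0, size=n)  # centered, rescaled income

# Assumed income slopes: flat for liberals, steep for conservatives.
true_slopes = np.array([0.0, 0.3, 0.8])
party = true_slopes[ideology] * income + rng.normal(0.0, 0.1, size=n)

# One-hot indicators for the ideology group, shape (n, 3).
group = np.eye(3)[ideology]

# Additive model: group intercepts plus a single shared income slope.
X_additive = np.column_stack([group, income])
coef_additive, *_ = np.linalg.lstsq(X_additive, party, rcond=None)

# Interaction model: group intercepts plus one income column per group.
# Each income column is zero for people outside that group.
X_interact = np.column_stack([group, group * income[:, None]])
coef_interact, *_ = np.linalg.lstsq(X_interact, party, rcond=None)

shared_slope = coef_additive[3]
group_slopes = coef_interact[3:]
print("shared income slope:", round(shared_slope, 2))
print("per-ideology income slopes:", np.round(group_slopes, 2))
```

In this setup the shared slope lands somewhere between the liberal and conservative slopes, which is exactly the overstatement for liberals and understatement for conservatives described above. The interaction model recovers a separate slope for each group.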

Be warned, though: interactions can quickly lead to overfitting, because the number of candidate terms grows rapidly with the number of variables. With p variables there are already p(p-1)/2 possible pairwise interactions, before any higher-order ones. An exploratory analysis like the graph above, or a compelling alternative hypothesis, is a prerequisite to using interactions in a model; if an interaction isn't justified, you probably shouldn't use it.
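A quick count shows how fast the terms pile up. The helper `n_terms` below is a hypothetical function written for this post, and it assumes every variable contributes a single column (numeric or binary); a categorical variable such as ideology would contribute more.

```python
from math import comb


def n_terms(p, max_order):
    """Count main effects plus all interactions up to max_order among p variables."""
    return sum(comb(p, k) for k in range(1, max_order + 1))


# Columns: p, main effects only, up to pairwise interactions, interactions of all orders.
for p in (3, 5, 10, 20):
    print(p, n_terms(p, 1), n_terms(p, 2), n_terms(p, p))
```

With more candidate terms than the data can support, some interactions will fit noise, which is why each one should be justified before it goes into the model.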
