Predicting United States Policy Outcomes with Random Forests
Two decades of U.S. government legislative outcomes, as well as the policy preferences of high-income people, the general population, and diverse interest groups, were captured in a detailed dataset curated and analyzed by Gilens, Page et al. (2014). They found that the preferences of high-income earners correlated strongly with policy outcomes, while the preferences of the general population did not, except via a linkage with the preferences of high earners. Their analysis applied the tools of classical statistical inference, in particular logistic regression. In this paper we analyze the Gilens dataset using the complementary tools of Random Forest classifiers (RFs), from Machine Learning. We present two primary findings, concerning respectively prediction and inference: (i) Holdout test sets can be predicted with approximately 70% balanced accuracy by models that consult only the preferences of those in the 90th income percentile and a small number of powerful interest groups, as well as policy area labels. These results include retrodiction, where models trained on pre-1997 cases predicted “future” (post-1997) cases. The 20% gain in accuracy over baseline (chance), in this detailed but noisy dataset, indicates the high importance of a few distinct players in U.S. policy outcomes, and aligns with a body of research indicating that the U.S. government has significant plutocratic tendencies. (ii) The feature selection methods of RF models identify especially salient subsets of interest groups (economic players). These can be used to further investigate the dynamics of governmental policy making, and also offer an example of the potential value of RF feature selection methods for inference on datasets such as this one.