Random Forest (RF) is an interpretable and robust machine learning algorithm for classification and regression. However, existing RF methods are not well suited to multivariate regression (predicting multiple continuous outputs from multiple inputs), because estimating split quality jointly across many outputs suffers from high variance in high-dimensional output space. To address this, we introduce two projection-based split criteria: axis projection and oblique projection. With axis projection, rather than computing mean squared error (MSE) over all outputs and all samples, each split computes MSE on a single output dimension chosen at random. With oblique projection, each split computes MSE on a linear combination of outputs whose weights are drawn at random from {-1.0, 0.0, 1.0}. In several nonlinear simulation settings, these criteria outperform all split criteria available in the scikit-learn implementation of Random Forest (MSE, mean absolute error (MAE), and Friedman MSE).
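The two projection ideas above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: `project_targets` and `split_mse` are hypothetical helper names, and the impurity shown is the standard weighted within-node variance on the projected targets.

```python
import numpy as np

def project_targets(Y, rng, oblique=False):
    """Project multi-output targets Y (n_samples, n_outputs) to 1-D.

    Axis projection: keep one output column chosen at random.
    Oblique projection: random linear combination of outputs
    with weights drawn from {-1.0, 0.0, 1.0}.
    (Illustrative sketch, not the authors' implementation.)
    """
    n_outputs = Y.shape[1]
    if oblique:
        w = rng.choice([-1.0, 0.0, 1.0], size=n_outputs)
        if not w.any():  # avoid the degenerate all-zero projection
            w[rng.integers(n_outputs)] = 1.0
        return Y @ w
    return Y[:, rng.integers(n_outputs)]

def split_mse(y_proj, mask):
    """MSE impurity of a candidate split on the projected targets:
    sample-weighted variance of the left and right children."""
    total = 0.0
    for part in (y_proj[mask], y_proj[~mask]):
        if part.size:
            total += part.size * part.var()
    return total / y_proj.size
```

A split search would evaluate `split_mse` for each candidate threshold and keep the one with the lowest impurity on the projected targets, instead of averaging MSE over every output as in the standard multi-output criterion.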
Tealeaf
2020
Team Members:
- Vivek Gopalakrishnan
- Jennifer Heiko
- Suyeon Ju
- Morgan Sanchez
- Celina Shih
Advisors:
- Joshua Vogelstein, PhD
- Benjamin Pedigo
- Jaewon Chung