Random Forest (RF) is an interpretable and robust machine learning algorithm for classification and regression. However, existing RF methods are not well suited to multivariate regression (predicting multiple continuous outputs from multiple inputs), because estimating split quality jointly across many outputs suffers from high variance in high-dimensional output space. To address this, we introduce two projection-based split criteria: axis projection and oblique projection. With axis projection, rather than computing mean squared error (MSE) over all outputs and all samples, each split computes MSE on a single output dimension chosen at random. With oblique projection, each split computes MSE on a linear combination of outputs whose weights are drawn at random from {-1.0, 0.0, 1.0}. In several nonlinear simulation settings, these criteria outperform all split criteria available in the scikit-learn implementation of Random Forest (MSE, mean absolute error (MAE), and Friedman MSE).
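The two projection ideas above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: `project_targets` and `split_mse` are hypothetical helper names, and the impurity shown is the standard weighted within-node variance on the projected targets.

```python
import numpy as np

def project_targets(Y, rng, oblique=False):
    """Project multi-output targets Y (n_samples, n_outputs) to 1-D.

    Axis projection: keep one output column chosen at random.
    Oblique projection: random linear combination of outputs
    with weights drawn from {-1.0, 0.0, 1.0}.
    (Illustrative sketch, not the authors' implementation.)
    """
    n_outputs = Y.shape[1]
    if oblique:
        w = rng.choice([-1.0, 0.0, 1.0], size=n_outputs)
        if not w.any():  # avoid the degenerate all-zero projection
            w[rng.integers(n_outputs)] = 1.0
        return Y @ w
    return Y[:, rng.integers(n_outputs)]

def split_mse(y_proj, mask):
    """MSE impurity of a candidate split on the projected targets:
    sample-weighted variance of the left and right children."""
    total = 0.0
    for part in (y_proj[mask], y_proj[~mask]):
        if part.size:
            total += part.size * part.var()
    return total / y_proj.size
```

A split search would evaluate `split_mse` for each candidate threshold and keep the one with the lowest impurity on the projected targets, instead of averaging MSE over every output as in the standard multi-output criterion.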
Tealeaf
2020
Team Members:
- Vivek Gopalakrishnan
- Jennifer Heiko
- Suyeon Ju
- Morgan Sanchez
- Celina Shih
Advisors:
- Joshua Vogelstein, PhD
- Benjamin Pedigo
- Jaewon Chung