A Tandem EVolutionary Algorithm for Feature Selection (TEVA)
Day 3 Session 2 (1:30 PM)
Presenters:
Kristen Underwood, University of Vermont
Donna Rizzo, PhD, University of Vermont
John Hanley, PhD, University of Vermont
Ali Dadkhah, University of Vermont
Ryan van der Heijden, University of Vermont
Cailin Gramling, University of Vermont
Shaurya Swami, University of Vermont
Underwood et al. (2023) recently introduced the tandem evolutionary algorithm (TEVA) of Hanley et al. (2020) to the water resources and ecology domains, applying it to identify the features (catchment-scale attributes) and feature interactions that are important in determining patterns of dissolved organic carbon (DOC) across the continental US. TEVA has particular advantages for feature selection in large, multivariate observational data sets from complex systems such as riverscapes or ecosystems, and it has been shown to outperform logistic regression and Random Forest at identifying feature interactions and equifinality. TEVA finds multivariate relationships that may result from either additive processes or feature interactions. It not only extracts features significantly associated with one or more outcome classes, but also identifies the specific value ranges associated with those features. The algorithm is also robust to mixed data types (continuous, categorical), missing data, censored data, skewed distributions, and unbalanced target classes or clusters.
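To make the idea of "features plus their specific value ranges" concrete, the following minimal sketch shows how a rule of that shape could be checked against observations. It is not the TEVA implementation or its API: the rule format, the function names (matches, coverage), the feature names, the class labels, and the toy values are all hypothetical and chosen only for illustration. Treating a missing value as a non-match is likewise an assumption of this sketch, not a statement about how TEVA handles missing data.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule format: feature name -> inclusive (low, high) value range.
Rule = Dict[str, Tuple[float, float]]
# One observation: feature name -> value, or None when the value is missing.
Row = Dict[str, Optional[float]]


def matches(rule: Rule, row: Row) -> bool:
    """Return True if every feature in the rule falls inside its range."""
    for feature, (low, high) in rule.items():
        value = row.get(feature)
        # Assumption of this sketch: a missing value never satisfies a range.
        if value is None or not (low <= value <= high):
            return False
    return True


def coverage(rule: Rule, rows: List[Row], labels: List[str],
             target: str) -> Tuple[int, int]:
    """Return (matched rows in the target class, total matched rows)."""
    matched = [lab for row, lab in zip(rows, labels) if matches(rule, row)]
    return sum(1 for lab in matched if lab == target), len(matched)


# Made-up toy observations (not real catchment data).
rows = [
    {"wetland_pct": 12.0, "mean_slope": 2.0},
    {"wetland_pct": 15.0, "mean_slope": 8.0},
    {"wetland_pct": 1.0, "mean_slope": 3.0},
    {"wetland_pct": None, "mean_slope": 1.5},
]
labels = ["high_DOC", "low_DOC", "low_DOC", "high_DOC"]

# Hypothetical rule: both features must fall in their ranges at once.
rule = {"wetland_pct": (10.0, 20.0), "mean_slope": (0.0, 5.0)}

in_target, total = coverage(rule, rows, labels, "high_DOC")
print(f"{in_target} of {total} matching observations are in the target class")
```

Under this framing, equifinality would show up as two or more distinct rules (different features or different ranges) that each cover observations in the same outcome class.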
Learning Outcomes:
After completing this workshop, participants should have the basic knowledge and code base to run TEVA on their own data set, including:
- How to pre-process the data
- How to run data through TEVA using the provided Jupyter Notebook(s) and CUAHSI JupyterHub
- How to access and visualize TEVA output
- How to interpret basic TEVA output to identify important features, their specific value ranges, and potential equifinality
Prerequisites:
Knowledge:
- Python coding knowledge
- Fundamentals of statistics, including multivariate statistics and factor interactions
- Participants should read “Machine-Learning Reveals Equifinality in Drivers of Stream DOC Concentration at Continental Scales” (Underwood et al., 2023) prior to the workshop
Hardware/Software:
- Laptop with browser
Accounts:
- HydroShare: request access to join the DevCon group or CUAHSI JupyterHub group