Authors: Kaylee B. Tanner, Anna C. Cardall, Riley Chad Hales, Gustavious P. Williams, Kel N. Markert – Brigham Young University
Title: GEE Tools for Coincident Data Sampling, Feature Engineering, and Model Development using L1-Regularization for Remote Sensing Data
Abstract: Remote-sensing data are used extensively to estimate a variety of geophysical parameters and processes, usually by collecting in-situ data coincident with satellite observations and then fitting empirical remote-sensing models. Typical model-building approaches require a priori selection of model parameters and may not be suited to non-standard conditions. In these cases, it may be useful to include non-standard terms that traditional methods would not consider. Machine learning can explore large feature spaces and generate accurate empirical models without a priori parameter selection. However, because these methods include a large number of terms, the resulting models are difficult to explain. We present Least Absolute Shrinkage and Selection Operator (LASSO), or L1, regularization as a method to fit linear regression models that are parsimonious, with a limited number of terms. We also present two Colab notebooks that document and explain our approach. The first notebook compiles near-coincident pairs of remote-sensing and in-situ data using Google Earth Engine (GEE), and the second implements L1 model creation using scikit-learn. The second notebook includes data-engineering routines that generate band ratios, logs, and other combinations. These notebooks can be easily adapted to other locations, sensors, or parameters. The Colab notebooks are available at https://github.com/BYU-Hydroinformatics/ee-wq-lasso.
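
Illustrative sketch: The feature-engineering and L1 steps described in the abstract might look roughly like the following. This is a minimal sketch and not the code from the authors' notebooks. The band names, the synthetic data, the helper `engineer_features`, and the choice of `LassoCV` with standardized inputs are all assumptions made for illustration; only the general idea (expand bands into ratios and logs, then let L1 regularization keep a small subset of terms) comes from the abstract.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def engineer_features(bands: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add log terms and pairwise band ratios."""
    feats = bands.copy()
    cols = list(bands.columns)
    for c in cols:
        feats[f"log_{c}"] = np.log(bands[c])
    for a in cols:
        for b in cols:
            if a != b:
                feats[f"{a}/{b}"] = bands[a] / bands[b]
    return feats


# Synthetic stand-in for coincident satellite / in-situ pairs (assumed data).
rng = np.random.default_rng(0)
bands = pd.DataFrame(
    rng.uniform(0.05, 0.5, size=(200, 4)),
    columns=["blue", "green", "red", "nir"],
)
# Assumed "in-situ" target driven by a single band ratio plus noise.
target = 3.0 * bands["nir"] / bands["red"] + rng.normal(0.0, 0.05, 200)

X = engineer_features(bands)

# Standardize features so the L1 penalty treats them comparably, then
# choose the regularization strength by 5-fold cross-validation.
model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
model.fit(X, target)

# Terms with non-zero coefficients are the ones the L1 penalty retained.
coefs = pd.Series(model[-1].coef_, index=X.columns)
selected = coefs[coefs != 0]
print(f"{len(selected)} of {X.shape[1]} candidate terms retained")
print(selected.sort_values(key=np.abs, ascending=False).head())
```

Inspecting `selected` shows which engineered terms survive the penalty, which is what makes the fitted model easier to interpret than one that uses every candidate feature.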