The AMS Intermediate Machine Learning in Python for Environmental Science Problems Short Course presents a selection of intermediate-level ML topics for environmental scientists who already have a background in ML fundamentals. The course will be taught by a team of instructors, each presenting a module related to their area of ML expertise. Participants will work with real-world environmental data and follow along by programming in the Google Colab Python environment.
The course will be divided into modules, each focused on a single topic and led by one of the instructors. Modules will contain a mix of lecture material and interactive coding, and participants can work together on interactive problem-solving. The planned topics include advanced model evaluation, hyperparameter tuning, handling highly imbalanced datasets, custom loss functions, and explainable artificial intelligence.
We assume that participants have some experience developing ML models in Python. Students are expected to be familiar with TensorFlow, a Python framework for model development. In addition, students should have a basic familiarity with commonly used scientific Python modules: numpy, pandas, and matplotlib. For participants who do not have this experience, we highly recommend the Short Course on Machine Learning for Weather and Climate offered for free by CIRA. Unfortunately, due to the scope of the material, we will not be covering beginner material in this course.
The major goal of this course is to help participants better apply ML to environmental science applications. Instead of focusing deeply on specific ML architectures, we provide material that will be broadly useful across many environmental ML applications. With the rapid development of ML techniques, it is important for practitioners to be able to tailor these models appropriately for complex environmental applications, whose datasets are often high-dimensional and highly imbalanced.
ML model training tends to be very sensitive to the choice of hyperparameters. Beginner tutorials usually demonstrate tuning these with a simple grid search, if they demonstrate hyperparameter tuning at all. For complex environmental models, that grid search may be too computationally intensive. Here, we will demonstrate how to use packages for automatic hyperparameter tuning, and offer practical guidance on which tuning strategies are appropriate for different situations.
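As a minimal sketch of automatic tuning, the example below uses scikit-learn's `RandomizedSearchCV` on a synthetic dataset; the course itself may use a different tuner (e.g., a Keras-oriented package), and the model, parameter ranges, and data here are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for an environmental dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Illustrative hyperparameter ranges (3 * 4 * 3 = 36 grid combinations).
param_distributions = {
    "n_estimators": [25, 50, 100],
    "max_depth": [2, 4, 8, None],
    "min_samples_leaf": [1, 2, 5],
}

# Random search samples only n_iter configurations instead of
# exhaustively evaluating the full grid, trading completeness
# for a much smaller compute budget.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=8,        # far fewer fits than the full grid of 36
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The same pattern scales to more expensive models: the budget (`n_iter`) is fixed up front rather than growing multiplicatively with each new hyperparameter.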
Another major concern is model evaluation. Using conventional metrics like accuracy and mean squared error, it is very easy to be misled regarding model performance. Here, we will demonstrate model evaluation with a variety of forecasting skill scores that better capture how a model performs on critical events rather than on average. This is a major concern in meteorology, where a model can have very high average performance but fail to predict the extreme events (e.g., storms). In addition, we will demonstrate how to create and interpret evaluation graphics such as the receiver operating characteristic curve for a deeper understanding of model performance.
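The accuracy pitfall can be shown in a few lines. Below, a hypothetical "always predict no event" forecast on a rare-event dataset scores high accuracy but zero on the Critical Success Index (CSI, also called the threat score), one of the skill scores used in forecast verification; the data here are synthetic.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
# Synthetic labels with a ~5% "event" rate (e.g., storm vs. no storm).
y_true = (rng.random(1000) < 0.05).astype(int)
# A trivial forecast that always predicts "no event".
y_pred = np.zeros_like(y_true)

accuracy = (y_true == y_pred).mean()

# CSI = hits / (hits + misses + false alarms). Unlike accuracy, it
# ignores correct negatives, so it directly penalizes missed events.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
csi = tp / (tp + fn + fp)
print(f"accuracy={accuracy:.2f}, CSI={csi:.2f}")
```

The trivial forecast is roughly 95% accurate yet has a CSI of 0: it never detects a single event.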
Imbalanced datasets are a major concern for ML modeling in environmental science. Very often, it is the rare events that we are most interested in predicting and where model performance is most critical. We will demonstrate several strategies for working with imbalanced datasets that can potentially improve model performance. These include sampling techniques as well as methods for generating synthetic examples of the minority class. We will also share some of the caveats and potential pitfalls associated with the methods.
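As one illustration of the sampling strategies mentioned above, the sketch below balances a toy dataset by randomly oversampling the minority class with NumPy. This is the simplest such technique; synthetic-example methods (e.g., SMOTE-style interpolation) are more elaborate, and all of the data here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy imbalanced dataset: 95 majority samples, 5 minority samples.
X = rng.normal(size=(100, 3))
y = np.array([0] * 95 + [1] * 5)

# Random oversampling: draw minority samples with replacement until
# the classes are balanced. Caveat: repeated copies of the same rare
# examples can encourage overfitting to those exact cases.
minority_idx = np.flatnonzero(y == 1)
extra = rng.choice(minority_idx, size=(y == 0).sum() - len(minority_idx))
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])
print(np.bincount(y_bal))  # both classes now equally represented
```

Crucially, any resampling must happen only on the training split; oversampling before the train/test split leaks duplicated minority samples into the evaluation set.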
Model optimization is driven by the loss function, and with the wrong loss function it is possible to train a model that is very skilled at performing the wrong task. We will discuss how loss functions influence what models learn, and provide examples of customizing loss functions to tailor a model for various environmental applications.
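As a minimal sketch of the idea, the hypothetical loss below up-weights squared errors on targets above a threshold, so missing an "extreme" value costs the model more than missing a typical one. It is written in NumPy for clarity; a TensorFlow version would operate on tensors and be passed to `model.compile(loss=...)`. The threshold, weight, and data are illustrative assumptions.

```python
import numpy as np

def weighted_mse(y_true, y_pred, threshold=1.0, weight=10.0):
    """MSE that up-weights errors on 'extreme' targets above a threshold.

    Hypothetical example loss: errors where y_true > threshold count
    `weight` times as much as ordinary errors.
    """
    w = np.where(y_true > threshold, weight, 1.0)
    return np.mean(w * (y_true - y_pred) ** 2)

y_true = np.array([0.0, 0.5, 2.0])   # last value is an "extreme event"
y_pred = np.array([0.0, 0.5, 0.0])   # the model misses the extreme
plain = np.mean((y_true - y_pred) ** 2)     # ordinary MSE: 4/3
weighted = weighted_mse(y_true, y_pred)     # weighted MSE: 40/3
print(plain, weighted)
```

Under the plain loss, the missed extreme contributes the same as any other error; under the weighted loss it dominates, pushing optimization toward the cases that matter.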
Finally, we will discuss model interpretation through explainable AI (XAI) techniques. There are many reasons why users want to understand how their model works. XAI may reveal failure cases that lead to ideas for improving the model. Or, it may reveal that the model has learned physically realistic strategies, which may help us trust the model for use in critical situations. We will introduce XAI and demonstrate several commonly used methods. We will show how XAI can be used to investigate interesting cases in imbalanced datasets. We will also show examples of some XAI pitfalls and how to avoid being misled by explanations.
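One widely used, model-agnostic XAI baseline is permutation importance: shuffle one input feature at a time and measure how much test performance drops. The sketch below applies it to a synthetic dataset where only the first two features are informative; the model and data are illustrative, and the course may demonstrate other methods as well.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: with shuffle=False, the 2 informative features are
# columns 0 and 1; the remaining 4 columns are pure noise.
X, y = make_classification(n_samples=400, n_features=6, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Permute each feature n_repeats times on held-out data; a large score
# drop means the model genuinely relies on that feature.
result = permutation_importance(model, X_te, y_te, n_repeats=10,
                                random_state=0)
print(result.importances_mean.round(3))
```

A common pitfall previewed here: correlated features can share or mask each other's importance, so low permutation importance does not always mean a feature is physically irrelevant.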
If you have questions regarding the course, please contact Evan Krell.
Instructor affiliations: Texas A&M University - Corpus Christi and NOAA.