Workshop: Contained Chaos: Quality Assurance for the Community Earth System Model
Abstract: State-of-the-science climate models are valuable tools for understanding past and present climates and are particularly vital for addressing otherwise intractable questions about future climate scenarios. Because simulation output may affect societal responses to the changing climate, maintaining model confidence and reliability is critical for institutions like the National Center for Atmospheric Research, which leads the development of the popular Community Earth System Model (CESM). CESM models the Earth system by simulating its major components (e.g., atmosphere, ocean, land, river, and ice) and the interactions between them. These complex processes result in a model that is inherently chaotic, meaning that small perturbations can cause large effects. For this reason, ensemble methods are common in climate studies, as a collection of simulations is needed to understand and characterize this uncertainty in the climate model system. While climate scientists typically use initial condition perturbations to create ensemble spread, similar effects can result from seemingly minor changes to the hardware or software stack. This sensitivity makes quality assurance challenging, and defining "correctness" separately from bit-reproducibility is a practical necessity. Our approach casts correctness in terms of statistical distinguishability, so that the problem becomes one of making decisions under uncertainty in a high-dimensional variable space. We developed a statistical testing framework that can be thought of as hypothesis testing combined with Principal Component Analysis (PCA), which captures changes not only in individual variables but also in the relationships between variables. We are currently delving into the technical details of the PCA to better describe the probabilistic properties of our testing framework and improve its robustness.
In other recent work, we are developing tools to identify and understand the causes of statistically distinct output, which will aid developers in root cause analysis. This talk will give an overview of our multi-year effort to better evaluate the correctness of CESM and detail promising recent developments.
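
The idea of combining hypothesis testing with PCA can be illustrated with a minimal sketch. This is not the CESM testing framework itself; it is a toy version under stated assumptions: each run is summarized by a vector of variable values (for example, global means), a reference ensemble defines the PCA basis, and a new run is flagged when too many of its PC scores fall outside the ensemble spread. The function names, the synthetic data, the 2-sigma bound, and the "at most 2 failing PCs" rule are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_ensemble_pca(ensemble, n_pcs):
    """Fit a PCA basis to a reference ensemble.

    ensemble: array of shape (n_runs, n_vars), one row per simulation.
    Returns the statistics needed to score a new run.
    """
    mu = ensemble.mean(axis=0)
    sigma = ensemble.std(axis=0, ddof=1)
    z = (ensemble - mu) / sigma  # standardize each variable
    # Rows of vt are the principal directions of the standardized data.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[:n_pcs].T  # shape (n_vars, n_pcs)
    scores = z @ loadings
    return {
        "mu": mu,
        "sigma": sigma,
        "loadings": loadings,
        "score_mu": scores.mean(axis=0),
        "score_sigma": scores.std(axis=0, ddof=1),
    }


def count_failing_pcs(model, run, n_sigma=2.0):
    """Count PCs whose score for `run` lies more than n_sigma from the ensemble."""
    z = (run - model["mu"]) / model["sigma"]
    score = z @ model["loadings"]
    deviation = np.abs(score - model["score_mu"]) / model["score_sigma"]
    return int(np.sum(deviation > n_sigma))


def is_consistent(model, run, max_failing=2, n_sigma=2.0):
    """Hypothetical decision rule: pass if at most `max_failing` PCs fail."""
    return count_failing_pcs(model, run, n_sigma) <= max_failing


# Synthetic stand-in for an ensemble: 200 runs, 10 correlated variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
ensemble = latent @ mixing + 0.1 * rng.normal(size=(200, 10))

model = fit_ensemble_pca(ensemble, n_pcs=5)

# A run equal to the ensemble mean, and one pushed far along the first three PCs.
mean_run = model["mu"].copy()
shift = 50.0 * model["loadings"][:, :3].sum(axis=1)
shifted_run = model["mu"] + model["sigma"] * shift

print(is_consistent(model, mean_run))
print(is_consistent(model, shifted_run))
```

Because the scores are computed in PC space rather than variable by variable, a run can fail even when each variable looks plausible on its own but the relationships between variables have changed.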