**John Harlim**, a Penn State Institute for Computational and Data Sciences co-hire and professor of mathematics in the Eberly College of Science and of meteorology, uses his experience in applied mathematics and data science to design computational algorithms and to understand which scientific problems they can and can’t solve.

At the start of his career, Harlim, who joined Penn State as a co-hire and associate professor in 2013, worked on a data assimilation method motivated by weather forecasting applications. Data assimilation is an approach that combines modeling and observations to provide the best possible weather prediction. He eventually shifted to more general computational and applied mathematics, largely because of his interest in addressing modeling errors to enable accurate predictions in scientific modeling, including data assimilation.

“The main challenge is to uncover the unknown mechanism that affects the dynamics based on the available data,” Harlim said.

In modeling tropical atmospheric dynamics, for example, Harlim noted it is important to account for the dynamic processes of clouds that are believed to be responsible for the heating and cooling of the atmospheric column, a conceptual representation of the atmosphere at different altitudes. Observations suggest that appropriate modeling should include several cloud types; however, designing an effective parameterized model for such a dynamical process can be challenging, according to Harlim. Without a comprehensive representation of the appropriate variables, the model can end up making incorrect predictions. This problem, called model error, persists in all predictive modeling.

“Model error is a long-standing problem that has been actively studied for many decades,” Harlim said. “I was fixated on developing a general class of methods to solve this problem with machine learning (ML) algorithms with mathematical guarantees.”

His recent work uncovers the mathematical conditions that can be observed and used to accurately predict quantities of interest over both the short and long term.

Currently, Harlim and his colleagues from aerospace engineering and electrical engineering, as well as two graduate students, are working on a U.S. National Science Foundation-funded project to overcome errors in modeling power grids. The goal of this project is to develop computational methods to predict power systems dynamics and understand their sensitivity under various scenarios. With this understanding, controllers could make rapid decisions to contribute energy to the grid or sustain autonomous operation during power outages, according to Harlim.

“I was inclined to understand why machine learning algorithms work, how they work and what their limitations and advantages are,” Harlim said. “The motivation is mainly to develop a more efficient and effective training strategy with a more educated design of models rather than employing a brute force fitting to a generic class of neural network models, which usually require expensive parameter calibrations.”

The idea, Harlim said, is to better design data-driven algorithms as opposed to manually correcting imperfect models with empirically chosen mathematical formulas. He pointed to the idea of manifolds as an example. Manifolds are low-dimensional structures that locally “look” like flat terrain. The surface of the Earth, a sphere-like structure if we ignore steep mountains and valleys, is a two-dimensional manifold. “Locally,” or as far as we can see when we stand anywhere on Earth, it looks flat. Returning to data science, if the observed data lie on a manifold, then that manifold structure must be incorporated into the training process.
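The "locally flat" idea can be made concrete with a small calculation. The sketch below is only an illustration, not part of Harlim's work: it assumes a perfectly spherical Earth with a mean radius of 6,371 km, and the function name `gap_from_flat_km` is hypothetical. It measures how far a point on the sphere sits from the flat plane touching the sphere where the observer stands.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; a perfect sphere is an assumption


def gap_from_flat_km(distance_km, radius_km=EARTH_RADIUS_KM):
    """Perpendicular distance between the tangent plane at the observer
    and a surface point reached after travelling distance_km along the sphere.

    Equals R * (1 - cos(theta)), written as 2 * R * sin(theta / 2)**2
    to stay accurate for very small angles.
    """
    theta = distance_km / radius_km  # angle subtended at the sphere's center
    return 2.0 * radius_km * math.sin(theta / 2.0) ** 2


for d in (1, 5, 50, 500):
    gap = gap_from_flat_km(d)
    print(f"{d:>4} km away: gap of {gap * 1000:9.1f} m ({gap / d:.3%} of the distance)")
```

Over a few kilometers the gap is a tiny fraction of the distance travelled, which is why the surface looks flat nearby; the fraction grows as the neighborhood widens.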

“While understanding this problem, known as manifold learning, requires rigorous understanding of various mathematical tools from analysis, differential geometry, differential equations, linear algebra and statistics, it has many interesting applications,” Harlim said.

In applied mathematics, partial differential equations (PDEs) play a central role in modeling physical and biological problems that arise in applications ranging from materials science to meteorology to the spread of cancer or wildfire, according to Harlim.

“While most PDE models for complex problems usually have no explicit solutions, approximating solutions of high-dimensional PDEs remain a daunting task,” Harlim said.

High-dimensional in this context means the problem can have many variables, each of which depends on many other variables. Such a PDE problem is computationally difficult because solving it with high precision using classical algorithms requires computing power that is not expected to be available in the foreseeable future. Harlim attributed this issue to the “curse of dimensionality,” a classical obstacle in the field when dealing with high-dimensional problems. The amount of data required to solve such a problem to a desired accuracy typically increases exponentially with the dimension.
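A rough sense of that exponential growth comes from counting the points in a uniform grid. The sketch below is a generic illustration rather than a method from Harlim's research; the choice of 10 sample points per variable and the function name `grid_points` are assumptions made for the example.

```python
def grid_points(points_per_axis, dimension):
    """Number of points in a uniform grid with the same resolution on every axis."""
    return points_per_axis ** dimension


# Keep the resolution fixed at 10 points per variable and raise the dimension.
for d in (1, 2, 3, 10, 100):
    print(f"dimension {d:>3}: {grid_points(10, d):.3e} grid points")
```

Each added variable multiplies the number of grid points by 10, so a resolution that is trivial in three dimensions becomes unreachable long before 100.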

“However, if the observed data suggests that the problem lies on an unknown low-dimensional manifold embedded in a high-dimensional representation, there is hope in solving such a problem,” Harlim said.

This hope motivates Harlim’s current research on solving PDEs on unknown manifolds, where the domain is identified only by randomly sampled point cloud data, or data points in space representing a specific structure, which can often be corrupted by noise.

“A lot of work remains to be done,” Harlim said. “My research interest, at the core, is the development of fundamental algorithms. What I hope is that with rigorous mathematics, we can understand the modeling paradigm induced by the available data, and thus, give a blueprint to approach large-scale problems.”

Visit Harlim's website for more information about his work and a list of his publications.