4:00 PM
5:00 PM
Building models and methods for large spatio-temporal data is important for many scientific and application areas that affect our lives. In this talk, I will discuss several interrelated yet distinct models and methods for graph and mean recovery problems, with applications in neuroscience, spatio-temporal modeling, and genomics.
In the first part of the talk, I discuss the Gemini methods for estimating the graphical structures and underlying parameters, namely the row and column covariance and inverse covariance matrices, from matrix-variate data. Under sparsity conditions, we show that one is able to recover the graphs and covariance matrices with a single random matrix from the matrix-variate normal distribution. Our method extends, with suitable adaptation, to the general setting where replicates are available. We establish consistency and obtain the rates of convergence in the operator and the Frobenius norm. We show that having replicates allows one to estimate more complicated graphical structures and achieve faster rates of convergence. We provide simulation evidence showing that we can recover the graphical structures as well as estimate the precision matrices, as predicted by theory.
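To make the matrix-variate normal setting concrete, here is a minimal sketch of the data model only, not of the Gemini estimators. It assumes the convention that an n-by-m matrix X with row covariance A and column covariance B has Cov(vec(X)) = B ⊗ A under column-stacking. The helper names (`ar1_cov`, `sample_matrix_normal`) and the AR(1)-style covariances are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def ar1_cov(size, rho):
    """AR(1)-style covariance with entries rho**|i-j| (illustrative choice)."""
    idx = np.arange(size)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def sample_matrix_normal(row_cov, col_cov, rng):
    """Draw one mean-zero matrix X with Cov(vec(X)) = col_cov kron row_cov."""
    row_factor = np.linalg.cholesky(row_cov)  # row_cov = L_r @ L_r.T
    col_factor = np.linalg.cholesky(col_cov)  # col_cov = L_c @ L_c.T
    z = rng.standard_normal((row_cov.shape[0], col_cov.shape[0]))
    return row_factor @ z @ col_factor.T

rng = np.random.default_rng(0)
row_cov = ar1_cov(4, 0.5)  # dependence among the 4 rows
col_cov = ar1_cov(3, 0.3)  # dependence among the 3 columns
x = sample_matrix_normal(row_cov, col_cov, rng)

# Covariance of the column-stacked vec(X) implied by this construction.
full_cov = np.kron(col_cov, row_cov)
print(x.shape, full_cov.shape)
```

The 12-by-12 covariance of vec(X) is determined entirely by the two small factors, which is the structure that makes estimation from a single matrix plausible under sparsity.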
It has been proposed that complex populations, such as those that arise in genomics studies, may exhibit dependencies among observations as well as among variables. This gives rise to the challenging problem of analyzing high-dimensional data with unknown mean and dependence structures. In the second part of the talk, I present a practical method utilizing generalized least squares and penalized (inverse) covariance estimation to address this challenge. We establish consistency and obtain rates of convergence for estimating the mean parameters and covariance matrices iteratively. We use simulation studies and analysis of genomic data from a twin study of ulcerative colitis to illustrate the statistical convergence and the performance of our methods in practical settings.
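The generalized least squares step can be illustrated in isolation. The sketch below assumes a two-group design and treats the covariance among observations as known; in the method described above that covariance is itself estimated with a penalized estimator and the two steps are iterated, which this sketch does not attempt. The function name `gls_means` and all numerical settings are hypothetical.

```python
import numpy as np

def gls_means(data, design, obs_cov):
    """GLS estimate (D^T B^{-1} D)^{-1} D^T B^{-1} X for symmetric obs_cov B."""
    weighted_design = np.linalg.solve(obs_cov, design)  # B^{-1} D
    gram = design.T @ weighted_design                   # D^T B^{-1} D
    return np.linalg.solve(gram, weighted_design.T @ data)

rng = np.random.default_rng(1)
n_obs, n_vars = 6, 4

# Two groups of three observations each (indicator design matrix).
design = np.zeros((n_obs, 2))
design[:3, 0] = 1.0
design[3:, 1] = 1.0

# Assumed-known dependence among observations (illustrative AR(1) form).
idx = np.arange(n_obs)
obs_cov = 0.6 ** np.abs(idx[:, None] - idx[None, :])

true_means = rng.standard_normal((2, n_vars))
noise = np.linalg.cholesky(obs_cov) @ rng.standard_normal((n_obs, n_vars))
data = design @ true_means + 0.1 * noise

est_means = gls_means(data, design, obs_cov)
print(est_means.shape)
```

When `obs_cov` is the identity, the estimate reduces to the ordinary per-group averages; the weighting matters only when observations are dependent.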
In the final part of the talk (time permitting), I will discuss a parsimonious model for precision matrices of matrix-normal data based on the Cartesian product of graphs. By enforcing extreme sparsity (in the number of parameters) and explicit structure on the precision matrix, this model has excellent potential for improving the scalability of the computation and the interpretability of complex data analysis. We establish consistency for both the Bi-graphical Lasso (BiGLasso) and Tensor Graphical Lasso (TeraLasso) estimators and obtain the rates of convergence for estimating the precision matrix.
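The link between a Cartesian product of graphs and a structured precision matrix can be seen in a small example. The sketch below assumes the Kronecker-sum convention Ω = Ψ ⊗ I + I ⊗ Θ and uses two path graphs as factors; the factor values and helper names are illustrative assumptions, and no estimation (BiGLasso or TeraLasso) is performed.

```python
import numpy as np

def kronecker_sum(a, b):
    """Kronecker sum: a kron I_m + I_n kron b, for a (n x n) and b (m x m)."""
    n, m = a.shape[0], b.shape[0]
    return np.kron(a, np.eye(m)) + np.kron(np.eye(n), b)

def path_adjacency(size):
    """Adjacency matrix of a path graph on `size` nodes."""
    adj = np.zeros((size, size))
    idx = np.arange(size - 1)
    adj[idx, idx + 1] = 1.0
    adj[idx + 1, idx] = 1.0
    return adj

adj_rows = path_adjacency(3)
adj_cols = path_adjacency(4)

# Adjacency of the Cartesian product graph (here a 3-by-4 grid).
adj_product = kronecker_sum(adj_rows, adj_cols)
n_edges = int(adj_product.sum() / 2)

# Diagonally dominant, hence positive definite, factor precision matrices.
psi = 2.0 * np.eye(3) - 0.5 * adj_rows
theta = 2.0 * np.eye(4) - 0.5 * adj_cols
omega = kronecker_sum(psi, theta)

# Free parameters: two small symmetric factors vs. a full 12 x 12 matrix.
n_factor_params = 3 * 4 // 2 + 4 * 5 // 2
n_full_params = 12 * 13 // 2
print(n_edges, n_factor_params, n_full_params)
```

The off-diagonal support of `omega` coincides with the edges of the product graph, and the eigenvalues of a Kronecker sum are the pairwise sums of the factor eigenvalues, so positive definite factors give a positive definite `omega`.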
This talk is based on joint work with Michael Hornstein, Roger Fan, Kerby Shedden, Kristjan Greenewald and Al Hero.