Junwei Lu, Harvard T.H. Chan School of Public Health 
"Topological Inference on Large Scale Graphon"

We propose to test the topological structures of complex networks under the graphon model. The graphon is a nonparametric model for large-scale stochastic graphs. Although much work has been done on graphon estimation, it is not easy to interpret network structures from the resulting estimators. We provide an inferential toolkit to study the persistent homology of the graphon landscape, which reveals the clustering structure of stochastic networks. Our methods are applied to neuroscience data related to visual memory.
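
As a rough illustration of the graphon model itself (not the talk's inferential toolkit), the sketch below samples a random graph from a graphon W: latent uniforms are drawn for the nodes and edges appear independently with probability W(u_i, u_j). The kernel W here is a hypothetical choice for demonstration.

```python
# A minimal sketch of sampling an n-node graph from a graphon W(u, v);
# the kernel below is an illustrative assumption, not the speaker's model.
import numpy as np

def sample_graphon(n, W, rng=None):
    """Draw an undirected graph: u_i ~ Uniform(0,1), edge i~j w.p. W(u_i, u_j)."""
    rng = np.random.default_rng(rng)
    u = rng.uniform(size=n)                   # latent node positions
    P = W(u[:, None], u[None, :])             # edge-probability matrix
    A = (rng.uniform(size=(n, n)) < P).astype(int)
    A = np.triu(A, 1)                         # upper triangle only, no self-loops
    return A + A.T                            # symmetrize

# Example: a smooth kernel with strong affinity between nearby latent positions.
W = lambda x, y: 0.6 * np.exp(-3 * (x - y) ** 2) + 0.1
A = sample_graphon(200, W, rng=0)
```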
 

Po-ling Loh, University of Wisconsin–Madison 
"Mean estimation for entangled single-sample distributions"

We consider the problem of estimating the common mean of univariate data when independent samples are drawn from non-identical symmetric, unimodal distributions. This captures the setting where all samples are Gaussian with different unknown variances. We propose an estimator that adapts to the level of heterogeneity in the data, achieving near-optimality in both the i.i.d. setting and some heterogeneous settings, where the fraction of "low-noise" points is as small as (log n)/n. Our estimator is a hybrid of the modal interval, shorth, and median estimators from classical statistics. The rates depend on the percentile of the mixture distribution, making our estimators useful even for distributions with infinite variance.
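
For orientation, here is a rough sketch of the classical building blocks named above (median, shorth, modal interval); the adaptive hybrid in the talk is more involved, and the mixture example at the end is a hypothetical illustration.

```python
# Sketches of the classical estimators combined in the talk's hybrid.
import numpy as np

def shorth(x):
    """Midpoint of the shortest interval containing at least half the points."""
    x = np.sort(x)
    k = len(x) // 2                     # each candidate interval spans k+1 order stats
    widths = x[k:] - x[:-k]
    i = np.argmin(widths)
    return 0.5 * (x[i] + x[i + k])

def modal_interval(x, h):
    """Center of the length-h interval covering the most points."""
    x = np.sort(x)
    counts = np.searchsorted(x, x + h, side="right") - np.arange(len(x))
    i = np.argmax(counts)
    return x[i] + h / 2

# A heterogeneous mixture: a few low-noise points among many high-noise ones.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 50), rng.normal(0, 20, 950)])
print(np.median(x), shorth(x), modal_interval(x, h=1.0))
```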
 

Alex Belloni, Duke University 
"Subvector Inference in Partially Identified Models with Many Moment Inequalities"

In this work we consider bootstrap-based inference methods for functions of the parameter vector in the presence of many moment inequalities, where the number of moment inequalities, denoted by p, is possibly much larger than the sample size n. In particular, this covers the case of subvector inference, such as inference on a single component associated with a treatment/policy variable of interest. We consider a min-max of (centered and non-centered) Studentized statistics and study the properties of the associated critical values. To establish these properties, we provide a new finite-sample analysis that does not rely on Donsker properties, and we establish new central limit theorems for the min-max of the components of random matrices. Furthermore, we consider the anti-concentration properties of the min-max of the components of a Gaussian matrix and propose bootstrap-based methods to estimate them. In turn, this provides a valid data-driven way to set the tuning parameters of the bootstrap-based inference methods. Importantly, the tuning parameters generalize the choices in the literature for Donsker classes (and we show why those choices would not be appropriate in our setting), which might better characterize finite-sample behavior. This is joint work with Federico Bugni and Victor Chernozhukov.
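As a hedged illustration of the general flavor (not the paper's min-max construction), the sketch below computes a multiplier-bootstrap critical value for the maximum of p Studentized moment statistics, which is the simpler max-statistic analogue of the setting above.

```python
# Illustrative multiplier bootstrap for the max of p Studentized statistics;
# a sketch under simplifying assumptions, not the authors' procedure.
import numpy as np

def max_stat_and_critical_value(X, alpha=0.05, B=1000, rng=None):
    """X: (n, p) matrix of moment functions; returns test stat and c_{1-alpha}."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    mu, sd = X.mean(0), X.std(0, ddof=1)
    stat = np.sqrt(n) * np.max(mu / sd)        # max Studentized statistic
    Z = X - mu                                 # centered data
    e = rng.standard_normal((B, n))            # Gaussian multipliers
    boot = (e @ Z) / (np.sqrt(n) * sd)         # (B, p) bootstrap draws
    crit = np.quantile(boot.max(1), 1 - alpha)
    return stat, crit

# Usage: reject "all moments <= 0" at level alpha when stat > crit.
stat, crit = max_stat_and_critical_value(np.random.default_rng(0).normal(size=(200, 500)))
```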

Javier Pena, Carnegie Mellon University  
"Bregman proximal methods for convex optimization"

We provide an overview and unified analysis of Bregman proximal first-order algorithms for convex minimization. Our approach highlights the fundamental but somewhat overlooked role that the Fenchel conjugate plays in this important and versatile class of algorithms. Our approach yields novel proofs of the convergence rates of the Bregman proximal subgradient, the Bregman proximal gradient, and a new accelerated Bregman proximal gradient algorithm. We illustrate the effectiveness of Bregman proximal methods in two interesting applications, namely D-optimal design and Poisson linear inverse problems.
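
A minimal sketch of one member of this class: the Bregman proximal gradient step with the entropy kernel on the simplex, where the Bregman prox has the familiar multiplicative closed form. The quadratic objective below is an illustrative placeholder, not one of the talk's applications.

```python
# Bregman proximal gradient (entropic mirror descent) on the simplex:
# x+ = argmin_x <grad f(x_k), x> + (1/t) KL(x, x_k), solved in closed form.
import numpy as np

def bregman_prox_grad(grad_f, x0, steps=500, t=0.1):
    x = x0.copy()
    for _ in range(steps):
        x = x * np.exp(-t * grad_f(x))   # closed-form Bregman prox step
        x /= x.sum()                     # renormalize onto the simplex
    return x

# Example: minimize f(x) = 0.5 * ||A x - b||^2 over the probability simplex.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(30, 5)), rng.normal(size=30)
grad = lambda x: A.T @ (A @ x - b)
x_star = bregman_prox_grad(grad, np.ones(5) / 5)
```

The entropy kernel is a natural fit for simplex constraints because its Bregman prox avoids a Euclidean projection; other kernels (e.g., Burg entropy for the Poisson inverse problems mentioned above) yield different closed-form updates.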

 

Ying Huang, Fred Hutchinson Cancer Research Center 
"Inferential Procedures for Assessing the Incremental Value of New Biomarkers based on Logic Rules"

Single biomarkers often have inadequate classification performance in early detection of disease, making it important to identify new biomarkers to combine with the existing marker for improved performance.

One method for combining biomarkers that appears to have a sound biological basis is to use logic rules, e.g., the OR/AND rules. In a motivating example of early detection of pancreatic cancer, the established biomarker CA19-9 is present in only a subclass of cancers; it is of interest to identify new biomarkers present in the other subclasses and declare disease when either marker is positive. While there has been research on developing biomarker combinations using the OR/AND rules, inference regarding the incremental value of the new marker within this framework is lacking; furthermore, such research is hindered by challenges due to statistical non-regularity. In this talk I will present a recent development on the inferential question of whether combining the new biomarker with the existing one achieves better classification performance than using the existing biomarker alone, based on a nonparametrically estimated OR rule that maximizes the weighted average of sensitivity and specificity. I will propose procedures for testing the incremental value of the new biomarker and constructing its confidence interval, using bootstrap, cross-validation, and a novel fuzzy p-value-based technique. Finally, I will use numerical studies and the pancreatic cancer example to illustrate the performance of the proposed methods.
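
As a rough sketch of the estimated OR rule described above: declare disease when either marker exceeds its threshold, with thresholds chosen to maximize a weighted average of sensitivity and specificity. The grid search below is an illustrative assumption; the talk's estimator and its inferential procedures are more refined.

```python
# Illustrative OR-rule fit: disease declared when m1 > c1 or m2 > c2,
# with (c1, c2) maximizing w * sensitivity + (1 - w) * specificity.
import numpy as np

def fit_or_rule(m1, m2, y, w=0.5, grid=50):
    """m1, m2: marker values; y: 0/1 disease status; w: weight on sensitivity."""
    c1s = np.quantile(m1, np.linspace(0, 1, grid))
    c2s = np.quantile(m2, np.linspace(0, 1, grid))
    best, best_val = None, -np.inf
    for c1 in c1s:
        for c2 in c2s:
            pos = (m1 > c1) | (m2 > c2)          # the OR rule
            sens = pos[y == 1].mean()
            spec = (~pos)[y == 0].mean()
            val = w * sens + (1 - w) * spec
            if val > best_val:
                best, best_val = (c1, c2), val
    return best, best_val
```

The non-regularity mentioned above arises in part because the optimal thresholds need not be unique and the estimated rule is a non-smooth functional of the data, which is what motivates the bootstrap, cross-validation, and fuzzy p-value techniques.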

Jiwei Zhao, State University of New York at Buffalo 
"Nonignorable Missingness Mechanism Model Can Be Ignored"

Nonignorable missing data exist in various biomedical studies and social sciences, e.g., aging research, metabolomics data analysis, electronic medical records, and health surveys. A major hurdle of rigorous nonignorable missing data analysis is how to model or estimate the missingness mechanism. Since this model depends on some unobserved data, its model fitting and model diagnostics are generally regarded as difficult, if not impossible. In this talk, I will consider a regression setting where the outcome variable is subject to nonignorable missingness. The primary interest is to estimate the unknown parameter in the regression model. I will discuss an estimation procedure where modeling of the missingness mechanism is completely bypassed. I will show the asymptotic properties of the proposed estimator and the algorithm implementation. Numerical studies will also be presented to illustrate the usefulness of our proposed estimator. This talk is based on joint work with Dr. Yanyuan Ma from Penn State University.
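
To make the difficulty concrete, the simulation below (an illustrative assumption, not the speaker's method) generates nonignorable missingness: whether the outcome is observed depends on the outcome itself, so the mechanism cannot be checked from the observed data alone and naive complete-case regression is biased.

```python
# Illustrative MNAR (nonignorable) simulation for an outcome regression.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)         # regression model of interest
p_obs = 1 / (1 + np.exp(-(0.5 - 0.8 * y)))     # missingness depends on y itself
observed = rng.uniform(size=n) < p_obs

# Complete-case slope estimate; biased away from the true slope 2.0
# because selection into the sample depends on the outcome.
slope = np.polyfit(x[observed], y[observed], 1)[0]
print(slope)
```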

Corwin Zigler, University of Texas at Austin
"Bipartite Causal Inference with Interference: Estimating Health Impacts of Power Plant Regulations"

A fundamental feature of evaluating causal health effects of air quality regulations is that air pollution moves through space, rendering health outcomes at a particular population location dependent upon regulatory actions taken at multiple, possibly distant, pollution sources. Motivated by studies of the public-health impacts of power plant regulations in the U.S., this talk introduces the novel setting of bipartite causal inference with interference, which arises when 1) treatments are defined on observational units that are distinct from those at which outcomes are measured and 2) there is interference between units in the sense that outcomes for some units depend on the treatments assigned to many other units. Interference in this setting arises due to complex exposure patterns dictated by physical-chemical atmospheric processes of pollution transport, with intervention effects framed as propagating across a bipartite network of power plants and residential zip codes. New causal estimands are introduced for the bipartite setting, along with an estimation approach based on generalized propensity scores for treatments on a network. The new methods are deployed to estimate how emission-reduction technologies implemented at coal-fired power plants causally affect health outcomes among Medicare beneficiaries in the U.S.
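
One ingredient of the bipartite setting can be sketched directly: each outcome unit's exposure to interventions at many treatment units, aggregated through a bipartite weight matrix. The inverse-distance weights below are a hypothetical stand-in for the pollution-transport processes described above, not the talk's exposure model.

```python
# Illustrative bipartite exposure: zip codes' weighted exposure to treated plants.
import numpy as np

rng = np.random.default_rng(0)
n_plants, n_zips = 100, 1000
plants = rng.uniform(0, 10, size=(n_plants, 2))   # plant locations
zips = rng.uniform(0, 10, size=(n_zips, 2))       # zip-code centroids
treat = rng.integers(0, 2, size=n_plants)         # 1 = emission control installed

# Bipartite weights: zip j's dependence on plant i decays with distance,
# normalized so each zip's weights sum to one.
d = np.linalg.norm(zips[:, None, :] - plants[None, :, :], axis=2)
W = 1.0 / (1.0 + d**2)
W /= W.sum(axis=1, keepdims=True)

exposure = W @ treat   # one interference exposure value per zip code
```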

Tuo Zhao, Georgia Institute of Technology
"Towards Understanding First Order Algorithms for Nonconvex Optimization in Machine Learning"

Stochastic Gradient Descent-type (SGD) algorithms have been widely applied to many non-convex optimization problems in machine learning, e.g., training deep neural networks, variational Bayesian inference, and collaborative filtering. Due to current technical limitations, however, establishing convergence properties of SGD for these highly complicated practical non-convex problems is generally infeasible. Therefore, we propose to analyze the behavior of SGD-type algorithms through two simpler but non-trivial non-convex problems: (1) Streaming Principal Component Analysis and (2) Training Non-overlapping Two-layer Convolutional Neural Networks. Specifically, we prove that for both examples, SGD attains a sub-linear rate of convergence to the global optimum with high probability. Our theory not only helps us better understand SGD, but also provides new insights into more complicated non-convex optimization problems in machine learning.
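
The first example admits a compact sketch: SGD for streaming PCA is Oja's rule, one noisy gradient step per sample followed by renormalization onto the unit sphere. The data stream below is an illustrative assumption.

```python
# SGD for streaming PCA (Oja's rule): estimate the top eigenvector of
# E[x x^T] with one pass over the data stream.
import numpy as np

def streaming_pca(stream, d, eta=0.01):
    rng = np.random.default_rng(0)
    w = rng.normal(size=d)
    w /= np.linalg.norm(w)
    for x in stream:
        w += eta * x * (x @ w)           # stochastic gradient step
        w /= np.linalg.norm(w)           # project back to the unit sphere
    return w

# Example: samples with a dominant direction along the first coordinate.
rng = np.random.default_rng(1)
v = np.array([3.0, 0.0, 0.0])
stream = (v * rng.normal() + rng.normal(size=3) for _ in range(20000))
w = streaming_pca(stream, d=3)
print(np.abs(w[0]))   # close to 1: the dominant direction is recovered
```

The non-convexity here is the sphere constraint, which is why this problem is a natural stepping stone toward the more complicated losses the talk discusses.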