Name: Colloquium: Detecting the Signal Among Noise and Contamination in High Dimensions
Start: 2020-01-28T20:30:00
End: 2020-01-28T21:30:00

Submitted by rpc5102 on Tue, 01/28/2020 - 08:29

stat

Presented By

David Kepplinger (University of British Columbia)

Details

Start Date: Tue, Jan 28, 2020, 3:30 PM

End Date: Tue, Jan 28, 2020, 4:30 PM

Location

201 Thomas Building, University Park, PA

Abstract

Improvements in biomedical technology and a surge in other data-driven sciences have led to the collection of ever larger amounts of data. In this abundance of data, contamination is ubiquitous but often neglected, creating a substantial risk of spurious scientific discoveries. Especially in applications with high-dimensional data, such as proteomic biomarker discovery, contamination can have a profound yet hard-to-diagnose impact on methods for variable selection and estimation.

In this talk I present a method for variable selection and estimation in high-dimensional linear regression models that leverages the elastic-net penalty for complex data structures. The method can harness the collected information even when both the response and the predictors are subject to arbitrary contamination. I showcase the method's theoretical and practical advantages, specifically in applications with heavy-tailed errors and limited control over the data. I also outline efficient algorithms that tackle the computational challenges posed by the inherently non-convex objective functions of robust estimators, along with practical strategies for hyper-parameter selection, which together ensure that the method scales and applies to a wide range of problems.
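The abstract does not spell out the estimator. As a rough illustration only, and not the speaker's method, the sketch below pairs an elastic-net penalty with a bounded, non-convex Tukey bisquare loss and fits it by iteratively reweighted coordinate descent, re-estimating the residual scale by the MAD at each step. The function names (`weighted_enet`, `bisquare_weights`, `robust_enet`), the tuning constant c = 4.685, the naive median starting point, and the simulated data are all assumptions made for this example.

```python
import random
from statistics import median


def soft_threshold(z, gamma):
    """Soft-thresholding operator for the L1 part of the elastic-net penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def weighted_enet(X, y, w, lam, alpha, beta0=0.0, beta=None, sweeps=200, tol=1e-8):
    """Coordinate descent for (1/2n) sum_i w_i (y_i - b0 - x_i.b)^2
    + lam * (alpha * |b|_1 + (1 - alpha)/2 * |b|_2^2); b0 is unpenalized."""
    n, p = len(X), len(X[0])
    beta = list(beta) if beta is not None else [0.0] * p
    resid = [y[i] - beta0 - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    sw = sum(w)
    for _ in range(sweeps):
        # Unpenalized intercept: move it by the weighted mean residual.
        shift = sum(w[i] * resid[i] for i in range(n)) / sw
        beta0 += shift
        resid = [r - shift for r in resid]
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # Weighted correlation of column j with the partial residual (old b_j added back).
            z = sum(w[i] * X[i][j] * (resid[i] + X[i][j] * old) for i in range(n)) / n
            denom = sum(w[i] * X[i][j] ** 2 for i in range(n)) / n + lam * (1 - alpha)
            new = soft_threshold(z, lam * alpha) / denom
            if new != old:
                for i in range(n):
                    resid[i] -= X[i][j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol and abs(shift) < tol:
            break
    return beta0, beta


def bisquare_weights(resid, c=4.685):
    """Tukey bisquare weights; residuals beyond c robust-scale units get weight 0."""
    med = median(resid)
    scale = 1.4826 * median(abs(r - med) for r in resid)  # MAD, consistent at the normal
    if scale == 0:
        return [1.0] * len(resid)
    return [(1 - (r / (c * scale)) ** 2) ** 2 if abs(r) <= c * scale else 0.0 for r in resid]


def robust_enet(X, y, lam, alpha, iters=50, tol=1e-6):
    """Hypothetical IRLS loop: alternate bisquare weights and a weighted elastic-net fit."""
    n, p = len(X), len(X[0])
    beta0, beta = median(y), [0.0] * p  # naive start; the objective is non-convex
    for _ in range(iters):
        resid = [y[i] - beta0 - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        new0, new = weighted_enet(X, y, bisquare_weights(resid), lam, alpha, beta0, beta)
        change = max([abs(new0 - beta0)] + [abs(a - b) for a, b in zip(new, beta)])
        beta0, beta = new0, new
        if change < tol:
            break
    return beta0, beta


# Simulated data (assumption): y = 1 + 2*x1 + noise, with 4 responses replaced by -30.
rng = random.Random(0)
n = 40
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n)]
y = [1 + 2 * row[0] + rng.gauss(0, 0.1) for row in X]
for i in sorted(range(n), key=lambda k: X[k][0])[-4:]:
    y[i] = -30.0

plain0, plain = weighted_enet(X, y, [1.0] * n, lam=0.01, alpha=0.5)
rob0, rob = robust_enet(X, y, lam=0.01, alpha=0.5)
print("plain slope:", round(plain[0], 2), "robust slope:", round(rob[0], 2))
```

With equal weights, the four contaminated responses drag the fitted slope for `x1` far from its true value of 2, while the reweighted fit assigns them zero weight and recovers a slope near 2. Because the bisquare objective is non-convex, the result can depend on the starting point. The naive median start suffices for this toy data, but it should not be expected to work in general.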