Name: Sufficient Variable Screening with High Dimensional Controls
Start: 2022-09-15T19:30:00
End: 2022-09-15T20:30:00

Submitted by rpc5102 on Mon, 08/29/2022 - 19:56

Presented By

Chenlu Ke (Virginia Commonwealth University)

Details

Start Date: Thu, Sep 15, 2022, 3:30 PM

End Date: Thu, Sep 15, 2022, 4:30 PM

Location

201 Thomas Building, University Park

Event Series: Statistics Colloquia

Variable screening for ultrahigh dimensional data has attracted extensive attention in the past decade. In many applications, researchers learn from previous studies that certain predictors or control variables are related to the response of interest. Such knowledge should be taken into account so that these variables can assist in the selection of the other important predictors while being shielded from screening. Compared to the vast literature on generic unconditional screening, however, the development of conditional variable screening that factors in available prior information has been less fruitful, owing to the difficulty of learning conditional independence. In this talk, we introduce a model-free variable screening paradigm for regression and classification problems that allows for multivariate or even high dimensional controls. The contribution of each individual predictor is quantified both marginally and conditionally, in the presence of the control variables as well as the other candidates, by reproducing-kernel-based R-squared and partial R-squared statistics. The proposed method enjoys the sure screening property and the rank consistency property under the notion of sufficiency, properties that establish its superiority over existing methods. These advantages are demonstrated by simulation studies and an application to high-throughput gene expression data.