3:30 PM
4:30 PM
Variable screening for ultrahigh-dimensional data has attracted extensive attention over the past decade. In many applications, researchers know from previous studies that certain predictors or control variables are related to the response of interest. Such knowledge should be taken into account so that these variables are shielded from screening and can assist in selecting the other important predictors. Compared with the vast literature on generic unconditional screening, however, conditional variable screening that incorporates available prior information has been less developed, owing to the difficulty of learning conditional independence. In this talk, we introduce a model-free variable screening paradigm for regression and classification problems that allows for multivariate or even high-dimensional controls. The contribution of each individual predictor is quantified both marginally and conditionally, in the presence of the control variables as well as the other candidates, by reproducing-kernel-based R-squared and partial R-squared statistics. As a result, the proposed method enjoys the sure screening property and the rank consistency property under the notion of sufficiency, which together establish its superiority over existing methods. Its advantages are demonstrated through simulation studies and an application to high-throughput gene expression data.