Stability analysis for clustering problems and community detection in graphical models
Date: Thu, Jan 28, 2021
Time: 3:30 PM to 4:30 PM (20:30 to 21:30 UTC)
Presented By
Rachael Hageman Blair (University at Buffalo)
Event Series: Statistics Colloquia

Abstract

Identifying patterns and structure within a dataset is a challenging problem in unsupervised learning, in part because there is no gold standard against which performance can be assessed. The concept of “stability” has been used as a surrogate for performance, primarily in data clustering, and has been defined in a number of ways. Measures of stability capture both the quality and the reproducibility of a clustering. In this talk, I will introduce an approach to cluster stability that relies on bootstrapped clustering of the data and the Jaccard distance. A distinguishing feature of this approach is that stability can be measured and summarized at the level of the individual items being clustered and of the clusters themselves, and can be used for model selection (choosing the number of clusters). Recent extensions of this framework to community detection in undirected graphical models will be described. Applications include a metabolomics dataset from the Beijing Olympics Air Pollution (BoaP) study. These approaches are implemented in the “bootcluster” package for the R programming language.
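
To make the idea of item-level stability concrete, here is a minimal Python sketch of bootstrapped clustering scored with a Jaccard measure. It is an illustration only and not the “bootcluster” implementation: the use of plain k-means, the scoring by Jaccard similarity (one minus the Jaccard distance), and every function name below are assumptions introduced for this example.

```python
import random


def nearest(p, centroids):
    """Index of the centroid closest to point p (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
    )


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means on a list of equal-length tuples; returns centroids."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # An empty group keeps its previous centroid.
        new = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids


def co_members(labels):
    """For each item, the set of item indices that share its cluster."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, set()).add(i)
    return [clusters[lab] for lab in labels]


def jaccard(a, b):
    """Jaccard similarity of two sets (1 minus the Jaccard distance)."""
    return len(a & b) / len(a | b)


def item_stability(points, k, n_boot=50, seed=0):
    """Hypothetical helper: mean Jaccard agreement per item over bootstraps."""
    rng = random.Random(seed)
    ref_centroids = kmeans(points, k, rng)
    ref_labels = [nearest(p, ref_centroids) for p in points]
    ref_sets = co_members(ref_labels)
    totals = [0.0] * len(points)
    for _ in range(n_boot):
        # Resample with replacement, cluster the resample, then assign
        # every original item to its nearest bootstrap centroid.
        sample = [rng.choice(points) for _ in points]
        centroids = kmeans(sample, k, rng)
        boot_sets = co_members([nearest(p, centroids) for p in points])
        for i in range(len(points)):
            totals[i] += jaccard(ref_sets[i], boot_sets[i])
    return ref_labels, [t / n_boot for t in totals]


def cluster_stability(labels, stability):
    """Average the item-level scores within each reference cluster."""
    sums, counts = {}, {}
    for lab, s in zip(labels, stability):
        sums[lab] = sums.get(lab, 0.0) + s
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: sums[lab] / counts[lab] for lab in sums}
```

Under these assumptions, each item receives a score between 0 and 1, the scores can be averaged within clusters, and one plausible way to use them for model selection is to repeat the calculation for several values of k and compare the overall means.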

 
