We propose Bayesian nonparametric procedures for density estimation with compositional data, i.e., data supported on the simplex. To this end, we construct prior distributions on probability measures based on modified classes of multivariate Bernstein polynomials. The resulting priors are induced by mixtures of Dirichlet distributions with random weights and a random number of components. We discuss theoretical properties of the proposal, including large support and consistency of the posterior distribution. We then use the proposed procedures to define latent models and apply them to data on employees of the U.S. federal government. Specifically, we model federal employees’ careers, i.e., the sequences of agencies where the employees have worked. This modeling of careers is part of a broader undertaking to create a synthetic dataset of the federal workforce, which will facilitate access to relevant data for social science research while protecting subjects’ confidential information.
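To make the phrase "mixtures of Dirichlet distributions, with random weights and a random number of components" concrete, here is a minimal sketch in Python. It uses the classical Bernstein polynomial basis on the simplex, not the modified classes referred to in the abstract, and every name in it (compositions, bernstein_mixture_pdf, draw_prior, max_degree) is hypothetical. The uniform prior on the degree k and the flat Dirichlet prior on the weights are assumptions chosen only for illustration.

```python
import math
import random


def dirichlet_logpdf(x, alpha):
    """Log-density of Dirichlet(alpha) at a point x in the open simplex."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


def compositions(total, parts):
    """Yield all tuples of `parts` nonnegative integers that sum to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def bernstein_mixture_pdf(x, k, weights):
    """Density at x of a degree-k Bernstein-type mixture of Dirichlet densities.

    Each component is Dirichlet(nu + 1), where nu ranges over the
    nonnegative integer vectors summing to k - 1 (classical basis; assumption).
    """
    alphas = [tuple(v + 1 for v in nu) for nu in compositions(k - 1, len(x))]
    if len(weights) != len(alphas):
        raise ValueError("need one weight per mixture component")
    return sum(w * math.exp(dirichlet_logpdf(x, a))
               for w, a in zip(weights, alphas))


def draw_prior(parts, max_degree, rng):
    """Draw a random degree k and random mixture weights (illustrative priors)."""
    k = rng.randint(1, max_degree)            # assumed uniform prior on the degree
    n_components = math.comb(k + parts - 2, parts - 1)
    gammas = [rng.gammavariate(1.0, 1.0) for _ in range(n_components)]
    total = sum(gammas)
    weights = [g / total for g in gammas]     # flat Dirichlet via normalized gammas
    return k, weights


rng = random.Random(0)
k, weights = draw_prior(parts=3, max_degree=5, rng=rng)
density = bernstein_mixture_pdf((0.2, 0.3, 0.5), k, weights)
```

Each call to draw_prior yields one random density on the simplex. The number of components is determined by the degree k, so placing a prior on k makes the number of components random as well.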
Bio: I am currently a Postdoctoral Associate at Duke University under the mentorship of Jerry Reiter. Before joining Duke, I received a grant from the Chilean National Fund for Scientific and Technological Development to work as a Postdoctoral Fellow at the Pontificia Universidad Católica de Chile, under the mentorship of Alejandro Jara. I completed my Ph.D. in Statistics at the Pontificia Universidad Católica de Chile under the supervision of Fernando Quintana (advisor) and Alejandro Jara (co-advisor). Before starting my Ph.D., I worked as a faculty member in the School of Industrial Engineering and Statistics at Universidad del Valle in Colombia, where I am originally from. I also earned my bachelor’s degree in Statistics at Universidad del Valle.