Dr. Stephanie Lanza: Statistics for the Good of the People
As an undergraduate student at the University of North Carolina, I studied math because, well, that was always my strongest subject in school. Definitely not English. One of my electives in college was a psychology class, which was fun, so I decided to sign up for a second psych class - this time one focused on quantitative methods. That is when things really sparked for me... I discovered statistics! I couldn't get enough - biostatistics, econometrics, probability, you name it. Finally, I had discovered a path that would guide me through my studies and eventually my career - one that integrated training in math, statistics, psychology, and human development. Studying math for the sake of math could never have sustained me, but statistics opened the door to applying my math brain to real-world problems. I was sold.
After college I worked as a data analyst - first at a marketing research firm and then at a large pharmaceutical company. I could have enjoyed a nice career on this path, but I decided to pursue my PhD in hopes of having more opportunity for creative thinking and of researching issues I was more passionate about. I found my way to a graduate program at Penn State, where I could devote myself to learning more about statistics and human development, understanding pathways that can lead to terrible outcomes such as substance use disorder, and pondering ways that interdisciplinary teams could prevent these kinds of complex social problems.
I've been a researcher at Penn State for nearly two decades now, and the thread that weaves my NIH-funded research career together is, without a doubt, statistics. I am now a Professor of Biobehavioral Health and I direct a large research center, the Edna Bennett Pierce Prevention Research Center at Penn State. A few years ago, university leaders approached me and asked me to pitch an initiative that could position Penn State to really address issues related to substance use and addiction. It was incredibly rewarding to launch the Consortium on Substance Use and Addiction in 2018 and to serve as its inaugural director through 2021. Using statistics for the good of public health - that is what drives me in all that I do. When statistics is combined with substantive knowledge about the problems we are trying to solve, along with good communication skills and leadership experience, I think we have a chance to really make a difference.
Even though English was not my best subject growing up, it turns out that technical writing came easily to me. In fact, I co-wrote two textbooks designed to introduce applied researchers to complex statistical models: one on latent class and latent transition analysis, the other on time-varying effect modeling. I think it can be smart to follow our strengths when we pursue careers, but it may be much more rewarding if we can find a way to pivot those strengths toward a passion-driven career!
Dr. Claire McKay Bowen: From Physics to Public Policy
I started out studying physics because I wanted to know how the world worked. But within my first year of study, I realized that mathematics is the language of science, so I pursued a dual degree in mathematics and physics at Idaho State University. I got involved in a lot of different projects while at Idaho State: I spent some time in a radiation physics lab, analyzing samples from the environment and looking at radiation levels; I worked in a biophysics lab, playing with really cool lasers and looking at DNA–RNA interactions. I conducted some education research too, and I got into STEM [science, technology, engineering and mathematics] outreach and education.
It was after getting to try all these different kinds of things and talking about them with my spouse, who I was dating at the time, that I realized that what I liked most was the analysis part of research. To adapt that famous quote from John Tukey about statisticians: I like playing in other people’s backyards. I applied to both physics and statistics programs for graduate school and ended up in a statistics program at the University of Notre Dame in Indiana. There I completed a master’s in applied and computational mathematics and statistics before pursuing my PhD, which is when I became interested in data privacy.
My dissertation was on “Data Privacy via Integration of Differential Privacy and Data Synthesis”. Differential privacy had only been out for a few years at that point, so it was still very new theoretical work, and I was trying to do something more applied with it, mixing it with synthetic data. I had no idea differential privacy was going to become such a hot topic, but the fact that it did made it easier to find a job when I graduated.
I currently work at the Urban Institute, a nonpartisan, nonprofit public policy research institution. We try to “elevate the debate” on public policy issues and help policy-makers, such as the United States Congress, make decisions that are firmly evidence-based. The Urban Institute has 12 policy centers in total, focusing on health policy, justice policy, tax policy, and other areas. I am part of the data science team within Urban’s Technology and Data Science Office, and our role is both to lead research and to assist with research across Urban, using data science techniques.
Because I am a specialist in data privacy and confidentiality, I am specifically looking at how to release data that is meaningful and powerful for making policy decisions while still protecting the privacy of individuals. For example, the United States has just passed a $1.9 trillion stimulus package. That is quite a bit of money, and to distribute it effectively, it would be great to access taxpayer data to figure out who needs it most based on the impacts of Covid-19 over the past year. Taxpayer data contains a lot of sensitive information, so you would not want to be able to identify specific people in it, but you do want to know enough to make policy decisions about something like the stimulus package. So, a project I am working on right now is a collaboration with the Internal Revenue Service to figure out whether we can create a synthetic data set – a data set of pseudo records that should be statistically representative of the original data, generated from some underlying model. Then, if somebody proposes a new tax policy and wants to know how it might affect the average American, you could use this synthetic data set to simulate the effects, adjusting the model for, say, income tax going up or down for different groups of people.
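The following minimal Python sketch makes model-based synthetic data concrete. It is not the Urban Institute or IRS method. It assumes a deliberately simple underlying model in which incomes follow a log-normal distribution. The incomes, the 20% and 22% flat tax rates, and the helper names fit_lognormal, synthesize and average_tax are all hypothetical and are used only for illustration. The sketch fits the model to the "confidential" records, draws pseudo records from it, and then simulates a policy change on both data sets.

```python
import math
import random
import statistics


def fit_lognormal(incomes):
    """Estimate log-normal parameters (mu, sigma) from positive incomes."""
    logs = [math.log(x) for x in incomes]
    return statistics.fmean(logs), statistics.stdev(logs)


def synthesize(incomes, n, seed=0):
    """Draw n pseudo records from the fitted model; no original record is copied."""
    mu, sigma = fit_lognormal(incomes)
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


def average_tax(incomes, rate):
    """Hypothetical flat income tax: average amount owed per record."""
    return statistics.fmean(x * rate for x in incomes)


# Hypothetical "confidential" incomes, generated here only for illustration.
rng = random.Random(42)
original = [rng.lognormvariate(10.8, 0.7) for _ in range(5_000)]
synthetic = synthesize(original, n=5_000, seed=1)

# Simulate a hypothetical tax change on both data sets and compare.
for rate in (0.20, 0.22):
    print(f"rate {rate:.0%}: original {average_tax(original, rate):,.0f}, "
          f"synthetic {average_tax(synthetic, rate):,.0f}")
```

The synthetic records come from the fitted model rather than from copies of real returns. Estimates such as the average tax owed should therefore land close to those from the original data, even though no individual's record is released. A model this simple would miss many features of real tax data. On its own, it also does not provide the formal guarantees of differential privacy.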
I have found that communication skills are very important in my job. Most people understand that there is a tension, or balance, between data privacy and data utility, and that this is what I am interested in exploring. But when it comes to something like differential privacy, you might find people saying, “Okay, so you’re using this methodology that uses fake numbers … How does that work?”, and I have to figure out how to explain it to different lay audiences. So if I had one piece of advice for someone interested in a role like mine, it would be: do not neglect the “soft skills”.