Statistical and Machine Learning

Machine learning is a branch of artificial intelligence that enables computers to learn and improve from experience without being explicitly programmed. Machine learning and statistics are deeply intertwined fields that share the common goal of extracting insights from data. Statistics provides the mathematical foundation for machine learning, offering frameworks for probability, inference, hypothesis testing, and uncertainty quantification. While traditional statistics emphasizes interpretability, rigorous assumptions, and understanding why models work, machine learning often prioritizes predictive accuracy and scalability to large, complex datasets. Modern practice increasingly blends both perspectives: statisticians adopt computational ML techniques for high-dimensional problems, while machine learning researchers incorporate statistical principles to ensure models are reliable, fair, and theoretically sound. Together, they form the backbone of data science.


Machine learning research in Penn State's Statistics Department emphasizes both theoretical foundations and applications. The department specializes in a wide range of topics, including but not limited to statistical learning theory, kernel machines, deep learning, optimal transport, graphical models, causal inference, learning from dependent data, fairness, and privacy (Bharath Sriperumbudur, Lingzhou Xue, Runze Li, Bing Li, Jia Li, Alexandra Slavkovic, Hyebin Song, Michael Schweinberger, Yubai Yuan). Machine learning applications to many related areas are explored, including bioinformatics, public health, climate and meteorology, social sciences, and computer vision (Qunhua Li, Le Bao, Francesca Chiaramonte, Runze Li, Jia Li, Justin Silverman). The department is also connected to Penn State's broader machine learning ecosystem, including Penn State’s AI Hub, which bridges statistics, computer science, and other disciplines for collaborative research.

Faculty

Professor and Associate Department Head
Professor of Statistics, Dorothy Foehr Huck and J. Lloyd Huck Chair in Statistics for the Life Sciences
Professor of Statistics and Computer Science
Assistant Professor of Information Science & Technology, Statistics, and Medicine
Professor of Statistics and Mathematics

Image: baseball fantasy sports web page

Faculty and Student Research Collaborations

Machine Learning for Fantasy Sports Betting

A student-led team, headed by Penn State Statistics graduate student Isaac Wright, together with undergraduate students Mallet James, Jeffrey Lunger, and Kyle Kroboth, and advised by Associate Teaching Professor of Statistics Dr. Andrew Wiesner, has developed software that applies machine learning (ML) to daily fantasy sports betting. With the fantasy sports industry projected to grow to $48 billion by 2027, it is not hard to imagine operators offering ML tools and software as a value-added service. “As a group who enjoys both playing fantasy sports and machine learning, it felt natural to build a project around our dual interest,” says Isaac. Thus, the idea of building consistently winning fantasy sports lineups for Major League Baseball through a data-driven approach was born.

This student-led project has already yielded concrete results. The group created a website where interested people can follow their results and progress, and they also plan to release an interactive web app where people can experiment with building their own fantasy sports lineups using the group’s algorithms. The project, however, was not without challenges. A few weeks after the project began, Penn State shifted to remote learning, and as the team adapted to the new working environment and life during COVID, the project was put on pause for several months. After restarting, the team scaled the project back from building software for both the MLB and NFL to building only the MLB software.

According to Isaac, the most rewarding aspect of the project was watching everyone come together and give time to the project, driven by their passion for sports and data science. “The true beauty of such projects,” writes Dr. Wiesner, “is that the success falls solely on the self-motivation of those involved. There is no money, no credit, just the willingness to learn and improve their analytical skills.” As for the future, the team has big plans: they have talked about building a company around the proprietary software, or writing a paper and making everything available through the blog. Either way, they are excited to see what they can do.