Data scientists are in high demand. The U.S. Bureau of Labor Statistics in 2018 listed big data–oriented mathematical professions among the country's top 30 fastest growing occupations, and the job site Glassdoor has placed data scientist at the top of its Best Jobs in America list each year since 2016—the same year Penn State launched its intercollege Data Sciences program.
"Companies realize that people who have the skills to extract knowledge from data are very important to their business," says Matthew Beckman, director of the Data Sciences program's option in Statistical Modeling. "And we're shaping our major based on that industry demand."
Data science's applications in industry are myriad, so multidisciplinarity is key—a reality that is reflected in the program's co-ownership by the College of Information Sciences and Technology (IST), the School of Electrical Engineering and Computer Science in the College of Engineering, and the Department of Statistics in the Eberly College of Science. As in any field, specialization also is key, and this fact is reflected in the program's three options: focus tracks in application, computation, and statistical modeling. The result—broadly cross-disciplinary and yet technically granular—mirrors the amalgamation that is real-world data science.
"One of the major strengths of the program, in my view," says Beckman, "is the structure of the collaboration between IST, computer science, and statistics. If the major had its home in only one of the colleges, there would be a risk that it could become just a rebranded computer science major, or a rebranded statistics major, or something else like that—and it wouldn't be long before it wasn't data science anymore. I'm convinced data science exists in the tension between these disciplines."
David Hunter, one of the program's founding faculty, adds, "Data science at Penn State is a broad enterprise, and we distinguish ourselves by being legitimately multidisciplinary." A significant factor, he says, is the colleges' collaborative creation of new, data science–specific courses, integrated early in the curriculum. Previously, "there wasn't a course at Penn State where you could introduce a group of first- or second-year students to topics like learning how to build artificial neural networks and apply machine learning to big datasets," he explains. "That's something that typically would have happened, if at all, in the 300 or 400 level, at the earliest."
Another of the program's critical components is data ethics, the focus of a strategic partnership with Penn State's Rock Ethics Institute. "We wanted ethics to be an important component of this major from the beginning," Hunter says, "but we didn't have a course. All we could do was incorporate ethics as best we could into some of the other courses. We required a course on privacy, and there are serious ethical components to that, but we knew that down the road we wanted to have ethics clearly recognized with a permanent spot in the curriculum." With the Rock Ethics Institute, the College of IST has recently hired two new assistant professors to work specifically on data ethics, and a new course is slated for the curriculum in 2020.
With such an industry-driven program, though, there are significant challenges in developing and maintaining the curriculum. "We're trying our best to keep it as current as possible," Beckman says. "The skills are so dynamic, and the things that were important five years ago are quickly obsolete. We need to train our students so they're ready to adopt the things that emerge as important five years from now. It's not that far into the future, but that's how quickly things change."
To address those challenges, the program's curriculum committee, a group of faculty drawn from disciplines across the colleges, is developing supplemental course modules that can be readily adapted and that provide deeper exposure to selected topics such as bioinformatics and astronomy, both fields that increasingly rely on data science. Penn State's growing data science community also provides crucial extracurricular support through clubs like the student-run Nittany Data Labs (which Beckman co-advises), events such as the American Statistical Association's annual ASA DataFest competition hosted by the statistics department, and initiatives like Teaching and Learning with Technology's Learning Spaces at Penn State, which brings data science to bear on developing next-generation classrooms and teaching methodologies.
Although the Data Sciences program is still young and its graduates are relatively few, there are indicators of real success. A recent graduate of the Statistical Modeling option, Andrea Wan, promptly landed a position as a data analyst at Goldman Sachs. "I think the program prepared me well," she says. "Dr. Beckman is a great resource who introduced me to a lot of opportunities within the major, as well as in the department, that helped me to succeed."
From where he stands, Beckman is confident that the program is on the right track. "There are a lot of programs where students are exposed to some core skills on the technical side, some courses in an area of emphasis, and then it's up to them to close the gap and connect them," he says. "But the Data Sciences program does a really strong job of leading students to make those connections about how a data scientist does problem solving. I think it's just a terrific program."
Matthew Beckman is an assistant research professor of statistics and the director of Penn State's undergraduate programs in statistics.
David Hunter is a professor of statistics and the director of Penn State's online programs in statistics.