One of the challenges in teaching introductory physics for life-science students is creating biologically authentic examples. Moreover, differences in how the disciplines explore and express ideas make interdisciplinary connections difficult. One way to explore how students are making these connections is simply to ask them. What connections are our students seeing among the different introductory courses: introductory biology, calculus for life-science students, general chemistry, Physics 131/132, and organic chemistry?
Polling these students can provide deep insights: what connections are the students making? Which connections would we like them to make that they are missing? However, these courses are large, and parsing the results by hand quickly becomes impractical.
We are working to develop a code framework based on natural language processing to quantify and analyze written responses from undergraduate students taking introductory science courses. The responses used so far come from students in Physics 132 and Chemistry 251 (organic chemistry); eventually, the goal is to expand the analysis to include Biology and Math as well. By linking the responses of each student across the classes they take, we hope to gain insight into the changes that students experience as they progress through the UMass science trajectory.
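As a rough sketch of what this linking could look like in Pandas, a student's responses from two courses can be joined on a shared identifier. The file names and column names below are placeholders for illustration, not our actual data layout.

```python
# A minimal sketch of linking one student's responses across courses.
# The CSV files and columns ("student_id", "response") are hypothetical.
import pandas as pd

# Load hypothetical per-course response files.
physics = pd.read_csv("physics132_responses.csv")   # columns: student_id, response
orgo = pd.read_csv("chem251_responses.csv")         # columns: student_id, response

# Merge on a shared (anonymized) student identifier so each row pairs a
# student's Physics 132 response with their Chemistry 251 response.
linked = physics.merge(orgo, on="student_id", suffixes=("_phys132", "_chem251"))
print(linked.head())
```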
In our first steps, we are using Pandas in Python, which allows easy manipulation of data for analysis, especially for datasets that are relatively small compared to what we as particle physicists are used to (fewer than 10,000 events). Using the Pandas functionality, new functions were created to perform several analyses on the student responses. The first simply counts the occurrences of each word and noun phrase in the total set of responses to a specific question. This may sound straightforward, but the computational method is a vast improvement on past efforts, which relied on students poring over responses and counting occurrences by hand. An example of the output is the word cloud seen below.

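For illustration, the counting step described above might be sketched with Pandas and TextBlob roughly as follows; the sample responses and variable names are invented for the example. The resulting counts could then be fed to a word-cloud library (such as the wordcloud package) to produce an image like the one above.

```python
# A minimal sketch of the word/noun-phrase counting step, assuming the
# responses for one question sit in a pandas Series; names are illustrative.
import pandas as pd
from textblob import TextBlob
from collections import Counter

responses = pd.Series([
    "The electric field points away from positive charge.",
    "Electric potential energy is like chemical bond energy.",
])

# Count individual words across all responses to a specific question.
word_counts = Counter(
    word.lower()
    for text in responses
    for word in TextBlob(text).words
)

# Count multi-word noun phrases, which often carry more meaning than single words.
phrase_counts = Counter(
    phrase
    for text in responses
    for phrase in TextBlob(text).noun_phrases
)

print(word_counts.most_common(5))
print(phrase_counts.most_common(5))
```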
We are now moving on to a second part of the analysis: performing a sentiment analysis using Natural Language Processing (NLP) tools in Python. We chose TextBlob for its ease of use and its built-in functionality, which includes a tool that rates sentences on 'subjectivity', measured between 0.0 and 1.0, and 'polarity', measured between -1.0 and 1.0.
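As a quick illustration of what TextBlob provides (the example response below is invented), its sentiment property returns both scores at once:

```python
# TextBlob's .sentiment returns a named tuple with polarity in [-1.0, 1.0]
# and subjectivity in [0.0, 1.0].
from textblob import TextBlob

response = "I really enjoyed connecting the physics of diffusion to biology."
blob = TextBlob(response)

print(blob.sentiment.polarity)      # negative to positive sentiment, -1.0 to 1.0
print(blob.sentiment.subjectivity)  # objective (0.0) to subjective (1.0)
```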
Contributors
Cooper Wagner – Physics Ph.D. Student
Brokk Toggerson – PI