This last semester I taught a new course, “Happiness and the Pursuit of the Good Life,” the goal of which was to put positive psychology and religious texts into conversation. The class was overwhelmingly popular. I ended with around 400 students, and had to turn away many others. But was it a success?
Every semester, in every course, I struggle with this question. I have found it useful to break it down into a series of more specific questions:
- How would I subjectively judge the overall quality of the students’ written work?
- How would I subjectively judge the quality of our in-class conversations?
- Was I successful in creating a classroom environment that fostered learning and encouraged students to give their best?
- Did I learn and grow from teaching the class?
- Did students seek out opportunities for engagement outside of the classroom?
- How did students feel about the class, and did they feel that they learned?
That last question is, ironically, perhaps the easiest to answer. Ironically, because it is in many respects the least important for judging the “success” of a class: we are not always in the best position to evaluate our own learning (or the success of our own classes). There are too many confounding cognitive biases at work. Moreover, formal student evaluations are famously biased and problematic for many reasons. Hence, in nearly all of my courses I require a separate final reflection paper in which students frankly assess their own learning and the places where they still have room to grow. This paper plays no role in their grade. It is far from perfect, but I have found it useful, and I think that many students have as well.
I actually read all 400 of the final reflections for this class, but they are hard to get a handle on. Particularly good and critical ones stand out, but how is one to assess the larger bulk of them? This made me think about “distant reading”, and whether digital tools can be profitably applied to a large set of papers like this. That set me off on trying to write and deploy a Python program that does two things: (1) creates a topic model from a set of student papers; and (2) creates a word cloud. I am not sure that doing so for these final reflection essays really told me much, but I am still intrigued by the potential usefulness of the technique, and I want to spend the rest of this post sharing the method. In my next post I’ll return to discuss the Happiness course.
The Python program I wrote is available here, as a Jupyter Notebook on GitHub. It does not require much familiarity with programming, although a basic knowledge (e.g., how to set up and run a Jupyter Notebook) is necessary. We use Canvas at Brown, so the first step, prior to running the program, is downloading the Zip file of the papers, all of which are in docx format, and extracting them into a separate folder. The program converts the individual docx files into txt files and then merges them into a single txt file.
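For the conversion step, something like the following minimal sketch will do (it assumes the python-docx package and that the extracted papers sit in a papers/ folder; the paths and names are illustrative, not the exact ones in my notebook):

```python
# Convert each .docx paper to plain text and merge everything into one file.
# Assumes the extracted papers live in "papers/" and python-docx is installed.
from pathlib import Path
from docx import Document

paper_dir = Path("papers")
merged_path = Path("all_papers.txt")

with merged_path.open("w", encoding="utf-8") as merged:
    for docx_path in sorted(paper_dir.glob("*.docx")):
        text = "\n".join(p.text for p in Document(str(docx_path)).paragraphs)
        docx_path.with_suffix(".txt").write_text(text, encoding="utf-8")  # per-paper txt copy
        merged.write(text + "\n")
```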
The next step is the most annoying and iterative. The text has to be preprocessed, which means stripping out punctuation and other useless text and removing a prepackaged list of “stop words”: words like “and”, “but”, and “or” that are common but not useful. This list, though, has to be expanded to fit the particular papers. A set of papers, for example, might include the name of the course and the professor, both of which would throw off the processing when repeated in every paper. I have also found that the preprocessing throws off a lot of junk words, which need to be identified and added to the stop words. So each set of papers requires its own cycle of looking at the results, adding new stop words, and rerunning until something more useful emerges.
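In code, the preprocessing pass can be as simple as the sketch below (it assumes NLTK’s English stop-word list as the starting point; the extra terms are the kind of course-specific additions I mean, not a definitive list):

```python
# Lower-case the text, strip punctuation and digits, and drop stop words.
# Requires nltk and a one-time nltk.download("stopwords").
import re
from nltk.corpus import stopwords

stop_words = set(stopwords.words("english"))
# Course-specific additions discovered by inspecting earlier runs (examples only)
stop_words.update({"happiness", "course", "class", "professor"})

def preprocess(text):
    text = re.sub(r"[^a-z\s]", " ", text.lower())   # keep letters only
    return [w for w in text.split() if w not in stop_words and len(w) > 2]
```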
I then topic model the set of papers. Topic modeling identifies distinctive clusters of words that tend to appear together. I set the number of topics I want, a parameter determined through trial and error that will change from one set of papers to the next (a code sketch for this step follows the topic list below). For this set of 400 final reflection papers, I chose to create five topics, which looked like this:
[(0, '0.010*"religious" + 0.008*"feel" + 0.008*"learned" + 0.006*"work" + 0.005*"many" + 0.005*"need" + 0.005*"found" + 0.005*"different" + 0.004*"even" + 0.004*"journaling"'),
 (1, '0.009*"learned" + 0.007*"people" + 0.007*"much" + 0.006*"happy" + 0.006*"self" + 0.006*"elephant" + 0.005*"lot" + 0.005*"take" + 0.005*"things" + 0.005*"gratitude"'),
 (2, '0.007*"work" + 0.006*"learned" + 0.006*"readings" + 0.006*"things" + 0.005*"much" + 0.005*"found" + 0.005*"take" + 0.005*"self" + 0.005*"lot" + 0.005*"different"'),
 (3, '0.008*"learned" + 0.007*"feel" + 0.006*"learning" + 0.006*"things" + 0.005*"even" + 0.005*"learn" + 0.005*"found" + 0.005*"much" + 0.005*"lot" + 0.004*"enjoyed"'),
 (4, '0.007*"elephant" + 0.007*"take" + 0.006*"things" + 0.006*"still" + 0.006*"work" + 0.005*"feel" + 0.005*"improve" + 0.004*"xmlmn" + 0.004*"learning" + 0.004*"questions"')]
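The list above is in the format that gensim’s print_topics produces; a minimal sketch of the modeling step looks something like this (it assumes the preprocess function sketched earlier and the per-paper txt files from the conversion step; the parameter values other than the five topics are illustrative):

```python
# Build an LDA topic model over the per-paper token lists and print the topics.
from pathlib import Path
from gensim import corpora
from gensim.models import LdaModel

docs = [preprocess(p.read_text(encoding="utf-8"))
        for p in sorted(Path("papers").glob("*.txt"))]

dictionary = corpora.Dictionary(docs)              # word <-> id mapping
corpus = [dictionary.doc2bow(d) for d in docs]     # bag-of-words per paper

lda = LdaModel(corpus, num_topics=5, id2word=dictionary, passes=10, random_state=42)
for topic in lda.print_topics(num_words=10):
    print(topic)
```

The num_topics value is the trial-and-error parameter mentioned above.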
There is still some junk among the words (“xmlmn”) that I was too lazy to go back and strip out (one finds that junk is often replaced with more junk). It is less clear whether I would have gotten more useful results had I stripped out words like “feel” or “learned”, but in the end I felt they told me something, so I kept them. The program then produced the word cloud shown above.
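The word cloud step is equally short; here is a rough sketch using the wordcloud package (the variable names carry over from the sketches above and are my assumptions here, not the exact code in the notebook):

```python
# Generate and display a word cloud from the preprocessed text.
from wordcloud import WordCloud
import matplotlib.pyplot as plt

all_text = " ".join(" ".join(tokens) for tokens in docs)   # flatten per-paper token lists

wc = WordCloud(width=800, height=400, background_color="white").generate(all_text)
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```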
Maybe the most important takeaway from these data is that students tended to report that they learned, that the course helped them to think, and that it touched them. The journaling component of the course was effective for many of them, and many felt more aware of the power of gratitude. Of limited value, but it’s a start.
For this post, though, the method matters more than the results. If you are able to deploy it and get better results of your own, please drop a comment!