Tagging the Talmud

This essay is cross-posted at thetalmud.com where it kicks off a series on digital humanities.

The other week I attended the Classical Philology Goes Digital Workshop in Potsdam, Germany. The major goal of the workshop, which was also tied to the Humboldt Chair of Digital Humanities, was to further the work of creating and analyzing open texts of the “classics”, broadly construed. We have been thinking about adding natural language processing (including morphological and syntactic tagging – or, as I learned at the workshop, more accurately “annotation”) to the Inscriptions of Israel/Palestine project. While we learned much and are better positioned to add this functionality, I was most struck by how far the world of “digital classical philology,” focused mainly on texts, has progressed, and it got me thinking about the state of our own field.

Running underneath the workshop was the uneasy knowledge that classical philology, as traditionally understood and practiced, is in sharp decline. There is increasingly little interest on the part of students, administrators, and funding agencies in supporting, for example, the creation of textual editions of Greek and Latin texts. The gambit at the heart of this workshop is that going digital provides a new and more exciting approach to classical philology. Instead of focusing on individual texts, philology becomes a collaborative exercise in “big data” and “distant reading.” Electronic editions of each text are prepared with this in mind and then enter this wider corpus, to which a variety of digital tools can be applied. At the workshop several of the larger initiatives, such as Perseus, Open Philology, and the Digital Latin Library, were discussed. All of these projects, unlike the Thesaurus Linguae Graecae for example, are open access. Open access is a critical part of this vision because the value of an open text lies not simply in its accessibility but in its availability for digital analysis, revision, and reuse.

Most of the presentations dealt with practical issues, especially standards and annotations: What does one do to a text (other than give free access to it) to maximize its scholarly utility? How does one annotate not only morphology, syntax, and named entities (e.g., proper names and places) but also actions and events? Canonical Text Services (CTS), an architecture for precise citation of digital texts, turns out to be particularly important, because it facilitates the linking of lines of text in one manuscript to another (thus allowing for the automated production of synoptic editions), to parts of images, and to various translations. Tools like iAlign (being used in Leipzig) were particularly interesting in this respect. Other presentations focused on creating treebanks, that is, something that looks like the sentence diagrams I had to do in middle school. These can then be analyzed and compared across texts for rhetorical similarities and differences.
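To make the idea of CTS citation a little more concrete, here is a minimal Python sketch of how a CTS URN can be split into its parts and how two witnesses of the same passage might be matched. It assumes the usual URN layout (urn:cts:namespace:textgroup.work.version:passage). The class and function names are my own, and the rabbinic URNs in the usage note are invented for illustration; they do not belong to any existing project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CtsUrn:
    namespace: str          # e.g. "greekLit"
    work: str               # textgroup.work[.version], e.g. "tlg0012.tlg001.perseus-grc1"
    passage: Optional[str]  # e.g. "1.1" or "1.1-1.10"; None when the whole work is cited

    @property
    def is_range(self) -> bool:
        # A hyphen inside the passage component marks a range of passages.
        return self.passage is not None and "-" in self.passage

    @property
    def notional_work(self) -> str:
        # Drop the version so that different witnesses of one work compare equal.
        return ".".join(self.work.split(".")[:2])


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN on colons; raise ValueError if the shape is wrong."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    return CtsUrn(namespace=parts[2], work=parts[3], passage=passage)


def same_passage(a: CtsUrn, b: CtsUrn) -> bool:
    """True when two URNs cite the same passage of the same work,
    regardless of which version (manuscript, edition, translation) is meant."""
    return (
        a.namespace == b.namespace
        and a.notional_work == b.notional_work
        and a.passage == b.passage
    )
```

With invented identifiers such as urn:cts:exampleRabbinic:mishnah.berakhot.msA:1.1 and urn:cts:exampleRabbinic:mishnah.berakhot.msB:1.1, same_passage would report a match, and that kind of match is what a synoptic edition needs in order to line up two manuscripts.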

Another area of focus in the classics is intertextuality and the tools that can reveal citing or reuse of one text by another. One important site that shows its utility is Tesserae, which supports this kind of analysis across several Latin texts. TRACER is also a powerful tool for Latin. While this digital approach to classics and distant reading still has its strong critics (see, e.g., here), there is little question that these and tools like them will yield important, perhaps transformational, scholarship.

And this finally brings me to rabbinic literature. Where do we stand in relation to the application of the digital to the classics, and where are the opportunities? In some surprising respects, we are very much on or ahead of the curve. Large swathes of texts have already been digitized, and some projects (e.g., Mechon Mamre and Sefaria) are committed to an open-access policy. The Bar-Ilan Responsa project contains a vast number of texts that are also tagged for morphology, allowing, for example, searches by lemma. Two sites in particular, the Lieberman Institute and the Friedberg Jewish Manuscript Society, contain digitized manuscripts, transcriptions, and some kind of CTS architecture that links images and different transcriptions, although neither allows for fully automated open access. One model is the Digital Mishnah, which is open-access and has many of the features noted above.

The question that occurred to me in Potsdam is how we deploy and utilize these resources to move us into the age of “big data” and to make possible the kinds of larger-scale, cross-corpus analyses that our colleagues in classics are beginning to do. We have only just begun to think about visualizing the links between documents (as in this example from Sefaria) and, in general, applying the approach of “distant reading” to the rabbinic corpus (see, for example, the dissertation of Itay Marienberg-Milikowsky). What are the opportunities and how do we get there?

To be transparent, I confess that I have dreams. I am intrigued by the idea of creating digital editions of rabbinic texts. I would like to see links between images, transcriptions, and different translations. I would like to be able to map places and events found in this literature and to create a social network analysis of the rabbis. Using treebanks as a new approach to form analysis could be exciting. Further down the road, perhaps literary and formal structures of talmudic sugyot could be created at the push of a button. What kinds of questions would these analyses allow us to answer? What kinds of new questions would we ask?

But open-access digital editions are the first step. These editions would ideally use a CTS architecture and include multiple manuscript transcriptions and images; morphological, lexical, and syntactic annotation; links of words to such places as Ma’agarim and the Comprehensive Aramaic Lexicon; and annotations of named entities. Given what has already been done, should the various resource owners desire to cooperate, such editions might be easily and quickly produced. Of course, any efforts to do this would have to be accompanied by a viable and sustainable financial model.

In the United States and Israel (and I suspect Europe as well), the traditional practice of “rabbinics” in secular universities is, like classical philology, in a precarious position. It is increasingly important for the survival of the field to make our texts relevant to larger academic concerns. Big data and distant reading are not the only possible approaches to making rabbinic literature more relevant, but they are receiving increased attention (and funding) and offer a largely unexplored set of new research possibilities.

This is a curve we can get ahead of. Any takers?
