At the Association for Jewish Studies Annual Meeting in 2018 I delivered a kind-of “state of the field” talk with some reflections on applying digital humanities to the study of Jews and Judaism in antiquity (really, Late Antiquity). I am still working on this paper and will eventually post it, but in the interim I thought it would be useful to make the handout available here, in the form of a page that can be updated. Please contact me if you know of other useful sites!
Digital Humanities
Naming Rabbis: A Digital List
A little over five years ago I posted an idea about creating a social network analysis of the rabbis found in classical rabbinic literature. In the interim I have thought a lot about this project but have done very little on it. I still believe it is worth doing, though, and I have finally taken a concrete step forward.
First, a brief justification for the project. According to several scholars, most of the rabbis we find in classical rabbinic literature (e.g., the Babylonian Talmud) worked in small disciple circles rather than larger educational institutions (e.g., the yeshiva), which began to emerge only at the end of Late Antiquity. These small circles formed a loose network. Students, for example, could move back and forth between circles, transmitting knowledge. It is unclear how often or in what contexts individual rabbis would have had contact with their colleagues.
Rabbinic literature, particularly the reports of cases and individual sayings (as opposed to the less reliable stories, or aggadah), mentions interactions between these rabbis. Sometimes these statements note simply that “rabbi x said in the name of rabbi y”; others mention biological or pedagogical relationships between them. The idea is to use social network analysis to visualize these networks. How did knowledge move within and between the nodes (circles of rabbinic disciples) of this network? Who are the rabbinic connectors, leaders, and isolates? How did information move between circles in Palestine and Babylonia? Where do we locate “center” and “periphery”? Does a visualization help us?
The first step to creating such a visualization, not to mention other larger scale projects that involve rabbinic literature, is to create a digital list of all of the rabbis mentioned in the literature. Next, with the help of a collaborator, we will seek to identify the “connecting words” that describe all the different relationships between rabbis and to code these relationships into a software package that will perform the visualizations.
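To make the plan concrete, here is a minimal sketch (in Python, without any specialized package) of what coding such relationships might look like: a directed graph built from “connecting word” formulas, with degree counts as a first pass at spotting connectors. The rabbi names and relations are invented placeholders, not data from the project.

```python
from collections import Counter

# Each tuple records one hypothetical extracted relationship, e.g. from
# the formula "rabbi x said in the name of rabbi y".
relations = [
    ("Rabbi A", "Rabbi B", "said in the name of"),
    ("Rabbi C", "Rabbi B", "said in the name of"),
    ("Rabbi C", "Rabbi D", "student of"),
]

# Count how many relationships each rabbi participates in.
degree = Counter()
for source, target, _relation in relations:
    degree[source] += 1  # transmits or cites
    degree[target] += 1  # is cited or taught

# High-degree rabbis are candidate "connectors"; rabbis named in the
# literature but never linked would be the isolates.
for rabbi, d in degree.most_common():
    print(rabbi, d)
```

In practice a package such as networkx, or a platform like Gephi, would handle the centrality measures and the visualization itself.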
The first step is finally complete. We created a spreadsheet of all the names of rabbis as enumerated in the three-volume Hebrew work by Aaron Hyman, Toldot Tanaʾim ṿe-Amoraʾim : mesudar ʻa. p. a.b. ʻim beʾurim ṿe-hagahot ṿe-girsaʾot shonot (London, 1910). Since this list can be used in other digital projects as well, I am making it available here (Rabbis_Names) in Excel. We broke out each of the components of the name into a different column and indicated where more than one name may refer to the same individual. The spreadsheet should be easy to use for those who know how to work with such things.
And for the rest of you, in case you were wondering, there are somewhere in the range of 5,000 individual rabbis named in the literature surveyed by Hyman.
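For those curious how such a list might be consumed programmatically, here is a small Python sketch. The column names (“title”, “name”, “patronymic”, “same_as”) are hypothetical illustrations; the actual spreadsheet’s layout may differ.

```python
import csv
import io

# A made-up two-row sample mimicking the spreadsheet's structure: name
# components in separate columns, with a "same_as" flag where two
# entries may refer to the same individual.
sample = io.StringIO(
    "id,title,name,patronymic,same_as\n"
    "1,Rabbi,Yose,ben Halafta,\n"
    "2,R.,Yose,,1\n"  # alternate form flagged as the same individual
)

rows = list(csv.DictReader(sample))

# Resolve alternate name forms to one canonical identifier.
canonical = {row["id"]: row["same_as"] or row["id"] for row in rows}
print(canonical)
```

Collapsing alternate forms this way is exactly what a social network analysis would need before counting any relationships.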
Tagging the Talmud
This essay is cross-posted at thetalmud.com where it kicks off a series on digital humanities.
The other week I attended the Classical Philology Goes Digital workshop in Potsdam, Germany. The major goal of the workshop, which was also tied to the Humboldt Chair of Digital Humanities, was to further the work of creating and analyzing open texts of the “classics,” broadly construed. We have been thinking about adding natural language processing (including morphological and syntactic tagging – or, as I learned at the workshop, more accurately “annotation”) to the Inscriptions of Israel/Palestine project. While we learned much and are better positioned to add this functionality, I was most struck by how far the world of “digital classical philology,” focused mainly on texts, has progressed, and it got me thinking about the state of our own field.
Running underneath the workshop was the uneasy knowledge that classical philology, as traditionally understood and practiced, is on a sharp decline. There is increasingly little interest on the part of students, administrators, and funding agencies to support, for example, the creation of textual editions of Greek and Latin texts. The gambit at the heart of this workshop is that going digital provides a new and more exciting approach to classical philology. Instead of focusing on individual texts, philology becomes a collaborative exercise in “big data” and “distant reading.” Electronic editions of each text are prepared with this in mind and then enter this wider corpus to which a variety of digital tools can be applied. At the workshop several of the larger initiatives, such as Perseus, Open Philology, and the Digital Latin Library, were discussed. All of these projects, unlike the Thesaurus Linguae Graecae for example, are open access. Open access is a critical part of this vision as its value is not simply its accessibility but its availability to digital analysis, revision, and reuse.
Most of the presentations dealt with practical issues, especially standards and annotations: What does one do to a text (other than give free access to it) to maximize its scholarly utility? How does one annotate not only morphology, syntax, and named entities (e.g., proper names and places) but also actions and events? Canonical Text Services (CTS), an architecture for precise citation of digital texts, turns out to be particularly important, because it facilitates the linking of lines of text in one manuscript to another (thus allowing for the automated production of synoptic editions), to parts of images, and to various translations. Tools like iAlign (being used in Leipzig) were particularly interesting in this respect. Other presentations focused on creating treebanks, that is, something that looks like the sentence diagrams I had to do in middle school. These can then be analyzed and compared across texts for rhetorical similarities and differences.
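The heart of CTS is its URN scheme, in which a citation bundles a namespace, a work identifier, and a passage reference – the standard example being urn:cts:greekLit:tlg0012.tlg001:1.1 for Iliad 1.1. A minimal Python sketch of parsing such a citation (real CTS resolvers do much more, including version handling and range citations):

```python
def parse_cts_urn(urn: str) -> dict:
    """Split a simple CTS URN into its components.

    Handles only the basic five-part form; real URNs may add a
    version segment to the work and ranges to the passage.
    """
    _urn, _cts, namespace, work, passage = urn.split(":")
    return {"namespace": namespace, "work": work, "passage": passage}

# Iliad, book 1, line 1, in the greekLit namespace.
print(parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001:1.1"))
```

Because two editions or translations of the same work share the work component, the same passage reference can be resolved against each of them – which is what makes automated alignment and synoptic display possible.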
Another area of focus in the classics is intertextuality and the tools that can reveal the citation or reuse of one text by another. One important site that shows its utility is Tesserae, which supports this kind of analysis across several Latin texts. TRACER is also a powerful tool for Latin. While this digital approach to classics and distant reading still has its strong critics (see, e.g., here), there is little question that these and tools like them will yield important, perhaps transformational, scholarship.
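The core intuition behind such reuse-detection tools is that two passages sharing rare word combinations are candidate parallels. Here is a drastically simplified Python sketch – not Tesserae’s actual algorithm, and with invented English stand-ins for the texts:

```python
def shared_bigrams(a: str, b: str) -> set:
    """Return the word pairs (bigrams) two passages have in common."""
    def bigrams(text: str) -> set:
        words = text.lower().split()
        return set(zip(words, words[1:]))
    return bigrams(a) & bigrams(b)

# Invented example passages (a "source" and a later "borrower").
source = "arms and the man I sing"
later = "of arms and the hero he sang"
print(shared_bigrams(source, later))
```

Real systems lemmatize so that inflected forms match, weight matches by word rarity, and score candidate parallels, but the skeleton is the same: find shared features, then rank them.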
And this finally brings me to rabbinic literature. Where do we stand in relation to the application of the digital to the classics, and where are the opportunities? In some surprising respects, we are very much on or ahead of the curve. Large swathes of texts have already been digitized, and some sites (e.g., Mechon Mamre and Sefaria) are committed to an open-access policy. The Bar-Ilan Responsa project contains a vast number of texts that are also tagged for morphology, allowing, for example, searches by lemma. Two sites in particular, the Lieberman Institute and the Friedberg Jewish Manuscript Society, contain digitized manuscripts, transcriptions, and some kind of CTS architecture that links images and different transcriptions, although neither allows for fully automated open access. One model is the Digital Mishnah, which is open-access and has many of the features noted above.
The question that occurred to me in Potsdam is how we deploy and utilize these resources to move us into the age of “big data” and to make possible the kinds of larger-scale, cross-corpus analyses that our colleagues in classics are beginning to do. We have only just begun to think about visualizing the links between documents (as in this example from Sefaria) and, in general, applying the approach of “distant reading” to the rabbinic corpus (see, for example, the dissertation of Itay Marienberg-Milikowsky). What are the opportunities and how do we get there?
To be transparent, I confess that I have dreams. I am intrigued by the idea of creating digital editions of rabbinic texts. I would like to see links between images, transcriptions, and different translations. I would like to be able to map places and events found in this literature and to create a social network analysis of the rabbis. Using treebanks as a new approach to form analysis could be exciting. Further down the road, perhaps literary and formal structures of talmudic sugyot could be created at the push of a button. What kinds of questions would these analyses allow us to answer? What kinds of new questions would we ask?
But open-access digital editions are the first step. These editions would ideally use a CTS architecture and include multiple manuscript transcriptions and images; morphological, lexical, and syntactic annotation; links of words to such places as Ma’agarim and the Comprehensive Aramaic Lexicon; and annotations of named entities. Given what has already been done, should the various resource owners desire to cooperate, such editions might be easily and quickly produced. Of course, any efforts to do this would have to be accompanied by a viable and sustainable financial model.
In the United States and Israel (and I suspect Europe as well), the traditional practice of “rabbinics” in secular universities is, like classical philology, in a precarious position. It is increasingly important for the survival of the field to make our texts relevant to larger academic concerns. Big data and distant reading are not the only possible approaches to making rabbinic literature more relevant, but they are receiving increased attention (and funding) and offer a largely unexplored set of new research possibilities.
This is a curve we can get ahead of. Any takers?
Digital Preservation
Over the past few years, institutional digital repositories and more broad-based digital “commons” have proliferated. Many are found at universities (Brown now has one) and sites such as Zenodo and Humanities Commons. Such platforms serve two purposes. First, they provide a (relatively) stable environment that can preserve digital data. Second, they serve as a digital repository through which scholars can make their work freely accessible to others. While it is this latter potential that has largely attracted scholarly attention and enthusiasm (Sarah Bond recently argued that depositing one’s work in a “commons” is far preferable to putting it on academia.edu, a for-profit entity), I have been thinking lately more about the issue of preservation and specifically how it relates to the modern creation of scholarly and artistic works.
The underlying problem is a simple one: most scholarly and creative work today is done digitally. My own file cabinets are almost empty now and are largely uninteresting; my Dropbox folder, on the other hand, is a teeming, semi-organized mess of notes, manuscripts, drafts, correspondence (although most of that is in my email account), grant applications, and the like. Now, I like to muck around in scholarly archives and to read biographies of scholars that are based on such archives. And while I can’t imagine that anyone will be interested in my particular files, I wonder about the archives of scholars and artists who led genuinely creative and interesting lives. What happens to that part of their literary legacy that remains alive only in digital form?
Lest my concern be taken as mere narcissistic rambling (scholars worried about the unfinished work of other scholars? Who cares?), let me offer a more pointed example. I have had occasion recently to have a series of discussions with archaeologists about their process of publication. Archaeological excavations generate reams of data. Locations of walls and strata changes need to be carefully noted, and objects – especially the often enormous quantity of pottery sherds – each need to be carefully logged and catalogued. Pictures are taken. In the past, these records were kept in field notebooks and other written forms. At the end of an excavation, a final report is produced. The data upon which that report was based were, ideally, filed away in a personal or institutional archive.
As with any scientific data, archaeological data are valuable. Other archaeologists use them to test earlier reconstructions and hypotheses. Later archaeologists frequently want to ask new questions of the older data. Since archaeology is by nature destructive, often all that remains of a site is whatever previous archaeologists chose to note.
Today, nearly all archaeological recording is done electronically. There is no single way that archaeologists do this recording. Some use off-the-shelf proprietary databases; others might create their own data management systems. In any case, when the final report is completed and the excavation is wrapped up, all of that data goes… well, where exactly?
Unlike scientists, many archaeologists and humanists have not thought very hard about the preservation of digital data. Scientists routinely deposit their raw data in institutional repositories and are called upon to articulate their digital data management and preservation plan on many grant applications. The paths open to others are less clear.
Digital platforms offer many opportunities for archaeologists and other scholars who wish to more fully document their work. An archaeological report, for example, can be published online and linked with all of this raw data. A literary work can be linked to previous drafts. There are many fascinating attempts in the digital humanities to make the data and process behind completed works more transparent. Such projects, however, generally involve a significant investment of resources. It may be ideal but remains unrealistic to expect that every scholarly product will be disseminated with links to the raw data and process from which it emerged.
It is for cases like scholarly archives and archaeological data that institutional digital repositories and digital commons provide a simple and inexpensive solution. Upon completion of a project (or a career) the entire archive of data can be converted to an accessible format (e.g., XML); bundled together with more or less organization; and then deposited. Each deposit receives a library (MARC) record, which makes it visible to the entire web (especially if the repository is part of the OCLC). One could imagine cases where someone might want to embargo these archives for a time (easy enough to do), but ordinarily such data would then be easy to locate and freely accessible.
The most involved part of this process would be the conversion of data into a standard, open format such as XML. Most common proprietary programs (e.g., Microsoft Word), however, already have this capability. In other cases, the software that would perform this transformation is relatively easy and inexpensive to produce. The advantage of this conversion is that keeping the data in a simple, open format allows future users to reconvert the data to whatever format they find most useful, rather than forcing them to track down an aging proprietary program that remains compatible with the data. The whole conversion process might take only a few hours and could serve, for example, as the final step of wrapping up an excavation report. Scholars might bundle the XML folder of their work with their email and, voila, an archive is produced.
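To illustrate how mechanical the conversion step can be, here is a Python sketch serializing records from a hypothetical excavation database into plain XML. The field names and sample finds are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample records standing in for rows exported from a
# proprietary excavation database.
records = [
    {"locus": "L103", "stratum": "II", "object": "pottery sherd"},
    {"locus": "L104", "stratum": "III", "object": "coin"},
]

root = ET.Element("archive")
for rec in records:
    find = ET.SubElement(root, "find")
    for field, value in rec.items():
        ET.SubElement(find, field).text = value

# Serialize to a plain-text XML string ready for deposit.
xml_data = ET.tostring(root, encoding="unicode")
print(xml_data)
```

Any future user can open the result in a text editor or reconvert it with a few lines of code, no matter what database produced it.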
At worst, such a digital work would sit unaccessed and unused – hardly a unique situation for scholarly works or archives. Yet the costs, in time, money, and digital storage, are low (and continue to drop). It is time to think about digital preservation as a staple of our “best practices.”
Create, Process, Link: Some Final Thoughts on The Big Ancient Mediterranean Conference
Now back home it will take me a while to process what I’ve learned at The Big Ancient Mediterranean Conference, and even longer to work through my new, vastly expanded, to-do list. Here I want only to sketch out a few thoughts. I don’t think that any of them are particularly original but having the intellectual space and dialogue to focus on them helped me to work through and articulate them for myself.
First, I think that it is heuristically useful to think of digital humanities (DH) projects as being of three types: data creation, processing tools, and aggregators or linkers. The data creators (some of the more impressive representatives at the conference were Nomisma, Open Philology, Corpus Scriptorium, and the emerging and impressive Digital Latin Library) make digital data. The tools, such as those that do social network analysis (e.g., Gephi), natural language processing (xrenner), or plotting, make that data not just accessible but also useful. And the linkers (Trismegistos, Pleiades) link different sorts of data, most often from different sites, for a variety of purposes. I find that thinking about DH projects this way is useful even if some projects fall between the cracks and most do more than one of these things.
While I think that the “linkers” are some of the more exciting DH sites, it all starts with the data. Data creation isn’t sexy. Data are also of limited use if they are created for only one site or purpose. If one is going to go through the laborious process of creating digital data, one may as well try to make them not just accessible but useful. That requires structuring data in a way that existing tools can, with minor modifications, process them; including URIs so that linkers can reuse them; creating APIs to give computing access to them; and encoding them in an open rather than proprietary format so that they will be accessible when software standards change. This also applies, mutatis mutandis, to tools and aggregators. Tools should be designed to apply to a wide swath of structured data, and aggregators function at their best when they can harvest or scrape data from a large number of sites.
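As a concrete (and invented) illustration of what “structured and linkable” means: a record in an open format whose place reference is a URI that an aggregator can resolve. The inscription fields are hypothetical, and the numeric place id in the Pleiades-style URI is a placeholder, not a real identifier.

```python
import json

record = {
    "id": "inscription-0001",  # hypothetical local identifier
    "text": "…",               # transcription would go here
    # A place URI (placeholder id) lets linkers connect this record
    # to other data about the same place.
    "place": "https://pleiades.stoa.org/places/000000",
    "date": {"notBefore": "0300", "notAfter": "0400"},
}

# JSON, like XML, is open and self-describing, so the record stays
# usable when software standards change.
serialized = json.dumps(record, ensure_ascii=False, indent=2)
print(serialized)
```

A tool that understands the structure can process thousands of such records; an aggregator that understands the URI can link them to every other dataset that cites the same place.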
For Inscriptions of Israel/Palestine, the road has been long and slow in large measure because the site was created before today’s standard structures and before the very existence of URIs. Over the two decades of the project’s existence, we have had to transform our data several times. The transformations from SGML to XML and from our own schema to EpiDoc were some of the more traumatic ones. Each required not only the custom development of an automated process but also manual cleaning and refining of the data (some of which we are still doing). Now we must add URIs to allow geographical and chronological linking. Each of these transformations was costly, and I predict – despite assurances that we now have stable standards – that there will be more to come. These projects, even the data collections, are never fully complete or stable. I’m not sure how one prepares for this, but it is an inevitable, and for the scholar frustrating, part of any DH project.
This brings me to a second thought. In the past DH often fell somewhere between the administrative cracks of IT and the library. In recent years the weight has shifted to the library, and it has become increasingly clear to me that that is a good thing. Each of these projects – whether a data collection, a tool, or an aggregator – carries within it new knowledge. Hence, it requires preservation. We preserve in an accessible format almost all printed scholarly materials, no matter how useless or bad. The same principle needs to apply to digital projects. With the creation of digital repositories and the low cost of storage this should not be overly difficult. This includes software: GitHub, now a favorite place to store code and DH data, will eventually disappoint us. Similarly, just as libraries preserve new knowledge, so too do they have methods for cataloging and finding it. There is already a bewildering array of digital projects, and they are not systematically cataloged, whether they are active, on the Wayback Machine, or mothballed. Cataloging and the development of finding aids are desiderata. In the interim, for those who work in classical antiquity, two lists, here and here, are useful, although incomplete and imperfect.
A third issue is the very definition of scholarship. Although I am now part of several overlapping conversations that are wrestling with the nature of DH scholarship, I cannot say that I am much closer to an answer. Data collection, on its face, shouldn’t be “scholarship” – but then isn’t the creation of print critical editions of texts, which largely involves collation, considered scholarship? Digital tools – programs – are at heart intellectual models: just as in a monograph, you input data and you emerge with a synthesis or intellectual product. One of the key differences, in fact, is that scholars writing books are often not as rigorous or explicit about their assumptions and methodologies as is a computer program. Linkers bring together, even if they don’t synthesize, data in new ways that create research questions and drive our conversations – doesn’t theory do this? I am not claiming that these should all count a priori as “scholarship,” but the question points to a critical need for scholars (especially those in positions of power who hire and tenure) to wrestle seriously with the possibility that the meaning of scholarship is shifting in a sharp but recognizable way, and that that is not necessarily bad.
A final thought in an already too-long blog post. The issue of audience needs to be taken seriously. A scholarly DH project might justifiably be directed at just a few hundred kindred scholars, just as journal articles or monographs are. I think for most scholars engaged in DH, though, that seems unsatisfying. We recognize the enormous potential of these projects not just to speak to specialists but also to teach students and engage a wider public in intellectual pursuits in which we are deeply invested. The challenge is realizing that potential. Sites need to be designed to address and engage multiple audiences, and that is no easy feat. It usually involves creating separate views or portals, which is a costly endeavor – a good, accessible interface could run between $15,000 and $40,000. Moreover, we do not yet have good usability studies for such projects, or often the infrastructure or resources to conduct them. Here perhaps we can better draw on the intellectual resources of our academic colleagues in business schools and psychology who study and teach such things.
I owe special thanks to the organizers of this conference, Professors Sarah Bond and Paul Dilley. They created a conference that was of high intellectual value, paced humanely, with a collegial environment that facilitated useful interactions, all the while using remote technologies judiciously and effectively. As one who has organized several conferences, I know that this is no mean accomplishment.
The conference tweets have been storified and can be seen here, here, and here.