At the beginning of the semester I assembled a team of students (Gabriel Burstyn and Songkai Zhao) to explore ways in which AI and machine learning can contribute to the academic study of early rabbinic literature. This is a burgeoning field and there are several others who have been doing interesting work in it, including Joshua Waxman, Ezra Brand, and Shlomo Friedman.
Much of our work to date has been exploratory. Part of that work has been technical: What can this technology actually do, and what are the most efficient ways to do it? The more interesting part of the work, though, is trying to understand how to apply these tools to academic questions. Can we use the technology to answer old questions or frame new ones? I have already taken a stab at this in a couple of articles co-authored with Michael Sperling on social network analysis (here and here). Another team has recently published an article that revisits an old question of Talmudic authorship using quantitative techniques (here).
Our first output is a tool that maps the similarity of word use across the tractates of both the Babylonian and the Palestinian (Jerusalem) Talmuds.
This experimental tool uses machine learning techniques to show word similarities in the Babylonian and Jerusalem Talmuds. It also lets you compare how specific tractates use a word. It works in three steps: it first maps phrases (whose length is determined by the “Window”) into a multidimensional matrix, then computes the distances between those occurrences, and finally sorts the occurrences into clusters based on those distances. The parameters are explained further in the menu. You can hover over the points on the visualization to see more data.
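For readers who want a feel for the first two steps, here is a minimal sketch in Python. It is not the tool's actual code: it assumes a simple bag-of-words count as the vector representation (the tool may well use something richer), and the function names and the toy transliterated token lists are invented for illustration.

```python
import math
from collections import Counter

def context_windows(tokens, target, window):
    """Collect the tokens within `window` positions of each exact occurrence of `target`."""
    contexts = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            contexts.append(left + right)
    return contexts

def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy, transliterated token streams (hypothetical, not real Talmud text).
text_a = "pray mincha afternoon pray mincha evening".split()
text_b = "offer mincha flour oil offer mincha altar".split()

# Assumption: one count vector per text, summing the context words around the target.
vec_a = Counter(w for ctx in context_windows(text_a, "mincha", 1) for w in ctx)
vec_b = Counter(w for ctx in context_windows(text_b, "mincha", 1) for w in ctx)

print(cosine_similarity(vec_a, vec_b))
```

In this toy case the two texts share no context words around the target, so the similarity comes out at zero. Real passages would land somewhere in between.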
A similarity score below 0.7 generally suggests that the same word is being used in different ways. For exploratory purposes, we suggest that you begin by setting both the K-Means and the Hierarchical cluster counts to “2” and then adjust from there.
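To illustrate what sorting occurrences into two clusters means, here is a small sketch of hierarchical (agglomerative) clustering stopped at two groups. It assumes average linkage and cosine distance, which may not match the tool's settings, and the five three-dimensional vectors are made up. K-Means is not shown.

```python
import math

def cosine_distance(u, v):
    """1 minus cosine similarity; treated as 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

def agglomerate(vectors, n_clusters=2):
    """Average-linkage agglomerative clustering; returns lists of vector indices."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest average pairwise distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dists = [cosine_distance(vectors[i], vectors[j])
                         for i in clusters[a] for j in clusters[b]]
                avg = sum(dists) / len(dists)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical context vectors for five occurrences of one word.
occurrences = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 0.9],
    [1.0, 0.0, 0.1],
]

groups = agglomerate(occurrences, n_clusters=2)
print(sorted(sorted(g) for g in groups))  # [[0, 1, 4], [2, 3]]
```

Occurrences 0, 1, and 4 point in roughly the same direction and end up together, while 2 and 3 form the second group. With real data the split is rarely this clean, which is why starting at two clusters and adjusting is a reasonable way to explore.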
Note that this presently works only with exact strings and you must use the Hebrew Unicode alphabet. So, for example, אמר and שנאמר are treated as two separate words.
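A short sketch of what exact-string matching implies in practice. It assumes simple whitespace tokenization, which may differ from how the tool splits the text. The sample string and the `count_exact` helper are hypothetical.

```python
def count_exact(tokens, target):
    """Count tokens that are exactly equal to `target` (no prefix or substring matching)."""
    return sum(1 for tok in tokens if tok == target)

# Hypothetical sample; tokens are separated by whitespace.
sample = "אמר רבי שנאמר אמר"
tokens = sample.split()

print(count_exact(tokens, "אמר"))    # 2
print(count_exact(tokens, "שנאמר"))  # 1

# A substring test would conflate the two forms; exact matching does not.
print("אמר" in "שנאמר")              # True
```

The last line shows why the distinction matters: the shorter form is contained in the longer one, but the tool counts them as separate words.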
To start:
- Enter the word you want to analyze.
- Choose a window size.
- Select the source (Yerushalmi or Bavli) for the chosen tractate(s).
- Select the desired tractate(s) from the dropdown menu.
- Click “Compare”.
Please note that the comparison may take 3 to 15 minutes to process. Comparisons of words that appear more frequently in the text may take longer.
The development of this tool has been supported by Brown University and the Center for Digital Scholarship at the Brown University Library. The texts have been downloaded from Sefaria and further refined by Michael Sperling. The code for this application can be found here: GitHub Repository.
You can access the tool here.
Is it useful? Well, we’re not entirely sure yet, and that is why we welcome your comments. We have run a few proof-of-concept experiments. For example, we looked at the term mincha in tractate Berakhot of the Babylonian Talmud (where we expect it to refer more often to the afternoon prayer) and in Zevachim (where we expect it to refer more often to the sacrifice), and we did indeed find a high degree of dissimilarity, as illustrated in the plot below:

On the other hand, we sometimes have found divergences that are harder to explain.
We encourage you to explore the tool and send feedback! We are currently working on expanding the tool to include stems and lemmas. At the same time, we’re exploring different applications that identify loanwords, locate texts by topic, and map the citation network of academic Jewish studies. Updates as we have them.
Note that this is a different team than the one I blogged about previously (here) that is working on ancient inscriptions.