Digital Humanities

How Do You Teach a Computer to Read a Broken Ancient Inscription?

May 8, 2026 by admin

Extracting structured data from unstructured or semi-structured data has been a long-standing problem in the digital humanities. Humanities data is inherently, and in most respects I would say wonderfully, diverse. Texts, performances, music, art, among many other things – with each example reflecting the weird workings of an individual mind – testify to the amazing creativity of the human spirit.

At the same time, though, they present a thorny problem for those who wish to analyze these artifacts with digital tools. I had a Latin teacher who, when we hit a particularly difficult construction or syntactical problem, used to say something to the effect of, “hit it until it stops moving, then analyze it.” Digital analysis is kind of like that. Digital tools require consistency. Humanistic artifacts resist it. One of the major barriers to the wider adoption of digital tools in the humanities, I suspect, is scholars’ abhorrence of this process of flattening. Most of us were drawn into our fields by the very richness, color, and ineffability of our source material. Digital analysis necessarily bleeds off much of this.

Yet as even the most hardened humanist would grudgingly admit, there is a place for digital analysis. Approaches such as distant reading have demonstrated their usefulness, especially when we approach a corpus of material so large as to be beyond the human capability to read, much less analyze. Some corpora lend themselves better than others to this kind of analysis.

Ancient inscriptions, for example, comprise such a corpus. There are hundreds of thousands of extant Greek and Latin inscriptions from the Greek and Roman worlds alone. These inscriptions, as a whole, have a bit less flourish and variety than literary texts and works of art. Many mark gravesites or honor a donor and follow certain generic formulae. Most are short. Even the longest of inscriptions is dwarfed by a short story. They are also fantastic historical sources, adding an entirely new dimension to what literary texts tell us about the past. Inscriptions, in many respects, comprise an ideal dataset.

Yet it turns out that in many other respects they do not. Many are fragmentary, contain unrecoverable gaps or illegible handwriting, or are full of misspellings. They are sometimes written in odd ways, like winding around a column, up a stone, or scattered about a mosaic. Their original context is not always known, making their interpretation speculative. Despite what appears to be a set of conventions around the scholarly publication of inscriptions, these publications are deeply uneven and inconsistent. Scholars, like everyone else, have different levels of expertise and do not always share the same vocabulary. When an inscription is said, for example, to be from “the Hellenistic era,” what particular dates are being referenced? Is my “etched” your “carved”? Why should I trust that you got this critical but blurred letter or that longer reconstruction right? Moreover, inscriptions are a moving target. New ones are constantly found and published as journal articles in many different places and languages, and old readings are continually revised. Any corpus of inscriptions is provisional.

Scholars were not slow to recognize the allure of moving inscriptions (“epigraphy” is the term that scholars use to denote the study of inscriptions) to a digital platform. The pathbreaking work by the Packard Humanities Institute demonstrated, before the Internet, that being able to do simple searches on large quantities of inscriptions could transform scholarship. There was much more work to be done, and teams quickly formed to advance digitization. And that is when the allure met the challenge.

As scores of teams sought to digitize particular subsets of inscriptions, they soon understood that digitized inscriptions need a shared structure to be truly useful. For example, if I want to search for all inscriptions written in Latin using Greek characters that were produced in Rome between 100 BCE – 100 CE, the data would need to be encoded to allow searches by date, language, script, and location. It was also immediately understood that a shared data structure would facilitate the development of more useful interfaces and analytical tools and would allow users to search across multiple digital collections.
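To make the point concrete, here is a minimal sketch of what such a search looks like once the data has a shared structure. The record fields, the language and script codes, and the three sample records are all hypothetical and chosen only for illustration; real epigraphic databases carry far richer metadata.

```python
from dataclasses import dataclass


@dataclass
class Inscription:
    """Hypothetical minimal record, for illustration only."""
    id: str
    language: str    # language of the text, e.g. "la" for Latin
    script: str      # script it is written in, e.g. "Grek" for Greek letters
    place: str
    not_before: int  # earliest possible year; negative numbers are BCE
    not_after: int   # latest possible year


def search(records, language, script, place, start, end):
    """Return records that match and whose date range overlaps [start, end]."""
    return [
        r for r in records
        if r.language == language
        and r.script == script
        and r.place == place
        and r.not_before <= end
        and r.not_after >= start
    ]


# Invented sample data
records = [
    Inscription("demo-1", "la", "Grek", "Rome", -50, 50),
    Inscription("demo-2", "la", "Latn", "Rome", 1, 100),
    Inscription("demo-3", "grc", "Grek", "Athens", -100, -1),
]

hits = search(records, language="la", script="Grek", place="Rome", start=-100, end=100)
print([r.id for r in hits])
```

Only the first record is Latin written in Greek characters from Rome within the date window. The search works only because every record states its date, language, script, and place in the same way, which is exactly what the digitization teams had to agree on.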

It was not only the information about the inscription – known as the metadata – that needed to be standardized. Already decades ago, scholars banded together in a massive team effort to create a shared digital language (or tags) to describe the physical features of texts. Known as the Text Encoding Initiative (TEI), this ongoing project creates and maintains standardized descriptive tags that allow texts to be rendered and exchanged across multiple digital platforms. That is, it makes them interoperable.

Different texts have different kinds of features, and those who deal with ancient papyri and ostraca (texts written on shards of pottery), particularly in library settings, began to adapt the TEI schema for their own needs. Thus when epigraphers, who generally work outside the better financed institutions such as libraries, began looking to standardize their own work they turned to the papyrologists.

A seventh-century ostracon from Egypt now at the Metropolitan Museum of Art (record here)

The upshot of this effort was the development of a customized form of TEI that is now known as EpiDoc.

Digitizing an inscription into EpiDoc is not a trivial task. Information about an inscription, such as its context, size, find location, current location, language, place, etc., all must be entered into special fields in some database system (or directly given strict, uniform tags). This takes time and expertise. Even more laborious, though, is the digitization of the inscription’s text, even at its most simple level. Epigraphers typically (and ideally) transcribe an inscription in two formats. The first is called a diplomatic transcription, and seeks to record the inscription as it appears, with all the gaps, misspellings, etc. The second is sometimes called a normalized transcription. Both employ a specialized set of typographical markers.

Let me illustrate with an example. The inscription below is located in a church in Rome and dates to the middle of the fourth century CE. It has been digitally republished in the Epigraphic Database Bari (see the record here) and is shared under a Creative Commons license.

The editors do not supply a diplomatic transcription for this Latin inscription, but it is easy enough to see the damage around the inscription and its lack of spacing and punctuation. They do supply a normalized transcription:

[— cum sa]ṇctis aeterṇ[am]
[domum] Ṃarcianus e[t —]
[—]ne compare[s —]
[— s]ibi fecerun̂t [—]

The typography follows what epigraphers call the Leiden convention (or Leiden+ conventions). The brackets with the dashes indicate gaps. The dot under the letter M means that the letter is unclear. Words in the brackets are reconstructions. This appears to be a funerary inscription for the Christians Marcianus and his wife, who “made this for themselves,” who now wish to dwell with the saints.

Now, though, look at an EpiDoc rendering:

<div type="edition" subtype="transcription"> <p> <gap reason="lost" extent="unknown" unit="character"/> <supplied reason="lost">cum sa</supplied><unclear>ṇ</unclear>ctis aeter<unclear>ṇ</unclear><supplied reason="lost">am</supplied> <lb/>
<supplied reason="lost">domum</supplied> <unclear>Ṃ</unclear>arcianus e<supplied reason="lost">t</supplied> <gap reason="lost" extent="unknown" unit="character"/>
<lb/> <gap reason="lost" extent="unknown" unit="character"/>ne compare<supplied reason="lost">s</supplied> <gap reason="lost" extent="unknown" unit="character"/>
<lb/> <gap reason="lost" extent="unknown" unit="character"/> <supplied reason="lost">s</supplied>ibi fecerun<unclear>̂</unclear>t <gap reason="lost" extent="unknown" unit="character"/> </p> </div>

This is a relatively uncomplicated inscription that has gaps, supplied texts, unclear letters, and line breaks. By encoding it this way, any platform attuned to TEI should be able to render it correctly in whatever typographical conventions it uses. Other programs that analyze such texts also can better process it. For example, a program creating word lists might include the word “domum” along with a notation that the word has been supplied or is doubtful.
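As a rough illustration of the word-list idea, the sketch below walks a small EpiDoc fragment with Python's standard xml.etree.ElementTree module and records, for each word, whether its letters are clear, unclear, or supplied. The fragment is a simplified stand-in modeled on the second line of the inscription above (the underdot is dropped because the tag already carries that information), and it assumes un-namespaced tags; a full TEI file would need namespace handling.

```python
import xml.etree.ElementTree as ET

# Simplified, un-namespaced fragment for illustration only
SAMPLE = (
    '<p><supplied reason="lost">domum</supplied> '
    '<unclear>M</unclear>arcianus e<supplied reason="lost">t</supplied></p>'
)


def flagged_chars(elem, status="clear"):
    """Yield (character, status) pairs in document order."""
    if elem.tag in ("supplied", "unclear"):
        status = elem.tag
    for ch in elem.text or "":
        yield ch, status
    for child in elem:
        yield from flagged_chars(child, status)
        # Text after a child's closing tag belongs to the parent element.
        for ch in child.tail or "":
            yield ch, status


def word_list(xml_text):
    """Return (word, statuses) pairs, splitting words on whitespace."""
    words, current, flags = [], [], set()
    for ch, status in flagged_chars(ET.fromstring(xml_text)):
        if ch.isspace():
            if current:
                words.append(("".join(current), sorted(flags)))
            current, flags = [], set()
        else:
            current.append(ch)
            flags.add(status)
    if current:
        words.append(("".join(current), sorted(flags)))
    return words


for word, statuses in word_list(SAMPLE):
    print(word, statuses)
```

A word whose only status is "supplied" (here, domum) is entirely the editor's reconstruction, while a mix such as "clear" plus "supplied" marks a partly preserved word. An index built this way can let readers decide how much weight a given attestation deserves.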

Other inscriptions are more complicated. I run a project that digitizes the inscriptions of the area of Israel/Palestine that date from the sixth century BCE to the seventh century CE (Inscriptions of Israel/Palestine, or IIP). Below is a fragment of a funerary inscription for a man named Samuel and his family inscribed on a lintel that was found completely out of context in a backyard near the city of Sepphoris.

The full record is here. The EpiDoc rendering is:

<p>Σαμουῆλος υἱὼς <gap unit="character" extent="unknown" reason="lost"/> 
<lb/>γαμετὴ αὐτοῦ <orig>Θ</orig><gap unit="character" extent="unknown" reason="lost"/>
<lb/><expan><abbr>κα</abbr><ex>ὶ</ex></expan> σ<supplied reason="omitted">ύ</supplied>νγων<supplied reason="lost">οι</supplied> <gap unit="character" extent="unknown" reason="lost"/></p>

In this case the editor had to make several decisions about how to encode this inscription. Note that it is in Greek and damaged at the end of the lines. It also has abbreviations and missing letters, especially in the last line.

The complexity of these decisions and the actual manual encoding of inscriptions, particularly multilingual ones (and even more so when they are in right-to-left languages like Hebrew and Aramaic), make the process costly. The scale of the problem is enormous. Even short inscriptions can take trained specialists an hour to encode correctly, and there are tens of thousands still unpublished or undigitized. This is precisely the kind of repetitive but expertise-intensive work that recent AI systems may be able to assist with.

Over the past year, I have been working with a team at the Center for Digital Scholarship at the Brown University Library on exploring the use of AI, and specifically LLMs, to do this work more efficiently. I have previously reported on some early probes. We are now wrapping up the first phase of our work, and I recently had an opportunity to present it at the Tenth Epigraphy.info Workshop. At that meeting I was thrilled to discover that there are several scholarly teams working on applying various aspects of AI to inscriptions. I will discuss our paper as well as some of these initiatives in my next post.

What is Jewish Studies?

December 9, 2025 by admin

What does Jewish studies “look like”? This:

Let me explain.

Last year I set out to try to understand what we might mean by the academic field of “Jewish studies.” I had done some previous work on networks and citation analysis and so figured that this was as good a place as any to start. This was a project that I thought would be easier than it turned out to be, but now, just about a year later, I have some preliminary insights.

I started with the text of about 20,000 research articles over forty years found in about thirty Jewish studies journals. The text, supplied by JSTOR, was in JSON files. Working with a student, Gabriel Burstyn, we used an LLM running on a local machine to extract the citations, allowing us to map hundreds of thousands of citations. We cleaned the data and then began to analyze it.

The goal was to see how these citations clustered and the relationships between these clusters. We thus started by running an algorithm that identifies denser cluster networks and organizes them into communities. When I looked at the top authors in each of these machine-generated communities their themes became quite clear. Different analyses helped us to identify the relative importance of each of the major communities (as measured through the citation network) and their structural importance. In network parlance, that would be the “pagerank” and “betweenness centrality” scores. The latter is particularly interesting to me, as it helps us identify the critical domains through which knowledge flows.

In the diagram above, the size of each node is adjusted by its betweenness centrality score. The larger the node, the higher the score and the more important it is to the network as a whole. We can see that there are some areas within Jewish studies that are relatively self-contained; their citation patterns look inward, even if they have high pagerank scores. Some of these we might expect. New Testament and Early Christianity, for example, draws from Jewish studies scholars but their own work is relatively marginal to the Jewish studies network as a whole.
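For readers unfamiliar with betweenness centrality, here is a small, self-contained sketch of how the score can be computed on a toy network. It uses Brandes' algorithm for an undirected, unweighted graph. The node names and edges are invented, and this is not the code or the data behind the diagram, just an illustration of why a "bridge" node scores high.

```python
from collections import deque


def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes' algorithm).

    `graph` maps each node to the set of its neighbours (undirected, unweighted).
    """
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair of endpoints was counted twice.
    return {v: c / 2 for v, c in score.items()}


# Invented toy network: two dense clusters joined by a single node
edges = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
    ("a3", "bridge"), ("bridge", "b1"),
]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

scores = betweenness(graph)
for node in sorted(scores, key=scores.get, reverse=True)[:3]:
    print(node, scores[node])
```

In this toy graph every shortest path between the two clusters runs through "bridge", so it receives the highest score even though it has only two connections. Nodes deep inside a cluster score zero, however well connected they are locally.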

A different diagram allows us to better see how domains relate to each other:

The numbers on the axes are insignificant. Here we look simply at the clusterings. The Western Europe area is perhaps the most disconnected of all these major areas. The clustering in the bottom left shows how these areas are all still so rooted in the Medieval period.

A few broad observations:

Jewish studies isn’t really a single field. It can be seen as a network or constellation of semi-independent scholarly worlds.
History is the strongest organizing principle (perhaps reflecting also the organization of the journals that supplied the data).
There is a cluster of circulation hubs (“engines”) of the field, but several areas operate more like islands.
There are small hidden bridges that connect some of the relatively isolated areas.
There are dense areas and quiet zones that suggest that the ideas, methods, and scholars travel unevenly through the field.

Does every field in the humanities/social sciences look like this? That would require a bit more research.

We can also, though, go much deeper into this network. We ran a community detection algorithm on the “Rabbinics” cluster, for example, to break the many citations down into further subcommunities, and then drilled down deeper into one of those subcommunities. In future posts I’ll provide some updates on those experiments.

I’ll be presenting on this project at the Association for Jewish Studies Annual Conference on Sunday, December 14, 2025.

AI and Ancient Inscriptions

February 12, 2025 by admin

Strange image generated by DALL-E for this post

First published on my Substack.

In an earlier post, I discussed a very simple and quick experiment that I did on a photograph of an ancient Jewish inscription written in Greek. The results of that experiment were not perfect – epigraphers need not fear for their jobs (yet) – but they were also not bad. An off-the-shelf, untuned ChatGPT model “knew” how to decipher the inscription; write out an edited version of it in Greek (ancient inscriptions usually do not contain punctuation or spaces and lack both distinctions between upper and lower cases and important accent and other marks); add the conventional markings used by epigraphers to denote things like gaps in the text and unclear letters; translate it into English; and provide a short commentary. What might it do with a bit of training?

I am working with a team from the Center for Digital Scholarship at Brown University to probe this question. The immediate impetus for our work was a practical problem I was confronting. Our funding for Inscriptions of Israel/Palestine – an accessible and robustly searchable database of ancient inscriptions from the region of modern-day Israel/Palestine – has paused. We have always used human labor (mostly students) to enter inscriptions into the database. The data entry was time (and resource) expensive. Encoders would have to comb through publications of inscriptions to extract data (e.g., creation date) that is then entered into a complex XML document. Most time-consuming was converting the edited version of the inscription, which is often a complex text containing many editorial markings, into our own encoding, which is based on a standard known as EpiDoc. Even with all of our voluminous documentation it takes significant time to train an encoder. A well-trained encoder might easily take 30 minutes to encode an inscription. We then run a series of automated and manual tasks on the inscription to enter the words into our indices. Without funding it was hard for me to see a practical way to continue to add to the database.

Could, though, AI do it? Could I feed it a pdf of a published inscription and have it generate a high-quality XML file? If not all of it, what about part?

We started with one central part of the workflow, the conversion of printed, edited inscriptions into EpiDoc. On the surface, this might seem like a trivial thing. Replacing the Leiden convention typographical marks (e.g., using square brackets to denote text supplied by an editor) with the equivalent EpiDoc tags seems like a simple search and replace operation. And, perhaps 60%-70% of the time, it is. The other times, though, there can be a significant degree of decision-making. We decided to start with the inscriptions in Greek. We now feel that we have made enough progress on this problem that it is worth publicly sharing, although we have not rigorously tested it for errors. The work below was largely conducted by Daniel Kang and Justin Uhr.
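To show what the "simple search and replace" version looks like, and where it runs out, here is a naive sketch that handles three common Leiden marks with regular expressions. The rules are my own simplification for illustration, not our encoding guidelines and not the approach we ended up using.

```python
import re

GAP = '<gap reason="lost" extent="unknown" unit="character"/>'


def leiden_to_epidoc(line):
    """Naive conversion of three common Leiden marks on a single line."""
    # [—] or [---]: lost text of unknown length
    line = re.sub(r"\[\s*(?:—|-{2,3})\s*\]", GAP, line)
    # [abc]: text lost on the stone and supplied by the editor
    line = re.sub(r"\[([^\[\]]+)\]", r'<supplied reason="lost">\1</supplied>', line)
    # ab(cd): abbreviation with the expansion at the end of the word
    line = re.sub(
        r"([^\s()\[\]<>]+)\(([^()]+)\)",
        r"<expan><abbr>\1</abbr><ex>\2</ex></expan>",
        line,
    )
    return line


print(leiden_to_epidoc("εἷς θ[εὸ]ς"))
print(leiden_to_epidoc("κ(αὶ) Ἰουλιανῷ"))
print(leiden_to_epidoc("[—]"))
```

These rules cope with the easy cases. They already go wrong on an abbreviation that continues after the expansion, such as Κ(ύρι)ε, where the final letter belongs inside the same expansion, and they know nothing about words hyphenated across lines or brackets that span a line break. Those are the cases that call for judgment.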

We are happy with two solutions to this problem, both of which use Claude.ai.

The first approach involved a long set of instructions, with seven examples:

You are an expert system designed to translate epigraphic and papyrological inscriptions from Leiden Conventions format into XML that conforms to the EpiDoc schema. Your task is to accurately convert the given text, preserving all meaningful information while translating the special symbols into appropriate XML tags.
Here is the text in Leiden Conventions format that you need to translate:
[FILL IN YOUR TEXT HERE]
When translating the text, please make sure to meticulously follow the guideline attached as “2h. Leiden to EpiDoc Cheatsheet” for translating specific Leiden Convention symbols to EpiDoc-compliant XML
Please also follow these steps:
1. Read through the entire text to familiarize yourself with its content and structure.
2. Identify all Leiden Convention symbols and their corresponding XML translations according to the EpiDoc schema.
3. Convert each symbol to its appropriate XML tag, ensuring proper nesting when multiple features apply to the same text.
4. Preserve all alphabetic characters and spaces as they appear in the original text.
5. Review your translation to ensure all symbols have been accurately converted and tags are properly nested. Before providing your final translation, wrap your thought process in tags.
Include these in your response:
1. List all Leiden Convention symbols present in the given text.
2. Map each identified symbol to its corresponding EpiDoc XML tag using the guideline “2h. Leiden to EpiDoc Cheatsheet”
3. Consider and explain how you will handle nested tags and their proper order.
4. Outline any potential challenges in the translation and how you plan to address them.
This detailed breakdown will help ensure a thorough and accurate translation. After your analysis, provide the final XML translation wrapped in tags. Ensure that your output strictly adheres to the EpiDoc schema and conventions.
Examples are given below———————————————————
Example Leiden Input 1
Εἶς θεὸ[ς μόνο-]
ς ὁ βοηθ[ῶν]
Γαδιωναν
κ(αὶ) Ἰουλιανῷ
κ(αὶ) πᾶσιν τοῖς ἀξί-
οις
Example EpiDoc Output 1
<div type="edition" subtype="transcription" ana="b1">
<p xml:lang="grc">
<lb/>Εἶς θεὸ<supplied reason="lost">ς</supplied> <supplied reason="lost">μόνο</supplied><lb break="no"/>ς ὁ
βοηθ<supplied reason="lost">ῶν</supplied>
<lb/>Γαδιωναν <lb/><expan><abbr>κ</abbr><ex>αὶ</ex></expan> Ἰουλιανῷ
<lb/><expan><abbr>κ</abbr><ex>αὶ</ex></expan> πᾶσιν τοῖς ἀξ<lb break="no"/>ίοις <lb/><foreign xml:lang="heb">פעלהבדה</foreign></p>
</div>
Example Leiden Input 2
Κ(ύρι)ε μνήσ(θητι) τῶν πρ-
[οσ]νε(γ)καντ(ων) καὶ
[—]
Example EpiDoc Output 2
<p><expan><abbr>Κ</abbr><ex>ύρι</ex><abbr>ε</abbr></expan> <expan><abbr>μνήσ</abbr><ex>θητι</ex></expan> τῶν <expan><abbr>πρ<lb break="no"/><supplied reason="lost">οσ</supplied>νε</abbr><ex>γ</ex><abbr>καντ</abbr><ex>ων</ex></expan> καὶ <lb/><gap reason="lost" extent="unknown" unit="character"/></p>
Example Leiden Input 3
εἷς θεὸς ὁ νικῶν τὰ κα[κὰ]
Ἰάω θ[εὸς]
εἷς θ[εὸ]ς
Example EpiDoc Output 3
<p><lb/>εἷς θεὸς ὁ νικῶν τὰ κα<supplied reason="lost">κὰ</supplied> <lb/> Ἰάω θ<supplied reason="lost">εὸς</supplied> <lb/>εἷς θ<supplied reason="lost">εὸ</supplied>ς </p>
Example Leiden Input 4
Κ(ύρι)ε Ἰ(ησο)ῦ Χ(ριστ)ὲ πρόσδεξε τὴν
καρποφορίαν τῶν δούλω(ν)
σοῦ Ἰωάννου τοῦ πρ(εσβυτέρ)ου καὶ
Ἀββοσόβου ὅτι ἐξ ἰδίων κό-
πων ἤγιραν τὸν οἴκον τοῦτον.
Example EpiDoc Output 4
<p><lb/><expan><abbr>Κ</abbr><ex>ύρι</ex><abbr>ε</abbr></expan> <expan><abbr>Ἰ</abbr><ex>ησο</ex><abbr>ῦ</abbr></expan> <expan><abbr>Χ</abbr><ex>ριστ</ex><abbr>ὲ</abbr></expan> πρόσδεξε τὴν <lb/>καρποφορίαν τῶν <expan><abbr>δούλω</abbr><ex>ν</ex></expan> <lb/>σοῦ Ἰωάννου τοῦ <expan><abbr>πρ</abbr><ex>εσβυτέρ</ex><abbr>ου</abbr></expan> καὶ <lb/>Ἀββοσόβου ὅτι ἐξ ἰδίων κό<lb break="no"/>πων ἤγιραν τὸν οἴκον τοῦτον.</p>
Example Leiden Input 5
Ἐπὶ τοῦ <δ>ὁσιωτάτου Γεωργίου δια-
κόνου καὶ Ϲαμουήλου λαμπροτ(άτου)
καὶ Ἀββεος Ζαχαρίου ἐγένετο τὸ π(ᾶν)
ἔργον τ<ῆ?>ς ψιφώσεως ταύτης
ἐν μ(ηνὶ) Ἱουν[ίῳ ἔτους] [Ἐλευθερο]πόλε(ως) βφʹ
Example EpiDoc Output 5
<p>Ἐπὶ τοῦ <supplied reason="omitted">δ</supplied>ὁσιωτάτου Γεωργίου δια<lb break="no"/>κόνου καὶ Ϲαμουήλου <expan><abbr>λαμπροτ</abbr><ex>άτου</ex></expan><lb/>καὶ Ἀββεος Ζαχαρίου ἐγένετο τὸ <expan><abbr>π</abbr><ex>ᾶν</ex></expan><lb/>ἔργον τ<supplied reason="omitted" cert="low">ῆ</supplied>ς ψιφώσεως ταύτης<lb/>ἐν <expan><abbr>μ</abbr><ex>ηνὶ</ex></expan> Ἱουν<supplied reason="lost">ίῳ</supplied> <supplied reason="lost">ἔτους</supplied> <supplied reason="lost"><abbr><expan>Ἐλευθερο</expan></abbr></supplied><expan><abbr>πόλε</abbr><ex>ως</ex></expan> <num value="502">βφʹ</num></p>
Example Leiden Input 6
[Ἐπὶ Σι]λουανοῦ θεοφιλ(εστάτου) διακό(νου) κ(αὶ) ἡγουμέ(νου) ἡ παροῦσα
[ψήφωσ]ις ἐγένετο κ(αὶ) κόγχη κ(αὶ) ἡ προσθήκη τοῦ ναοῦ μ<ή>κος
[πήχεις … ὕ]ψους π(ή)χ(εις) ς’ μνήσθητ[ί μου] Κ(ύρι)ε ἐν [τῇ β]ασιλ<ε>ίᾳ σου.
Example EpiDoc Output 6
<p><lb/><supplied reason="lost">Ἐπὶ</supplied> <supplied reason="lost">Σι</supplied>λουανοῦ <expan><abbr>θεοφιλ</abbr><ex>εστάτου</ex></expan> <expan><abbr>διακό</abbr><ex>νου</ex></expan> <expan><abbr>κ</abbr><ex>αὶ</ex></expan> <expan><abbr>ἡγουμέ</abbr><ex>νου</ex></expan> ἡ παροῦσα <lb/><supplied reason="lost">ψήφωσ</supplied>ις ἐγένετο <expan><abbr>κ</abbr><ex>αὶ</ex></expan> κόγχη <expan><abbr>κ</abbr><ex>αὶ</ex></expan> ἡ προσθήκη τοῦ ναοῦ μ<supplied reason="omitted">ή</supplied>κος <lb/><supplied reason="lost">πήχεις</supplied> <gap reason="lost" extent="unknown" unit="character"/> <supplied reason="lost">ὕ</supplied>ψους <expan><abbr>π</abbr><ex>ή</ex><abbr>χ</abbr><ex>εις</ex></expan> <num value="6">ς'</num> μνήσθητ<supplied reason="lost">ί</supplied> <supplied reason="lost">μου</supplied> <expan><abbr>Κ</abbr><ex>ύρι</ex><abbr>ε</abbr></expan> ἐν <supplied reason="lost">τῇ</supplied> <supplied reason="lost">β</supplied>ασιλ<supplied reason="omitted">ε</supplied>ίᾳ σου.</p>
Example Leiden Input 7
(+) Ἀνεπάη μακά-
ριος Ζαχαρίας
Ἐρασίνου ἐν
μηνὶ Πανέμου
δεκάτῃ ἰνδ(ικτιῶνος) ιδʹ ἡ-
μέρᾳ κυριακῇ ὧραν
τρίτῃ τῆς νυκτὸς κα-
τετέθη δὲ ἐνταῦθα
τῇ τρίτῃ τοῦ σάμ-
βατος ὥραν ὀγδόην
Πανέμῷ δοδεκα-
τῃ ἰν(δικτιῶνος) ιδʹ ἔτους κα-
τὰ Ἐλού(σην) ΥΟςʹ Κ(ύρι)ε ἀ-
νάπαυσον τὴν ψυ-
χὴν αὐτοῦ μετὰ τῶν
ἁγίων σου. Ἀμήν
Example EpiDoc Output 7
<g ref="cross">+</g> Ἀνεπάη μακά<lb break="no"/>ριος Ζαχαρίας
<lb/>Ἐρασίνου ἐν
<lb/>μηνὶ Πανέμου
<lb/>δεκάτῃ <expan><abbr>ἰνδ</abbr><ex>ικτιῶνος</ex></expan> <num value="14">ιδʹ</num> ἡ<lb break="no"/>μέρᾳ κυριακῇ ὧραν
<lb/>τρίτῃ τῆς νυκτὸς κα<lb break="no"/>τετέθη δὲ ἐνταῦθα
<lb/>τῇ τρίτῃ τοῦ σάμβα<lb break="no"/>τος ὥραν ὀγδόην
<lb/>Πανέμῷ δοδεκα<lb break="no"/>τῃ <expan><abbr>ἰνδ</abbr><ex>ικτιῶνος</ex></expan> <num value="14">ιδʹ</num> ἔτους κα<lb break="no"/>τὰ <expan><abbr>Ἐλού</abbr><ex>σην</ex></expan> <num value="476">ΥΟςʹ</num> <expan><abbr>Κ</abbr><ex>ύρι</ex><abbr>ε</abbr></expan> ἀ<lb break="no"/>νάπαυσον τὴν ψυ<lb break="no"/>χὴν αὐτοῦ μετὰ τῶν
<lb/>ἁγίων σου. Ἀμήν
We also tried an automated approach using an API call to Claude. The advantage of this approach is that it allows for batch processing of multiple inscriptions. The full code, with examples, can be found at our GitHub site: https://github.com/Brown-University-Library/ai-experiments/tree/main/01_leiden-translator. The code passes along detailed instructions and a few complex examples. We were surprised at how few examples the model needs to do a good job.
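The general shape of such a batch run can be sketched as follows. This is not the code in our repository. The prompt wording, the function names, and the `send` parameter are all placeholders of my own: `send` stands for whatever function submits a prompt to the model and returns its reply, so the sketch makes no assumptions about a particular client library.

```python
PROMPT_TEMPLATE = (
    "Convert the following inscription from Leiden conventions to EpiDoc XML.\n"
    "{examples}\n"
    "Input:\n{text}\n"
    "Output:\n"
)


def build_prompt(text, examples):
    """Combine few-shot (Leiden, EpiDoc) pairs with the inscription to convert."""
    shots = "\n".join(
        f"Example Leiden Input {i}\n{leiden}\nExample EpiDoc Output {i}\n{epidoc}"
        for i, (leiden, epidoc) in enumerate(examples, start=1)
    )
    return PROMPT_TEMPLATE.format(examples=shots, text=text)


def translate_batch(inscriptions, examples, send):
    """Convert every inscription in a dict of {id: Leiden text}.

    `send` is any callable that takes a prompt string and returns the model's
    reply as a string (hypothetical; supply your own API wrapper).
    """
    results = {}
    for ident, text in inscriptions.items():
        try:
            results[ident] = send(build_prompt(text, examples))
        except Exception as err:  # one failed request should not stop the batch
            results[ident] = f"ERROR: {err}"
    return results


# Stand-in for a real model call, so the sketch runs without network access
def fake_send(prompt):
    return "<p>(model output would appear here)</p>"


examples = [("θ[εὸ]ς", 'θ<supplied reason="lost">εὸ</supplied>ς')]
batch = {"demo-1": "εἷς θ[εὸς]", "demo-2": "κ(αὶ) πᾶσιν"}
print(translate_batch(batch, examples, fake_send))
```

Keeping the model call behind a single function makes it easy to swap providers, which matters when comparing cost and quality across models.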

The results of both approaches were excellent. The advantage of the first approach is that it is free. The second approach allows processing in bulk. It has a cost, but one in the range of ten cents an inscription (and it would be much cheaper on DeepSeek, although we haven’t tested whether it works as well).

We are now working on applying these same approaches to Hebrew/Aramaic inscriptions and working straight from pdfs. I’ll update as we get results.

Can ChatGPT Read Ancient Texts?

November 26, 2024 by admin

This was originally published on my Substack, available here.

For many years, I have been directing a project to make available the inscriptions that date from around 520 BCE – 620 CE (the Persian, Second Temple, and Rabbinic periods, using a Jewish chronology) from the region of Israel/Palestine. Inscriptions of Israel/Palestine now contains over 5,500 inscriptions and is growing, and allows users to make complex searches; plot inscriptions by density on maps; and more recently (I am quite proud of this, although it is mainly the work of our dedicated team) access indices and wordlists. The primary purpose of the database is to help those with an interest in history in this region and time (and I fail to understand how anyone couldn’t have such an interest) access this difficult material. More recently, however, we have found that it is a useful database for testing some new techniques in digital humanities, machine learning, and AI. With a collaborator (Daiki Tagami) I have already published the results of one such experiment.

With a crack team at the Center for Digital Scholarship at Brown University, I am investigating how the emerging AI technologies can enhance the production and use of our materials. The workflow, from deciphering a text set in stone to creating structured data that can then be made generally useful, is cumbersome, time-consuming, and expensive. Can AI help?

Our efforts are just beginning, but in the meantime, on a whim, I ran my own quick experiment using ChatGPT 4o (paid version). I took a photo of an inscription that I believe is still unpublished from Volubilis, an ancient Roman city in Morocco.

I asked ChatGPT to read it and translate it. It told me that the picture wasn’t good enough, so it offered to enhance it. I accepted.

Now I asked again, and it gave me this reading:

ΩΔΕ ΚΟΙΜΗΤΕ
ΚΑΙ ΚΥΡΙΛΛΑΝΟΣ Ο
ΠΡΩΤΟΠΟΛΙΤΗΣ
ΠΑΤΗΡ ΤΗΣ ΣΥΝΑ
ΓΩΓΗΣ ΠΟΥ ΤΩΝ
ΣΥΝΕΤΟΥΔΩΝ ΕΝ Α
ΚΑΙ ΤΩΝ ΜΑΡΤΥΡΩΝ

This is somewhat between what we would call a diplomatic reading and an edited transcription. It keeps the capital letters and the lines, but adds spaces between words. So I asked for an edited transcription:

Ὦδε κεῖται
καὶ Κυρίλλᾱνος ὁ
πρωτοπολίτης,
πατὴρ τῆς συναγωγῆς
τῶν συνετῶν ἐν ἑνότητι
καὶ τῶν μαρτύρων.

What is fascinating about this transcription is that it makes appropriate corrections, essentially normalizing the spelling. After giving the transcription, it offered notes:

The word “ΚΥΡΙΛΛΑΝΟΣ” appears to be a proper name, likely referring to an individual of high status or importance, possibly within a religious or civic context.
The phrase “ΠΡΩΤΟΠΟΛΙΤΗΣ” suggests a title, meaning “chief citizen” or someone holding a prominent position in the community.
“ΠΑΤΗΡ ΤΗΣ ΣΥΝΑΓΩΓΗΣ” indicates a religious role, specifically referring to being a “father of the synagogue,” which could denote leadership in the local Jewish or early Christian community.
“ΣΥΝΕΤΟΥΔΩΝ” and “ΜΑΡΤΥΡΩΝ” likely reference those gathered in faith or unity, with “martyrs” potentially signifying witnesses or those who testified to faith.

It then offered a translation:

Here rests
and Kyrillanos, the
chief citizen,
father of the synagogue
of those gathered in unity
and of the witnesses.

We have scores of inscriptions from the Roman Empire that look like the first part of this inscription (commemorating a person with a civic title also noted for what seems like an honor in the Jewish community). The last two lines, though, mentioning “martyrs,” are to my knowledge unparalleled in Jewish inscriptions. When asked about it, ChatGPT too could not come up with examples. The inscription thus represents a historical puzzle – one of the reasons I love inscriptions!

The point of this post, though, is not to puzzle over Kyrillanos but to take stock of how AI can be applied to ancient texts, even those written on stone. It is not quite at the stage of replacing the expert knowledge of epigraphers: we would want it to include in its transcription indications of where it was normalizing texts; probably translate the “and” in line 2 differently; and reflect more on the last two lines. But this is far better than I ever expected, and this is with a simple, off-the-shelf model rather than one optimized for ancient Greek or inscriptions. As a colleague who is an epigrapher put it when I shared it with him,

“Holy cow!”

The Talmud

August 15, 2024 by admin

In addition to teaching my course on antisemitism this fall, I’ll also be teaching a course on the Talmud (in English). The description is below, and the full syllabus, in pdf format, can be downloaded here.

This course will introduce you to the Babylonian Talmud. Compiled in the third through seventh centuries CE, the Babylonian Talmud (henceforth the Talmud) played a crucial role in the history of Judaism, perhaps rivaling even the Bible in importance. In addition to its religious importance, the Talmud is an unparalleled – if complex – source for the study of ancient history. As a text that at times seems foreign to our way of thinking, the Talmud also challenges us, as modern readers, to probe and interrogate our own logical assumptions.

We will spend much of our class time actually reading the Talmud, in English translation. In honor of the U.S. election, we will focus on passages that deal with politics and governance. There are no prerequisites and no prior knowledge is assumed.

I have also become more interested in using machine learning as a research tool for the Talmud. I am not very far along and am still working on learning the basic tools to do this. However, I did manage to produce the chart below, which maps the similarity between the different tractates of the Babylonian and Palestinian Talmuds:

This is more of a proof of concept than anything else, but it showed a very clear division between the two Talmuds. This was not unexpected, but neither was it a foregone conclusion.
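For those curious about what "similarity" between texts can mean in practice, here is one common and very simple measure, cosine similarity over word counts. This is only an illustration of the general idea and not necessarily the method behind the chart; the three sample "texts" are invented placeholder strings, not Talmudic passages.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


# Invented placeholder texts
texts = {
    "tractate_x": "alpha beta beta gamma",
    "tractate_y": "alpha beta gamma gamma",
    "tractate_z": "delta epsilon zeta",
}

names = sorted(texts)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        print(first, second, round(cosine_similarity(texts[first], texts[second]), 3))
```

A score near 1 means two texts use much the same vocabulary in similar proportions, and a score of 0 means they share no words at all. Real work on the Talmud would need to deal with Hebrew and Aramaic morphology, spelling variation, and much longer texts before a measure like this says anything meaningful.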

More to come on this front.