I direct an online project that seeks to collect, analyze, and make accessible the inscriptions of Israel/Palestine from roughly the sixth century BCE to the seventh century CE. The site can be accessed here. Over the past few years the team working on this project has had to confront a wide variety of technical and architectural challenges, and we have been producing presentations on those challenges and our approach to them. We (although I was not there!) recently presented at TEI2019, a meeting devoted to the Text Encoding Initiative. In this presentation, we discussed our approach to archiving our data according to the best current framework, known as FAIR, which seeks to make data findable, accessible, interoperable, and reusable. We will soon be submitting the paper for publication, but the abstract and slides of the presentation are now available. The slides can be found here and the abstract is below:
The Inscriptions of Israel/Palestine Project is an online corpus of inscriptions from Israel and Palestine, written in Hebrew, Greek, Latin, and Aramaic, dating roughly from the Persian Period to the Arab Conquest. As of spring 2019, it has collected and encoded more than 4,000 inscriptions, out of some 10,000 relevant texts. We aim to create an exhaustive and easily accessible collection and to enable users to carry out a variety of searches and extensive textual analysis.
The FAIR Principles aim to enhance the ability of machines to automatically find and use digital objects, in addition to supporting their reuse by individuals. The principles are organized under four areas intended to ensure digital objects are findable, accessible, interoperable, and reusable. Following epigraphy.info’s mission statement, we are applying the FAIR Principles to guide our development of archival formats and processes for our corpus.
As IIP prepared to deposit files in the Brown Digital Repository, we defined formats for ensuring that our files will be as informative, self-documenting, and reusable as possible. Each inscription is contained in a single XML file, encoded in the well-documented EpiDoc subset of the TEI. These files, however, link to externally maintained controlled vocabularies (using the xi:include feature) and bibliography (using Zotero), in order to facilitate the work of our encoders and ensure consistency. One of our challenges was to incorporate these external data into a robust, stand-alone archival format.
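To make the stand-alone idea concrete, here is a minimal sketch of resolving an xi:include so that the included vocabulary is copied into the inscription file itself. It is an illustration under stated assumptions, not the IIP pipeline: the file name `include_taxonomies.xml`, the taxonomy content, the placement of the include inside `classDecl`, and the helper names `load_external` and `make_standalone` are all hypothetical. It uses only Python's standard library.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

TEI_NS = "http://www.tei-c.org/ns/1.0"
XI_NS = "http://www.w3.org/2001/XInclude"

# Hypothetical stand-in for an externally maintained controlled vocabulary.
# In a real workflow this would be a file on disk or a remote resource.
EXTERNAL_FILES = {
    "include_taxonomies.xml": (
        f'<taxonomy xmlns="{TEI_NS}" xml:id="genre">'
        '<category xml:id="funerary"><catDesc>Funerary</catDesc></category>'
        "</taxonomy>"
    ),
}

# Hypothetical, heavily abbreviated inscription file that points at the
# vocabulary through xi:include instead of containing it.
INSCRIPTION = (
    f'<TEI xmlns="{TEI_NS}" xmlns:xi="{XI_NS}">'
    "<teiHeader><encodingDesc><classDecl>"
    '<xi:include href="include_taxonomies.xml"/>'
    "</classDecl></encodingDesc></teiHeader>"
    "</TEI>"
)


def load_external(href, parse, encoding=None):
    """Loader callback for ElementInclude: return the parsed external file."""
    if parse != "xml":
        raise ValueError("this sketch only handles XML includes")
    return ET.fromstring(EXTERNAL_FILES[href])


def make_standalone(xml_text):
    """Parse an inscription and replace every xi:include with its target."""
    root = ET.fromstring(xml_text)
    ElementInclude.include(root, loader=load_external)
    return root


archival = make_standalone(INSCRIPTION)
print(ET.tostring(archival, encoding="unicode"))
```

After this step the archival copy no longer depends on the external vocabulary file being reachable, which is the property a deposited file needs. Bibliographic data held in Zotero would need a comparable embedding step, which is not shown here.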
We will introduce the FAIR Guiding Principles and FAIR Metrics as they apply to epigraphic corpora and TEI encoding, discuss the roadmap for implementation, and look at archival practices beyond FAIR when it comes to preservation of data as well as reuse. The first steps to making a digital corpus findable and accessible seem straightforward: IIP texts have been ingested into the Brown Digital Repository, have unique and persistent identifiers and rich metadata, and are freely available. Even so, we can still improve on both facets. Simple interoperability and reusability are available through the IIP API in both the production and the archival versions of the corpus; however, it will be important to do further work on controlled vocabularies, shared concepts, and encoding practices in order to enhance both of these facets.