Now that I am back home, it will take me a while to process what I learned at The Big Ancient Mediterranean Conference, and even longer to work through my new, vastly expanded to-do list. Here I want only to sketch out a few thoughts. I don't think that any of them is particularly original, but having the intellectual space and dialogue to focus on them helped me to work through and articulate them for myself.
First, I think that it is heuristically useful to think of digital humanities (DH) projects as being of three types: data creators, processing tools, and aggregators or linkers. The data creators (among the more impressive represented at the conference were Nomisma, Open Philology, Corpus Scriptorium, and the emerging Digital Latin Library) make digital data. The tools, such as those for social network analysis (e.g., Gephi), natural language processing (xrenner), or plotting, make that data not just accessible but also useful. And the linkers (Trismegistos, Pleiades) connect different sorts of data, most often from different sites, for a variety of purposes. I find this way of thinking about DH projects useful even though some projects fall between these categories and most do more than one of these things.
While I think that the "linkers" are some of the more exciting DH sites, it all starts with the data. Data creation isn't sexy, and the data are of limited use if they are created for only one site or purpose. If one is going to go through the laborious process of creating digital data, one may as well try to make them not just accessible but useful. That requires structuring the data so that existing tools can process them with only minor modifications; including URIs so that linkers can reuse them; creating APIs to give programs access to them; and encoding them in an open rather than a proprietary format so that they remain accessible when software standards change. This also applies, mutatis mutandis, to tools and aggregators. Tools should be designed to work on a wide swath of structured data, and aggregators function best when they can harvest or scrape data from a large number of sites.
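To make the point about structure and URIs a little more concrete, here is a minimal sketch in Python. It assumes a heavily simplified, EpiDoc-style TEI record (not a schema-valid EpiDoc file) whose place of origin carries a Pleiades-style URI; the numeric ID in that URI is a placeholder, and the function name `link_targets` is hypothetical. The idea is only that when data are structured in an open format and point to shared URIs, a few lines of generic code can pull out exactly the hooks a linker needs.

```python
import xml.etree.ElementTree as ET

# TEI (and therefore EpiDoc) elements live in this namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily simplified EpiDoc-style record. The Pleiades ID
# "000000" is a placeholder, not a real place.
RECORD = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <sourceDesc>
        <msDesc>
          <history>
            <origin>
              <origPlace ref="https://pleiades.stoa.org/places/000000">Example place</origPlace>
              <origDate notBefore="0100" notAfter="0200">2nd century CE</origDate>
            </origin>
          </history>
        </msDesc>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""


def link_targets(tei_xml: str) -> dict:
    """Collect the place URIs and date bounds a linker could harvest."""
    root = ET.fromstring(tei_xml)
    # Every origPlace with a ref attribute points at a shared gazetteer URI.
    places = [
        el.get("ref")
        for el in root.iterfind(".//tei:origPlace", TEI_NS)
        if el.get("ref")
    ]
    # origDate's notBefore/notAfter attributes give a machine-readable range.
    date = root.find(".//tei:origDate", TEI_NS)
    if date is not None:
        bounds = (date.get("notBefore"), date.get("notAfter"))
    else:
        bounds = (None, None)
    return {"places": places, "dates": bounds}


print(link_targets(RECORD))
# {'places': ['https://pleiades.stoa.org/places/000000'], 'dates': ('0100', '0200')}
```

A real project would validate against the EpiDoc schema and cope with multiple or uncertain dates; the sketch is meant only to show why open, structured data with shared URIs are so much easier for tools and linkers to reuse.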
For Inscriptions of Israel/Palestine, the road has been long and slow, in large measure because the site was created ahead of the standard structures and even of URIs. Over the two decades of the project's existence, we have had to transform our data several times. The transformations from SGML to XML and from our own schema to EpiDoc were among the more traumatic. Each required not only the custom development of an automated process but also manual cleaning and refining of the data (some of which we are still doing). Now we must add URIs to allow geographical and chronological linking. Each of these transformations was costly, and I predict, despite assurances that we now have stable standards, that there will be more to come. These projects, even the data collections, are never fully complete or stable. I'm not sure how one prepares for this, but it is an inevitable, and for the scholar frustrating, part of any DH project.
This brings me to a second thought. In the past, DH often fell between the administrative cracks of IT and the library. In recent years the weight has shifted to the library, and it has become increasingly clear to me that this is a good thing. Each of these projects, whether a data collection, a tool, or an aggregator, carries within it new knowledge. Hence, it requires preservation. We preserve almost all printed scholarly materials in an accessible format, no matter how useless or bad. The same principle needs to apply to digital projects. With the creation of digital repositories and the low cost of storage, this should not be overly difficult. This includes software: GitHub, now a favorite place to store code and DH data, will eventually disappoint us. Just as libraries preserve new knowledge, they also have methods for cataloging and finding it. There is already a bewildering array of digital projects, and they are not systematically cataloged, whether they are active, preserved on the Wayback Machine, or mothballed. Cataloging and the development of finding aids are desiderata. In the interim, for those who work in classical antiquity, two lists, here and here, are useful, although incomplete and imperfect.
A third issue is the very definition of scholarship. Although I am now part of several overlapping conversations that are wrestling with the nature of DH scholarship, I cannot say that I am much closer to an answer. Data collection, on its face, shouldn't count as "scholarship"; but then, isn't the creation of print critical editions of texts, which largely involves collation, considered scholarship? Digital tools (that is, programs) are at heart intellectual models: just as with a monograph, you put data in and emerge with a synthesis or intellectual product. One of the key differences, in fact, is that scholars writing books are often less rigorous and explicit about their assumptions and methodologies than a computer program must be. Linkers bring data together in new ways, even if they don't synthesize them, and thereby create research questions and drive our conversations. Doesn't theory do this too? I am not claiming that all of these should count a priori as "scholarship", but the comparison points to a critical need for scholars (especially those in positions of power who hire and grant tenure) to wrestle seriously with the possibility that the meaning of scholarship is shifting in a sharp but recognizable way, and that this is not necessarily bad.
A final thought in an already too-long blog post: the issue of audience needs to be taken seriously. A scholarly DH project might justifiably be directed at just a few hundred kindred scholars, just as journal articles or monographs are. For most scholars engaged in DH, though, I think that seems unsatisfying. We recognize the enormous potential of these projects not only to speak to specialists but also to teach students and to engage a wider public in intellectual pursuits in which we are deeply invested. The challenge is realizing that potential. Sites need to be designed to address and engage multiple audiences, and that is no easy feat. It usually involves creating separate views or portals, which is costly: a good, accessible interface could run between $15,000 and $40,000. Moreover, we do not yet have good usability studies for such projects, nor, often, the infrastructure or resources to conduct them. Here perhaps we can draw more on the intellectual resources of our colleagues in business schools and psychology departments who study and teach such things.
I owe special thanks to the organizers of this conference, Professors Sarah Bond and Paul Dilley. They created a conference that was of high intellectual value, paced humanely, with a collegial environment that facilitated useful interactions, all the while using remote technologies judiciously and effectively. As one who has organized several conferences, I know that this is no mean accomplishment.