Over the past few years, institutional digital repositories and more broad-based digital “commons” have proliferated. Many are found at universities (Brown now has one) and sites such as Zenodo and Humanities Commons. Such platforms serve two purposes. First, they provide a (relatively) stable environment that can preserve digital data. Second, they serve as a digital repository through which scholars can make their work freely accessible to others. While it is this latter potential that has largely attracted scholarly attention and enthusiasm (Sarah Bond recently argued that depositing one’s work in a “commons” is far preferable to putting it on academia.edu, a for-profit entity), I have been thinking lately more about the issue of preservation and specifically how it relates to the modern creation of scholarly and artistic works.
The underlying problem is a simple one: most scholarly and creative work today is done digitally. My own file cabinets are almost empty now and are largely uninteresting; my Dropbox folder, on the other hand, is a teeming, semi-organized mess of notes, manuscripts, drafts, correspondence (although most of that is in my email account), grant applications, and the like. Now, I like to muck around in scholarly archives and to read biographies of scholars that are based on such archives. And while I can’t imagine that anyone will be interested in my particular files, I wonder about the archives of scholars and artists who led genuinely creative and interesting lives. What happens to that part of their literary legacy that remains alive only in digital form?
Lest my concern be taken as mere narcissistic rumbling (scholars worried about the unfinished work of other scholars? Who cares?), let me offer a more pointed example. I have had occasion recently to have a series of discussions with archaeologists about their process of publication. Archaeological excavations generate reams of data. Locations of walls and changes in strata need to be carefully noted, and objects – especially the often enormous quantity of pottery sherds – each need to be carefully logged and catalogued. Pictures are taken. In the past, these records were kept in field notebooks and other written forms. At the end of an excavation, a final report is produced. The data on which that report was based were, ideally, filed away in a personal or institutional archive.
As with any scientific data, archaeological data are valuable. Other archaeologists use them to test the reconstructions and hypotheses. Later archaeologists frequently want to ask new questions of the older data. Since archaeology is by nature destructive, often all that remains of a site is whatever previous archaeologists chose to note.
Today, nearly all archaeological recording is done electronically. There is no single way that archaeologists do this recording. Some use off-the-shelf proprietary databases; others create their own data management systems. In any case, when the final report is completed and the excavation is wrapped up, all of that data goes… well, where exactly?
Unlike scientists, many archaeologists and humanists have not thought very hard about the preservation of digital data. Scientists routinely deposit their raw data in institutional repositories and are called upon to articulate their digital data management and preservation plan on many grant applications. The paths open to others are less clear.
Digital platforms offer many opportunities for archaeologists and other scholars who wish to more fully document their work. An archaeological report, for example, can be published online and linked with all of this raw data. A literary work can be linked to previous drafts. There are many fascinating attempts in the digital humanities to make the data and process behind completed works more transparent. Such projects, however, generally involve a significant investment of resources. It may be ideal but remains unrealistic to expect that every scholarly product will be disseminated with links to the raw data and process from which it emerged.
It is for cases like scholarly archives and archaeological data that institutional digital repositories and digital commons provide a simple and inexpensive solution. Upon completion of a project (or a career) the entire archive of data can be converted to an accessible format (e.g., XML); bundled together with more or less organization; and then deposited. Each deposit receives a library (MARC) record, which makes it visible to the entire web (especially if the repository is part of OCLC). One could imagine cases where someone might want to embargo these archives for a time (easy enough to do) but ordinarily such data would then be easy to locate and freely accessible.
The most involved part of this process would be the conversion of data into a standard, open format such as XML. Most common proprietary programs (e.g., Microsoft Word), however, already have this capability. In other cases, the software that would perform this transformation is relatively easy and inexpensive to produce. The advantage of this conversion is that keeping the data in a simple, open format allows future users to reconvert the data to whatever format they find most useful, rather than forcing them to find a copy of the same proprietary program that is still compatible with the older data. The whole conversion process might take a few hours and could serve, for example, as the final step in wrapping up an excavation report. Scholars might bundle the XML folder of their work with their email and, voilà, an archive is produced.
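To give a sense of how little software this requires, here is a minimal sketch in Python of the convert-and-bundle step. It assumes the records can first be exported as CSV with a header row; the function names (`csv_to_xml`, `bundle`), the element names, and the sample finds are all hypothetical, chosen only for illustration, and a real deposit would follow whatever schema and packaging the repository asks for.

```python
import csv
import io
import os
import tempfile
import xml.etree.ElementTree as ET
import zipfile


def csv_to_xml(csv_text, root_tag="records", row_tag="record"):
    """Convert CSV text (with a header row) into an XML string.

    Assumes every row has the same number of columns as the header.
    """
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, row_tag)
        for column, value in row.items():
            # Keep the column name as an attribute rather than a tag name,
            # because column names are not always valid XML tag names.
            field = ET.SubElement(record, "field", name=column)
            field.text = value
    return ET.tostring(root, encoding="unicode")


def bundle(files, archive):
    """Write a dict of {filename: text} into one zip archive.

    `archive` may be a path or a writable binary file-like object.
    """
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, text in files.items():
            zf.writestr(name, text)


if __name__ == "__main__":
    # Hypothetical sample data standing in for an exported finds database.
    sample = "id,material,locus\n1,pottery sherd,L101\n2,glass fragment,L102\n"
    files = {
        "finds.xml": csv_to_xml(sample),
        "README.txt": "Hypothetical excavation archive, converted from CSV.",
    }
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "archive.zip")
        bundle(files, path)
        with zipfile.ZipFile(path) as zf:
            print(zf.namelist())
```

The resulting zip file is the sort of bundle that could then be uploaded to a repository. Nothing here depends on the original database program, which is the point of the exercise.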
At worst, such a digital work would sit unaccessed and unused – hardly a unique situation for scholarly works or archives. Yet the costs, in time, money, and digital storage, are low (and continue to drop). It is time to think about digital preservation as a staple of our “best practices.”