Guest Column

The Importance of Data Infrastructures for the Humanities


Hans Jørgen Marker

Humanities research depends increasingly on the availability of a variety of digital resources, and these resources are growing both in internal complexity and in the number of their external relationships. Data production, management and dissemination are largely organized in a distributed manner, which results in great fragmentation. This needs to be considered when designing research infrastructures for the humanities.

Humanities research uses not only primary data, such as audio and video recordings or collections of digitized primary sources, but also the secondary data derived from these sources and the continuous enrichments that result, for example, from multiple annotations. All these data need to be made persistently available to users.

Research in the arts and humanities has created a large quantity of digital material that represents a significant investment, both in terms of public funding and of intellectual effort. These resources are often hosted in their home institutions using a variety of approaches and technologies. This situation puts the resources at risk. Without ongoing maintenance, a resource will cease to be usable at all as the technologies in which it was created become obsolete and unsupported. Even if the resources are maintained, it is far from certain that they will be in a state that allows them to be used with the most relevant techniques at the time of reuse. Access to legacy resources may be limited to a simple download or to browsing on a website.

The impact of research in the humanities is felt many years after the original research was undertaken. Sustainability does not just mean keeping the data alive; it also means exploiting advances in technology, which make the data accessible in new ways, and forging connections between resources that lead to new discoveries and broader impact. This is essential to ensure long-term interest in these resources and their sustainability.

Humanities data need to be made available via centres of expertise, which can provide the stability and reliability that the research community needs. These centres should specialize in community-related access and enrichment tasks, and should also provide basic services such as persistent identifiers, a distributed authentication and authorization infrastructure (AAI) and long-term preservation, all of which cut across disciplines and infrastructures.

The centres should also ensure the sustainability of resources created by projects. In some cases the output of a project is handled well by existing infrastructures, but in many cases the results pose major challenges. Take, for example, a web resource presenting the legal and ethical rules currently in force in the social science and humanities domain in Europe. If it is not maintained, such a resource will very rapidly be reduced to a historical document of limited interest.

Similarly, software tools and services represent a specific challenge. Although there is no escaping that maintenance costs money, paying attention to software sustainability can limit the costs. Maintenance of tools and services is important for a number of reasons; for example, many results can be reproduced only with the original tools. The need to establish more humanities data centres, and for close-knit cooperation between them, is therefore urgent.

Hans Jørgen Marker
Historian, former director of the SND (retired)

Creative Commons license