Text: Kaisa Järvelä, Picture: SND

Vast Archaeological Data Collection Multiplies Humanities Data Numbers in Sweden

Data protection regulations have made archiving health data so challenging that the Swedish National Data Service (SND) has decided to concentrate on just providing metadata on health data in its web catalogue, without actually archiving and providing access to the datasets themselves. Another challenge for data archiving is that providing open access to research data is still completely voluntary for researchers in Sweden.

The SND has had the mandate to archive humanities and health research data since 2008. In practice, expansion to these new fields, in addition to the previously covered social sciences, started in earnest two years later.


Sofia Arvidsson works at SND as a Research Coordinator in the Humanities and Elisabeth Strandhagen as a Research Coordinator in Health Sciences.

Elisabeth Strandhagen has worked as a Research Coordinator in Health Sciences at SND since 2010 and Sofia Arvidsson as a Research Coordinator in the Humanities since 2011. At the moment, the data archive has three people working in the Health Sciences team and five in the Humanities team.

In terms of numbers, the acquisition and archiving of humanities data has started well. At the end of 2014, humanities data made up 46 per cent of all archived data, and as much as 71 per cent of the data archived during that year. The SND data catalogue contains 623 humanities datasets and 489 humanities studies including all their sub-datasets and reports. Two thirds of the archived humanities data are freely available for all users.

However, a large majority of the archived humanities data belong to one vast archaeological data collection.

—If we did not have the archaeological GIS data collection submitted by Uppsala University, the amount of humanities data archived would be small so far, Sofia Arvidsson says.

The GIS data originates from archaeological studies conducted in Östergötland during the 2000s, deposited at the SND by the Department of Archaeology and Ancient History of Uppsala University. The number of datasets originating from this collection adds up to 440 at this point.

In addition to archaeological data, the data archive has humanities data from the fields of history and religion. Last year the archive received its first language data.

Learning to archive new types of data

Sofia Arvidsson explains that archiving humanities data has entailed and still entails learning new things and creating new practices.

—For almost every humanities dataset we archive, we need to start a learning process to find out how this type of data can be archived, she says.

Humanities datasets may contain images, sound or video. Many contain both qualitative and quantitative material. One dataset may contain thousands of images while another consists of sound recordings of African languages that are in danger of dying out.

—Each archiving process requires developing one’s own skills and routines, Ms Arvidsson describes.

Archiving data containing personal data impossible

Data protection regulations are the biggest barrier to archiving humanities data in Sweden. The SND can accept data containing personal data only from the University of Gothenburg, which is the host organisation of the data archive. Data from other sources must be completely anonymized.

—This is the reason why we have in practice concentrated on data that contain no personal data, Sofia Arvidsson explains.

Almost all health data contain at least some personal data. Strict data protection regulations have therefore made it practically impossible for the SND to archive health data.

—The archive is not allowed to archive any data coming from outside of the University of Gothenburg, if even one researcher has a code key to the personal data, says Elisabeth Strandhagen.

In Finland, too, such data cannot currently be delivered to users, but archiving and preserving them long-term is still allowed.

The SND has tried to adapt to the regulations by focusing on collecting and publishing metadata on existing health data. At the moment, 138 datasets are described. Only about a dozen health datasets have actually been archived.

Most of the described health data come from epidemiological studies. As yet, few of the described datasets relate to clinical studies on human beings, but Ms Strandhagen hopes that their number will grow in the future.

The main benefit offered by the metadata descriptions is that they enable researchers to easily find out what other researchers have studied and what kind of data they have collected in Sweden.

—This type of overall picture is often not provided even by big research units, Elisabeth Strandhagen says.

Ms Strandhagen believes that most of the health data described in the SND catalogue cannot easily be obtained from the original data creators, either. On the other hand, she thinks health and medicine researchers do not show a lot of interest in existing data.

—If a health science researcher in Sweden gets interested in the subject or data of another researcher or research team, most frequently he or she suggests research cooperation instead of just asking to use the data collected by others, she explains.

Guidelines being prepared for open access

In addition to anonymization regulations, archiving is made more difficult by the fact that archiving research data and providing access to them have so far been completely voluntary in Sweden. There are no national regulations, nor do research funders recommend or demand archiving data.

—We can tell researchers what the benefits of data archiving would be for them but we have no official mandate to support our arguments, Sofia Arvidsson says.

It seems, however, that EU recommendations and open access policies of international science journals have made Swedish researchers more aware of the benefits of archiving.

—We at the SND promote archiving by explaining to researchers, for instance, that we will preserve their data and studies long-term, and will describe them both in Swedish and English, thereby increasing their visibility both here and abroad, she explains.

Both Sofia Arvidsson and Elisabeth Strandhagen believe that most researchers are in principle interested in archiving their data but lack the time to carry out the voluntary archiving process in the midst of everything else.

—Often the issue of archiving only comes up at the end of the research project when the researcher is already starting a new project. If funders required a data management plan at the beginning of research, it would be easier for researchers to launch into the archiving process, Ms Arvidsson says.

To make the process as easy as possible for researchers, the SND has created a digital form for health data that researchers can use to enter and send the metadata the archive needs for its catalogue. The form has recently been amended in cooperation with health researchers and will be developed further based on user experiences.

Both interviewees are eagerly waiting to see if there will be national guidelines in the near future on archiving and sharing of research data.

The Swedish Research Council, a major research funder, has in fact already approached the Swedish Government on the issue, making a proposal promoting open access to data collected with public funding. A government response is expected by the end of the year or in early 2016.

—The Swedish Research Council is such a big funder here that, by creating its own policy on data sharing, it could set a norm for other funders to follow, Sofia Arvidsson reckons.

—I’m waiting with interest to see what’s going to happen, she adds.

Creative Commons license