Text and Picture: Kaisa Järvelä

Anonymisation: Nullification of Data or Prerequisite for Archiving?

Does a researcher need to know who speaks in data? When are data so sensitive that personal information has to be erased before archiving? Anni Ojajärvi, a sociologist whose doctoral thesis focused on young men's health behaviour during military service, and Juha Nirkko, a folklorist and researcher with the Finnish Literature Society, raised an interdisciplinary discussion on the anonymisation of research data.

A muggy afternoon in the middle of a very hot week, the first and – as it turns out – last of its kind this summer. Sociologist Anni Ojajärvi and folklorist Juha Nirkko are crammed into a tiny meeting room in the basement of the Finnish Literature Society to discuss data anonymisation. All signs point to the conversation becoming a heated one.


Folklorist Juha Nirkko and sociologist Anni Ojajärvi raised an interdisciplinary discussion on the anonymisation of research data.

—Anonymisation doesn't necessarily make data any less good. Non-identifiable data can even be easier to share and use as you don't have to give that much thought to privacy issues, Ojajärvi begins.

—In most cases anonymisation does make data generally less interesting. And I don't mean to scientists and researchers alone – things just sound more interesting when said by an actual person, Nirkko responds.

The conflicting perspectives show how differently researchers from different disciplines view data anonymisation. As a sociologist, Ojajärvi is more interested in phenomena than in the people describing them.

Nirkko, on the other hand, works as a researcher at the Folklore Archive of the Finnish Literature Society, and is bound by tradition to try to preserve a piece of each era for future generations.

—From the viewpoint of future historians, a sample is more authentic and valuable with all identifiers in place, Nirkko points out.

At the Folklore Archive, the only personal information erased, if any, is the donor's.

—Our policy is to get the subjects to consent to archiving as soon as data collection starts. In addition, access to the archives is closely monitored, Nirkko explains.

The young may regret giving consent

For her doctoral thesis, Anni Ojajärvi not only conducted 40 interviews, but also observed conscripts in an ethnographic field study over their initial period of training. In principle, she could have asked the subjects' permission to archive the data with identifiers for future research purposes, but she felt that would be ethically wrong.

—My subjects were of legal age, but still young. A young person may well say that his interview can be published, even online, with name and all, but in 20 years' time feel very differently. I feel I have an ethical obligation to anticipate this, she explains.

Nirkko notes that, unless otherwise agreed, the Folklore Archive does not enter minors in the Finnish Literature Society's personal data register.

—Most such materials, however, come from schools and have the names and grades of the pupils written at the top of the papers. And since these data are not cut out, even the material we collect from minors is not completely anonymous, he says.

"Time protects the subjects"

Ojajärvi believes she could consider archiving personal data if it were particularly relevant to the topic. But as far as her military service data are concerned, she doubts whether a potential reuser would need to know who exactly said what.

—The participants were randomly selected. If a researcher wishes to probe, for example, young men's attitudes towards military service, the identity of the interviewees shouldn't matter much.

She believes that a sociologist, at least, is not bothered by anonymisation as long as there is enough context in the data. The facelessness of the subjects can be avoided by using pseudonyms. Nirkko notes, however, that in some disciplines exact personal data are very important.

—An historian, for example, would be so bothered by the anonymity of your data that he'd probably turn to other sources to dig out the information.

Ojajärvi admits that in historical research, personal data are of fundamental importance.

—If an historian 50 years from now wants to use the data in its authentic form, I can see myself consenting to that, she says.

Ojajärvi and Nirkko agree that time is an effective means of protecting subjects. Nirkko, however, knows from personal experience that sometimes even that does not prevent feelings from being hurt.

—Living relatives may be offended by something somebody said about their great aunt or uncle. We have, for example, archived some pretty wild research interviews on traditions where the narrator is depicted as a known liar. Such material cannot be handed over to just anyone, he says.

Potential victim or fellow researcher?

Nirkko believes that when thinking about personal data protection, a good rule of thumb is to consider whether publishing or handing over identifying data would cause anyone harm.

—At its simplest, you can ask yourself if someone's feelings will get hurt.

Ojajärvi likes the idea and adds that harm does not have to mean public scandal or similar – it is enough if a subject is recognised by, say, a brother or child.

Nirkko points out that for some people the erasure of identifiers can also be upsetting. In this sense, too, monitoring of reuse could prove to be more effective than anonymisation.

—People always think that naming a person could hurt them, when in fact not naming them might have exactly the same effect. Some written materials may be classified as literary works, in which case anonymisation could easily lead to a breach of the Copyright Act.

For example, the Finnish Literature Society has a large group of individuals who regularly provide information and feel very much part of the research.

Nirkko can remember offhand at least one case where the researcher published the names of all interviewees to emphasise their role as active, independent participants.

—In his dissertation, which he published in the late 1990s, Jyrki Pöysä used material collected in 1969 and decided to publish the names of the interviewees because he felt they were his co-researchers.

What is sensitive information?

Ojajärvi points out that the sensitivity of data can only be determined on a case-by-case basis. Therefore, ethical decisions must, in practice, be made separately for each study.

—One person might find something sensitive while someone else will say that it's exactly what they wanted to write or talk about using their own name.

Nirkko continues by saying that not even the law always provides clear guidelines on determining what is sensitive and what is not.

—According to the Personal Data Act, such unambiguous things as religion or ethnic origin are sensitive, but then again so are, say, people's hobbies. In the strictest sense of the law, you might say that anything a person reveals about their life is sensitive information.

Web causes further turmoil

Ojajärvi believes that online material will only add to the complexity of privacy protection issues.

—For example, from the perspective of youth research, the Internet is full of interesting material, but nobody seems to have a clear idea of whether it can be used in research and, if so, under what terms.

—My own moral compass tells me that although the material is freely available, as researchers we can't use it as we please, at least not without good reason, she says.

Nirkko feels that, in general, data that have been created or have emerged without proper discussion pose the most problems. New data collections have the advantage that things can be communicated and agreed upon.

—All in all, I think we can say the pursuit of ethics should be absolute, with the details agreed on case by case, he sums up.

Ojajärvi nods happily. And so, with the surrounding air almost unbearably heavy, the two experts find a common note.

Creative Commons license