Text and photos: Kaisa Järvelä

Well-curated Data Provides Benefits for Data Creators and Others

The FSD's Aila and Data Reuse seminar, held in Tampere in September, attracted a record number of participants.

The head of research for the School of Health Sciences at the University of Tampere (UTA), Outi Jolanki, gave a speech about managing and archiving qualitative data. Professor Elina Kestilä-Kekkonen from the School of Management at UTA explained how research data can be utilised and produced in university education. Information Services Manager Hannele Keckman-Koivuniemi from the FSD provided useful information about using the Aila Data Service Portal.

"Creating a background variable table for the qualitative data took months but was worth it" says Outi Jolanki

A qualitative dataset often consists of about twenty interviews, but how do you keep everything under control when a dataset consists of almost eighty long interviews?

Outi Jolanki, along with other members of the WoCaWo group of the University of Jyväskylä, was faced with such a huge dataset when studying caregivers back in 2009.

The research group searched for people who were employed but functioned as caregivers to their elderly or disabled relatives at the same time. It soon became evident that many people had something to say on the issue as the group received almost one hundred contacts from different areas in Finland.

At the end of the data collection period, the data consisted of 76 interviews. The interviews lasted about one and a half hours on average.

What was relevant was revealed only bit by bit

–The great thing about a dataset of 76 interviews is that it is easy to discover similarities, continuities and exceptions, Jolanki said at the seminar.

However, managing the whole dataset was challenging, as one cannot remember 76 interviews without some additional tools.


Head of Research Outi Jolanki spoke about managing and archiving qualitative data.

This is why the group decided to construct a background variable table from the interviews to help with data management. However, there were certain difficulties with this as well.

–The aim of interview data is to surprise the researchers and reveal something new, so it is impossible to know which background variables will prove to be relevant.

For example, the importance of the housing arrangements of the caregivers and the persons they were helping, and of the reason why care was needed, was only revealed during the interviews.

In practice, every new relevant background variable meant that the group had to update the background variable table as well. In the end, the table became so large that the only way to print it on A3 paper was to use font size 10.

–The process was much more time-consuming than we expected but definitely worth it.

The finalised table was an indispensable tool for understanding the data. As the project proceeded, it was also useful for picking out information for articles and presentations.

Some interviews were destroyed before archiving

Another problem presented itself in the archiving stage: were all 76 interviews anonymous enough that no interviewee could be identified?

All direct identifiers had been removed and indirect identifiers had been modified before the analysis stage. The names of the interviewees had been replaced with pseudonyms and medical conditions had been generalised: "schizophrenia", for example, had been replaced with "mental health problem". However, the variables relevant to the study, such as age and marital status, had been retained in the data.

In the end, the group decided to leave out of the data delivered to the FSD all the interviews in which multiple rare characteristics were present.

–We did not want to take the risk that somebody could recognise the interviewees, explains Jolanki.

The usability of the data did not suffer from these precautions, as more than 60 interviews were still archived.
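The article does not say how the group identified the interviews with multiple rare characteristics. Purely as an illustration, the idea can be sketched in Python as below. The table contents, the field names, the `flag_risky` helper and both thresholds are hypothetical assumptions; only the "schizophrenia" generalisation comes from the text above.

```python
from collections import Counter

# Hypothetical generalisation map; only this one pair is mentioned in the article.
GENERALISE = {"schizophrenia": "mental health problem"}


def generalise(value):
    """Return the broader category for a value, or the value itself if none is defined."""
    return GENERALISE.get(value, value)


def flag_risky(records, fields, rare_max=1, min_rare=2):
    """Return the ids of records that carry at least `min_rare` rare values.

    A value counts as rare when at most `rare_max` records share it in the
    same field. Both thresholds are illustrative assumptions, not the
    research group's actual criteria.
    """
    counts = {f: Counter(r[f] for r in records) for f in fields}
    risky = []
    for r in records:
        n_rare = sum(1 for f in fields if counts[f][r[f]] <= rare_max)
        if n_rare >= min_rare:
            risky.append(r["id"])
    return risky


# Made-up background variable table, not data from the study.
table = [
    {"id": "P1", "region": "south", "occupation": "teacher", "condition": "dementia"},
    {"id": "P2", "region": "south", "occupation": "teacher", "condition": "dementia"},
    {"id": "P3", "region": "south", "occupation": "nurse", "condition": "dementia"},
    {"id": "P4", "region": "north", "occupation": "pilot", "condition": "schizophrenia"},
    {"id": "P5", "region": "south", "occupation": "nurse", "condition": "dementia"},
]

# Generalise sensitive values first, then look for rare combinations.
for row in table:
    row["condition"] = generalise(row["condition"])

risky_ids = flag_risky(table, ["region", "occupation", "condition"])
archived = [r for r in table if r["id"] not in risky_ids]
```

In this toy table only P4 has several values that no other record shares, so it is the one left out of the archived set. A real assessment would also rely on the researchers' own judgement of the interview contents, not on counts alone.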

All direct identifiers and most indirect identifiers were removed from the background variable table. For example, exact professions were replaced with categorised occupation data. The researchers' hard work in constructing the background variable table proved to be a great help for the data processors at the FSD.

–They were very happy with it, reveals Outi Jolanki.

The WoCaWo project dataset Informal Care and Employment 2008–2009 is available in Aila in Finnish for research purposes as well as for Master's and doctoral theses.

"Students prefer to use data from their own field of study when learning research methods" says Elina Kestilä-Kekkonen

When Elina Kestilä-Kekkonen first started teaching quantitative research methods to social scientists at the University of Tampere, the reception was anything but enthusiastic. The students were convinced that quantitative methods were difficult and a waste of their time.

–Many of them explained that they had started studying social sciences because they hated math.


Professor Elina Kestilä-Kekkonen from the School of Management at the University of Tampere utilises data from the FSD in teaching.

Kestilä-Kekkonen set out to break down the barrier with quantitative datasets that were directly related to the students' own field of study.

–It is important that data from the students' own field of study are used when studying research methods. Only data that are interesting to the students give meaning to the numbers and connect them to the theory, Kestilä-Kekkonen explained at the seminar.

According to her, it is also very important that students can access reliable methodology guides.

–Google offers a lot of information, but absolutely one of the best sources is KvantiMOTV, the guide to quantitative research methods maintained by the FSD.

As the years have gone by, the resistance has decreased and many of Kestilä-Kekkonen's students have gone on to use quantitative methods and data downloaded from Aila in their theses.

Research methods courses have produced two great datasets

Social science students at the University of Tampere have also had the chance to collect their own quantitative data under the guidance of Kestilä-Kekkonen. Eight students, who were at different stages of their studies, participated in the first research project course in 2015.

The students were offered a choice between three different projects. They chose to study the societal values of UTA students in different programmes.

–We managed to gather a great dataset that contains responses from more than a thousand students and represents the different fields of study at UTA well.

Inspired by the dataset, the students wrote a peer-reviewed article on the subject. The article was published in the journal Politiikka in 2016. The students also obtained scientific merit by archiving the dataset in the FSD.

The first project was such a success that a similar course was arranged in 2016. On this occasion, the students researched the political affiliation of upper secondary school students. The results were again great.

At the time of the Aila seminar, the research report was under review at a JUFO level 2 journal, and the dataset will soon be available in Aila.

Kestilä-Kekkonen said that she will continue using Aila in her teaching in the future as well.

–It is imperative for students to see that datasets from their own field of study exist and that they are the gateway to finding answers to interesting questions.

The Values of University of Tampere Students 2015 dataset is available for download from Aila for research, teaching and studying purposes. The FSD will translate it into English on request.

Aila is the flagship of the FSD

  • The FSD provides human science datasets for students, teachers and researchers.
  • The Data Service Portal Aila contains more than 1,250 archived datasets. Most archived datasets are quantitative. In September, 175 of the datasets in Aila were qualitative.
  • There were 2,600 registered users in Aila in September. Approximately 90 % of the users were students or employees of Finnish universities and research institutes. Users from outside Finland made up about 7 % of Aila's users.
  • All the datasets in Aila have study descriptions in both English and Finnish. The data of 320 quantitative datasets (questions and response categories) are available in English and more are translated on request.
  • Most datasets in Aila (924) can be used for research, teaching and studying. 58 datasets are freely available for everyone, 166 are available for research only and 104 are available only with permission from the data creators.
  • A question and variable bank for quantitative data will soon be available in Aila. Furthermore, the search engine will be renovated in the spring of 2017. A tool for depositing data for archiving will most likely be launched by the end of 2016. This will allow researchers to deposit data for archiving directly through Aila.
  • The Aila Data Service Portal was constructed with infrastructure funding from the Academy of Finland. The portal was launched in May 2014.
Creative Commons -license