Text and photo: Kaisa Järvelä

Groundbreaking Maternity Cohort Soon to be Openly Accessible in Aila

Per Ashorn's research team wanted to archive their Lungwena Antenatal Intervention Study (LAIS) dataset because they believe it is in the interests of both the team and the entire world. Ashorn predicts that providing access to research data will become mandatory for all medical researchers within ten years.

When everything falls into place, individual medical studies can have a revolutionary impact. However, sometimes the data from several well-conducted studies need to be analysed for the world to change.

The LAIS cohort, studied by the research team led by Director Per Ashorn from the Tampere Center for Child Health Research, is an excellent example of a dataset that has had a crucial impact on the health of thousands of pregnant women and their children – but only as a part of an international meta-analysis.


Tampere Center for Child Health Research Director Per Ashorn says access to LAIS data will be provided gradually as original research projects are finished.

The LAIS dataset includes health data from 1,320 pregnant Malawian women and their children up to five years of age. Ashorn's team collected the data to discover whether preterm deliveries could be prevented or pregnancy outcomes otherwise improved by intensifying the treatment of the mothers' infections, particularly malaria, during pregnancy.

The findings of the intervention study clearly indicate that pregnant women should receive preventive broad-spectrum therapy for malaria more frequently than twice during pregnancy, which was the standard in high-risk malaria areas at the time of the study. However, the findings of the individual study did not immediately affect national practices, even in Malawi.

The big wheel only began to turn a few years later when the LAIS dataset was included in an international meta-analysis that extensively covered East and West Africa. The international team of researchers put practically the same questions to their vast dataset that Ashorn's team had previously put to theirs, and received essentially the same answers.

This time, the World Health Organization (WHO) gave a new recommendation based on the study: Preventive malaria treatment should be provided more frequently than twice during pregnancy. This new recommendation soon helped change national practices – in other words, the health of mothers and children in the world's malaria areas improved drastically.

FSD the pick of the archival options

The groundbreaking LAIS data will soon be much easier for any researcher to access, as Ashorn's team has agreed to provide access to the data through the Aila Data Service of the Finnish Social Science Data Archive (FSD).

The team investigated different archiving options, including several commercial services, before signing the agreement. Many of the commercial alternatives proved fairly expensive, and the free FSD seemed to be the best option on the whole.

Ashorn wanted to provide access to the data because he believes this promotes the interests of both his research team and global health.

'Providing access to the data helps us increase networking with the researchers of the world and therefore increases our opportunities to conduct research in cooperation with others. Furthermore, the issue itself is important, and it moves forward every time someone studies it regardless of whether we are involved in the project or not', says Ashorn, listing the advantages of providing access to the data.

Ashorn himself has also benefited from data others have made accessible. He once found a dataset he could use in a publication discussing the effects that growth stunting in early life has on health in adulthood. The original team had collected data on the mental health, early childhood and growth issues of people living in the Philippines.

'We extracted the growth data and certain questions related to mental health from the original dataset and managed to answer our own question without needing to bother the original researchers', Ashorn explains.

Providing access to data requires funding

While Ashorn believes that providing access to data benefits both the researcher and the world, he also understands that many still decide against providing access to their own data. He predicts that this will continue as long as providing access remains optional and researchers are not paid for doing so.

'For the time being, publishing is inevitably prioritised over describing datasets because grants are awarded solely for publications, not for describing datasets or providing access to them.'

Describing large datasets takes so much time that Ashorn's team, for example, has hired one person to work exclusively on it. Providing detailed and accurate metadata has been essential for the team from the outset because its members work all around the world and everyone must be able to understand what the dataset and its variables mean.

However, Ashorn believes that the majority of other research teams do not have the opportunity to hire someone to focus on describing datasets. Fortunately, researchers who use the FSD have a significantly easier job: the dataset description that is necessary in order to provide access to the data is largely done at the FSD on the customer's behalf.

Doctoral students' interests must be protected

Data protection issues also pose challenges for providing access to datasets that contain sensitive health information.

'Big data experts have estimated that individuals can be identified when ten different variables related to them are known.'

Ashorn's team stores clear identifiers, such as the names of the research participants and GPS data, separately from other data. However, information such as a research participant having thirteen children will remain in the dataset when access is provided to it.

'That kind of information is already quite revealing. However, actions like categorising the data would lose a massive amount of information and make building multivariable models more difficult', Ashorn muses.

The rights of the original creator of data must also be ensured along with the rights of research participants. In epidemiological studies, for instance, data collection takes a long time but data analysis is a relatively quick process. Such scenarios require finding a solution that both protects the interests of the entire world and ensures that the doctoral student who collected the data gets to publish their articles before anyone else.

'Medical researchers in Finland are usually doctoral students whose doctoral degree hangs on them getting their research findings published. International publishers recommend that data are openly accessible six months after being created, but this is challenging from a Finnish perspective', Ashorn points out.

As for the LAIS dataset, Ashorn's team has solved the problem by providing access to the data in increments. During the first stage, the FSD's Aila Data Service will only receive data collected from pregnant mothers and the monitoring data on infants up to one month of age. Access to the rest of the data will only be provided after Lotta Hallamaa, who is currently writing her doctoral thesis on the data, has finished her work.

Open access likely to become mandatory

Ashorn believes that providing access to data will become mandatory for everyone within ten years. Large US funders and some of the journals that publish research articles have already demanded that his team provide access to their data.

'When providing access to data becomes a mandatory part of research projects, research teams will begin to invest in it in earnest, too. However, the process will not work without research funding specifically earmarked for this purpose. Funders must understand how much work describing, processing and systematising data requires from the research team', Ashorn says.

The LAIS dataset

  • The 'LAIS' abbreviation stands for Lungwena Antenatal Intervention Study.
  • It features a maternal cohort collected in Malawi, primarily in 2006–2011.
  • The extensive dataset includes data such as children's growth data, developmental assessment performed at the age of five, children's morbidity and mortality data, socioeconomic data collected during pregnancy, growth environment data, mothers' health and infection data and some data on laboratory results.
Creative Commons license