Unique software for unique needs
This article was published in Finnish in the latest issue of the FSD Bulletin (1/2019).
According to IT Services Manager Matti Heinonen, the technical service structure of the Finnish Social Science Data Archive can best be described by comparing it to an onion. "Digital research data form the core around which different types of services and applications are built layer by layer. All outer layers aim to preserve the core, that is, the archived data."
Continuing the analogy, Heinonen emphasises the importance of preserving data for the long term: "Damage to the onion’s outer layers would not be too dramatic, although fixing broken applications and services is a lot of work. But if the core were damaged, the outer layers would be of no use either. Secure long-term preservation of data is thus a prerequisite for FSD’s operations."
Maintaining and constantly developing its information systems is a central part of FSD’s operations. Without a working information system that handles the preservation of data reliably, the basic operations of depositing, preserving and reusing research data would not be possible. FSD’s operational database management system makes it possible to transfer data from one system to another and to use data in different types of applications.
Archival copies of research data
FSD does not archive data in the form deposited by researchers, because the data require some processing to be suitable for reuse. Each dataset is processed at FSD and turned into a reusable archival version. This process includes supplementing the metadata, because smooth processing of data in operational database management systems requires consistent metadata.
"When FSD began its operations in 1999, we chose XML as the language for documenting metadata. We use XML to describe the variables in a dataset and other features of the data," Heinonen says.
Once data have been archived, they can be transferred from one system to another within the operational database. Through FSD’s control interface, datasets are made available to various applications and to Aila Data Service, the interface for end users of FSD’s data services.
Data transfers are automated
The newest application is a service developed for Aila that enables researchers to easily and securely deposit their datasets at FSD. The service was launched in the autumn of 2018.
"The basic process has stayed the same, but the system has expanded little by little, and of course the amount of information we receive keeps growing. New information systems have been introduced and internal management processes have been automated."
"When research data are transferred to Aila and disseminated for reuse via the Internet, each phase includes an information system that has been developed at FSD for our own needs."
The operational database enables FSD to operate according to the FAIR principles, under which data should be Findable, Accessible, Interoperable and Reusable.
Since late 2017, the long-term preservation of Finnish research data has been the responsibility of the National Digital Library (NDL) project carried out by the Ministry of Education and Culture. The long-term preservation service (KDK-PAS) developed in this project enables libraries, archives and museums to deposit digital data to be preserved for the long term, that is, for hundreds of years into the future.
The data in FSD’s holdings are transferred to the service semi-automatically: a human is still needed to choose the data to be transferred. Heinonen promises that this step, too, will be automated: "Before long, transferring data for long-term preservation will be completely automated."
Communicating with machines and people
Matti Heinonen describes FSD’s technical environment, which he manages, as people-intensive, because machines account for only a small fraction of the costs. The expertise of the staff carries the operations and also makes development work possible, which is why FSD needs a diverse range of professional competence. More than a third of FSD’s personnel work in the Archive’s Technical Services module.
"Some of them communicate with machines, others with people. For instance, those working with long-term preservation communicate with machines, whereas those developing Aila talk with people," Heinonen says, generalising. He emphasises that collaboration between different personnel groups is important: maintaining operations and services requires dialogue and teamwork.
Systems Specialists make sure that everyday IT functions run smoothly. When software developed elsewhere is used, they also adapt it into an FSD version that integrates with the rest of the system.
Useful software is shared
FSD’s software developers create new applications for the Archive’s own needs.
"Developing our own software is essentially our only option, because our needs are unique and suitable commercial software does not exist. Adopting new third-party systems would also be more expensive and less agile to work with."
FSD’s information systems are open source and are developed in a Linux environment. For a publicly funded organisation that provides its services free of charge, this is the only possible mode of operation.
"It is important to choose a licence that allows others to use the software we create on an open source basis. We only share software that, in our view, could be useful in other environments."
In Heinonen’s opinion, the best way to keep up with the rapid development of the field is continuous self-development and training for staff, which also includes attending international conferences and maintaining contact networks.
Being located right next to Tampere University is also a significant advantage for FSD. "I sometimes sit in the back row to listen to interesting lectures at the university," Heinonen reveals.
"Moreover, student projects are of great importance to us, because they provide us with fresh perspectives. Collaboration and sharing information openly are common practice in IT."
Since FSD began operating in 1999, it has worked in close cooperation with European data archives and has been actively involved in the development projects of CESSDA, the Consortium of European Social Science Data Archives. Heinonen describes the collaboration with other archives before CESSDA became a European Research Infrastructure Consortium (ERIC) as a type of "peer support".
"All archives used the same basic metadata format but there were no strict requirements for using it."
Since CESSDA became an ERIC, new requirements for metadata standardisation have been introduced, and collaboration has become more systematic. Common tools are being developed for European archives to ensure smooth international cooperation. For instance, the new CESSDA data catalogue enables broad international dissemination of research data.
"FSD has been able to prepare well for future collaboration, and we are now able to assist smaller archives that are just starting out," Heinonen says. He is optimistic about the future.
"Development seems to move toward a more networked type of cooperation, in which services are integrated and interoperable. Machines communicate with each other and basic functions will no longer require as much human input as before."
Text and photo: Eija Savolainen