Records Management and Archives Formation Plan of FSD: FSD Operational Guidelines
This part of the Archives Formation Plan first describes how current legislation is taken into account in data archiving work and what the Finnish Social Science Data Archive's (FSD) data acquisition and selection principles are. Next, it reviews FSD's work process as well as the documents and document series that are processed and produced at different stages of this process. Finally, it describes FSD's main information systems (Aila Data Service and Tiipii) and its data security and privacy measures and practices.
- 1. Compliance with legislation
- 2. Data acquisition and selection criteria
- 3. FSD's work process
- 4. Website
- 5. Information systems, data security and long-term preservation
- 6. Collection thus far and anticipated accumulation
- 7. Continuity plan
1. Compliance with legislation
FSD is organisationally part of Tampere University but has a national-level service function. The Universities Act (558/2009) guarantees the autonomy of Finnish universities. According to the Act, universities themselves, and not the administrative authorities of the State, have the right to make decisions in matters belonging to their internal administration. In the Government Proposal on the Universities Act, it was stated that appropriate preservation of research data shall be safeguarded (Government Proposal 7/2009).
The most important activity of FSD is to document, catalogue, maintain the usability of, and preserve digital research data collected for scientific research. FSD follows good data management practices as laid down in the Archives Act and the Act on the Openness of Government Activities.
To succeed in its activities, the Data Archive must carefully plan all stages of research data archiving. The Archives Formation Plan is the highest directive concerning the archiving work performed at FSD. The Plan is updated annually and published on the FSD website. The structure of the Plan is based on FSD's archival processes and workflows. Operating in accordance with the Copyright Act as well as the relevant data protection legislation is crucial for research data in the social sciences and humanities.
Copyright Act (404/1961)
Persons or organisations depositing their data for archiving grant FSD the right to archive the data and disseminate them for reuse under the terms and conditions agreed upon in the deposit agreement. The original creators of the data retain all other rights to their data, including ownership, copyright and associated intellectual property rights in all deposited material. In accordance with the terms of the deposit agreement, FSD has the right to process deposited data as required by established data protection and data security norms and long-term preservation practices.
The moral rights of the author to their data are honoured with normal scientific citation practices. By agreeing to the Terms and Conditions for Data Use, reusers of data commit themselves to citing the data used and specifying the author(s) of such data in any publications or presentations based on the data.
The Finnish Copyright Act does not recognise a so-called copyright exception for research, which means that copyrighted material collected by researchers for research purposes may not be archived and disseminated without a separate licence agreement or permission from the author. In 2015, FSD and the Finnish copyright society Kopiosto signed an agreement that allows FSD to archive and disseminate for reuse certain digital or digitised works analysed in research, namely works belonging to the fields of the rightsholder organisations represented by Kopiosto (for instance, magazine articles, photographs, illustrations and comics). Audiovisual works and musical compositions are not covered by the agreement.
When research data contains material created by research participants that is subject to copyright, the researcher should agree on the transfer of rights with the participants before depositing the data for archiving.
General Data Protection Regulation of the European Union (2016/679)
Prior to depositing, FSD requests depositors to remove personal information from data in accordance with guidance given by FSD. Exceptions include newspaper data and types of data where research publications derived from the data contain research participants' personal information (e.g. interviews with experts or artists, copyrighted material). The exception is grounded on Section 27 of the Data Protection Act (1050/2018), which, based on Article 85 of the GDPR, lays down provisions on the processing of personal data for journalistic purposes or the purposes of academic, artistic or literary expression.
Data that researchers have determined to be anonymous often still contain at least indirect identifiers to such an extent that FSD deems them to contain personal data. Therefore, before archiving, FSD requires depositors to accept the terms of an agreement on personal data processing for assessing the suitability of the data for archiving.
FSD acts as a data processor that the data controller has appointed and to which the data controller has provided the data. The aforementioned agreement ensures that the statutory requirements for the deposit are met and that both parties are able to demonstrate accountability in compliance with the General Data Protection Regulation. Accepting the terms of the agreement is always required for each dataset, even if the depositor deemed the data already anonymous or anonymised before the deposit. The Agreement is handled electronically in connection with email correspondence and does not require a separate signature. By transferring the data to FSD, the depositor consents to the terms and conditions laid out in the Agreement. For the Agreement to fulfil its legal requirements, FSD requests the depositor to send by email information on the data controller of the research data, the types of personal data, and the categories of data subjects in the data.
FSD carries out all technical and organisational measures needed to ensure the security of data processing. In accordance with Article 32 of the GDPR, FSD takes into account the risks that are presented by processing when assessing the appropriate level of security and determining the necessary security measures. FSD provides further information upon request regarding the technical and organisational security measures employed in personal data processing. The personnel at FSD whose work tasks involve processing research data adhere to applicable statutory obligations to secrecy and confidentiality. The data processing personnel also sign a non-disclosure agreement and receive appropriate training with regard to data security and data protection.
When FSD removes identifiers from data, it asks the data controller or their representative to approve the removals and edits. The data controller is also advised to erase any versions of the data that might still contain identifiers. When the data are deemed suitable for archiving, FSD and the data controller draw up a deposit agreement.
Confidentiality is also emphasised in the Terms and Conditions for the Use of Data, which data reusers must agree to before gaining access. Reusers commit themselves not to endanger the privacy of individuals or organisations connected to the data. Moreover, reusers must comply with good research ethics in privacy and data protection issues and erase the data as soon as the purpose of use has ended.
The annually published "data balance sheet" (in Finnish only) contains detailed descriptions of technical and organisational security measures in the processing of user data or research datasets.
2. Data acquisition and selection criteria
The data collection of FSD accumulates actively and selectively: FSD acquires datasets actively but accepts data for archiving selectively. The data to be archived at FSD must comply with certain qualitative, technical and legislative criteria.
2.1. Qualitative criteria
The dataset must fulfil at least one of the following criteria:
- it can be used for temporal or content-related comparative study;
- it can be used to complement other data;
- it has thus far been analysed only partly;
- it can be used in a manner that differs from its original use (for example, it enables new hypotheses or methodological foci);
- it can be used for studying or teaching research methods; or
- it is scientifically and/or culturally unique.
2.2. Technical criteria
Both of the following criteria must be fulfilled:
- the data are in a reasonable technical state, meaning that they can be processed/converted for reuse at a reasonable cost, and
- the information content of the data is sufficiently clearly organised and supplementary, contextualising materials are adequate to allow metadata creation and the processing of the data.
Recommended file formats are listed in a separate table.
2.3. Legislative criteria
Processing the data for research purposes must comply with current legislation:
- Ownership and copyright of the data are sufficiently clear.
- The data are anonymous or can be anonymised on assignment without significantly compromising their usability.
- If the data are anonymous or meant to be anonymised and have been collected after 25 May 2018, the privacy notice for the study must have included information regarding archiving at FSD after the research has ended. If the data have been collected before that date without informing participants about archiving, FSD assesses suitability for archiving on a case-by-case basis, and if FSD decides to archive the data, it will provide a rationale for the decision.
- If the data contain personal information, archiving the data must be necessary and proportionate to the aim of public interest pursued and to the rights of the data subject in accordance with Section 4, paragraph 4 of the Data Protection Act (1050/2018), and research participants must have been informed in the privacy notice of the study about archiving at FSD (with personal data included). The researcher must provide a rationale for the necessity and proportionality of archiving. The principle of data minimisation must always be followed (GDPR 2016/679 Article 89(1)).
- Research subjects have given their explicit consent for the archiving of interview data or self-administered writings that are archived on the basis of Section 27 of the Data Protection Act.
- Data containing material subject to copyright fall under the scope of the agreement between FSD and Finnish copyright society Kopiosto, or an agreement on the transfer of rights has been made with the authors of the material (research participants).
- Research data received for a definite duration from authorities (e.g. register data) can be processed if the access permission stipulates that the data be deposited at FSD without identifiers. For instance, access permissions for research based on the Act on the Openness of Government Activities (621/1999) do not usually include a right to transfer the data to FSD. If researchers want to archive the anonymous data, they need to obtain a separate permission for archiving from the organisation that provided the data.
2.4. Other criteria
- FSD does not archive digital data used in research that are available at the National Archives of Finland. However, FSD may archive materials that are archived as hard copies at the National Archives but have been digitised by the researcher for research purposes and there are sufficient metadata for citation.
- FSD can archive newspaper and magazine material as well as photographs, cartoons and illustrations in books that have been collected by researchers for their study but have been created by someone else. In accordance with an agreement between FSD and the Finnish copyright society Kopiosto, FSD can archive and disseminate for reuse in research such digital or digitised works belonging to the fields of rightsholder organisations represented by Kopiosto. The agreement does not apply to audiovisual material or compositions.
- Hard copies (paper materials), such as newspaper or magazine articles, other texts, or physical photographs, are converted into digital format as part of the dissemination information package (DIP) of the dataset if these materials have been used as a research instrument (e.g. as stimulus material for a discussion, interview or survey).
- FSD does not archive audiovisual recordings. Such files are archived by the Language Bank of Finland, which specialises in long-term preservation and reuse management of audiovisual material. The Language Bank operates under the University of Helsinki.
3. FSD's work process
This chapter gives an overview of the three key work processes at FSD: data ingest, processing, and dissemination for reuse. Documents and/or document series pertaining to each process are listed at the end of each overview. All documents and related document management measures are described in more detail in Appendix 1 (tables with archiving instructions).
3.1. Data ingest
Data ingest is a process whereby a researcher, research group or research unit deposits research data for archiving at the Finnish Social Science Data Archive. Before the Submission Information Package (SIP) is deposited, FSD requires depositors to accept the terms of an agreement on personal data processing for assessing the suitability of the data for archiving. The agreement enables transferring the data to FSD for assessment in compliance with the GDPR even if the data still contain personal information (e.g. indirect identifiers). The agreement is always required, even if the researchers deemed their data already anonymous.
Once the depositor has accepted the terms of the agreement, data files are transferred to FSD's server through Aila Data Service. The deposit process begins when the depositor receives an activation link via email from FSD User Services. After the depositor logs in to Aila, the activation link directs them to an electronic form used to transfer data files and fill in information about the deposit (metadata). The activation link can be used only once.
The connection to Aila is secured with HTTPS. The depositor is authenticated using the Haka identity federation sign-on for Finnish universities or, in the exceptional case that the depositor cannot authenticate through Haka, using FSD's own identity server. A deposit can be edited by the depositor as well as by those FSD employees whose work tasks include a role in data ingest or maintenance of the deposit service. FSD erases information related to the deposit from Aila once the deposit is finished; information regarding the deposit is saved in FSD's internal database Tiipii.
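A single-use activation link usually follows a common pattern: a random token is issued, only a hash of it is stored, and the stored entry is deleted the first time the token is redeemed. The Python sketch below illustrates that pattern under stated assumptions; the class name ActivationLinks, the 14-day lifetime and the in-memory store are hypothetical and do not describe how Aila is actually implemented.

```python
import hashlib
import secrets
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Tuple


class ActivationLinks:
    """Hypothetical in-memory store of single-use deposit activation tokens.

    Only a SHA-256 hash of each token is kept, so the store itself does not
    contain usable links.
    """

    def __init__(self, lifetime: timedelta = timedelta(days=14)) -> None:
        self.lifetime = lifetime  # assumed validity period, not an FSD value
        self._pending: Dict[str, Tuple[str, datetime]] = {}

    @staticmethod
    def _digest(token: str) -> str:
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def issue(self, depositor: str) -> str:
        """Create a random token to embed in the emailed activation link."""
        token = secrets.token_urlsafe(32)
        expires = datetime.now(timezone.utc) + self.lifetime
        self._pending[self._digest(token)] = (depositor, expires)
        return token

    def redeem(self, token: str) -> Optional[str]:
        """Return the depositor the token was issued to, or None.

        The entry is removed on the first attempt, so a link can never be
        used twice, whether or not the first attempt succeeded.
        """
        entry = self._pending.pop(self._digest(token), None)
        if entry is None:
            return None
        depositor, expires = entry
        if datetime.now(timezone.utc) > expires:
            return None
        return depositor
```

In this sketch, authentication (through Haka or FSD's own identity server) would take place before redeem() is called; the token only shows that the person logging in received the activation email.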
The original data in electronic format are usually submitted to FSD in the file format of some statistical software package, or as text or image files in the case of qualitative data (see list of file formats). During the deposit, the depositor complements the data with background materials and metadata information in the deposit form on Aila. The supplementary materials sent through Aila are stored until FSD has processed the data and produced the necessary metadata for the dataset.
The Deposit Agreement is signed either after the assessment of the suitability of the data for archiving or, at the latest, when the data have been processed into the Dissemination Information Package (DIP) suitable for reuse. The original data are destroyed once the DIP has been produced and the depositor has approved the descriptive metadata as well as the processed data.
FSD assigns each dataset a persistent identifier (PID). FSD's PIDs are URNs (Uniform Resource Names). PIDs ensure that datasets remain findable and accessible even if they are moved to another location, for instance.
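A URN has the general form urn:<NID>:<NSS>, where the namespace identifier (NID) names a registered namespace and the namespace-specific string (NSS) identifies the resource within it. A resolution service, rather than the identifier itself, points to the resource's current location, which is what allows the identifier to stay valid after a move. The Python sketch below checks this general shape using a simplified version of the RFC 8141 syntax. The function name parse_urn and the example identifier are hypothetical; the example is not an actual FSD dataset PID.

```python
import re
from typing import Tuple

# Simplified RFC 8141 syntax: "urn:" NID ":" NSS. The NID is 2 to 32
# letters, digits or hyphens and may not start or end with a hyphen.
# The NSS check is deliberately loose (any non-whitespace characters).
URN_PATTERN = re.compile(
    r"^urn:(?P<nid>[A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]):(?P<nss>\S+)$",
    re.IGNORECASE,
)


def parse_urn(value: str) -> Tuple[str, str]:
    """Split a URN into (namespace identifier, namespace-specific string)."""
    match = URN_PATTERN.match(value)
    if match is None:
        raise ValueError(f"not a URN: {value!r}")
    # NIDs are case-insensitive, so they are normalised to lower case.
    return match.group("nid").lower(), match.group("nss")
```

For example, parse_urn("urn:nbn:fi:example-0001") returns ("nbn", "fi:example-0001"), while an ordinary web address raises ValueError.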
Data ingest includes the following documents/document series:
- Deposit Agreement (L series: Appendix 1, Table 1)
- Dataset description form (KL series: Appendix 1, Table 2)
- Original data (ORIG/DA-series: Appendix 1, Table 3)
- Other material describing and contextualising how the data were produced (ORIG/OT-series: Appendix 1, Table 4)
3.2. Data processing
FSD processes the Submission Information Package (SIP), turning it into an Archival Information Package (AIP) for long-term storage. The AIP is used for producing a Dissemination Information Package (DIP) that makes the data suitable for reuse. In the majority of cases, the AIP is the same as the DIP. The AIP and DIP include data, metadata and any other material related to the data. If the depositor wishes to retain exclusive primary use of parts of the data (e.g. certain variables still needed for carrying out the original research), FSD produces a separate AIP that includes the variables embargoed for a definite period of time.
The aim of the processing is (1) to process the data so that they remain accessible in the long term, both in terms of technical format and content, and (2) to protect the research subjects' privacy. This is achieved, for example, by choosing appropriate technical formats, creating detailed metadata and anonymising the data. The aims of data processing are the same for different data types, but different types of data are processed in different ways. The key features of qualitative and quantitative data processing are described below.
Data processing produces the following document series:
- Archival information packages (AR series: Appendix 1, Table 5)
- Dissemination information packages (DA series: Appendix 1, Table 6)
- Metadata (ME series: Appendix 1, Table 7)
- Digital material describing/contextualising the production of the data (OT series: Appendix 1, Table 8)
- Data processing files (SY series: Appendix 1, Table 9)
3.2.1. Quantitative data
The original data deposited at FSD can be in many different formats (e.g. data in SPSS, Excel or ASCII formats and supplementary materials in Word, Excel or text files or as hard copies). The objective is to produce a well-documented data file whose contents and structure correspond as closely as possible to the collection instrument (e.g. questionnaire). This is why the DIP does not normally include variables constructed by the researchers from other variables in the data.
The Archive uses SPSS statistical software for reviewing and processing data and for adding variable-level metadata. Data processors produce detailed documentation on how the AIP and DIP were produced. Variable information, any amendments made to the data and other observations are noted in the SPSS syntax file. The international DDI2 documentation standard is used for describing and storing metadata relating to the content and methodology of the data. Archival and packaging information is stored in the Data Archive's internal database Tiipii (see chapter 5).
FSD requests that depositors anonymise their quantitative data before submission to the Archive (more information on identifiers and anonymisation in the Data Management Guidelines). FSD reviews the anonymisation and makes additional changes if necessary. The necessary anonymisation measures taken by FSD are itemised and sent to the depositor for approval.
Anonymisation is planned for each dataset on a case-by-case basis. The first step in anonymisation is to chart the features of the data (population and sampling; content of the data; dataset age; information on the respondents available in other sources; usability). Identifiers can be removed in the following ways:
- Removing variables, values and units of observation
- Recoding variable values
- Editing/redacting parts of responses in open-ended variables
- K-anonymity and l-diversity
- Adding noise
- Anonymising pseudonymised data: In addition to the strategies listed above, units of observation in pseudonymised data are always randomly assigned new id numbers and the data are rearranged according to these new id numbers. Original id numbers are erased. After the data have been anonymised in this way, it is no longer possible to add new information about the research participants to the data.
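Two of the techniques listed above can be stated precisely. A dataset is k-anonymous with respect to a chosen set of quasi-identifiers (indirect identifiers such as age group and region) if every combination of their values is shared by at least k units of observation; it additionally satisfies (distinct) l-diversity if each such group contains at least l different values of a sensitive variable. The Python sketch below computes both measures and also shows the renumbering step for pseudonymised data. The function names, column names and example values are hypothetical, and deciding which variables count as quasi-identifiers remains a case-by-case judgement that the code does not make.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List


def k_anonymity(rows: List[dict], quasi_identifiers: Iterable[str]) -> int:
    """Size of the smallest group of rows sharing all quasi-identifier values."""
    qi = list(quasi_identifiers)
    groups: Dict[tuple, int] = defaultdict(int)
    for row in rows:
        groups[tuple(row[c] for c in qi)] += 1
    return min(groups.values())


def l_diversity(rows: List[dict], quasi_identifiers: Iterable[str],
                sensitive: str) -> int:
    """Smallest number of distinct sensitive values within any such group."""
    qi = list(quasi_identifiers)
    groups: Dict[tuple, set] = defaultdict(set)
    for row in rows:
        groups[tuple(row[c] for c in qi)].add(row[sensitive])
    return min(len(values) for values in groups.values())


def reassign_ids(rows: List[dict], id_column: str = "id") -> List[dict]:
    """Give every unit a new random id, sort by it and drop the original id.

    SystemRandom draws from the operating system's entropy source and cannot
    be seeded, so the old-to-new mapping cannot be regenerated afterwards.
    """
    new_ids = list(range(1, len(rows) + 1))
    random.SystemRandom().shuffle(new_ids)
    renumbered = []
    for new_id, row in zip(new_ids, rows):
        clean = {k: v for k, v in row.items() if k != id_column}
        clean[id_column] = new_id  # the original id is not copied over
        renumbered.append(clean)
    return sorted(renumbered, key=lambda r: r[id_column])
```

For instance, if four respondents fall into two (age group, region) combinations with two respondents each, k_anonymity returns 2; if the two respondents in one of those groups report the same value of the sensitive variable, l_diversity returns 1 for the dataset, signalling that recoding or further removals may be needed.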
3.2.2. Qualitative data
The Archive accepts qualitative, or non-numerical, research data for archiving in many formats. Archived qualitative data are mainly textual data, originating from interviews or different types of interactions, or self-administered writings (e.g. biographies, diaries or thematic texts).
Digital images are archived only if the researcher has obtained permission from the authors to archive the images (so-called transfer of rights). Audiovisual material is archived only in exceptional cases. For example, expert interviews of people well known in their field can be archived if the individuals featured in the material have given explicit consent for the archiving and reuse and if Section 27 of the Data Protection Act is applicable to the material. Most interview data do not fall under this provision.
When processed, digital text files of qualitative data are converted to TXT or RTF format. Image files are stored in JPEG, PNG, TIFF or DNG format and audio files in FLAC or MP3 format. Hard copies are converted into PDF, RTF or TIFF format, whichever is considered best depending on the case. Consistency of the dataset's internal metadata (such as file names and descriptive background data) is also verified. Measures and actions taken to produce the Dissemination Information Package (DIP) of the data are noted in detail in a dataset-specific text file. The international DDI2 documentation standard is used for describing and storing metadata relating to the content and methodology of the data. Textual data DIPs contain an HTML index facilitating data reuse. Archival and packaging information is stored in FSD's internal database Tiipii (see chapter 5).
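The conversion targets listed above can be kept as a simple lookup table against which incoming files are checked before the DIP is assembled. The Python sketch below does this by file extension only; the material-type labels, the extension spellings and the function names are assumptions of the sketch, and a real check would also identify formats from file content rather than trusting extensions.

```python
from pathlib import Path
from typing import Dict, List, Set, Tuple

# Target formats per material type as listed in the text; the type labels
# and extension spellings are assumptions of this sketch.
TARGET_FORMATS: Dict[str, Set[str]] = {
    "text": {".txt", ".rtf"},
    "image": {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".dng"},
    "audio": {".flac", ".mp3"},
    "scanned": {".pdf", ".rtf", ".tif", ".tiff"},  # digitised hard copies
}


def needs_conversion(path: str, material_type: str) -> bool:
    """True if the file is not yet in a target format for its material type."""
    suffix = Path(path).suffix.lower()
    return suffix not in TARGET_FORMATS[material_type]


def conversion_report(files: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """List the (file, material type) pairs that still have to be converted."""
    return [(name, kind) for name, kind in files if needs_conversion(name, kind)]
```

A report like this could be copied into the dataset-specific text file that records the measures taken to produce the DIP.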
Removal of identifiers
FSD requests that researchers remove identifiers from their qualitative data before submitting them to the Archive (more information on identifiers and anonymisation in the Data Management Guidelines). FSD reviews the anonymisation and makes additional changes if necessary. The necessary anonymisation measures taken by FSD are itemised and sent to the depositor for approval.
A plan on the removal of identifiers is made for each dataset on a case-by-case basis. Identifiers are removed from all personal information, concerning either research participants or third parties. Identifiers are removed in the following ways:
- Additional data files containing direct identifiers (such as personal names, addresses, telephone numbers, email addresses or social security numbers) are deleted.
- Person names (both of research participants and of third parties mentioned by them) are replaced with aliases (Elisabeth -> [Ann]), or the names are removed (Elisabeth -> [wife]). Original names are erased.
- Specific locations (schools, workplaces etc.) are categorised (Hennes & Mauritz -> [clothing store]).
- Background information of participants (e.g. age, municipality of residence, education, occupation, household composition, nationality or ethnicity) is categorised.
- Parts of data that contain significant numbers of identifiers are removed.
- Exceptions: 1) Data archived in accordance with the Copyright Act (404/1961) and Section 27 of the Data Protection Act are only minimised. 2) Datasets that are archived exclusively on the basis of Section 27 of the Data Protection Act and to which the agreement between FSD and Kopiosto applies (e.g. magazine articles) are neither anonymised nor minimised.
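Replacing names and places from an agreed replacement list can be partly automated. The Python sketch below applies such a list to a transcript, writing each replacement in square brackets as in the examples above; the function name and the example list are hypothetical. Matching is exact, so nicknames, misspellings and inflected forms (for instance Finnish case endings such as "Elisabethin") are not caught, and the output still needs manual review. The replacement list itself links aliases to real names, so it has to be erased along with the original names.

```python
import re
from typing import Dict


def replace_identifiers(text: str, replacements: Dict[str, str]) -> str:
    """Replace each listed identifier with its alias or category in brackets.

    Keys are tried longest first, so a multi-word name is replaced before any
    shorter key it contains. The lookarounds stop a key from matching inside
    a longer word (e.g. "Elisabeth" inside "Elisabethin").
    """
    if not replacements:
        return text
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(k) for k in keys) + r")(?!\w)"
    )
    return pattern.sub(lambda m: "[" + replacements[m.group(1)] + "]", text)
```

With the list {"Elisabeth": "Ann", "Hennes & Mauritz": "clothing store"}, the sentence "Elisabeth said she bought it at Hennes & Mauritz." becomes "[Ann] said she bought it at [clothing store]."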
If anonymisation would significantly reduce the usability of the data, and the data are of considerable scientific value with personal information included, FSD recommends that the researcher(s) negotiate with the Data Protection Officer of their organisation about the possibility to archive the data with personal information included on the grounds of the provisions in the Data Protection Act concerning archiving.
3.3. Dissemination of data for reuse
The datasets archived at FSD are disseminated for reuse in accordance with the access conditions set in the data deposit agreement. A small proportion of the datasets is openly accessible to all users.
Most datasets are available only for registered users. Students and personnel at Finnish universities and polytechnics register themselves using the authentication system provided by the Haka identity federation. Other users (e.g. students and personnel at universities abroad or state research institutes) are required to complete a short registration form to get a user account. Once FSD has checked the contact details and the purpose for requesting data access supplied in the form, the applicant will be sent a confirmation message and an Aila username to the email address she or he has provided.
Data are available for users in accordance with the access conditions set out for the dataset, which can be divided into four distinct categories of availability:
- available for all users
- available for research, teaching and study (requires registration)
- available for research only (requires registration)
- available only by permission from the depositor (requires registration)
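The four access categories amount to a small decision rule. The Python sketch below expresses it; the enum and function names and the purpose labels ("research", "teaching", "study") are hypothetical, and the sketch models only the rules listed above, not every condition in Aila or in the Terms and Conditions of Use.

```python
from enum import Enum


class Access(Enum):
    """The four availability categories; member names are hypothetical."""
    OPEN = "available for all users"
    RESEARCH_TEACHING_STUDY = "research, teaching and study"
    RESEARCH_ONLY = "research only"
    DEPOSITOR_PERMISSION = "only by permission from the depositor"


def may_download(access: Access, registered: bool, purpose: str,
                 depositor_approved: bool = False) -> bool:
    """Decide whether a download request fits the dataset's access category."""
    if access is Access.OPEN:
        return True
    if not registered:  # every other category requires registration
        return False
    if access is Access.RESEARCH_TEACHING_STUDY:
        return purpose in {"research", "teaching", "study"}
    if access is Access.RESEARCH_ONLY:
        return purpose == "research"
    # DEPOSITOR_PERMISSION: registration alone is not enough.
    return depositor_approved
```

Keeping the categories in an enum means a dataset can only ever carry one of the four defined conditions.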
The following documents and files are related to the delivery of data to users:
- Data download information, from 2014 onwards (AL series, Appendix 1, Table 12)
- Access applications (hard copies), until 2014 (P1 series, Appendix 1, Table 10)
- Terms and Conditions of Use agreements (hard copies), until 2014 (P2 series, Appendix 1, Table 11)
One of the key tasks of FSD is to disseminate information about archived research data. Detailed descriptions of archived datasets are openly available on Aila Data Service. The catalogue is constantly updated.
4. Website
On its website, the Archive provides guidelines for researchers on data management. The guidelines cover information given to research participants, anonymisation, file formats, metadata and data security. Moreover, FSD provides learning materials on research methods in Finnish. All issues of the Archive's newsletter FSD Bulletin are also published on the website.
The website draws its information from FSD's founding documents, archived data files, data archiving literature and press releases, and the research literature. A permanently archived copy of the website is extracted annually and whenever major changes are made to it. Modifications to the website are documented using a version control system. The website is located at https://www.fsd.tuni.fi/.
5. Information systems, data security and long-term preservation
The Archive uses two central information systems: Aila Data Service and the operational database Tiipii. Aila Data Service includes an online data catalogue and data portal, a user registration and sign-on system, and a client register of data users: Information system description (Aila). Tiipii, the operational database, is an internal documentation system for all archiving work: Information system description (Tiipii).
Moreover, FSD has other internal information systems accessible only by in-house staff. Access to the systems is restricted by, for instance, firewalls or requiring login.
In addition, FSD utilises a national digital preservation service owned by the Ministry of Education and Culture and provided by CSC - IT Center for Science Ltd.
Digital data are stored on the servers of Tampere University, which are located in a data centre of the University. The server premises are accessible only to employees whose role necessitates it. Electronic access to data is restricted to server administrators and FSD staff whose role requires access.
Some of FSD's data can also be accessed outside the University premises using the official remote access service provided by Tampere University. Other modes of remote access to the data are neither provided nor supported. In remote work, FSD personnel adhere to separate data security guidelines.
To protect against physical damage to storage media (e.g. through hard disk failure or fire), the data on the Archive's servers are backed up in accordance with the backup policy of Tampere University ICT Services. In addition, certain FSD servers are backed up using the Archive's own backup system. In both cases, the backup copies are stored in the data centre of Tampere University ICT Services. The data centre is locked, fire-safe and under access control. Tampere University ICT Services provide the storage service to organisations within the Tampere university community (Tampere University and Tampere University of Applied Sciences).
Discarded storage media (e.g. hard disk drives) are sent for disposal in accordance with Tampere University regulations relating to the destruction of storage media containing personal data. The storage media of the University ICT Services are also disposed of in accordance with the same regulations. A partner of the University carries out the disposal in accordance with agreed security standards and provides a report of the handling of each medium.
The hard disk drives of computers used by FSD staff are encrypted.
Tampere University requires data systems administrators to sign a non-disclosure agreement (an undertaking of confidentiality). The same procedure applies both to FSD's technical service staff and the administrative and maintenance staff of Tampere University ICT Services.
The data archived by FSD are also transferred to a national digital preservation service. For preservation, the data are packaged into digital archival information packages, file formats are harmonised, and the package is complemented with technical metadata and provenance data as required by the service. The transfer is encrypted. The digital preservation service is a secure cluster of resources intended specifically for the long-term preservation of cultural heritage and research data. The service is maintained by CSC, whose data centres and ICT services are all ISO/IEC 27001 certified. The data can only be transferred back to FSD. The website digitalpreservation.fi includes the digital preservation service's privacy notice (in Finnish) and specifications for packaging, file formats and APIs.
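Preservation packages commonly record a checksum for each file so that later copies can be checked for unintended changes (fixity). The Python sketch below computes and re-verifies SHA-256 checksums for a package directory. It illustrates only this one step; the function names are hypothetical, and the actual package structure, metadata and checksum requirements of the national digital preservation service are defined in the specifications on digitalpreservation.fi.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir: str) -> Dict[str, str]:
    """Map each file's path (relative to the package root) to its SHA-256."""
    root = Path(package_dir)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(package_dir: str, manifest: Dict[str, str]) -> List[str]:
    """Return the files that are missing or whose checksum no longer matches."""
    current = build_manifest(package_dir)
    return [name for name, checksum in manifest.items()
            if current.get(name) != checksum]
```

Because the manifest is computed before transfer, the same comparison can be repeated on a copy returned from the service to confirm that nothing has changed.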
6. Collection thus far and anticipated accumulation
Hard copies of deposit agreement documents: the collection from 1999 until July 2019 encompasses a total of 5 archiving folders (0.4 shelf metres). Anticipated future accumulation: 1 archiving folder (0.08 shelf metres) in four years.
Permanently stored hard copies of contextualising research material: the collection from 1999 until July 2019 encompasses a total of 41 archiving folders (3.28 shelf metres). Anticipated future accumulation: 1 archiving folder (0.16 shelf metres) in four years. Since 2010, the majority of the data have been in digital format.
Digital research data and other connected digital material: the collection from 1999 until July 2019 totals 16.9 GB containing 38,261 files. Anticipated future accumulation: approximately 0.8 GB per annum.
Hard copies of FSD's administrative documents are archived at the long-term archive of Tampere University in accordance with the information management plan of the University. A minor share of the administrative documents is preserved at FSD. These documents encompassed 1 archiving folder in July 2019.
Hard copies of access applications and Terms and Conditions of Use documents: the collection from 1999 until April 2014 encompasses a total of 20 archiving folders (1.6 shelf metres). Anticipated future accumulation: none. The document series ended with the launch of the online Aila Data Service on 23 April 2014.
7. Continuity plan
The base funding of FSD is sufficient to maintain the core activities (data archiving and dissemination, information service). In the unlikely event that the Archive's funding and continuity of operations are at risk, the Director of FSD appoints a task group to map out the functions required for the controlled transfer of the data to another institution. The task group is to take administrative and technical aspects into account when planning such a transfer. Representatives of funders, members of the FSD Advisory Board and experts in data archiving shall be included in the task group, as well as representatives of other necessary stakeholder groups.