Data Management Planning
A data management plan is an integral part of the research plan. The data plan can be reviewed and expanded during research, but the main principles and procedures should be determined before the research starts, or at the latest before data collection begins.
The aim of data management planning is to ensure that good scientific practice is followed in the research, that data are kept safe and secure at all stages of research, and that data sharing is possible after the original research has been completed.
At the planning stage of the research, researchers should find out whether the research requires ethical review. If data are collected outside Finland, it is best to find out beforehand what practices are followed regarding ethical review in the country of collection. University websites often have instructions on the matter (you can search with phrases such as 'ethics review board', 'ethics review committee', 'human participants').
Finnish Advisory Board on Research Integrity: Ethical review in human sciences
The guidelines provided on this web page refer to the situation in Finland, and may not be applicable in other countries due to differences in legislation and research infrastructure.
Data management plan
A data management plan describes how research data are collected or created, how data are used and stored during research, and how they are made accessible to others after the research has been completed. You can attach the following kind of concise data management plan to your research plan:
If a concise data management plan has been used for a funding application, you will need to expand and specify the plan once the research has started. If circumstances change, the plan needs to be updated. A good data management plan answers all the questions specified below.
1. The data
- What kind of data are collected/generated?
- In what way are data collected/generated?
What kind of data are collected is mainly determined by the research questions. Research data are typically questionnaire surveys, interviews, focus group discussions, written material, visit or meeting recordings, official documents, archival material, websites, or register or media data.
Data collection methods are determined by the type of data sought. Quantitative data can be collected through interviews, postal or online questionnaires, by using existing source material, or by measuring. Qualitative data are often collected by recording individual interviews, group interviews, sessions or meetings as audio or video files. Written material collection is usually initiated by publishing writing requests or invitations, and then collecting the writings via email, post or a specifically created website. Official documents can nowadays often be obtained from the Internet, but some are available only by request or by obtaining permission to use them for research purposes. Access to register data generally requires applying for permission. More information on how to apply for access to register data is available on the Finnish Information Centre for Register Research website.
Finnish Information Centre for Register Research
Researchers and research teams can collect the data themselves or can contract a data collection company to do it. If data collection is contracted out, it is best to send the call for tender to several companies. Data management plans are useful for drawing up tender calls.
2. Rights
- Who owns the copyright, Intellectual Property Rights and management rights to data?
- Who has the right to grant access to data?
- What procedures are used to inform research participants?
Copyright issues may be relevant for research data even though most empirical research data are outside the scope of the Finnish Copyright Act. If there are copyright issues involved, the owner of the copyright determines how the data can be used. Data use requires permission from the copyright owner. Regardless of whether data are protected by copyright or not, it is important to clarify the roles of persons involved in the research, since reusers of the data will cite the creators of the data when using it.
Research teams should always make an agreement on data ownership and usage rights. Usage rights should be determined both for the research project and for usage after the project has been completed. Before making any agreements, the requirements of the research funder(s) should be investigated in order to make the agreement follow their guidelines.
If an external contractor is used for data collection, the research team should determine, at the latest when the contract is being made, who is the owner of the data, who has data management responsibility and in what ways research participants are informed of future uses of the data. If the data will be archived for data sharing at the Finnish Social Science Data Archive (FSD), the agreement may state that the external contractor delivers copies of the data and associated metadata straight to the Archive.
Data archives specify access rights to archived data. Official documents are generally freely accessible to researchers. If data created from research have access restrictions, the depositor of the data will determine the access conditions for them. Openly available web material is openly available for research as well, but archiving such material for reuse may not always be possible due to restrictions set by the Finnish Copyright or Personal Data Acts.
When information is collected directly from research participants, reuse possibilities of the data are determined by the information given to participants on the future uses of the data.
3. Confidentiality and data security
- How is confidentiality of data ensured?
- What kind of rights do different user groups have to access and process data files?
- How is data security ensured?
- How are back-ups of data files handled?
Confidentiality in the research environment basically means planned and careful processing of personal data. Personal data should only be collected and processed to the degree necessary for the research, and unauthorised access to the data must be prevented. In line with the transparency principle of data protection, it is particularly important to ensure that research subjects are informed of the personal data processing as required by the EU’s General Data Protection Regulation (GDPR).
Data security means keeping personal information collected, as well as computer systems, data files and transfers of data safe. It is easy to copy and disseminate digital research data files, or unintentionally destroy or change them. Making back-ups of data files and preventing unauthorised access to them are thus integral parts of data security.
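One way to notice unintentional changes in back-up copies is to compare checksums of the original files and their copies. The following is a minimal sketch in Python, not a prescribed procedure; the function names and the folder layout are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to the folder) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(original: Path, backup: Path) -> list[str]:
    """List files that are missing from the back-up or differ from the original."""
    source = build_manifest(original)
    copy = build_manifest(backup)
    return sorted(
        name for name, checksum in source.items() if copy.get(name) != checksum
    )
```

Calling `changed_files(Path("data"), Path("backup/data"))` (hypothetical folder names) returns an empty list when every original file has an identical copy in the back-up folder.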
It is recommended that data files requiring a large storage capacity are stored locally at the institutional data storage facility or in the IDA Storage Service provided by the Ministry of Education and Culture. IDA is a useful and safe solution also for collaborative research projects where the same data are analysed in more than one university. The service is aimed at Finnish universities and at the projects or research infrastructures funded by the Research Council of Finland.
- Official information on informing data subjects about processing
- More on Informing Research Participants
- More on Data Security
- Ministry of Education and Culture: IDA Storage Service
4. File formats and programs
- What software programs are used to store and process data?
- What file formats and storage media are used?
A variety of statistical software packages are available for processing quantitative data. Other software exists for processing and analysing qualitative data, although many researchers still use word processors for analysing textual data. The software chosen determines the file formats used. Because software systems keep developing and changing, it is best to store at least one copy of the data in a software-independent format or in a standard format that many programs can interpret.
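As an illustration of a software-independent copy, the sketch below writes a small dataset to a UTF-8 encoded CSV file using only the Python standard library. The records, variable names and function names are hypothetical; CSV is used here as one example of a widely readable format, not as the only suitable choice.

```python
import csv
from pathlib import Path

# Hypothetical example records; real data would come from the analysis software.
records = [
    {"respondent_id": 1, "age": 34, "municipality": "Tampere"},
    {"respondent_id": 2, "age": 51, "municipality": "Jyväskylä"},
]


def export_csv(rows: list[dict], target: Path) -> None:
    """Write rows to a UTF-8 CSV file with a header row."""
    fieldnames = list(rows[0].keys())
    with target.open("w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def read_csv(source: Path) -> list[dict]:
    """Read the file back; all values are returned as strings."""
    with source.open("r", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))
```

Because CSV stores everything as text, variable types and value labels are not preserved, so the exported file needs to be accompanied by separate documentation.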
Data files can be stored and transferred with optical media (e.g. CD, DVD, Blu-ray) and with non-volatile memory (e.g. memory cards or USB sticks). The safest way to preserve data files is to store them on duplicate copies of magnetic media (e.g. hard drives or tapes).
More information on physical storage
5. Documentation on data processing and content
- How is the (technical) quality of data ensured?
- How are data processing methods documented?
- Where are metadata describing data collection methods and data content stored?
Technical and content decisions made at the data entry stage influence the quality of data. Decisions to be made include, for example, whether to enter information into a matrix, or which technical solution to choose for audio or video recording. Solutions chosen for post-collection processing also have an impact on data quality. In the case of quantitative data, these include the naming and organisation of data files, the naming of variables, and the documentation of the codes and reasons for missing values. In the case of qualitative data, they include the transcription level chosen.
Sufficient data documentation at different stages of data collection and processing is a crucial factor for quality. Data documentation is also important for long-term preservation and usability. Good documentation, that is, carefully created metadata, enables informed reuse of data and a long data life cycle.
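Variable names, labels and missing-value codes can be documented in a simple machine-readable codebook stored alongside the data file. The sketch below shows one possible structure in Python; the variable names, labels and codes are invented for illustration and do not follow any particular metadata standard.

```python
import json

# Hypothetical variable names, labels and missing-value codes, for illustration only.
codebook = {
    "q1_trust": {
        "label": "Trust in other people (0-10)",
        "missing": {"98": "Don't know", "99": "Refused to answer"},
    },
    "q2_age": {
        "label": "Age in years",
        "missing": {"999": "Not asked"},
    },
}


def is_missing(variable: str, value: int) -> bool:
    """Return True if the value is a documented missing-value code."""
    return str(value) in codebook[variable]["missing"]


def codebook_as_json() -> str:
    """Serialise the codebook so it can be stored next to the data file."""
    return json.dumps(codebook, indent=2, ensure_ascii=False)
```

Recording the reason behind each missing-value code, rather than only the code itself, lets later users distinguish between, for example, a refusal and a question that was never asked.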
6. Life cycle
- What happens to data after research has been finalised?
The subsequent use value of research data depends largely on the data management measures carried out during the research. Effective data management before and during data collection and processing is an essential requirement for generating data that can be used afterwards for new research, learning, or teaching of methodology.
If data are to be stored by an individual researcher, university department, research unit or organisation, data owners must provide a solution to all aspects of subsequent data management: storage, archival and dissemination packages, terms of use, data delivery, and dissemination of metadata. If data are deposited with the Finnish Social Science Data Archive, the archive will take care of all these aspects and in addition ensure confidentiality and data security.
It is not worthwhile to preserve all research data permanently. When considering whether to archive a dataset or not, the uniqueness of the data, its usability, access conditions, reuse value in research and education, and archiving costs all need to be taken into consideration. Insufficient or poor data management during the research stage considerably increases the costs of archiving, as it is often time-consuming to process the data for reuse afterwards and to find the information needed for metadata. Still, destroying a dataset must always be a conscious decision and not the result of inadequate or careless data management.