Records Management and Archives Formation Plan of FSD
Appendix 6. Digital File Formats Used at the Finnish Social Science Data Archive for Different Types of Data
Released 5 October 2015, updated 21 March 2016
"Preservation (FSD)" refers to the digital file formats the Archive uses for long-term preservation.
"Delivery to users" refers to the file formats used when data are transmitted through the Aila Data Service, which allows data to be downloaded online.
The classification of file formats by data type should not be interpreted strictly. Recommended formats can be used for all data types.
Data Types and Digital File Formats
Type of data | Ingest formats accepted | Preservation (FSD) | Delivery to users | Notes |
---|---|---|---|---|
Data matrix | No restrictions. Recommended: SPSS Portable or other statistical software formats (e.g. SAS, Stata, Excel, CSV) | por (SPSS Portable), CSV, (ods) | por (SPSS Portable), (CSV, ods) | In exceptional circumstances, the Archive can deliver data matrices in other formats (e.g. sav, csv). OpenDocument format (.od*) may be used as a preservation format if the layout needs to be preserved. The Archive is aware of the restrictions imposed by SPSS and therefore keeps track of developments in statistical software and of the format choices made in data repositories in other countries. Increased archiving of health data may require re-evaluation of matrix formats. |
Textual data, e.g. interview transcripts or responses to open-ended questions | No restrictions. Recommended: plain text or widely used office formats (e.g. docx) | UTF-8 encoded TXT or CSV, xml, html/xhtml, odt | UTF-8 encoded TXT or CSV, xml, html/xhtml, odt | Field separators other than commas (e.g. tabs) may be used in CSV files. If the layout of the archived material, or images embedded in the document, must be preserved for the material to be understood, PDF/A or OpenDocument format is recommended. Increased archiving of humanities data may require re-evaluation of file formats. |
Internal data processing documentation | - | UTF-8 encoded TXT or CSV, PDF/A, odt | - | The program used in the processing determines the file extension (e.g. SPS for SPSS syntax, PY for Python source code). PDF/A or OpenDocument format (odt) may be used as a preservation format if the layout needs to be preserved. |
Image | No restrictions. Recommended: JPEG, PNG, TIFF | JPEG, PNG, TIFF, (DNG) | JPEG, PNG | In exceptional circumstances, DNG can be considered for long-term preservation. When the Archive digitises images, the adopted long-term preservation format is TIFF or DNG. The Archive takes into account the digitisation guidelines maintained by the National Archives. Animated GIF files can be converted either to video or to a series of PNG images. |
Audio | No restrictions. Recommended: FLAC, WAV | FLAC, (MP3) | FLAC, MP3 | The Archive keeps track of audio format recommendations and changes formats if needed. MP3 is accepted as a long-term preservation format only if the original material was in this format. |
Video | No restrictions. Recommended: MPEG-4 H.264 | MPEG-4 H.264 | MPEG-4 H.264 | The formats recommended or accepted for video may change and will be reviewed when the need arises. Compression level is decided on a case-by-case basis. The archival and dissemination information packages (AIP and DIP) of a dataset may differ in compression level. |
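
The "Preservation (FSD)" column above can be expressed as a small lookup structure, which may be useful when checking a file against the policy in a script. The following is a minimal sketch and not part of the Archive's tooling: the names `PRESERVATION_FORMATS`, `preservation_status` and `is_utf8` are hypothetical, and treating formats shown in parentheses as "conditional" is an assumption based on the notes column. The sketch compares format names only; it does not inspect file contents to confirm, for example, that a PDF is PDF/A or that a video uses H.264.

```python
# Preservation formats per data type, transcribed from the "Preservation (FSD)"
# column. Assumption: entries shown in parentheses in the table are treated as
# "conditional" (used only in the circumstances described in the notes).
PRESERVATION_FORMATS = {
    "data matrix": {"listed": {"POR", "CSV"}, "conditional": {"ODS"}},
    "textual data": {
        "listed": {"TXT", "CSV", "XML", "HTML", "XHTML", "ODT"},
        "conditional": set(),
    },
    "internal data processing documentation": {
        "listed": {"TXT", "CSV", "PDF/A", "ODT"},
        "conditional": set(),
    },
    "image": {"listed": {"JPEG", "PNG", "TIFF"}, "conditional": {"DNG"}},
    "audio": {"listed": {"FLAC"}, "conditional": {"MP3"}},
    "video": {"listed": {"MPEG-4 H.264"}, "conditional": set()},
}


def preservation_status(data_type: str, format_name: str) -> str:
    """Return 'listed', 'conditional' or 'not listed' for a format name.

    Raises KeyError if the data type does not appear in the table.
    """
    entry = PRESERVATION_FORMATS[data_type.strip().lower()]
    name = format_name.strip().upper()
    if name in entry["listed"]:
        return "listed"
    if name in entry["conditional"]:
        return "conditional"
    return "not listed"


def is_utf8(data: bytes) -> bool:
    """Check whether raw bytes decode cleanly as UTF-8, as required for TXT and CSV."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

For example, `preservation_status("audio", "mp3")` returns `"conditional"`, reflecting the note that MP3 is accepted for preservation only if the original material was in that format, while `is_utf8` returns `False` for text that was saved in Latin-1 and contains non-ASCII characters.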