Records Management and Archives Formation Plan of FSD, Appendix 6: Digital File Formats Used at the Finnish Social Science Data Archive for Different Types of Data
Initially released 5 October 2015, latest update 8 July 2025.
- Ingest formats accepted refers to the digital file formats the Archive accepts for processing.
- Preservation (FSD) refers to the digital file formats the Archive uses for long-term preservation.
- Delivery to users means the file formats used in data transmission to customers through Aila Data Service.
Classification of file formats according to data types should not be strictly interpreted. Recommended formats can be used for all data types.
The Archive recommends converting files to the recommended ingest formats before deposit. Data files submitted in formats other than the preservation or distribution formats are converted to accepted file formats during the archiving process, and the original file is not retained once archiving has been completed. Although ingest formats are not restricted, submitted files must be in a form that can be opened at FSD.
Data Types and Digital File Formats
| Type of data | Ingest formats accepted | Preservation (FSD) | Delivery to users | Notes | 
|---|---|---|---|---|
| Data matrix | No restrictions. Recommended: SPSS SAV or other statistical software formats (e.g. SAS, Stata, Excel), CSV or SPSS Portable POR | CSV, SPSS Portable, (ODS) | SPSS SAV, CSV, (SPSS Portable POR, ODS) | Recently archived data files are delivered in SAV and CSV formats. Older SPSS Portable files will be converted to SAV format in the future. OpenDocument format (.od*) may be used as a preservation format if the layout needs to be preserved. The Archive is aware of the restrictions imposed by SPSS, and therefore keeps track of developments in statistical software and of the format choices made in data repositories in other countries. |
| Textual data, e.g. interview transcripts or responses to open-ended questions | No restrictions. Recommended: plain text or widely used office formats (e.g. docx) | UTF-8 encoded TXT or CSV, xml, html/xhtml, odt | UTF-8 encoded TXT or CSV, xml, html/xhtml, odt | Field separators other than commas (e.g. tabs) may be used in CSV files. If the layout of the archived material or images embedded in the document need to be preserved for the material to be understood, using PDF/A or OpenDocument format is recommended. Increased archiving of humanities data may require re-evaluation of file formats. |
| Materials describing or contextualising the research data | No restrictions. Recommended: widely used office formats or the original format (e.g. image, audio, or a form) | UTF-8 encoded TXT or CSV, xml, html/xhtml, odt, PDF/A or od* | UTF-8 encoded TXT or CSV, xml, html/xhtml, odt, PDF/A or od* | If preserving the formatting or visual elements is necessary for comprehension, the recommended formats are PDF/A or OpenDocument. |
| Internal data processing documentation | - | UTF-8 encoded TXT or CSV, PDF/A, odt | - | The program used in the processing determines the file extension (e.g. SPS for syntax files, PY for Python source code). PDF/A or OpenDocument format (odt) may be used as a preservation format if the layout needs to be preserved. |
| Image | No restrictions. Recommended: JPEG, PNG, TIFF, SVG | JPEG, PNG, TIFF, (SVG, DNG) | JPEG, PNG, (SVG) | In exceptional circumstances, DNG can be considered for long-term preservation. When the Archive digitises images, the adopted long-term preservation format is TIFF or DNG. The Archive takes current digitisation guidelines into account. The SVG format is used for vector images. Animated GIF files can be converted either into video or into a series of PNG images. Camera RAW formats are generally not accepted. Note: The Archive is able to ingest all Adobe file formats, but recommends open formats instead. |
| Audio | No restrictions. Recommended: FLAC, WAV | FLAC, (MP3) | FLAC, MP3 | The Archive keeps track of audio format recommendations and changes formats if needed. MP3 is accepted as a long-term preservation format only if the original material was in this format. |
| Video | No restrictions. Recommended: MPEG-4 H.264 | MPEG-4 H.264, (JPEG 2000) | MPEG-4 H.264 | The formats recommended or accepted for video may change and will be reviewed as the need arises. A JPEG 2000 sequence is preserved as is and not converted to another format. Compression level is decided on a case-by-case basis. Archival and dissemination information packages (AIP and DIP) of a dataset may differ in compression level, resolution and format. |
| Geographic information | Dealt with case-by-case | Dealt with case-by-case | Same as the preservation format | Any geospatial information related to the data is dealt with on a case-by-case basis, taking into account the specifications provided by the national digital long-term preservation solution. GeoTIFF files can be deposited and preserved as TIFF files. |
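
For depositors who want to check their files before deposit, the preservation column of the table can be expressed as a simple lookup. The following Python sketch is illustrative only and is not a tool used by the Archive: the function name `needs_conversion`, the dictionary name and the file extension spellings (e.g. `.jpg` for JPEG, `.por` for SPSS Portable) are assumptions made for this example. It covers only data types whose formats can reasonably be guessed from a file extension; PDF/A conformance, video codecs and geographic information cannot be judged this way and are left out.

```python
from pathlib import Path

# Preservation formats per data type, transcribed from the table.
# The extension spellings are assumptions of this sketch, not part of the plan.
PRESERVATION_EXTENSIONS = {
    "data matrix": {".csv", ".por", ".ods"},
    "textual data": {".txt", ".csv", ".xml", ".html", ".xhtml", ".odt"},
    "image": {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".svg", ".dng"},
    # MP3 is a preservation format only if the original material was MP3.
    "audio": {".flac", ".mp3"},
}


def needs_conversion(filename: str, data_type: str) -> bool:
    """Return True if the file extension is not a listed preservation format.

    Only the extension is inspected; the file contents (for example the
    text encoding, which should be UTF-8) are not checked.
    """
    allowed = PRESERVATION_EXTENSIONS[data_type]
    return Path(filename).suffix.lower() not in allowed


if __name__ == "__main__":
    for name, kind in [
        ("survey.sav", "data matrix"),
        ("interview.docx", "textual data"),
        ("photo.png", "image"),
    ]:
        status = "would be converted" if needs_conversion(name, kind) else "already a preservation format"
        print(f"{name} ({kind}): {status}")
```

A file flagged by this check is still acceptable for ingest, since ingest formats are not restricted; the check only indicates that the Archive would convert it during the archiving process.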