CC0 licence for the public metadata of datasets

The public descriptive metadata of research datasets archived by FSD have been licensed under CC0 as of 19 March 2026. As a result of this licensing change, the metadata form an open and extensive corpus that can also be used for AI-based purposes.

The metadata describing research datasets are therefore now fully and freely reusable. Under the previously used CC BY 4.0 licence, users were required to credit the author of the metadata and the original source in accordance with the licence terms. This attribution requirement prevented, among other things, the use of the metadata in many commercial AI applications. With the change in licence, the metadata are now suitable for a wider range of uses.

In line with good scholarly practice, users are nevertheless encouraged to cite the original source when metadata are used, for example in research.

The metadata currently include basic information on nearly 2,200 research datasets, such as creators, abstracts, and sampling methods, in Finnish and English. In addition, metadata are available for over 420,000 variables, including question texts.

The CC0-licensed metadata are available through several channels and in several formats.

  • The DDI Codebook 2.5 XML files, downloadable as ZIP packages and through the Aila Data Service, provide the most comprehensive structured descriptions of each dataset. These include the basic dataset information and, in many cases, variable‑level metadata.
  • The Kuha2 OAI-PMH interface provides metadata in DDI Codebook, Dublin Core, and EAD3 formats.
  • The FSD SKG-IF OpenAPI interface delivers basic dataset metadata in accordance with the SKG-IF specifications.
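
As a rough illustration of how the channels listed above might be used, the following Python sketch builds a standard OAI-PMH ListRecords request URL and extracts the study title and variable question texts from a DDI Codebook 2.5 document. It is only a sketch under stated assumptions: the base URL is a placeholder rather than the actual Kuha2 address, the sample record is hand-written and not a real FSD dataset, and the function names are hypothetical. Only the `oai_dc` prefix is used because the OAI-PMH standard requires every repository to support it; the prefixes for the other formats should be checked with a ListMetadataFormats request.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace defined by the DDI Codebook 2.5 schema.
DDI_NS = {"ddi": "ddi:codebook:2_5"}

# Hand-written illustrative sample; NOT a real FSD record.
SAMPLE_DDI = """<codeBook xmlns="ddi:codebook:2_5" version="2.5">
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="en">Example Survey 2020</titl>
      </titlStmt>
    </citation>
  </stdyDscr>
  <dataDscr>
    <var name="q1">
      <labl>Trust in institutions</labl>
      <qstn><qstnLit>How much do you trust the parliament?</qstnLit></qstn>
    </var>
    <var name="q2">
      <labl>Age</labl>
    </var>
  </dataDscr>
</codeBook>"""


def oai_list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords URL (base_url is supplied by the caller)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it must not be combined with metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def summarize_codebook(xml_text):
    """Return the study title and per-variable label and question text."""
    root = ET.fromstring(xml_text)
    title = root.findtext(
        "ddi:stdyDscr/ddi:citation/ddi:titlStmt/ddi:titl",
        default="",
        namespaces=DDI_NS,
    )
    variables = []
    for var in root.iterfind("ddi:dataDscr/ddi:var", DDI_NS):
        variables.append(
            {
                "name": var.get("name", ""),
                "label": var.findtext("ddi:labl", default="", namespaces=DDI_NS),
                # Not every variable has a question text, so default to "".
                "question": var.findtext(
                    "ddi:qstn/ddi:qstnLit", default="", namespaces=DDI_NS
                ),
            }
        )
    return {"title": title, "variables": variables}


if __name__ == "__main__":
    # Placeholder base URL; replace with the documented interface address.
    print(oai_list_records_url("https://example.org/oai"))
    print(summarize_codebook(SAMPLE_DDI))
```

The parsing step is kept separate from the harvesting step so that the same function could be applied to XML files unpacked from the ZIP packages as well as to records retrieved over OAI-PMH.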

The licensing change was implemented as part of the AI-focused work package of the FSD AIMS 2030 project.